Title :
A Method for the Construction of a Probabilistic Hierarchical Structure Based on a Statistical Analysis of a Large-scale Corpus
Author :
Terai, Asuka ; Liu, Bin ; Nakagawa, Masanori
Author_Institution :
Tokyo Inst. of Technol., Tokyo
Abstract :
The purpose of this study is to develop a method of constructing a probabilistic hierarchical structure based on a statistical analysis of a Japanese corpus using a combination of Kameya and Sato's statistical language analysis and Rose's model. First, the co-occurrence frequencies of adjectives and nouns are calculated from a Japanese corpus based on modification relations. Second, latent classes are extracted from a statistical language analysis of the co-occurrence data. Third, the centroid vectors of the latent classes are calculated from the analysis results, and a probabilistic hierarchical structure of the latent classes is constructed by utilizing Rose's model. Finally, the conditional probabilities of the categories given the latent classes are computed as the association probabilities of the latent classes to the categories, and the conditional probabilities of the categories given the concepts are computed as the association probabilities of the concepts to the categories.
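The first and last steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy adjective-noun pairs, and the probability tables are all hypothetical, and the rule used to combine the probabilities (summing over latent classes, which assumes categories are independent of concepts given the latent class) is an assumption, since the abstract does not state the formula.

```python
from collections import Counter


def cooccurrence_counts(pairs):
    """Count how often each (adjective, noun) modification pair occurs."""
    return Counter(pairs)


def category_given_concept(p_class_given_concept, p_category_given_class):
    """Assumed combination rule (not stated in the abstract):
    P(category | concept) = sum over classes of
        P(category | class) * P(class | concept).
    """
    result = {}
    for cls, p_cls in p_class_given_concept.items():
        for category, p_cat in p_category_given_class[cls].items():
            result[category] = result.get(category, 0.0) + p_cat * p_cls
    return result


# Hypothetical toy data: modification pairs as a parser might emit them.
pairs = [("red", "apple"), ("red", "apple"), ("sweet", "apple"), ("red", "car")]
counts = cooccurrence_counts(pairs)

# Hypothetical outputs of the latent-class and hierarchy steps for one concept.
p_class_given_concept = {"c1": 0.75, "c2": 0.25}
p_category_given_class = {
    "c1": {"food": 1.0, "vehicle": 0.0},
    "c2": {"food": 0.0, "vehicle": 1.0},
}
p_category = category_given_concept(p_class_given_concept, p_category_given_class)
```

Here `counts` plays the role of the co-occurrence frequency table, and `p_category` holds the association probabilities of one concept to each category. The latent-class extraction and the hierarchy construction with Rose's model are deliberately left out of the sketch.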
Keywords :
computational linguistics; statistical analysis; Japanese corpus; conditional probabilities; cooccurrence frequencies; large-scale corpus; probabilistic hierarchical structure; statistical language analysis; Costs; Data mining; Frequency; Humans; Information analysis; Information technology; Large-scale systems; Natural languages; Probability; Statistical analysis
Conference_Title :
Semantic Computing, 2007. ICSC 2007. International Conference on
Conference_Location :
Irvine, CA
Print_ISBN :
978-0-7695-2997-4
DOI :
10.1109/ICSC.2007.60