Title :
Class-Based Smoothing to Estimate the Probability of Domain Terms
Author :
Cai, Xiaobai ; Fan, Xiaozhong
Author_Institution :
Beijing Inst. of Technol., Beijing
Abstract :
This paper proposes a method for estimating the probability of a special kind of domain term: an anatomy noun appearing as a part or modifier of a disease-name phrase. The estimate is used to smooth sparse data in disease-name phrase recognition. The method estimates probabilities over senses drawn from a semantic hierarchy, exploiting the fact that terms can be grouped into classes of interrelated semantic senses. Class-based smoothing re-creates term co-occurrence frequencies from the information provided by the semantic hierarchy in order to estimate how often a candidate string occurs in an argument position. In this paper, the semantic hierarchy is obtained by modularizing, or partitioning, an anatomy ontology: maximum spanning subtrees are extracted, under restrictions, from an ontology that expresses foundational anatomical objects and relations. The sub-models extracted by this partitioning form the basis of the semantic hierarchy. A "tree cut" model is then built on the hierarchy and used as a back-off model to estimate the probability distribution of terms. The criterion for choosing the tree cut is based on two parameters: the chi-squared statistic and its degrees of freedom.
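The abstract does not spell out the exact tree-cut test, so the following is only a minimal sketch under stated assumptions. It assumes that counts live on leaf terms (hits = occurrences inside disease-name phrases, total = all occurrences), that a class is "cut" when a Pearson chi-squared homogeneity test on its children's hit/miss counts falls below the 5% critical value for df = k - 1, and that an unseen class backs off to its parent's estimate. The names `Node`, `chi_squared`, `tree_cut`, `CHI2_CRIT_05`, the toy hierarchy, and its counts are all hypothetical illustrations, not the authors' implementation or data.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Upper 5% critical values of the chi-squared distribution, keyed by degrees of freedom.
# (Assumed significance level; the paper's threshold is not stated in the abstract.)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


@dataclass
class Node:
    """Hypothetical class in the semantic hierarchy; counts are stored on leaves only."""
    name: str
    hits: int = 0    # occurrences of the term inside disease-name phrases
    total: int = 0   # all occurrences of the term in the corpus
    children: list[Node] = field(default_factory=list)

    def counts(self) -> tuple[int, int]:
        """Pooled (hits, total) over every leaf under this node."""
        if not self.children:
            return self.hits, self.total
        pairs = [c.counts() for c in self.children]
        return sum(h for h, _ in pairs), sum(t for _, t in pairs)

    def leaves(self) -> list[Node]:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def chi_squared(children: list[Node]) -> float:
    """Pearson chi-squared for the 2 x k table (hit / miss) x (child class)."""
    cols = [c.counts() for c in children]
    H = sum(h for h, _ in cols)
    T = sum(t for _, t in cols)
    if H == 0 or H == T:
        return 0.0  # every observation has the same outcome: no evidence of difference
    stat = 0.0
    for h, t in cols:
        for observed, row_total in ((h, H), (t - h, T - H)):
            expected = t * row_total / T
            stat += (observed - expected) ** 2 / expected
    return stat


def tree_cut(node: Node, probs: dict[str, float], cut: list[str],
             backoff: float = 0.0) -> None:
    """Descend until a class's children look homogeneous, then cut the tree there."""
    h, t = node.counts()
    p = h / t if t else backoff          # unseen class: reuse the parent's estimate
    seen = [c for c in node.children if c.counts()[1] > 0]
    df = len(seen) - 1                   # (2 - 1) * (k - 1) for a 2 x k table
    # Leaves give df = -1 and are cut immediately; df > 10 would need a larger table.
    if df < 1 or chi_squared(seen) <= CHI2_CRIT_05[df]:
        cut.append(node.name)
        for leaf in node.leaves():
            probs[leaf.name] = p         # every term in the class shares its estimate
        return
    for child in node.children:
        tree_cut(child, probs, cut, backoff=p)


# Toy hierarchy with invented counts; "liver" is never observed.
root = Node("anatomical_entity", children=[
    Node("organ", children=[Node("heart", 40, 100), Node("lung", 38, 100),
                            Node("liver", 0, 0)]),
    Node("tissue", children=[Node("muscle", 2, 100), Node("bone", 50, 100)]),
])
probs, cut = {}, []
tree_cut(root, probs, cut)
print(cut)    # ['organ', 'muscle', 'bone']
print(probs)  # {'heart': 0.39, 'lung': 0.39, 'liver': 0.39, 'muscle': 0.02, 'bone': 0.5}
```

In this toy run the children of "organ" are statistically indistinguishable, so the cut stops there and the unseen term "liver" inherits the class estimate 0.39 instead of zero, which is the sparse-data effect class-based smoothing aims for; "muscle" and "bone" differ sharply and keep their own estimates.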
Keywords :
medical computing; semantic networks; smoothing methods; anatomy ontology; class-based smoothing; domain terms; phrase recognition; semantic hierarchy; sparse data smoothing; tree cut model; Anatomy; Computer aided instruction; Data mining; Diseases; Frequency estimation; Natural language processing; Ontologies; Probability distribution; Smoothing methods; Taxonomy;
Conference_Title :
Complex Medical Engineering, 2007. CME 2007. IEEE/ICME International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1077-4
Electronic_ISBN :
978-1-4244-1078-1
DOI :
10.1109/ICCME.2007.4381753