Title :
Mining the whole set of person names from the Tibetan Web
Author :
Jiang, Tao ; Yu, Hongzhi
Author_Institution :
State Key Lab. of Nat. Languages, Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
Abstract :
With the rapid development of Tibetan language information and the Tibetan Web in recent years, personal information has become a main focus of researchers. However, the complexity of Web information makes the extraction of person names difficult, especially on the Tibetan Web. This paper presents a rule-based approach, built on case-auxiliary words and a lexicon, to extract person names from the Tibetan Web. Using grammar information and statistical rules, we have developed a person name extraction system for the Tibetan Web. We design a series of experiments to evaluate the performance of the system, and the evaluation results are satisfactory.
Keywords :
Internet; data mining; document handling; knowledge based systems; Tibetan Web; Tibetan language information; case-auxiliary lexicon; case-auxiliary words; grammar information; person names extraction; person names mining; personal information; rule-based approach; statistical rules; Information technology; Laboratories; Probability; Search engines; Statistical analysis; Statistics; Text processing; Training data; Tibetan language; Web; person name extraction
Conference_Title :
Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4519-6
Electronic_ISBN :
978-1-4244-4520-2
DOI :
10.1109/ICCSIT.2009.5234752