Regional Information Center for Science and Technology - Mining the whole set of person names from the Tibetan Web

DocumentCode :

3315622

Title :

Mining the whole set of person names from the Tibetan Web

Author :

Jiang, Tao ; Yu, Hongzhi

Author_Institution :

State Key Lab. of Nat. Languages, Inf. Technol., Northwest Univ. for Nat., Lanzhou, China

fYear :

2009

fDate :

8-11 Aug. 2009

Firstpage :

Lastpage :

Abstract :

With the rapid development of Tibetan language information and the Tibetan Web in recent years, personal information has become a main focus of researchers. However, because Web information is complex, extracting person names is difficult, especially on the Tibetan Web. This paper presents a rule-based approach, built on case-auxiliary words and a lexicon, for extracting person names from the Tibetan Web. Using grammatical information and statistical rules, we have developed a person name extraction system for the Tibetan Web. We design a series of experiments to evaluate the performance of the system, and the evaluation results are satisfactory.

Keywords :

Internet; data mining; document handling; knowledge based systems; Tibetan Web; Tibetan language information; case-auxiliary lexicon; case-auxiliary words; grammar information; person names extraction; person names mining; personal information; rule-based approach; statistical rules; Information technology; Laboratories; Probability; Search engines; Statistical analysis; Statistics; Text processing; Training data; Tibetan language; Web; person name extraction

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-4519-6

Electronic_ISBN :

978-1-4244-4520-2

Type :

conf

DOI :

10.1109/ICCSIT.2009.5234752

Filename :

5234752

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3315622