Title :
Relation discovery by named entity recognition from Tibetan websites
Author :
Yu, Hongzhi ; Jiang, Tao ; Zhang, Bing ; Chen, Xinyi
Author_Institution :
State Key Lab. of Nat. Languages Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
Abstract :
Discovering the significant relations embedded in Web pages can be very useful for community discovery. In this paper, we propose an unsupervised method for relation discovery from Tibetan Web pages based on the co-occurrence of named entities in the pages. To find these relations, we propose a rule-based named entity recognition algorithm. Our experiments on 30.2 megabytes of plain text from three large Tibetan Web sites show that the algorithm achieves high precision and recall. We also give a relation strength formula that combines co-occurrence frequency and personal information, so that the relations in the Web pages can easily be found according to their relation strength values.
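The abstract does not state the relation strength formula itself, so the following Python sketch only illustrates the general idea of page-level named entity co-occurrence combined with a personal-information term. It assumes named entity recognition has already produced a list of person names per page. The additive form `count + alpha * shared_attributes`, the weight `alpha`, the attribute dictionaries, and all function names are hypothetical assumptions, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(pages):
    """Count, for each unordered entity pair, how many pages mention both entities."""
    counts = Counter()
    for entities in pages:
        # A sorted set counts each pair once per page, in a canonical (a, b) order.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def shared_attributes(profile_a, profile_b):
    """Number of personal-information fields (hypothetical) on which two entities agree."""
    return sum(1 for key, value in profile_a.items()
               if key in profile_b and profile_b[key] == value)


def relation_strength(pair, counts, profiles, alpha=0.5):
    """Hypothetical strength: co-occurrence count plus alpha times shared attributes."""
    a, b = pair
    bonus = shared_attributes(profiles.get(a, {}), profiles.get(b, {}))
    return counts[pair] + alpha * bonus


def rank_relations(pages, profiles, alpha=0.5):
    """Return (pair, strength) tuples, strongest relation first."""
    counts = cooccurrence_counts(pages)
    scored = [(pair, relation_strength(pair, counts, profiles, alpha)) for pair in counts]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy input: each inner list holds the person names recognized on one page.
    pages = [["A", "B", "C"], ["A", "B"], ["B", "C", "B"]]
    profiles = {"A": {"hometown": "X"},
                "B": {"hometown": "X", "employer": "Y"},
                "C": {"employer": "Z"}}
    for pair, strength in rank_relations(pages, profiles):
        print(pair, strength)
```

Counting each pair at most once per page keeps a single page that repeats a name many times from dominating the score; a weighted or window-based count would be an equally plausible design choice.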
Keywords :
data mining; information retrieval; natural language processing; social networking (online); unsupervised learning; Tibetan Web pages; Tibetan Web sites; community network discovery; named entity co-occurrence frequency; personal information; relation discovery; relation strength formula; rule-based named entity recognition algorithm; social network discovery; unsupervised method; Data processing; Encoding; Frequency; Information technology; Internet; Java; Laboratories; Libraries; Markup languages; Web pages;
Conference_Title :
Web Society, 2009. SWS '09. 1st IEEE Symposium on
Conference_Location :
Lanzhou
Print_ISBN :
978-1-4244-4157-0
Electronic_ISBN :
978-1-4244-4158-7
DOI :
10.1109/SWS.2009.5271789