DocumentCode :
3678059
Title :
Entity Linking and Name Disambiguation in Chinese Micro-Blogs
Author :
Li Li;YunLong Guo;Yu Xiang;Xiao Xu;WeiGang Zeng
Author_Institution :
Dept. of Comput. Sci., Southwest Univ., Chongqing, China
fYear :
2014
Firstpage :
838
Lastpage :
843
Abstract :
The amount of data on social networks has been increasing sharply with the development of Web 2.0. Extracting social media content to construct and extend knowledge bases, mainly by removing the ambiguity of entities mentioned in microblogs, has attracted attention from both academia and industry. Understanding Chinese microblogs is challenging because of the inherent features of the Chinese language, the informal usage of the language, and the wide variety of content that microblogs cover. In this paper, we focus on entity disambiguation in Chinese microblogs. A Web crawler is first developed to collect relevant information from both Baidu Encyclopedia and Chinese Wikipedia. The entity dictionary is created from the existing mapping rules that the newly developed crawler obtains from the Baidu search engine. A novel disambiguation strategy is proposed that combines a clustering algorithm based on Newman's fast algorithm with a label disambiguation algorithm. We then evaluate our approach on a Chinese microblog data set. It achieves an accuracy of 89.34%, which is 4.35 percentage points higher than the best result (84.99%) among all participating teams. Our approach is promising for identifying entity links and discovering potential links in Chinese microblogs.
Keywords :
"Clustering algorithms","Encyclopedias","Dictionaries","Conferences","Joining processes","Knowledge based systems","Accuracy"
Publisher :
IEEE
Conference_Titel :
2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing, IEEE 11th Intl Conf on Autonomic and Trusted Computing, and IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom)
Type :
conf
DOI :
10.1109/UIC-ATC-ScalCom.2014.142
Filename :
7307051