DocumentCode :
3704169
Title :
Semi-Supervised Multiple Disambiguation
Author :
Kambiz Ghoorchian;Fatemeh Rahimian;Sarunas Girdzijauskas
Author_Institution :
Sch. of Electr. Eng., R. Inst. of Technol., Stockholm, Sweden
Volume :
2
fYear :
2015
Firstpage :
88
Lastpage :
95
Abstract :
Determining the true entity behind an ambiguous word is an NP-hard problem known as disambiguation. Previous solutions often disambiguate a single ambiguous mention across multiple documents. They assume that each document contains only one ambiguous word and a rich set of unambiguous context words. Today, however, we need fast disambiguation of short texts (such as news feeds, reviews, or Tweets) that have few context words and multiple ambiguous words. In this research we focus on Multiple Disambiguation (MD), in contrast to Single Disambiguation (SD). Our solution is inspired by a recent algorithm developed for SD. That algorithm categorizes documents by first transforming them into a graph and then clustering the graph based on its topological structure. We changed the algorithm's graph-based document model to account for MD. We also added a new parameter that controls the resolution of the clustering. Finally, we used a supervised sampling approach to merge clusters when appropriate. Compared with the original model, our algorithm achieved a 10% higher F1-score while sampling only 4% of the dataset.
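The pipeline described in the abstract (documents to graph, graph clustering with a resolution control, sample-based cluster merging) can be illustrated with a small sketch. The abstract does not specify the graph construction or the clustering rule, so everything here is an assumption made for illustration: documents are nodes, two documents are linked when they share at least `min_shared` words, clusters are connected components, and a small hand-labeled sample decides which clusters to merge. The names `cluster_documents`, `merge_with_sample`, and `min_shared` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_documents(docs, min_shared=2):
    """Group documents that share at least `min_shared` words.

    Assumption: connected components over a document graph stand in for
    the topological clustering mentioned in the abstract. A higher
    `min_shared` yields more, smaller clusters (a resolution-like knob).
    """
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    word_sets = [set(words) for words in docs]
    for i, j in combinations(range(len(docs)), 2):
        if len(word_sets[i] & word_sets[j]) >= min_shared:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(docs))]


def merge_with_sample(cluster_ids, sample_labels):
    """Merge clusters whose sampled documents carry the same known label.

    `sample_labels` maps a document index to its hand-assigned entity.
    Clusters without any sampled document keep their original id.
    """
    votes = defaultdict(Counter)
    for doc_index, label in sample_labels.items():
        votes[cluster_ids[doc_index]][label] += 1
    cluster_to_label = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return [cluster_to_label.get(c, c) for c in cluster_ids]


# Toy short texts that all contain the ambiguous word "apple".
docs = [
    ["apple", "iphone", "store"],
    ["apple", "iphone", "launch"],
    ["apple", "pie", "recipe"],
    ["apple", "pie", "oven"],
    ["apple", "macbook", "keynote"],
    ["apple", "macbook", "laptop"],
]
clusters = cluster_documents(docs, min_shared=2)

# A small labeled sample (three documents) drives the merge step.
sample = {0: "Apple Inc.", 2: "apple (fruit)", 4: "Apple Inc."}
merged = merge_with_sample(clusters, sample)
print(merged)
```

With `min_shared=2` the toy data splits into three clusters, and the labeled sample then merges the two company-related clusters. With `min_shared=1` the shared word "apple" links every document into a single cluster, which shows why some resolution control is useful in this kind of setup.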
Keywords :
"Context","Clustering algorithms","Image color analysis","Electrical engineering","Electronic mail","Algorithm design and analysis","Skin"
Publisher :
IEEE
Conference_Title :
Trustcom/BigDataSE/ISPA, 2015 IEEE
Type :
conf
DOI :
10.1109/Trustcom.2015.566
Filename :
7345479