Title :
Weighted hybrid features to resolve mixed entities
Author :
Lee, Ingyu ; On, Byung-Won
Author_Institution :
Troy Univ., Troy, AL, USA
Abstract :
With the popularity of the Internet, a tremendous amount of unstructured document information has become accessible. Extracting related information from large collections of unstructured documents is a very difficult task. In particular, confusion can arise from synonymy and polysemy, misspellings, abbreviations, and so on. Resolving such confusion is known as the Entity Resolution problem. Clustering algorithms are widely used to resolve mixed entities. However, most studies focus on a single feature of an entity, such as co-author lists or paper titles. In this paper, we propose a weighted hybrid feature scheme to distinguish mixed entities in unstructured documents. Experimental results show that the weighted hybrid approach improves both accuracy and efficiency.
Keywords :
Internet; document handling; pattern clustering; abbreviation; clustering algorithms; entity resolution problem; misspelling; mixed entities; polysemy; synonym; unstructured document information; weighted hybrid features; Accuracy; Cadaver; Clustering algorithms; Educational institutions; Matrix converters; Terminology; Vectors; data mining; feature selections; mixed entity resolution; web document clustering
Conference_Title :
Digital Information Management (ICDIM), 2011 Sixth International Conference on
Conference_Location :
Melbourne, QLD
Print_ISBN :
978-1-4577-1538-9
DOI :
10.1109/ICDIM.2011.6093351