DocumentCode :
2968897
Title :
Achieving Classification and Clustering in One Shot Lesson Learned from Labeling Anonymous Datasets
Author :
Ahmed, Emdad
Author_Institution :
Dept. of Comput. Sci., Integration Inf. Lab., Wayne State Univ., Detroit, MI, USA
fYear :
2010
fDate :
22-24 Sept. 2010
Firstpage :
228
Lastpage :
231
Abstract :
This paper presents an algorithm, LadsComplete, that automatically assigns labels to HTML tabular web data based on syntactic similarities between elements of the table. We categorize columns into three types: Disjoint Set Column (DSC), Repeated Prefix/Suffix Column (RPS), and Numeric Column (NUM). To label DSC columns, our method relies on hit counts from a web search engine. Experimental results from a large number of sites in different domains, together with a subjective evaluation, show that the proposed algorithm works fairly well. We hypothesize that LadsComplete will perform well for autonomous label assignment. We are not aware of any prior work that connects two orthogonal research areas, namely wrapper generation and label extraction, for value-added services such as online comparison shopping.
Keywords :
Internet; pattern classification; pattern clustering; search engines; LadsComplete algorithm; Web search engine; anonymous dataset labeling; disjoint set column type; numeric column type; repeated prefix-suffix column type; Books; Data mining; Engines; HTML; Labeling; Motion pictures; Web search; HTML Table; Hidden Web; Web Form; Wrapper
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
Conference_Location :
Pittsburgh, PA
Print_ISBN :
978-1-4244-7912-2
Electronic_ISBN :
978-0-7695-4154-9
Type :
conf
DOI :
10.1109/ICSC.2010.81
Filename :
5629134