DocumentCode :
2369909
Title :
On precision and recall of multi-attribute data extraction from semistructured sources
Author :
Yang, Guizhen ; Mukherjee, Saikat ; Ramakrishnan, I.V.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Buffalo, NY, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
395
Lastpage :
402
Abstract :
Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However, to date, the formal relationship between learning algorithms and their impact on these two metrics remains unexplored. We propose a formalization of precision and recall of extraction and investigate the complexity-theoretic aspects of learning algorithms for multiattribute data extraction based on this formalism. We show that there is a tradeoff between precision/recall of extraction and computational efficiency, and present experimental results to demonstrate the practical utility of these concepts in designing scalable data extraction algorithms for improving recall without compromising on precision.
Keywords :
Internet; computational complexity; data mining; learning (artificial intelligence); Internet; complexity-theoretic aspects; machine learning algorithms; multiattribute data extraction; semistructured sources; Animals; Computational efficiency; Computer science; Data engineering; Data mining; Hospitals; Labeling; Machine learning; Machine learning algorithms; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250945
Filename :
1250945