Title :
Research on web topic detection based on domain lexicon
Author :
Zhao Zhibin ; Jia Yanfeng ; Bao Yubin
Author_Institution :
Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
Abstract :
Web topic detection is a crucial prerequisite for web-based data integration and a key component of vertical search engines, so it has attracted much attention from both industry and the research community. In this paper, we propose a domain-lexicon-based framework for Web topic detection. In our framework, we first extract the topical features from the Web page. Next, we employ the Vector Space Model (VSM) and the Support Vector Machine (SVM) to compute the topical relevance between the Web page features and the domain that the user prefers, in order to decide whether the Web page satisfies the user's request. The Vector Space Model is suitable for domains whose lexicons need to be updated frequently. Conversely, the Support Vector Machine is suitable for domains whose lexicons are relatively stable. Moreover, we explore a mechanism for updating the domain lexicon, which is intended to keep the lexicon accurate and fresh. Finally, we conduct extensive experiments to test our framework and to analyze how the domain lexicon affects the judgement result.
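The VSM relevance step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lexicon weights, the choice to restrict the vector space to lexicon terms, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_relevance(page_terms, lexicon_weights):
    """Cosine similarity between a page and a weighted domain lexicon.

    Assumption: the vector space has one dimension per lexicon term,
    so page terms that are not in the lexicon are ignored.
    """
    # Term frequencies of the page, restricted to lexicon terms.
    tf = Counter(t for t in page_terms if t in lexicon_weights)
    dot = sum(count * lexicon_weights[term] for term, count in tf.items())
    page_norm = math.sqrt(sum(c * c for c in tf.values()))
    lex_norm = math.sqrt(sum(w * w for w in lexicon_weights.values()))
    if page_norm == 0 or lex_norm == 0:
        return 0.0
    return dot / (page_norm * lex_norm)


def is_on_topic(page_terms, lexicon_weights, threshold=0.5):
    """Hypothetical decision rule: relevant if the similarity reaches the threshold."""
    return cosine_relevance(page_terms, lexicon_weights) >= threshold


# Illustrative lexicon for a hypothetical "football" domain.
football_lexicon = {"goal": 1.0, "league": 1.0}
on_topic_score = cosine_relevance(["goal", "league", "the"], football_lexicon)
off_topic_score = cosine_relevance(["stock", "market"], football_lexicon)
```

Because the lexicon here is only a dictionary of weights, adding or re-weighting a term changes the score immediately with no retraining, which is consistent with the abstract's remark that VSM suits domains whose lexicons change often.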
Keywords :
Internet; data integration; feature extraction; relevance feedback; search engines; support vector machines; text analysis; SVM; VSM; Web page features; Web topic detection; Web-based data integration; domain lexicon updating mechanism; domain lexicon-based framework; support vector machine; topical feature extraction; topical relevance computation; user preference; vector space model; vertical search engine; Data integration; Educational institutions; Electronic mail; Feature extraction; Support vector machines; Vectors; Web pages; data integration; domain lexicon; text classification; topic detection; vertical search;
Conference_Title :
Control and Decision Conference (CCDC), 2013 25th Chinese
Conference_Location :
Guiyang
Print_ISBN :
978-1-4673-5533-9
DOI :
10.1109/CCDC.2013.6561583