Title :
Building A Document Class Hierarchy for Obtaining More Proper Bibliographies from Web
Author :
Wang, Daling ; Yu, Ge ; Hu, Minghan ; Bao, Yubin ; Zhang, Meng
Author_Institution :
Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang
Abstract :
To help researchers in scientific and technological fields find more suitable information resources on the Web, an auxiliary search structure is proposed: a class hierarchy of documents built from the keywords of the documents. To cover the contents of each document properly, the keywords are extracted by mining maximal sequential frequent phrases. In this paper, the concept of a maximal sequential frequent phrase is defined, and the corresponding mining algorithm is designed and implemented. The experiments show that keyword extraction using maximal sequential frequent phrases achieves a better F-measure than extraction using the traditional TFIDF weight. Moreover, compared with previous work, the extended class hierarchy tree represents relationships both among the keywords themselves and between keywords and documents, so that queries at different professional levels can be supported.
Keywords :
Internet; data mining; search engines; text analysis; TFIDF weight; World Wide Web; auxiliary search structure; bibliographies; document class hierarchy; document keywords; information resources; keyword extraction; maximal sequential frequent phrase mining; Algorithm design and analysis; Books; Information science; Proposals; Writing
Conference_Title :
Web Information Retrieval and Integration, 2005. WIRI '05. Proceedings. International Workshop on Challenges in
Conference_Location :
Tokyo
Print_ISBN :
0-7695-2414-1
DOI :
10.1109/WIRI.2005.13