Title :
Extracting Document Semantics for Semantic Header
Author :
Wang, Tao ; Desai, Bipin C.
Author_Institution :
Dept. of Comput. Sci., Concordia Univ., Montreal, Que.
Abstract :
Accurate indexing and cataloguing of electronic information on the Internet is the foundation for precise retrieval. Most existing search systems, however, tend to generate misses and false hits because they attempt to match the specified search terms in the target information resources without considering context. Traditional keyword-based methods for representing the semantics of information items have thus become a major obstacle to high precision. The previously proposed notion of a semantic header captures the semantics of information resources and takes into account the logical structure of an information item. The contents of the semantic header may be used by modern search systems to help locate an appropriate information item with minimum effort. In this paper, we present a system, called the automatic semantic header generator (ASHG), for generating five key components of the semantic header. Finally, we evaluate the system with two sets of documents and analyze the corresponding results.
Keywords :
Internet; information retrieval; text analysis; automatic semantic header generator; document semantic extraction; text categorization; Computer science; Data mining; Indexing; Information analysis; Information resources; Search engines; Web search; Semantics extraction; meta-data structure
Conference_Title :
Electrical and Computer Engineering, 2006. CCECE '06. Canadian Conference on
Conference_Location :
Ottawa, Ont.
Print_ISBN :
1-4244-0038-4
Electronic_ISBN :
1-4244-0038-4
DOI :
10.1109/CCECE.2006.277719