Title :
Evolutionary learning of Web-document structure for information retrieval
Author :
Kim, Sun ; Zhang, Byoung-Tak
Author_Institution :
Dept. of Comput. Sci. & Eng., Seoul Nat. Univ., South Korea
Abstract :
Web documents contain tags that indicate their structure, and this tag information can be used to improve the performance of document retrieval systems. We propose an approach that retrieves Web documents using HTML tags and uses a genetic algorithm to adapt the tag weights. The method uses a modified similarity measure based on the tag weights. A genetic learning method selects the tags used for retrieval and searches for the optimal tag weights. We evaluate the method in experiments on conference pages and TREC document sets. The experimental results show that the proposed algorithm trains the tag weights well, in accordance with the importance factors for retrieval. The proposed method achieves about a 10% improvement in retrieval accuracy.
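The abstract does not give the similarity formula or the genetic operators, so the following is only a minimal sketch of the general idea under stated assumptions: each term occurrence is scaled by the weight of its enclosing tag before a cosine similarity is computed, and a small genetic algorithm (tournament selection, uniform crossover, Gaussian mutation, elitism) searches for tag weights that rank known-relevant documents higher. The names `TAGS`, `similarity`, `fitness`, and `evolve`, the toy documents, the mean-reciprocal-rank fitness, and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

# Hypothetical tag set; the tags actually used in the paper are not listed in the abstract.
TAGS = ["title", "h1", "b", "body"]

def similarity(query_terms, doc, tag_weights):
    """Cosine similarity between a query and a tag-weighted document vector.

    doc is a list of (tag, term) pairs; each occurrence of a term
    contributes the weight of the tag it appears in (assumed scheme).
    """
    d = Counter()
    for tag, term in doc:
        d[term] += tag_weights.get(tag, 0.0)
    q = Counter(query_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def fitness(weights, docs, judgments):
    """Mean reciprocal rank of the relevant document over all queries (assumed fitness)."""
    tw = dict(zip(TAGS, weights))
    total = 0.0
    for query, relevant in judgments:
        ranked = sorted(docs, key=lambda name: similarity(query, docs[name], tw), reverse=True)
        total += 1.0 / (ranked.index(relevant) + 1)
    return total / len(judgments)

def evolve(docs, judgments, pop_size=20, generations=30, seed=0):
    """Evolve one weight in [0, 1] per tag; a weight near 0 effectively deselects that tag."""
    rng = random.Random(seed)
    score = lambda w: fitness(w, docs, judgments)
    pop = [[rng.random() for _ in TAGS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best individuals over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            a = max(rng.sample(pop, 3), key=score)
            b = max(rng.sample(pop, 3), key=score)
            # Uniform crossover, then Gaussian mutation clamped to [0, 1].
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < 0.2 else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)

# Toy collection: query terms in the title mark the relevant page,
# while the other pages mention the same terms only in body text.
DOCS = {
    "d1": [("title", "genetic"), ("body", "web"), ("body", "web")],
    "d2": [("title", "web"), ("body", "genetic"), ("body", "genetic")],
    "d3": [("h1", "retrieval"), ("body", "web"), ("body", "genetic")],
}
JUDGMENTS = [(["genetic"], "d1"), (["web"], "d2"), (["retrieval"], "d3")]

if __name__ == "__main__":
    best = evolve(DOCS, JUDGMENTS)
    print(dict(zip(TAGS, best)), fitness(best, DOCS, JUDGMENTS))
```

In this toy setup the search is rewarded for raising the weights of `title` and `h1` relative to `body`, which mirrors the abstract's claim that learned weights reflect each tag's importance for retrieval. A real evaluation would replace the toy data with relevance judgments from a test collection.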
Keywords :
document handling; genetic algorithms; hypermedia markup languages; information resources; information retrieval; HTML tags; TREC document sets; Web document retrieval; Web-document structure; conference pages; document retrieval systems; evolutionary learning; genetic algorithm; genetic learning method; importance factors; modified similarity measure; optimal tag weights; retrieval accuracy; tag information; tag weights; Artificial intelligence; Computer science; HTML; Learning systems; Search engines; Sun; Testing; Web search
Conference_Title :
Evolutionary Computation, 2001. Proceedings of the 2001 Congress on
Conference_Location :
Seoul
Print_ISBN :
0-7803-6657-3
DOI :
10.1109/CEC.2001.934334