DocumentCode :
2429902
Title :
A semantic information retrieval model for focused crawling
Author :
Osuna-Ontiveros, Daniel ; Lopez-Arevalo, Ivan ; Sosa-Sosa, Victor
Author_Institution :
Inf. Technol. Lab., CINVESTAV - IPN, Tamaulipas, Mexico
fYear :
2011
fDate :
19-21 Oct. 2011
Firstpage :
285
Lastpage :
289
Abstract :
Users now store large amounts of information on the Web, which makes the Internet a good place to search for information on almost any subject. Because of this volume, some users prefer to search within specific websites that they consider interesting (e.g. www.wikipedia.com, news sites, etc.). Traditional models represent webpages by using the frequency of terms or the structure of links to assign weights to the terms of a webpage. This paper presents a semantic information retrieval model for representing specific websites. The proposal integrates text mining algorithms based on natural language processing with traditional representation models, with the aim of improving the quality of the webpages retrieved by a search. Each webpage of the website is represented as a vector of topics instead of a vector of terms, and the query is represented as a vector of topics in the same way. A similarity measure can then be applied between the query vector and the document vectors to retrieve the most relevant documents.
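The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or topic extraction method the authors use, so cosine similarity is assumed here, and the topic labels, page names, and weights are hypothetical values chosen by hand for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A vector with no topic weight matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pages(query_vector, page_vectors):
    """Return (page, score) pairs sorted from most to least similar."""
    scored = [
        (page, cosine_similarity(query_vector, vector))
        for page, vector in page_vectors.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical topic weights over three made-up topics:
# [sports, politics, technology]
pages = {
    "page_a": [0.9, 0.1, 0.0],
    "page_b": [0.1, 0.2, 0.7],
    "page_c": [0.0, 0.8, 0.2],
}

# Hypothetical query that is mostly about technology.
query = [0.0, 0.1, 0.9]

ranking = rank_pages(query, pages)
for page, score in ranking:
    print(f"{page}: {score:.3f}")
```

With these assumed weights, the technology-heavy page ranks first. In a real system the topic vectors would come from the text mining stage, not from hand-assigned numbers.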
Keywords :
Internet; Web sites; document handling; information retrieval; natural language processing; search problems; document vector; focused crawling; search information; semantic information retrieval model; text mining algorithms; web page representation; Computational modeling; Google; Information retrieval; Mathematical model; Semantics; Text mining; Vectors; Semantic Web; Semantic representation model; Web search
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Next Generation Web Services Practices (NWeSP), 2011 7th International Conference on
Conference_Location :
Salamanca
Print_ISBN :
978-1-4577-1125-1
Type :
conf
DOI :
10.1109/NWeSP.2011.6088192
Filename :
6088192