DocumentCode :
2000070
Title :
Anatomy of a News Archive and Search Engine (Optimized for Persian Web)
Author :
Khalifehsoltani, Sayed Nasir ; Vahdani, Ali ; Moallemi, Reza
Author_Institution :
Dept. of Comput. Eng., SheikhBahaee Univ., Isfahan
fYear :
2009
fDate :
27-29 April 2009
Firstpage :
1361
Lastpage :
1366
Abstract :
News search engines are a class of search engines that professionally monitor news on the web. These engines usually obtain their content by extracting news feeds. However, not all news sources fully support news feeds, especially Persian ones. An alternative is to index the content of news pages directly, but the results are less accurate because the structure of the news is misrecognized. In this article we present the architecture of a news search engine that extracts and archives structured news content and then performs complementary processes, such as indexing and classifying news, optimized for the Persian language. Using the structured text of the news, we achieved higher precision in these complementary processes.
Keywords :
indexing; information retrieval; search engines; Persian Web; Persian language; Web news; news archive anatomy; news feeds extraction; news pages content indexing; news search engines; news structure misrecognition; Anatomy; Computerized monitoring; Cost accounting; Data mining; Feeds; Indexing; Information technology; Resource description framework; Search engines; XML; Automatic News Classifying; Information Extraction; News Archiving; News Search Engine; Text Indexing; Web Page Processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-3770-2
Electronic_ISBN :
978-0-7695-3596-8
Type :
conf
DOI :
10.1109/ITNG.2009.264
Filename :
5070816