DocumentCode :
2506911
Title :
A heuristic approach for recognizing a document's language used for the Internet search engine GETESS
Author :
Düsterhöft, A.; Gröticke, S.
Author_Institution :
Dept. of Comput. Sci., Rostock Univ., Germany
fYear :
2000
fDate :
2000
Firstpage :
133
Lastpage :
137
Abstract :
The authors illustrate how Internet documents can be automatically analyzed in order to identify the document's language. This language knowledge is then used for the Internet search engine, GETESS. The aim of the language classification heuristics is to ensure that documents with the same content, but different languages (e.g. in German and English), will not simultaneously be presented to the user as search results. The GETESS search engine only provides the results in the language relevant to the user. Consequently, the search-result set is narrower and more appropriately fits the needs of the user.
Keywords :
Internet; document handling; information retrieval; linguistics; search engines; English; GETESS; German; Internet documents; Internet search engine; document language recognition; heuristic approach; language classification heuristics; language knowledge; search results; search-result set; user needs; Computer architecture; Computer graphics; Computer science; Databases; Information analysis; Knowledge representation; Natural languages; Ontologies; Search engines; Web and internet services;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 2000. Proceedings. 11th International Workshop on
Conference_Location :
London
ISSN :
1529-4188
Print_ISBN :
0-7695-0680-1
Type :
conf
DOI :
10.1109/DEXA.2000.875016
Filename :
875016