Title :
A heuristic approach for recognizing a document's language used for the Internet search engine GETESS
Author :
Düsterhöft, A.; Gröticke, S.
Author_Institution :
Dept. of Comput. Sci., Rostock Univ., Germany
Abstract :
The authors illustrate how Internet documents can be analyzed automatically to identify each document's language. This language information is then used by the Internet search engine GETESS. The aim of the language classification heuristics is to ensure that documents with the same content but in different languages (e.g. German and English) are not presented to the user side by side as search results. The GETESS search engine returns results only in the language relevant to the user. Consequently, the search-result set is narrower and fits the user's needs more closely.
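The idea in the abstract, classifying each document's language and then returning only results in the user's language, can be sketched as follows. This is a minimal illustration and not the authors' method: the abstract does not describe the actual GETESS heuristics, so the stopword-counting rule, the tiny word lists, and the names `STOPWORDS`, `guess_language`, and `filter_results` are all assumptions made for the example.

```python
import re

# Hypothetical stopword lists; the real GETESS heuristics are not given in the abstract.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "für"},
    "en": {"the", "and", "is", "of", "to", "in", "that", "with", "for", "not"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None if unclear."""
    # Split into lowercase runs of letters (digits and punctuation are ignored).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_lang, best_score = ranked[0]
    # No stopword hits, or a tie between the top two languages: refuse to guess.
    if best_score == 0 or (len(ranked) > 1 and ranked[1][1] == best_score):
        return None
    return best_lang


def filter_results(documents, user_language):
    """Keep only the documents classified as being in the user's language."""
    return [doc for doc in documents if guess_language(doc) == user_language]
```

For instance, `filter_results(["Das ist ein Haus", "This is the house"], "en")` keeps only the English document. A realistic classifier would need far larger word lists or character n-gram statistics; the sketch only shows how classification and result filtering fit together.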
Keywords :
Internet; document handling; information retrieval; linguistics; search engines; English; GETESS; German; Internet documents; Internet search engine; document language recognition; heuristic approach; language classification heuristics; language knowledge; search results; search-result set; user needs; Computer architecture; Computer graphics; Computer science; Databases; Information analysis; Knowledge representation; Natural languages; Ontologies; Search engines; Web and internet services
Conference_Title :
11th International Workshop on Database and Expert Systems Applications, 2000. Proceedings.
Conference_Location :
London
Print_ISBN :
0-7695-0680-1
DOI :
10.1109/DEXA.2000.875016