A heuristic approach for recognizing a document's language used for the Internet search engine GETESS

Author

Düsterhöft, A. ; Gröticke, S.

Author_Institution

Dept. of Comput. Sci., Rostock Univ., Germany

fYear

2000

fDate

2000

Firstpage

133

Lastpage

137

Abstract

The authors illustrate how Internet documents can be automatically analyzed in order to identify the document's language. This language knowledge is then used by the Internet search engine GETESS. The aim of the language classification heuristics is to ensure that documents with the same content but in different languages (e.g. German and English) are not presented to the user simultaneously as search results. The GETESS search engine provides results only in the language relevant to the user. Consequently, the search-result set is narrower and better fits the needs of the user.

Keywords

Internet; document handling; information retrieval; linguistics; search engines; English; GETESS; German; Internet documents; Internet search engine; document language recognition; heuristic approach; language classification heuristics; language knowledge; search results; search-result set; user needs; Computer architecture; Computer graphics; Computer science; Databases; Information analysis; Knowledge representation; Natural languages; Ontologies; Search engines; Web and internet services

fLanguage

English

Publisher

IEEE

Conference_Titel

Database and Expert Systems Applications, 2000. Proceedings. 11th International Workshop on

Conference_Location

London

ISSN

1529-4188

Print_ISBN

0-7695-0680-1

Type

conf

DOI

10.1109/DEXA.2000.875016

Filename

875016