DocumentCode :
2548102
Title :
A word stemming algorithm for the Spanish language
Author :
Honrado, A. ; Leon, Ruben ; O'Donnel, R. ; Sinclair, Duncan
Author_Institution :
Lab. de Linguistics Inf., Univ. Autonoma de Madrid, Spain
fYear :
2000
fDate :
2000
Firstpage :
139
Lastpage :
145
Abstract :
The paper describes a word stemming algorithm for the Spanish language. Document retrieval experiments on English text suggest that word stemming based on morphological analysis does not generally or consistently outperform ad hoc, hand-tuned algorithms such as that proposed by M. Porter (1980). It is difficult, however, to produce a Porter-style algorithm for a Romance language such as Spanish, both because of its greater grammatical complexity and because inflection often changes the root of a word, not just its ending (as is mostly the case in English). In general terms, the difficulty lies in producing an algorithm that can cope with the additional complexity of Spanish morphology whilst preserving the simplicity of a Porter-style algorithm. One such algorithm is presented. It combines dictionary look-ups with some 300 stemming and intermediate reduction rules.
Keywords :
dictionaries; natural languages; table lookup; word processing; English text; Porter style algorithm; Spanish language; Spanish morphology; ad-hoc hand tuned algorithms; dictionary look-ups; document retrieval; grammatical complexity; inflection; intermediate reduction rules; morphological analysis; word roots; word stemming algorithm; Algorithm design and analysis; Dictionaries; Information retrieval; Information science; Morphology; Natural languages; Uninterruptible power systems;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on
Conference_Location :
A Coruña
Print_ISBN :
0-7695-0746-8
Type :
conf
DOI :
10.1109/SPIRE.2000.878189
Filename :
878189