Title of article :
Non-Word Identification or Spell Checking Without a Dictionary
Author/Authors :
Donald C. Comeau and W. John Wilbur
Issue Information :
Monthly, consecutively numbered issues, 2004
Pages :
9
From page :
169
To page :
177
Abstract :
MEDLINE is a collection of more than 12 million references and abstracts covering recent life science literature. With its continued growth and cutting-edge terminology, spell checking with a traditional lexicon-based approach requires significant additional manual follow-up. In this work, an internal corpus-based context quality rating, frequency, and simple misspelling transformations are used to rank words from most likely to least likely to be misspellings. Eleven-point average precisions of 0.891 have been achieved within a class of 42,340 all-alphabetic words having a context quality score less than 10. Our models predict that 16,274, or 38%, of these words are misspellings. Based on test data, this result has a recall of 79% and a precision of 86%. In other words, spell checking can be done by statistics instead of with a dictionary. As an application, we examine the time history of words with low context quality scores in MEDLINE titles and abstracts.
Journal title :
Journal of the American Society for Information Science and Technology
Serial Year :
2004
Record number :
843785