Title of article
Non-Word Identification or Spell Checking Without a Dictionary
Author/Authors
Donald C. Comeau and W. John Wilbur
Issue Information
Monthly, with serial issue number, year 2004
Pages
9
From page
169
To page
177
Abstract
MEDLINE is a collection of more than 12 million references
and abstracts covering recent life science literature.
With its continued growth and cutting-edge terminology,
spell-checking with a traditional lexicon-based
approach requires significant additional manual follow-up.
In this work, an internal corpus-based context quality
rating, frequency, and simple misspelling transformations
are used to rank words from most likely to be
misspellings to least likely. Eleven-point average precisions
of 0.891 have been achieved within a class of
42,340 all-alphabetic words having a score less than
10. Our models predict that 16,274 or 38% of these
words are misspellings. Based on test data, this result
has a recall of 79% and a precision of 86%. In other
words, spell-checking can be done by statistics instead
of with a dictionary. As an application we examine the
time history of low-scoring words in MEDLINE titles and
abstracts.
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2004
Record number
843785