• Title of article

    Non-Word Identification or Spell Checking Without a Dictionary

  • Author/Authors

    Donald C. Comeau and W. John Wilbur

  • Issue Information
    Monthly, consecutive issue, year 2004
  • Pages
    9
  • From page
    169
  • To page
    177
  • Abstract
    MEDLINE is a collection of more than 12 million references and abstracts covering recent life science literature. Because of its continued growth and cutting-edge terminology, spell checking with a traditional lexicon-based approach requires significant additional manual follow-up. In this work, an internal corpus-based context quality rating α, word frequency, and simple misspelling transformations are used to rank words from most likely to least likely to be misspellings. Eleven-point average precisions of 0.891 have been achieved within a class of 42,340 all-alphabetic words having an α score less than 10. Our models predict that 16,274 (38%) of these words are misspellings. Based on test data, this result has a recall of 79% and a precision of 86%. In other words, spell checking can be done with statistics instead of a dictionary. As an application, we examine the time history of low-α words in MEDLINE titles and abstracts.
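    The abstract reports its ranking quality as eleven-point average precision. Below is a minimal Python sketch of the standard interpolated eleven-point measure. It assumes that interpolated precision at recall level r is the maximum precision at any rank whose recall is at least r. The record does not give the authors' evaluation code, so the function name is hypothetical. The input is a ranked list, from most to least likely misspelling, of flags marking true misspellings.

```python
from typing import Sequence


def eleven_point_average_precision(ranked_is_relevant: Sequence[bool],
                                   total_relevant: int) -> float:
    """Interpolated eleven-point average precision of a ranked list.

    ranked_is_relevant[i] is True when the item at rank i + 1 is relevant
    (for example, a word confirmed to be a misspelling). total_relevant is
    the number of relevant items in the whole test set.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # (recall, precision) measured after each rank position.
    points = []
    hits = 0
    for rank, relevant in enumerate(ranked_is_relevant, start=1):
        if relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    total = 0.0
    for level in range(11):  # recall levels 0.0, 0.1, ..., 1.0
        r = level / 10
        # Interpolated precision: best precision at any recall >= r,
        # or 0.0 if the ranking never reaches that recall level.
        total += max((p for rec, p in points if rec >= r), default=0.0)
    return total / 11


if __name__ == "__main__":
    # Toy ranking: the first and third candidates are true misspellings.
    print(eleven_point_average_precision([True, False, True, False], 2))
```

    As a separate arithmetic check on the abstract's figures, 16,274 / 42,340 ≈ 0.384, which agrees with the stated 38%.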
  • Journal title
    Journal of the American Society for Information Science and Technology
  • Serial Year
    2004
  • Record number

    843785