Title of article :
Estimation of quality of service in spelling correction using Kullback–Leibler divergence
Author/Authors :
Varol, Cihan and Bayrak, Coskun
Issue Information :
Journal issue with serial number, year 2011
Pages :
6
From page :
6307
To page :
6312
Abstract :
To assist companies dealing with data preparation problems, an approach is developed to handle dirty data. Cleaning customer records and producing the desired results requires a set of effective tools and techniques, such as a near-miss strategy, phonetic structure, and edit distance, to build a suggestion table. The selection of the best match is verified and validated by its frequency of occurrence in the 20th-century Census Bureau statistics. Although the conducted experiments yielded better correction rates than the well-known ASPELL, JSpell HTML, and Ajax Spell Checkers, a remaining challenge is to introduce a quality estimate for our Personal Name Recognizing Strategy Model (PNRS) that distinguishes between the submitted original names and the name suggestions produced by PNRS. Here, we implement a statistical distance metric as a quality measure by computing the Kullback–Leibler (K–L) distance. The K–L distance measures the distance between the probability density function of the original names and that of the suggested names estimated by PNRS, and so assesses to what degree our edit-distance strategy has succeeded in correcting names. All names submitted as inputs to the PNRS model were within a maximum edit distance of 2 of the original name. The Kullback–Leibler distance thus serves as an indicator of name recognition quality.
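As a rough illustration of the quality measure described in the abstract, the sketch below computes a discrete Kullback–Leibler divergence between the empirical distribution of original names and that of suggested names. It is not the authors' implementation: the function names, the additive smoothing constant, and the sample name lists are all assumptions made for the example.

```python
import math
from collections import Counter


def name_distribution(names, vocabulary, smoothing=1e-9):
    """Empirical probability of each name in `vocabulary`.

    `smoothing` is an assumed small constant that keeps every
    probability strictly positive so the logarithm is defined.
    """
    counts = Counter(names)
    total = sum(counts[n] + smoothing for n in vocabulary)
    return {n: (counts[n] + smoothing) / total for n in vocabulary}


def kl_divergence(p, q):
    """Discrete K-L divergence D(p || q) = sum_x p(x) * log(p(x) / q(x))."""
    return sum(p[n] * math.log(p[n] / q[n]) for n in p if p[n] > 0)


# Hypothetical data: original names and the names a corrector suggested.
original = ["smith", "smith", "jones", "brown"]
suggested = ["smith", "smith", "jones", "braun"]

vocab = sorted(set(original) | set(suggested))
p = name_distribution(original, vocab)
q = name_distribution(suggested, vocab)

print("D(original || suggested) =", kl_divergence(p, q))
print("D(original || original)  =", kl_divergence(p, p))
```

A divergence of zero means the suggested names reproduce the original distribution exactly; larger values indicate that the suggestions drift further from the originals.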
Keywords :
Natural language , Edit-distance , census , Kullback–Leibler divergence , Phonetic Strategy , Spelling correction
Journal title :
Expert Systems with Applications
Serial Year :
2011
Record number :
2349320