Regional Information Center for Science and Technology - Word discrimination based on bigram co-occurrences

DocumentCode :

1580315

Title :

Word discrimination based on bigram co-occurrences

Author :

El-Nasan, Adnan ; Veeramachaneni, Sriharsha ; Nagy, George

Author_Institution :

DocLab, Rensselaer Polytech. Inst., Troy, NY, USA

fYear :

2001

fDate :

2001

Firstpage :

149

Lastpage :

153

Abstract :

Very few pairs of English words share exactly the same set of letter bigrams. This linguistic property can be exploited to bring lexical context into the classification stage of a word recognition system. The lexical n-gram matches between every word in a lexicon and a subset of reference words can be precomputed. If a match function can detect matching segments of at least n-gram length from the feature representations of words, then an unknown word can be recognized by determining the subset of reference words that have an n-gram match with it at the feature level. We show that, with a reasonable number of reference words, bigrams represent the best compromise between the recall of single letters and the precision of trigrams. Our simulations indicate that using a longer reference list can compensate for errors in feature extraction. The algorithm is fast enough for human-computer interaction, even on a slow processor.
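The recognition scheme described in the abstract can be illustrated with a minimal Python sketch. It assumes words are available as exact letter strings, so the paper's feature-level match function is replaced by a simple test of whether two words share at least one letter n-gram. The function names (`ngrams`, `signature`, `build_index`, `recognize`), the toy lexicon, and the reference list are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def ngrams(word: str, n: int = 2) -> set[str]:
    """Return the set of contiguous letter n-grams in a word (bigrams by default)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def signature(word: str, references: list[str], n: int = 2) -> frozenset[str]:
    """Reference words sharing at least one n-gram with `word`.

    Assumption: exact string comparison stands in for the paper's
    feature-level match function applied to word images.
    """
    grams = ngrams(word, n)
    return frozenset(r for r in references if grams & ngrams(r, n))


def build_index(lexicon: list[str], references: list[str],
                n: int = 2) -> dict[frozenset[str], list[str]]:
    """Precompute the match signature of every lexicon word (done once, offline)."""
    index: dict[frozenset[str], list[str]] = defaultdict(list)
    for word in lexicon:
        index[signature(word, references, n)].append(word)
    return dict(index)


def recognize(observed: frozenset[str],
              index: dict[frozenset[str], list[str]]) -> list[str]:
    """Return lexicon words whose precomputed signature equals the observed one."""
    return index.get(observed, [])


if __name__ == "__main__":
    references = ["attic", "cot", "track"]            # hypothetical reference list
    lexicon = ["cat", "act", "tack", "coat", "taco"]  # hypothetical lexicon
    index = build_index(lexicon, references)
    # Simulate the matcher's output for an unknown word by computing its signature.
    print(recognize(signature("coat", references), index))  # ['coat']
    print(recognize(signature("tack", references), index))  # ['act', 'tack']
```

In this toy example `act` and `tack` both match only `track`, so they share a signature and cannot be told apart. Adding reference words can only refine signatures, never merge them, which is one way to see why a longer reference list helps. Passing `n=1` or `n=3` to both `build_index` and `signature` lets one compare unigram and trigram signatures on the same data.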

Keywords :

document image processing; feature extraction; image matching; linguistics; optical character recognition; English words; OCR; bigram co-occurrences; classification; feature extraction; feature representation; human-computer interaction; lexical context; lexical n-gram matches; linguistic property; reference list; segment matching; word discrimination; word recognition system; Degradation; Dictionaries; Entropy; Feature extraction; Matrix converters; Optical character recognition software; Probability; Statistics; Viterbi algorithm; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on

Conference_Location :

Seattle, WA

Print_ISBN :

0-7695-1263-1

Type :

conf

DOI :

10.1109/ICDAR.2001.953773

Filename :

953773

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1580315