• DocumentCode
    116055
  • Title
    Word level correction in Gujarati document using probabilistic approach
  • Author
    Patel, Dhruv B.; Goswami, Mukesh M.
  • Author_Institution
    Dept. of Inf. Technol., Dharmsinh Desai Univ., Nadiad, India
  • fYear
    2014
  • fDate
    6-8 March 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Post-processing is an important part of any document processing system. It can be performed at two levels: word-level correction and sentence-level correction. Word-level correction can itself be performed in two ways. The first detects an erroneous word and replaces it with the most similar word from a dictionary; this is called the dictionary-based approach. The second selects the most probable word and is known as the probabilistic approach. To build the probabilistic model, which includes unigram, bigram, and trigram statistics, online resources from various Gujarati newspaper websites are used. The proposed system will use models such as Naïve Bayes and the Hidden Markov Model to correct word-level errors. The system will be tested on a synthetic dataset generated by adding random word-level errors to actual documents.
  • Keywords
    Bayes methods; Web sites; hidden Markov models; word processing; Gujarati document; Gujarati newspaper Web sites; bigram; dictionary based approach; document processing system; first word level correction; hidden Markov model; naive Bayes model; post processing; probabilistic approach; random word level error; second sentence level correction; synthetic dataset; trigram; unigram; Context; Crawlers; Dictionaries; Error correction; Hidden Markov models; Optical character recognition software; Probabilistic logic; Hidden Markov Model; Naïve Bayes; Probabilistic graphical model;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Green Computing Communication and Electrical Engineering (ICGCCEE), 2014 International Conference on
  • Conference_Location
    Coimbatore
  • Type
    conf
  • DOI
    10.1109/ICGCCEE.2014.6921395
  • Filename
    6921395
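The abstract describes word-level correction in two steps. First, a word that is missing from the dictionary is flagged. Second, the most probable replacement is chosen using n-gram statistics and a Naïve Bayes model. Random word-level errors are injected to build a synthetic test set. The sketch below illustrates these ideas under stated assumptions. It is not the authors' implementation.

Everything in it is hypothetical:
- the names `edit_distance`, `NaiveBayesCorrector` and `inject_error`;
- the add-one smoothing;
- the one-edit candidate radius;
- the choice of only the immediate left and right neighbours as Naive Bayes features (unigram and bigram counts only, no trigram).

The romanized toy tokens stand in for tokenized Gujarati newspaper text. Levenshtein distance is computed over Unicode code points, so a Gujarati vowel sign (matra) counts as a separate character.

```python
import math
import random
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]


class NaiveBayesCorrector:
    """Hypothetical corrector: a dictionary lookup flags non-words, and
    dictionary words within `max_dist` edits are ranked by the Naive Bayes
    score P(c) * P(left | c) * P(right | c)."""

    def __init__(self, sentences, max_dist=1):
        self.uni = Counter()  # unigram counts (also serves as the dictionary)
        self.bi = Counter()   # bigram counts, including <s> and </s> boundaries
        for sent in sentences:
            toks = ["<s>"] + list(sent) + ["</s>"]
            self.uni.update(sent)
            self.bi.update(zip(toks, toks[1:]))
        self.total = sum(self.uni.values())
        self.ctx_size = len(self.uni) + 2  # possible neighbours: vocab + <s>, </s>
        self.max_dist = max_dist

    def _score(self, cand, left, right):
        # Add-one smoothed log P(c) + log P(left | c) + log P(right | c).
        c = self.uni[cand]
        return (math.log(c / self.total)
                + math.log((self.bi[(left, cand)] + 1) / (c + self.ctx_size))
                + math.log((self.bi[(cand, right)] + 1) / (c + self.ctx_size)))

    def correct_word(self, left, word, right):
        if word in self.uni:  # found in the dictionary: treated as correct
            return word
        # Sorted so that ties are broken deterministically.
        cands = sorted(w for w in self.uni
                       if edit_distance(w, word) <= self.max_dist)
        if not cands:
            return word  # no dictionary word is close enough; leave unchanged
        return max(cands, key=lambda c: self._score(c, left, right))

    def correct_sentence(self, words):
        # Each word is corrected independently, using its observed neighbours.
        toks = ["<s>"] + list(words) + ["</s>"]
        return [self.correct_word(toks[i - 1], toks[i], toks[i + 1])
                for i in range(1, len(toks) - 1)]


def inject_error(word, alphabet, rng):
    """Synthetic word-level error: one random deletion or substitution."""
    i = rng.randrange(len(word))
    if len(word) > 1 and rng.random() < 0.5:
        return word[:i] + word[i + 1:]
    repl = rng.choice([ch for ch in alphabet if ch != word[i]])
    return word[:i] + repl + word[i + 1:]
```

Consider the toy corpus `[["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "hat", "sat"], ["a", "hat", "fell"]]`. The non-word `xat` is corrected to `cat` after `the` but to `hat` after `a`. Both candidates have identical unigram priors, so the bigram context features decide. A Hidden Markov Model variant, which the abstract also names, would decode the whole sentence jointly. It would treat dictionary candidates as hidden states and run Viterbi over bigram transitions. That variant is not shown here.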