Title :
Application of document spelling checker for Bahasa Indonesia
Author :
Aqsath, R.N. ; Kamayani, Mia ; Reinanda, Ridho ; Simbolon, Simon ; Soleh, Moch Yusup ; Purwarianti, Ayu
Author_Institution :
Sch. of Electr. & Inf. Eng., Bandung Inst. of Technol., Bandung, Indonesia
Abstract :
A document spelling checker for Bahasa Indonesia is much needed, yet no such application is currently available. Existing research on Indonesian spelling checking has not been developed into a complete document spelling checker. In this research, we compare several methods for Indonesian spelling checking, particularly for word error detection, and analyze which methods are best suited to building an Indonesian document spelling checker application. The main idea is to employ a complete word list as the reference. The Indonesian document spelling checker consists of five main components: document preprocessing, word error detection, word error correction, word candidate ranking, and user feedback. Document preprocessing converts the document into a list of unique words, which are then analyzed by the spelling checker. In word error detection, binary search and hashing are used to speed up lookup. In word error correction, a forward reverse technique and a similarity measure score are employed. In candidate ranking, a hidden Markov model (HMM) is used to select the best correction candidate. Using 13,000 words as the lexicon resource and 10 documents as test documents, the experiments achieved 93.7% accuracy. The remaining errors are caused by words absent from the lexicon resource and by the special repetition word form.
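The preprocessing and detection steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization rule, the function names (`unique_words`, `in_sorted_lexicon`, `detect_errors`), and the four-word toy lexicon are all assumptions made for illustration. It only shows how a document might be reduced to unique words and how each word could be looked up in a word list by hashing or by binary search.

```python
import re
from bisect import bisect_left

# Toy lexicon, assumed for illustration; the paper reports using 13,000 words.
LEXICON = ["buku", "membaca", "saya", "sedang"]


def unique_words(text):
    """Lowercase the text and return its distinct words in sorted order.

    The tokenization rule (letters, optionally joined by hyphens) is an
    assumption of this sketch, not taken from the paper.
    """
    return sorted(set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())))


def in_sorted_lexicon(word, sorted_lexicon):
    """Binary search for `word` in an already sorted list."""
    i = bisect_left(sorted_lexicon, word)
    return i < len(sorted_lexicon) and sorted_lexicon[i] == word


def detect_errors(text, lexicon, use_hash=True):
    """Return the unique words of `text` that are not found in `lexicon`."""
    words = unique_words(text)
    if use_hash:
        table = set(lexicon)  # hash-based membership test
        return [w for w in words if w not in table]
    ordered = sorted(lexicon)  # binary search needs sorted input
    return [w for w in words if not in_sorted_lexicon(w, ordered)]


if __name__ == "__main__":
    sample = "Saya sedang membca buku. Saya membaca."
    print(detect_errors(sample, LEXICON))
    print(detect_errors(sample, LEXICON, use_hash=False))
```

Both lookup strategies flag the same words; they differ only in how the word list is stored and searched. Correction and HMM-based ranking are deliberately left out, since the abstract does not give enough detail to sketch them faithfully.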
Keywords :
document handling; hidden Markov models; natural language processing; Bahasa Indonesia; HMM; Indonesian document; Indonesian spelling checker; document preprocess; document spelling checker application; hidden Markov model; word candidate ranking; word error correction; word error detection; Accuracy; Dictionaries; Forward error correction; Hidden Markov models; Vocabulary
Conference_Title :
Advanced Computer Science and Information System (ICACSIS), 2011 International Conference on
Conference_Location :
Jakarta
Print_ISBN :
978-1-4577-1688-1