DocumentCode
672862
Title
Developing corpus management system for Bahasa Indonesia: the “Perisalah” project
Author
Uliniansyah, Teduh ; Riza, Hammam ; Riandi, Oskar
Author_Institution
Inf. & Comput. Syst., ICT Center (PTIK) Agency for the Assessment & Applic. of Technol., Jakarta, Indonesia
fYear
2013
fDate
25-27 Nov. 2013
Firstpage
1
Lastpage
4
Abstract
This paper presents a report on the research and development of an Indonesian corpus management system as part of the speech summarization system (Perisalah). The continuous improvement of speech recognition for the Indonesian language requires a better and larger monolingual corpus. We discuss our method for building the speech recognition system. The system is equipped with the capability to handle variation in speech input, a more natural mode of communication between the system and its users. We also discuss the data contained in our text corpus and the corpus management system, focusing on how to handle sentence segmentation and unknown words (typos).
Keywords
audio databases; continuous improvement; natural language processing; research and development management; speech recognition; text analysis; Bahasa Indonesia; Indonesian corpus management system; Indonesian language; Perisalah project; continuous improvement; monolingual corpus; research and development; sentence segmentation; speech recognition; speech summarization system; text corpus; Adaptation models; Buildings; Data models; Dictionaries; Speech; Speech processing; Speech recognition; Corpus management system; bahasa Indonesia; natural language; speech processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location
Gurgaon
Type
conf
DOI
10.1109/ICSDA.2013.6709887
Filename
6709887