Developing corpus management system for Bahasa Indonesia: the “Perisalah” project

Author

Uliniansyah, Teduh ; Riza, Hammam ; Riandi, Oskar

Author_Institution

Inf. & Comput. Syst., ICT Center (PTIK) Agency for the Assessment & Applic. of Technol., Jakarta, Indonesia

fYear

2013

fDate

25-27 Nov. 2013

Firstpage

1

Lastpage

4

Abstract

This paper presents a report on the research and development of an Indonesian corpus management system as part of the speech summarization system (Perisalah). The continuous improvement of speech recognition for the Indonesian language requires a better and larger monolingual corpus. We discuss our method for building speech recognition. The system is equipped with the capability to handle variation in speech input, a more natural mode of communication between the system and its users. We also discuss the data contained in our text corpus and the corpus management system, mainly how to handle sentence segmentation and unknown words (typos).

Keywords

audio databases; continuous improvement; natural language processing; research and development management; speech recognition; text analysis; Bahasa Indonesia; Indonesian corpus management system; Indonesian language; Perisalah project; monolingual corpus; research and development; sentence segmentation; speech summarization system; text corpus; Adaptation models; Buildings; Data models; Dictionaries; Speech; Speech processing; Corpus management system; natural language

fLanguage

English

Publisher

IEEE

Conference_Titel

Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference

Conference_Location

Gurgaon

Type

conf

DOI

10.1109/ICSDA.2013.6709887

Filename

6709887