• DocumentCode
    672862
  • Title

    Developing corpus management system for Bahasa Indonesia: the “Perisalah” project

  • Author

    Uliniansyah, Teduh ; Riza, Hammam ; Riandi, Oskar

  • Author_Institution
    Inf. & Comput. Syst., ICT Center (PTIK) Agency for the Assessment & Applic. of Technol., Jakarta, Indonesia
  • fYear
    2013
  • fDate
    25-27 Nov. 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    This paper presents a report on the research and development of an Indonesian corpus management system as part of the speech summarization system (Perisalah). The continuous improvement of speech recognition for the Indonesian language requires a better and larger monolingual corpus. We discuss our method for building the speech recognition system. The system is equipped with the capability to handle variation in speech input, a more natural mode of communication between the system and its users. We also discuss the data contained in our text corpus and the corpus management system, mainly how it handles sentence segmentation and unknown words (typos).
  • Keywords
    audio databases; continuous improvement; natural language processing; research and development management; speech recognition; text analysis; Bahasa Indonesia; Indonesian corpus management system; Indonesian language; Perisalah project; continuous improvement; monolingual corpus; research and development; sentence segmentation; speech recognition; speech summarization system; text corpus; Adaptation models; Buildings; Data models; Dictionaries; Speech; Speech processing; Speech recognition; Corpus management system; bahasa Indonesia; natural language; speech processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
  • Conference_Location
    Gurgaon
  • Type

    conf

  • DOI
    10.1109/ICSDA.2013.6709887
  • Filename
    6709887