• DocumentCode
    591475
  • Title

    Statistical analysis of Hindi BTEC speech database

  • Author

    Arora, Samarth ; Arora, Kavita ; Aggarwal, Shubhashis Sengupta

  • Author_Institution
    CDAC, Noida, India
  • fYear
    2012
  • fDate
    9-12 Dec. 2012
  • Firstpage
    157
  • Lastpage
    162
  • Abstract
    The BTEC (Basic Travel Expression Corpus), developed by NICT, Japan, provides wide coverage of basic Japanese travel expressions with English counterparts and is intended as the basic data for developing high-quality speech translation systems. The English counterpart of this corpus has been manually translated into Hindi and is used for the development of an English-Hindi speech translation system. In this paper, we present a statistical analysis of this translated Hindi BTEC corpus. The translation methodology adopted in developing the corpus is also described. The statistical evaluations performed in the experiments provide information on the distribution of sentences, words, and various phonemes and on their growth behavior, which gives direction for future enhancement of the corpus.
  • Keywords
    language translation; natural language processing; speech processing; statistical analysis; word processing; Basic Travel Expression Corpus; English-Hindi speech translation system quality; Hindi BTEC speech database corpus translation; Japanese travel expressions; NICT; growth behavior; phoneme distribution information; sentence distribution information; statistical analysis; word distribution information; Sampling methods; Shape; Sociology; Speech; Statistical analysis; Vocabulary; Corpus statistics; Hindi BTEC; Speech Corpus;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on
  • Conference_Location
    Macau
  • Print_ISBN
    978-1-4673-2811-1
  • Electronic_ISBN
    978-1-4673-2812-8
  • Type

    conf

  • DOI
    10.1109/ICSDA.2012.6422480
  • Filename
    6422480