• DocumentCode
    3079840
  • Title
    Preprocessors in NLP applications: In the context of English to Malayalam Machine Translation
  • Author
    Sunil, R.; Jayan, V.; Bhadran, V.K.
  • Author_Institution
    Language Technol. Centre, Centre for Dev. of Adv. Comput. (C-DAC), Trivandrum, India
  • fYear
    2012
  • fDate
    7-9 Dec. 2012
  • Firstpage
    221
  • Lastpage
    226
  • Abstract
    Preprocessing the input text is an essential component of a Natural Language Processing (NLP) system. We discuss the relevance of preprocessors in the context of a Machine Translation system that we developed based on AnglaBharati technology. Text submitted for translation often contains special formats, and translating them appropriately is a difficult task. Such formats may lack a definite grammatical structure and so may not be handled by a language rule. This paper presents a strategy to identify special formats in English text, such as dates, currency, numbers, times, quotes, acronyms, and parentheses, for a rule-based English-Malayalam Machine Aided Translation system. AnglaBharati is a pattern-directed, rule-based system with a context-free-grammar-like structure for English, which generates a pseudo target for a group of Indian languages. The preprocessor is one of the main modules of this translation system. It manipulates the English input text to produce an input that is more suitable for the engine to generate an appropriate translation. Extensive research has been carried out in this area to disambiguate and process the input text in order to get more suitable output from the translation engine.
  • Keywords
    language translation; AnglaBharati technology; English; Malayalam; NLP applications; Natural Language Processing system; machine translation; preprocessors; pseudo target; Context; Data preprocessing; Engines; Helium; Knowledge based systems; Natural language processing; Terminology; AnglaBharati; preprocessor
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    India Conference (INDICON), 2012 Annual IEEE
  • Conference_Location
    Kochi
  • Print_ISBN
    978-1-4673-2270-6
  • Type
    conf
  • DOI
    10.1109/INDCON.2012.6420619
  • Filename
    6420619