• DocumentCode
    258693
  • Title

    Lexical normalization model for noisy SMS text

  • Author

    Jose, Greety ; Raj, Nisha S.

  • Author_Institution
    Dept. of Comput. Sci., SCMS Sch. of Eng. & Technol., Ernakulam, India
  • fYear
    2014
  • fDate
    17-18 Dec. 2014
  • Firstpage
    57
  • Lastpage
    62
  • Abstract
    In day-to-day life, digitally mediated interactions and communications are an important constituent. The rapid growth of electronic communications such as e-mails, microblogs, SMS, and chats has produced extensively noisy forms of text, predominantly among young urbanites. This growth of noise in text is due to a variety of factors, such as the small number of characters allowed per message (160 characters per SMS and 140 characters per tweet), the invention of new abbreviations, the use of non-standard orthographic forms, and phonetic substitution. In this paper we introduce a lexical normalization model for cleaning noisy text. The normalization is based on a channelized database. The model captures user interaction to improve its accuracy. Preliminary evaluation shows that the channel model normalizes noisy words to their standard counterparts with 97.5% accuracy.
  • Keywords
    electronic messaging; text analysis; E-mails; channelized database; electronic communications; lexical normalization model; micro blogs; natural language processing; noisy SMS text; phonetic substitution; short message services; Computational modeling; Databases; Hidden Markov models; Natural language processing; Noise; Noise measurement; Standards; Lexical Normalization; Machine Translation; Natural Language Processing; Noisy words; Non-noisy word; SMS; Social Media; Text Normalization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Systems and Communications (ICCSC), 2014 First International Conference on
  • Conference_Location
    Trivandrum
  • Print_ISBN
    978-1-4799-6012-5
  • Type

    conf

  • DOI
    10.1109/COMPSC.2014.7032621
  • Filename
    7032621