• DocumentCode
    3465933
  • Title
    Lexical and Discourse Analysis of Online Chat Dialog
  • Author
    Forsyth, Eric N. ; Martell, Craig H.
  • Author_Institution
    Naval Postgraduate School, Monterey
  • fYear
    2007
  • fDate
    17-19 Sept. 2007
  • Firstpage
    19
  • Lastpage
    26
  • Abstract
    One of the ultimate goals of natural language processing (NLP) systems is understanding the meaning of what is being transmitted, irrespective of the medium (e.g., written versus spoken) or the form (e.g., static documents versus dynamic dialogues). Although much work has been done in traditional language domains such as speech and static written text, little has yet been done in the newer communication domains enabled by the Internet, e.g., online chat and instant messaging. This is in part due to the fact that there are no annotated chat corpora available to the broader research community. The purpose of this research is to build a chat corpus, tagged with lexical (token part-of-speech labels), syntactic (post parse tree), and discourse (post classification) information. Such a corpus can then be used to develop more complex, statistical-based NLP applications that perform tasks such as author profiling, entity identification, and social network analysis.
  • Keywords
    Internet; electronic messaging; natural language processing; broader research community; chat corpora; discourse analysis; discourse information; instant messaging; lexical analysis; natural language processing systems; online chat dialog; post classification information; post parse tree information; statistical-based NLP applications; syntactic; token part-of-speech labels; Classification tree analysis; Computer science; Natural languages; Privacy; Protection; Speech; XML; Yarn
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    International Conference on Semantic Computing (ICSC 2007)
  • Conference_Location
    Irvine, CA
  • Print_ISBN
    978-0-7695-2997-4
  • Type
    conf
  • DOI
    10.1109/ICSC.2007.55
  • Filename
    4338328