• DocumentCode
    3190391
  • Title
    High-Speed Identification of Language and Script
  • Author
    Ratner, Alan; Loui, Ron
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    563
  • Lastpage
    568
  • Abstract
    Humans communicate with text in thousands of languages, in dozens of scripts, and in a wide variety of binary codes. There is a need to identify the language, script and code of this text to enable follow-on processing such as transcoding, translation, transliteration, routing and prioritization. This paper deals with the implementation of real-time language and script identification on high-speed hardware (principally a ternary content addressable memory) capable of processing network data streams at several gigabits per second.
  • Keywords
    Associative memory; Background noise; Conferences; Data mining; Field programmable gate arrays; Hardware; Humans; Java; Natural languages; Pattern matching
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • Print_ISBN
    978-0-7695-3019-2
  • Electronic_ISBN
    978-0-7695-3033-8
  • Type
    conf
  • DOI
    10.1109/ICDMW.2007.117
  • Filename
    4476723