• DocumentCode
    419636
  • Title
    Noisy text categorization
  • Author
    Vinciarelli, Alessandro
  • Author_Institution
    Dalle Molle Inst. for Perceptual Artificial Intelligence, Switzerland
  • Volume
    2
  • fYear
    2004
  • fDate
    23-26 Aug. 2004
  • Firstpage
    554
  • Abstract
    This work presents a system for the categorization of noisy texts. Here, noisy means any text obtained through an error-prone extraction process from media other than digital text. We show that, even with an average word error rate of around 50%, the loss in categorization performance relative to the clean version of the same documents is negligible.
  • Keywords
    text analysis; word processing; average word error rate; categorization performance loss; digital texts; noisy text categorization; Data mining; Databases; Error analysis; Handwriting recognition; Information retrieval; Performance loss; Speech recognition; Support vector machines; Text categorization; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
  • ISSN
    1051-4651
  • Print_ISBN
    0-7695-2128-2
  • Type
    conf
  • DOI
    10.1109/ICPR.2004.1334303
  • Filename
    1334303