• DocumentCode
    330299
  • Title

    Learning to extract and classify names from text

  • Author

    Fox, Heidi ; Schwartz, Richard ; Stone, Rebecca ; Weischedel, Ralph ; Gadz, Walter

  • Author_Institution
    GTE/BBN Technol., Cambridge, MA, USA
  • Volume
    2
  • fYear
    1998
  • fDate
    11-14 Oct 1998
  • Firstpage
    1668
  • Abstract
    A requirement of virtually all analytic tools, such as timeline and spatial analysis, is structured data; however, much data is in text, an unstructured form. This article presents a new technology to bridge the gap between data buried in text and the requirement of structured data for analysis. The outcome should be an easy-to-maintain information technology component to support DoD and law enforcement applications. Our new approach uses statistical pattern recognition to learn to find data that is locally identifiable, i.e., that is not highly dependent on context. Examples are person names, organization names, locations, dates, times, monetary amounts, phone numbers, addresses, and social security numbers. The paper describes the statistical model employed, compares and contrasts the approach to previous approaches, numerically evaluates the adequacy of the technology on Government-supplied data, and illustrates the kind of examples needed for the system to learn to recognize the data desired from examples in documents.
  • Keywords
    learning (artificial intelligence); pattern classification; public administration; statistical analysis; text analysis; DoD applications; Government-supplied data; addresses; dates; information technology component; law enforcement applications; learning; locations; monetary amounts; name classification; name extraction; organization names; personal names; social security numbers; statistical model; statistical pattern recognition; telephone numbers; text; times; Bridges; Data analysis; Data mining; Data security; Databases; Information analysis; Information technology; Law enforcement; Natural languages; Performance analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-4778-1
  • Type

    conf

  • DOI
    10.1109/ICSMC.1998.728133
  • Filename
    728133