• DocumentCode
    2643137
  • Title
    Entity refinement using latent semantic indexing
  • Author
    Bradford, R.B.
  • Author_Institution
    Agilex Technol., Chantilly, VA, USA
  • fYear
    2010
  • fDate
    23-26 May 2010
  • Firstpage
    126
  • Lastpage
    128
  • Abstract
    Automated extraction of named entities is an important text analysis task. In addition to recognizing the occurrence of entity names, it is important to be able to label those names by type. Most entity extraction techniques categorize extracted entities into a few basic types, such as PERSON, ORGANIZATION, and LOCATION. This paper presents an approach for generating more fine-grained subdivisions of entity type. The technique of latent semantic indexing (LSI) is used to provide semantic context as an indicator of likely entity subtype. Tests were carried out on a collection of 5.5 million English-language news articles. At modest levels of recall, the accuracy of subtype assignment was comparable to the accuracy with which the gross type was assigned by a state-of-the-art commercial entity extraction software package.
  • Keywords
    Application software; Classification algorithms; Data mining; Hidden Markov models; Indexing; Kernel; Large scale integration; Ontologies; Software packages; Testing; LSI; entity extraction; entity refinement; entity tagging; latent semantic indexing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligence and Security Informatics (ISI), 2010 IEEE International Conference on
  • Conference_Location
    Vancouver, BC, Canada
  • Print_ISBN
    978-1-4244-6444-9
  • Type
    conf
  • DOI
    10.1109/ISI.2010.5484765
  • Filename
    5484765