• DocumentCode
    480741
  • Title
    Name Disambiguation Boosted by Latent Topics from Web Directories
  • Author
    Vu, Quang Minh ; Takasu, Atsuhiro ; Adachi, Jun
  • Author_Institution
    Nat. Inst. of Inf., Tokyo
  • Volume
    1
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    697
  • Lastpage
    703
  • Abstract
    Search results for personal name queries often contain documents relevant to several people, because a personal name is often shared by several people. To differentiate the people in these search results, the contexts relevant to each person must be extracted from the documents. However, since Web documents are noisy and the text related to a person may be short, it is difficult to extract these contexts effectively. We propose a new method that uses Web directories as additional information in order to recognize topic terms in documents more easily and to extract contexts of people more effectively. First, we apply the latent Dirichlet allocation method to extract latent topics from Web directories. Then, the extracted topics are used to recognize the topics contained in name ambiguity documents so that common context measurements can be calculated more effectively. Our experiments, conducted with documents about real people on the Web and several well-known Web directories, show that our approach disambiguates personal names better than conventional approaches such as the vector space model approach and the named entity recognition approach.
  • Keywords
    Internet; document handling; information retrieval; Web directory; Web documents; context measurements; latent Dirichlet allocation; latent topics; name ambiguity documents; name disambiguation; named entity recognition; personal name query; vector space model approach; Context modeling; Data mining; Feature extraction; Frequency; Informatics; Intelligent agent; Linear discriminant analysis; Search engines; Web sites; World Wide Web; Personal name disambiguation; document similarity; knowledge base; latent topic extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-0-7695-3496-1
  • Type
    conf
  • DOI
    10.1109/WIIAT.2008.171
  • Filename
    4740532