• DocumentCode
    677175
  • Title
    A hybrid method for detecting outdated information in Wikipedia infoboxes
  • Author
    Tran, Thong; Cao, Thi H.

  • Author_Institution
    Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
  • fYear
    2013
  • fDate
    10-13 Nov. 2013
  • Firstpage
    97
  • Lastpage
    102
  • Abstract
    Wikipedia has grown rapidly and has become a major information resource, both for users and for the many knowledge bases derived from it. However, it is still edited manually, while the world changes rapidly. In this paper, we propose a method for detecting outdated attribute values in Wikipedia infoboxes using facts extracted from the general Web. To cope with the diversity of natural-language forms in which facts are presented on the Web, the method extracts new information by combining a pattern-based approach with an entity-search-based approach. Our experimental results show that the proposed method achieves accuracies of 70% and 82% on the chief-executive-officer and number-of-employees attributes of company infoboxes, respectively. This is a significant improvement over either the pattern-based or the entity-search-based method used alone. The results also reveal a striking degree of outdated information in Wikipedia.
  • Keywords
    Internet; Web sites; natural language processing; Wikipedia infoboxes; company infoboxes; entity-search-based approach; fact extraction; information resource; knowledge bases; natural language presentation forms; outdated information detection; pattern-based approach; Companies; Data mining; Electronic publishing; Encyclopedias; Web pages; Entity Search; Information Extraction; Pattern Learning; Wikipedia Update
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 IEEE RIVF International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF)
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4799-1349-7
  • Type
    conf
  • DOI
    10.1109/RIVF.2013.6719874
  • Filename
    6719874