• DocumentCode
    1618724
  • Title
    Generating information-rich taxonomy from Wikipedia
  • Author
    Yamada, Ichiro ; Hashimoto, Chikara ; Oh, Jong-Hoon ; Torisawa, Kentaro ; Kuroda, Kow ; Saeger, Stijn De ; Tsuchida, Masaaki ; Kazama, Junichi
  • Author_Institution
    MASTAR Project, Nat. Inst. of Inf. & Commun. Technol., Keihanna, Japan
  • fYear
    2010
  • Firstpage
    97
  • Lastpage
    104
  • Abstract
    Even though hyponymy relation acquisition has been extensively studied, “how informative such acquired hyponymy relations are” has not been sufficiently discussed. We found that the hypernyms in automatically acquired hyponymy relations were often too vague or ambiguous to specify the meaning of their hyponyms. For instance, hypernym work is vague and ambiguous in hyponymy relations work/Avatar and work/The Catcher in the Rye. In this paper, we propose a simple method of generating intermediate concepts of hyponymy relations that can make such (vague) hypernyms more specific. Our method generates such an information-rich hyponymy relation as work / work by film director / work by James Cameron / Avatar from the less informative relation work/Avatar. Furthermore, the generated relation work by film director/Avatar can be paraphrased into a new relation movie/Avatar. Experiments showed that our method successfully acquired 2,719,441 enriched hyponymy relations with one intermediate concept with 0.853 precision and another 6,347,472 hyponymy relations with 0.786 precision.
  • Keywords
    Internet; linguistics; James Cameron; Wikipedia; avatar; hypernym work; hyponymy relation acquisition; information rich taxonomy generation; informative relation work; Avatars; Cities and towns; Educational institutions; Electronic publishing; Encyclopedias
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Universal Communication Symposium (IUCS), 2010 4th International
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-7821-7
  • Type
    conf
  • DOI
    10.1109/IUCS.2010.5666764
  • Filename
    5666764