• DocumentCode
    2772345
  • Title
    Large Scale Relation Acquisition Using Class Dependent Patterns
  • Author
    Saeger, Stijn De ; Torisawa, Kentaro ; Kazama, Junichi ; Kuroda, Kow ; Murata, Masaki
  • Author_Institution
    Language Infrastruct. Group, Nat. Inst. of Inf. & Commun. Technol. (NICT), Seika, Japan
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    764
  • Lastpage
    769
  • Abstract
    This paper proposes a minimally supervised method for acquiring high-level semantic relations such as causality and prevention from the Web. Our method learns linguistic patterns that express causality such as "x gave rise to y", and uses them to extract causal noun pairs like (global warming, malaria epidemic) from sentences like "global warming gave rise to a new malaria epidemic". The novelty of our method lies in the use of semantic word classes acquired by large scale clustering for learning class dependent patterns. We demonstrate the effectiveness of this class based approach on three large-scale relation mining tasks from 50 million Japanese Web pages. In two of these tasks we obtained more than 30,000 relation instances with over 80% precision, outperforming a state-of-the-art system by a large margin.
  • Keywords
    Internet; Web sites; causality; data acquisition; data mining; learning (artificial intelligence); natural language processing; pattern clustering; Japanese Web pages; World Wide Web; causality; class dependent patterns; high-level semantic relations; large scale clustering; large scale relation acquisition; large-scale relation mining tasks; linguistic patterns; semantic word classes; supervised method; Communications technology; Data mining; Diseases; Frequency; Global warming; Information retrieval; Large-scale systems; Sea measurements; Text mining; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.140
  • Filename
    5360308