  • DocumentCode
    480719
  • Title
    Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction
  • Author
    Thomas, Christopher ; Mehra, Pankaj ; Brooks, Roger ; Sheth, Amit
  • Author_Institution
    HP Labs., Palo Alto, CA
  • Volume
    1
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    496
  • Lastpage
    502
  • Abstract
    Domain hierarchies are widely used as models underlying information retrieval tasks. Formal ontologies and taxonomies enrich such hierarchies further with properties and relationships but require manual effort; therefore they are costly to maintain and often stale. Folksonomies and vocabularies lack rich category structure. Classification and extraction require the coverage of vocabularies and the alterability of folksonomies, and can benefit greatly from category relationships and other properties. With Doozer, a program for building conceptual models of information domains, we want to bridge the gap between vocabularies and folksonomies on the one side and the rich, expert-designed ontologies and taxonomies on the other. Doozer mines Wikipedia to produce tight domain hierarchies, starting with simple domain descriptions. It also adds relevancy scores for use in automated classification of information. The output model is described as a hierarchy of domain terms that can be used immediately for classifiers and IR systems or as a basis for manual or semi-automatic creation of formal ontologies.
  • Keywords
    data mining; information retrieval; ontologies (artificial intelligence); pattern classification; search engines; Doozer; Wikipedia; category relationships; domain model extraction; expert-designed ontologies; folksonomies; formal ontologies; information classification; information retrieval tasks; taxonomies; vocabularies; Bridges; Buildings; Data mining; Encyclopedias; Information retrieval; Intelligent agent; Ontologies; Taxonomy; Vocabulary; Wikipedia; Domain Model creation; Expand and Reduce; Taxonomy extraction; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-0-7695-3496-1
  • Type
    conf
  • DOI
    10.1109/WIIAT.2008.358
  • Filename
    4740498