  • DocumentCode
    525701
  • Title
    An empirical study on harmonizing classification precision using IE patterns
  • Author
    Soon, Lay-Ki ; Hwang, Kyu-Baek ; Lee, Sang Ho
  • Author_Institution
    Fac. of Inf. Technol., Multimedia Univ., Selangor, Malaysia
  • fYear
    2010
  • fDate
    23-25 June 2010
  • Firstpage
    251
  • Lastpage
    256
  • Abstract
    Web pages are conventionally represented for classification purposes by the words found within their contents. However, word-based web page representation suffers from several limitations, such as synonymy and homonymy. Motivated by these limitations, we explore the potential of representing web pages using information extraction patterns in addition to the words identified within the web contents. In this paper, we report the results and the findings from our experiments. Our empirical study, conducted using the WebKB dataset, indicates that adding information extraction patterns to the web page representation helps to improve classification precision, especially in categories with highly diverse web content.
  • Keywords
    Computer science; Crawlers; Data mining; Information retrieval; Information technology; Multimedia computing; Parallel programming; Search engines; Web mining; Web pages; information extraction; information retrieval; web classification; web mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering and Data Mining (SEDM), 2010 2nd International Conference on
  • Conference_Location
    Chengdu, China
  • Print_ISBN
    978-1-4244-7324-3
  • Electronic_ISBN
    978-89-88678-22-0
  • Type
    conf
  • Filename
    5542915