  • DocumentCode
    408302
  • Title
    A hybrid classifier approach for Web retrieved documents classification
  • Author
    Bot, Razvan Stefan ; Wu, Yi-fang Brook ; Chen, Xin ; Li, Quanzhi
  • Author_Institution
    Dept. of Inf. Syst., New Jersey Inst. of Technol., Newark, NJ, USA
  • Volume
    1
  • fYear
    2004
  • fDate
    5-7 April 2004
  • Firstpage
    326
  • Abstract
    The paper presents a hybrid technique for classifying Web-returned hits into concept hierarchies. The technique combines manual and automatic classifiers. First, all Web-returned documents are assigned to human-defined categories using manual classifiers; then automatic classifiers are used to generate a concept hierarchy for each of these categories. The evaluation results show that: (a) for polysemous queries, our system generates meaningful categories that correspond to (but are not limited to) the different semantic facets of the queries; (b) as expected, for non-polysemous queries the system generates fewer categories; and (c) for polysemous queries, the hierarchy precision of the generated concept hierarchies is significantly better than that obtained with a baseline system.
  • Keywords
    Internet; classification; information retrieval; Web retrieved document classification; Web returned documents; Web returned hits; World Wide Web; automatic classifiers; baseline system; concept hierarchies; concept hierarchy generation; hierarchy precision; human defined categories; hybrid classification; hybrid classifier; manual classifiers; nonpolysemous queries; polysemous queries; query semantic facets; Humans; Information filtering; Information retrieval; Information systems; Information technology; Manuals; Paper technology; Scattering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on
  • Print_ISBN
    0-7695-2108-8
  • Type
    conf
  • DOI
    10.1109/ITCC.2004.1286474
  • Filename
    1286474