• DocumentCode
    2983022
  • Title

    Multi-factor matching method for basic information of science and technology experts based on Web mining

  • Author

    Zhou, Pei ; Zhu, Quanyin

  • Author_Institution
    Fac. of Comput. Eng., Huaiyin Inst. of Technol., Huaiyin, China
  • fYear
    2012
  • fDate
    22-24 June 2012
  • Firstpage
    718
  • Lastpage
    720
  • Abstract
    Information extraction by Web mining has a low accuracy rate because Web pages are diverse and complex. To raise that accuracy rate when building a basic information system for science and technology, this paper proposes a novel multi-factor matching method. The method combines the position of each keywords-corpus word in the normalized text with multi-factor matching between the keywords corpus and the normalized text, which is extracted from the Web page at a given URL. The extracted fields are the name, sex, birth, hometown, and professional title of science and technology experts. Experiments show an accuracy rate of 95.64 percent and a recall rate of 99.69 percent. These results indicate that the proposed method can satisfy the application requirements.
  • Keywords
    Internet; Web sites; data mining; information retrieval; scientific information systems; text analysis; URL; Web mining; Web page complexity; Web page diversity; information extraction; keywords corpus; multifactor matching method; normalized text; science and technology basic information system; science and technology experts birth; science and technology experts hometown; science and technology experts name; science and technology experts professional title; science and technology experts sex; multi-factor matching; science and technology experts
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering and Service Science (ICSESS), 2012 IEEE 3rd International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4673-2007-8
  • Type

    conf

  • DOI
    10.1109/ICSESS.2012.6269567
  • Filename
    6269567