• DocumentCode
    2191532
  • Title

    An Information Extraction Method for Digitized Textbooks of Traditional Chinese Medicine

  • Author

    Zhu, Wenhao ; Bai, Shunlai ; Zhang, Bofeng ; Xu, Weimin ; Wei, Daming

  • Author_Institution
    Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
  • fYear
    2010
  • fDate
    June 29 2010-July 1 2010
  • Firstpage
    1645
  • Lastpage
    1648
  • Abstract
    Digital libraries have taken on the mission of preserving and disseminating human culture in the information era. However, knowledge extraction for digital libraries has not been well studied, which hinders their advancement from information collectors to knowledge providers. This paper presents an ontology-based approach that extracts detailed attributes of Traditional Chinese Medicine (TCM) from digitized textbooks. Based on the characteristics of digitized textbooks, we propose an extraction ontology that is compatible with both textbook extraction and TCM theory. To improve the tolerance of extraction to OCR errors, we extract features from different aspects. Finally, a structured pattern-based extraction method is adopted to minimize extraction supervision. The results show that our method is a practical and robust approach to information extraction from digitized TCM textbooks.
  • Keywords
    data mining; digital libraries; feature extraction; ontologies (artificial intelligence); optical character recognition; text analysis; OCR errors; digital libraries; digitized textbooks; extraction ontology; extraction supervision; extraction tolerance; features extract; human culture; information collector; information extraction; knowledge extraction; knowledge provider; pattern based extraction; traditional Chinese medicine; Books; Catalogs; Data mining; Feature extraction; Libraries; Ontologies; Support vector machines; Digital Libraries; Information Extraction; Traditional Chinese Medicine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
  • Conference_Location
    Bradford
  • Print_ISBN
    978-1-4244-7547-6
  • Type

    conf

  • DOI
    10.1109/CIT.2010.291
  • Filename
    5577950