• DocumentCode
    583151
  • Title
    Cross Language Information Extraction for Digitized Textbooks of Specific Domains
  • Author
    Zhu, Wenhao; Luo, Laihu; Ju, Chaoyou; Zhang, Bofeng
  • Author_Institution
    Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
  • fYear
    2012
  • fDate
    27-29 Oct. 2012
  • Firstpage
    1114
  • Lastpage
    1118
  • Abstract
    As the influence of the digitization movement continues to grow, more and more countries have initiated their own digital library projects to preserve their culture by digitizing millions of books. Combined with other kinds of digital resources, such as videos, audio, and images, a digital library can provide advanced services that go far beyond reading and browsing. Information extraction is one of the fundamental methods for obtaining structured information from digital books. Because of its importance for content integration and knowledge discovery, information extraction across different languages is becoming a key problem in the development of digital libraries. In this paper, we present a domain-related information extraction framework suited to digitized textbooks in different languages. To achieve cross-language adaptation, we introduce language-independent features, together with simple language-dependent features bound to domain characteristics, to generate extractors. Finally, we present two preliminary experiments that demonstrate the feasibility of this framework.
  • Keywords
    data mining; digital libraries; electronic publishing; content integration; cross language information extraction; digital books; digital library projects; digital resources; digitization movement; digitized textbooks; domain-related information extraction framework; knowledge discovery; language independent features; simple language dependent features; specific domains; Data mining; Electronic publishing; Encyclopedias; Feature extraction; Information retrieval; Libraries; Cross Language; Digitized Textbook; Information Extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer and Information Technology (CIT), 2012 IEEE 12th International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4673-4873-7
  • Type
    conf
  • DOI
    10.1109/CIT.2012.226
  • Filename
    6392063