• DocumentCode
    1665232
  • Title

    Extracting and cleaning data from semi-structure Chinese texts

  • Author

    Fan, Guang-yuan ; Long, Shun ; Zhu, Weiheng

  • Author_Institution
    Department of Computer Science, Jinan University, Guangzhou, China
  • fYear
    2011
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Data mining helps to uncover valuable information from large volumes of raw data. However, raw data usually comes as text rather than in structured form, and contains noise that makes analysis difficult. It is therefore vital to extract and clean raw data before in-depth analysis is applied. This paper presents a new approach to data extraction and cleaning from semi-structured Chinese texts. Experimental results show that it can effectively prepare data for mining.
  • Keywords
    Artificial intelligence; Cleaning; Data mining; HTML; Internet; Security; XML; data transfer; data warehouse; information extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    E-Business and E-Government (ICEE), 2011 International Conference on
  • Conference_Location
    Shanghai, China
  • Print_ISBN
    978-1-4244-8691-5
  • Type

    conf

  • DOI
    10.1109/ICEBEG.2011.5884487
  • Filename
    5884487