DocumentCode :
1665232
Title :
Extracting and cleaning data from semi-structure Chinese texts
Author :
Fan, Guang-yuan ; Long, Shun ; Zhu, Weiheng
Author_Institution :
Department of Computer Science, Jinan University, Guangzhou, China
fYear :
2011
Firstpage :
1
Lastpage :
4
Abstract :
Data mining helps uncover valuable information from large volumes of raw data. However, such data usually comes as text rather than in structured form, and contains noise that makes analysis difficult. It is therefore vital to extract and clean raw data before in-depth analysis is applied. This paper presents a new approach to extracting and cleaning data from semi-structured Chinese texts. Experimental results show that it can effectively prepare data for mining.
Keywords :
Artificial intelligence; Cleaning; Data mining; HTML; Internet; Security; XML; data transfer; data warehouse; information extraction
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
E-Business and E-Government (ICEE), 2011 International Conference on
Conference_Location :
Shanghai, China
Print_ISBN :
978-1-4244-8691-5
Type :
conf
DOI :
10.1109/ICEBEG.2011.5884487
Filename :
5884487