DocumentCode :
1843677
Title :
Data extraction and cleansing of semi-structured Chinese texts
Author :
Zhu, Wei-heng ; Long, Shun
Author_Institution :
Dept. of Comput. Sci., Jinan Univ., Guangzhou, China
Volume :
1
fYear :
2011
fDate :
13-15 May 2011
Firstpage :
726
Lastpage :
729
Abstract :
The rapid growth of data mining has created an ever-increasing demand for automatic information extraction from Chinese texts. However, existing approaches in this domain focus on well-structured Chinese texts and therefore struggle with semi-structured Chinese texts, which do not conform to strict syntactic structures. This paper proposes an approach to semi-automatic data extraction and cleansing for such texts. Preliminary experimental results show that, with modest manual intervention, the approach can effectively extract information from raw semi-structured Chinese texts collected from e-business applications.
Keywords :
business data processing; data mining; information retrieval; natural language processing; text analysis; automatic information extraction; data mining; e-business application; semiautomatic data extraction; semistructured Chinese text; text cleansing; Data mining; Data warehouses; Manuals; Merchandise; Semantics; Syntactics; Terminology; Chinese; data cleansing; data extraction; manual intervention; semi-structured text;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Business Management and Electronic Information (BMEI), 2011 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-61284-108-3
Type :
conf
DOI :
10.1109/ICBMEI.2011.5917038
Filename :
5917038