Title :
Data extraction and cleansing of semi-structured Chinese texts
Author :
Zhu, Wei-heng ; Long, Shun
Author_Institution :
Dept. of Comput. Sci., Jinan Univ., Guangzhou, China
Abstract :
The rapid growth of data mining generates an ever-increasing demand for automatic information extraction from Chinese texts. However, existing approaches in this domain focus on well-structured Chinese texts and therefore struggle with semi-structured Chinese texts, which do not conform to strict syntactic structures. This paper proposes an approach to semi-automatic data extraction and cleansing for such texts. Preliminary experimental results show that, with modest manual intervention, the approach can effectively extract information from raw semi-structured Chinese texts collected from e-business applications.
Keywords :
business data processing; data mining; information retrieval; natural language processing; text analysis; automatic information extraction; e-business application; semiautomatic data extraction; semistructured Chinese text; text cleansing; Data warehouses; Manuals; Merchandise; Semantics; Syntactics; Terminology; Chinese; data cleansing; data extraction; manual intervention; semi-structured text
Conference_Title :
Business Management and Electronic Information (BMEI), 2011 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-61284-108-3
DOI :
10.1109/ICBMEI.2011.5917038