Title :
PEWP: Process extraction based on word position in documents
Author :
Yuchen Chen ; ZhiJun Ding ; Haichun Sun
Author_Institution :
Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China
fDate :
Sept. 29 2014-Oct. 1 2014
Abstract :
Search engines are a popular and useful tool for quickly finding required information. However, some sequence information, such as “What should be prepared to apply for a visa”, “Where can one apply for a visa” and “How long does it take to get a visa”, often cannot be obtained as a whole from a traditional search engine. Yet this sequence information is valuable guidance that helps people understand the steps involved in getting something done. In this paper, the PEWP method automatically obtains step sequence information based on the ideas of process and text mining, considering both word position and word frequency at the same time. The experiment compares PEWP with topic extraction, and the results show that PEWP performs better: its output is almost strictly ordered, and its recall rate is nearly 71% on average.
Keywords :
data mining; feature extraction; search engines; text analysis; word processing; PEWP; process extraction based on word position in documents; search engine; text mining; Cleaning; Context; Licenses; Registers; Standards; Text mining; process extraction; text mining; word position;
Conference_Title :
Digital Information Management (ICDIM), 2014 Ninth International Conference on
Conference_Location :
Phitsanulok
DOI :
10.1109/ICDIM.2014.6991399