DocumentCode
1773946
Title
PEWP: Process extraction based on word position in documents
Author
Yuchen Chen ; ZhiJun Ding ; Haichun Sun
Author_Institution
Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China
fYear
2014
fDate
Sept. 29 2014-Oct. 1 2014
Firstpage
135
Lastpage
140
Abstract
Search engines are popular and useful tools that help people quickly find the information they need. However, some sequential information, such as “What should be prepared when applying for a visa”, “Where can one apply for a visa” and “How long does it take to get a visa”, often cannot be obtained as a whole from a traditional search engine. Yet this sequential information is helpful for instructing people in the steps needed to get things done. In this paper, the PEWP method automatically obtains step sequence information based on the ideas of process and text mining, considering both word position and word frequency at the same time. The experiment compares PEWP with topic extraction, and the results show that PEWP performs better: its output is almost strictly ordered, and its recall rate is nearly 71% on average.
Keywords
data mining; feature extraction; search engines; text analysis; word processing; PEWP; process extraction based on word position in documents; search engine; text mining; Cleaning; Context; Licenses; Registers; Standards; process extraction; word position
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management (ICDIM), 2014 Ninth International Conference on
Conference_Location
Phitsanulok
Type
conf
DOI
10.1109/ICDIM.2014.6991399
Filename
6991399