PEWP: Process extraction based on word position in documents

Author

Yuchen Chen ; ZhiJun Ding ; Haichun Sun

Author_Institution

Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China

fYear

2014

fDate

Sept. 29 2014-Oct. 1 2014

Firstpage

135

Lastpage

140

Abstract

Search engines are a popular and useful tool for helping people quickly find the information they need. However, some sequence information, such as “What should be prepared when applying for a visa”, “Where can one apply for a visa” and “How long does it take to get a visa”, often cannot be obtained in full from a traditional search engine. Yet this sequence information is valuable guidance, because it helps people understand the steps involved in getting something done. In this paper, the PEWP method automatically obtains step sequence information based on the ideas of process and text mining, considering both word position and word frequency at the same time. The experiment compares PEWP with topic extraction, and the results show that PEWP performs better: its extracted steps are almost strictly sorted, and its recall rate is nearly 71% on average.

Keywords

data mining; feature extraction; search engines; text analysis; word processing; PEWP; process extraction based on word position in documents; search engine; text mining; Cleaning; Context; Licenses; Registers; Standards; process extraction; word position

fLanguage

English

Publisher

IEEE

Conference_Titel

Digital Information Management (ICDIM), 2014 Ninth International Conference on

Conference_Location

Phitsanulok

Type

conf

DOI

10.1109/ICDIM.2014.6991399

Filename

6991399