DocumentCode :
2183506
Title :
The Partition Heuristic Information Extraction Algorithm of Unstructured Data
Author :
Cong Li ; Chengming Zou ; Luo Zhong ; Jinyang Zhu
Author_Institution :
Sch. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China
fYear :
2013
fDate :
16-19 Dec. 2013
Firstpage :
570
Lastpage :
576
Abstract :
In this paper, we propose a method that extracts the attributes of a given entity from unstructured data in the field of logistics, applying the idea of divide and conquer to the characteristics of logistics information. After a thorough study of logistics information, we perform a statistical analysis of textual logistics information and summarize the common attributes of text information entities. According to the different attributes and attribute values, we partition each text information entity using the divide-and-conquer idea. Each entity obtained in the previous step is then processed internally using a segmentation method based on tagging and a graph. In this way, we extract valuable attributes and attribute values from the unstructured data. Experimental results show that the method is valid for logistics information obtained from a well-known logistics system.
Keywords :
divide and conquer methods; information retrieval; logistics data processing; statistical analysis; text analysis; attribute extraction; divide and conquer method; graph; partition heuristic information extraction algorithm; statistical analysis; tagging segmentation method; text information entity; text logistics information; unstructured data; Cities and towns; Data mining; Educational institutions; Information retrieval; Logistics; Statistical analysis; Vehicles; divide-and-conquer method; extraction; information; logistics information; unstructured data; words segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing and Big Data (CloudCom-Asia), 2013 International Conference on
Conference_Location :
Fuzhou
Print_ISBN :
978-1-4799-2829-3
Type :
conf
DOI :
10.1109/CLOUDCOM-ASIA.2013.104
Filename :
6821051