DocumentCode :
2017496
Title :
Information Extraction Techniques for Postal Address Standardization
Author :
Abbasi, Rabeeh Ayaz
Author_Institution :
Fac. of Comput., Riphah Int. Univ., Islamabad
fYear :
2005
fDate :
24-25 Dec. 2005
Firstpage :
1
Lastpage :
6
Abstract :
Because each person has a unique frame of reference, people describe the same details in different ways. They write addresses of the same places differently, which can produce inconsistently formatted addresses and ultimately lead to misunderstandings. A major motivation for address standardization is the cleansing of addresses in data warehouses. Since almost every organization deals with a variety of addresses of its customers and employees, a consistent address format can give the organization better knowledge of its customers. This paper presents various information extraction techniques that can also be used for address standardization. It focuses on a statistical model, the hidden Markov model (HMM), and two rule-based methods, RAPIER and GRID, that extract information from free text. The paper also discusses some personal experience with address standardization.
Keywords :
data mining; data warehouses; hidden Markov models; information retrieval; HMM; data cleansing; data warehouse; hidden Markov model; information extraction technique; postal address standardization; rule-based method; statistical model; Business; Cities and towns; Communication industry; Data mining; Data warehouses; Hidden Markov models; Humans; Standardization; Tagging; Telecommunications;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
9th International Multitopic Conference, IEEE INMIC 2005
Conference_Location :
Karachi
Print_ISBN :
0-7803-9429-1
Electronic_ISBN :
0-7803-9430-5
Type :
conf
DOI :
10.1109/INMIC.2005.334455
Filename :
4133470