Title : 
Information Extraction Techniques for Postal Address Standardization

Author :
Abbasi, Rabeeh Ayaz

Author_Institution :
Faculty of Computing, Riphah International University, Islamabad

Abstract :
Because people have unique frames of reference, they describe the same details in different ways. In particular, they write the address of the same place in different ways, which can produce inconsistent address formats and, ultimately, misunderstandings. A major motivation for address standardization is the cleansing of addresses in data warehouses. Since almost every organization deals with a variety of addresses of its customers and employees, a consistent address format can give the organization better knowledge of its customers. This paper presents several information extraction techniques that can also be used for address standardization. It focuses on a statistical model, the hidden Markov model (HMM), and two rule-based methods, RAPIER and GRID, that extract information from free text. The paper also discusses the author's personal experience with address standardization.

Keywords :
data mining; data warehouses; hidden Markov models; information retrieval; HMM; data cleansing; data warehouse; hidden Markov model; information extraction technique; postal address standardization; rule-based method; statistical model; Business; Cities and towns; Communication industry; Data mining; Data warehouses; Hidden Markov models; Humans; Standardization; Tagging; Telecommunications;

Conference_Title :
9th International Multitopic Conference, IEEE INMIC 2005

Conference_Location :
Karachi

Print_ISBN :
0-7803-9429-1

Electronic_ISBN :
0-7803-9430-5

DOI :
10.1109/INMIC.2005.334455