• DocumentCode
    2017496
  • Title

    Information Extraction Techniques for Postal Address Standardization

  • Author

    Abbasi, Rabeeh Ayaz

  • Author_Institution
    Fac. of Comput., Riphah Int. Univ., Islamabad
  • fYear
    2005
  • fDate
    24-25 Dec. 2005
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Because people have different frames of reference, they describe the same details in different ways. They write the address of the same place in different ways, which can produce inconsistent address formats and ultimately lead to misunderstandings. A major motivation for address standardization is the cleansing of addresses in data warehouses. Since almost every organization deals with a variety of addresses for its customers and employees, a consistent address format can give the organization better knowledge of its customers. This paper presents various information extraction techniques that can also be used in address standardization. It focuses on a statistical model, the hidden Markov model (HMM), and two rule-based methods, RAPIER and GRID, that extract information from free text. The paper also discusses some personal experience with address standardization.
  • Keywords
    data mining; data warehouses; hidden Markov models; information retrieval; HMM; data cleansing; data warehouse; hidden Markov model; information extraction technique; postal address standardization; rule-based method; statistical model; Business; Cities and towns; Communication industry; Data mining; Data warehouses; Hidden Markov models; Humans; Standardization; Tagging; Telecommunications;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    9th International Multitopic Conference, IEEE INMIC 2005
  • Conference_Location
    Karachi
  • Print_ISBN
    0-7803-9429-1
  • Electronic_ISBN
    0-7803-9430-5
  • Type

    conf

  • DOI
    10.1109/INMIC.2005.334455
  • Filename
    4133470