• DocumentCode
    1625664
  • Title

    Integrating Unstructured Data into Relational Databases

  • Author

    Mansuri, Imran R. ; Sarawagi, Sunita

  • Author_Institution
    IIT Bombay
  • fYear
    2006
  • Firstpage
    29
  • Lastpage
    29
  • Abstract
    In this paper we present a system for automatically integrating unstructured text into a multi-relational database using state-of-the-art statistical models for structure extraction and matching. We show how to extend current high-performing models, Conditional Random Fields and their semi-Markov counterparts, to effectively exploit a variety of recognition clues available in a database of entities, thereby significantly reducing the dependence on manually labeled training data. Our system is designed to load unstructured records into columns spread across multiple tables in the database while resolving the relationship of the extracted text with existing column values, and preserving the cardinality and link constraints of the database. We show how to combine the inference algorithms of statistical models with the database-imposed constraints for optimal data integration.
  • Keywords
    Bridges; Data mining; Database systems; Inference algorithms; Machine learning; Portals; Relational databases; Resumes; Training data; Web mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
  • Print_ISBN
    0-7695-2570-9
  • Type

    conf

  • DOI
    10.1109/ICDE.2006.83
  • Filename
    1617397