• DocumentCode
    2294988
  • Title
    Automated Data Augmentation Services Using Text Mining, Data Cleansing and Web Crawling Techniques
  • Author
    Jacob, Matthias ; Kuscher, Alexander ; Plauth, Max ; Thiele, Christoph
  • Author_Institution
    Hasso Plattner Inst. of Software Syst. Eng., Potsdam
  • fYear
    2008
  • fDate
    6-11 July 2008
  • Firstpage
    136
  • Lastpage
    143
  • Abstract
    A large amount of information about celebrities is spread all over the Web, hidden inside innumerable news articles and blogs, pictures on Flickr, and videos on YouTube. Having this information combined and aggregated would be of great benefit to many customers. In this document we describe the architecture and the (business) value of a system that not only collates information pre-formatted by other Web services but also provides a self-developed named entity recognition algorithm that extracts the names of celebrities from different data sources; our mash-up application then processes and enriches the extracted names.
  • Keywords
    Web services; data mining; software architecture; Web crawling techniques; automated data augmentation services; blogs; data cleansing; text mining; Feeds; Internet; Publishing; TV; Videos; YouTube; Named Entity Recognition; REST; celebrity; mash-up; vipster; web 2.0
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services - Part I, 2008. IEEE Congress on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    978-0-7695-3286-8
  • Type
    conf
  • DOI
    10.1109/SERVICES-1.2008.67
  • Filename
    4578316