Automated Data Augmentation Services Using Text Mining, Data Cleansing and Web Crawling Techniques

Author

Jacob, Matthias ; Kuscher, Alexander ; Plauth, Max ; Thiele, Christoph

Author_Institution

Hasso Plattner Inst. of Software Syst. Eng., Potsdam

fYear

2008

fDate

6-11 July 2008

Firstpage

136

Lastpage

143

Abstract

A large amount of information about celebrities is spread across the Web, hidden in countless news articles and blogs, pictures on Flickr, and videos on YouTube. Combining and aggregating this information would be of great benefit to many customers. In this paper we describe the architecture and the (business) value of a system that not only collates information pre-formatted by other Web services but also provides a self-developed named entity recognition algorithm for extracting the names of celebrities from different data sources. The extracted names are then processed and enriched by our mash-up application.

Keywords

Web services; data mining; software architecture; Web crawling techniques; automated data augmentation services; blogs; data cleansing; text mining; Feeds; Internet; Publishing; TV; Videos; YouTube; Named Entity Recognition; REST; celebrity; mash-up; vipster; web 2.0; web service

fLanguage

English

Publisher

IEEE

Conference_Title

Services - Part I, 2008. IEEE Congress on

Conference_Location

Honolulu, HI

Print_ISBN

978-0-7695-3286-8

Type

conf

DOI

10.1109/SERVICES-1.2008.67

Filename

4578316