DocumentCode
2294988
Title
Automated Data Augmentation Services Using Text Mining, Data Cleansing and Web Crawling Techniques
Author
Jacob, Matthias ; Kuscher, Alexander ; Plauth, Max ; Thiele, Christoph
Author_Institution
Hasso Plattner Inst. of Software Syst. Eng., Potsdam
fYear
2008
fDate
6-11 July 2008
Firstpage
136
Lastpage
143
Abstract
A large amount of information about celebrities is scattered across the Web, hidden in countless news articles and blog posts, pictures on Flickr, and videos on YouTube. Combining and aggregating this information would be of great benefit to many customers. This paper describes the architecture and the (business) value of a system that not only collates information pre-formatted by other Web services but also provides a self-developed named entity recognition algorithm that extracts the names of celebrities from different data sources; our mash-up application then processes and enriches them.
Keywords
Web services; data mining; software architecture; Web crawling techniques; automated data augmentation services; blogs; data cleansing; text mining; Feeds; Internet; Publishing; TV; Videos; YouTube; Named Entity Recognition; REST; celebrity; mash-up; vipster; web 2.0; web service;
fLanguage
English
Publisher
IEEE
Conference_Titel
Services - Part I, 2008. IEEE Congress on
Conference_Location
Honolulu, HI
Print_ISBN
978-0-7695-3286-8
Type
conf
DOI
10.1109/SERVICES-1.2008.67
Filename
4578316