Title :
The data deluge: Challenges and opportunities of unlimited data in statistical signal processing
Author :
Seltzer, Michael L. ; Zhang, Lei
Author_Institution :
Microsoft Res., Speech Technol. Group, Redmond, WA
Abstract :
Recently, there has been a dramatic increase in the amount of audio, video, and images created and shared on the Internet by users around the world. Much of this content is publicly available and free of cost. When viewed through the lens of pattern classification, this content can be seen as a virtually unlimited supply of training data for various statistical modeling and labeling tasks such as speech recognition and computer vision. To exploit this data resource effectively, significant research challenges must be addressed. In this paper, we present three significant challenges that must be solved to harness the potential of this "data deluge". We then describe recent work in spoken language processing and image processing that has begun to address these challenges in order to tackle large-scale classification tasks. By bringing together the work of these two communities, we hope to stimulate the cross-pollination of ideas and methods among different signal processing communities.
Keywords :
computer vision; pattern classification; speech processing; speech recognition; Internet; data deluge; image processing; large-scale classification tasks; spoken language processing; statistical labeling; statistical modeling; statistical signal processing; Costs; Labeling; Lenses; Signal processing; Training data; Video sharing; Video signal processing; multimedia search; pattern recognition; web-scale data
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-2353-8
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2009.4960430