DocumentCode
1791623
Title
Toward personalized and scalable voice-enabled services powered by big data
Author
Jong Hoon Ahnn
Author_Institution
Cloud Res. Lab., Samsung Res. America - Silicon Valley, San Jose, CA, USA
fYear
2014
fDate
27-30 Oct. 2014
Firstpage
748
Lastpage
753
Abstract
Recent advances in automatic speech recognition (ASR) and big data technologies are driving more personalized services in many areas. Speaker adaptation is one good example: it incurs a huge computation cost in creating a personalized acoustic model and a corresponding language model for hundreds of millions of Samsung product users. We propose a personalized and scalable ASR system powered by a big data infrastructure, which brings data-driven personalization opportunities to voice-enabled services such as a voice-to-text transcriber and voice-enabled web search at petabyte scale. We verify the feasibility of speaker adaptation based on 107 testers' recordings and obtain about 10% of recognition accuracy. We study an optimal set of execution environments by running jobs on either a Hadoop 1 (59 machines) or a Hadoop 2 (15 machines) cluster, and put forward performance optimization strategies: workflow compaction, file compression, and selection of the best file system among several distributed file systems. We devise a metric for the cost of personalized model creation to compare the efficiency of one cluster with that of the other; it provides the estimated total execution time for a given number of machines. Finally, we introduce our in-house object storage and data storage design, optimized for voice-enabled services to effectively support both small and large files (1KB-100KB for speech files, 10MB for a language model, 30MB for an acoustic model), and their high performance compared to state-of-the-art systems.
Keywords
Big Data; data handling; distributed databases; parallel processing; speaker recognition; ASR system; Hadoop; Samsung product; big data infrastructure; data storage; distributed file systems; file compression; file system selection; peta byte scale; scalable voice-enabled services; speaker adaptation; voice-enabled web search; voice-to-text transcriber; Acoustics; Adaptation models; Computational modeling; Libraries; Measurement; Speech; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location
Washington, DC
Type
conf
DOI
10.1109/BigData.2014.7004300
Filename
7004300