DocumentCode
1933484
Title
APyCA: Towards the automatic subtitling of television content in Spanish
Author
Alvarez, Aitor ; del Pozo, Arantza ; Arruti, Andoni
Author_Institution
Vicomtech Res. Centre, Donostia-San Sebastian, Spain
fYear
2010
fDate
18-20 Oct. 2010
Firstpage
567
Lastpage
574
Abstract
Automatic subtitling of television content has become an approachable challenge due to the advancement of the technology involved. It has also become a priority for many Spanish TV broadcasters, who will have to broadcast up to 90% of subtitled content by 2013 to comply with recently approved national audiovisual policies. APyCA, the prototype system described in this paper, has been developed in an attempt to automate the process of subtitling television content in Spanish through the application of state-of-the-art speech and language technologies. Voice activity detection, automatic speech recognition and alignment, discourse segment detection and speaker diarization have proved useful for generating time-coded, colour-assigned draft transcriptions for post-editing. The productivity benefit of this approach depends heavily on the performance of the speech recognition module, which achieves reasonable results on clean read speech but degrades as the speech becomes noisier and/or more spontaneous.
Keywords
natural language processing; speaker recognition; television broadcasting; text editing; video signal processing; APyCA; Spanish TV broadcasters; automatic speech recognition; automatic subtitling; discourse segment detection; language technology; national audiovisual policy; post editing; speaker diarization; television content; time coded colour assigned draft transcription; voice activity detection; Computer science; Information technology; Iron; TV
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
Conference_Location
Wisla
ISSN
2157-5525
Print_ISBN
978-1-4244-6432-6
Type
conf
DOI
10.1109/IMCSIT.2010.5680055
Filename
5680055