DocumentCode :
2973175
Title :
Comparing automatic rich transcription for Portuguese, Spanish and English Broadcast News
Author :
Batista, Fernando ; Trancoso, Isabel ; Mamede, Nuno J.
Author_Institution :
Spoken Language Syst. Lab., INESC ID Lisboa, Lisbon, Portugal
fYear :
2009
fDate :
Nov. 13 2009-Dec. 17 2009
Firstpage :
540
Lastpage :
545
Abstract :
This paper describes and evaluates a language-independent approach for automatically enriching speech recognition output with punctuation marks and capitalization information. The two tasks are treated as two classification problems, using a maximum entropy modeling approach, which achieves results comparable to the state of the art. The language independence of the approach is attested by experiments conducted on Portuguese, Spanish and English broadcast news corpora. This paper provides the first comparative study of these tasks across the three languages.
Keywords :
broadcasting; linguistics; maximum entropy methods; natural language processing; speech recognition; English broadcast news; English language; Portuguese broadcast news; Portuguese language; Spanish broadcast news; Spanish language; automatic rich transcription; capitalization information; language independence; maximum entropy modeling; punctuation mark; speech recognition; Automatic speech recognition; Broadcasting; Ear; Entropy; Hidden Markov models; Model driven engineering; NIST; Natural languages; Speech analysis; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373371
Filename :
5373371