Automatic caption generation for video data. Time alignment between caption and acoustic signal

Author

Watanabe, K. ; Sugiyama, M.

Author_Institution

Graduate Sch. of Comput. Sci. & Eng., Aizu Univ., Fukushima, Japan

fYear

1999

fDate

1999

Firstpage

Lastpage

Abstract

This paper discusses automatic caption generation, focusing on the correspondence between Japanese text and its speech data. It proposes a time alignment module implemented with DP matching and evaluates its performance. With the weight and DP path optimized, the gap between the correct and estimated caption display times is less than 39.0 ms at phoneme boundaries. The effects of using another speaker's phoneme templates and of text phrase deletion are also evaluated.
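
As an illustration of the kind of DP matching the abstract refers to, the following is a minimal sketch of dynamic-programming alignment between a phoneme template sequence and an acoustic frame sequence. It is not the authors' implementation. The function names `dp_align` and `boundary_times`, the scalar per-frame features, the `diag_weight` parameter, the three-way local path, and the 10 ms frame shift are all assumptions made for the example; the record does not specify the features, weights, or path constraints actually used.

```python
import math


def dp_align(template, signal, diag_weight=1.0):
    """Return the minimum-cost DP path as (template_index, signal_index) pairs.

    Assumption: each frame is a single number and the local distance is the
    absolute difference. diag_weight is a hypothetical weight on diagonal steps.
    """
    n, m = len(template), len(signal)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(template[i] - signal[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            candidates = []
            if i > 0 and j > 0:  # diagonal step
                candidates.append((cost[i - 1][j - 1] + diag_weight * d, (i - 1, j - 1)))
            if i > 0:  # advance in the template only
                candidates.append((cost[i - 1][j] + d, (i - 1, j)))
            if j > 0:  # advance in the signal only
                candidates.append((cost[i][j - 1] + d, (i, j - 1)))
            cost[i][j], back[i][j] = min(candidates)
    # Trace back from the final cell to the start.
    path = []
    node = (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return path


def boundary_times(path, labels, frame_ms=10.0):
    """Map each phoneme label to the time (ms) of its first aligned signal frame.

    Assumption: each label occurs as one contiguous run in the template,
    and frames are spaced frame_ms apart.
    """
    starts = {}
    for i, j in path:
        starts.setdefault(labels[i], j * frame_ms)
    return starts


template = [0, 0, 5, 5]           # toy template features
labels = ["a", "a", "b", "b"]     # phoneme label of each template frame
signal = [0, 0, 0, 5, 5, 5]       # toy input features
path = dp_align(template, signal)
times = boundary_times(path, labels)
```

In this toy case the estimated start of each phoneme in the signal could be compared against a hand-labelled boundary to obtain a timing gap of the kind reported in the abstract.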

Keywords

handicapped aids; speech recognition; video signal processing; DP matching; Japanese text; acoustic signal; automatic caption generation; performance evaluation; phoneme templates; speech data; text phrase deletion; time alignment; video data; Acoustical engineering; Auditory system; Computer science; Costs; Data engineering; Displays; Signal generators; Speech; TV broadcasting; Timing;

fLanguage

English

Publisher

IEEE

Conference_Title

Multimedia Signal Processing, 1999 IEEE 3rd Workshop on

Conference_Location

Copenhagen

Print_ISBN

0-7803-5610-1

Type

conf

DOI

10.1109/MMSP.1999.793799

Filename

793799

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3167906