Regional Information Center for Science and Technology - Creation and Analysis of a Corpus of Text Rich Indian TV Videos

DocumentCode :

2146546

Title :

Creation and Analysis of a Corpus of Text Rich Indian TV Videos

Author :

Chattopadhyay, T. ; Sengupta, Soumik ; Sinha, Aniruddha ; Rampuria, Nisha

Author_Institution :

Innovation Lab., Tata Consultancy Services, Kolkata, India

fYear :

2011

fDate :

18-21 Sept. 2011

Firstpage :

849

Lastpage :

853

Abstract :

Considerable research is under way on extracting the context of a TV show in order to provide additional information related to it. One major method of extracting context from TV is to recognize the text in the video, which is known as video Optical Character Recognition (VOCR). VOCR is more difficult for the TV shows of a multilingual country such as India. In India, more than 90% of TV viewers still use RF cable as the input to the TV, and nearly 90% of channels carry multilingual text in their shows. As a result, the video quality is poor compared with modern digital TV signals, and different text scripts appear in a single video frame. These factors make Indian TV context recognition more challenging. This paper is therefore concerned with the construction of a video corpus of text-rich Indian TV shows. The proposed database contains more than 100 videos, each of nearly 10 min duration and containing text in the video frame. A statistical analysis of the corpus, which can be used to identify the genre of a TV show, is also presented. The analysis also revealed that the distribution of numerals, special characters, and uppercase and lowercase characters can be used to classify a news video frame. This corpus is useful for a wide variety of research problems, namely (i) localization of the text regions in a video frame, (ii) recognition of text in a video frame, (iii) extraction of context from video, and (iv) performance evaluation of a video OCR system.
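The abstract's remark that the distribution of numerals, special characters, and uppercase and lowercase characters can help classify a news frame can be illustrated with a minimal sketch. This is not the authors' method: the function name `char_class_distribution`, the class labels, and the extra `other_script` bucket (for letters and marks of scripts without case, such as Devanagari) are assumptions introduced here, and the input is assumed to be text already recognized from a single frame.

```python
import unicodedata
from collections import Counter
from typing import Dict

# Hypothetical class labels; the record does not say how the paper names them.
CLASSES = ("numeral", "uppercase", "lowercase", "other_script", "special")


def char_class_distribution(text: str) -> Dict[str, float]:
    """Return the proportion of each character class in `text`.

    Whitespace is ignored. Letters and combining marks without case
    (e.g. Devanagari) are counted as "other_script" (an assumption).
    """
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        if ch.isdigit():
            counts["numeral"] += 1
        elif ch.isupper():
            counts["uppercase"] += 1
        elif ch.islower():
            counts["lowercase"] += 1
        elif unicodedata.category(ch)[0] in ("L", "M"):
            counts["other_script"] += 1
        else:
            counts["special"] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CLASSES}
    return {name: counts[name] / total for name in CLASSES}


if __name__ == "__main__":
    # Made-up ticker-style string, used only to show the output shape.
    print(char_class_distribution("SENSEX 18,450 up 2%"))
```

A feature vector of this kind could be passed to any standard classifier; which classifier and which thresholds the paper used is not stated in this record.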

Keywords :

optical character recognition; statistical analysis; television; text analysis; ubiquitous computing; video signal processing; Indian TV context recognition; RF cable; VOCR; context extraction; corpus analysis; corpus creation; lower case character; modern digital TV signal; multilingual country; multilingual text region localization; statistical analysis; text recognition; text rich Indian TV shows; text rich Indian TV video; text script; video OCR system; video corpus construction; video frame; video optical character recognition; Context; Motion pictures; Optical character recognition software; Statistical analysis; TV; Text recognition; Videos; Corpus; Indian TV Video Analysis; Indian TV video; Video OCR;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Document Analysis and Recognition (ICDAR), 2011 International Conference on

Conference_Location :

Beijing

ISSN :

1520-5363

Print_ISBN :

978-1-4577-1350-7

Electronic_ISBN :

1520-5363

Type :

conf

DOI :

10.1109/ICDAR.2011.174

Filename :

6065431

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2146546