Regional Information Center for Science and Technology - Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images

DocumentCode :

1763737

Title :

Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images

Author :

Trung Quy Phan ; Shivakumara, Palaiahnakote ; Bhowmick, Sourav S. ; Shimiao Li ; Chew Lim Tan ; Pal, Umapada

Author_Institution :

Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore, Singapore

Volume :

Issue :

fYear :

2014

fDate :

Aug. 2014

Firstpage :

1277

Lastpage :

1287

Abstract :

Although many methods for video text detection and recognition have been proposed in recent years, it is hard to identify the best state-of-the-art method because standard datasets, ground truth, and common evaluation measures are not available. Therefore, in this paper, we propose a semiautomatic system for ground truth generation for video text detection and recognition, which covers English and Chinese text of different orientations. The system allows the user to manually correct the ground truth if the automatic method produces incorrect results. To evaluate the performance of the method, we propose eleven attributes at the word level, namely: line index, word index, coordinate values of bounding box, area, content, script type, orientation information, type of text (caption/scene), condition of text (distortion/distortion free), start frame, and end frame. We also introduce a new dataset that consists of 466 video frames collected from the TRECVID 2005 and 2006 databases. The video frames in our dataset contain both horizontal texts (278 frames: 181 with English texts and 97 with Chinese texts) and nonhorizontal texts (188 frames: 140 English and 48 Chinese). Furthermore, the performance of the proposed system is compared with existing text detection methods by calculating measures manually and automatically to show the usefulness of our semiautomatic system. The ground truth and the semiautomatic system will be released to the public.
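
The eleven word-level attributes listed in the abstract can be pictured as one record per word. The following is a minimal sketch only, not the authors' file format: the class name, field names, field types, the four-corner bounding box representation, and the example values are all assumptions made for illustration. The frame counts in the tally are the ones stated in the abstract.

```python
from dataclasses import dataclass, fields
from typing import Tuple

Point = Tuple[int, int]


@dataclass
class WordGroundTruth:
    """Hypothetical record holding the eleven word-level attributes."""
    line_index: int                  # index of the text line within the frame
    word_index: int                  # index of the word within its line
    bounding_box: Tuple[Point, ...]  # corner coordinates (four corners assumed)
    area: float                      # area of the bounding box
    content: str                     # transcription of the word
    script: str                      # script type, e.g. "English" or "Chinese"
    orientation: str                 # e.g. "horizontal" or "nonhorizontal"
    text_type: str                   # "caption" or "scene"
    condition: str                   # "distortion" or "distortion free"
    start_frame: int                 # first frame in which the word appears
    end_frame: int                   # last frame in which the word appears


# Frame counts as stated in the abstract, keyed by (orientation, script).
DATASET_FRAMES = {
    ("horizontal", "English"): 181,
    ("horizontal", "Chinese"): 97,
    ("nonhorizontal", "English"): 140,
    ("nonhorizontal", "Chinese"): 48,
}


def total_frames(counts=DATASET_FRAMES) -> int:
    """Sum the per-category frame counts."""
    return sum(counts.values())


# Made-up example values, for illustration only.
example = WordGroundTruth(
    line_index=0,
    word_index=1,
    bounding_box=((10, 20), (110, 20), (110, 50), (10, 50)),
    area=100 * 30,
    content="NEWS",
    script="English",
    orientation="horizontal",
    text_type="caption",
    condition="distortion free",
    start_frame=12,
    end_frame=48,
)
```

Summing the four categories gives 278 horizontal and 188 nonhorizontal frames, which together match the 466 frames reported for the dataset.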

Keywords :

natural language processing; optical character recognition; video databases; video signal processing; Chinese video text detection; TRECVID 2005 database; TRECVID 2006 database; bounding box; common evaluation measures; line index; orientation information; semiautomatic ground truth generation; text recognition; video frames; video images; word index; Accuracy; Graphics; Indexes; Optical character recognition software; Standards; Chinese video text recognition; ground truthing; multioriented video text detection and recognition; video text detection; video text recognition

fLanguage :

English

Journal_Title :

Circuits and Systems for Video Technology, IEEE Transactions on

Publisher :

IEEE

ISSN :

1051-8215

Type :

jour

DOI :

10.1109/TCSVT.2014.2305515

Filename :

6739120

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1763737