Title :
Forensic handwritten document retrieval system
Author :
Srihari, Sargur N. ; Shi, Zhixin
Author_Institution :
Center of Excellence for Document Analysis and Recognition, State University of New York, Buffalo, NY, USA
Abstract :
Document storage and retrieval capabilities of the CEDAR-FOX forensic handwritten document examination system are described. The system is designed for automated and semiautomated analysis of scanned handwritten documents. For library creation, the system provides functions for (i) entering document metadata, e.g., identification number, writer, and other collateral information; (ii) creating a textual transcript of the image content at the word level; and (iii) including automatically extracted document-level features, e.g., stroke width, slant, and word gaps, as well as finer features that capture the structural characteristics of characters and words. To extract these features, the system performs page analysis, page segmentation, line separation, word segmentation, and finally recognition of characters and words. The extracted features are used for writer identification by matching against a library built as a database. The system design is driven by questioned document examination, with its emphasis on writer identification. Several query modalities are permitted for retrieval: (i) document level: the entire document image is the query; (ii) partial image: a region of interest (ROI) of a document is the query; (iii) word image: a word image is the query, which is also called word spotting; (iv) text keyword: the user types in keywords, which may be words in the documents, case numbers, person names, times, or preregistered keywords such as brief descriptions of the case. The system has been implemented in Microsoft Visual C++ and tested with the MySQL database system from MySQL AB. It provides a graphical user interface for forensic document identification, verification, and analysis.
Keywords :
digital libraries; document image processing; feature extraction; graphical user interfaces; handwriting recognition; handwritten character recognition; image retrieval; meta data; text analysis; visual databases; word processing; CEDAR-FOX forensic handwritten document examination system; Microsoft Visual C++; MySQL AB; MySQL database system; character recognition; document image retrieval system; document level feature extraction; document metadata; document storage; forensic document identification; graphical user interface; library creation; line separation; page analysis; page segmentation; region of interest; scanned handwritten documents; text keyword; textual transcript; word image; word segmentation; word spotting; Character recognition; Data mining; Feature extraction; Forensics; Image databases; Image segmentation; Libraries; Performance analysis; Spatial databases; System analysis and design
Conference_Title :
Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
Print_ISBN :
0-7695-2088-X
DOI :
10.1109/DIAL.2004.1263248