Title of article :
Content-Based Retrieval of Historical Ottoman Documents Stored as Textual Images
Author/Authors :
E. Şaykol, A. K. Sinop, U. Güdükbay, Ö. Ulusoy, and A. E. Çetin
Issue Information :
Periodical with serial issue number, year 2004
Abstract :
There is an accelerating demand to access the visual
content of documents stored in historical and cultural archives.
Availability of electronic imaging tools and effective image processing
techniques makes it feasible to process the multimedia data
in large databases. In this paper, a framework for content-based retrieval
of historical documents in the Ottoman Empire archives is
presented. The documents are stored as textual images, which are
compressed by constructing a codebook (a library of the symbols
occurring in a document) and then replacing each symbol in the
original image with a pointer into the codebook. The symbols are
extracted using wavelet-domain and spatial-domain features based on
the angular and distance spans of shapes. To perform content-based
retrieval in historical archives, a query is specified as a
rectangular region in an input image, and the same symbol-extraction
process is applied to the query region. The queries are processed on
the codebooks of the documents, and the query images are located in
the resulting documents using the pointers in the textual images. The
querying process does not require decompression of the images. The new content-based
retrieval framework is also applicable to many other document
archives using different scripts.
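
The codebook idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `compress` and `query`, the bitmap representation, and the use of exact bitmap equality are all assumptions made for illustration. The paper describes partial symbol-wise matching on wavelet and angular/distance-span features, which is replaced here by a simple exact match.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a symbol bitmap is a tuple of '0'/'1' row strings.
Symbol = Tuple[str, ...]
# A pointer records which codebook entry occurs at position (x, y) in the image.
Pointer = Tuple[int, int, int]


def compress(symbols: List[Tuple[Symbol, int, int]]) -> Tuple[List[Symbol], List[Pointer]]:
    """Build a codebook of distinct symbols and replace each occurrence by a pointer."""
    codebook: List[Symbol] = []
    index: Dict[Symbol, int] = {}
    pointers: List[Pointer] = []
    for bitmap, x, y in symbols:
        if bitmap not in index:
            # First occurrence of this symbol: add it to the codebook.
            index[bitmap] = len(codebook)
            codebook.append(bitmap)
        pointers.append((index[bitmap], x, y))
    return codebook, pointers


def query(codebook: List[Symbol], pointers: List[Pointer],
          query_symbols: List[Symbol]) -> List[Tuple[int, int]]:
    """Locate query symbols by searching the codebook only, then following pointers.

    Exact equality stands in for the feature-based matching (an assumption).
    """
    targets = set(query_symbols)
    wanted = {i for i, sym in enumerate(codebook) if sym in targets}
    return [(x, y) for idx, x, y in pointers if idx in wanted]


# Usage: two distinct symbols, one of which occurs twice on the page.
A: Symbol = ("010", "111")
B: Symbol = ("11", "11")
page = [(A, 0, 0), (B, 5, 0), (A, 10, 0)]
codebook, pointers = compress(page)
hits = query(codebook, pointers, [A])
```

In this sketch the query touches only the codebook and the pointer list, so the page image is never reconstructed, which mirrors the abstract's point that querying does not require decompression.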
Keywords :
Angular and distance span, binary wavelet decomposition, content-based retrieval, historical document compression, partial symbol-wise matching.
Journal title :
IEEE TRANSACTIONS ON IMAGE PROCESSING