DocumentCode :
3760919
Title :
Comparing the Access to and Legibility of Japanese Language Texts in Massive Digital Libraries
Author :
Andrew Weiss;Ryan James
Author_Institution :
Oviatt Libr., California State Univ., Los Angeles, CA, USA
fYear :
2015
Firstpage :
57
Lastpage :
63
Abstract :
A random sample of 800 Japanese-language books with publication dates prior to 1943 was extracted from the OCLC WorldCat database, and 409 were examined. The book titles were queried in both Google Books and HathiTrust. The texts were then examined for their level of typical user access, the accuracy of their metadata, and their scan quality. Despite their likely public domain status within Japan and in the United States, only 0.2% (N=1) of the sampled texts were visible in Google Books as full texts, while 12.5% (N=50) of the sample were visible in HathiTrust. Within the full-view texts, errors in scanning and metadata were identified, including problems with legibility (blurred characters) in 68% of visible texts, distorted content (slanted and upside-down pages) in 90%, motion or blur of turning pages captured by digital cameras in 48%, extra-textual objects (3-D items not part of the text, e.g., fingers, hands, book holders) in 94%, and use of heavily defaced, dirty, or fragile source material in 28%. The most common metadata errors were missing bibliographic information, especially missing page numbers (in 18% of texts) and incomplete tables of contents (in 22%), and problems associated with poor OCR, especially unusable keywords and common phrases (in 50% of texts) that appear to be random words, articles, and unpronounceable symbols.
Keywords :
"Google","Metadata","Libraries","Thumb","Image color analysis","Organizations"
Publisher :
IEEE
Conference_Title :
Culture and Computing (Culture Computing), 2015 International Conference on
Type :
conf
DOI :
10.1109/Culture.and.Computing.2015.51
Filename :
7433212