• DocumentCode
    3760919
  • Title
    Comparing the Access to and Legibility of Japanese Language Texts in Massive Digital Libraries
  • Author
    Andrew Weiss; Ryan James
  • Author_Institution
    Oviatt Libr., California State Univ., Los Angeles, CA, USA
  • fYear
    2015
  • Firstpage
    57
  • Lastpage
    63
  • Abstract
    A random sample of 800 Japanese-language books with publication dates prior to 1943 was extracted from the OCLC WorldCat database, and 409 were examined. The book titles were queried in both Google Books and HathiTrust. The texts were then examined for their level of typical user access, the accuracy of their metadata, and their scan quality. Despite their likely public domain status within Japan and in the United States, only 0.2% (N=1) of the sampled texts were visible in Google Books as full texts, while 12.5% (N=50) of the sample were visible in HathiTrust. Within the full-view texts, errors in scanning and metadata were identified, including problems with legibility (blurred characters) in 68% of visible texts, distorted content (slanted and upside-down pages) in 90%, motion or blur of turning pages captured by digital cameras in 48%, extra-textual objects (3-D items not part of the text, e.g., fingers, hands, book holders) in 94%, and use of heavily defaced, dirty, or fragile source material in 28%. The most common metadata errors were missing bibliographic information, especially missing page numbers (in 18% of texts) and incomplete tables of contents (in 22%), and problems associated with poor OCR, especially unusable keywords and common phrases (in 50% of texts) that appear to be random words, articles, and unpronounceable symbols.
  • Keywords
    "Google","Metadata","Libraries","Thumb","Image color analysis","Organizations"
  • Publisher
    IEEE
  • Conference_Titel
    Culture and Computing (Culture Computing), 2015 International Conference on
  • Type
    conf
  • DOI
    10.1109/Culture.and.Computing.2015.51
  • Filename
    7433212