• DocumentCode
    2509367
  • Title
    An auxiliary unicode Han character lookup service based on glyph shape similarity
  • Author
    Lin, Jeng-Wei ; Lin, Feng-Sheng
  • Author_Institution
    Dept. of Inf. Manage., Tunghai Univ., Taichung, Taiwan
  • fYear
    2011
  • fDate
    12-14 Oct. 2011
  • Firstpage
    489
  • Lastpage
    492
  • Abstract
    Most legacy computer systems fully support input and display only for the 20,902 Han characters (Hanzis for short) encoded in Unicode 1.0. By 2010, Unicode 6.0 had encoded 75,616 Hanzis. However, these newly encoded Hanzis are not easy to use, even on the latest computers. Most of them are rarely used in daily life; some appear only in ancient literature or in individual Sinospherical countries. Users may be confused about their glyph shapes, pronunciations, meanings, and usages. Most Chinese IMEs (input method editors) require users to have good knowledge of a Hanzi in order to enter it, so users often cannot input these Hanzis. We present an auxiliary Unicode Hanzi lookup service based on glyph shape similarity. A user keys in a similar Hanzi with any IME to look up the wanted Hanzi. Each Unicode Hanzi is decomposed into a glyph expression, and the similarity of the glyph shapes of two Hanzis is calculated from a derived edit distance on their glyph expressions. The system thus gives users a convenient way to look up unfamiliar Hanzis.
  • Keywords
    computers; encoding; Chinese input method editors; Han characters; Hanzis; Sinospherical countries; Unicode 1.0; Unicode 6.0; ancient literature; auxiliary Unicode Han character lookup service; computer systems; glyph shape; Computers; Databases; Encoding; Information processing; Optical character recognition software; Shape; Han character lookup; Hanzi; Unicode; edit distance; glyph expression
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communications and Information Technologies (ISCIT), 2011 11th International Symposium on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4577-1294-4
  • Type
    conf
  • DOI
    10.1109/ISCIT.2011.6092155
  • Filename
    6092155
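
The lookup idea summarized in the abstract (decompose each Hanzi into a glyph expression, then rank candidates by an edit distance between expressions) might be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not give the paper's glyph expression format or its "derived" edit distance, so the sketch substitutes a plain Levenshtein distance over tokens, and the table `GLYPH_EXPRESSIONS` and the functions `edit_distance`, `similarity`, and `lookup` are hypothetical names with a toy, hand-written decomposition table.

```python
from typing import Dict, List, Sequence, Tuple

# Toy decomposition table (an assumption for illustration, not the paper's data).
# Each Hanzi maps to a sequence of layout operators and components.
GLYPH_EXPRESSIONS: Dict[str, Tuple[str, ...]] = {
    "明": ("⿰", "日", "月"),
    "朋": ("⿰", "月", "月"),
    "晴": ("⿰", "日", "青"),
    "林": ("⿰", "木", "木"),
    "森": ("⿱", "木", "⿰", "木", "木"),
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance over token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(c1: str, c2: str) -> float:
    """Normalize the distance to a score in [0, 1]; 1.0 means identical expressions."""
    e1, e2 = GLYPH_EXPRESSIONS[c1], GLYPH_EXPRESSIONS[c2]
    return 1.0 - edit_distance(e1, e2) / max(len(e1), len(e2))


def lookup(query: str, top_k: int = 3) -> List[str]:
    """Return the top_k Hanzis whose glyph expressions are closest to the query's."""
    scored = [(similarity(query, c), c) for c in GLYPH_EXPRESSIONS if c != query]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by code point
    return [c for _, c in scored[:top_k]]
```

With this toy table, calling `lookup("明", top_k=2)` ranks 朋 and 晴 ahead of 林 and 森, since each differs from 明 by a single component. A real service would presumably weight operators and components differently and draw its decompositions from a full glyph database; both are left out here.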