• DocumentCode
    690532
  • Title
    Similarities and Dissimilarities between Character Frequencies of Written Text of Melayu, English, and Indonesian Languages
  • Author
    Shah, Aamer ; Saidin, Aznan Zuhid ; Taha, Imad Fakhri ; Zeki, Akram M. ; Bhatti, Zeeshan
  • Author_Institution
    Dept. of Comput. Sci., Int. Islamic Univ. Malaysia, Kuala Lumpur, Malaysia
  • fYear
    2013
  • fDate
    23-24 Dec. 2013
  • Firstpage
    192
  • Lastpage
    194
  • Abstract
    This paper presents statistical similarities and dissimilarities between the character frequencies of three languages: Malay, Indonesian, and English. The three are compared because they share a common symbol set (A-Z). The investigation finds that the character frequencies of Malay and Indonesian are statistically very close to each other. For example, the characters "A", "N", and "E" have frequencies of (19%, 20.4%), (10%, 9.33%), and (9%, 8.28%) in Malay and Indonesian, respectively. English differs: its three most frequent letters are "E", "T", and "A", in that order. An interesting observation is that, despite these similarities and dissimilarities, the frequency envelopes of all three languages follow the same rising and falling trend across the characters. Moreover, in all three languages the last four characters, "W", "X", "Y", and "Z", exhibit the lowest usage.
  • Keywords
    natural language processing; statistical analysis; text analysis; English language; Indonesian language; Melayu language; character frequencies; statistical dissimilarities; statistical similarities; written text; Computer science; Educational institutions; Information systems; Internet; Market research; Probability; Time-frequency analysis; Character Frequency; Indonesian; Malayu
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computer Science Applications and Technologies (ACSAT), 2013 International Conference on
  • Conference_Location
    Kuching
  • Type
    conf
  • DOI
    10.1109/ACSAT.2013.45
  • Filename
    6836574
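
The abstract compares per-letter percentages over the shared A-Z symbol set. The record does not describe the authors' corpus or tooling, so the following is only a minimal sketch of how such percentages could be computed. The function names `letter_frequencies` and `top_letters` and the sample sentence are hypothetical, not taken from the paper.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict[str, float]:
    """Return the percentage share of each letter A-Z in `text`.

    Assumption: case is folded and every character outside A-Z
    (digits, spaces, punctuation, accented letters) is ignored.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    counts = Counter(letters)
    return {
        ch: (100.0 * counts[ch] / total if total else 0.0)
        for ch in string.ascii_uppercase
    }


def top_letters(freqs: dict[str, float], n: int = 3) -> list[str]:
    """Return the n most frequent letters; ties are broken alphabetically."""
    return sorted(freqs, key=lambda ch: (-freqs[ch], ch))[:n]


# Hypothetical sample text, far too short to reproduce the paper's figures.
sample = "Saya makan nasi"
freqs = letter_frequencies(sample)
print(top_letters(freqs))
```

Running the same function over a corpus for each language and comparing the resulting dictionaries letter by letter would give the kind of side-by-side percentages the abstract quotes.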