• DocumentCode
    175136
  • Title
    Construction of Scholarly n-Gram from Huge Text Data
  • Author
    Myunggwon Hwang ; Mi-Nyeong Hwang ; Ha-Neul Yeom ; Hanmin Jung
  • Author_Institution
    Korea Inst. of Sci. & Technol. Inf. (KISTI), Daejeon, South Korea
  • fYear
    2014
  • fDate
    2-4 July 2014
  • Firstpage
    31
  • Lastpage
    35
  • Abstract
    The ultimate goal of this research is to provide n-gram data that is specialized for scholarly use. To this end, this paper outlines the construction of a scholarly n-gram through the processing of large text documents. Many researchers, especially non-native English speakers, find it difficult to construct sentences and paragraphs with appropriate and unambiguous words. One method that can assist them is the provision of n-gram data. A representative n-gram dataset, Web 1T 5-Gram Version 1, already exists; it was constructed by processing virtually all documents retrieved using Google. However, its word recommendations are unfocused, so it is not suitable for scholarly writing. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-grams using the Web 1T unigram data, and then introduce and discuss the specifics of our research plan for the scholarly n-gram.
  • Keywords
    Internet; information retrieval; natural language processing; recommender systems; text analysis; English language speakers; Google; document retrieval; n-gram data; text document processing; word disambiguation; word recommendations; Context; Reliability; Semantic Web; Semantics; Text categorization; Time-frequency analysis; context n-gram; personalized n-gram; scholarly n-gram; time-dependent n-gram
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2014 Eighth International Conference on
  • Conference_Location
    Birmingham
  • Print_ISBN
    978-1-4799-4333-3
  • Type
    conf
  • DOI
    10.1109/IMIS.2014.4
  • Filename
    6975437