Construction of Scholarly n-Gram from Huge Text Data

Author

Myunggwon Hwang ; Mi-Nyeong Hwang ; Ha-Neul Yeom ; Hanmin Jung

Author_Institution

Korea Inst. of Sci. & Technol. Inf. (KISTI), Daejeon, South Korea

fYear

2014

fDate

2-4 July 2014

Firstpage

31

Lastpage

35

Abstract

The ultimate goal of this research is to provide n-gram data specialized for scholarly use. To this end, this paper outlines the construction of a scholarly n-gram through the processing of large text documents. Many researchers, especially non-native English speakers, find it difficult to construct sentences and paragraphs with appropriate, unambiguous words. One way to assist them is to provide n-gram data. A representative n-gram, Web 1T 5-Gram Version 1, which was constructed by processing virtually all documents retrieved using Google, already exists. However, these data contain unfocused word recommendations and are therefore not suitable for this purpose. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-grams using the Web 1T unigram data, and we introduce and discuss the specifics of our research plan for the scholarly n-gram.

Keywords

Internet; information retrieval; natural language processing; recommender systems; text analysis; English language speakers; Google; document retrieval; n-gram data; text document processing; word disambiguation; word recommendations; Context; Reliability; Semantic Web; Semantics; Text categorization; Time-frequency analysis; context n-gram; personalized n-gram; scholarly n-gram; time-dependent n-gram

fLanguage

English

Publisher

IEEE

Conference_Titel

Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2014 Eighth International Conference on

Conference_Location

Birmingham

Print_ISBN

978-1-4799-4333-3

Type

conf

DOI

10.1109/IMIS.2014.4

Filename

6975437