DocumentCode
175136
Title
Construction of Scholarly n-Gram from Huge Text Data
Author
Myunggwon Hwang ; Mi-Nyeong Hwang ; Ha-Neul Yeom ; Hanmin Jung
Author_Institution
Korea Inst. of Sci. & Technol. Inf. (KISTI), Daejeon, South Korea
fYear
2014
fDate
2-4 July 2014
Firstpage
31
Lastpage
35
Abstract
The ultimate goal of this research is to provide n-gram data specialized for scholarly use. To this end, this paper outlines the construction of a scholarly n-gram through the processing of large text documents. Many researchers, especially non-native English speakers, find it difficult to construct sentences and paragraphs with appropriate, unambiguous words. One way to assist them is to provide n-gram data. A representative n-gram data set, Web 1T 5-Gram Version 1, which was constructed by processing virtually all documents retrieved using Google, already exists. However, these data yield unfocused word recommendations and are therefore not suitable for scholarly use. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-grams using the Web 1T unigram data, and we introduce and discuss the specifics of our research plan for the scholarly n-gram.
Keywords
Internet; information retrieval; natural language processing; recommender systems; text analysis; English language speakers; Google; document retrieval; n-gram data; text document processing; word disambiguation; word recommendations; Context; Reliability; Semantic Web; Semantics; Text categorization; Time-frequency analysis; context n-gram; personalized n-gram; scholarly n-gram; time-dependent n-gram
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2014 Eighth International Conference on
Conference_Location
Birmingham
Print_ISBN
978-1-4799-4333-3
Type
conf
DOI
10.1109/IMIS.2014.4
Filename
6975437