Context representation using word sequences extracted from a news corpus

Author

Sekiya, Hiroshi ; Kondo, Takeshi ; Hashimoto, Makoto ; Takagi, Tomohiro

Author_Institution

Dept. of Comput. Sci., Meiji Univ., Kanagawa, Japan

fYear

2005

fDate

26-28 June 2005

Firstpage

783

Lastpage

786

Abstract

The meaning of a word changes dynamically with its context, so the context must be specified to identify the meaning. However, context varies with the specificity of the topic and the viewpoint of the writer. In this paper, we propose that a word sequence can be used to identify context. We show concrete examples of contexts identified by word sequences, together with the word sets related to those contexts. Using 800,000 Reuters news articles, we extracted the word sets with the confabulation model, using five statistical measures as relations. We compared the measures and found that cogency and mutual information were the most effective. The results demonstrate the usefulness of word sequences for identifying context.
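The abstract names cogency and mutual information as the two most effective measures but does not give their definitions. The following is a minimal sketch of how such pairwise measures could be computed from document-level co-occurrence counts. It is not the authors' implementation. It assumes the pointwise form of mutual information, and it assumes cogency means p(context | candidate), which is how the term is commonly read in confabulation theory. The toy documents and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, per document, how often each word and each word pair occurs."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        words = set(doc)  # presence per document, not raw frequency
        word_counts.update(words)
        for a, b in combinations(sorted(words), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts


def pmi(a, b, word_counts, pair_counts, n_docs):
    """Pointwise mutual information: log2( p(a, b) / (p(a) * p(b)) )."""
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_docs / (word_counts[a] * word_counts[b]))


def cogency(context, candidate, word_counts, pair_counts):
    """Assumed definition: p(context | candidate)."""
    if word_counts[candidate] == 0:
        return 0.0
    joint = pair_counts[tuple(sorted((context, candidate)))]
    return joint / word_counts[candidate]


# Hypothetical toy corpus; the paper used 800,000 Reuters news articles.
docs = [
    ["bank", "interest", "rate"],
    ["bank", "river", "water"],
    ["interest", "rate", "loan"],
    ["bank", "loan", "interest"],
]
word_counts, pair_counts = cooccurrence_counts(docs)

# Rank candidate words for the context word "bank" by each measure.
candidates = [w for w in word_counts if w != "bank"]
by_cogency = sorted(
    candidates,
    key=lambda w: cogency("bank", w, word_counts, pair_counts),
    reverse=True,
)
by_pmi = sorted(
    candidates,
    key=lambda w: pmi("bank", w, word_counts, pair_counts, len(docs)),
    reverse=True,
)
```

Under these assumptions, both measures reward words that rarely occur without the context word, so a real corpus would likely need a minimum-frequency cutoff to keep one-off co-occurrences from dominating the ranking.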

Keywords

statistical analysis; text analysis; confabulation model; context representation; news article; statistical measure; word meaning; word sequence; Data mining; Humans; Information processing; Mutual information; Natural language processing; Natural languages

fLanguage

English

Publisher

IEEE

Conference_Title

NAFIPS 2005: 2005 Annual Meeting of the North American Fuzzy Information Processing Society

Print_ISBN

0-7803-9187-X

Type

conf

DOI

10.1109/NAFIPS.2005.1548639

Filename

1548639

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2643828