DocumentCode :
1695443
Title :
Measuring semantic similarity by contextual word connections in Chinese news story segmentation
Author :
Xuecheng Nie ; Wei Feng ; Liang Wan ; Lei Xie
Author_Institution :
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fYear :
2013
Firstpage :
8312
Lastpage :
8316
Abstract :
Much recent work in story segmentation focuses on developing better partitioning criteria to segment news transcripts into sequences of topically coherent stories, while simply relying on repetition-based hard word-level similarities and ignoring the semantic correlations between different words. In this paper, we propose a purely data-driven approach to measuring soft semantic word- and sentence-level similarity from a given corpus, without the guidance of linguistic knowledge, ground-truth topic labels or story boundaries. We show that contextual word connections can produce semantically meaningful similarity measurements between any pair of Chinese words. Building on this, we use a parallel all-pair SimRank algorithm to propagate these contextual similarities throughout the whole vocabulary. The resulting word semantic similarity matrix is then used to refine the classical cosine similarity measurement of sentences. Experiments on benchmark Chinese news corpora show that story segmentation using the proposed soft semantic similarity measurement always produces better segmentation accuracy than using the hard similarity. Specifically, we achieve a 3%-10% average F1-measure improvement over state-of-the-art NCuts-based story segmentation.
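The pipeline in the abstract (build word connections, propagate them with all-pair SimRank, then plug the word similarity matrix into a cosine measure) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper `cooccurrence_graph` is hypothetical and simply links words that co-occur within a small token `window`, standing in for the paper's contextual word connections; `simrank` uses the standard matrix form S <- c * W^T S W with the diagonal reset to 1 (the decay `c` and iteration count `iters` are assumed values); and `soft_cosine` computes x^T S y / sqrt(x^T S x * y^T S y), which reduces to the classical hard cosine when S is the identity.

```python
import numpy as np

def cooccurrence_graph(sentences, vocab, window=2):
    # Hypothetical stand-in for "contextual word connections":
    # link two distinct words if they occur within `window` tokens of each other.
    idx = {w: i for i, w in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        for p, w in enumerate(sent):
            for q in range(p + 1, min(p + window + 1, len(sent))):
                a, b = idx[w], idx[sent[q]]
                if a != b:
                    adj[a, b] = adj[b, a] = 1.0
    return adj

def simrank(adj, c=0.8, iters=10):
    # All-pair SimRank in matrix form: S <- c * W^T S W, then reset diag to 1.
    # W is the column-normalized adjacency, so each update averages over neighbours.
    deg = adj.sum(axis=0)
    w = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    s = np.eye(adj.shape[0])
    for _ in range(iters):
        s = c * (w.T @ s @ w)
        np.fill_diagonal(s, 1.0)
    return s

def soft_cosine(x, y, m):
    # Cosine similarity refined by a word similarity matrix m;
    # with m = identity this is the classical (hard) cosine.
    num = x @ m @ y
    den = np.sqrt(x @ m @ x) * np.sqrt(y @ m @ y)
    return float(num / den) if den > 0 else 0.0

# Toy usage: "股市" and "股票" never co-occur, but share the context word "上涨".
vocab = ["股市", "上涨", "股票", "下跌", "足球", "比赛"]
sentences = [["股市", "上涨"], ["股票", "上涨"], ["足球", "比赛"]]
S = simrank(cooccurrence_graph(sentences, vocab))
x = np.array([1.0, 0, 0, 0, 0, 0])  # bag of words for "股市"
y = np.array([0, 0, 1.0, 0, 0, 0])  # bag of words for "股票"
hard = soft_cosine(x, y, np.eye(len(vocab)))
soft = soft_cosine(x, y, S)
```

In this toy corpus the hard cosine between the two single-word "sentences" is 0 because they share no word, while the SimRank-refined score is 0.8 because both words connect to the same context word. Real transcripts would use term-frequency vectors over a much larger vocabulary, where a dense all-pair update like this one becomes expensive; this is why the abstract stresses a parallel SimRank.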
Keywords :
linguistics; natural language processing; Chinese news corpora; Chinese news story segmentation; contextual word connections; cosine similarity measurement; ground-truth topic labeling; hard word-level similarities; linguistic knowledge; parallel all-pair SimRank algorithm; resultant word semantic similarity matrix; segment news transcripts; semantic correlations; soft semantic sentence-level similarity; soft semantic similarity measurement; soft semantic word-level similarity; story boundaries; Accuracy; Benchmark testing; Correlation; Educational institutions; Measurement; Semantics; Vocabulary; Semantic similarity; contextual word connections; similarity propagation; story segmentation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639286
Filename :
6639286