Title :
Intra-content term weighting for topic segmentation
Author :
Bouchekif, Abdessalam ; Damnati, Geraldine ; Charlet, D.
Author_Institution :
Orange Labs., Multimedia Contents Anal. Technol., Lannion, France
Abstract :
Term weighting is an important task in many applications, such as information retrieval, extraction of significant words, and automatic summarization. A term's weight reflects its capacity to discriminate a document within a collection, or a part of a document within the whole document. This paper deals with term weighting strategies in the context of topic segmentation based on lexical cohesion. The aim is to propose a term weighting method that does not require any external data. Weights are estimated from the content itself, which is treated as a collection of mono-thematic documents. Two approaches are proposed, and significant improvements are observed on a rich corpus covering various formats of Broadcast News shows from 8 French TV channels.
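The idea of estimating weights from the content itself can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not detail the two proposed approaches. The sketch assumes the content has already been cut into tokenized blocks that serve as pseudo-documents, estimates a plain IDF over those blocks only, and scores lexical cohesion between adjacent blocks with a cosine similarity. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def intra_content_idf(segments):
    """Estimate IDF from the segments of a single content (no external corpus)."""
    n = len(segments)
    df = Counter()
    for seg in segments:
        df.update(set(seg))  # count each term once per segment
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_vector(segment, idf):
    """Build a TF-IDF vector (as a dict) for one segment."""
    tf = Counter(segment)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_cohesion(segments):
    """Cohesion score between each pair of consecutive segments."""
    idf = intra_content_idf(segments)
    vectors = [weighted_vector(seg, idf) for seg in segments]
    return [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


# Toy content (assumed data): two political blocks followed by two sports blocks.
segments = [
    ["election", "vote", "minister", "today"],
    ["vote", "election", "result", "today"],
    ["football", "match", "goal", "today"],
    ["match", "goal", "team", "today"],
]
scores = adjacent_cohesion(segments)
print(scores)
```

In this toy example, "today" occurs in every block, so its intra-content IDF is zero and it contributes nothing to cohesion. The score between the second and third blocks therefore drops to zero, which marks that position as a candidate topic boundary.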
Keywords :
document handling; information resources; Broadcast News show; French TV channels; document discrimination; intra-content term weighting; lexical cohesion based topic segmentation; mono-thematic documents; Acoustics; Computational modeling; Iterative methods; Partitioning algorithms; Speech; TV broadcasting; Visualization; Okapi; TF-IDF; Topic segmentation; lexical cohesion; term weighting;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854980