DocumentCode :
3197598
Title :
A Neural Principal Component Analysis for text based documents keywords extraction
Author :
Heni, Saber ; Ejbali, Ridha ; Zaied, Mourad ; Ben Amar, Chokri
Author_Institution :
Higher Inst. of Comput. & Multimedia of Gabes, Zrig - Gabes, Tunisia
fYear :
2011
fDate :
18-20 Dec. 2011
Firstpage :
112
Lastpage :
115
Abstract :
Information retrieval system users, such as those operating on the web, usually use the text modality to search not only for textual information but also for multimedia content. To satisfy users' requirements, information retrieval systems should prepare a short representation of the content of each document in the corpus, called an index. This index often fails to reflect the intended meaning of the document it represents. In this paper, we propose an approach based on a Neural Principal Component Analysis, which expresses the maximum variance of the data and extracts its principal component by calculating the correlation between the words of each document, in order to determine the keywords that reveal the fields of interest of each document's content.
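The abstract describes extracting a principal component of word correlations with a neural network, and the keyword list names the Normalized Hebbian Algorithm. The sketch below is a hedged illustration of that general idea, not the authors' method. It assumes Oja's single-neuron normalized Hebbian rule (the record does not say which variant the paper uses), treats text segments of one document as observations and vocabulary words as variables, and ranks words by the absolute loading of the learned component. The function names `oja_first_component`, `segment_term_matrix` and `top_keywords`, and all parameter values, are hypothetical.

```python
import numpy as np


def oja_first_component(X, lr=0.01, epochs=200, seed=0):
    """Estimate the leading principal component of X (samples x features)
    with Oja's normalized Hebbian rule: y = w.x, w += lr * y * (x - y * w)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)              # center each word column
    w = rng.normal(size=Xc.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in Xc[rng.permutation(len(Xc))]:
            y = w @ x                     # output of the single linear neuron
            w += lr * y * (x - y * w)     # Hebbian growth minus decay keeps ||w|| near 1
    return w / np.linalg.norm(w)


def segment_term_matrix(segments):
    """Count matrix: one row per text segment, one column per word (hypothetical tokenizer)."""
    vocab = sorted({t for s in segments for t in s.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(segments), len(vocab)))
    for r, s in enumerate(segments):
        for t in s.lower().split():
            X[r, index[t]] += 1
    return X, vocab


def top_keywords(segments, k=3):
    """Rank words by the magnitude of their loading on the first component."""
    X, vocab = segment_term_matrix(segments)
    w = oja_first_component(X)
    order = np.argsort(-np.abs(w))
    return [vocab[i] for i in order[:k]]
```

Oja's rule is used here because its decay term keeps the weight vector normalized, so it converges (up to sign) to the leading eigenvector of the word covariance matrix without ever forming or decomposing that matrix explicitly. The learning rate must stay small relative to the squared norm of the inputs, or the updates can diverge.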
Keywords :
data structures; feature extraction; indexing; information retrieval systems; multimedia systems; neural nets; principal component analysis; text analysis; content representation; document content; document index; information retrieval system; maximum data variance; multimedia content; neural principal component analysis; text based document keyword extraction; text modality; textual information; user requirement; Covariance matrix; Eigenvalues and eigenfunctions; Indexing; Principal component analysis; Speech; Vectors; Information retrieval; Normalized Hebbian Algorithm; Principal Component Analysis; data analysis; keywords extraction; neural networks; text based indexing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Next Generation Networks and Services (NGNS), 2011 3rd International Conference on
Conference_Location :
Hammamet
Print_ISBN :
978-1-4673-0138-1
Type :
conf
DOI :
10.1109/NGNS.2011.6142550
Filename :
6142550