DocumentCode
107339
Title
TSVD as a Statistical Estimator in the Latent Semantic Analysis Paradigm
Author
Pilato, Giovanni ; Vassallo, Giorgio
Author_Institution
Ist. di Calcolo e Reti ad Alte Prestazioni, Palermo, Italy
Volume
3
Issue
2
fYear
2015
fDate
Jun-15
Firstpage
185
Lastpage
192
Abstract
The aim of this paper is to present a new point of view that makes it possible to give a statistical interpretation of the traditional latent semantic analysis (LSA) paradigm based on the truncated singular value decomposition (TSVD) technique. We show how the TSVD can be interpreted as a statistical estimator derived from the LSA co-occurrence relationship matrix by mapping probability distributions on Riemannian manifolds. In addition, the quality of the estimator model can be expressed by introducing a figure of merit arising from the Solomonoff approach. This figure of merit takes into account both the adherence to the sample data and the simplicity of the model. In our model, the simplicity parameter of the proposed figure of merit depends on the number of singular values retained after the truncation process, while the TSVD estimator, according to the Hellinger distance, guarantees the minimal distance between the sample probability distribution and the inferred probabilistic model.
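The following is a minimal sketch of one possible reading of the estimator described in the abstract, written in Python with NumPy. The element-wise square-root mapping of the normalized co-occurrence matrix, the renormalization after truncation, the toy count matrix, and the names `tsvd_estimate` and `hellinger` are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def tsvd_estimate(counts, k):
    """Return (P, Q): the sample distribution and a rank-k TSVD-based estimate.

    Assumed construction (not taken from the abstract):
    1. normalize the co-occurrence counts into a probability distribution P;
    2. take the element-wise square root, which has unit Frobenius norm;
    3. truncate its SVD to the k largest singular values;
    4. rescale to unit Frobenius norm and square, so Q is again a distribution.
    """
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()
    B = np.sqrt(P)
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    B_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k truncation
    B_k /= np.linalg.norm(B_k)             # Frobenius norm back to 1
    Q = B_k ** 2                           # nonnegative entries summing to 1
    return P, Q

def hellinger(P, Q):
    """Hellinger distance between two discrete distributions of equal shape."""
    return np.linalg.norm(np.sqrt(P) - np.sqrt(Q)) / np.sqrt(2)

# Hypothetical 3x3 term-by-context co-occurrence counts.
counts = [[4, 1, 0],
          [1, 3, 1],
          [0, 2, 5]]
P, Q = tsvd_estimate(counts, k=2)
print("Hellinger distance at rank 2:", hellinger(P, Q))
```

In this sketch the number of retained singular values `k` plays the role of the simplicity parameter: smaller `k` gives a simpler model, and the Hellinger distance measures how far that model moves from the sample data.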
Keywords
matrix algebra; singular value decomposition; statistical analysis; Hellinger distance; LSA co-occurrence relationship matrix; LSA paradigm; Riemannian manifold; Solomonoff approach; TSVD technique; figure of merit; inferred probabilistic model; latent semantic analysis paradigm; probability distribution mapping; simplicity parameter; statistical estimator; truncated singular value decomposition; Computational modeling; Data models; Matrix decomposition; Probabilistic logic; Probability distribution; Semantics; LSA; data-driven modeling
fLanguage
English
Journal_Title
Emerging Topics in Computing, IEEE Transactions on
Publisher
IEEE
ISSN
2168-6750
Type
jour
DOI
10.1109/TETC.2014.2385594
Filename
6995958