Title of article
Probabilistic Latent Semantic Indexing
Author/Authors
Hofmann, Thomas (author)
Issue Information
Newspaper with serial number, year 1999
Pages
-4
From page
5
To page
0
Abstract
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
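The abstract does not spell out the model, so the following is only a minimal sketch of how a latent class model over document-word counts might be fitted with EM. It assumes the commonly cited aspect-model form P(d, w) = sum over z of P(z) P(d|z) P(w|z), and it treats the "generalization" of EM as a tempering exponent beta applied in the E-step (beta = 1 gives plain EM). The function name fit_plsa, the parameter names, and the toy count matrix are all hypothetical and do not come from the record.

```python
import random


def fit_plsa(counts, n_topics, n_iters=50, beta=1.0, seed=0):
    """Fit a latent class model to counts[d][w] = n(d, w) with (tempered) EM.

    Returns (p_z, p_d_z, p_w_z), where p_d_z[z][d] = P(d|z) and
    p_w_z[z][w] = P(w|z). Assumes at least one nonzero count.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Strictly positive random values, normalized to sum to 1.
        xs = [rng.random() + 0.1 for _ in range(n)]
        total = sum(xs)
        return [x / total for x in xs]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iters):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]

        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over z for this (d, w) pair.
                # beta < 1 flattens the posterior (assumed tempering form).
                post = [
                    p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                    for z in range(n_topics)
                ]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / total  # expected count assigned to z
                    acc_z[z] += r
                    acc_d[z][d] += r
                    acc_w[z][w] += r

        # M-step: renormalize the expected counts into probabilities.
        mass = sum(acc_z)
        for z in range(n_topics):
            if acc_z[z] > 0.0:
                p_d_z[z] = [x / acc_z[z] for x in acc_d[z]]
                p_w_z[z] = [x / acc_z[z] for x in acc_w[z]]
            p_z[z] = acc_z[z] / mass

    return p_z, p_d_z, p_w_z


# Hypothetical toy corpus: 4 documents, 4 vocabulary terms.
toy_counts = [
    [4, 3, 0, 0],
    [3, 5, 0, 0],
    [0, 0, 4, 2],
    [0, 0, 3, 5],
]
p_z, p_d_z, p_w_z = fit_plsa(toy_counts, n_topics=2)
```

The sketch stores dense nested lists for readability; a realistic corpus would need sparse counts and held-out data to choose beta and the stopping point, neither of which is specified here.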
Keywords
Digital library, archival documents
Journal title
SIGIR FORUM
Serial Year
1999
Record number
16793