• Title of article

    A Similarity-based Probability Model for Latent Semantic Indexing

  • Author/Authors

    Ding, Chris H.Q. (author)

  • Issue Information
    Periodical with serial number, year 1999
  • Pages
    -57
  • From page
    58
  • To page
    0
  • Abstract
    A dual probability model is constructed for Latent Semantic Indexing (LSI) using the cosine similarity measure. Both the document-document similarity matrix and the term-term similarity matrix naturally arise from the maximum likelihood estimation of the model parameters, and the optimal solutions are the latent semantic vectors of LSI. Dimensionality reduction is justified by the statistical significance of latent semantic vectors as measured by the likelihood of the model. This leads to a statistical criterion for the optimal semantic dimensions, answering a critical open question in LSI with practical importance. Thus the model establishes a statistical framework for LSI. Ambiguities related to statistical modeling of LSI are clarified.
  • Keywords
    Digital library, archival documents
  • Journal title
    SIGIR FORUM
  • Serial Year
    1999
  • Record number

    16794