• Title of article

    A Corpus-Based Approach to Comparative Evaluation of Statistical Term Association Measures

  • Author/Authors

    Chung, Young Mee (author); Lee, Jae Yun (author)

  • Issue Information
    Monthly, with serial number, year 2001
  • Pages
    -282
  • From page
    283
  • To page
    0
  • Abstract
    Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of a term frequency on the association values by means of z-score. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as χ² statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the χ² statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
  • Keywords
    Pattern recognition, musical data acquisition, document image analysis, optical music recognition
  • Journal title
    Journal of the American Society for Information Science and Technology
  • Serial Year
    2001
  • Record number

    35076
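
    The six association measures named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions over a 2x2 document co-occurrence table; this record does not give the paper's exact formulas, so the variants below (for example, pointwise mutual information with a base-2 logarithm) are assumptions, and the function name `association_measures` and the cell names `a`, `b`, `c`, `d` are hypothetical.

```python
import math


def association_measures(a, b, c, d):
    """Six term association measures from a 2x2 co-occurrence table.

    Assumed cell layout (illustrative, not taken from the paper):
      a = documents containing both terms
      b = documents containing only the first term
      c = documents containing only the second term
      d = documents containing neither term
    Requires a > 0 and non-zero marginal totals.
    """
    n = a + b + c + d
    fx, fy = a + b, a + c            # document frequencies of the two terms
    not_fx, not_fy = c + d, b + d    # complements of those frequencies

    cosine = a / math.sqrt(fx * fy)
    jaccard = a / (a + b + c)
    # Pointwise mutual information: log of observed over expected co-occurrence.
    mutual_information = math.log2(a * n / (fx * fy))
    # Yule's coefficient of colligation Y.
    s_ad, s_bc = math.sqrt(a * d), math.sqrt(b * c)
    yule_y = (s_ad - s_bc) / (s_ad + s_bc)
    # Pearson chi-square statistic for a 2x2 table.
    chi_square = n * (a * d - b * c) ** 2 / (fx * not_fx * fy * not_fy)
    # Log-likelihood ratio (G^2): 2 * sum(O * ln(O / E)), skipping empty cells.
    observed = [a, b, c, d]
    expected = [fx * fy / n, fx * not_fy / n, not_fx * fy / n, not_fx * not_fy / n]
    likelihood_ratio = 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )

    return {
        "cosine": cosine,
        "jaccard": jaccard,
        "mutual_information": mutual_information,
        "yule_y": yule_y,
        "chi_square": chi_square,
        "likelihood_ratio": likelihood_ratio,
    }


if __name__ == "__main__":
    # Two terms that co-occur in 10 of 100 documents, each appearing in 20.
    for name, value in association_measures(10, 10, 10, 70).items():
        print(f"{name}: {value:.4f}")
```

    In this sketch cosine and Jaccard depend only on `a`, `b`, and `c`, while mutual information divides the observed co-occurrence by the product of the two term frequencies, which is one way to see why the abstract reports that the former favor high-frequency terms and the latter can overestimate rare ones.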