• Title of article

    Term dependence: A basis for Luhn and Zipf models

  • Author/Authors

Robert M. Losee, author

  • Issue Information
Monthly; serial year 2001
  • Pages
    7
  • From page
    1019
  • To page
    1025
  • Abstract
There are regularities in the statistical information provided by natural language terms about neighboring terms. We find that when phrase rank increases, moving from common to less common phrases, the value of the expected mutual information measure (EMIM) between the terms regularly decreases. Luhn's model suggests that midrange terms are the best index terms and relevance discriminators. We suggest reasons for this principle based on the empirical relationships shown here between the rank of terms within phrases and the average mutual information between terms, which we refer to as the Inverse Representation-EMIM principle. We also suggest an Inverse EMIM term weight for indexing or retrieval applications that is consistent with Luhn's distribution. An information theoretic interpretation of Zipf's Law is provided. Using the regularity noted here, we suggest that Zipf's Law is a consequence of the statistical dependencies that exist between terms, described here using information theoretic concepts.
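As a point of reference for the abstract, the EMIM between two terms is conventionally computed from a 2x2 document co-occurrence contingency table: summing p(x,y) log[p(x,y) / (p(x) p(y))] over the four presence/absence combinations. The sketch below is illustrative only and is not drawn from the article itself; the function name and the contingency-table parameterization are assumptions.

```python
import math

def emim(n_both, n_a_only, n_b_only, n_neither):
    """Expected mutual information measure (EMIM) between terms a and b,
    from a 2x2 co-occurrence contingency table over a document collection.

    n_both    : documents containing both terms
    n_a_only  : documents containing a but not b
    n_b_only  : documents containing b but not a
    n_neither : documents containing neither term
    """
    n = n_both + n_a_only + n_b_only + n_neither
    # joint counts: rows index presence/absence of a, columns of b
    table = [[n_both, n_a_only], [n_b_only, n_neither]]
    # marginal probabilities for a present/absent and b present/absent
    p_a = [(n_both + n_a_only) / n, (n_b_only + n_neither) / n]
    p_b = [(n_both + n_b_only) / n, (n_a_only + n_neither) / n]
    total = 0.0
    for i in range(2):
        for j in range(2):
            p_xy = table[i][j] / n
            if p_xy > 0:  # 0 * log 0 is taken as 0
                total += p_xy * math.log(p_xy / (p_a[i] * p_b[j]))
    return total

# Independent terms give EMIM near 0; strongly co-occurring terms give a
# larger value, consistent with EMIM measuring term dependence.
print(emim(25, 25, 25, 25))  # statistically independent -> 0.0
print(emim(50, 0, 0, 50))    # perfectly dependent -> log 2
```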
  • Journal title
    Journal of the American Society for Information Science and Technology
  • Serial Year
    2001
  • Record number
    993153