• DocumentCode
    3253109
  • Title

    Information retrieval using Hellinger distance and sqrt-cos similarity

  • Author

    Shunzhi Zhu ; Lizhao Liu ; Yan Wang

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Xiamen Univ. of Technol., Xiamen, China
  • fYear
    2012
  • fDate
    14-17 July 2012
  • Firstpage
    925
  • Lastpage
    929
  • Abstract
    In this paper, we propose a similarity measurement method based on the Hellinger distance and the square-root cosine. We use the Hellinger distance as the distance metric for document clustering and a new square-root cosine similarity for query information retrieval. This new similarity/distance also bridges traditional tf-idf weighting and binary weighting in the vector space model. Finally, we compare the performance of this method with that of one based on Euclidean distance and cosine similarity. The results show that precision and recall are improved by using the sqrt-cos similarity.
  • Keywords
    document handling; query processing; vectors; Euclidean distance; binary weighting; distance metric; document clustering; information Hellinger distance; information retrieval; precision improvement; query information retrieval; recall improvement; square-root cosine similarity measurement method; tf-idf weighting; vector space model; Computational modeling; Educational institutions; Probabilistic logic; Hellinger cosine measurement; Hellinger distance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Education (ICCSE), 2012 7th International Conference on
  • Conference_Location
    Melbourne, VIC
  • Print_ISBN
    978-1-4673-0241-8
  • Type

    conf

  • DOI
    10.1109/ICCSE.2012.6295217
  • Filename
    6295217
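
The two measures named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formulas, so the code below is an assumption: it uses the textbook Hellinger distance between L1-normalized, non-negative term-weight vectors, and a square-root cosine taken as the cosine of the element-wise square roots of those vectors (equivalently, the Bhattacharyya coefficient). The paper's own sqrt-cos definition may be normalized differently. The function names are hypothetical.

```python
import math


def l1_normalize(weights):
    """Scale a non-negative weight vector so its entries sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("vector must have a positive sum")
    return [w / total for w in weights]


def hellinger_distance(x, y):
    """Textbook Hellinger distance between two L1-normalized vectors.

    Assumption: documents are non-negative term-weight vectors of equal
    length (for example tf-idf weights over a shared vocabulary).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    p, q = l1_normalize(x), l1_normalize(y)
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def sqrt_cos_similarity(x, y):
    """Cosine of the element-wise square roots of two L1-normalized vectors.

    Assumption: this is the Bhattacharyya-coefficient form, used here for
    illustration; it is not confirmed to be the paper's exact definition.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    p, q = l1_normalize(x), l1_normalize(y)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


if __name__ == "__main__":
    doc = [3.0, 0.0, 1.0, 2.0]    # hypothetical term weights for a document
    query = [1.0, 0.0, 0.0, 1.0]  # hypothetical term weights for a query
    print(hellinger_distance(doc, query))
    print(sqrt_cos_similarity(doc, query))
```

Under these assumed definitions the two quantities are tied together by `hellinger_distance(x, y) ** 2 == 1 - sqrt_cos_similarity(x, y)` up to floating-point error, so ranking by highest similarity and ranking by lowest distance give the same order.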