Title of article :
Matrix computations for information retrieval and major and outlier cluster detection
Author/Authors :
Kobayashi، نويسنده , , M and Aono، نويسنده , , M and Takeuchi، نويسنده , , H and Samukawa، نويسنده , , H، نويسنده ,
Issue Information :
روزنامه با شماره پیاپی سال 2002
Abstract :
In this paper we introduce COV, a novel information retrieval (IR) algorithm for massive databases based on vector space modeling and spectral analysis of the covariance matrix, for the document vectors, to reduce the scale of the problem. Since the dimension of the covariance matrix depends on the attribute space and is independent of the number of documents, COV can be applied to databases that are too massive for methods based on the singular value decomposition of the document-attribute matrix, such as latent semantic indexing (LSI). In addition to improved scalability, theoretical considerations indicate that results from our algorithm tend to be more accurate than those from LSI, particularly in detecting subtle differences in document vectors. We demonstrate the power and accuracy of COV through an important topic in data mining, known as outlier cluster detection. We propose two new algorithms for detecting major and outlier clusters in databases—the first is based on LSI, and the second on COV. Our implementation studies indicate that our cluster detection algorithms outperform the basic LSI and COV algorithm in detecting outlier clusters.
Journal title :
Journal of Computational and Applied Mathematics
Journal title :
Journal of Computational and Applied Mathematics