Title :
Term Filtering with Bounded Error
Author :
Yang, Zi ; Li, Wei ; Tang, Jie ; Li, Juanzi
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
In this paper, we consider a novel problem, term filtering with bounded error, which reduces the term (feature) space by eliminating terms with no information loss or with bounded information loss. Unlike existing work, the resulting term space provides a complete view of the original term space. The formulation also makes it possible to answer several important questions, such as 1) how different terms interact with each other and 2) how the filtered terms can be represented by the other terms. We investigate the term filtering problem theoretically, link it to the Geometric Covering By Discs problem, and prove that it is NP-hard. We present two novel approaches for lossless and lossy term filtering, with bounds on the introduced error. Experimental results on multiple text mining tasks validate the effectiveness of the proposed approaches.
Keywords :
computational complexity; data mining; information filtering; NP-hardness; bounded error; discs problem; feature space reduction; geometric covering; lossless term filtering; lossy term filtering; multiple text mining tasks; term space reduction;
Conference_Title :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
DOI :
10.1109/ICDM.2010.131