Title :
Statistical Evaluation of Measure and Distance on Document Classification Problems in Text Mining
Author :
Goto, Masayuki ; Ishida, Takashi ; Hirasawa, Shigeichi
Author_Institution :
Musashi Inst. of Technol., Yokohama
Abstract :
This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. By formulating the text mining problem as a statistical hypothesis test, some interesting properties can be made visible. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is therefore required, and this approach gives a clear picture of it. Distance measures in word vector space are used to classify documents. In this paper, the performance of the distance measure is also analyzed from the new viewpoint of asymptotic analysis.
Keywords :
classification; data mining; statistical testing; text analysis; asymptotic statistical analysis; distance measure; document classification; statistical evaluation; statistical hypotheses test; text mining; word vector space; Extraterrestrial measurements; Frequency measurement; H infinity control; Information retrieval; Information technology; Parametric statistics; Performance analysis; Testing; Text categorization
Conference_Title :
Computer and Information Technology, 2007. CIT 2007. 7th IEEE International Conference on
Conference_Location :
Aizu-Wakamatsu, Fukushima
Print_ISBN :
978-0-7695-2983-7
DOI :
10.1109/CIT.2007.171