Title :
Text data mining applied to clustering with cost effective tools
Author :
Moebes, Travis A.
Author_Institution :
Safety & Mission Assurance, Sci. Applications Int. Corp., Houston, TX, USA
Abstract :
Classifying corrective action reports at NASA by reading them can be a lengthy, labor-intensive process. This paper shows that a process requiring several weeks of engineering labor can be reduced to a few hours of analyst labor using commercial and in-house data mining applications. Signal processing theory is used to determine which text-based cluster is best to use when searching for common-cause problems. A method for determining the high-level clusters is presented. It is followed by a new technique that uses Fourier transformations and cross-correlations to identify more refined low-level clusters and to uncover new information in the data. Finally, the paper discusses how to apply these results to situations where the cost of lengthy decisions differs from the reward for quick, correct decisions. By developing special-purpose in-house software, much of the text data mining can be accomplished without purchasing expensive specialized text mining tools.
Keywords :
Fourier transforms; correlation theory; data mining; decision making; search problems; signal processing; statistical analysis; text analysis; Fourier transformations; NASA; corrective action reports; high-level clusters; in-house data mining; in-house software; signal processing theory; text data mining; Aerospace safety; Costs; Data analysis; Data engineering; Failure analysis; Performance analysis; Software tools; Fast Fourier Transformation; cross-correlation; linear regression; probability; statistics
Conference_Title :
Systems, Man and Cybernetics, 2005 IEEE International Conference on
Print_ISBN :
0-7803-9298-1
DOI :
10.1109/ICSMC.2005.1571572