DocumentCode :
2793933
Title :
Discovering recurring anomalies in text reports regarding complex space systems
Author :
Srivastava, Ashok N. ; Zane-Ulman, Brett
Author_Institution :
NASA Ames Res. Center, Moffett Field, CA
fYear :
2005
fDate :
5-12 March 2005
Firstpage :
3853
Lastpage :
3862
Abstract :
Many existing complex space systems have accumulated large historical maintenance and problem databases that are stored as unstructured text. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free-text reports are written by a number of different people, so the emphasis and wording vary considerably. We test four automatic methods of anomaly detection in text that are popular in the current literature on text mining. The first method that we describe is k-means, or a Gaussian mixture model, applied to the term-document matrix. The second method is the Sammon nonlinear map, which projects high-dimensional document vectors into two dimensions for visualization and clustering purposes. The third method is based on an analysis of the results of applying a clustering method, expectation maximization on a mixture of von Mises-Fisher distributions, which represents each document as a point on a high-dimensional sphere. In this space, we perform clustering to obtain sets of similar documents. The fourth set of results is derived from a newer method known as spectral clustering, in which vectors from the term-document matrix are embedded in a high-dimensional space for clustering. The paper concludes with recommendations regarding the development of an operational text mining system for analysis of problem reports that arise from complex space systems. We also contrast such systems with general-purpose text mining systems, illustrating the areas in which this system needs to be specialized for the space domain.
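As an illustration of the first method named in the abstract, the following is a minimal sketch of k-means applied to a term-document matrix. It is not the authors' implementation: the four sample reports, the whitespace tokenizer, the unit-length normalization, and the farthest-point initialization are all assumptions chosen to keep the example small and deterministic.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build one unit-length term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            vec[index[term]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    # Assumption: start from the first document, then repeatedly add the
    # document farthest from all centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical problem reports, invented for this example only.
reports = [
    "telemetry buffer overflow during ascent",
    "buffer overflow in telemetry handler",
    "display unit timing fault on startup",
    "timing fault in display unit software",
]

vocab, rows = term_document_matrix(reports)
labels = kmeans(rows, k=2)
print(labels)  # [0, 0, 1, 1]
```

Reports that share vocabulary land in the same cluster, which is the sense in which clustering can surface recurring anomalies described in different words by different writers. A realistic system would need stop-word removal and term weighting at minimum, which this sketch omits.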
Keywords :
aerospace engineering; aircraft maintenance; classification; text analysis; word processing; Gaussian mixture model; Sammon nonlinear map; Space Shuttle; anomaly detection; automatic methods; complex space systems; discrepancy reports; expectation maximization; historical maintenance; k-means method; problem data bases; problem reports; recurring anomalies; software anomalies; spectral clustering; term-document matrix; text mining; text reports; von Mises Fisher distributions; Aerospace testing; Algorithm design and analysis; Data analysis; Data mining; Functional analysis; Information analysis; Manufacturing processes; Sensor systems; Text mining; Thermal sensors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Aerospace Conference, 2005 IEEE
Conference_Location :
Big Sky, MT
Print_ISBN :
0-7803-8870-4
Type :
conf
DOI :
10.1109/AERO.2005.1559692
Filename :
1559692