DocumentCode
2793933
Title
Discovering recurring anomalies in text reports regarding complex space systems
Author
Srivastava, Ashok N.; Zane-Ulman, Brett
Author_Institution
NASA Ames Res. Center, Moffett Field, CA
fYear
2005
fDate
5-12 March 2005
Firstpage
3853
Lastpage
3862
Abstract
Many existing complex space systems have large historical maintenance and problem databases stored as unstructured text. The problem that we address in this paper is the discovery of recurring anomalies and of relationships between problem reports that may indicate larger systemic problems. We illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free-text reports are written by many different people, so their emphasis and wording vary considerably. We test four automatic methods of anomaly detection in text that are popular in the current text mining literature. The first method is k-means or Gaussian mixture model clustering applied to the term-document matrix. The second method is the Sammon nonlinear map, which projects high-dimensional document vectors into two dimensions for visualization and clustering. The third method analyzes the results of expectation maximization on a mixture of von Mises-Fisher distributions, which represents each document as a point on a high-dimensional sphere; clustering in this space yields sets of similar documents. The fourth method is spectral clustering, a newer approach in which vectors from the term-document matrix are embedded in a high-dimensional space for clustering. The paper concludes with recommendations for developing an operational text mining system to analyze problem reports that arise from complex space systems. We also contrast such systems with general-purpose text mining systems, illustrating the areas in which this system needs to be specialized for the space domain.
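A minimal sketch of building a term-document matrix and clustering documents on the unit sphere, under stated assumptions: a naive whitespace tokenizer, smoothed TF-IDF weighting (tf * (ln((1 + N) / (1 + df)) + 1)), and four hypothetical report snippets that are not data from the paper. The clustering step is spherical k-means (k-means with cosine similarity on unit-length document vectors), which corresponds to hard assignments in a von Mises-Fisher mixture with equal concentration parameters; it is a simplified stand-in, not the authors' implementation, and the helper names are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a TF-IDF weighted term-document matrix (one row per document)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(w for toks in tokenized for w in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        # smoothed IDF (an assumption) keeps weights positive for terms in every document
        rows.append([tf[w] * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w in vocab])
    return vocab, rows


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Project a vector onto the unit sphere (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(vectors, k, iters=20):
    """Cluster documents by cosine similarity; returns one cluster label per document."""
    X = [normalize(v) for v in vectors]
    centroids = [list(x) for x in X[:k]]  # naive initialization: the first k documents
    labels = [0] * len(X)
    for _ in range(iters):
        # assignment step: centroid with the largest dot product (smallest angle)
        labels = [max(range(k), key=lambda c: dot(x, centroids[c])) for x in X]
        # update step: mean direction of each cluster, renormalized to unit length
        for c in range(k):
            members = [x for x, label in zip(X, labels) if label == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels


# Hypothetical report snippets (illustration only)
docs = [
    "telemetry buffer overflow in flight software",
    "display format error on crew console",
    "buffer overflow during telemetry downlink",
    "console display shows wrong format",
]
vocab, matrix = term_document_matrix(docs)
labels = spherical_kmeans(matrix, k=2)
print(labels)  # [0, 1, 0, 1]
```

The first-k initialization is only adequate for toy input; real report collections would need a more careful start (for example, several random restarts).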
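The Sammon nonlinear map can be sketched as minimizing Sammon's stress, E = (1/c) * sum over i<j of (D_ij - d_ij)^2 / D_ij with c = sum over i<j of D_ij, where D_ij are input-space distances and d_ij are distances in the 2-D layout. This sketch assumes plain fixed-step gradient descent from a random start (Sammon's original procedure uses a second-order update), distinct input points, and untuned `iters` and `lr` values; the function names and the six 3-D points are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def pairwise_dist(points):
    """Euclidean distance matrix for a list of coordinate lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def sammon_stress(D, Y):
    """Sammon's stress of layout Y against input distances D (assumes D[i][j] > 0 for i != j)."""
    n = len(D)
    d = pairwise_dist(Y)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c = sum(D[i][j] for i, j in pairs)
    return sum((D[i][j] - d[i][j]) ** 2 / D[i][j] for i, j in pairs) / c


def random_init(n, seed=0):
    """Hypothetical starting layout: n points drawn uniformly from [-1, 1]^2."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]


def sammon(X, iters=300, lr=1.0, seed=0, eps=1e-12):
    """Project points X to 2-D by fixed-step gradient descent on Sammon's stress."""
    n = len(X)
    D = pairwise_dist(X)
    c = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    Y = random_init(n, seed)
    for _ in range(iters):
        d = pairwise_dist(Y)
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dstar, dij = max(D[i][j], eps), max(d[i][j], eps)
                # dE/dy_i = -(2/c) * sum_j (D_ij - d_ij) / (D_ij * d_ij) * (y_i - y_j)
                coef = -2.0 / c * (dstar - dij) / (dstar * dij)
                for k in range(2):
                    grad[i][k] += coef * (Y[i][k] - Y[j][k])
        Y = [[Y[i][k] - lr * grad[i][k] for k in range(2)] for i in range(n)]
    return Y


# Two well-separated groups of hypothetical 3-D document vectors
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [10, 10, 10], [11, 10, 10], [10, 11, 10]]
Y = sammon(X)
print(sammon_stress(pairwise_dist(X), Y))
```

Each pairwise gradient term has magnitude at most (2/c) * |D_ij - d_ij| / D_ij, so a modest fixed step stays stable on small inputs; the O(n^2) loop per iteration limits this sketch to small document sets.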
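Spectral clustering can be sketched in the style of Ng, Jordan and Weiss: build a document affinity matrix A, normalize it as D^{-1/2} A D^{-1/2} (D being the diagonal degree matrix), embed each document as a unit-length row of the top-k eigenvectors, and run k-means on those rows. Assumptions in this sketch: cosine similarity on raw term counts serves as the affinity (Ng, Jordan and Weiss use a Gaussian kernel), eigenvectors come from a small cyclic Jacobi routine suitable only for small matrices, and the documents and helper names are hypothetical rather than the paper's data or code.

```python
import math
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def count_vectors(docs):
    """Raw term-count rows of a term-document matrix (naive whitespace tokenizer)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({w for c in counts for w in c})
    return [[c[w] for w in vocab] for c in counts]


def jacobi_eigh(M, sweeps=50, tol=1e-20):
    """Cyclic Jacobi eigensolver for a small symmetric matrix.
    Returns (eigenvalues, V); column i of V is the eigenvector for eigenvalue i."""
    n = len(M)
    A = [list(row) for row in M]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the updated A[p][q] becomes zero
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def kmeans(points, k, iters=50):
    """Euclidean k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(sqdist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(vectors, k):
    """Ng-Jordan-Weiss style spectral clustering with a cosine-similarity affinity."""
    X = [normalize(v) for v in vectors]
    n = len(X)
    # affinity: nonnegative cosine similarity with a zero diagonal
    A = [[0.0 if i == j else max(0.0, dot(X[i], X[j])) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in A]
    # normalized affinity D^{-1/2} A D^{-1/2}
    L = [[inv_sqrt_deg[i] * A[i][j] * inv_sqrt_deg[j] for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigh(L)
    top = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    # each document becomes a unit-length row of the top-k eigenvectors
    embedding = [normalize([V[r][col] for col in top]) for r in range(n)]
    return kmeans(embedding, k)


docs = [
    "telemetry buffer overflow in flight software",
    "display format error on crew console",
    "buffer overflow during telemetry downlink",
    "console display shows wrong format",
]
labels = spectral_clustering(count_vectors(docs), k=2)
print(labels)  # [0, 1, 0, 1]
```

For realistic collections a library eigensolver (for example `numpy.linalg.eigh`) would replace the hand-written Jacobi routine, which is included only to keep the sketch dependency-free.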
Keywords
aerospace engineering; aircraft maintenance; classification; text analysis; word processing; Gaussian mixture model; Sammon nonlinear map; Space Shuttle; anomaly detection; automatic methods; complex space systems; discrepancy reports; expectation maximization; historical maintenance; k-means method; problem data bases; problem reports; recurring anomalies; software anomalies; spectral clustering; term-document matrix; text mining; text reports; von Mises Fisher distributions; Aerospace testing; Algorithm design and analysis; Data analysis; Data mining; Functional analysis; Information analysis; Manufacturing processes; Sensor systems; Text mining; Thermal sensors;
fLanguage
English
Publisher
IEEE
Conference_Title
Aerospace Conference, 2005 IEEE
Conference_Location
Big Sky, MT
Print_ISBN
0-7803-8870-4
Type
conf
DOI
10.1109/AERO.2005.1559692
Filename
1559692