Title :
Machine learning of engineering diagnostic knowledge from unstructured verbatim text descriptions
Author :
Yinghao Huang; Yi L. Murphey; Yao Ge
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Michigan, Dearborn, MI, USA
Abstract :
This paper presents our research in text mining for discovering important engineering fault diagnostic knowledge from unstructured, verbatim text descriptions. In particular, we focus on developing machine learning algorithms for detecting documents that contain descriptions of systematic failures and the root causes of the faults. We developed a machine learning algorithm based on entropy analysis to extract an A-word list (a list of words that are important for characterizing the documents of interest), a vector space model to represent the features of important documents, and a constraint-based k-means clustering algorithm to generate high-purity clusters for use in detecting important documents. We applied the algorithms to automotive diagnostic text data, which are unstructured, verbatim descriptions written by customers and technicians and which contain many typos and self-invented terms. We were able to reduce a list of 2183 words to a list of 137 important words. The classification system generated by these machine learning algorithms showed high recall and accuracy in detecting important diagnostic descriptions.
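The abstract does not give the entropy criterion or the vector space weighting, so the following is only a minimal sketch of one plausible reading: a word is kept when the documents containing it fall mostly into a single class (low class entropy), and each document is then represented as a count vector over the kept words. The function names, the thresholds `max_entropy` and `min_docs`, and the toy documents are assumptions for illustration, not the authors' method or data. The constraint-based k-means step is not sketched here.

```python
import math
from collections import Counter


def class_entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def select_words(docs, labels, max_entropy=0.5, min_docs=2):
    """Keep words whose containing documents are dominated by one class.

    max_entropy and min_docs are hypothetical thresholds.
    """
    per_word = {}  # word -> Counter of labels over documents containing it
    for text, label in zip(docs, labels):
        for word in set(text.lower().split()):
            per_word.setdefault(word, Counter())[label] += 1
    selected = [
        word
        for word, counter in per_word.items()
        if sum(counter.values()) >= min_docs
        and class_entropy(counter.values()) <= max_entropy
    ]
    return sorted(selected)


def to_vector(text, vocabulary):
    """Represent a document as raw term counts over the selected words."""
    tokens = Counter(text.lower().split())
    return [tokens[word] for word in vocabulary]


# Toy, made-up descriptions: label 1 = important, 0 = not important.
docs = [
    "vehicle replaced alternator root cause corroded connector",
    "root cause corroded ground wire replaced harness",
    "vehicle customer states noise checked ok",
    "customer states light on checked ok",
]
labels = [1, 1, 0, 0]

words = select_words(docs, labels)
vectors = [to_vector(d, words) for d in docs]
```

In this toy set, "vehicle" appears once in each class, so its class entropy is 1 bit and it is dropped, while "root" appears only in important documents and is kept. Words seen in a single document are excluded by `min_docs`, which is one simple way to suppress one-off typos.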
Keywords :
automotive engineering; data mining; entropy; failure analysis; fault diagnosis; learning (artificial intelligence); pattern clustering; text analysis; word processing; A-word list extraction; automotive diagnostic text data; classification system; constraint based k-means clustering algorithm; document detection; engineering diagnostic knowledge; engineering fault diagnostic knowledge; entropy analysis; machine algorithm; machine learning algorithms; systematic failures; text mining; unstructured verbatim text descriptions; vector space model; Clustering algorithms; Training data; Vectors; Vehicles; engineering diagnostics; important document detection; machine learning
Conference_Title :
2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
Conference_Location :
Singapore
DOI :
10.1109/CIDM.2013.6597216