• DocumentCode
    1135130
  • Title

    Semantic Data Mining of Short Utterances

  • Author

    Begeja, Lee ; Drucker, Harris ; Gibbon, David ; Haffner, Patrick ; Liu, Zhu ; Renger, Bernard ; Shahraray, Behzad

  • Author_Institution
    IP & Voice Services Lab, AT&T Labs.-Res., Florham Park, NJ, USA
  • Volume
    13
  • Issue
    5
  • fYear
    2005
  • Firstpage
    672
  • Lastpage
    680
  • Abstract
    This paper introduces a methodology for speech data mining along with the tools that the methodology requires. We show how they increase the productivity of the analyst who seeks relationships among the contents of multiple utterances and ultimately must link some newly discovered context into testable hypotheses about new information. While, in its simplest form, one can extend text data mining to speech data mining by using text tools on the output of a speech recognizer, we have found that this approach is not optimal. We show how data mining techniques that are typically applied to text should be modified to enable an analyst to do effective semantic data mining on a large collection of short speech utterances. For the purposes of this paper, we examine semantic data mining in the context of semantic parsing and analysis in a specific situation involving the solution of a business problem that is known to the analyst. We are not attempting a generic semantic analysis of a set of speech. Our tools and methods allow the analyst to mine the speech data to discover the semantics that best cover the desired solution. The coverage, in this case, yields a set of Natural Language Understanding (NLU) classifiers that serve as testable hypotheses.
  • Keywords
    data mining; pattern clustering; speech processing; generic semantic analysis; natural language understanding classifiers; semantic data mining; speech data mining; text tools; Data mining; Feedback; Information analysis; Natural languages; Productivity; Speech analysis; Speech recognition; Taxonomy; Testing; Text recognition; Classifiers; clustering; data reduction; relevance feedback; speech data mining
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/TSA.2005.851875
  • Filename
    1495448