Abstract:
A working system is described in which spoken Fortran-like programs are reliably interpreted by a computer. Speech is encoded into cepstral patterns using a shifting, overlapping 32-ms Hamming window. A vocabulary of 26 words was selected so that the distance between different words, in their cepstral representation, was large. The syntax of the phoneme strings within a word and the syntax of the programming language were both used to resolve ambiguous decisions. Feedback to the programmer greatly increased the reliability of the system, since (1) mistaken decisions could be corrected, and (2) the programmer gradually learned to speak in a way that greatly reduced system mistakes. With two trained speakers, the recognition rate was 50-75 percent on sentences and 75-90 percent on words; with user feedback and correction, it reached 100 percent, with no more than two repetitions needed for long statements. This performance is attributed to the well-chosen vocabulary, strong syntactic support, and speaker training. The main drawback of the system at present is that the initial training of the user is time-consuming. Further improvement has since been achieved by using the initial isolated cepstral prototypes to locate new prototypes in samples of continuous speech, and then using these "continuous prototypes" for recognition. Currently, formant trajectories derived from a pitch-synchronous linear prediction analysis are being used, and an automatic stress analysis program provides segmentation and guides the selection of key allophones.
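As an illustration of the encoding step described above, the following is a minimal sketch, not the system's actual implementation. It splits a signal into overlapping 32-ms frames, applies a Hamming window, computes a real cepstrum for each frame, and labels a cepstral vector with its nearest stored prototype. Only the 32-ms Hamming window and the use of cepstral distance come from the abstract. The sampling rate, the 16-ms shift, the number of retained coefficients, the Euclidean distance, and all function names are assumptions made for this example.

```python
import cmath
import math


def hamming(n):
    """Return a Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; slow but adequate for a sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def frames(signal, rate_hz, win_ms=32.0, hop_ms=16.0):
    """Split a signal into overlapping frames.

    win_ms = 32 follows the abstract; hop_ms = 16 is an assumed shift.
    """
    win = int(rate_hz * win_ms / 1000)
    hop = int(rate_hz * hop_ms / 1000)
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


def real_cepstrum(frame, n_coeffs=12):
    """Real cepstrum: inverse DFT of the log magnitude of the windowed spectrum.

    n_coeffs (number of low-order coefficients kept) is an assumed value.
    """
    window = hamming(len(frame))
    spectrum = dft([s * w for s, w in zip(frame, window)])
    # A small constant avoids log(0) on empty spectral bins.
    log_magnitude = [math.log(abs(c) + 1e-12) for c in spectrum]
    cepstrum = dft(log_magnitude, inverse=True)
    return [c.real for c in cepstrum[:n_coeffs]]


def nearest_prototype(vector, prototypes):
    """Return the label of the prototype closest to vector.

    Euclidean distance is an assumption; the abstract does not name the metric.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(prototypes, key=lambda label: distance(vector, prototypes[label]))
```

In use, each frame returned by `frames(signal, 8000)` would be passed to `real_cepstrum`, and the result compared against a dictionary of labelled prototype vectors with `nearest_prototype`. A vocabulary whose prototypes lie far apart under the chosen distance makes this nearest-neighbour decision less error-prone, which is the motivation the abstract gives for selecting the 26 words.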