Author_Institution :
Inf. Sci. Inst., Univ. of Southern California, Marina del Rey, CA, USA
Abstract :
If there is a word that strikes fear into the hearts of speech and natural-language researchers, it is "evaluation". It's not that we don't like evaluation (mostly, we do); it's just that we've developed this knee-jerk response: "Another evaluation? Already? But we just did an evaluation!" Well, that's an exaggeration. But funder-promoted, common evaluations have profoundly affected our research in years past. In these evaluations, researchers agree on a number of tasks and criteria for success, then work independently on those tasks. In many cases, funding agencies provide sample inputs and outputs in advance, but withhold test data until "evaluation day". Several months later, participating researchers gather at a specialized workshop to discuss results and techniques. Whether you're interested in machine translation, morphology, speech-query processing, information extraction, or information retrieval, there is a common evaluation somewhere where you can test your ideas in an empirical, quantitatively scored setting. The author looks at some strengths and weaknesses of these common evaluations.
Keywords :
natural languages; research and development management; speech recognition; common evaluation; evaluation; information extraction; information retrieval; machine translation; morphology; natural-language research; speech-query processing; artificial intelligence; data mining; knowledge acquisition; robustness; software engineering; speech analysis; testing