Regional Information Center for Science and Technology - Discriminative Language Modeling With Linguistic and Statistically Derived Features

DocumentCode :

1273906

Title :

Discriminative Language Modeling With Linguistic and Statistically Derived Features

Author :

Arisoy, Ebru ; Saraçlar, Murat ; Roark, Brian ; Shafran, Izhak

Author_Institution :

ACCES Dept., IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

Volume :

Issue :

fYear :

2012

Firstpage :

540

Lastpage :

550

Abstract :

This paper focuses on integrating linguistically motivated and statistically derived information into language modeling. We use discriminative language models (DLMs) as a complementary approach to conventional n-gram language models, to benefit from discriminatively trained parameter estimates for overlapping features. In our DLM approach, relevant information is encoded as features. Feature weights are discriminatively trained on training examples and used to re-rank the N-best hypotheses of the baseline automatic speech recognition (ASR) system. In addition to presenting a more complete picture of previously proposed feature sets that extract implicit information available at the lexical and sub-lexical levels using both linguistic and statistical approaches, this paper attempts to incorporate semantic information in the form of topic-sensitive features. We explore linguistic features to incorporate the complex morphological and syntactic characteristics of Turkish, an agglutinative language with rich morphology, into language modeling. We also apply DLMs to our sub-lexical-based ASR system, where the vocabulary is composed of sub-lexical units. Obtaining implicit linguistic information from sub-lexical hypotheses is not as straightforward as obtaining it from word hypotheses, so we use statistical methods to derive useful information from sub-lexical units. DLMs with linguistic and statistical features yield significant improvements of 0.8%-1.1% absolute over our baseline word-based and sub-word-based ASR systems. The explored features can be easily extended to DLMs for other languages.
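
The re-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the training algorithm or the exact feature definitions, so everything below is an assumption made for illustration: a perceptron-style update, simple unigram and bigram count features, and the hypothetical field names "words", "baseline", and "errors" for each N-best entry. It is not the authors' implementation.

```python
from collections import Counter, defaultdict


def ngram_features(words, max_order=2):
    """Count overlapping n-gram features (orders 1..max_order) of one hypothesis."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(padded) - order + 1):
            feats[" ".join(padded[i:i + order])] += 1
    return feats


def score(weights, hyp, baseline_weight=1.0):
    """Baseline ASR score (assumed to be a log score) plus weighted feature counts."""
    feats = ngram_features(hyp["words"])
    return baseline_weight * hyp["baseline"] + sum(
        weights.get(name, 0.0) * count for name, count in feats.items()
    )


def rerank(weights, nbest):
    """Return the highest-scoring hypothesis of an N-best list."""
    return max(nbest, key=lambda hyp: score(weights, hyp))


def train_perceptron(nbest_lists, epochs=5):
    """Assumed perceptron-style training: move weights toward the oracle hypothesis."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for nbest in nbest_lists:
            # Oracle: the hypothesis with the fewest errors against the reference.
            oracle = min(nbest, key=lambda hyp: hyp["errors"])
            best = rerank(weights, nbest)
            if best["words"] != oracle["words"]:
                for name, count in ngram_features(oracle["words"]).items():
                    weights[name] += count
                for name, count in ngram_features(best["words"]).items():
                    weights[name] -= count
    return weights


# Toy N-best list (made-up data): the baseline prefers the hypothesis with an error.
toy_nbest = [
    {"words": ["the", "cat", "sat"], "baseline": -10.0, "errors": 0},
    {"words": ["the", "cat", "sad"], "baseline": -9.5, "errors": 1},
]
toy_weights = train_perceptron([toy_nbest])
print(rerank(toy_weights, toy_nbest)["words"])
```

In this sketch the linguistic, statistical, and topic-sensitive feature sets mentioned in the abstract would replace or extend ngram_features; the scoring and re-ranking functions would stay the same.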

Keywords :

linguistics; parameter estimation; speech recognition; N-best hypotheses; baseline automatic speech recognition; discriminative language modeling; linguistic features; parameter estimates; re-rank; statistically derived features; sub-lexical hypotheses; Morphology; Pragmatics; Semantics; Speech recognition; Syntactics; Training; Vocabulary; Discriminative training; language modeling; morphologically rich languages; speech recognition

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2011.2162323

Filename :

5955079

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1273906