Evaluating and enhancing cross-domain rank predictability of textual entailment datasets

Author

Lee, Cheng-Wei; Lin, Chuan-Jie; Shima, Hideki; Hsu, Wen-Lian

Author_Institution

Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan

fYear

2012

fDate

8-10 Aug. 2012

Firstpage

51

Lastpage

58

Abstract

Textual Entailment (TE) is the task of recognizing entailment, paraphrase, and contradiction relations between a given text pair. The goal of textual entailment research is to develop a core inference component that can be applied to various domains, such as IR or NLP. Since the domain that a TE system applies to may be different from its source domain, it is crucial to develop proper datasets for measuring the cross-domain ability of a TE system. We propose using Kendall's tau to measure a dataset's cross-domain rank predictability. Our analysis shows that incorporating “artificial pairs” into a dataset helps enhance its rank predictability. We also find that the completeness of guidelines has no obvious effect on the rank predictability of a dataset. More investigation is needed to validate these findings; however, they suggest some new directions for the creation of TE datasets in the future.
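As an illustration of the measure named in the abstract, the sketch below computes Kendall's tau between two rankings of the same TE systems, for example their scores on a source-domain dataset and on a target-domain dataset. A tau near 1 means the first dataset ranks systems in nearly the same order as the second. The abstract does not say which tau variant or which system score the paper uses; this sketch assumes tau-a over per-system accuracy, and the function name `kendall_tau` and the sample scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a: list[float], scores_b: list[float]) -> float:
    """Kendall's tau-a between two score lists for the same systems.

    Assumption (not from the paper): index i is one TE system, scores_a[i]
    is its score (e.g. accuracy) on one dataset and scores_b[i] its score on
    another. Pairs tied in either list count as neither concordant nor
    discordant, but still appear in the tau-a denominator.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists must score the same systems")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of difference in both lists -> the pair is ordered alike.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    # tau-a: normalize by the total number of system pairs, n(n-1)/2.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical accuracies of four systems on a source and a target dataset.
source = [0.70, 0.65, 0.60, 0.55]
target = [0.62, 0.64, 0.50, 0.48]
print(kendall_tau(source, target))  # 5 concordant, 1 discordant pair -> 4/6
```

If the data contain ties, a tie-corrected variant such as tau-b may be more appropriate than tau-a; the choice here is only for simplicity.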

Keywords

text analysis; Kendall's tau; TE; core inference component; enhancing cross domain rank predictability; textual entailment datasets; Accuracy; Correlation; Educational institutions; Guidelines; Humans; Standards; Text recognition; Cross-Domain Evaluation; RITE; Rank Predictability; Textual Entailment

fLanguage

English

Publisher

IEEE

Conference_Title

Information Reuse and Integration (IRI), 2012 IEEE 13th International Conference on

Conference_Location

Las Vegas, NV

Print_ISBN

978-1-4673-2282-9

Electronic_ISBN

978-1-4673-2283-6

Type

conf

DOI

10.1109/IRI.2012.6302990

Filename

6302990