Title of article
Revisiting Cross-document Structure Theory for multi-document discourse parsing
Author/Authors
Erick Galani Maziero, Maria Lucía del Rosário Castro Jorge, Thiago Alexandre Salgueiro Pardo
Issue Information
Bimonthly, serial-numbered issue, 2014
Pages
18
From page
297
To page
314
Abstract
Multi-document discourse parsing aims to automatically identify the relations among textual spans from different texts on the same topic. Recently, with the growing amount of information and the emergence of new technologies that deal with many sources of information, more precise and efficient parsing techniques are required. The most relevant theory for multi-document relationships, Cross-document Structure Theory (CST), has been used for parsing purposes before, though the results were not satisfactory. CST has received much criticism because of its subjectivity, which may lead to low annotation agreement and, consequently, to poor parsing performance. In this work, we propose a refinement of the original CST, which consists of (i) formalizing the relationship definitions, (ii) pruning and combining some relations based on their meaning, and (iii) organizing the relations in a hierarchical structure. The hypothesis behind this refinement is that it will lead to better annotation agreement and, consequently, to better parsing results. To test it, an annotated corpus was built according to this refinement, and an improvement in annotation agreement was observed. Based on this corpus, a parser was developed using machine learning techniques and hand-crafted rules. Specifically, hierarchical techniques were used to capture the hierarchical organization of the relations according to the proposed refinement of CST. These two approaches were used to identify the relations among text spans and to generate the multi-document annotation structure. The results outperformed other CST parsers, showing the adequacy of the proposed refinement of the theory.
Keywords
Discourse parsing , Multi-document processing , Cross-document Structure Theory , Machine Learning
Journal title
Information Processing and Management
Serial Year
2014
Record number
1229498