Regional Information Center for Science and Technology - Using Continuous Integration to organize and monitor the annotation process of domain specific corpora

DocumentCode :

163428

Title :

Using Continuous Integration to organize and monitor the annotation process of domain specific corpora

Author :

Schreiber, Markus ; Barkschat, Kai ; Kraft, Bodo

Author_Institution :

Fac. of Med. Eng. & Technomathematics, FH Aachen, Aachen, Germany

fYear :

2014

fDate :

1-3 April 2014

Firstpage :

Lastpage :

Abstract :

Applications on the World Wide Web aggregate vast amounts of information from different data sources. The aggregation process is often implemented with Extract, Transform and Load (ETL) processes. ETL processes usually require the input information to be available in structured formats, e.g. XML or JSON. In many cases, however, the information is provided as natural language text, which makes ETL processes impractical to apply. Information Extraction (IE) systems have evolved to handle information provided in natural language. They use Natural Language Processing (NLP) tools to derive meaning from natural language text. State-of-the-art NLP tools apply Machine Learning methods. These tools perform well on newspaper text, but their accuracy drops in other domains. To improve the quality of IE systems in specific domains, NLP tools are often trained on domain-specific text, which is a time-consuming process. This paper introduces an approach that uses a Continuous Integration pipeline to organize and monitor the annotation process on domain-specific corpora.

Keywords :

Internet; information retrieval; learning (artificial intelligence); natural language processing; ETL process; JSON; NLP tool; World Wide Web; XML; annotation process; continuous integration; domain specific corpora; extract-transform-and-load process; information extraction; machine learning; natural language processing; natural language text; Biology; Communication systems; Monitoring; Natural language processing; Pipelines; Testing; Training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information and Communication Systems (ICICS), 2014 5th International Conference on

Conference_Location :

Irbid

Print_ISBN :

978-1-4799-3022-7

Type :

conf

DOI :

10.1109/IACS.2014.6841958

Filename :

6841958

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=163428