• DocumentCode
    234360
  • Title

    Towards a flexible open-source software library for multi-layered scholarly textual studies: An Arabic case study dealing with semi-automatic language processing

  • Author

    Del Grosso, Angelo Mario ; Nahli, Ouafae

  • Author_Institution
    Ist. di Linguistica Comput. “A. Zampolli” (ILC), Pisa, Italy
  • fYear
    2014
  • fDate
    20-22 Oct. 2014
  • Firstpage
    285
  • Lastpage
    290
  • Abstract
    This paper presents the general model and a case study of the Computational and Collaborative Philology Library (CoPhiLib), an ongoing initiative at the Institute for Computational Linguistics (ILC) of the National Research Council (CNR), Pisa, Italy. The library, designed and organized as a reusable, abstract, open-source software component, aims to meet the needs of multi-lingual and cross-lingual analysis by exposing common Application Programming Interfaces (APIs). The core modules, written in Java, form the groundwork of a Web platform designed to support textual scholarship. The Web application, implemented according to the Java Enterprise specifications, focuses on multi-layered analysis for the study of literary documents and related multimedia sources. The project seeks to manage textual resources by abstracting from the specific language on the one hand and by decoupling from the specific requirements of single projects on the other. This goal is pursued through agile-process methodologies and through suitable use case modeling, design patterns, and component-based architectures. The reusability and flexibility of the system have been tested on an Arabic case study: the system allows users to choose the morphological engine (such as AraMorph or Al-Khalil) and the linguistic granularity (i.e. with or without declension). Finally, the application enables the construction of annotated resources that can serve as training sets for statistical engines.
  • Keywords
    Internet; Java; application program interfaces; computational linguistics; groupware; natural language processing; text analysis; API; Arabic case study; CoPhiLib; Institute for Computational Linguistics; Java enterprise specification; Java programming language; Web platform; agile process; application programming interface; component-based architecture; computational and collaborative philology library; cross-lingual analysis; design pattern; flexible open-source software library; linguistic granularity; morphological engine; multilayered analysis; multilayered scholarly textual study; multilingual analysis; semiautomatic language processing; use case modeling; Abstracts; Computer architecture; Engines; Java; Object oriented modeling; Pragmatics; Unified modeling language; API Design; Arabic Natural Language Processing; Design Patterns; Information Engineering; Text Processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Technology (CIST), 2014 Third IEEE International Colloquium in
  • Conference_Location
    Tetouan
  • Print_ISBN
    978-1-4799-5978-5
  • Type

    conf

  • DOI
    10.1109/CIST.2014.7016633
  • Filename
    7016633
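
The abstract describes a user choosing a morphological engine and a linguistic granularity behind a common API. The following is a minimal Java sketch of how such a selection could be modeled with a strategy-style interface. All names here (`MorphologicalEngine`, `Granularity`, `StubEngine`, `EngineRegistry`) are hypothetical and are not taken from CoPhiLib; the stub engines only label their input and do not call AraMorph or Al-Khalil.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical granularity switch: analyses with or without declension.
enum Granularity { WITH_DECLENSION, WITHOUT_DECLENSION }

// Hypothetical common interface; NOT the actual CoPhiLib API.
interface MorphologicalEngine {
    String name();
    List<String> analyze(String token, Granularity granularity);
}

// Placeholder engine: it only labels the token and performs no real morphology.
// A real adapter would wrap AraMorph or Al-Khalil behind the same interface.
final class StubEngine implements MorphologicalEngine {
    private final String name;

    StubEngine(String name) { this.name = name; }

    @Override public String name() { return name; }

    @Override public List<String> analyze(String token, Granularity granularity) {
        List<String> analyses = new ArrayList<>();
        String suffix = granularity == Granularity.WITH_DECLENSION ? "+declension" : "";
        analyses.add(name + ":" + token + suffix);
        return analyses;
    }
}

// Looks up an engine by the name the user picked.
final class EngineRegistry {
    private final Map<String, MorphologicalEngine> engines = new LinkedHashMap<>();

    void register(MorphologicalEngine engine) { engines.put(engine.name(), engine); }

    MorphologicalEngine select(String name) {
        MorphologicalEngine engine = engines.get(name);
        if (engine == null) {
            throw new IllegalArgumentException("Unknown engine: " + name);
        }
        return engine;
    }
}

class EngineSelectionSketch {
    public static void main(String[] args) {
        EngineRegistry registry = new EngineRegistry();
        registry.register(new StubEngine("AraMorph"));
        registry.register(new StubEngine("Al-Khalil"));

        // The caller depends only on the interface, so engines are interchangeable.
        MorphologicalEngine engine = registry.select("Al-Khalil");
        System.out.println(engine.analyze("kitAb", Granularity.WITHOUT_DECLENSION));
    }
}
```

Because the calling code depends only on the interface, swapping engines or changing the granularity would not require changes elsewhere, which is the kind of decoupling the abstract attributes to design patterns and component-based architecture.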