• DocumentCode
    2982058
  • Title

    Analysing the evolution of the NCI Thesaurus

  • Author

    Gonçalves, Rafael S. ; Parsia, Bijan ; Sattler, Uli

  • Author_Institution
    Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
  • fYear
    2011
  • fDate
    27-30 June 2011
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The National Cancer Institute (NCI) Thesaurus (NCIt) is a biomedical ontology which has been developed for over a decade. Nearly every month from 2003 through 2011, the NCI has published an updated version of the NCIt to the Web as an OWL ontology (as well as in other formats). We collected all 88 OWL versions of the NCIt available and conducted a cross-sectional study on this corpus to investigate and characterize the evolution of the NCIt. In particular, we gathered and analysed various axiom and entity statistics, and carried out a reasoner performance test over the corpus. Additionally, we extracted two complete sets of pairwise, consecutive diffs: the first set was generated by a purely syntactic difference analysis (based on OWL's notion of “structural equivalence”); for the second set, we also checked whether the additions or removals changed the set of entailments between versions. We discovered a high level of “merely syntactic” removals and additions. We develop a categorization of such changes based on a heuristic inference of the impact of the change. As a result, not only do we get a rich, purely analytic characterization of the change history of the NCIt, but also we generate a realistic test corpus for incremental classification.
  • Keywords
    inference mechanisms; knowledge representation languages; ontologies (artificial intelligence); thesauri; NCI thesaurus; OWL ontology; Web; biomedical ontology; heuristic inference; incremental classification; national cancer institute;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Based Medical Systems (CBMS), 2011 24th International Symposium on
  • Conference_Location
    Bristol
  • ISSN
    1063-7125
  • Print_ISBN
    978-1-4577-1189-3
  • Type

    conf

  • DOI
    10.1109/CBMS.2011.5999163
  • Filename
    5999163