DocumentCode :
3103284
Title :
CHIC - Converting Hamburgers into Cows
Author :
Townsend, Joseph A. ; Downing, Jim ; Murray-Rust, Peter
Author_Institution :
Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge, UK
fYear :
2009
fDate :
9-11 Dec. 2009
Firstpage :
337
Lastpage :
343
Abstract :
We have developed a methodology and workflow (CHIC) for the automatic semantification and structuring of legacy textual scientific documents. CHIC imports common document formats (PDF, DOCX and (X)HTML) and uses a number of toolkits to extract components and convert them into SciXML. The SciXML is sectioned into text-rich and data-rich streams, and stand-off annotation (SAF) is created for each. Embedded domain-specific objects can be converted into XML (Chemical Markup Language, CML). The different workflow streams can then be recombined and typically converted into RDF (Resource Description Framework).
Keywords :
XML; document handling; scientific information systems; software maintenance; (X)HTML; CHIC; DOCX; PDF; SciXML; automatic semantification; chemical markup language; data-rich streams; legacy textual scientific documents; resource description format; stand-off annotation; Chemicals; Cows; Data mining; Informatics; Markup languages; OWL; Ontologies; Resource description framework; Vehicles; SAF; conversion; semantics; workflow
fLanguage :
English
Publisher :
IEEE
Conference_Title :
e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
Conference_Location :
Oxford
Print_ISBN :
978-0-7695-3877-8
Type :
conf
DOI :
10.1109/e-Science.2009.54
Filename :
5380847