Title :
Towards a Universal, Quantifiable, and Scalable File Format Converter
Author :
McHenry, Kenton ; Kooper, Rob ; Bajcsy, Peter
Author_Institution :
Nat. Center for Supercomput. Applic., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Abstract :
This paper addresses the problem of designing a universal file format converter. File format conversion is a necessary part of data dissemination and curation. Complete and robust converters, however, are hard to find and build due to the abundance of file formats, the fact that many formats are closed, and the complexities within individual format specifications. On the other hand, many software applications exist that are capable of performing some degree of data conversion between a subset of the available formats. To take advantage of this, we introduce a data structure called an I/O-Graph to store the available input and output formats of these applications. Based on a concept of imposed code reuse, we use this graph to develop a service, NCSA Polyglot, which is capable of performing the larger union of conversions supported by the underlying software. The Polyglot system is designed to be easily extensible, scalable with the number of conversion requests, and inclusive of all available third-party software. Given a data set of files from a particular domain, we are able to assign weights to the edges within the I/O-Graph indicating the amount of information retained during a conversion. These edge weights then allow the system to choose conversion paths with the least amount of information loss.
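The path-selection idea in the abstract can be illustrated with a minimal sketch. It is not the Polyglot implementation: the class name `IOGraph`, the method names, the application names, and the loss values are all hypothetical. The abstract describes weights as the amount of information retained; this sketch instead stores a per-edge loss and assumes that losses add along a path, which lets a standard Dijkstra search pick the path with the least total loss.

```python
import heapq
from collections import defaultdict


class IOGraph:
    """Hypothetical I/O-graph: nodes are file formats, and each directed
    edge is one conversion that some application can perform."""

    def __init__(self):
        # source format -> list of (target format, application, loss)
        self.edges = defaultdict(list)

    def add_conversion(self, src, dst, app, loss):
        """Record that `app` can convert `src` to `dst` with the given loss."""
        self.edges[src].append((dst, app, loss))

    def best_path(self, src, dst):
        """Return (total_loss, steps) for the least-loss conversion path,
        or None if `dst` cannot be reached from `src`.

        Assumes losses are non-negative and additive (an assumption of
        this sketch, not a statement from the paper)."""
        queue = [(0.0, src, [])]
        settled = set()
        while queue:
            total, fmt, steps = heapq.heappop(queue)
            if fmt == dst:
                return total, steps
            if fmt in settled:
                continue
            settled.add(fmt)
            for nxt, app, loss in self.edges.get(fmt, ()):
                if nxt not in settled:
                    heapq.heappush(
                        queue, (total + loss, nxt, steps + [(app, fmt, nxt)])
                    )
        return None


# Made-up applications and loss values, for illustration only.
graph = IOGraph()
graph.add_conversion("obj", "stl", "AppA", 0.5)
graph.add_conversion("obj", "ply", "AppB", 0.125)
graph.add_conversion("ply", "stl", "AppC", 0.25)

result = graph.best_path("obj", "stl")
```

In this example the two-step route through `ply` is chosen over the direct `obj` to `stl` conversion, because its summed loss is lower. Chaining conversions across different applications is how the graph can offer the larger union of conversions that no single application supports.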
Keywords :
data structures; electronic data interchange; formal specification; I/O-graph; NCSA Polyglot; code reuse; data conversion; data curation; data dissemination; data structure; format specification; information loss; universal file format converter; Animation; Application software; Data conversion; Data structures; Geometry; Graphics; Material properties; Programming profession; Robustness; Software performance; 3D Data; Code Reuse; Data Dissemination; Digital Curation; File Format Conversion;
Conference_Title :
e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
Conference_Location :
Oxford
Print_ISBN :
978-0-7695-3877-8
DOI :
10.1109/e-Science.2009.28