DocumentCode :
254823
Title :
Organizing and Storing Method for Large-Scale Unstructured Data Set with Complex Content
Author :
Dongqi Wei ; Chaoling Li ; Naheman, Wumuti ; Jianxin Wei ; Junlu Yang
Author_Institution :
Xian Center of Geol. Survey, China Univ. of Geosci. (Wuhan), Xi'an, China
fYear :
2014
fDate :
4-6 Aug. 2014
Firstpage :
70
Lastpage :
76
Abstract :
With the arrival of the big data era, traditional geological industries still produce and collect data in traditional ways, and geoscience information is represented as unstructured data in various forms. These data are often grouped together by relatively simple criteria, forming a number of datasets with complex internal structure. However, this grouping expresses the rich geoscience information carried by unstructured data poorly, makes it inconvenient to express complex relationships among the information, and even hinders the discovery of in-depth knowledge across datasets. Meanwhile, the forms in which such data exist also impede the application of advanced technological methods. To address the problem, this paper proposes a multi-granularity content tree model and a pay-as-you-go mode to support evolutionary data modeling. These features help to split the data model, position data content precisely, and expand the dimensions of the main features described according to the data subject, and then to gradually discover the information contained in the data and the relationships among that information. Considering the large size of the data features, this paper designs a data persistence mode based on HBase, so that the data can be processed using technologies within the Hadoop system. The paper also presents algorithms for data content extraction and for the content tree's initial state under the MapReduce framework, together with dynamic loading and local caching algorithms for the content tree, thus forming a basic extract-store-load process. An application example of the model in the geological industries is given at the end.
Keywords :
Big Data; cache storage; data structures; geology; geophysics computing; Big Data; HBase; Hadoop system; MapReduce framework; caching algorithms; complex content; content tree; geological industries; geosciences information; large-scale unstructured data set; Big data; Data models; Educational institutions; Geology; Heuristic algorithms; Industries; Object oriented modeling; Data Model; Geosciences Information; Large-scale Data; Unstructured Data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing for Geospatial Research and Application (COM.Geo), 2014 Fifth International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/COM.Geo.2014.9
Filename :
6910123