• DocumentCode
    254823
  • Title

    Organizing and Storing Method for Large-Scale Unstructured Data Set with Complex Content

  • Author

    Dongqi Wei ; Chaoling Li ; Naheman, Wumuti ; Jianxin Wei ; Junlu Yang

  • Author_Institution
    Xian Center of Geol. Survey, China Univ. of Geosci. (Wuhan), Xi'an, China
  • fYear
    2014
  • fDate
    4-6 Aug. 2014
  • Firstpage
    70
  • Lastpage
    76
  • Abstract
    With the arrival of the big data era, the geological industries still produce and collect data in traditional ways, and geoscience information is represented as unstructured data in various forms. These data are often grouped together by relatively simple criteria, forming a number of datasets with complex internal structure. However, this grouping does not express well the rich geoscience information carried by unstructured data, makes it inconvenient to express complex relationships among the information, and even hinders the discovery of in-depth knowledge across datasets. Meanwhile, the forms in which such data exist also impede the application of advanced technological methods. In an attempt to solve the problem, this paper proposes a multi-granularity content tree model and a pay-as-you-go mode to support evolutionary data modeling. These features help to split the data model, position data content precisely, and expand the dimensions of the main features described according to the data subject, and then gradually discover the information contained in the data and the relationships among that information. Given the large scale of the data, this paper designs a data persistence mode based on HBase, so that the data can be processed using technologies within the Hadoop system. The paper also presents algorithms for data content extraction and content tree initialization under the MapReduce framework, and algorithms for dynamic loading and local caching of the content tree, thus forming a basic extract-store-load process. An application example of the model in the geological industries is given at the end.
  • Keywords
    Big Data; cache storage; data structures; geology; geophysics computing; HBase; Hadoop system; MapReduce framework; caching algorithms; complex content; content tree; geological industries; geosciences information; large-scale unstructured data set; Data models; Educational institutions; Geology; Heuristic algorithms; Industries; Object oriented modeling; Data Model; Geosciences Information; Large-scale Data; Unstructured Data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing for Geospatial Research and Application (COM.Geo), 2014 Fifth International Conference on
  • Conference_Location
    Washington, DC
  • Type

    conf

  • DOI
    10.1109/COM.Geo.2014.9
  • Filename
    6910123