  • DocumentCode
    48427
  • Title
    A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction
  • Author
    Liwei Kuang ; Fei Hao ; Yang, L.T. ; Man Lin ; Changqing Luo ; Geyong Min
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • Volume
    2
  • Issue
    3
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    280
  • Lastpage
    291
  • Abstract
    Variety and veracity are two distinct characteristics of large-scale and heterogeneous data. It has been a great challenge to efficiently represent and process big data with a unified scheme. In this paper, a unified tensor model is proposed to represent unstructured, semistructured, and structured data. With a tensor extension operator, various types of data are represented as subtensors and then merged into a unified tensor. To extract the core tensor, which is small but contains valuable information, an incremental high-order singular value decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the time complexity, memory usage, and approximation accuracy of the proposed method are provided. A case study illustrates that approximate data reconstructed from a core set containing 18% of the elements can guarantee 93% accuracy in general. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.
  • Keywords
    Big Data; approximation theory; computational complexity; data structures; database theory; singular value decomposition; tensors; Big Data process; Big Data representation; IHOSVD method; approximation accuracy; core tensor extraction; data reconstruction; dimensionality reduction; heterogeneous data; incremental high order singular value decomposition method; incremental matrix decomposition algorithm; large-scale data; memory usage; orthogonal bases; semistructured data; subtensors; tensor extension operator; tensor-based approach; time complexity; unified tensor model; variety characteristics; veracity characteristics; Approximation methods; Big data; Data models; Large-scale systems; Tensile stress; XML; HOSVD; Tensor; data representation; dimensionality reduction;
  • fLanguage
    English
  • Journal_Title
    Emerging Topics in Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-6750
  • Type
    jour
  • DOI
    10.1109/TETC.2014.2330516
  • Filename
    6832490
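
The abstract describes extracting a small core tensor and orthogonal bases from a larger tensor, then reconstructing approximate data from them. The sketch below illustrates that general idea with an ordinary (batch) truncated HOSVD in NumPy. It is not the paper's incremental IHOSVD, and it does not implement the tensor extension operator; the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`), the tensor sizes, and the chosen ranks are all assumptions made for illustration only.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (n-mode product)."""
    # Contract the matrix columns with the chosen tensor axis; the matrix
    # rows become axis 0 of the result, so move them back to `mode`.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Batch truncated HOSVD (illustrative; not the incremental IHOSVD)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form that mode's
        # orthogonal basis.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project onto each basis to obtain the (smaller) core tensor.
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild an approximation of the original tensor from core and bases."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 6 x 5 x 4 tensor with multilinear rank (2, 2, 2).
    small_core = rng.standard_normal((2, 2, 2))
    bases = [rng.standard_normal((n, 2)) for n in (6, 5, 4)]
    data = reconstruct(small_core, bases)

    core, factors = truncated_hosvd(data, ranks=(2, 2, 2))
    approx = reconstruct(core, factors)

    stored = core.size + sum(u.size for u in factors)
    rel_error = np.linalg.norm(data - approx) / np.linalg.norm(data)
    print("stored fraction:", stored / data.size)
    print("relative error:", rel_error)
```

In this sketch the "stored fraction" counts the core tensor plus the truncated bases relative to the full tensor, which is one plausible reading of the abstract's "core set"; the paper's exact definition and its reported 18% and 93% figures are not reproduced here.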