  • DocumentCode
    2534211
  • Title
    Differencing data streams
  • Author
    Chawathe, Sudarshan S.
  • Author_Institution
    Dept. of Comput. Sci., Maryland Univ., College Park, MD, USA
  • fYear
    2005
  • fDate
    25-27 July 2005
  • Firstpage
    273
  • Lastpage
    284
  • Abstract
    We present external-memory algorithms for differencing large hierarchical datasets. Our methods are especially suited to streaming data with bounded differences. For input sizes m and n and maximum output (difference) size e, the I/O, RAM, and CPU costs of our algorithm rdiff are, respectively, m + n, 4e + 8, and O(mn). That is, given 4e + 8 blocks of RAM, our algorithm performs no I/O operations other than those required to read both inputs. We also present a variant of the algorithm that uses only four blocks of RAM, with I/O cost 8me + 18m + n + 6e + 5 and CPU cost O(mn).
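    The two cost formulas stated in the abstract can be compared numerically with a short sketch. The function names below are hypothetical. Treating m, n, and e as block counts is an assumption: the abstract gives RAM in blocks but does not spell out the units of m, n, and e. CPU cost (O(mn) for both) is asymptotic and is not computed here.

```python
def rdiff_costs(m: int, n: int, e: int) -> dict:
    """I/O and RAM costs of rdiff, as stated in the abstract.

    Hypothetical helper; m, n, e are assumed to be block counts.
    """
    return {"io": m + n, "ram": 4 * e + 8}


def four_block_costs(m: int, n: int, e: int) -> dict:
    """I/O and RAM costs of the four-block-RAM variant, as stated in the abstract.

    Hypothetical helper; m, n, e are assumed to be block counts.
    """
    return {"io": 8 * m * e + 18 * m + n + 6 * e + 5, "ram": 4}


if __name__ == "__main__":
    # Example: two 1000-block inputs whose difference is at most 10 blocks.
    print(rdiff_costs(1000, 1000, 10))
    print(four_block_costs(1000, 1000, 10))
```

    With m = n = 1000 and e = 10, rdiff needs 48 blocks of RAM and 2000 I/Os (just reading both inputs), while the four-block variant trades memory for 99065 I/Os, dominated by the 8me term.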
  • Keywords
    computational complexity; data handling; random-access storage; storage management; very large databases; RAM; data streams; external-memory algorithms; large hierarchical dataset differencing; Change detection algorithms; Computer science; Costs; Database systems; Educational institutions; Random access memory; Read-write memory; Spatial databases; Testing; Warehousing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Engineering and Application Symposium, 2005. IDEAS 2005. 9th International
  • ISSN
    1098-8068
  • Print_ISBN
    0-7695-2404-4
  • Type
    conf
  • DOI
    10.1109/IDEAS.2005.21
  • Filename
    1540917