Title :
Cache Friendly Burrows-Wheeler Inversion
Author :
Kärkkäinen, Juha ; Puglisi, Simon J.
Author_Institution :
Dept. of Comput. Sci., Univ. of Helsinki, Helsinki, Finland
Abstract :
The Burrows-Wheeler transform permutes the symbols of a string such that the permuted string can be compressed effectively with fast, simple techniques. Inversion of the transform is a bottleneck in practice. Inversion takes linear time, but folklore says that each symbol decoded requires a random access into the transformed string (and so a CPU cache miss). In this paper we show how to mitigate cache misses and so speed up inversion. Our main idea is to modify the standard inversion algorithm to detect and record repeated substrings in the original string as it is recovered. Subsequent occurrences of these repetitions are then copied in a cache-friendly way from the already recovered portion of the string, shortcutting a series of random accesses by the standard inversion algorithm. We show experimentally that this approach leads to faster runtimes in general, and can drastically reduce inversion time for highly repetitive data.
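For context, the standard inversion referred to in the abstract can be sketched roughly as below. This is a minimal illustration of the textbook LF-mapping approach, not the authors' cache-friendly algorithm. The function names `bwt_naive` and `inverse_bwt` are hypothetical, and the sketch assumes the text ends with a unique sentinel `$` that sorts before every other symbol. The jump `i = lf[i]` in the decoding loop is the random access that tends to cause a cache miss per decoded symbol.

```python
def bwt_naive(text, sentinel="$"):
    """Forward transform by sorting all rotations (quadratic; for demonstration only).

    Assumes `sentinel` does not occur in `text` and sorts before all its symbols.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(bwt, sentinel="$"):
    """Standard linear-time inversion via the LF mapping (illustrative sketch)."""
    n = len(bwt)

    # rank[i] = number of occurrences of bwt[i] in bwt[0:i]
    counts = {}
    rank = [0] * n
    for i, c in enumerate(bwt):
        rank[i] = counts.get(c, 0)
        counts[c] = rank[i] + 1

    # first[c] = number of symbols in bwt that are strictly smaller than c
    first = {}
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    # lf[i] = row of the sorted rotation matrix that starts with the symbol bwt[i]
    lf = [first[c] + rank[i] for i, c in enumerate(bwt)]

    # Row 0 starts with the sentinel, so bwt[0] is the last symbol of the text.
    # Each step decodes one symbol (back to front) and jumps to an
    # unpredictable position: this is the cache-unfriendly access pattern.
    out = []
    i = 0
    while bwt[i] != sentinel:
        out.append(bwt[i])
        i = lf[i]
    return "".join(reversed(out))
```

Under these assumptions, `inverse_bwt(bwt_naive(text))` returns `text` for any string that does not contain the sentinel or a symbol sorting below it.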
Keywords :
cache storage; data compression; transforms; CPU cache misses; cache friendly Burrows-Wheeler transform inversion; permuted string; speed inversion; standard inversion algorithm; transformed string; Arrays; DNA; Electronic mail; Pattern matching; Runtime; BWT; Burrows-Wheeler transform; cache misses; suffix array
Conference_Title :
Data Compression, Communications and Processing (CCP), 2011 First International Conference on
Conference_Location :
Palinuro
Print_ISBN :
978-1-4577-1458-0
Electronic_ISBN :
978-0-7695-4528-8
DOI :
10.1109/CCP.2011.15