Title :
High-speed string searching against large dictionaries on the Cell/B.E. Processor
Author :
Scarpazza, Daniele Paolo ; Villa, Oreste ; Petrini, Fabrizio
Author_Institution :
Cell Solution Dept., IBM TJ Watson Res. Center, Yorktown Heights, NY
Abstract :
Our digital universe is growing, creating exploding amounts of data which need to be searched, filtered and protected. String searching is at the core of the tools we use to curb this explosion, such as search engines, network intrusion detection systems, spam filters, and anti-virus programs. But as communication speed grows, our capability to perform string searching in real-time seems to fall behind. Multi-core architectures promise enough computational power to cope with the incoming challenge, but it is still unclear which algorithms and programming models to use to unleash this power. We have parallelized a popular string searching algorithm, Aho-Corasick, on the IBM Cell/B.E. processor, with the goal of performing exact string matching against large dictionaries. In this article we propose a novel approach to fully exploit the DMA-based communication mechanisms of the Cell/B.E. to provide an unprecedented level of aggregate performance with irregular access patterns. We have discovered that memory congestion plays a crucial role in determining the performance of this algorithm. We discuss three aspects of congestion: memory pressure, layout issues and hot spots, and we present a collection of algorithmic solutions to alleviate these problems and achieve quasi-optimal performance. The implementation of our algorithm provides a worst- case throughput of 2.5 Gbps, and a typical throughput between 3.3 and 4.4 Gbps, measured on realistic scenarios with a two-processor Cell/B.E. system.
Keywords :
multiprocessing systems; string matching; text analysis; Aho-Corasick; Cell/B.E. processor; exact string matching; high-speed string searching; hot spots; irregular access patterns; large dictionaries; layout issues; memory congestion; memory pressure; multicore architecture; Aggregates; Computer architecture; Dictionaries; Digital filters; Explosions; Intrusion detection; Power system modeling; Protection; Search engines; Throughput;
Conference_Titel :
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-1693-6
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2008.4536300