DocumentCode :
168765
Title :
Distributed Detection of Cancer Cells in High-Throughput Cellular Spike Streams
Author :
Hafeez, Abdul ; Rafique, M. Mustafa ; Butt, Ali R.
Author_Institution :
Virginia Tech, Blacksburg, VA, USA
fYear :
2014
fDate :
26-29 May 2014
Firstpage :
774
Lastpage :
783
Abstract :
Detection and identification of important biological targets, such as DNA, proteins, and diseased human cells, are crucial for early disease diagnosis and prognosis. The key to differentiating healthy cells from diseased cells lies in their biophysical properties, which differ significantly. Micro- and nanosystems, such as solid-state micropores and nanopores, can measure these properties of human cells and DNA and translate them into electrical spikes from which useful biological insights can be decoded. Nonetheless, such approaches produce large data streams that are often plagued by inherent noise and baseline wander. Moreover, existing detection approaches are tedious, time-consuming, and error-prone, and no error-resilient software can analyze large datasets instantly. Processing large datasets and detecting biological targets in them effectively requires automated, accelerated data processing strategies on state-of-the-art distributed computing systems. To this end, we propose a distributed detection framework that collects the raw data stream on a server node, which then splits the data into segments and distributes them across the worker nodes. Each node reduces noise in its assigned data segment using moving-average filtering and detects the electrical spikes by comparing them against a statistical threshold (based on the mean and standard deviation of the data), in a Single Program Multiple Data (SPMD) style. Our proposed framework detects cancer cells with an accuracy of 63% in a mixture of cancer cells, red blood cells (RBCs), and white blood cells (WBCs), and achieves a maximum speedup of 6X over a single-node machine, processing 10 gigabytes of raw data on an 8-node cluster in less than a minute.
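The pipeline summarized in the abstract (split the stream into segments, smooth each segment with a moving average, flag samples that cross a mean and standard deviation threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the window length of 5, the multiplier k = 3, and the trailing-window form of the filter are all assumptions made for illustration, and a sequential loop stands in for the worker nodes of a real cluster.

```python
from statistics import mean, pstdev


def moving_average(signal, window=5):
    """Trailing moving average; the first samples use a shorter window."""
    smoothed = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def detect_spikes(segment, window=5, k=3.0):
    """Indices whose smoothed value lies more than k std devs from the mean.

    window and k are illustrative assumptions, not values from the paper.
    """
    if not segment:
        return []
    smoothed = moving_average(segment, window)
    mu = mean(smoothed)
    sigma = pstdev(smoothed)
    return [i for i, x in enumerate(smoothed) if abs(x - mu) > k * sigma]


def split_segments(stream, n_workers):
    """Server-side step: cut the stream into contiguous (offset, segment) pairs."""
    size = max(1, -(-len(stream) // n_workers))  # ceiling division
    return [(start, stream[start:start + size])
            for start in range(0, len(stream), size)]


def detect_distributed(stream, n_workers=4, window=5, k=3.0):
    """Run the same detection code on every segment (SPMD style).

    The loop below is sequential; on a cluster each iteration would run
    on a separate worker node.
    """
    hits = []
    for offset, segment in split_segments(stream, n_workers):
        hits.extend(offset + i for i in detect_spikes(segment, window, k))
    return hits
```

Because each segment computes its own mean and standard deviation, the threshold in this sketch is local to a worker. Whether the framework uses per-segment or global statistics is not stated in the abstract, so that choice is also an assumption.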
Keywords :
bioMEMS; bioelectric potentials; biomedical equipment; blood; cancer; cellular biophysics; data analysis; filtering theory; medical signal processing; microsensors; molecular biophysics; nanomedicine; nanoporous materials; nanosensors; porosity; proteins; signal denoising; statistical analysis; 8-node cluster; DNA; accelerated data processing strategies; assigned data segment; automated data processing strategies; baseline wanders; biological insights; biological target detection; biological target identification; biological targets; biophysical properties; data streams; dataset analysis; disease diagnosis; disease prognosis; diseased human cells; distributed cancer cell detection; distributed detection framework; electric spikes; electrical spikes; extant detection approaches; healthy cells; high-throughput cellular spike streams; inherit noise; microsystems; moving-average filtering; nanosystems; noise reduction; proteins; raw data stream collection; red blood cells; server node; single program multiple data style; single-node machine; solid-state micropores; solid-state nanopores; standard deviation; state-of-the-art distributed computing systems; statistical threshold; white blood cells; Cancer; Cells (biology); Distributed databases; Nanobioscience; Noise; Smoothing methods; Distributed computing; accelerated-diagnosis; automated cancer cell detection; solid-state micropores;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/CCGrid.2014.108
Filename :
6846530