by Kouras Owzar, PhD
The Duke Cancer Institute’s Bioinformatics Shared Resource is developing a proof-of-concept (POC) high-performance computing (HPC) platform to store, manage, and analyze large-scale genomics data for cancer research. The POC is being built on the Hadoop distributed-computing platform, using an 8-node UCS high-performance cluster on loan from Cisco and 100 TB of E-Series storage on loan from NetApp. The initial focus is on data management and analysis of next-generation sequencing (NGS) data, with a data-analysis pipeline for RNA-seq and for whole-genome/exome variation. Multi-omic data integration will be built in so that genomics data can be integrated and analyzed together with proteomics and metabolomics data. Another goal of the POC is to integrate clinical/patient data with genomics data to enable translational research and personalized medicine. Preliminary performance testing of alignment/mapping for NGS data analysis has been completed, and the initial results are very promising compared with performance on a single node.