Validation and Submission of Data and Analysis to the Gene Expression Omnibus (GEO) Repository

by Salvatore Mungal

The GEO repository, hosted by the National Center for Biotechnology Information (NCBI), is a public functional genomics repository that supports MIAME-compliant data submissions and accepts both array- and sequence-based data. To confirm that a microarray data analysis could be reproduced accurately before submission to GEO, the DCI Bioinformatics and Information Systems Shared Resource groups duplicated the original analysis environment as part of their validation process. They created multiple environments running the 64-bit Debian GNU/Linux operating system on local and networked virtual servers. In each environment, they installed the same R version used in the original analysis and then loaded each required R package at the exact version number used originally. The data were copied to the other environments, and their integrity was confirmed by matching MD5 checksums against those of the original data. Using this process, the Shared Resource groups have successfully validated their first study, with all duplicated environments confirming the results, and submitted it to GEO.
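The MD5 integrity check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Shared Resource groups' actual tooling: the function names (md5_of_file, build_manifest, compare_manifests), the directory-manifest approach, and the demo file names are all hypothetical. The sketch hashes every file under a data directory and reports files that are missing, extra, or mismatched between the original data and a propagated copy.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file path (relative to data_dir) to its MD5 digest."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(original, copy):
    """List files missing from the copy, extra in the copy, or with differing checksums."""
    missing = sorted(set(original) - set(copy))
    extra = sorted(set(copy) - set(original))
    mismatched = sorted(
        name for name in set(original) & set(copy) if original[name] != copy[name]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


if __name__ == "__main__":
    # Hypothetical demo: an "original" data directory and a copy with one altered file.
    with tempfile.TemporaryDirectory() as orig_dir, tempfile.TemporaryDirectory() as copy_dir:
        for d in (orig_dir, copy_dir):
            Path(d, "sample1.CEL").write_bytes(b"intensity data 1")
            Path(d, "sample2.CEL").write_bytes(b"intensity data 2")
        Path(copy_dir, "sample2.CEL").write_bytes(b"corrupted")
        report = compare_manifests(build_manifest(orig_dir), build_manifest(copy_dir))
        print(report)  # {'missing': [], 'extra': [], 'mismatched': ['sample2.CEL']}
```

Reading files in chunks keeps memory use flat even for large raw data files. MD5 is used here, as in the text, to detect accidental corruption during transfer; it is not a safeguard against deliberate tampering.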

NCBI’s goal is to advance science and health by providing access to biomedical and genomic information.