Diabetes Genome Anatomy Project
Informatics Core
Core Director: Isaac Kohane, M.D., Ph.D. (Children's Hospital Boston)
Specific Aims
The goal of the Bioinformatics Core (BC) will be to provide web-accessible
annotation, cataloging facilities and state-of-the-art bioinformatics analyses.
This will enable researchers from all DGAP projects to maximally utilize the
gene expression, polymorphism and proteomic data sets to determine functional
dependencies among the known genes and Expressed Sequence Tags (ESTs) and
direct further biological validation of these putative dependencies. The other
Cores address this critical biological validation step. The BC will connect
data from the two microarray cores using Affymetrix GeneChip arrays of human
and murine expressed sequences. It will also integrate proteomic data and
genotype data from the two Projects in the functional analyses, as demonstrated
in the figure:

Central to the BC is the use of a DGAP Collaboration Bus linking all parts of
this proposal and allowing data to flow freely (described in more detail
below). Data will traverse the collaboration bus only in open XML-based
formats. This includes converting all gene identifiers to a standard
nomenclature (such as LocusLink) and converting all microarray data to the
MIAME standard (and later to MAGE-ML, once it is better established).
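The identifier conversion step described above can be illustrated with a minimal sketch. It assumes a small in-memory alias table; the table contents, the placeholder IDs such as "GENE:0001", and the function names are hypothetical and are not taken from LocusLink or from the DGAP implementation.

```python
# Hypothetical alias table. The canonical IDs are placeholders for illustration,
# not real LocusLink records.
ALIAS_TO_CANONICAL = {
    "INSR": "GENE:0001",
    "CD220": "GENE:0001",
    "SLC2A4": "GENE:0002",
    "GLUT4": "GENE:0002",
}


def normalize_identifier(raw_name):
    """Return the canonical ID for a gene name, or None if the name is unknown."""
    key = raw_name.strip().upper()
    return ALIAS_TO_CANONICAL.get(key)


def normalize_records(records):
    """Split (gene_name, value) records into normalized rows and unmapped names."""
    normalized, unmapped = [], []
    for name, value in records:
        canonical = normalize_identifier(name)
        if canonical is None:
            # Keep unknown names visible instead of silently dropping them.
            unmapped.append(name)
        else:
            normalized.append((canonical, value))
    return normalized, unmapped
```

Returning the unmapped names separately is one possible design choice: it lets a curator review identifiers that could not be placed in the standard nomenclature before the data enter the bus.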
The specific aims of the BC are:
Aim 1:
Establish a DGAP collaboration bus allowing free flow of data of multiple
bioinformatics types between investigators, and a web-accessible Diabetes
Genomic Research Portal (DGRP) for data entry, annotation, and analysis. The
DGRP will provide investigators from all DGAP Centers with access to
phenotypically annotated microarray expression, proteomic, and genotyping data
as well as a large set of analytic procedures with which to explore the shared
data set for further hypothesis generation. It will also provide the mechanism
for publishing the benchmark data sets to the larger world of diabetes
investigators. Because of the use of our DGAP Collaboration Bus, the portal
can be gene and protein oriented, in that users will be able to enter a gene
name or symbol and immediately find (1) all data sets in which it was measured
(even if the gene measured in the original data was not known under that
name/symbol), and/or (2) those data sets where the gene or protein was
detected. The portal will also offer registration for world-wide users. This
will allow users to register as "interested" in a particular gene
or protein, so that when new data is deposited, those users can be immediately
notified via e-mail using "push technology" if that gene/protein is
detected in the new data.
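As a rough illustration of the registration and notification idea, the sketch below keeps a registry of interested users per gene and works out who should be notified when a new data set arrives. The class name, method names, and e-mail addresses are hypothetical, gene IDs are assumed to be already normalized, and actual e-mail delivery is left out.

```python
from collections import defaultdict


class InterestRegistry:
    """Tracks which users asked to be notified about which canonical gene IDs."""

    def __init__(self):
        # gene_id -> set of e-mail addresses registered as "interested"
        self._subscribers = defaultdict(set)

    def register(self, email, gene_id):
        """Record that a user is interested in a gene or protein."""
        self._subscribers[gene_id].add(email)

    def notifications_for(self, detected_gene_ids):
        """Map each subscribed user to the sorted genes detected in a new data set."""
        pending = defaultdict(set)
        for gene_id in detected_gene_ids:
            for email in self._subscribers.get(gene_id, ()):
                pending[email].add(gene_id)
        return {email: sorted(genes) for email, genes in pending.items()}
```

Grouping the detected genes per user means each user would receive one message per deposited data set rather than one message per gene.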
Aim 2:
Develop "noise-aware" Benchmark Data Sets (BDS) for normal tissues
(muscle, fat, liver) of murine models and humans. These data sets will include
expression and proteomic data from humans without glucose intolerance and from
wild-type mice. An important role of the BDS will be to identify the sources of
"noise" or variation and use these as one of the bases for evaluating
the significance of the hypotheses generated in Specific Aim 3. The BDS will
also serve as the comparison for all samples from individuals and animal
constructs with various disorders of insulin signaling and glucose metabolism
covered by the DGAP. Specifically, noise and comparative models will be made
using those data points identified as being similar. Normal distributions for
each gene, for each core, will be modeled and evaluated. For example, when the
same sample is run at both microarray cores, a gene-specific error model will
be constructed. For each gene, we will be able to construct a table reporting
reproducibility, similar to the example below.
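One simple way to picture a per-gene reproducibility summary is sketched below. It is an assumption for illustration only: it summarizes the differences between paired measurements of the same samples at the two cores, which is a simplification of the gene-specific error model described above, and the function and field names are hypothetical.

```python
from statistics import mean, stdev


def reproducibility_table(paired):
    """Summarize between-core agreement for each gene.

    paired maps gene_id -> list of (core_a_value, core_b_value) tuples,
    one tuple per sample that was run at both microarray cores.
    """
    table = {}
    for gene_id, pairs in paired.items():
        if not pairs:
            continue  # no shared samples, nothing to summarize
        diffs = [a - b for a, b in pairs]
        table[gene_id] = {
            "n": len(diffs),
            "mean_diff": mean(diffs),
            # A sample standard deviation needs at least two paired samples.
            "sd_diff": stdev(diffs) if len(diffs) > 1 else None,
        }
    return table
```

A mean difference far from zero would suggest a systematic offset between cores for that gene, while a large standard deviation would suggest poor reproducibility.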
Aim 3:
Hypothesis Generation and Candidate Gene Identification: Use clustering and
classification bioinformatics techniques across expression and proteomic
patterns to identify functional dependencies between genes (known and/or
ESTs). Unsupervised machine learning techniques (e.g. clustering) will be
used to identify global as well as insulin- and diabetes-specific regulatory pathways.
Particular emphasis will be placed on identifying critical genes in these
regulatory pathways for inclusion in a candidate gene screening by the
polymorphism identification projects of DGAP. Supervised machine learning
techniques (e.g. classification) will be applied to identify differences in the
expression profiles between the benchmark data set of unaffected
individuals/mice and individuals/mice with glucose intolerance and similarly
between the various tissue systems covered by the DGAP.
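The supervised step can be illustrated with a minimal nearest-centroid classifier. This technique is chosen here only as an example; the text above does not name a specific classifier, and the function names, class labels, and data layout are assumptions.

```python
from math import dist


def fit_centroids(profiles, labels):
    """Average the expression profiles of each class into one centroid per label."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(profile, centroids):
    """Assign a profile to the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))
```

In use, the benchmark profiles of unaffected samples and the profiles of glucose-intolerant samples would each form a class, and a new sample would be assigned to the class whose centroid it most resembles.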