Diabetes Genome Anatomy Project

Joslin Diabetes Center Harvard Medical School Dana-Farber Cancer Institute Children's Hospital Boston Whitehead Institute UMASS Medical School


Informatics Core

Core Director: Isaac Kohane, M.D., Ph.D. (Children's Hospital Boston)

Specific Aims

The goal of the Bioinformatics Core (BC) will be to provide web-accessible annotation and cataloging facilities and state-of-the-art bioinformatics analyses. This will enable researchers from all DGAP projects to make full use of the gene expression, polymorphism and proteomic data sets to determine functional dependencies among the known genes and Expressed Sequence Tags (ESTs) and to direct further biological validation of these putative dependencies. The other Cores address this critical biological validation step. The BC will connect data from the two microarray cores, which use Affymetrix GeneChip arrays of human and murine expressed sequences. It will also integrate proteomic data and genotype data from the two Projects in the functional analyses, as demonstrated in the figure:

[Figure: Informatics Core]

Central to the BC is the use of a DGAP Collaboration Bus linking all parts of this proposal and allowing data to flow freely (described in more detail below). Data will traverse the collaboration bus only in open, XML-based formats. This includes converting all gene identifiers to a standard nomenclature (such as LocusLink) and converting all microarray data to the MIAME standard (and later to MAGE-ML, once it is better established).
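As a rough illustration of the identifier conversion described above, the following sketch maps gene names and symbols onto a single standard ID before records would enter the bus. The alias table, the `STD:` placeholder IDs, the record layout, and the function names are all assumptions made for this example; they are not DGAP's actual schema or real LocusLink identifiers.

```python
# Hypothetical alias table. The "STD:n" values are placeholders standing in
# for identifiers from a standard nomenclature such as LocusLink.
ALIAS_TO_STANDARD_ID = {
    "INS": "STD:1",
    "INSULIN": "STD:1",
    "SLC2A4": "STD:2",
    "GLUT4": "STD:2",
}


def normalize_gene_id(raw_name, alias_table=ALIAS_TO_STANDARD_ID):
    """Return the standard ID for a gene name or symbol, or None if unknown."""
    key = raw_name.strip().upper()
    return alias_table.get(key)


def normalize_records(records, alias_table=ALIAS_TO_STANDARD_ID):
    """Split records into those with a resolved standard ID and those without.

    Each record is assumed to be a dict with a "gene" key (hypothetical layout).
    """
    resolved, unresolved = [], []
    for record in records:
        standard_id = normalize_gene_id(record["gene"], alias_table)
        if standard_id is None:
            unresolved.append(record)
        else:
            # Copy the record so the caller's data is left untouched.
            resolved.append({**record, "standard_id": standard_id})
    return resolved, unresolved
```

Keeping unresolved records in a separate list, rather than dropping them, would let a curator extend the alias table later without losing data.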

The specific aims of the BC are:

Aim 1:
Establish a DGAP Collaboration Bus allowing free flow of data of multiple bioinformatics types between investigators, and a web-accessible Diabetes Genomic Research Portal (DGRP) for data entry, annotation, and analysis. The DGRP will provide investigators from all DGAP Centers with access to phenotypically annotated microarray expression, proteomic, and genotyping data, as well as a large set of analytic procedures with which to explore the shared data set for further hypothesis generation. It will also provide the mechanism for publishing the benchmark data sets to the larger world of diabetes investigators. Because it is built on the DGAP Collaboration Bus, the portal can be gene- and protein-oriented: users will be able to enter a gene name or symbol and immediately find (1) all data sets in which it was measured (even if the gene measured in the original data was not known under that name/symbol), and/or (2) those data sets where the gene or protein was detected. The portal will also offer registration for worldwide users. This will allow users to register as "interested" in a particular gene or protein, so that when new data are deposited, those users can be notified immediately via e-mail using "push technology" if that gene/protein is detected in the new data.
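The gene-oriented lookup and the interest-based notification in Aim 1 can be sketched as a small in-memory index. This is only a minimal illustration: the class name `PortalIndex`, its methods, and the `STD:` identifiers are hypothetical, and actual e-mail delivery is left out (the sketch just returns the notifications that would be sent).

```python
from collections import defaultdict


class PortalIndex:
    """Toy index of which data sets measured or detected each gene."""

    def __init__(self):
        self.measured = defaultdict(set)     # gene ID -> data sets where it was measured
        self.detected = defaultdict(set)     # gene ID -> data sets where it was detected
        self.subscribers = defaultdict(set)  # gene ID -> e-mail addresses of interested users

    def register_interest(self, email, gene_id):
        """Record that a user wants to hear about new detections of a gene."""
        self.subscribers[gene_id].add(email)

    def deposit(self, dataset_name, measured_ids, detected_ids):
        """Index a new data set and return (email, gene ID, data set) notifications."""
        for gene_id in measured_ids:
            self.measured[gene_id].add(dataset_name)
        notifications = []
        for gene_id in sorted(detected_ids):
            self.detected[gene_id].add(dataset_name)
            # Only detections trigger a notification, as described in Aim 1.
            for email in sorted(self.subscribers.get(gene_id, ())):
                notifications.append((email, gene_id, dataset_name))
        return notifications

    def lookup(self, gene_id):
        """Return the data sets in which a gene was measured and detected."""
        return {
            "measured": sorted(self.measured.get(gene_id, ())),
            "detected": sorted(self.detected.get(gene_id, ())),
        }
```

The index is keyed by standard gene ID rather than by the name a user types, which is what would let a query succeed even when the original data set used a different name or symbol.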

Aim 2:
Develop "noise-aware" Benchmark Data Sets (BDS) for normal tissues (muscle, fat, liver) of murine models and humans. These data sets will include expression and proteomic data from humans without glucose intolerance and from wild-type mice. An important role of the BDS will be to identify the sources of "noise" or variation and use these as one of the bases for evaluating the significance of the hypotheses generated in Specific Aim 3. The BDS will also serve as the comparison for all samples from individuals and animal constructs with various disorders of insulin signaling and glucose metabolism covered by the DGAP. Specifically, noise and comparative models will be made using those data points identified as being similar. Normal distributions for each gene, for each core, will be modeled and evaluated. For example, when the same sample is run at both microarray cores, a gene-specific error model will be constructed. For each gene, we will be able to construct a table reporting reproducibility, similar to the example below.
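One way such a gene-specific error model might look is sketched here; it is not the table from the original proposal. The sketch assumes that each gene has paired, positive intensity values from the same samples run at both cores, and that between-core disagreement is summarized by the mean and standard deviation of the log2 difference. The log2 transform, the function names, and the data layout are assumptions made for illustration.

```python
import math
import statistics


def gene_error_model(pairs):
    """Summarize between-core disagreement for one gene.

    pairs: list of (core_a_intensity, core_b_intensity) tuples, one per shared
    sample. Intensities must be positive, and at least two pairs are needed to
    estimate a standard deviation.
    """
    log_diffs = [math.log2(a) - math.log2(b) for a, b in pairs]
    return {
        "n_samples": len(log_diffs),
        "mean_log2_diff": statistics.mean(log_diffs),  # systematic offset between cores
        "sd_log2_diff": statistics.stdev(log_diffs),   # spread, i.e. reproducibility
    }


def reproducibility_table(measurements):
    """Build a per-gene table: gene ID -> error model summary."""
    return {gene: gene_error_model(pairs) for gene, pairs in measurements.items()}


def exceeds_noise(log2_change, model, n_sd=2.0):
    """Check whether an observed log2 change lies outside n_sd standard deviations."""
    return abs(log2_change - model["mean_log2_diff"]) > n_sd * model["sd_log2_diff"]
```

A helper like `exceeds_noise` hints at how the benchmark noise estimates could feed the significance evaluation mentioned for Specific Aim 3; the two-standard-deviation default is an arbitrary choice for the example.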

Aim 3:
Hypothesis Generation and Candidate Gene Identification: Use clustering and classification bioinformatics techniques across expression and proteomic patterns to identify functional dependencies between genes (known genes and/or ESTs). Unsupervised machine learning techniques (e.g. clustering) will be used to identify global regulatory pathways as well as insulin- and diabetes-specific ones. Particular emphasis will be placed on identifying critical genes in these regulatory pathways for inclusion in candidate gene screening by the polymorphism identification projects of DGAP. Supervised machine learning techniques (e.g. classification) will be applied to identify differences in the expression profiles between the benchmark data set of unaffected individuals/mice and individuals/mice with glucose intolerance, and similarly between the various tissue systems covered by the DGAP.
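To make the unsupervised step concrete, here is a minimal sketch of one simple clustering approach: genes are linked when the Pearson correlation of their expression profiles reaches a threshold, and clusters are the resulting connected groups (single-linkage grouping at a fixed cutoff). The source does not say which clustering method the BC would use, so the method, the 0.9 threshold, and the gene names are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_clusters(profiles, threshold=0.9):
    """Group genes whose profiles are linked by correlation >= threshold.

    profiles: dict mapping gene ID -> list of expression values across samples.
    Returns a sorted list of clusters, each a sorted list of gene IDs.
    """
    genes = sorted(profiles)
    parent = {g: g for g in genes}  # union-find forest over genes

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                parent[find(g2)] = find(g1)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())
```

For example, with hypothetical profiles `{"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8.5], "geneC": [4, 3, 2, 1]}`, `geneA` and `geneB` fall into one cluster while the anticorrelated `geneC` stays on its own. Comparing every pair of genes is quadratic in the number of genes, so this sketch suits small illustrations rather than full arrays.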


Copyright © 2002 by Diabetes Genome Anatomy Project. All rights reserved. All documents on this Web site are the property of Diabetes Genome Anatomy Project and are protected by copyright. Any reproduction of any document on this Web site which omits Joslin's name or copyright notice is prohibited. Documents on this Web site may be reproduced for personal use only. They may not be distributed or sold. They may not be published in any other format (e.g., book, article, Web site) without the prior, written permission of Diabetes Genome Anatomy Project.
