Gene Structure and Array Design
The group focuses mainly on the analysis of genome-wide expression data to decipher the mechanisms driving tissue-specific activation or repression of genes and of distinct transcripts. As a complement, we provide bioinformatics support for the design of experiments and develop tools to facilitate the processing and analysis of expression studies based on high-throughput technologies such as microarrays or next-generation sequencing.
Prediction of alternative transcripts [Shobhit Gupta, Stefan Haas]
The huge variety of gene products expressed in an organism is to a large extent caused by the differential usage of exons, which leads to alternative transcripts encoded by a single gene. Several mechanisms contribute to the formation of alternative transcript isoforms. While alternative splicing via the spliceosome causes variations in the usage of internal exons, alternative promoters and alternative polyadenylation change the usage of first and last exons, respectively.
By integrating public data sets designed to determine transcriptional start sites (TSSs), such as CAGE and full-length clone data (DBTSS), with Ensembl/RefSeq transcript annotations and our EST-based transcript predictions, we generated a comprehensive set of potential alternative transcriptional start sites. Our analysis revealed that most genes express a single, dominant transcript, whereas alternative transcripts are often expressed at lower levels. Although methods like CAGE are tailored to detect TSSs, the traditional data sets still uniquely contribute a significant number of TSSs to the overall set. Such a set of TSSs can be used to select a representative transcript per gene based on expression rather than on location, which might be more appropriate when studying gene regulation. Furthermore, the integration of TSS predictions allows fine-tuning of TSS localization, which is important, for example, when only small proximal promoter regions are to be analyzed. The predicted TSSs are visualized in our Promotion Genome Browser (promotion.molgen.mpg.de) together with additional genomic features intended to facilitate the interpretation and integration of important aspects of gene regulation (e.g. sequence conservation, transcription factor binding affinities (TRAP), EST data (GeneNest)).
Tissue-specific gene regulation [Helge Roider, Sean O'Keeffe, Stefan Haas]
Regulation of gene expression is mainly controlled by chromatin modifications and by the activity of specific transcription factors acting either on promoters close to the transcriptional start or on distant enhancer elements. Since many genes are involved in a discrete biological context, the use of such contextual information is crucial to successfully unravel regulatory relationships between genes and transcription factors (TFs). We therefore categorize genes according to the significance of their expression in a certain tissue, based on the statistical evaluation of EST data (T-STAG) or of DNA-microarray/RNA-Seq expression measurements in a number of organisms (human, mouse, sheep, etc.). Given the TSSs of such co-expressed genes, we rank their potential proximal promoter regions by the DNA-binding affinity of known transcription factors using our method TRAP. In order to detect candidate regulatory TFs we developed the method PASTAA, which iteratively tests for genes that rank significantly high both by tissue-specific expression and by TF binding affinity. In this way we were able to computationally predict a large number of functional TF-tissue associations that are well supported by the literature. Intriguingly, we found that TFs predicted to regulate genes expressed in a certain tissue are frequently themselves significantly expressed in the respective tissue. In line with results from another computational study performed in our department, we observed potential auto-regulatory loops for many of the TFs involved in tissue-specific gene regulation.
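The iterative test described above can be illustrated with a small sketch. It is a simplified illustration and not the published PASTAA implementation: it assumes two rankings of the same gene set (one by tissue-specific expression, one by predicted TF binding affinity), scans pairs of cutoffs over both rankings, and scores the overlap of the top-ranked genes with a hypergeometric tail probability. The function names, the cutoff grid and the toy gene identifiers are hypothetical.

```python
from math import comb  # requires Python 3.8+


def hypergeom_tail(overlap, n_total, n_a, n_b):
    """P(X >= overlap) when n_b genes are drawn from n_total, n_a of which are marked."""
    upper = min(n_a, n_b)
    total = comb(n_total, n_b)
    hits = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


def best_association(expr_ranked, affinity_ranked, cutoffs):
    """Scan cutoff pairs on both rankings; return (smallest tail probability, cutoffs)."""
    n_total = len(expr_ranked)
    best = (1.0, None, None)
    for c_expr in cutoffs:
        top_expr = set(expr_ranked[:c_expr])
        for c_aff in cutoffs:
            top_aff = set(affinity_ranked[:c_aff])
            overlap = len(top_expr & top_aff)
            p = hypergeom_tail(overlap, n_total, c_expr, c_aff)
            if p < best[0]:
                best = (p, c_expr, c_aff)
    return best


# Toy example (hypothetical data): ten genes ranked by expression and by affinity.
expr_ranked = [f"g{i}" for i in range(10)]
affinity_ranked = ["g1", "g0", "g2", "g9", "g8", "g7", "g6", "g5", "g4", "g3"]
print(best_association(expr_ranked, affinity_ranked, cutoffs=[2, 3, 5]))
```

Because the minimum is taken over many cutoff pairs, the returned value is a score rather than a corrected p-value; in a sketch like this, its significance would still have to be assessed separately, for instance by permuting one of the rankings.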
In a subsequent study we showed that the success in predicting functional associations is strongly related to the CpG content of the promoters investigated. Functional predictions for transcription factors with respect to tissue specificity are far more successful for promoters with low CpG content. However, even in high-CpG promoters we found functionally plausible TF-tissue associations, e.g. NRSF-brain, which could hardly be detected when analyzing the full set of promoters. Thus, our computational analysis highlights the importance of categorizing promoters into high-CpG and low-CpG groups, since the two groups may be regulated by different biological mechanisms.
Next-generation sequencing [Hugues Richard, Anne-Katrin Emde, Marcel Schulz, Sean O'Keeffe, Ramu Chenna, Stefan Haas]
The recent advances in next-generation sequencing (NGS) technologies provide the opportunity to tackle biological questions at genome scale with unprecedented quality. However, the huge amount of data generated by these technologies requires the development of new algorithms, both to handle the data efficiently and to analyse this new type of sequence information. In this context, we were among the first to study the performance of next-generation sequencing in transcriptome sequencing (RNA-Seq) of two human cell lines. In collaboration with the group of Marie-Laure Yaspo we showed that NGS clearly outperforms state-of-the-art microarray technology in terms of sensitivity and noise, thus revealing a more complete picture of the transcriptome. Although the transcriptomes of B cells and HEK cells are well studied, we were able to discover extensions of exonic regions and a limited number of potential, so far unknown exons.
While the basic mapping of sequencing reads mainly provides information about the genomic regions that are transcribed, details about differential splicing have to be determined from those sequencing reads that map to exon junctions. We therefore generated an artificial set of all exon junctions of a gene based on our comprehensive set of gene structures derived from EST data and the Ensembl database. Mapping the sequence reads to these junctions revealed a large number of alternative splicing events, even at a relatively low sequencing depth.
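The construction of such a junction set can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: exons are given as 0-based, half-open coordinates on the forward strand, every pair of exons separated by an intron is joined, and each junction sequence keeps at most `flank` bases on either side of the splice site. The function name, the `flank` parameter and the toy sequence are hypothetical.

```python
from itertools import combinations


def junction_library(genome_seq, exons, flank):
    """Build junction sequences for all pairs of exons separated by an intron.

    exons: list of (start, end) tuples, 0-based, half-open, forward strand (assumed).
    Returns a dict mapping (upstream_exon, downstream_exon) to the joined sequence.
    """
    junctions = {}
    for up, down in combinations(sorted(set(exons)), 2):
        if up[1] >= down[0]:
            continue  # overlapping or abutting exons leave no intron between them
        # Last `flank` bases of the upstream exon (fewer if the exon is shorter).
        donor_side = genome_seq[max(up[0], up[1] - flank):up[1]]
        # First `flank` bases of the downstream exon (fewer if the exon is shorter).
        acceptor_side = genome_seq[down[0]:min(down[1], down[0] + flank)]
        junctions[(up, down)] = donor_side + acceptor_side
    return junctions


# Toy example (hypothetical data): three exons on a 24-base sequence.
genome_seq = "AAAACCCCGGGGTTTTACGTACGT"
exons = [(0, 4), (8, 12), (16, 20)]
for pair, seq in junction_library(genome_seq, exons, flank=3).items():
    print(pair, seq)
```

Setting `flank` to the read length minus one means that any read aligning entirely within a junction sequence must cover the splice site, so junction hits cannot be explained by a single exon alone.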
In light of the steadily increasing sequence output of NGS, we are currently implementing a generalised computational pipeline comprising statistical measures for the quality control of sequencing runs as well as improved methods for the efficient mapping of reads. In this context we are involved in the development of RazerS, a read-mapping tool that allows for short insertions and deletions, an important feature for future applications aiming at the discovery of disease-causing mutations by deep genomic sequencing.
Selected Publications
Roider,H.G., Manke,T., O'Keeffe,S., Vingron,M., Haas,S.A. (2009) PASTAA: identifying transcription factors associated with sets of co-regulated genes. Bioinformatics, 25(4):435-442
Sultan,M., Schulz,M.H., Richard,H., Magen,A., Klingenhoff,A., Scherf,M., Seifert,M., Borodina,T., Soldatov,A., Parkhomchuk,D., Schmidt,D., O'Keeffe,S., Haas,S., Vingron,M., Lehrach,H., Yaspo,M.L. (2008) A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome. Science, 321(5891):956-960
Oberthuer,A., Berthold,F., Warnat,P., Hero,B., Kahlert,Y., Spitz,R., Ernestus,K., Koenig,R., Haas,S., Eils,R., Schwab,M., Brors,B., Westermann,F., Fischer,M. (2006) Gene-expression based classification of neuroblastoma patients using a customized oligonucleotide-microarray outperforms current clinical risk stratification. J. Clin. Oncol., 24:5070-5078
Hecht,H., Kuhl,H., Haas,S.A., Bauer,S., Poustka,A.J., Lienau,J., Schell,H., Stiege,V., Seitz,V., Reinhardt,R., Duda,G.N., Mundlos,S., Robinson,P.N. (2006) Gene Identification and Analysis of Transcripts Differentially Regulated in Fracture Healing by EST Sequencing in the Domestic Sheep. BMC Genomics, 7:172
Contact:
Stefan Haas
MPI for Molecular Genetics
Computational Molecular Biology
Ihnestr. 73
D-14195 Berlin
Phone: +49 30 8413 1164
Fax: +49 30 8413 1152
Email: stefan.haas@molgen.mpg.de