Bioinformatics of Evolution, Microbiome Communities and Transcriptome Assembly

The facilities of ARCHIE-WeSt are being used for a variety of projects that require HPC resources and involve large data sets. In these projects we are using Next Generation Sequencing (NGS) of DNA in a variety of biological contexts. NGS generates hundreds of millions of DNA sequences, which require terabytes of storage.

Initially we used the HPC facility to run MrBayes, a program for Bayesian inference of phylogeny from DNA and protein sequences under a range of phylogenetic and evolutionary models. The project investigated the effects of environmental pollutants on oysters, and MrBayes was used to determine the evolutionary relationship of the oyster sequences of xenobiotic-metabolising enzymes to their homologues in other invertebrates and vertebrates. More recently the program has been used to analyze sequences of egg proteins from fish. Our interest was in relating those sequences to ecological features of fish reproduction.

Another use of ARCHIE-WeSt has been in characterizing the microbial communities of the rodent and human gut using the QIIME package. This package takes paired-end reads, merges each pair into a single sequence and then, through a number of steps, assigns a taxonomic identity, usually to the level of genus. Typically we identify 100 different genera and determine their relative abundance. In this area we are investigating how microbial community structure influences obesity and diabetes using a rodent model. We are also investigating changes in community structure following GI surgery in obese patients.
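The relative abundance step mentioned above can be illustrated with a minimal Python sketch. It is not QIIME code and does not use QIIME's interfaces; it simply assumes that each read has already been assigned a genus label, and the function name and the example labels are hypothetical.

```python
from collections import Counter


def relative_abundance(genus_assignments):
    """Return {genus: fraction of reads} for a list of per-read genus labels."""
    counts = Counter(genus_assignments)
    total = sum(counts.values())
    if total == 0:
        # No reads were assigned, so there is nothing to normalise.
        return {}
    return {genus: n / total for genus, n in counts.items()}


# Hypothetical per-read genus assignments (illustrative only, not project data).
reads = ["Bacteroides", "Lactobacillus", "Bacteroides", "Akkermansia"]
abundance = relative_abundance(reads)
```

Dividing by the total number of assigned reads makes samples of different sequencing depth comparable, which is what allows community structure to be compared between animals or patients.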

Lastly we use ARCHIE-WeSt to run Trinity, a package for sequence assembly and analysis. Again the input is paired-end reads, with some 600M read pairs currently undergoing analysis. We are investigating the transcriptome of the guppy brain and how it might be related to behavioral characteristics of the fish. The Trinity suite allows assembly of the short input sequences (~150 bp) into full-length transcripts (up to 8000 bp) and can distinguish alternative isoforms. The assembled transcriptome is then used to determine the abundance of each transcript, and statistical methods are applied to establish differential expression under different biological conditions.
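To illustrate what determining the abundance of each transcript involves, the sketch below computes transcripts per million (TPM), one commonly used length-normalised abundance measure. It is a simplified illustration and not Trinity's own code: it assumes that read counts per transcript are already available, and the function name, transcript names and numbers are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Convert per-transcript read counts to transcripts per million (TPM).

    read_counts: {transcript_id: number of reads assigned}
    lengths_bp:  {transcript_id: transcript length in base pairs}
    """
    # Reads per kilobase: corrects for longer transcripts attracting more reads.
    rpk = {t: read_counts[t] / (lengths_bp[t] / 1000.0) for t in read_counts}
    scale = sum(rpk.values())
    if scale == 0:
        return {t: 0.0 for t in rpk}
    # Rescale so that the values for one sample sum to one million.
    return {t: value / scale * 1e6 for t, value in rpk.items()}


# Hypothetical counts and lengths (illustrative only, not project data).
counts = {"transcript_a": 100, "transcript_b": 100}
lengths = {"transcript_a": 1000, "transcript_b": 2000}
values = tpm(counts, lengths)
```

In this example both transcripts receive the same number of reads, but the shorter one ends up with twice the TPM value, because the same read count over a shorter sequence implies more copies of the transcript.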

For more information about the project contact John Craft, Professor of Biological and Biomedical Science at the School of Health and Life Sciences, Glasgow Caledonian University.