Evolution of genome contents
Plant genome sizes differ by as much as 2,500-fold, ranging from ~60 million bases in Genlisea margaretae, a carnivorous plant, to ~150 billion bases in Paris japonica, the canopy plant. Plants also tend to have younger genes in their gene families than other eukaryotes. With the increasing ease of sequencing and of assessing expression across the entire genome of any species, many genomic regions that lie between known genes are now found to be expressed. For more information, check out:
- Our review paper discussing evolution of duplicate genes in plants.
- Two studies integrating multi-omics datasets with machine learning methods to identify functional regions in the Arabidopsis thaliana and human genomes.
Evolution of transcriptional response to stress
In a stressful environment, the expression of hundreds to thousands of genes changes at once. This response is essential for setting up the proper physiological and developmental programs that allow plants to survive in adverse environments. Thus, the proper response to stress is most likely under substantial selective pressure. Example publications:
- Our PLoS Genetics paper on the evolution of stress response among duplicate genes.
- Our Plant Physiology paper detailing molecular evidence of functional decay in a transcription factor regulating stress response.
Regulatory logic & systems biology
How is gene expression regulated under stress? Substantial knowledge has accumulated on how short DNA motifs in promoter regions, known as cis-regulatory elements, are involved in controlling gene expression. We are interested in deciphering the cis-regulatory code, i.e., how cis-elements work in concert to specify gene expression in response to stress. For more information on our research in this area, check out:
- Our PNAS paper analyzing stress expression data from Arabidopsis thaliana with systems biology approaches in the first comprehensive analysis of plant cis-regulatory code.
- Our Plant Physiology paper detailing a predictive model of spatial transcriptional response to stress.
- Our PLoS Comp Biol paper discussing the utility and limitations of using expression data to identify functionally related genes.
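As a rough illustration of what "cis-elements working in concert" can mean computationally, the sketch below counts occurrences of short motifs on both strands of promoter sequences and then lists the genes whose promoters carry a given combination of motifs. It is a minimal toy example and not the method used in the papers above: the gene names, promoter sequences, and motif strings (`motif_X`, `motif_Y`) are all hypothetical and were made up for illustration.

```python
import re

# Hypothetical toy promoters: gene names and sequences are invented for illustration.
PROMOTERS = {
    "geneA": "TTACGTGGCATTCCGACAT",
    "geneB": "GGACGTGTCAAATTTGCA",
    "geneC": "ATATATCCGACGGTTTAA",
    "geneD": "ATATATGGGTTTAAACCC",
}

# Hypothetical motif strings, not taken from the publications listed above.
MOTIFS = {"motif_X": "ACGTG", "motif_Y": "CCGAC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(seq, motif):
    """Count motif occurrences on both strands, allowing overlapping matches."""
    # A zero-width lookahead finds overlapping matches.
    forward = len(re.findall(f"(?={re.escape(motif)})", seq))
    rc = reverse_complement(motif)
    if rc == motif:
        # Palindromic motif: both strands give the same sites, so count once.
        return forward
    reverse = len(re.findall(f"(?={re.escape(rc)})", seq))
    return forward + reverse


def motif_profile(promoters, motifs):
    """Map each gene to a dict of motif name -> site count in its promoter."""
    return {
        gene: {name: count_sites(seq, motif) for name, motif in motifs.items()}
        for gene, seq in promoters.items()
    }


def genes_with_all(profile, motif_names):
    """List genes whose promoters contain every motif in motif_names."""
    return sorted(
        gene
        for gene, counts in profile.items()
        if all(counts[name] > 0 for name in motif_names)
    )


profile = motif_profile(PROMOTERS, MOTIFS)
both = genes_with_all(profile, ["motif_X", "motif_Y"])
```

In a real analysis, a presence or count table like `profile` would typically serve as the feature matrix for a statistical or machine learning model of expression response, with motif combinations considered alongside single motifs.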
Predictive biology with machine learning
Biology has become a data-rich discipline, with heterogeneous data arriving at a rapid pace. Novel ways of looking at these data have the potential to answer new questions, make high-quality predictions of many biological phenomena through modeling, and, from predictive models, improve our understanding of the underlying mechanisms. Since our first foray into this area, our lab has applied machine learning methods in multiple contexts to predict molecular, physiological, and morphological phenotypes. Below are some example publications in this area:
- Our study in PLoS Comp Biol that jointly considers sequence motifs, chromatin state, and DNA structure to predict transcription factor binding sites.
- Our study in Genome Research on predicting gene expression levels using chromatin accessibility information.
- Our Plant Cell paper combining a large number of heterogeneous characteristics to predict whether a duplicate gene would be retained.
- Another Plant Cell paper detailing a model for predicting genes that will confer a lethal phenotype when mutated.