Projects Currently Funded by the National Institutes of Health

NIH – R01 “Quantifying Molecular Consequences of Human Missense Variants with Large-Scale Interactome Perturbation Studies,” joint with Dr. Haiyuan Yu of the Weill Institute for Cell and Molecular Biology and Department of Biological Statistics and Computational Biology at Cornell.

The dramatic increase of DNA variants discovered through advances in sequencing technologies has been inadequately translated into therapeutic successes. Although many of these variants are related to human disorders, the overwhelming number of non-functional variants makes the assessment of functional significance a steep challenge. In this study, we aim to develop a high-throughput pipeline to quickly clone and directly test a large number of coding variants for their impact on the human interactome network. We will use the results to build a machine learning pipeline that predicts the functional impact of all coding variants, in anticipation that both our experimental data and computational pipeline will lead to broad clinical and therapeutic applications.

NIH – R01 “Genetic Transmission of Components of the Human Gut Microbiome,” joint with Dr. Ruth Ley of the Max Planck Institute for Developmental Biology; Dr. Timothy Spector of the Department of Genetic Epidemiology at King’s College London; and Dr. Ilana Brito of the Meinig School of Biomedical Engineering at Cornell.

Despite the importance of both variation in the human genome and variation in the gut microbiome to human health, there is currently little knowledge connecting the two. However, it is likely that variation in the human genome can result in differences in the composition of the gut microbiota, with potential impact on disease outcomes. Results obtained from the proposed research will bridge this knowledge gap, and will ultimately be used to improve the lifestyles of individuals suffering from common diseases such as obesity and diabetes, and to develop preventative measures to mitigate the manifestation of disease.

NIH – R01 “Heterochromatin and Satellite Repeat Sequence Variation in Natural Populations,” joint with Dr. Daniel Barbash.

This project aims to apply novel bioinformatics approaches and experimental designs to quantitatively describe and understand the turnover of heterochromatic satellite DNA sequences in the genome. Focusing on Drosophila species, we will analyze both short repeated “words” (“kmers”) and complex satellite structures in the genome sequences of inbred lines of diverse species and of mutation-accumulation lines. We will model satellite changes as a Gaussian process, and score their meiotic behavior by testing for departures from Mendelian segregation through genome sequencing of backcross progeny.

NIH – R01 subcontract with University of Arizona “Reference-Quality Drosophila Genome Assemblies for Evolutionary Analysis of Previously Inaccessible Genomic Regions,” joint with Dr. Rod Wing of the University of Arizona and Dr. Manyuan Long of the University of Chicago.

Dr. Clark’s role on this project is to investigate the evolution of previously inaccessible regions of the Drosophila genome, including piRNA clusters, heterochromatic repeats and the Y chromosome. The Clark lab will generate inbred lines and their F1 hybrids of Drosophila melanogaster and several other species of Drosophila to analyze polymorphism and evolution of piRNA clusters and heterochromatic repeats. The whole-genome library construction and sequencing will be done at the University of Arizona by the lab of Dr. Rod Wing, and piRNA and miRNA libraries will be constructed and sequenced at Cornell. Much of the effort in the Clark lab will be in computational analysis of these sequences, producing and testing the assemblies, and developing models of evolutionary divergence of these genome elements (piRNA clusters and heterochromatin). The lab will also examine the structure and organization of Y chromosomal genes and repeats. The results will be shared with Drs. Manyuan Long (Chicago) and Rod Wing (University of Arizona) in this collaborative effort.

NIH – R01 “Regulation of Gamete Use and Neural Pathways in Reproduction,” joint with Dr. Mariana Wolfner.

Using Drosophila melanogaster as a model for male- and female-derived proteins that interact after mating and prior to fertilization, this project aims to test the roles of candidate genes in this process through a series of knockdown experiments conducted across a range of natural variation. Aim 1 considers genes that are expressed in the female, while Aim 2 focuses on genes expressed in the male. The project is significant for understanding the molecular nature of mating interactions, and we anticipate that the results will be relevant to idiopathic infertility in humans, which appears to arise from a reproductive incompatibility between the particular pair of individuals involved.

NIH – R21 “Improving the Efficiency and Control of CRISPR/Cas9 Gene Drive Systems,” an exploratory project with Drs. Jackson Champer and Philipp Messer of the Department of Computational Biology at Cornell.

This project will develop new CRISPR gene drive strategies with reduced resistance potential and test them by studying the spread of driver constructs over several generations in large cage populations of Drosophila melanogaster. Gene drives are a promising new way of reducing the number of people infected with malaria, dengue, Zika, and other mosquito-borne diseases.

NIH – R01 “Ethnic Differences in Iron Absorption,” joint with Drs. Kimberly O’Brien and Zhenglong Gu of the Department of Nutritional Sciences at Cornell.

Genetic variation in human populations is increasingly recognized to contribute to individual phenotypic differences, variable metabolic traits and differential susceptibility to common chronic and metabolic diseases. Understanding the genetic variation underlying metabolic traits is particularly important for iron (Fe), given that Fe deficiency remains the most widespread micronutrient deficiency worldwide, while Fe overload is thought to contribute to a number of common chronic diseases including type II diabetes, cirrhosis, cardiomyopathy and cancer. This project takes a multidisciplinary approach to study variation in genes that control Fe metabolism and utilization, in order to shed light on the genetic basis of population differences in Fe metabolism and disease susceptibility and to inform population-specific dietary Fe intake recommendations, with the long-term goal of minimizing risk of chronic diseases.

NIH – R01 “Improved Methods for Inference of Genotype-Specific Response to Environmental Toxins,” joint with Drs. Julien Ayroles of Princeton University and Noah Zaitlen of the University of California, San Francisco.

This collaborative project will quantify non-additive impacts of hexavalent chromium exposure on neurological phenotypes of Drosophila, using a novel single-fly genotype-phenotype mapping strategy that employs a genetically controlled outbred population.  The Clark lab is primarily responsible for RNAi and CRISPR validation of specific targeted genes discovered in the initial screen.

NIH – R01 “Learning Dynamics of Biological Processes from Time Course Omics Datasets,” joint with Drs. Sumanta Basu of the Department of Biological Statistics & Computational Biology at Cornell, and Martin Wells and Myung Hee Lee, both of Weill Cornell Medical College.

Complex biological processes, including organ development, immune response and disease progression, are inherently dynamic. Learning their regulatory architecture requires understanding how components of a large system dynamically interact with each other and give rise to emergent behavior. Recent experimental advances have made it possible to investigate these biological systems in a data-driven fashion at high temporal resolution, allowing identification of new genes and their regulatory interactions. Longitudinal omics data sets are becoming increasingly common in clinical practice as well. Information on these collections of interacting genes can be integrated to gain systems-level insights into the roles of biological pathways and processes, including progression of diseases. Consequently, developing interpretable methods for learning functional relationships among genes, proteins or metabolites from high-dimensional time series data has become a timely research problem.