Publication Date



Open access

Embargo Period


Degree Type


Degree Name

Doctor of Philosophy (PHD)


Molecular and Cellular Pharmacology (Medicine)

Date of Defense


First Committee Member

Nagi Ayad

Second Committee Member

Stephan Schürer

Third Committee Member

Peter Buchwald

Fourth Committee Member

David Robbins

Fifth Committee Member

Stefan Wuchty


Biological information continues to grow exponentially, fueled by massive data generation projects such as the Human Genome Project, The Cancer Genome Atlas (TCGA), and the Library of Integrated Network-based Cellular Signatures (LINCS). Unprecedented amounts and varieties of data (big data) have the potential to bring enormous scientific advances. Such data-driven research relies on advanced computational approaches for data integration and analysis. While bioinformatics encompasses many fields, the focus of my research has been to predict small molecules that interact with protein targets of interest and could ultimately become therapeutically useful drugs. Drug resistance in newly diagnosed tumors is often the major obstacle to the success of cancer chemotherapy. Understanding the molecular mechanisms underlying such resistance is necessary to develop therapeutic strategies that improve current clinical protocols. Heterogeneity in tumor cell populations challenges the efficacy of targeted therapeutics. However, research into adaptive cellular responses to targeted therapy has facilitated the development of combination therapies that disrupt these resistance mechanisms. We have developed new approaches to therapeutic discovery via molecular modeling and machine learning. This thesis presents an attempt to integrate biological and computational resources to discover novel therapeutic small molecules using ligand- and structure-based modeling techniques. First, a general computational screening approach to identify novel multitarget kinase/bromodomain inhibitors from millions of commercially available small molecules is described. This pipeline identified eight novel BRD4 inhibitors, among them a first-in-class dual BRD4-EGFR inhibitor.
To further characterize these compounds, I quantified their binding potential for BRD4 biochemically using an AlphaScreen assay and evaluated further improvements to our docking models by performing molecular dynamics (MD) simulations with the compounds that displayed activity. Finally, to extend the applicability and performance of this work toward a more global predictive architecture, I applied multitask deep neural networks and single-task learning methods to the problem of predicting ligand activity across the portion of the human kinome for which bioactivity information is available. I found that multitask deep learning improves enrichment of active compounds across all kinase targets, regardless of the amount of activity information available and the similarity among active kinase compounds. This research demonstrates that large-scale data-driven modeling approaches can result in novel small molecule discoveries and introduces a framework that the scientific community can use to improve computational screening and machine learning methodologies for drug discovery.


drug discovery; machine learning; computational chemistry; bioinformatics; molecular modeling; big data