Publication Date



Open access

Embargo Period


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science (Arts and Sciences)

Date of Defense


First Committee Member

Ubbo Visser

Second Committee Member

Hüseyin Koçak

Third Committee Member

Stephan Schürer

Fourth Committee Member

Geoff Sutcliffe

Fifth Committee Member

Stefan Wuchty


Technological advancements in many fields have led to huge increases in data production, including data volume, diversity, and the speed at which new data becomes available. At the same time, there is a lack of conformity in the ways data is interpreted. Performing in-depth analyses that make use of various data types and data sources, and extracting knowledge from them, has become one of the many challenges of big data. This is especially the case in the life sciences, where simplification and flattening of diverse data types often lead to incorrect predictions. Effective applications of big data approaches in the life sciences require better, knowledge-based semantic models that are suitable as a framework for big data integration while avoiding oversimplification, such as reducing various biological data types to the gene level. A major challenge in developing such semantic knowledge models, or ontologies, is the knowledge acquisition bottleneck: automated methods are still very limited, and significant human expertise is required. In this research, we describe a methodology to systematize knowledge acquisition and representation, termed the KNowledge Acquisition and Representation Methodology (KNARM). We also present how KNARM was applied to three ontologies, the BioAssay Ontology (BAO), the LINCS FramEwork Ontology (LIFE), and the Drug Target Ontology (DTO), built respectively for three projects: BioAssay Ontology, Library of Integrated Network-Based Cellular Signatures (LINCS), and Illuminating the Druggable Genome (IDG). We also show how these ontologies work together in complex queries.


ontology; semantic web; data science; semi-automated ontology building; ontology building methodology