Open access

Degree Name

Doctor of Philosophy (PHD)


Biostatistics (Medicine)

First Committee Member

Jonnagadda Sunil Rao

Second Committee Member

Daniel J. Feaster

Third Committee Member

Shari Messinger Cayetano

Fourth Committee Member

Maria M. Llabre


Health disparity is an important public health policy concern because the social inequalities in population health that it reflects affect not only the people who experience them but also the population as a whole. The poorer health of disadvantaged groups lowers the overall health of the nation, and this inequality is costly and burdensome to the healthcare system. Health disparity studies allow us to identify and understand disparities and, eventually, to design intervention strategies that reduce them more effectively. This thesis focuses on building new statistical models for disparity studies that can handle the complex data types encountered today. We propose tree-based models that reveal how disparities are distributed in a population through the hierarchical interaction between individual-level variables (such as clinical or genetic variables) and social determinants of health (such as socioeconomic status and education level).

Precision medicine has the potential to revolutionize medicine because, in theory, clinical decisions can be tailored more closely to the individual patient. Not surprisingly, there has been growing interest in identifying and reducing disparities using precision medicine constructs. Central to this paradigm is the search for what we term disparity subtypes. Building on the tree-based models developed here, together with another framework known as peeling, we develop a new statistical framework for identifying disparity subtypes.

Although disparity science has traditionally focused on social determinants of health, the move toward an integrative framework that also includes biological determinants requires researchers to find a common language and framework for connecting the two. Moreover, as the volume of biological data grows, so does the amount of contextual information that researchers collect.
It is therefore imperative to be able to quantify the relative importance of different contextual factors for a given disease phenotype. To this end, we generalize the concept of genetic penetrance to contextual social determinants of health in a population and derive a methodology for ranking and prioritizing different social determinants. This should aid our earlier modeling efforts, making it easier to identify important interactions by providing a method to screen a collection of potential social determinants.
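
The abstract does not state how the generalized penetrance is estimated or how social determinants are ranked. Purely as an illustration, the sketch below assumes a binary disease phenotype D and binary contextual factors F. By analogy with genetic penetrance, it takes the empirical penetrance of a factor to be P(D = 1 | F = 1) and ranks factors by their excess penetrance P(D = 1 | F = 1) - P(D = 1 | F = 0). This scoring rule is our assumption, not the estimator derived in the thesis. The function names `penetrance` and `rank_factors` and the toy factors `low_income` and `no_degree` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def penetrance(phenotype: Sequence[int], factor: Sequence[int], level: int = 1) -> float:
    """Empirical P(D = 1 | F = level) for a binary phenotype D and binary factor F.

    Hypothetical helper: a plain analogue of genetic penetrance, with the
    genotype replaced by a contextual factor level.
    """
    # Phenotype values of the individuals who have the requested factor level
    cases = [d for d, f in zip(phenotype, factor) if f == level]
    if not cases:
        raise ValueError(f"no individuals with factor level {level}")
    return sum(cases) / len(cases)


def rank_factors(
    phenotype: Sequence[int], factors: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Rank binary factors by excess penetrance, largest first.

    Assumed scoring rule (not from the thesis):
    score = P(D = 1 | F = 1) - P(D = 1 | F = 0).
    """
    scores: Dict[str, float] = {}
    for name, values in factors.items():
        if len(values) != len(phenotype):
            raise ValueError(f"factor {name!r} has a different length than the phenotype")
        exposed = penetrance(phenotype, values, level=1)
        unexposed = penetrance(phenotype, values, level=0)
        scores[name] = exposed - unexposed
    # Sort by score in descending order
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only: 8 individuals, 2 hypothetical factors
    phenotype = [1, 1, 1, 0, 0, 0, 1, 0]
    factors = {
        "low_income": [1, 1, 1, 1, 0, 0, 0, 0],
        "no_degree": [1, 0, 0, 1, 1, 0, 1, 0],
    }
    print(rank_factors(phenotype, factors))
    # [('low_income', 0.5), ('no_degree', 0.0)]
```

A difference in penetrances is used here only because it is the simplest screening score. A ratio, a model-adjusted estimate, or a score for multi-level factors would be equally plausible readings of "generalized penetrance."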


disparity estimation; tree-based algorithm; individual and environmental factor interaction