Degree Name

Doctor of Philosophy (PhD)


Biostatistics (Medicine)

First Committee Member

Hemant Ishwaran

Second Committee Member

J. Sunil Rao

Third Committee Member

Daniel J. Feaster

Fourth Committee Member

Miroslav Kubat


Abstract

Extending previous work on quantile classifiers, we propose a random forests quantile classifier for the class-imbalanced data problem. The new classifier assigns an example to the minority class if its estimated minority-class conditional probability exceeds the unconditional probability of observing a minority-class instance. The motivation for the random forests quantile classifier stems from a density-based approach and leads to the useful property that it maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, and unlike the traditional Bayes classifier, the random forests quantile classifier can achieve near-zero risk in highly imbalanced problems while simultaneously optimizing the true positive and true negative rates. A common strategy employed by classifiers for imbalanced data is to undersample the majority class and apply the Bayes rule. We show that this strategy allows the Bayes rule (the median classifier, q = 0.5) to jointly optimize the true positive and true negative rates. At the same time, we show that the random forests quantile classifier is invariant to such sampling strategies and retains its optimality regardless. Moreover, it outperforms undersampling with respect to G-mean performance and variable selection in rare, high-dimensional, and highly imbalanced settings.
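The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: it uses scikit-learn's RandomForestClassifier as a stand-in for the forest probability estimates, and the simulated data and all parameter choices (sample size, signal strength, number of trees) are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated imbalanced data: the minority class (1) is rare and depends
# on the first feature only (illustrative, not from the dissertation).
n = 4000
X = rng.normal(size=(n, 5))
logit = 2.0 * X[:, 0] - 3.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Forest-based estimate of the minority-class conditional probability.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
prob1 = rf.predict_proba(X_te)[:, 1]

# Quantile rule: classify as minority when the conditional probability
# exceeds the unconditional minority prevalence q, rather than 0.5.
q = y_tr.mean()
pred_quantile = (prob1 > q).astype(int)
pred_bayes = (prob1 > 0.5).astype(int)  # standard Bayes (median) rule

def tpr_tnr(y_true, y_pred):
    """True positive and true negative rates."""
    tpr = (y_pred[y_true == 1] == 1).mean()
    tnr = (y_pred[y_true == 0] == 0).mean()
    return tpr, tnr

print("quantile rule (TPR, TNR):", tpr_tnr(y_te, pred_quantile))
print("Bayes rule    (TPR, TNR):", tpr_tnr(y_te, pred_bayes))
```

Since q < 0.5 under class imbalance, the quantile threshold can only raise the true positive rate relative to the 0.5 threshold, at some cost in true negative rate; the abstract's claim is that this trade-off maximizes the sum of the two rates.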


Keywords

Weighted Bayes Classifier; Random Forests; Class Imbalance; Minority Class; Class Probability Estimation; Unequal Misclassification Costs

Available for download on Saturday, May 30, 2020