Doctor of Philosophy (PHD)
Electrical and Computer Engineering (Engineering)
Date of Defense
First Committee Member
Second Committee Member
Manohar N. Murthi
Third Committee Member
Fourth Committee Member
Fifth Committee Member
The rapid growth in available sensor/data feeds and databases has increased the need to automate learning, knowledge discovery, reasoning, and inference, and has generated renewed interest in machine learning (ML). The practical utility and effectiveness of ML algorithms depend greatly on how well the relevant parameters can be learned from data, and the parameter learning phase of modern ML environments has emerged as a significant challenge because of the increasing complexity of the data being gathered. Adequate, representative statistical training data are often too costly to obtain or are simply unavailable; available real-world data are usually rife with incomplete, unknown, or missing entries for a host of reasons, including simple data entry errors, security and privacy concerns, and the difficulty of obtaining data corresponding to infrequent events. Data imputation strategies employed to deal with data “missingness” run the gamut from interpolating the missing value from values of other variables, to using a data “missingness” probability distribution to estimate the missing value, to simply ignoring data records possessing missing values and using only the “clean” records to learn the parameters. Interpolating or employing a “missingness” distribution for data imputation is a recipe for untrustworthy decisions when there is little or no evidence to support the assumptions made; disregarding data records possessing imperfections has the potential to destroy critical evidence.
The main objective of this research work is to develop a comprehensive strategy that can model and account for a wider variety of data imperfections, including those that arise in human-generated “soft” data; incorporate and propagate the information contained in these data imperfections throughout the decision-making process; conduct the learning, knowledge discovery, reasoning, and inference processes in a computationally efficient manner; and generate conclusions that are appropriately calibrated to reflect the underlying uncertainties. The approach we take is based on a framework that employs interval-valued (i.v.) probability functions. They are better suited to, and offer more flexibility for, handling a wider variety of uncertainties, and they arise naturally in partial elicitation (when insufficient knowledge is available) or when it is too time consuming to gather the knowledge necessary to estimate exact probabilities. We do not insist on any monotonicity condition on the i.v. probability functions we utilize, and take the viewpoint that these i.v. probabilities, which we refer to as PrBounds, emerge from a single underlying probability distribution about which agents have only partial information. With a fresh perspective on the i.v. counterparts of conditioning and independence, we then propose a framework that allows parameter learning, knowledge discovery, reasoning, and inference in a computationally efficient manner, in much the same way as one would with probabilistic graphical models. We show how PrBounds can be extracted from imperfect datasets where the values of different attributes may be dependent and embrace more general evidential uncertainty. When the attribute values are unknown/missing or are known to lie within a set of values, PrBounds can be learned via a computationally tractable and efficient frequency counting method.
The probabilities associated with an arbitrary imputation strategy, including the underlying “true” probabilities, are guaranteed to lie within the PrBounds learned in this manner. We also develop new Dempster-Shafer (DS) belief-theoretic and PrBounds-based models of an imperfect implication rule which are consistent with Bayesian and classical logic models. We demonstrate how such a rule can be fused with an imperfect antecedent to generate the PrBounds associated with the rule consequent. Finally, inspired by deep learning neural network architectures but operating within the proposed PrBounds-based framework, we develop what we refer to as a deep fusion network (DFN), which allows one to automate fusion of evidence from input data, fusion parameter selection, and classification of potentially uncertain data generated from multi-modal sensors.
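The frequency counting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes a single attribute, represents each entry as a set of candidate values (a singleton when known, the whole domain when missing), and takes the lower bound as the fraction of records that certainly carry the value and the upper bound as the fraction that possibly carry it. The names `pr_bounds`, `imputed_frequencies`, `DOMAIN`, and the `load` attribute are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-valued domain for a single attribute.
DOMAIN = {"low", "high"}


def pr_bounds(records, attribute, value):
    """Return (lower, upper) bounds on the relative frequency of `value`.

    Each record maps an attribute name to a set of candidate values.
    """
    n = len(records)
    if n == 0:
        raise ValueError("need at least one record")
    # Entries known to equal `value` count under every completion.
    certain = sum(1 for r in records if r[attribute] == {value})
    # Entries that may equal `value` count under at least one completion.
    possible = sum(1 for r in records if value in r[attribute])
    return Fraction(certain, n), Fraction(possible, n)


def imputed_frequencies(records, attribute, value):
    """Relative frequency of `value` under every possible completion."""
    n = len(records)
    choices = [sorted(r[attribute]) for r in records]
    return [
        Fraction(sum(v == value for v in completion), n)
        for completion in product(*choices)
    ]


records = [
    {"load": {"high"}},
    {"load": {"high"}},
    {"load": {"low"}},
    {"load": set(DOMAIN)},  # missing entry: any value in the domain
    {"load": {"low"}},
]

lower, upper = pr_bounds(records, "load", "high")
freqs = imputed_frequencies(records, "load", "high")
print(lower, upper)
```

In this toy dataset, every way of filling in the missing entry yields a relative frequency between `lower` and `upper`, which mirrors, for the single-attribute case only, the guarantee stated in the abstract. Enumerating completions is exponential and is used here only as a check; the counting in `pr_bounds` is linear in the number of records.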
Uncertain; Imperfect; Data; Machine Learning; Dempster-Shafer; Probability
Heendeni P. Don, Janith, "Learning and Reasoning with Imperfect Data" (2018). Open Access Dissertations. 2192.