Publication Date




Embargo Period


Degree Type


Degree Name

Doctor of Philosophy (PHD)


Biomedical Engineering (Engineering)

Date of Defense


First Committee Member

Justin C. Sanchez

Second Committee Member

Ozcan Ozdamar

Third Committee Member

Abhishek Prasad

Fourth Committee Member

Jorge E. Bohorquez

Fifth Committee Member

Christopher Bennett


Brain-Machine Interfaces (BMIs) have the potential to restore function to persons suffering from paralysis and amputations. At present, BMIs have been developed that use cortical neural signals to control prosthetic devices or to stimulate paralyzed limbs. However, these BMIs rely on an external training signal (usually the desired kinematics) as a reference from which to infer an error signal, so that the decoder can adapt appropriately and learn the task. For amputees and paralyzed persons, the desired kinematics cannot be measured directly. We propose to acquire an error or reward signal from the brain itself as a training signal for motor decoders. For this, we adopt an Actor-Critic Reinforcement Learning (RL) paradigm for the BMI. There are several challenges associated with obtaining an error signal from the brain. One of them is that, because neural signals are nonstationary, the error signal is classified with low accuracy and there is no indication of how accurate the signal is. If an indication of the accuracy of the signal can be extracted, we propose that such a system can maintain the performance of the BMI even when the error signal is less than perfect. This is done by incorporating a confidence metric in the weight update rule, where the confidence metric indicates the accuracy of the signal. We propose a synchronous BMI in which the forward path is provided by the motor cortex and the feedback path is provided by the striatum. Computer simulations on synthetic data were performed to test the architecture. The confidence metric can be obtained by different methods; the distance to the decision boundary and a probabilistic measure were implemented. The confidence derived from each classification method (distance or probability) was thresholded to give three output classes: rewarding, non-rewarding, or ambiguous.
As the threshold increased from zero, performance first improved and then, as the threshold increased further, dropped. From this we conclude that there exists an optimum threshold on the Critic output at which the Actor can maintain its performance even though the Critic feedback is noisy. The system was implemented in closed loop with a monkey using a probabilistic classifier, where the probability of the Critic output belonging to one class or the other was used as the confidence measure. Using the confidence measure improved the performance of the system.
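
The thresholding and confidence-weighted update described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the confidence definition |2p - 1|, the linear update form, and the learning rate are all assumptions made for illustration, since the abstract does not state the exact rule.

```python
# Illustrative sketch only; names and formulas are hypothetical, not from the dissertation.

def critic_feedback(p_reward, threshold):
    """Map the Critic's probability of 'rewarding' to (label, confidence).

    Assumed confidence: |2p - 1|, which is 0 at p = 0.5 and 1 at p = 0 or 1.
    Labels: +1 rewarding, -1 non-rewarding, 0 ambiguous (confidence below threshold).
    """
    confidence = abs(2.0 * p_reward - 1.0)
    if confidence < threshold:
        return 0, confidence
    label = 1 if p_reward > 0.5 else -1
    return label, confidence


def update_weights(weights, features, label, confidence, lr=0.1):
    """Assumed update rule: w <- w + lr * confidence * label * x.

    An ambiguous label (0) leaves the weights unchanged, so low-confidence
    Critic outputs do not drive the Actor.
    """
    step = lr * confidence * label
    return [w + step * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    weights = [0.0, 0.0]
    features = [1.0, 2.0]
    for p in (0.9, 0.6, 0.1):
        label, conf = critic_feedback(p, threshold=0.5)
        weights = update_weights(weights, features, label, conf)
        print(p, label, round(conf, 3), [round(w, 3) for w in weights])
```

With a threshold of zero every Critic output is used, however uncertain; with a threshold near one almost every output is treated as ambiguous and the Actor rarely updates. This is one way to see why an intermediate threshold could perform best, as the abstract reports.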


Brain-Machine Interfaces; Reinforcement Learning; Biological Feedback; Feedback Confidence; Machine Learning