Publication Date

2013-04-17

Availability

Open access

Embargo Period

2013-04-17

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PHD)

Department

Electrical and Computer Engineering (Engineering)

Date of Defense

2012-12-11

First Committee Member

Xiaodong Cai

Second Committee Member

Kamal Premaratne

Third Committee Member

Mei-Ling Shyu

Fourth Committee Member

Dimitris Papamichail

Fifth Committee Member

Nigel John

Abstract

Alternative splicing of precursor mRNA (pre-mRNA) provides an important means of regulating gene expression and generating transcriptomic and proteomic diversity in most eukaryotes. A number of special proteins, named splicing factors, can regulate the alternative splicing process by binding to certain short subsequences on pre-mRNA, named splicing regulatory elements (SREs). Therefore, identification of these SREs and prediction of their combinatorial effects are very important to the understanding of the mechanisms that regulate splicing. In this dissertation, we develop two methods for identifying SREs and their interactions. In the first method, we use the traditional enrichment-based approach, which identifies SREs by comparing frequencies of all hexamers in two discriminative data sets generated from mouse RNA-Seq data. The SREs are identified as hexamers that are enriched in the positive data set but under-represented in the negative data set. We also analyze the position preference of the identified SREs and compare their frequencies in constitutive exons and alternatively spliced exons. In the second method, we first derive a mathematical model for splicing regulation based on the principles of thermodynamics. We include the effects of both SREs and interactions between two SREs in the model. We then apply the model to identify SREs and SRE interactions with linear regression. Since the linear regression model contains a very large number of variables, the traditional inference method does not perform well. To overcome this problem, we develop a novel framework for inferring the high-dimensional linear model. Finally, we systematically study the alternative regions, arising from alternative splicing, alternative first exon or alternative last exon events in 105 breast cancer patients using RNA-Seq data. The identified aberrant alternative regions show very interesting associations with cancer development and provide important candidates for cancer diagnosis and cancer therapies.
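
The enrichment-based idea in the first method (comparing hexamer frequencies between a positive and a negative sequence set) can be illustrated with a minimal Python sketch. The abstract does not state the scoring statistic or any significance test, so the log2 frequency ratio with a pseudocount used here is an assumption chosen for illustration, and the function names `kmer_counts` and `log2_enrichment` are hypothetical rather than taken from the dissertation.

```python
from collections import Counter
import math

K = 6  # hexamer length, as described in the abstract


def kmer_counts(seqs, k=K):
    """Count overlapping k-mers made only of A, C, G, T in a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[kmer] += 1
    return counts


def log2_enrichment(pos_seqs, neg_seqs, k=K, pseudocount=1.0):
    """Score each observed k-mer by the log2 ratio of its frequency in the
    positive set to its frequency in the negative set.

    Assumption: a simple pseudocount-smoothed ratio stands in for whatever
    statistic the dissertation actually uses.
    """
    pos = kmer_counts(pos_seqs, k)
    neg = kmer_counts(neg_seqs, k)
    n_kmers = 4 ** k  # number of possible k-mers, used for smoothing
    pos_total = sum(pos.values()) + pseudocount * n_kmers
    neg_total = sum(neg.values()) + pseudocount * n_kmers
    scores = {}
    for kmer in set(pos) | set(neg):
        p = (pos[kmer] + pseudocount) / pos_total
        q = (neg[kmer] + pseudocount) / neg_total
        scores[kmer] = math.log2(p / q)
    return scores


if __name__ == "__main__":
    # Toy sequences for illustration only, not real data.
    scores = log2_enrichment(["GAAGAAGAAGAA"], ["ACGTACGTACGT"])
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 3))
```

Hexamers with a high positive score are over-represented in the positive set relative to the negative set; in practice a significance test and multiple-testing correction would be needed before calling any hexamer a candidate SRE.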

Keywords

splicing; RNA-Seq; SRE; interaction; cancer
