Publication Date



Open access

Embargo Period


Degree Type


Degree Name

Doctor of Philosophy (PHD)


Electrical and Computer Engineering (Engineering)

Date of Defense


First Committee Member

Mei-Ling Shyu

Second Committee Member

Xiaodong Cai

Third Committee Member

Saman Aliari Zonouz

Fourth Committee Member

Nigel John

Fifth Committee Member

Shu-Ching Chen


With the rapid development of smart devices and the ever-increasing popularity of social media websites such as Flickr, YouTube, Twitter, and Facebook, we have witnessed a huge increase in multimedia data. Recently, social media technology and big data have converged to provide rich content on what happens around the world via text, images, video, audio, and other media. Given the enormous volume of multimedia data, efficient and effective retrieval of relevant information according to users’ needs poses great challenges for traditional text-oriented storage and retrieval systems. Since manually annotating and managing this huge amount of information has become infeasible, data-driven approaches have received increasing attention. As a result, automatically mining interesting patterns and human-understandable semantic features from raw multimedia data to facilitate large-scale knowledge discovery and information retrieval has become an essential research task in today’s multimedia big data analysis. One of the central problems in multimedia big data analysis is automatic data annotation. Automatic annotation of human-readable patterns, such as the semantic concepts in video or image data sets, provides the foundation for content-based search and retrieval. The biggest challenge researchers now face is the semantic gap, that is, the gap between low-level features and high-level concepts. Considerable effort has been devoted to bridging this gap in the multimedia research field. This dissertation focuses mainly on utilizing content information and inter-label correlations to improve annotation accuracy. The multimedia data annotation problem is first converted to a multi-label or multi-class classification problem. Next, to model the correlations among labels mathematically, we design an association affinity network (AAN). In multimedia data sets, the label correlation is usually hidden information.
In addition, a large number of associations can be noisy, and the scores from different models are on different scales. To address these challenges, we propose several steps for utilizing the AAN. First, the output scores from different models are normalized using the Bayesian posterior probability approach. Second, the nodes and links are modeled appropriately for a given application scenario. Third, a link selection module is proposed to filter out noisy links using association rule mining and correlation mining. Fourth, the weights of different links are computed using different models, such as the collaboration model and the regression model. Finally, the newly computed scores are used for classification and/or data retrieval. Negative correlations, which have rarely been explored, are also studied and utilized in this dissertation. The proposed framework is applied to two real-world applications: multi-label high-level semantic concept detection and multi-class biomedical image temporal stage annotation. Experiments on benchmark data sets, such as the TRECVID semantic indexing data sets and the IICBU biomedical image data set, have been conducted to evaluate the effectiveness of the proposed framework. Overall, the proposed framework achieves promising results, and the contributions of its individual components are evaluated. The experimental results demonstrate that properly modeling and utilizing inter-label correlations can help improve multimedia data annotation accuracy and bridge the semantic gap. Several future research directions are also proposed as extensions of the existing framework.
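
The steps listed in the abstract (score normalization, link selection, link weighting, re-scoring) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the abstract gives no formulas, so the Gaussian class-conditional model behind the posterior, the support and confidence thresholds, and the confidence-weighted average used for re-scoring are all assumptions chosen for illustration, and every function name below is hypothetical. Negative correlations and the collaboration and regression weighting models are not covered.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Gaussian density; sigma is floored to avoid division by zero."""
    sigma = max(sigma, 1e-6)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_posterior(scores, labels):
    """Return a function mapping a raw score to P(label = 1 | score).

    Assumption: positive and negative scores are each modeled as Gaussian.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    prior_pos = len(pos) / len(scores)
    mu_p, sd_p = mean(pos), pstdev(pos)
    mu_n, sd_n = mean(neg), pstdev(neg)

    def posterior(s):
        a = gaussian_pdf(s, mu_p, sd_p) * prior_pos
        b = gaussian_pdf(s, mu_n, sd_n) * (1 - prior_pos)
        # Fall back to the prior if both densities underflow to zero.
        return a / (a + b) if a + b > 0 else prior_pos

    return posterior


def select_links(label_rows, min_support=0.2, min_confidence=0.6):
    """Keep directed links src -> dst whose association rule passes both
    thresholds (hypothetical values); the link weight is the confidence."""
    n = len(label_rows)
    k = len(label_rows[0])
    links = {}
    for src in range(k):
        n_src = sum(row[src] for row in label_rows)
        for dst in range(k):
            if src == dst or n_src == 0:
                continue
            n_both = sum(row[src] and row[dst] for row in label_rows)
            support = n_both / n
            confidence = n_both / n_src
            if support >= min_support and confidence >= min_confidence:
                links[(src, dst)] = confidence
    return links


def rescore(posteriors, links):
    """Blend each label's posterior with those of its linked neighbours
    using a confidence-weighted average (an illustrative choice)."""
    out = []
    for dst, own in enumerate(posteriors):
        num, den = own, 1.0
        for (src, d), w in links.items():
            if d == dst:
                num += w * posteriors[src]
                den += w
        out.append(num / den)
    return out
```

In this sketch, each row of `label_rows` is a 0/1 vector of ground-truth labels for one training instance, and `posteriors` holds the normalized per-label scores for one test instance. Labels that frequently co-occur in training pull each other's scores together, while labels with no surviving links keep their own posterior unchanged.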


association affinity network; big data; multimedia concept detection; multimedia information retrieval; bioimage informatics; data mining