2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

Abstract

Because of the large heterogeneity gaps between image, text, and video data, finding content similarities across multimedia data remains a challenging open problem. In this paper, we propose to integrate high-level feature extraction and the learning of common data representations, based on latent semantic regression, into a unified optimization framework for hashing multimedia data. We show that the proposed latent semantic regression approach yields a discriminative solution that maximizes the inter-modal correlation while preserving the intra-modal similarity of high-level features. In this way, the heterogeneous data are embedded into their common label space more effectively. The embedded feature representations have a natural interpretation: they are proportional to the probabilities of the classes to which each sample belongs. The training time complexity of the proposed learning scheme is linear in the data size, i.e., O(N), and experiments on the popular multimodal datasets Wiki and NUS-WIDE show that it can significantly improve cross-modal hashing, by up to 28% in terms of mean average precision (mAP).
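To make the evaluation setting concrete, the following is a minimal sketch, not the paper's algorithm. It assumes plain ridge regression as a stand-in for the latent semantic regression step, a per-sample mean threshold for binarization, and single-label data; the names `fit_label_regression`, `to_hash_codes`, and `mean_average_precision`, as well as the synthetic features, are hypothetical. It illustrates how two modalities can be mapped into a shared label space, hashed, and scored with Hamming-ranking mAP. The ridge fit shown here costs time linear in the number of samples N for a fixed feature dimension.

```python
import numpy as np

def fit_label_regression(X, Y, lam=1e-3):
    """Ridge regression of features X (N x d) onto one-hot labels Y (N x C).

    Assumption: a simple stand-in for the paper's regression step.
    Forming X.T @ X is linear in N for a fixed feature dimension d.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def to_hash_codes(X, W):
    """Embed into the label space, then set a bit where a class score
    exceeds that sample's mean score (hypothetical binarization rule)."""
    Z = X @ W
    return (Z > Z.mean(axis=1, keepdims=True)).astype(np.uint8)

def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    """mAP over Hamming rankings; an item is relevant if it shares the query's label."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        dist = np.count_nonzero(db_codes != q, axis=1)  # Hamming distance to each item
        order = np.argsort(dist, kind="stable")         # rank database by distance
        rel = (db_labels[order] == ql).astype(float)
        if rel.sum() == 0:
            continue                                    # skip queries with no relevant item
        prec_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append(float((prec_at_k * rel).sum() / rel.sum()))
    return float(np.mean(aps)) if aps else 0.0

# Synthetic two-modality toy data (made up for illustration only).
rng = np.random.default_rng(0)
n_classes, n = 3, 300
labels = rng.integers(0, n_classes, size=n)
Y = np.eye(n_classes)[labels]
X_img = Y @ rng.normal(size=(n_classes, 8)) + 0.01 * rng.normal(size=(n, 8))
X_txt = Y @ rng.normal(size=(n_classes, 5)) + 0.01 * rng.normal(size=(n, 5))

# Fit one regressor per modality on the database split (samples 50 onward).
W_img = fit_label_regression(X_img[50:], Y[50:])
W_txt = fit_label_regression(X_txt[50:], Y[50:])

# Image-to-text retrieval: held-out image queries against the text database.
score = mean_average_precision(
    to_hash_codes(X_img[:50], W_img), labels[:50],
    to_hash_codes(X_txt[50:], W_txt), labels[50:],
)
```

Because both modalities are regressed onto the same one-hot targets, their codes live in one space and can be compared directly with Hamming distance, which is the property the shared label-space embedding is meant to provide.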