Multimodal diff-hash

Nov 7, 2011

Michael M. Bronstein

1 Introduction

Computing similarity between objects is a fundamental task in many fields, including medical imaging and biometric security. The objects to be compared can be as varied as functions, images, geometric shapes, or text documents, and each kind of data comes with its own notion of similarity.

The task is harder for multimodal data, where the objects differ in representation, structure, and dimensionality. Such data are common in medical imaging and multimedia retrieval, and objects from different modalities cannot be compared directly, much like apples and oranges. In many cases, however, multimodal similarity can be learned from examples, and the learned similarity can be used to build efficient representations.

This paper develops a new, efficient multimodal hashing algorithm. Although simple, the algorithm is reported to significantly outperform state-of-the-art methods at encoding multimodal similarity.

2 Background

The general problem of multimodal hashing is to represent data from different modalities in a common space, equipped with a single metric that preserves both intra-modal and inter-modal similarities. A simplified setting, cross-modality hashing, considers only the inter-modal dissimilarity and ignores the intra-modal dissimilarities.

The problem of cross-modality hashing boils down to finding two embeddings, ξ and η, one per modality, such that the distance between the embedded points reflects the inter-modal dissimilarity. In many cases this dissimilarity is binary (a pair of objects is either similar or dissimilar), which makes it hard to model but reasonably easy to sample on subsets of the data.
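To make the setting concrete, here is a minimal sketch of two embeddings that map objects of different dimensionality into the same space of binary codes. It assumes a parametric form, the sign of an affine projection, which is suggested by the projection matrices and threshold vectors mentioned later in the text. The function names, the sizes, and the random parameters are hypothetical and only illustrate the shapes involved.

```python
import numpy as np

def hash_embed(data, projection, threshold):
    """Map each row of `data` to an n-bit code in {-1, +1}.

    Assumed form: sign(projection @ x + threshold), with zeros mapped to +1.
    """
    return np.where(data @ projection.T + threshold >= 0, 1, -1)

def hamming_distance(codes_a, codes_b):
    """Number of differing bits between corresponding rows of two code arrays."""
    return np.sum(codes_a != codes_b, axis=-1)

rng = np.random.default_rng(0)
n_bits, dim_x, dim_y = 8, 5, 3          # the two modalities have different dimensions

# Hypothetical, untrained parameters: one projection/threshold pair per modality.
P, a = rng.standard_normal((n_bits, dim_x)), rng.standard_normal(n_bits)
Q, b = rng.standard_normal((n_bits, dim_y)), rng.standard_normal(n_bits)

x = rng.standard_normal((4, dim_x))     # four objects from the first modality
y = rng.standard_normal((4, dim_y))     # four objects from the second modality

codes_x = hash_embed(x, P, a)           # plays the role of xi(x)
codes_y = hash_embed(y, Q, b)           # plays the role of eta(y)
dists = hamming_distance(codes_x, codes_y)
print(dists)
```

Once both modalities live in the same code space, objects that could not be compared directly can be compared with a plain Hamming distance. Learning amounts to choosing the parameters so that this distance is small on similar pairs and large on dissimilar ones.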

3 Cross Modality Similarity-Sensitive Hashing (CM-SSH)

The Cross-Modality Similarity-Sensitive Hashing (CM-SSH) method was introduced as an extension of the boosting-based similarity-sensitive hashing method. Each dimension of the hash acts as a weak binary classifier, and AdaBoost is used to combine these classifiers and maximize their joint performance.
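The boosting view can be sketched as follows. The example assumes that a one-bit weak classifier predicts "similar" when the two modality bits agree, and it applies the standard AdaBoost reweighting step. How CM-SSH selects the projection for each bit is not covered here, and all names are hypothetical.

```python
import numpy as np

def weak_pair_classifier(x, y, p, a, q, b):
    """Return +1 where the one-bit hashes of x and y agree, -1 where they differ."""
    bit_x = np.where(x @ p + a >= 0, 1, -1)
    bit_y = np.where(y @ q + b >= 0, 1, -1)
    return bit_x * bit_y

def adaboost_round(weights, labels, predictions):
    """One standard AdaBoost update; labels and predictions are in {-1, +1}."""
    # Weighted correlation between predictions and labels, clipped for stability.
    r = np.clip(np.sum(weights * labels * predictions), -1 + 1e-12, 1 - 1e-12)
    alpha = 0.5 * np.log((1 + r) / (1 - r))
    # Misclassified pairs (labels * predictions == -1) gain weight.
    new_weights = weights * np.exp(-alpha * labels * predictions)
    return alpha, new_weights / new_weights.sum()

# Toy round: two similar pairs (+1), two dissimilar pairs (-1), last one misclassified.
labels = np.array([1, 1, -1, -1])
predictions = np.array([1, 1, -1, 1])
weights = np.full(4, 0.25)
alpha, new_weights = adaboost_round(weights, labels, predictions)

# The weak classifier itself, applied to random pairs from two modalities.
rng = np.random.default_rng(0)
x, y = rng.standard_normal((4, 5)), rng.standard_normal((4, 3))
bit_agreement = weak_pair_classifier(
    x, y, rng.standard_normal(5), 0.0, rng.standard_normal(3), 0.0
)
```

Each round adds one bit to the hash and shifts weight toward the pairs the current bits get wrong, so the procedure is sequential by construction.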

However, the boosting-based CM-SSH method has high computational complexity and tends to produce longer hashes than necessary, which limits its practical efficiency. This has motivated the search for alternative, more efficient hashing techniques.

4 Cross-Modality Diff-Hash (CM-DIF)

A simpler approach, called diff-hash, was proposed for constructing similarity-sensitive hash functions. In the cross-modality setting, the optimal hashing is obtained by minimizing a loss with respect to the embedding functions, which is equivalent to minimizing the correlations with respect to the projection matrices and threshold vectors.
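The link between the loss and the correlations can be written out under two assumptions that the text above does not state explicitly: the codes take values in {-1, +1}^n, and the loss is the expected Hamming distance over similar pairs (denoted P here) minus the expected Hamming distance over dissimilar pairs (denoted N).

```latex
% Hamming distance between n-bit codes with entries in \{-1,+1\}
d_H\bigl(\xi(x), \eta(y)\bigr) = \tfrac{1}{2}\bigl(n - \xi(x)^{\top}\eta(y)\bigr)

% Assumed loss: small distances on similar pairs, large on dissimilar pairs
L(\xi, \eta) = \mathbb{E}\{ d_H(\xi, \eta) \mid \mathcal{P} \}
             - \mathbb{E}\{ d_H(\xi, \eta) \mid \mathcal{N} \}

% Substituting the first identity, the constant n/2 cancels
L(\xi, \eta) = \tfrac{1}{2}\Bigl( \mathbb{E}\{ \xi^{\top}\eta \mid \mathcal{N} \}
             - \mathbb{E}\{ \xi^{\top}\eta \mid \mathcal{P} \} \Bigr)

% Assumed parametric embeddings
\xi(x) = \operatorname{sign}(Px + a), \qquad \eta(y) = \operatorname{sign}(Qy + b)
```

Under these assumptions, minimizing the loss is the same as minimizing a difference of code correlations, and the unknowns are the projection matrices P, Q and the threshold vectors a, b.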

The resulting Cross-Modality Diff-Hash (CM-DIF) optimizes a difference of expected values to determine the optimal projection matrices and then computes the best-fit thresholds. The non-convex optimization problem is simplified by expressing the correlations through covariance difference matrices and imposing constraints on the projection matrices.
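The two-stage procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the sign is dropped when solving for the projections, the projections are constrained to have orthonormal rows, and they are taken from the singular vectors of the difference between cross-covariance matrices of similar and dissimilar pairs. The thresholds here simply center each projection, which stands in for the best-fit threshold search. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_cm_dif_sketch(x_pos, y_pos, x_neg, y_neg, n_bits):
    """Estimate projections (P, Q) and thresholds (a, b) from paired samples.

    Rows of x_pos/y_pos are similar pairs; rows of x_neg/y_neg are dissimilar pairs.
    """
    if n_bits > min(x_pos.shape[1], y_pos.shape[1]):
        raise ValueError("n_bits cannot exceed the smaller data dimension")
    # Cross-covariance estimates (data assumed roughly zero-mean).
    cov_pos = x_pos.T @ y_pos / len(x_pos)
    cov_neg = x_neg.T @ y_neg / len(x_neg)
    diff = cov_pos - cov_neg                      # shape: dim_x by dim_y
    # Leading singular vectors maximize trace(P @ diff @ Q.T) under orthonormal rows.
    u, _, vt = np.linalg.svd(diff, full_matrices=False)
    P, Q = u[:, :n_bits].T, vt[:n_bits]
    # Simplified thresholds: center every projected dimension.
    a = -(np.vstack([x_pos, x_neg]) @ P.T).mean(axis=0)
    b = -(np.vstack([y_pos, y_neg]) @ Q.T).mean(axis=0)
    return P, a, Q, b

def to_bits(data, projection, threshold):
    """Binary codes in {-1, +1} from the sign of an affine projection."""
    return np.where(data @ projection.T + threshold >= 0, 1, -1)

# Synthetic modalities generated from a shared latent variable z.
rng = np.random.default_rng(0)
n_pairs, n_bits = 2000, 3
z = rng.standard_normal((n_pairs, 4))
x = z @ rng.standard_normal((4, 6)) + 0.1 * rng.standard_normal((n_pairs, 6))
y = z @ rng.standard_normal((4, 5)) + 0.1 * rng.standard_normal((n_pairs, 5))
perm = rng.permutation(n_pairs)                  # shuffling breaks the pairing

P, a, Q, b = fit_cm_dif_sketch(x, y, x, y[perm], n_bits)
codes_x, codes_y = to_bits(x, P, a), to_bits(y, Q, b)
mean_pos = np.mean(np.sum(codes_x != codes_y, axis=1))
mean_neg = np.mean(np.sum(codes_x != codes_y[perm], axis=1))
print(mean_pos, mean_neg)
```

In contrast to the sequential boosting approach, all projections in this sketch come from a single singular value decomposition, so the cost of training does not grow with one optimization per bit.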

In conclusion, the difficulties of multimodal hashing call for a simpler and more efficient algorithm. The proposed Cross-Modality Diff-Hash (CM-DIF) algorithm addresses this need and is reported to outperform the state-of-the-art methods currently in use.
