*Abstract:* There has been considerable recent interest in learning common
representations for multiple views of data. These views may belong to
different modalities or languages. Typically, such common representations
are learned using a parallel corpus between the two views (say, 1M images
and their English captions). In this work, we address a real-world scenario
where no direct parallel data is available between two views of interest
(say, V1 and V2) but parallel data is available between each of these
views and a pivot view (V3). We propose a model for learning a common
representation for V1, V2 and V3 using only the parallel data available
between V1V3 and V2V3. The proposed model is generic and even works when
there are n views of interest and only one pivot view which acts as a
bridge between them. We focus on two downstream applications: (i) transfer
learning between languages L1, L2, ..., Ln using a pivot language L, and
(ii) cross-modal access between images and a language L1 using a pivot
language L2. We evaluate our model on two datasets: (i) the publicly
available multilingual TED corpus and (ii) a new multilingual multimodal
dataset created and released as part of this work. On both datasets, our
model outperforms state-of-the-art approaches.
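The abstract does not spell out the training objective, but the pivot idea can be illustrated with a toy sketch. Everything below (linear encoders, a least-squares fit, the dimensions) is an illustrative assumption, not the model from the talk: each non-pivot view is mapped so as to match a shared pivot embedding, after which V1 and V2 become directly comparable even though no V1-V2 parallel data was ever seen.

```python
import numpy as np

# Toy sketch (all dimensions, data, and linear encoders are illustrative
# assumptions): V1 and V2 are never observed together, yet aligning each
# of them to a pivot view V3 makes them directly comparable.
rng = np.random.default_rng(0)
d_lat, d1, d2, d3, k = 5, 8, 7, 6, 5
A1, A2, A3 = (rng.normal(size=(d, d_lat)) for d in (d1, d2, d3))

def views(n):
    """Sample n items and render each item in all three views."""
    z = rng.normal(size=(n, d_lat))
    return z @ A1.T, z @ A2.T, z @ A3.T

# Disjoint training corpora: V1-V3 pairs and V2-V3 pairs, no V1-V2 pairs.
x1a, _, x3a = views(200)
_, x2b, x3b = views(200)

# Fix a random pivot encoder; fit V1/V2 encoders to match pivot embeddings.
W3 = rng.normal(size=(d3, k))
W1, *_ = np.linalg.lstsq(x1a, x3a @ W3, rcond=None)
W2, *_ = np.linalg.lstsq(x2b, x3b @ W3, rcond=None)

# Cross-view retrieval V1 -> V2 on held-out items never seen as a pair.
x1t, x2t, _ = views(50)
e1, e2 = x1t @ W1, x2t @ W2
dists = ((e1[:, None, :] - e2[None, :, :]) ** 2).sum(-1)
acc = float((dists.argmin(axis=1) == np.arange(50)).mean())
print(f"V1->V2 retrieval accuracy via the pivot: {acc:.2f}")
```

On this noiseless linear toy data the retrieval succeeds because matching each view to the pivot transitively aligns V1 and V2; the actual model in the talk handles the harder nonlinear, noisy setting.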

Joint work with Janarthanan R., Sarath Chandar A. P., and Mitesh Khapra.


Prof. Ravindran is currently an associate professor in Computer Science at
IIT Madras. He has nearly two decades of research experience in machine
learning, specifically reinforcement learning. His current research
interests center on learning from and through interactions and span the
areas of data mining, social network analysis, and reinforcement learning.