Title: Deep Learning based Large Scale Visual Recommendation and Visual
Semantic Embedding for E-Commerce.

Speaker: Dr. Krishnendu Chaudhury, Principal Data Scientist, Flipkart
Time, Place: Mon. 29 Aug. 2016, 17.00, RM101


Two related but separate topics pertaining to e-commerce will be covered:
1. Deep learning based visual recommendation
2. Deep learning based visual semantic embedding

1. Deep learning based visual recommendation
Recommending catalog items that are visually similar to a catalog item the
user is browsing is an important problem in e-commerce. We refer to this
problem as "CIVR" (Catalog Image based Visual Recommendation).
CIVR is a very challenging task in its own right, due to the extreme
variety among the catalog items that could be deemed similar: dress items
may be hanging, laid flat on a table, or worn by different models or
mannequins having different complexions and/or hair color, standing in
different poses, etc. Furthermore, the human notion of similarity is
extremely abstract and complex. Two t-shirts, one with a Batman print and
another with a Superman print, may be called "similar" by human beings,
while, in terms of pixels, there may be very little in common between the
two images.
The main contribution of this work is a deep CNN architecture for a visual
recommendation system, which has been launched with much user
satisfaction. Our deep network generates an embedding vector for each
image; the Euclidean distance between two embedding vectors measures the
(dis)similarity between the corresponding images. Our embedding captures
high level abstractions as well as low level details, both of which are
important for visual similarity. In doing so, we provide practical,
evidence-based support for the parallel deep and shallow network paradigm
of Deep Ranking. The embedding generated by our network is quite robust to
background, pose variations, partial views, etc. Finally, we provide
experimental results on multiple related approaches to empirically justify
our approach.
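
As a rough illustration of how Euclidean distance between embedding
vectors can drive recommendation, the sketch below ranks catalog items by
their distance to the item being browsed. It is not the speaker's
implementation: the function names, the item ids, and the 3-dimensional
vectors are all made up for this example, and a real system would take its
embeddings from the trained network and would likely use an approximate
nearest-neighbor index rather than a full scan.

```python
import math

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(query_id, catalog, k=2):
    """Return the ids of the k catalog items closest to the query item."""
    query = catalog[query_id]
    scored = [
        (euclidean_distance(query, vec), item_id)
        for item_id, vec in catalog.items()
        if item_id != query_id  # never recommend the item being browsed
    ]
    scored.sort()  # smallest distance (most similar) first
    return [item_id for _, item_id in scored[:k]]

# Hypothetical toy embeddings; real ones would come from the CNN.
catalog = {
    "batman_tshirt": [0.9, 0.1, 0.0],
    "superman_tshirt": [0.8, 0.2, 0.1],
    "floral_dress": [0.0, 0.9, 0.7],
    "striped_dress": [0.2, 0.8, 0.9],
}

similar = recommend("batman_tshirt", catalog, k=2)
print(similar)
```

With these toy values the other t-shirt is ranked first, which mirrors the
Batman/Superman example above: the two items are close in embedding space
even though their pixels differ.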

2. Deep learning based visual semantic embedding

A multimodal embedding is learnt whereby product images and visual
keywords describing them are jointly embedded into the same metric space.
This is used to improve search results and to identify discrepancies
between descriptive phrases and images of products.
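
The two uses mentioned above can be sketched as follows, assuming that
images and keywords have already been mapped into one shared space and
that Euclidean distance is the metric. Every name, vector, and the
threshold value here is hypothetical and chosen only for illustration; the
abstract does not specify how the embedding is trained or which metric is
used.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors in the shared space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_images(keyword_vec, image_vecs):
    """Search: order image ids from nearest to farthest from a keyword."""
    return sorted(image_vecs, key=lambda i: distance(keyword_vec, image_vecs[i]))

def discrepant_keywords(image_vec, listed, keyword_vecs, threshold):
    """Flag listed keywords that lie farther than `threshold` from the image."""
    return [kw for kw in listed
            if distance(image_vec, keyword_vecs[kw]) > threshold]

# Hypothetical 2-dimensional joint embeddings.
keyword_vecs = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}
image_vecs = {"img_red_shirt": [0.9, 0.1], "img_blue_jeans": [0.1, 0.8]}

# Search for "red": the red shirt image should rank first.
hits = rank_images(keyword_vecs["red"], image_vecs)

# A red shirt whose description lists both "red" and "blue".
flags = discrepant_keywords(
    image_vecs["img_red_shirt"], ["red", "blue"], keyword_vecs, threshold=0.5
)
print(hits, flags)
```

Here the keyword "blue" is flagged for the red shirt because it lies far
from the image in the shared space; in practice the threshold would have
to be tuned on labeled data.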

About speaker:
Krishnendu Chaudhury is a Principal Data Scientist at Flipkart, Bangalore.
He completed his doctorate in computer science, specializing in computer
vision, at the University of Kentucky. He has authored several publications
in IEEE and ACM journals and conferences and holds over a dozen patents in
imaging and computer related technologies
(https://sites.google.com/site/krishhomepage/Home). Prior to joining
Flipkart in 2015, he worked at Google, Mountain View, for 10 years and at
Adobe Systems before that.

 Harish Karnick, Dept of Comp. Sc. & Engg.,
 IIT Kanpur-208016, UP, India.
 email:hk(at)cse.iitk.ac.in, hk(at)iitk.ac.in
 phone: 91-512-6797601(W),-6798545(H),-2590725(X)