Vinay Kumar Verma
I am a Ph.D. student at the Department of Computer Science and Engineering, IIT Kanpur. I am working under the guidance of Dr. Piyush Rai.
My research interests include Probabilistic Machine Learning, Deep Learning, and Computer Vision. Currently, I am working on Zero-Shot Learning
and Multilabel Zero-Shot Learning. Thanks to MeitY for supporting my Ph.D. Fellowship. More about me can be found in the CV.
- [2016-**] Ph.D. in Computer Science and Engineering at CSE, IIT Kanpur.
- [2011-13] M.Tech. in Computer Science and Engineering at SCIS,
University of Hyderabad, Hyderabad.
- [2008-10] M.Sc. in Computer Science at University of Allahabad, Allahabad.
- [2005-08] B.Sc. in Mathematics at University of Allahabad, Allahabad.
- Bayesian Machine Learning
- Deep Generative Models
- Computer Vision
- Zero-shot Learning
- Generalized Zero-Shot Learning via Synthesized Examples
Vinay Kumar Verma, Gundeep Arora, Ashish Mishra and Piyush Rai
CVPR 2018, Salt Lake City, Utah, USA.
- A Generative Approach to Zero-Shot and Few-Shot Action Recognition
Ashish Mishra, Vinay Kumar Verma, M Shiva Krishna Reddy, Arulkumar S, Piyush Rai and Anurag Mittal
WACV 2018, Lake Tahoe, USA.
- Zero-Shot Learning via Class-Conditioned Deep Generative Models
Wenlin Wang, Yunchen Pu, Vinay Kumar Verma, Kai Fan, Yizhe Zhang, Changyou Chen, Piyush Rai and Lawrence Carin
Conference on Artificial Intelligence (AAAI-18) 2018, Louisiana, USA.
- A Simple Exponential Family Framework for Zero-Shot Learning
Vinay Kumar Verma and Piyush Rai
ECML-PKDD 2017, Skopje, Macedonia
- A Probabilistic Framework for Zero-Shot Multi-Label Learning
Abhilash Gaure, Aishwarya Gupta, Vinay Kumar Verma and Piyush Rai
UAI 2017, Sydney, Australia
- A Performance Evaluation of Feature Descriptors for Image Stitching in Architectural Images
P Balasubramanian, Vinay Kumar Verma and Anurag Mittal
- Hand Gesture Segmentation from Complex Color-Texture Background Image
Vinay Kumar Verma, R Wankar, CR Rao and A Agarwal
MIWAI 2013, Krabi, Thailand
Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition and Face Reconstruction
Facial expression recognition (FER) is a long-standing problem in computer vision, with applications in advertising, augmented reality, human-computer interaction, and human response analysis, to name a few. The problem resembles action recognition; however, unlike actions, facial expressions are very subtle and fine-grained, so a different approach is needed. Here we tried different transfer learning approaches, using different loss functions and fine-tuning techniques, to obtain generalized performance on expression, age, and gender recognition. Another objective is to use the resulting deep embedding of the face for image reconstruction/inpainting.
You can find the report here.
Spatial distance dependent Chinese restaurant processes for image segmentation
The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This project examines the ddCRP in a spatial setting with the goal of natural image segmentation. Here we explore which biases of the spatial ddCRP model are better suited for producing human-like segmentations. You can find the report here.
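The clustering bias described above can be illustrated with a minimal sketch of drawing from a ddCRP prior. It is only an illustration under stated assumptions, not the code used in the project: the exponential decay function, the Euclidean distance, the function names, and the parameters `alpha` (self-link weight) and `scale` (decay length) are all hypothetical choices made here. Each point links to another point with probability that decays with distance, or to itself with weight `alpha`; clusters are the connected components of the resulting links.

```python
import math
import random


def sample_ddcrp_links(points, alpha=1.0, scale=1.0, rng=None):
    """Draw one link per point from a ddCRP prior (illustrative sketch).

    Point i links to point j != i with weight exp(-dist(i, j) / scale)
    (an assumed decay function) and to itself with weight alpha.
    """
    rng = rng or random.Random()
    indices = range(len(points))
    links = []
    for i, p in enumerate(points):
        weights = [
            alpha if j == i else math.exp(-math.dist(p, q) / scale)
            for j, q in enumerate(points)
        ]
        links.append(rng.choices(indices, weights=weights)[0])
    return links


def links_to_clusters(links):
    """Label each point by the connected component of its link graph."""
    parent = list(range(len(links)))

    def find(x):
        # Follow parents to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]


# Two well-separated pairs of 2-D points (e.g. pixel coordinates).
points = [(0, 0), (0, 1), (1000, 0), (1000, 1)]
links = sample_ddcrp_links(points, alpha=0.0, scale=1.0, rng=random.Random(0))
labels = links_to_clusters(links)
```

With `alpha=0.0` and pairs this far apart, every point can only link to its near neighbour, so the two pairs end up in separate clusters. For image segmentation the points would be pixels or superpixels, and a likelihood over their appearance would be combined with this prior during inference.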
Content-Based Visual Question Answering
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are content based. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Here we concentrate only on content-based VQA; we do not consider logical or reasoning questions.
Pedestrian Detection using Aggregate Channel Features (ACF)
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact the quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images and video has grown steadily. Aggregate Channel Features (ACF) with AdaBoost is one of the most popular techniques for detecting pedestrians with high accuracy. The reference can be found here.
Gesture based Robot Motion
Vision-based Human-Computer Interaction (HCI) has always been a challenging task for researchers. In this project, I am developing a mechanism to control a robot based on a given gesture. The objective is to segment the hand gesture from a complex image and classify the segmented gesture into one of five predefined classes (left move, right move, forward move, backward move, and stop). The classification result is then sent to the robot over a wireless medium; the robot receives the signal and moves accordingly.
Honours and Awards
- Obtained the VVS-Fellowship for Ph.D. from the Ministry of Electronics and Information Technology
- Qualified the National Eligibility Test (UGC-NET 2013, 15) in Computer Science, organized by the University Grants Commission (UGC).
- In my spare time, I like to challenge computers (or humans) at chess.
- Cricket is my favorite outdoor sport.
- An inclination towards movies and music adds another dimension to my personal life.
- Last but not least, news articles always find a place in my schedule.
RM-504K, Rajeev Motwani Building
Department of Computer Science & Engineering
IIT-Kanpur, U.P. India, 208016
Email: vkverma at cse dot iitk dot ac dot in