Vinay Kumar Verma

I am a Ph.D. student in the Department of Computer Science and Engineering, IIT Kanpur, working under the guidance of Dr. Piyush Rai.
My research interests include Probabilistic Machine Learning, Deep Learning and Computer Vision. Currently I am working on Zero-Shot Learning
and Multilabel Zero-Shot Learning. I thank MeitY for supporting my Ph.D. fellowship. More about me can be found in my CV.

Educational Qualifications

Research Interests



  • Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition and Face Reconstruction
    Abstract: Facial expression recognition (FER) is a well-studied problem in computer vision, with applications in areas such as advertising, augmented reality, human-computer interaction and human response analysis. The problem resembles action recognition; however, unlike actions, facial expressions are very subtle and fine-grained, so a different approach is needed. Here we try several transfer-learning approaches, using different loss functions and fine-tuning techniques, to obtain generalized performance on expression, age and gender recognition. A further objective is to use the learned deep embedding of the face for image reconstruction/inpainting. You can find the report here.

  • Spatial distance dependent Chinese restaurant processes for image segmentation
    Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This project examines the ddCRP in a spatial setting with the goal of natural image segmentation. Here we explore which biases make the spatial ddCRP model better suited to producing human-like segmentations. You can find the report here.

  • Content-Based Visual Question Answering
    Abstract: Given an image and a natural-language question about the image, the task is to provide an accurate natural-language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and the answers are content based. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Here we concentrate only on content-based VQA; we do not consider questions that require logical reasoning.

  • Pedestrian Detection using Aggregate Channel Features (ACF)
    Abstract: Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images and video has grown steadily. Aggregate Channel Features (ACF) combined with AdaBoost is one of the most popular techniques for detecting pedestrians with high accuracy. The reference can be found here.

  • Gesture-based Robot Motion
    Abstract: Vision-based Human-Computer Interaction (HCI) has always been a challenging task for researchers. In this project, I am developing a mechanism to control a robot using hand gestures. The objective is to segment the hand gesture from a complex image and classify the segmented gesture into one of five predefined classes (move left, move right, move forward, move backward and stop). The classification result is then sent to the robot over a wireless link, and the robot performs the move corresponding to the received signal.
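The link-based prior behind the ddCRP image-segmentation project above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: points are pixel coordinates on a grid, the distance is Manhattan distance, and the decay function is a window decay f(d) = 1 if d <= a, else 0; all function names are hypothetical. Each point links to another point j with weight f(d_ij) or to itself with weight alpha, and the segments are the connected components of the resulting link graph. Only the prior is sampled here; a segmentation model would also need a likelihood over pixel or superpixel features and posterior inference over the links, which are not shown.

```python
import random

def manhattan(p, q):
    # Assumed distance between grid coordinates (row, col).
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def window_decay(d, a=1):
    # Assumed window decay: only points within distance a can be linked to.
    return 1.0 if d <= a else 0.0

def sample_ddcrp_links(points, alpha, decay, rng):
    """Draw one customer link per point from the ddCRP prior."""
    n = len(points)
    links = []
    for i in range(n):
        # Self-link has weight alpha; a link to j != i has weight decay(d_ij).
        weights = [alpha if j == i else decay(manhattan(points[i], points[j]))
                   for j in range(n)]
        links.append(rng.choices(range(n), weights=weights)[0])
    return links

def links_to_clusters(links):
    """Segments are connected components of the (undirected) link graph."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    # Relabel components as 0..K-1 in order of first appearance.
    labels, out = {}, []
    for i in range(len(links)):
        out.append(labels.setdefault(find(i), len(labels)))
    return out

if __name__ == "__main__":
    grid = [(r, c) for r in range(4) for c in range(4)]
    links = sample_ddcrp_links(grid, alpha=0.5,
                               decay=lambda d: window_decay(d, a=1),
                               rng=random.Random(0))
    print(links_to_clusters(links))
```

A larger alpha favours self-links and therefore more, smaller segments; a wider window lets links span larger distances and tends to merge regions.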
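For the pedestrian-detection project, the "aggregate" step that gives ACF its name can be illustrated with a small pure-Python sketch. Assumptions: the input is a grayscale image given as a list of rows, only two channels are computed (raw intensity and a central-difference gradient magnitude) instead of the full LUV, gradient-magnitude and oriented-gradient channel set, and the shrink factor is 4; the function names are hypothetical. Each channel is summed over non-overlapping shrink x shrink blocks and the results are flattened into one feature vector, which a boosted classifier (not shown) would then score.

```python
import math

def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp indices at the border so edge pixels use one-sided differences.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
    return mag

def aggregate(channel, shrink=4):
    """Sum each non-overlapping shrink x shrink block (incomplete border blocks are dropped)."""
    h, w = len(channel) // shrink, len(channel[0]) // shrink
    return [[sum(channel[y * shrink + dy][x * shrink + dx]
                 for dy in range(shrink) for dx in range(shrink))
             for x in range(w)]
            for y in range(h)]

def acf_features(img, shrink=4):
    """Flatten the aggregated channels (here: intensity and gradient magnitude) into one vector."""
    feats = []
    for channel in (img, gradient_magnitude(img)):
        for row in aggregate(channel, shrink):
            feats.extend(row)
    return feats

if __name__ == "__main__":
    flat = [[1] * 8 for _ in range(8)]
    print(acf_features(flat))  # → [16, 16, 16, 16, 0.0, 0.0, 0.0, 0.0]
```

For the constant 8x8 image above, each 4x4 intensity block sums to 16 and the gradient blocks are zero, since a flat image has no edges.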

Honours and Awards

  • Obtained the VVS Fellowship for Ph.D. from the Ministry of Electronics and Information Technology (MeitY).
  • Qualified the National Eligibility Test (UGC-NET 2013 and 2015) in Computer Science, conducted by the University Grants Commission (UGC).

Other Interests

  • In my spare time, I like to challenge computers (or humans) at chess.
  • Cricket is my favorite outdoor sport.
  • My inclination towards movies and music adds another dimension to my personal life.
  • Last but not least, news articles always find a place in my schedule.

Contact Details

    RM-504K, Rajeev Motwani Building
    Department of Computer Science & Engineering
    IIT-Kanpur, U.P. India, 208016
    Email: vkverma at cse dot iitk dot ac dot in