Welcome
        
        
            Welcome to the homepage of the Computer Vision Group of IIT Kanpur. We are a group of
            faculty and students working on exciting problems in Computer Vision and applied
            Machine Learning, as well as at their intersection with Signal Processing and Robotics.
            We are primarily interested in the research directions described below, but we are
            also receptive to new and interesting problems as they arise.
        
        
            This webpage is under construction, so please check back for more information.
        
        
            Vision and Language
            
            
            
                Traditionally, progress in visual and textual processing and understanding has
                happened in relatively distinct threads. More recently, vision and language methods
                have been combined for applications such as image captioning and visual question
                answering. For example, the image on the left could be captioned automatically as
                'A dog chasing a ball', or a question such as 'What color ball is the dog chasing?'
                could be posed about it, with a possible answer being 'white'. We are interested in
                problems where the complementarity of vision and language models is exploited, and
                where novel algorithms and problem formulations are designed to address relevant
                challenges and applications.
            
         
        
        
            Face and Human Analysis
            
            
            
                Visual data is growing at a very high rate: nearly everyone carries a camera in
                their pocket and has an internet connection to share pictures and videos. Most
                such human-generated visual data has, in turn, humans as the main subjects. Hence,
                analysis and understanding of human-centered visual data is an important part of
                Computer Vision. We are interested in many problems focusing on humans, such as
                (i) facial analysis: predicting identity, emotions, and intent from faces,
                (ii) human attribute prediction: the kind of clothes and accessories a person is
                wearing, (iii) pose estimation, and (iv) action/activity prediction.
            
         
        
        
            Human Behavior Analysis
            
            
            
                Our research direction on human behavior analysis lies at the intersection of
                Computer Vision, Signal Processing and Machine Learning. Human behavior is
                inherently multimodal, and hence requires combining information from other
                modalities (speech or language, for example) with vision. Through the confluence of
                these techniques, our goal is to provide a quantitative understanding of individual,
                group and social human behavior in domains such as media, education, and health.
            
         
        
        
            
Perception/Vision for Robotics
            
            
            
                Today, robots are used to perform challenging tasks that were not possible a few
                years ago because of limited computational and sensor resources. In order to perform
                these complex tasks, robots need to sense and understand the environment around them.
                Depending upon the task at hand, robots are often equipped with different sensors to
                perceive their environment. Two important categories of perception sensors mounted
                on a robotic platform are (i) range sensors: 3D/2D lidars, radars, sonars, etc., and
                (ii) cameras: perspective, stereo, omnidirectional, etc. With the recent
                advancements in these sensing technologies, the capabilities of robots to perform
                difficult tasks have been greatly extended. The Computer Vision Group at IIT Kanpur
                is interested in research problems related to sensing for robotics applications. One
                such example is autonomous navigation of robots, where techniques from computer
                vision are used for localization of robots and for obstacle detection and
                classification.
            
        
        
        
            
Assistive Computer Vision
            
            
            
                In this research direction, we investigate the scope of computer vision for
                assisting human beings in day-to-day life. We develop algorithms for a class of
                related problems in computer vision. This area is becoming more and more practical
                with the popularity of wearable cameras (e.g., Google Glass) and lightweight
                computing devices (e.g., mobile phones). Our use case centers around a wearable or
                portable camera capturing the surrounding world as still images or video streams.
                The goal is to provide appropriate inputs to the human to help in a specific set of
                tasks. For instance, a visually challenged person could use a wearable camera to
                understand the surroundings. The automated understanding of the content of the
                images/videos is then used as an alternate input to enrich the interaction with the
                external world. For example, this person could read a text or sign, locate a specific
                object of interest, anticipate the pose of the object for manipulation, know the
                identity of a nearby person, and appreciate the facial expressions of the people
                around.