Project Proposal
Visual Categorization: Basic vs Super/Sub-ordinate levels

Bhuwan Dhingra (Y8167)
Mentor: Prof. Amitabh Mukherjee

 

Introduction

How we categorize objects into semantic categories (such as dog, cat, or chair) has been widely researched in cognitive science. One of the most widely accepted theories in this regard is the theory of natural categories proposed by Eleanor Rosch in 1973 [1]. It claims that there is a natural hierarchy of three levels of categorization: the basic level, the subordinate level, and the superordinate level. Of these three, the basic level is the one most favored and accessed first across taxonomies of different objects [2]. Recently, however, some experiments have challenged Rosch's theory, claiming that the superordinate level is accessed first when only visual features are considered [3]. Another claim made several times in the literature is that expertise plays a significant role in determining which level is favored [4]. In the current project I aim to use a computational model based on the bag-of-features approach to study these levels of categorization. The goals of the study are twofold:

  •  To determine which level of categorization the computational model favors when only visual features are used.

  •  To study the effect of expertise on categorization by varying the training set sizes.

The next section gives a brief overview of past research on object categorization in cognitive science, and the section after that briefly introduces the bag-of-features model for object categorization. Finally, I outline my approach and the hypothesis of the study.

 

Natural Categories

Rosch, in her seminal 1976 work on object categorization, argued that the stimuli we receive from our surroundings do not form an unstructured, continuous set of all possible attribute combinations. Instead, attributes are structured into naturally occurring groups. For example, the attribute of wings is much more likely to co-occur with the attribute of feathers than with the attribute of fur [2]. This led her to define basic-level categories, which correspond to these naturally occurring cuts in the space of input stimuli. Another way of looking at basic-level categories is that they maximize the ratio of the between-class variance of attributes to their within-class variance. Examples include dog, chair, and car. The subordinate and superordinate levels stem from these basic-level categories: subordinate levels are obtained by dividing a basic-level category into finer categories, and superordinate levels are obtained by combining basic-level categories into a more general one. Corresponding examples are Dalmatian, armchair, and SUV at the subordinate level, and animal, furniture, and vehicle at the superordinate level. In a series of experiments described in [2], Rosch shows that the basic level is accessed first in taxonomies of objects.
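The variance-ratio view of basic-level categories can be illustrated with a minimal sketch. It assumes each object is described by a vector of numeric attributes and computes the between-class sum of squares divided by the within-class sum of squares, pooled over attributes. The attributes, items, and groupings below are hypothetical toy values, not data from [2].

```python
import numpy as np


def variance_ratio(features, labels):
    """Between-class over within-class sum of squares, pooled over all attributes.

    features: (n_items, n_attributes) array of attribute values.
    labels:   length-n_items sequence of category labels.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    grand_mean = features.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        members = features[labels == c]
        class_mean = members.mean(axis=0)
        # Spread of each class mean around the grand mean, weighted by class size.
        between += len(members) * np.sum((class_mean - grand_mean) ** 2)
        # Spread of the items around their own class mean.
        within += np.sum((members - class_mean) ** 2)
    return between / within if within > 0 else float("inf")


# Hypothetical attributes: wings, feathers, fur, four legs, small.
items = [
    [1, 1, 0, 0, 1],  # robin
    [1, 1, 0, 0, 1],  # sparrow
    [0, 0, 1, 1, 0],  # dog
    [0, 0, 1, 1, 1],  # cat
]
natural = variance_ratio(items, ["bird", "bird", "mammal", "mammal"])
mixed = variance_ratio(items, ["A", "B", "A", "B"])  # {robin, dog} vs {sparrow, cat}
print(natural, mixed)
```

Here the natural bird/mammal split scores 8.5 against roughly 0.056 for the arbitrary split of the same items. Because refining a partition can never increase the within-class term, the raw ratio favors ever finer groupings, so a fair comparison across levels would need partitions of equal size or some penalty; that choice is left open in this sketch.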


Figure: image obtained from http://jonathanhungworks.wordpress.com/2009/03/23/monday-mentor-eleanor-rosch/

Recent studies, however, have shown that in a fast go/no-go categorization task, subjects take significantly longer to identify basic-level categories than superordinate-level categories [3]. In a fast go/no-go task there is only enough time for the visual features of the objects to be processed; hence it is hypothesized that the dominance of the basic level in Rosch's experiments is due to the lexical/semantic access those experiments required. In terms of purely visual features, however, the superordinate level might be dominant. It must be noted, though, that these experiments used only animate categories, whose visual features are more similar to one another than those of inanimate categories.

Another interesting aspect of the categorization hierarchy is the role of expertise in shaping it. It is intuitive, and has also been shown experimentally [4], that categories which are subordinate for most people may be basic-level categories for experts in that domain. For example, fish would be a basic-level category for most of us, but for an ichthyologist it might be a superordinate category. Rosch made a similar claim in her original paper [2].

 

Bag-of-Features

The bag-of-features model has shown excellent performance in image classification [5]. It represents an image through keypoints, i.e. salient regions within the image, each described by a local descriptor. While there are several techniques for extracting and describing these keypoints, Jiang et al. show in [5] that the Scale Invariant Feature Transform (SIFT) descriptor gives the best experimental performance. Once the keypoint descriptors are extracted, bag-of-features models proceed in essentially the same way: a clustering technique (commonly k-means) groups the descriptors into a codebook of "visual words", and each image is characterized by a histogram of how often each visual word occurs in it. This histogram is the feature vector of the image. The entire process is illustrated in the lecture slides at http://www.cs.unc.edu/~lazebnik/spring09/lec18_bag_of_features.pdf.
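The codebook and histogram steps can be sketched as follows. This is a minimal illustration that assumes the local descriptors (for example 128-dimensional SIFT vectors, as a library such as VLFeat would produce) have already been extracted; random arrays stand in for them here, and the helper names, codebook size, and iteration count are hypothetical choices rather than settings from [5]. Clustering uses plain k-means (Lloyd's algorithm).

```python
import numpy as np


def assign_words(descriptors, centres):
    """Index of the nearest codebook centre (squared Euclidean distance) for each descriptor."""
    dists = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors into k visual words with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct descriptors chosen at random.
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        words = assign_words(descriptors, centres)
        for j in range(k):
            members = descriptors[words == j]
            if len(members) > 0:  # keep the old centre if a cluster becomes empty
                centres[j] = members.mean(axis=0)
    return centres


def bof_histogram(descriptors, centres):
    """Normalised visual-word histogram: the bag-of-features vector of one image."""
    words = assign_words(descriptors, centres)
    hist = np.bincount(words, minlength=len(centres)).astype(float)
    return hist / hist.sum()


rng = np.random.default_rng(1)
# Stand-ins for the descriptors of four images (50 keypoints each, 128-D like SIFT).
images = [rng.random((50, 128)) for _ in range(4)]
codebook = build_codebook(np.vstack(images), k=10)
features = np.array([bof_histogram(d, codebook) for d in images])
print(features.shape)  # (4, 10)
```

Normalizing each histogram makes images with different numbers of keypoints comparable; each row of `features` is then one image's feature vector.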



For the current problem, the significance of the bag-of-features approach lies in the concept of cue validity. The cue validity of an attribute for a category is the conditional probability of the category given the attribute: it increases with the frequency with which the attribute is associated with that category and decreases with the frequency with which it is associated with other categories. The cue validity of a category is the sum of the cue validities of all its attributes [2]. In the visual context, these cues correspond to the keypoints detected in the bag-of-features approach (more precisely, to the visual words they are assigned to). Since Rosch defined basic-level categories as those with maximal cue validity, it would be interesting to see which level the computational model favors.
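If cues are read as visual words, cue validity can be estimated from co-occurrence counts. The sketch below assumes a hypothetical count matrix whose entry [c, w] records how often visual word w occurs in images of category c, and estimates the cue validity of a word for a category as the conditional probability P(category | word), following Rosch et al. [2]. The counts are toy values.

```python
import numpy as np


def cue_validities(counts):
    """Estimate P(category | cue) for every (category, cue) pair.

    counts[c, w] = how often cue w (e.g. visual word w) occurs with category c.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)  # occurrences of each cue across all categories
    with np.errstate(divide="ignore", invalid="ignore"):
        # Cues that never occur get validity 0 instead of 0/0.
        return np.where(totals > 0, counts / totals, 0.0)


def category_cue_validity(counts):
    """Sum of the cue validities of each category's cues (absent cues contribute 0)."""
    return cue_validities(counts).sum(axis=1)


# Hypothetical counts: 2 categories (rows) x 3 visual words (columns).
counts = [[4, 2, 0],
          [0, 6, 3]]
print(category_cue_validity(counts))  # [1.25 1.75]
```

Merging the two rows into a single category gives every cue a validity of 1 (a total of 3.0), so summed cue validity on its own tends to favor more inclusive categories; how the model's levels are compared on this measure therefore deserves care.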

 

Approach

To facilitate categorization, a taxonomy of object images will be assembled and given as input to the model. Since most human category learning is unsupervised, unsupervised learning will be applied here as well. Hierarchical clustering is a popular approach when the number of clusters is not known beforehand, and it produces a tree of nested clusters that can be compared against the levels of a taxonomy. The model will categorize the images in two steps: first, it extracts a feature vector from each image using the bag-of-features method; then, it applies hierarchical clustering to these feature vectors to form natural categories. By varying the number of training images of different classes (say, many more animal images than furniture images), we can study how expertise affects categorization.
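The clustering step can be sketched as plain agglomerative clustering. This minimal version assumes the inputs are bag-of-features vectors and uses average linkage over Euclidean distances, both of which are choices the proposal leaves open; it is deliberately naive and only meant for illustration (SciPy's `scipy.cluster.hierarchy.linkage` provides an efficient implementation). The helper names and toy points are hypothetical. Cutting the resulting tree at different depths yields coarser or finer partitions, which could then be compared against, for example, the superordinate and basic levels of the taxonomy.

```python
import numpy as np


def average_linkage(features):
    """Naive agglomerative clustering with average linkage on Euclidean distances.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of item indices.
    """
    X = np.asarray(features, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [frozenset([i]) for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster item pairs.
                d = dist[np.ix_(sorted(clusters[a]), sorted(clusters[b]))].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((clusters[a], clusters[b], d))
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


def cut_tree(n_items, merges, n_clusters):
    """Replay the first n_items - n_clusters merges to get a flat partition."""
    clusters = [frozenset([i]) for i in range(n_items)]
    for a, b, _ in merges[: n_items - n_clusters]:
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical 2-D feature vectors: four tight pairs forming two far-apart groups.
points = [[0, 0], [0, 1], [3, 0], [3, 1], [20, 0], [20, 1], [23, 0], [23, 1]]
merges = average_linkage(points)
print(cut_tree(len(points), merges, 2))  # two coarse groups
print(cut_tree(len(points), merges, 4))  # four fine groups
```

Keeping the full merge history, rather than a single flat clustering, is what allows several levels of the hierarchy to be read off the same run.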

Dataset - Caltech256 (available at http://www.vision.caltech.edu/Image_Datasets/Caltech256/)

Bag-of-features implementation - VLFeat open-source library (http://www.vlfeat.org/)
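The expertise manipulation described in the approach can be sketched as class-imbalanced sampling of the training set. The function below is a hypothetical helper: it assumes the images have already been grouped into a dictionary from category name to image paths (for instance by reading the Caltech256 category folders), and the category names, file names, and sample sizes in the example are illustrative only.

```python
import random


def sample_training_set(images_by_category, n_per_category, default_n, seed=0):
    """Draw a class-imbalanced training set to mimic expertise in some categories.

    images_by_category: dict mapping category name -> list of image paths.
    n_per_category:     dict giving a custom sample size for selected categories.
    default_n:          sample size for every other category.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    training = []
    for category in sorted(images_by_category):
        images = images_by_category[category]
        n = min(n_per_category.get(category, default_n), len(images))
        training.extend((category, path) for path in rng.sample(images, n))
    return training


# Hypothetical pool: 100 images per category, file names for illustration only.
pool = {name: [f"{name}_{i:03d}.jpg" for i in range(100)]
        for name in ["dog", "horse", "chair", "table"]}
# An "animal expert": many more animal images than furniture images.
expert = sample_training_set(pool, {"dog": 80, "horse": 80}, default_n=10)
print(len(expert))  # 180
```

Comparing the hierarchies obtained from a balanced sample and from such an imbalanced one is one possible way to probe the expertise effect.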

 

References

[1] Rosch, E. (1973). Natural categories. Cognitive Psychology.
[2] Rosch, E., Mervis, C., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology.
[3] Macé, M.J.-M., Joubert, O.R., Nespoulous, J.-L., & Fabre-Thorpe, M. (2009). The time-course of visual categorizations: you spot the animal faster than the bird. PLoS ONE.
[4] Johnson, K.E., & Mervis, C.B. (1997). Effects of varying levels of expertise on the basic level of categorization. Journal of Experimental Psychology: General.
[5] Jiang, Y.G., Ngo, C.W., & Yang, J. (2007). Towards optimal bag-of-features for object categorization and semantic video retrieval. CIVR'07.