Artificial Intelligence ME 768 Jan-Apr 2000
RECOGNITION OF CARTOON CHARACTERS

Submitted by:

Satyendra Kumar Gupta (satyend@iitk.ac.in)
S.Prashanth Aditya (aditya@iitk.ac.in)

IIT Kanpur: April 2000


Contents

  • Motivation
  • Example
  • Past Work
  • Sample Input and Expected Output
  • Methodology
  • Limitations and Conclusions
  • On-Line Links
  • Annotated Bibliography
  • Documentation of libraries used and usage of the code


Motivation

The idea behind choosing this project was to develop a natural language "commentary" from a cartoon strip. The project would then involve the areas of vision (object recognition) and NLP (text generation). This, it was thought, would be a step in helping blind people and children understand and appreciate cartoons better.
The amount of work involved in such a venture would have been too much to complete in a semester, so we settled on the relatively simpler subproblem of recognising a cartoon character in a cartoon strip.
We chose this out of the two subproblems at hand because the work can be extended to human subjects, where it might be of considerable help in areas like criminal investigation.



  • Example:

    Consider the image below. We assume the standard set of Tintin characters to be the following: Tintin, Snowy, Captain Haddock, Professor Calculus, Nestor, and the Thomson and Thompson twins. Though we have developed the project to recognise Tintin alone, the methods used could be modified to capture any of these characters from the input image.

    Here the program should be able to identify Tintin and Captain Haddock (the modern one, not Sir Francis Haddock). Another example is the image below.

    Here, we should be able to recognise Tintin, Snowy, Captain Haddock and Professor Calculus.



  • Past Work - Using the unique features of a cartoon character to identify it

    Past work focussed on isolating distinctive features of a character, such as its hair, with images scanned using the "dark hair on light background" idea. In addition, [Syeda:1999] gives a method of scanning the entire image as well. [Yamada/Watanabe:1999] discusses some other aspects of cartoon character recognition, notably working with "characteristic pixels" such as crosses, junctions and the like; this is necessary because objects can appear in any orientation in the image. The hair regions are characterised by rectangular and principal moments, and advantage is then taken of the fact that in most cases the body is perpendicular to the hair region. In multiple-frame cartoons, a distance function based on template similarity between objects (in each frame) is used to identify individual characters.
    There are quite a few papers on individual topics of our concern in this project, like template matching, segmentation of images by colour etc. [Lai/Chin:1995], for instance, deals with the extraction and modelling of deformable contours in an image. The significance of this paper with respect to our project will be illustrated in a moment.
    John R. Smith's thesis on Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression discusses various methods of extracting regions from images using colour/texture Histogram-Back-Projection techniques.



  • Sample Input and Expected Output

    The input is an image. The strip we have chosen for this project is Tintin, and we have developed our program to recognise Tintin; the limitations and conclusions section explains how this can be extended to other characters as well.
    For the input image, our goal is to specify whether it contains Tintin or not. This is reported to the user on the screen itself.
    The output is an image of the same size as the input image, with a white background. It will contain the edges of the character we have recognised as Tintin.
    The following example will make this clear.




  • Methodology

    The methodology basically involves two steps: segmentation of the image based on colour, and template-based feature extraction. It revolves around the fact that Tintin (the character we are trying to identify) has a particular complexion and a distinctive hairstyle.
    For the segmentation part, we have chosen John Smith's thesis. Regions having a specified colour can be extracted by Histogram-Back-Projection. The thesis describes a variety of algorithms working on the basis of this method. Local histogramming achieves the best results.
    Feature extraction helps to single out Tintin from the host of regions in the image detected in the first step. [Lai/Chin:1995] describes methods of extracting deformable contours from noisy images. It should be noted that Tintin's hair shape, though unique, does change in shape and orientation across images, so the chosen template should be able to deal with this. Further, scanned images are very noisy compared to other images available from the web, so the extraction algorithm should be robust enough to detect the shape in not-so-clean images.
    The following paragraphs explain the algorithms adapted from both the works mentioned above.

    There is something we would like to point out at this stage. Until now we had been working with a segmentation algorithm that basically works the same way local-histogramming HBP does (though we didn't know it until very recently), the difference being that instead of comparing a neighbourhood of each target pixel with the histogram of a model image, we compared each individual pixel with a value we arrived at through some experimentation. So our algorithm was not very robust and, importantly, we couldn't do much about the noise we got due to scanning.
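    To make the contrast concrete, the following is a minimal sketch of the kind of per-pixel test our earlier algorithm performed; the band values below are hypothetical placeholders, not the values we actually used.

        /* Earlier approach (sketch): keep a pixel if its RGB values fall
         * inside an experimentally chosen band around Tintin's complexion.
         * The band limits here are hypothetical placeholders. */
        typedef struct { unsigned char r, g, b; } rgb_t;

        static int matches_complexion(rgb_t p)
        {
            return p.r > 200 && p.g > 150 && p.g < 220 &&
                   p.b > 120 && p.b < 190;
        }

    A pixel-wise scan with such a test is fast, but it is exactly this per-pixel decision that made the method fragile in the presence of scanning noise.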

    Histogram-Back-Projection by Local Histogramming to detect colour regions in an image.

    HBP determines the most likely spatial position of a given histogram in an image. According to John Smith, it is defined as "a class of algorithms designed to detect within images the feature histograms that are similar to a given model histogram".
    Combined with what is known as the Visual Feature Library (VFL), HBP constitutes a powerful solution for extracting regions from a large collection of unconstrained images.
    HBP consists of four steps: back-projection, smoothing, thresholding and detection. The Local Histogramming method combines the smoothing and back-projection steps. It works as follows: For each point in the target image a local neighbourhood histogram is computed and its distance to the model histogram determines whether each image point belongs to the model. (The model image provides the model histogram. The model images constitute the VFL.)
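    As a rough illustration, the local-histogramming loop might look like the following sketch in C. The window size WIN and threshold THRESH are hypothetical parameters, histogram_distance is the Minkowski-form distance defined below, and the binning (256 intensities into 8 bins of 32) is described shortly.

        #define BINS   8
        #define WIN    7        /* hypothetical window size (WIN x WIN) */
        #define THRESH 0.25     /* hypothetical acceptance threshold */

        /* Minkowski-form distance (r = 1); a sketch is given further below. */
        double histogram_distance(const double q[], const double t[]);

        /* Normalised histogram of the WIN x WIN neighbourhood centred at
         * (row, col); grey[][] holds intensities in 0..255. */
        static void local_histogram(unsigned char **grey, int rows, int cols,
                                    int row, int col, double h[BINS])
        {
            int i, j, n = 0;
            for (i = 0; i < BINS; i++) h[i] = 0.0;
            for (i = row - WIN/2; i <= row + WIN/2; i++)
                for (j = col - WIN/2; j <= col + WIN/2; j++)
                    if (i >= 0 && i < rows && j >= 0 && j < cols) {
                        h[grey[i][j] / 32] += 1.0;  /* 8 bins of 32 each */
                        n++;
                    }
            for (i = 0; i < BINS; i++) h[i] /= n;   /* r = 1 normalisation */
        }

        /* Back-projection: mark[row][col] = 1 wherever the local histogram
         * is close enough to the model histogram. */
        static void back_project(unsigned char **grey, int rows, int cols,
                                 const double model[BINS], unsigned char **mark)
        {
            double h[BINS];
            int row, col;
            for (row = 0; row < rows; row++)
                for (col = 0; col < cols; col++) {
                    local_histogram(grey, rows, cols, row, col, h);
                    mark[row][col] = histogram_distance(h, model) < THRESH;
                }
        }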
    The concept of "distance" between histograms is simple to understand. If two histograms are considered as points in a histogram space, the distance between them is a number which is zero if they are the same and positive if they differ. The distance is also symmetric (D(1,2) = D(2,1)) and satisfies the triangle inequality: given histograms 1, 2 and 3, D(1,2) + D(2,3) >= D(1,3).
    There are many ways to calculate the distance; anything is valid as long as it conforms to the above laws. We use the Minkowski-form distance, defined as follows: let h_q and h_t be the query and target histograms, respectively; then

        D_r(h_q, h_t) = ( sum over bins m of |h_q[m] - h_t[m]|^r )^(1/r),

    where r could be 1 or 2. We choose r = 1; essentially we get the same results either way.
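    For our 8-bin histograms and r = 1 this reduces to a sum of absolute bin differences, as in the following sketch (histogram_distance is our own name for it):

        #include <math.h>

        #define BINS 8

        /* Minkowski-form distance with r = 1 (city-block distance)
         * between two normalised BINS-bin histograms. */
        double histogram_distance(const double q[], const double t[])
        {
            double d = 0.0;
            int m;
            for (m = 0; m < BINS; m++)
                d += fabs(q[m] - t[m]);
            return d;
        }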
    John Smith also presented efficient strategies for the computation of the histogram queries. One of them, query-optimized distance (QOD) computation, prioritizes the elements of the computation in the order of largest significance to the query. By prioritizing the computation in this way, the query is computed more efficiently. Furthermore, by stopping the computation early and only approximating the actual histogram distances, the query response time is reduced dramatically without significantly decreasing the retrieval effectiveness.
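    The early-stopping idea might be sketched as follows; the ordering array and the cut-off parameter keep are our own illustration of the principle, not Smith's exact formulation.

        #include <math.h>

        #define BINS 8

        /* Approximate L1 distance visiting bins in decreasing order of
         * their weight in the query histogram. order[] holds bin indices
         * sorted by query weight (precomputed); only the first `keep`
         * bins are accumulated, approximating the full distance. */
        double qod_distance(const double q[], const double t[],
                            const int order[], int keep)
        {
            double d = 0.0;
            int i;
            for (i = 0; i < keep && i < BINS; i++) {
                int m = order[i];   /* next most significant query bin */
                d += fabs(q[m] - t[m]);
            }
            return d;
        }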
    As far as the histograms are concerned, the range of 256 intensities is divided into 8 bins of 32 each. The raw bin counts c[m] are then normalised according to

        h[m] = c[m] / ( sum over bins k of c[k]^r )^(1/r),

    where again r = 1 or 2 and we choose r = 1, so each bin is simply divided by the total count and the bins sum to one.
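    With r = 1 the normalisation is just a division by the total count, as in this sketch:

        #define BINS 8

        /* Normalise raw bin counts c[] into h[] so the bins sum to one
         * (the r = 1 normalisation above). */
        void normalise_histogram(const unsigned long c[], double h[])
        {
            unsigned long total = 0;
            int m;
            for (m = 0; m < BINS; m++)
                total += c[m];
            for (m = 0; m < BINS; m++)
                h[m] = total ? (double)c[m] / total : 0.0;
        }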


    (Figure: a feature-vector representation of discrete-space, two-bin normalised histograms, with r = 1.)

    Minkowski-form distances compute differences only between corresponding bins, never between similar ones. As John Smith puts it, in Minkowski-form distance computation deep red is as different from red as it is from blue.

    NOTE:- All these images were taken from John Smith's thesis page.

    Here we would like to present a comparison of the algorithm we had used earlier and this one. The first image is a scanned image of Tintin, an example of the input our program would accept (in ppm format, of course). The second is the segmented version of the first, produced the way we were doing it earlier. The third is the result of the new algorithm.
    The results page will show better examples of segmented images. These were achieved by using some other simple algorithms in conjunction with the HBP algorithm, as schematically represented in the image shown below these three.


    The salient features of our previous algorithm:

    The salient features of the new algorithm we have used:

    The first difference that strikes one between the two images is that the latter does not give as much information as the former in terms of edge detail. What we essentially get is a rough idea of where in the image the colours we are looking for occur. On the other hand, noise is completely eliminated in the latter image.


  • Limitations and Conclusions


    Due to problems like the difference in the quality of scanning across images, and differently coloured Tintins in different comics, the HBP algorithm mostly works well only when the model images in the VFL are taken from the same image that has to be processed. This defeats the purpose of the project to some extent, but it also suggests that there is not much we can do about the problem, unless of course we think of a method of identifying Tintin other than the segmentation/feature-extraction routine.
    The HBP algorithm also detects all regions of the image which have roughly the same colour distribution as the model image. What this means is that, since all the main characters have the same facial colour, this method of segmentation can be used as a primary step to detect almost any major Tintin character (with the exception of Snowy and the odd cat in Marlinspike Hall!).
    Also, this algorithm works irrespective of the background colour.
    After seeing the results page and the cases in which the HBP method doesn't produce very satisfying results, we feel that removing the dialogue boxes might help.
    We also did not have an extensive VFL: we had only one or two model images for each image, and we checked our code on those. What we should have done is build a model histogram that is the mean of model histograms taken from many images, apply that mean model, and accept a match when the distance between histograms lies between mean - standard deviation and mean + standard deviation, where the standard deviation is also calculated over the many images. We expect better results with this.
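    A minimal sketch of that idea, assuming n model histograms models[i][] have already been collected (histogram_distance is the r = 1 Minkowski distance sketched earlier; all names here are our own):

        #include <math.h>

        #define BINS 8

        double histogram_distance(const double q[], const double t[]);

        /* Mean model histogram over n models, plus the mean and standard
         * deviation of each model's distance to that mean. A candidate
         * local histogram h would then be accepted when
         * dmean - dstd <= histogram_distance(h, mean) <= dmean + dstd. */
        void build_mean_model(double models[][BINS], int n,
                              double mean[BINS], double *dmean, double *dstd)
        {
            int i, m;
            double sum = 0.0, sq = 0.0, var;
            for (m = 0; m < BINS; m++) {
                mean[m] = 0.0;
                for (i = 0; i < n; i++)
                    mean[m] += models[i][m];
                mean[m] /= n;
            }
            for (i = 0; i < n; i++) {
                double d = histogram_distance(models[i], mean);
                sum += d;
                sq += d * d;
            }
            *dmean = sum / n;
            var = sq / n - (*dmean) * (*dmean);
            *dstd = var > 0.0 ? sqrt(var) : 0.0;
        }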
    We were also not able to implement the feature-extraction part. Character recognition is all about verifying as many features as possible to get closer to recognising the required character, so feature extraction would have been a good final step in the recognition procedure. This is especially true in this case because almost all characters in the Tintin books have more or less the same facial colour. The importance of feature extraction can be gauged from the results obtained under the "unsuccessful search" section in the results page.


  • To the results section

  • On-Line Links



  • Annotated Bibliography

    @Article{Syeda:1999,
    author = { T F Syeda-Mahmood.},
    year = { 1999},
    keywords = { IMAGE TEXTURE SEGMENTATION},
    institution = { IBM Almaden Research Centre, CA},
    journal = { Computer Vision and Image Understanding},
    title = { Detecting Perceptually Salient Texture Regions in Images},
    month = { October},
    volume = { 76},
    pages = { 93-108 },
    e_mail = { stf@almaden.ibm.com},
    annote = {
    	This paper reports on the techniques and heuristics applied to group
    	the salient texture regions in natural scenes. It works out the
    	possible presence of a distinctive pattern localised to some arbitrary
    	position of any given image. The saliency of a texture region can be
    	inferred by analysing the interplay between the bright and dark regions
    	against a gray background. This condition is met by using a
    	convolution-like LoG (Laplacian of Gaussian) operator. One of the
    	problems faced by the author is the selection of the relevant area to
    	which the analysis is to be applied. The heuristic proposed is to
    	assign weight functions to the different visible patterns and obtain
    	the relevant portion via the MST (minimum spanning tree) of the region.
    
    	An interesting part of the paper is the texture segmentation method by
    	moving-window analysis and an LP-spectrum difference measure. It first
    	divides the image into different overlapping windows and, by a series
    	of iterations, reconstructs the missing portions of the boundary
    	fragments, thus in a way parsing the whole image.
    
    	One important aspect of the paper which remained unclear is the choice
    	of weight function based on "psychophysical experimental data"; what I
    	propose is to use fuzzy logic. Each pattern type can be mapped to some
    	prototype, and appropriate parameters can be obtained using fuzzy laws
    	and principles. Overall the paper is well presented and the end
    	results are well exemplified.
    }
    }
    @InProceedings{Yamada/Watanabe:1999,
    author = { Yamada, T. and Watanabe, T.},
    year = { 1999},
    keywords = { CARTOONS-RECOGNITION EXTRACTION PERSON-SLOPE VECTORIZATION IDENTIFICATION },
    institution = { Graduate School of Engineering, Nagoya University},
    title = { Identification of Person Objects in Four-Scenes Comics of Japanese Newspapers},
    booktitle = { Intl. Wkshp. on Graphics Recognition, GREC'99},
    pages = { 152-159},
    email = { {yamada,watanabe}@watanabe.nuie.nagoya-u.ac.jp},
    annote = {
    	This paper describes a method to identify a comic character. Subjects which extract meaningful data
    	from paper-based documents and make it possible to manipulate the extracted data as operative objects in
    	electronically formed documents are looked upon as interesting and worthy issues, and have been pursued
    	for various kinds of documents to date. In this type of comic some objects are illustrated in different
    	scenes, but their shapes, sizes, locations, directions, whole-part relationships, and so on are
    	tremendously deformed and changed. The analysis method is based on the distance distribution among
    	composite elements.
    	With respect to the extraction and identification of person objects, the basic recognition strategy focuses
    	on the hair area throughout. Here a hair area is defined as a region of contiguously black-painted pixels
    	whose number exceeds an experimental threshold value. The next processing stage is to extract the areas
    	related to person objects with respect to the hair areas found in the previous stage. In this case it is
    	effective to determine the searching directions first in order to extract person objects. Characteristic
    	pixels are looked for in handwritten line segments; they assign some remarkable features to drawn lines
    	in terms of direction, structure and so on.
    	Using the characteristic pixels, the handwritten-line drawings are vectorised. Basically, two
    	neighbouring characteristic pixels in one handwritten line segment are transformed into one straight
    	line segment. The final processing in the extraction phase of person objects is to select the vectorised
    	line segments in the estimated person region and distinguish only reliable line segments as composite
    	elements of person objects. In the identification phase, the objective is to find the same person object
    	across the four different scenes.
    }
    }
    @Article{Lai/Chin:1995,
    author = { Lai, Kok F. and Chin, Roland T.},
    year = { 1995},
    keywords = { DEFORMABLE-MODEL FEATURE-EXTRACTION},
    institution = { },
    title = { Deformable contours: modeling and extraction},
    journal = { IEEE Transactions on Pattern Analysis and Machine Intelligence},
    month = {},
    volume = { 17},
    number = { 11},
    pages = { 1084-1090},
    e_mail = {},
    annote = {
    	This paper considers the problem of modeling and extracting arbitrary deformable contours from noisy
    	images. The authors propose a global contour model based on a stable and regenerative shape matrix,
    	which is invariant and unique under rigid motions. Combined with Markov random field to model local
    	deformations, this yields prior distribution that exerts influence over a global model while allowing
    	for deformations. The authors then cast the problem of extraction into posterior estimation and show
    	its equivalence to energy minimization of a generalized active contour model. The paper discusses
    	pertinent issues in shape training, energy minimization, line search strategies, minimax regularization
    	and initialization by generalized Hough transform. Finally, experimental results are presented and a
    	performance comparison vis-a-vis rigid template matching is conducted.
    The major problem with this paper is that it uses a few structures and terms without really making it clear to the reader how they are actually calculated. For instance, the concept of variance at a point (with regard to deformations of the contour) is used throughout the paper. As such it can be understood intuitively, but details such as which points in the image were considered in calculating it are missing. This poses a very big problem if we have to implement the algorithms discussed in the paper, which, I should say, are really good.
    - S.P.Aditya, 09/04/2000 }
    }
    NOTE:- The CVPR/PAMI/thesis versions of [Lai/Chin:1995] can be downloaded in zipped or plain form from Cora's pages.


  • Documentation of libraries used and usage of the code

    We have worked with images of type .ppm and .pgm throughout the project. So we have used the libraries behind "ppm.h", "pgm.h" and "pbm.h" (the netpbm package) and linked them to our code, which is written in C.
    The input image has to be converted to the ppm format. The program is then compiled with the following command (the libraries follow the source file, with -lppm before -lpgm and -lpbm, matching their dependency order):
    $ cc test.c -lppm -lpgm -lpbm -lm
    The name of our program is test.c. This program deals with the HBP and segmentation.
    Then the following command is executed at the prompt:
    $ ./a.out imagename.ppm
    Here, imagename is the name of the input image.
    An image named model_image.ppm is used as the single VFL image in the program. As of now, this model image is taken from the same image to be processed, and is also converted into a .ppm file. This could not be helped because of the non-uniformity in facial colour across images.
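    For reference, here is a minimal skeleton of how an image is read with these libraries (a sketch of the reading step only, not our full test.c; pm_openr, ppm_readppm and the PPM_GET* macros come from ppm.h, and error handling is omitted):

        #include <ppm.h>

        int main(int argc, char *argv[])
        {
            FILE   *fp;
            pixel **pixels;
            int     rows, cols;
            pixval  maxval;
            int     i, j;

            ppm_init(&argc, argv);

            fp = pm_openr(argv[1]);                 /* imagename.ppm */
            pixels = ppm_readppm(fp, &cols, &rows, &maxval);
            pm_close(fp);

            for (i = 0; i < rows; i++)
                for (j = 0; j < cols; j++) {
                    pixval r = PPM_GETR(pixels[i][j]);
                    pixval g = PPM_GETG(pixels[i][j]);
                    pixval b = PPM_GETB(pixels[i][j]);
                    /* ... segmentation and HBP work on these values ... */
                    (void)r; (void)g; (void)b;
                }

            ppm_freearray(pixels, rows);
            return 0;
        }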

    Now a listing of all the functions/subroutines used in the code is given, along with a brief description of what they do: