Shot Classification and Semantic Query Processing on Broadcast Cricket Videos

by

Dipen Rughwani

M.Tech. Thesis summary
Department of Computer Science and Engineering
IIT Kanpur, India.

under the guidance of Prof. Amitabha Mukerjee

 

Summary

Much of what passes as semantic processing in broadcast sports video analysis involves identifying highlights from auxiliary cues such as applause. Answering video queries like "Show the balls where Hayden hits fours off Vaas" is a considerable challenge on the basis of video alone. Here we use supervised learning on approximately twenty minutes of video to segment it into episodes (balls and overs). We then align this visual analysis with parallel textual data to obtain the relevant semantics for content-based video retrieval.

The visual processing involves the problems of shot boundary detection, shot classification, and segmentation of the video into balls.

Query: Show the third over of the World Cup innings 1 (Australia batting against Sri Lanka) (Video response (49MB))

Query: Show the balls where Hayden hits fours off Vaas (Video response (11MB))

Algorithm: The algorithm involves video processing to classify shots and then segment the video into balls, and text processing on the commentary to identify the actions in each ball.

Video processing: We use multi-scale spatio-temporal analysis of colour and flow features for shot boundary detection. We then use a multi-class SVM to classify each video frame into one of five views: Batsman view, Pitch view, Fielder view, Ground view and Crowd view. Finally, each shot is classified using a voting scheme based on which category dominates the shot. Sometimes a shot may range across differing types of views (e.g. this shot (32MB)); we also try to segment a shot into views, and these appear to correlate well with view classes identified by humans.
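The voting step described above can be illustrated with a minimal Python sketch. It assumes the per-frame labels have already been produced by the frame classifier; the function names, the run-length approach to splitting a shot into views, and the `min_run` threshold are hypothetical choices made for illustration, not details taken from the thesis.

```python
from collections import Counter
from itertools import groupby

# The five view classes named in the summary.
VIEW_CLASSES = ("Batsman", "Pitch", "Fielder", "Ground", "Crowd")


def classify_shot(frame_labels):
    """Return the view class that dominates the shot (majority vote).

    Ties are broken in favour of the label seen first, which is how
    Counter.most_common orders equal counts.
    """
    if not frame_labels:
        raise ValueError("a shot must contain at least one frame")
    return Counter(frame_labels).most_common(1)[0][0]


def split_into_views(frame_labels, min_run=3):
    """Split a shot into runs of consecutive frames sharing a label.

    Runs shorter than min_run (an assumed noise threshold) are dropped,
    and neighbouring runs with the same label are merged.
    """
    views = []
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if length < min_run:
            continue
        if views and views[-1][0] == label:
            views[-1] = (label, views[-1][1] + length)
        else:
            views.append((label, length))
    return views


# Made-up frame labels for one shot that pans from the pitch to a fielder.
frames = ["Pitch"] * 5 + ["Crowd"] + ["Pitch"] * 2 + ["Fielder"] * 4
shot_class = classify_shot(frames)
shot_views = split_into_views(frames)
```

Here the shot as a whole is labelled by its dominant class, while the run-length split keeps the longer Pitch and Fielder segments as separate views.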

For segmenting the video into balls, we use a Bayesian scheme, learned from the training data, to identify the starting shot of each ball. The decision is based on the shot duration, the prospective ball duration (the interval since the last ball start), and the classes of the previous and next shots. This coalesces the shots into balls, which are the unit of alignment with the textual commentary.
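One way such a scheme could look is a naive Bayes classifier over discretised shot features. The sketch below is only an assumption about the form of the model: the summary does not state the factorisation, the duration thresholds, or the smoothing used, and the training examples are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def shot_features(duration_s, gap_s, prev_cls, next_cls):
    """Discretise the four cues; the 4 s and 20 s thresholds are hypothetical."""
    return {
        "duration": "long" if duration_s >= 4 else "short",
        "gap": "long" if gap_s >= 20 else "short",  # interval since last ball start
        "prev": prev_cls,
        "next": next_cls,
    }


class NaiveBayesBallStart:
    """Naive Bayes with Laplace smoothing over categorical features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, name) -> value counts
        self.feature_values = defaultdict(set)      # name -> values seen in training

    def fit(self, examples):
        for features, label in examples:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_posterior(self, features):
        """Unnormalised log posterior for each label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for name, value in features.items():
                count = self.feature_counts[(label, name)][value]
                vocab = len(self.feature_values[name] | {value})
                score += math.log((count + self.alpha) / (n + self.alpha * vocab))
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_posterior(features)
        return max(scores, key=scores.get)


# Invented training shots: (features, is_ball_start).
training = [
    (shot_features(6, 35, "Crowd", "Ground"), True),
    (shot_features(7, 40, "Batsman", "Fielder"), True),
    (shot_features(5, 30, "Fielder", "Ground"), True),
    (shot_features(2, 5, "Pitch", "Batsman"), False),
    (shot_features(3, 8, "Ground", "Crowd"), False),
    (shot_features(2, 12, "Pitch", "Fielder"), False),
]
model = NaiveBayesBallStart().fit(training)
is_start = model.predict(shot_features(6, 38, "Crowd", "Ground"))
not_start = model.predict(shot_features(2, 6, "Pitch", "Batsman"))
```

Shots predicted as ball starts would then mark the boundaries at which consecutive shots are grouped into balls.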

Query processing: Using very shallow processing of the commentary, we are able to identify the balls that constitute the answer to the video query. Finally, we stitch together the first two shots of each of these balls to provide the video skim response to the user query. The semantic depth is a considerable improvement on existing semantic query systems on cricket video, and the approach is validated on a sequence of ten overs from the World Cup 2007.
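The query step can be sketched in Python as follows. The sketch assumes commentary lines of the form "over.ball Bowler to Batsman, outcome, description" and that the parsed commentary is already aligned one-to-one with the segmented balls; the regular expression, the function names, the commentary lines, and the shot identifiers are all made up for illustration and are not taken from the thesis code or data.

```python
import re
from dataclasses import dataclass

# Assumed line format: "2.2 Vaas to Hayden, FOUR, driven through the covers"
LINE_RE = re.compile(
    r"^(?P<over>\d+)\.(?P<ball>\d+)\s+(?P<bowler>[^,]+?) to (?P<batsman>[^,]+),"
    r"\s*(?P<outcome>[^,]+)"
)


@dataclass
class BallEvent:
    over: int      # completed overs before this ball, as written in the commentary
    ball: int
    bowler: str
    batsman: str
    outcome: str   # upper-cased, e.g. "FOUR" or "NO RUN"


def parse_commentary(lines):
    """Shallow parse: keep only lines that match the assumed format."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append(BallEvent(int(m["over"]), int(m["ball"]),
                                    m["bowler"].strip(), m["batsman"].strip(),
                                    m["outcome"].strip().upper()))
    return events


def find_balls(events, batsman=None, bowler=None, outcome=None):
    """Return indices of balls matching every criterion that is given."""
    return [i for i, e in enumerate(events)
            if (batsman is None or e.batsman == batsman)
            and (bowler is None or e.bowler == bowler)
            and (outcome is None or e.outcome == outcome)]


def build_skim(ball_indices, shots_per_ball, n_shots=2):
    """Concatenate the first n_shots shots of each selected ball."""
    skim = []
    for i in ball_indices:
        skim.extend(shots_per_ball[i][:n_shots])
    return skim


# Invented commentary and shot identifiers, one shot list per ball.
commentary = [
    "2.1 Vaas to Hayden, no run, defended back down the pitch",
    "2.2 Vaas to Hayden, FOUR, driven through the covers",
    "2.3 Vaas to Gilchrist, FOUR, cut past point",
    "2.4 Vaas to Hayden, FOUR, flicked off the pads",
    "3.1 Malinga to Hayden, FOUR, pulled to midwicket",
]
shots_per_ball = [["s0", "s1", "s2"], ["s3", "s4", "s5"], ["s6", "s7"],
                  ["s8"], ["s9", "s10", "s11"]]

events = parse_commentary(commentary)
matches = find_balls(events, batsman="Hayden", bowler="Vaas", outcome="FOUR")
skim = build_skim(matches, shots_per_ball)
```

For the query "Show the balls where Hayden hits fours off Vaas", only the second and fourth balls match, and the skim takes up to two shots from each.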

Download:

Video Query result clips (here) Code (here) text commentary (copyright cricinfo.com)
Thesis Report (pdf) Figures (here) References (bib | papers)
Dataset: 10 overs from Cricket World Cup 2007 final, Innings 1
[low quality video] (2GB avi) | first 2K images (480MB .rar) | full imageset (15 files, 12GB)

Contact:


Dipen Rughwani
Please email me or Prof. Mukerjee for more information!

