Shot Classification and Semantic Query Processing on Broadcast Cricket Videos
by
Dipen Rughwani
M.Tech. Thesis summary
Department of Computer Science and Engineering
IIT Kanpur, India
Under the guidance of Prof. Amitabha Mukerjee
Summary
Much that passes as semantic processing in broadcast sports video analysis involves identifying highlights from auxiliary cues such as applause. Answering video queries like "Show the balls where Hayden hits fours off Vaas" from video alone is a considerable challenge. Here we use supervised learning on approximately twenty minutes of video to segment it into episodes (balls and overs). We then align this visual analysis with parallel textual data to obtain the relevant semantics for content-based video retrieval.
The visual processing involves the following problems:
- Shot boundary detection: detecting the points at which the broadcast changes from one camera to another. A boundary may be a sharp cut, a fade, or an action replay marker.
Figure: Cut: abrupt frame change at arrow. Fade: gradual frame change. Action replay (specific to the World Cup broadcasts). All images shown are copyright of the broadcaster and are shown here for research purposes only.
- Shot classification: identifying the type of content visible in a shot. We use five classes, which are illustrated in the next figure:
- pitch shot: shot showing the whole pitch
- batsman shot: zoomed in on the batsman
- ground shot: showing the ground, usually wider than just the pitch
- fielder shot: zoomed in on one or two fielders
- crowd view: showing the crowd
![]()
- Ball segmentation: identifying the point in the video at which a new bowling action starts. Since the bowler's run-up is often not shown, we take this to be the pitch view shot that shows the bowler bowling to the batsman (e.g. Frame 11285, shown bottom right in the figure).
- Database matching: once each ball is known, we can associate the video with available cricket commentaries to generate an active search that shows the results for particular queries like the one above.
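The shot boundary detection step in the list above can be illustrated with a much simpler stand-in than the method used in the thesis. The sketch below flags only sharp cuts by thresholding the difference between grey-level histograms of consecutive frames; it does not handle fades or action replay markers. The function names, the flat-list frame representation, the bin count and the threshold are all assumptions made for this illustration.

```python
def histogram(frame, bins=4):
    """Normalised grey-level histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_cuts(frames, threshold=0.5):
    """Return the indices of frames that start a new shot (sharp cuts only)."""
    cuts = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # L1 distance between two normalised histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(index)
        previous = current
    return cuts


# Toy frames (hypothetical data): three dark frames followed by two bright ones.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright]
print(detect_cuts(frames))  # [3]
```

A gradual fade spreads the change over many frames, so a single-frame threshold like this one would miss it; that is one reason to analyse the video at several temporal scales.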
Query: Show the third over of the World Cup innings 1 (Australia batting against Sri Lanka) (Video response (49MB))
Query: Show the balls where Hayden hits fours off Vaas (Video response (11MB))
Algorithm: The algorithm involves video processing to classify shots and then to segment the video into balls, and text processing on the commentary to identify the actions in each ball.
Video processing: We use multi-scale spatio-temporal analysis of colour and flow features for shot boundary detection. We use a multi-class SVM to classify each video frame into one of Batsman view, Pitch view, Fielder view, Ground view and Crowd view. Finally, each shot is classified using a voting scheme based on which category dominates the shot. Sometimes a shot may range across differing types of views (e.g. this shot (32MB)); we also try to segment a shot into views, and these appear to correlate well with view classes identified by humans.
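The voting step described above can be sketched as a majority vote over per-frame labels within each shot. This is a minimal illustration, assuming the frame labels have already been produced by some frame classifier and that the shot start indices are known; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def classify_shot(frame_labels):
    """Return the label that occurs most often among the frames of one shot."""
    label, _count = Counter(frame_labels).most_common(1)[0]
    return label


def classify_shots(frame_labels, shot_starts):
    """Classify every shot, given per-frame labels and the start index of each shot."""
    edges = list(shot_starts) + [len(frame_labels)]
    return [
        classify_shot(frame_labels[start:end])
        for start, end in zip(edges, edges[1:])
    ]


# Toy per-frame labels (hypothetical): one stray "batsman" frame inside a pitch shot.
labels = ["pitch", "pitch", "batsman", "pitch", "crowd", "crowd", "crowd"]
print(classify_shots(labels, shot_starts=[0, 4]))  # ['pitch', 'crowd']
```

Voting makes the shot label robust to a few misclassified frames, but it hides shots that genuinely range across several views, which is why segmenting a shot into views is treated separately.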
For segmenting the video into balls, we use a Bayesian scheme to identify the starting shot of each ball. The decision is based on the shot duration, the prospective ball duration (the interval since the last ball start), and the classes of the previous and next shots; the model is learned from the training data. This coalesces the shots into balls, which are the unit of alignment with the textual commentary.
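One way to picture a Bayesian decision over these four cues is a naive Bayes classifier with discretised features. The sketch below is only an illustration under that assumption: the independence assumption, the discretisation into "short" and "long", the Laplace smoothing, the class name `BallStartModel` and the toy training examples are all hypothetical and not taken from the thesis.

```python
import math
from collections import defaultdict


class BallStartModel:
    """Naive Bayes over discrete shot features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (class, feature name, value) -> count
        self.values = defaultdict(set)          # feature name -> values seen in training

    def fit(self, examples):
        for features, is_start in examples:
            self.class_counts[is_start] += 1
            for name, value in features.items():
                self.feature_counts[(is_start, name, value)] += 1
                self.values[name].add(value)

    def log_score(self, features, is_start):
        # Log prior plus the sum of smoothed log likelihoods, one per feature.
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[is_start] / total)
        for name, value in features.items():
            num = self.feature_counts[(is_start, name, value)] + self.alpha
            den = self.class_counts[is_start] + self.alpha * len(self.values[name])
            score += math.log(num / den)
        return score

    def is_ball_start(self, features):
        return self.log_score(features, True) > self.log_score(features, False)


# Toy training set (invented for illustration): features of a shot and whether it starts a ball.
train = [
    ({"duration": "long", "gap": "long", "prev": "crowd", "next": "ground"}, True),
    ({"duration": "long", "gap": "long", "prev": "batsman", "next": "ground"}, True),
    ({"duration": "short", "gap": "short", "prev": "pitch", "next": "batsman"}, False),
    ({"duration": "short", "gap": "short", "prev": "ground", "next": "fielder"}, False),
]
model = BallStartModel()
model.fit(train)
candidate = {"duration": "long", "gap": "long", "prev": "crowd", "next": "ground"}
print(model.is_ball_start(candidate))  # True
```

Here `gap` stands for the prospective ball duration: a pitch view that follows the previous ball start too closely is unlikely to be a new delivery, so this cue helps reject replays and repeated pitch views.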
Query processing: Using very shallow processing of the commentary, we are able to identify the balls that constitute the answer to the video query. Finally, we stitch together the first two shots of each of these balls to provide the video skim response to the user query. The semantic depth is a considerable improvement on existing semantic query systems on cricket video, and the approach is validated on a sequence of ten overs from the World Cup 2007.
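Shallow processing of the commentary can be as simple as a pattern match on each line. The sketch below assumes commentary lines of the form "over.ball Bowler to Batsman, outcome, ..." with zero-based over numbers, and assumes that the parsed balls line up one-to-one, in order, with the ball segments found in the video. The regular expression, the function names and the sample lines are invented for illustration and are not real commentary.

```python
import re

# Assumed line format: "<over>.<ball> <Bowler> to <Batsman>, <outcome>, <free text>"
BALL_LINE = re.compile(r"^(\d+)\.(\d+)\s+(\w+) to (\w+),\s*([^,]+)")


def parse_ball(line):
    """Parse one commentary line; return None for lines that do not describe a ball."""
    match = BALL_LINE.match(line)
    if match is None:
        return None
    over, ball, bowler, batsman, outcome = match.groups()
    return {
        "over": int(over),
        "ball": int(ball),
        "bowler": bowler,
        "batsman": batsman,
        "outcome": outcome.strip().lower(),
    }


def find_balls(commentary, batsman=None, bowler=None, outcome=None, over=None):
    """Return the indices (in order of delivery) of the balls matching every given filter."""
    hits = []
    ball_index = 0
    for line in commentary:
        ball = parse_ball(line)
        if ball is None:
            continue
        wanted = {"batsman": batsman, "bowler": bowler, "outcome": outcome, "over": over}
        if all(value is None or ball[key] == value for key, value in wanted.items()):
            hits.append(ball_index)
        ball_index += 1
    return hits


# Invented sample lines in the assumed format.
commentary = [
    "0.1 Vaas to Gilchrist, no run, left alone outside off",
    "0.2 Vaas to Hayden, FOUR, driven through the covers",
    "End of over 1",
    "1.1 Malinga to Hayden, FOUR, thick edge past slip",
    "2.1 Vaas to Hayden, FOUR, pulled in front of square",
]
print(find_balls(commentary, batsman="Hayden", bowler="Vaas", outcome="four"))  # [1, 3]
```

The returned indices identify which ball segments to retrieve; a query such as "the third over" would use `over=2` under the zero-based numbering assumed here. Extras such as wides and no-balls would break the simple one-to-one ordering and need separate handling.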
Download:
- Video query result clips (here)
- Code (here)
- Text commentary (copyright cricinfo.com)
- Thesis report (pdf)
- Figures (here)
- References (bib | papers)
- Dataset: 10 overs from Cricket World cup 2007 final, Innings 1
- [low quality video] (2GB avi) | first 2K images (480MB .rar) | full imageset (15 files, 12GB)
Contact:
Dipen Rughwani
Please email me or Prof. Mukerjee for more information!