PersonID - Makarand Tapaswi

Person Identification in TV series CVPR 2012

At CeBIT 2013

Alongside other face analysis work, we presented a demonstration of our person identification method at CeBIT 2013. Here is some press coverage: [English] [German]

Paper and Poster

``Knock! Knock! Who is it?'' Probabilistic Person Identification in TV series
Makarand Tapaswi, Martin Baeuml and Rainer Stiefelhagen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Poster, Providence, RI, June 2012
[paper] [poster-1] [poster-2]

Errata: The precision and recall labels in Figure 7 are swapped.

Contributions

Shift from face tracks to full person tracks to achieve full coverage
Automatically learn clothing models using face recognition results
Leverage the temporal structure of the episodes
Model the problem using a Markov Random Field

Code

Download code release v0.1. It contains the core code needed to understand the method, but it is not yet a fully reproducible package.

Supplementary material video

Download original supplementary material (avi + README, ~20MB).

Disclaimer: This video clip is presented for academic, non-profit purposes to demonstrate our person identification methods. Copyrights are held by original content creators, producers or country-specific copyright holders.

Dataset

UPDATE (20.06.2013): An updated version of the data set is available, containing face tracks with features and assigned speaker identities. Check it out here! It also adds six videos from Buffy the Vampire Slayer (Season 5, Episodes 1 to 6). That work does not focus on person tracks, so continue to use the person tracks provided below.

ORIGINAL POSTING
The data set covers The Big Bang Theory (Season 1, Episodes 1 to 6). The Season 1 DVD is available from most retailers (Amazon US, DE). Note that the data below contains only the annotations, not the actual audio-visual content.

This data was used in our CVPR 2012 paper; please cite it if you use the data.

PRACTICAL NOTE: The track bounding boxes, timestamps, etc. were obtained from Region 2 (PAL) DVDs, whose video frames are stored at 720x576 resolution and displayed at 1024x576 (16:9). The bounding boxes are given in the 1024x576 display resolution. The frame rate is 25 fps.
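As a minimal sketch, assuming a box is stored as (x, y, width, height) in display coordinates (the box layout and the helper names below are hypothetical, not part of the release), the following Python maps a box back to the 720x576 pixels of the decoded PAL frames and converts frame numbers to seconds at 25 fps:

```python
# Constants taken from the practical note above.
FPS = 25.0                        # PAL frame rate
DISPLAY_W, DISPLAY_H = 1024, 576  # resolution used by the bounding boxes
STORED_W, STORED_H = 720, 576     # resolution of the decoded DVD frames


def display_to_stored(x: float, y: float, w: float, h: float):
    """Map a hypothetical (x, y, w, h) box from 1024x576 display
    coordinates to 720x576 stored-frame pixels.

    Only the horizontal axis changes; the vertical resolution is the same.
    """
    sx = STORED_W / DISPLAY_W  # 720 / 1024 = 0.703125
    sy = STORED_H / DISPLAY_H  # 576 / 576 = 1.0
    return x * sx, y * sy, w * sx, h * sy


def frame_to_seconds(frame: int) -> float:
    """Convert a frame number to seconds at 25 fps (assumes frame 0 is t = 0)."""
    return frame / FPS


print(display_to_stored(512, 100, 128, 128))  # (360.0, 100.0, 90.0, 128.0)
print(frame_to_seconds(250))                  # 10.0
```

Keeping the boxes in display coordinates means they line up with what a player shows on screen; the conversion is only needed when cropping from the raw 720x576 frames.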

Video Events videvents.tar.gz
Contains a list of auto-detected video events: shots, special sequences, title song, credits.
Format: start_frame, start_time, TYPE, [end_frame,] [end_time]
Face Tracks facetracks.tar.gz
Contains face tracks
Format: frame_number, timestamp, number_of_tracks, [track_information]
The per-track fields are declared in the file header
Person Tracks persontracks.tar.gz
Contains person tracks
Format: frame_number, timestamp, number_of_tracks, [track_information]
The per-track fields are declared in the file header
Note that person tracks are provided only for every 10th frame, since that is how we used them.
Speaker Labels speakerid.tar.gz
Contains speaker identity labels
Compatible with Praat
Includes a MATLAB reader for the Praat format
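The video event and track formats above can be read with a small parser. This is a sketch under stated assumptions, not an official reader: it assumes fields are separated by commas and/or whitespace, that timestamps are plain seconds, that the optional end fields of a video event are either both present or both absent, and that the number of fields per track (declared in each file's header) is passed in by hand. All function and class names, and the example lines, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def split_fields(line: str) -> List[str]:
    """Split a data line on commas and/or whitespace (assumed delimiter)."""
    return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]


@dataclass
class VideoEvent:
    start_frame: int
    start_time: float  # assumed to be plain seconds
    kind: str          # the TYPE field (shot, special sequence, title song, credits)
    end_frame: Optional[int] = None
    end_time: Optional[float] = None


def parse_event(line: str) -> VideoEvent:
    """Parse 'start_frame, start_time, TYPE, [end_frame,] [end_time]'.

    Assumes the optional end fields are either both present or both absent.
    """
    tok = split_fields(line)
    if len(tok) == 3:
        return VideoEvent(int(tok[0]), float(tok[1]), tok[2])
    if len(tok) == 5:
        return VideoEvent(int(tok[0]), float(tok[1]), tok[2],
                          int(tok[3]), float(tok[4]))
    raise ValueError(f"unexpected number of fields ({len(tok)}): {line!r}")


@dataclass
class TrackFrame:
    frame: int
    timestamp: float          # assumed to be plain seconds
    tracks: List[List[str]]   # raw per-track fields, one list per track


def parse_track_line(line: str, fields_per_track: int) -> TrackFrame:
    """Parse 'frame_number, timestamp, number_of_tracks, [track_information]'.

    fields_per_track must be read from the file header, where the
    per-track fields are declared.
    """
    tok = split_fields(line)
    frame, timestamp, n = int(tok[0]), float(tok[1]), int(tok[2])
    rest = tok[3:]
    if len(rest) != n * fields_per_track:
        raise ValueError(f"expected {n * fields_per_track} track fields, got {len(rest)}")
    tracks = [rest[i * fields_per_track:(i + 1) * fields_per_track] for i in range(n)]
    return TrackFrame(frame, timestamp, tracks)


# Hypothetical example lines, not taken from the data set.
ev = parse_event("100, 4.0, SHOT, 250, 10.0")
tf = parse_track_line("10, 0.4, 2, 1 5 5 50 50 2 100 100 60 60", fields_per_track=5)
print(ev.kind, ev.end_frame, len(tf.tracks))
```

Track fields are kept as raw strings so that their types can be assigned from the header declaration. For person tracks, consecutive parsed lines should be 10 frames apart, since only every 10th frame is provided.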
