Object detection is a fundamental problem in image understanding. One popular solution is the R-CNN framework and its fast versions. They decompose the object detection problem into two cascaded, easier tasks: 1) generating object proposals from images, 2) classifying proposals into various object categories. Although each of these two tasks is easier than the full problem, neither is solved perfectly, and there is still room for improvement.
In this paper, we push the "divide and conquer" solution even further by dividing each task into two sub-tasks. We call the proposed method "CRAFT" (Cascade Region-proposal-network And FasT-rcnn), which tackles each task with a carefully designed network cascade. We show that the cascade structure helps in both tasks: in proposal generation, it provides more compact and better localized object proposals; in object classification, it reduces false positives (mainly between ambiguous categories) by capturing both inter- and intra-category variances. CRAFT achieves consistent and considerable improvement over the state-of-the-art on object detection benchmarks like PASCAL VOC 07/12 and ILSVRC.




1. Object proposal on VOC07 test

Note: "Ours" keeps a fixed number of proposals per image (same as RPN), while "Ours_S" keeps proposals whose scores (output of the cascaded FRCN classifier) are above a fixed threshold.
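The two selection rules in the note can be illustrated with a minimal NumPy sketch. This is not the released CRAFT code: the function names, the box layout (x1, y1, x2, y2), and the values of k and the threshold are all assumptions chosen for illustration.

```python
import numpy as np


def keep_top_k(boxes, scores, k):
    """Fixed budget: keep the k highest-scoring proposals per image."""
    # Sort by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")[:k]
    return boxes[order], scores[order]


def keep_above_threshold(boxes, scores, threshold):
    """Score-based: keep every proposal whose score exceeds the threshold."""
    mask = scores > threshold
    return boxes[mask], scores[mask]


# Hypothetical proposals for one image, as (x1, y1, x2, y2) boxes,
# with made-up classifier scores.
boxes = np.array(
    [[10, 10, 50, 50], [12, 8, 48, 52], [100, 100, 150, 160], [0, 0, 20, 20]],
    dtype=np.float32,
)
scores = np.array([0.9, 0.2, 0.75, 0.05], dtype=np.float32)

# "Ours"-style selection: same number of proposals for every image.
top_boxes, top_scores = keep_top_k(boxes, scores, k=3)

# "Ours_S"-style selection: the number kept varies with the image content.
thr_boxes, thr_scores = keep_above_threshold(boxes, scores, threshold=0.5)
```

With a fixed budget, every image yields the same number of proposals regardless of how many objects it contains. With a threshold, the count adapts to the image, which tends to give a more compact proposal set.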

2. Object detection on VOC07/12 test and ILSVRC val2


3. ImageNet 2015 Object Detection from Video (VID) Competition

CRAFT is one of the two components in "CUimage" (in ILSVRC2015 Object Detection Competition) and "CUvideo" (in ILSVRC Object Detection from Video Competition).
Note: The mAP results are for static-image detection only, without using temporal information. For details concerning the temporal part, please refer to this slide.


1. Paper, Poster

2. Code


  title={Craft Objects from Images},
  author={Yang, Bin and Yan, Junjie and Lei, Zhen and Li, Stan},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},