Object Detection and Recognition

Object recognition and detection refer to the tasks of classifying and localizing a given object in an image or video sequence. Since the problem in its general form is very difficult to tackle, it is usually constrained to a particular application with a restricted set of object classes and a given detection tolerance. For example, one can focus on detecting the bounding boxes of all persons in a given image. In another example, one could be interested in pixel-level segmentations of cars in aerial images acquired over a city.

persondetection.png

An example result of person detection.

This problem is a very topical issue in computer vision and is currently studied by numerous research groups around the world. The wide interest in the subject has produced many benchmark datasets, which help the community to assess the proposed methods and follow the advances in the field. Well-known examples include the PASCAL VOC (Visual Object Classes) detection challenge and the ETHZ dataset for human detection. In some cases (e.g. face detection) the results have already reached the level required for commercial applications, but more general solutions still require notable research effort.

facedetection.png
Face detection is already a common feature in modern cameras.

A widely used paradigm for object detection and recognition is the following sliding-window algorithm:

1) Take a bounding box that approximately corresponds to the size of the searched object (several scales may be used).
2) Slide the box over the image and, at each position, extract some characteristic appearance features from the underlying image pixels (e.g. edge orientations or color histograms).
3) Use machine learning methods (e.g. SVM or AdaBoost) to compare the extracted features with manually labeled training samples and score how likely each box is to contain the object of interest.
4) Find the scores that are above a threshold and label the corresponding bounding boxes as positive detections.
5) Remove multiple detections by keeping only the best-scoring one among highly overlapping boxes.
6) Return the remaining positive bounding boxes as the resulting detections.
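The six steps above can be sketched in a few lines of Python with NumPy. This is only a minimal illustration under stated assumptions, not the detector used in our research: all function names are hypothetical, the feature is a plain edge-orientation histogram, the overlap measure is assumed to be intersection-over-union, and `score_fn` stands in for a classifier (e.g. an SVM) that would be trained separately on labeled samples. The box size, stride and thresholds are arbitrary example values.

```python
import numpy as np

def edge_orientation_histogram(patch, bins=8):
    """Step 2: histogram of edge orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % np.pi  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(orientation, bins=bins, range=(0.0, np.pi),
                           weights=magnitude)
    return hist / (hist.sum() + 1e-9)  # normalize; epsilon avoids division by zero

def sliding_windows(image_shape, box, stride):
    """Steps 1-2: yield (x, y, w, h) for a fixed-size box slid over the image."""
    height, width = image_shape
    w, h = box
    for y in range(0, height - h + 1, stride):
        for x in range(0, width - w + 1, stride):
            yield (x, y, w, h)

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes (assumed overlap measure)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def non_max_suppression(boxes, scores, max_overlap=0.3):
    """Step 5: keep the best-scoring box among highly overlapping ones."""
    kept = []
    for i in np.argsort(scores)[::-1]:  # visit boxes from best to worst score
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in kept):
            kept.append(int(i))
    return kept

def detect(image, score_fn, box=(16, 16), stride=4, threshold=0.0,
           max_overlap=0.3):
    """Steps 1-6 for a single scale; score_fn maps a feature vector to a score."""
    boxes, scores = [], []
    for (x, y, w, h) in sliding_windows(image.shape, box, stride):
        features = edge_orientation_histogram(image[y:y + h, x:x + w])
        score = score_fn(features)           # step 3: classifier score
        if score > threshold:                # step 4: thresholding
            boxes.append((x, y, w, h))
            scores.append(score)
    keep = non_max_suppression(boxes, scores, max_overlap)
    return [(boxes[i], scores[i]) for i in keep]  # step 6

if __name__ == "__main__":
    # Hypothetical linear scorer: the weights would normally come from training.
    weights, bias = np.linspace(-1.0, 1.0, 8), 0.0
    image = np.random.default_rng(0).random((64, 64))
    for found_box, found_score in detect(image, lambda f: float(weights @ f + bias)):
        print(found_box, round(found_score, 3))
```

To handle several scales, as mentioned in step 1, one would typically call `detect` on resized copies of the image (or with several box sizes) and run the suppression step over the pooled results.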

Our research focuses on two subject areas:

Human detection:

Spatial-temporal granularity-tunable gradients partition (STGGP) descriptors for human detection.

Isotropic granularity-tunable gradients partition (IGGP) descriptors for human detection.

Generic object detection:

Finding methods for predicting the locations of visual attention within a very diverse object class of interest.

generaldetector.png
A general detector finds objects from an extremely diverse object class.

CMV/Research/ObjectDetectionAndRecognition (last edited 2011-11-19 15:09:37 by WebMaster)