Topic Title: Pedestrian Retrieval and Behavior Recognition based on Video Pattern Learning


Technical Area: Person Re-ID; Anomalous Behavior Recognition



Pedestrian retrieval and behavior recognition technologies are used in many real-world applications beyond academia. At the DAMO Academy of Alibaba Group, we focus on urban public scenarios and expect these technologies to help prevent crime and contribute to public security through the analysis of surveillance data from urban public areas.


Pedestrian retrieval aims to identify a pedestrian in a set of gallery images captured by different cameras, or by a single camera at different timestamps. Behavior recognition, from the viewpoint of computer vision, matches an observation (e.g., a video) against predefined patterns and assigns it a label, the behavior type. Pedestrian retrieval and behavior recognition based on static images have been well studied for many years. However, video-based methods in this area remain inadequate, leaving plenty of room for improvement.
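To make the retrieval setting concrete, the following is a minimal sketch of gallery ranking. It assumes each image has already been mapped to a feature vector by some model (not specified here) and that cosine similarity is the matching score. The function names, the identifiers "a", "b", "c", and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery ids ordered from most to least similar to the query.

    `gallery` maps a (hypothetical) image id to its feature vector.
    """
    scored = [(cosine_similarity(query, feat), gid) for gid, feat in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [gid for _, gid in scored]


# Toy example: "b" points almost the same way as the query, "c" the opposite way.
query = [1.0, 0.0]
gallery = {"a": [0.0, 1.0], "b": [1.0, 0.1], "c": [-1.0, 0.0]}
print(rank_gallery(query, gallery))  # ['b', 'a', 'c']
```

In a real system the feature extractor and the similarity measure are the research questions; the ranking step itself stays this simple.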


We are looking for a good solution for pedestrian retrieval and anomalous behavior recognition by means of video pattern learning across urban public surveillance cameras. Our goal is to establish city-level retrieval and recognition of people, vehicles (all kinds of transportation), objects (appendages or surrounding buildings), and events (human behaviors), and then to develop large-scale visual cloud computing based on “dynamic information in videos” for domains such as urban traffic, public security, and urban construction, which will eventually support services for comprehensive analysis, understanding, and intervention.


We invite researchers in related fields to work on new solutions for video-based pedestrian retrieval and behavior recognition that can achieve city-level, large-scale search and improve the robustness of pedestrian retrieval and the accuracy of behavior recognition.



1. Video-based pedestrian retrieval and behavior recognition: Beyond the existing fine-grained feature representation of single images, the proposal should develop pedestrian retrieval and anomalous behavior recognition based on video pattern learning, using cues that include but are not limited to silhouette and gait. Through temporal feature modeling and representation learning, the precision of pedestrian retrieval in common surveillance scenarios should be further improved. In addition, the video pattern learning should focus on anomalous behavior recognition for public security, e.g., theft and brawls.


2. Urban-object retrieval and recognition: Based on pedestrian recognition and the fusion of video data of people on foot, on bikes, and in other vehicles, the proposal should track the trajectories of people in the city and build a structured index of urban objects, which include people, vehicles (all kinds of transportation), objects (appendages or surrounding buildings), and events (human behaviors).
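As a point of reference for the temporal feature modeling mentioned in item 1, the sketch below shows the simplest possible baseline: averaging per-frame feature vectors of one pedestrian track into a single video-level descriptor. This is an assumption made for illustration, not a method prescribed by this call; the function name and the toy values are hypothetical, and proposals are expected to go well beyond such pooling.

```python
def temporal_average_pool(frame_features):
    """Average per-frame feature vectors into one video-level descriptor.

    `frame_features` is a list of equal-length lists of floats, one per frame.
    """
    if not frame_features:
        raise ValueError("at least one frame is required")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame features must have the same dimension")
    n_frames = len(frame_features)
    return [sum(f[i] for f in frame_features) / n_frames for i in range(dim)]


# Two frames of a (hypothetical) track, each with a 2-D feature.
print(temporal_average_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Average pooling discards frame order, so it cannot capture gait or other motion patterns; that limitation is exactly what temporal modeling is meant to address.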


Related Research Topics

Some research topics related to modeling video pattern features are given below:


1. 3D pose estimation: Given 3D position estimates of skeleton keypoints, representation learning for pedestrians can be improved through pose alignment. Behavior recognition can also benefit from sequential pose estimation. However, the required real-time, multi-target 3D pose estimation remains difficult, especially with a monocular camera.


2. Anomaly detection: Although anomaly detection is a classic problem in computer vision, existing algorithms still do not apply well to realistic applications. Detecting anomalous behaviors of people in public surveillance is our primary target. Complex, varied scenes and extremely imbalanced training data are two major challenges in realistic scenarios.
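To illustrate the imbalance challenge named in item 2, the sketch below computes inverse-frequency class weights, one common heuristic for re-weighting a training loss so that rare anomalous classes are not drowned out. The formula n_samples / (n_classes * class_count) is an assumption chosen for illustration, not a requirement of this call; the label names and counts are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes receive large weights, frequent classes small ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical clip labels: 98 normal clips and only 2 brawl clips.
labels = ["normal"] * 98 + ["brawl"] * 2
weights = inverse_frequency_weights(labels)
print(weights["brawl"])  # 25.0
```

Re-weighting alone rarely suffices when anomalies are this scarce, which is why the imbalance is listed as an open challenge.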