STARS - 2018


Section: New Results

Activity Detection in Long-term Untrimmed Videos by discovering sub-activities

Participants: Farhood Negin, Abhishek Goel, Abdelrahman G. Abubakr, Gianpiero Francesca, Francois Brémond.

Keywords: Activity detection, Semi-supervised learning, Sub-activity detection.

Figure 20. The process of extracting PC-CNN features and training of a weakly supervised sub-activity detector for the "Cooking" activity.

Detecting the temporal boundaries of activities is important for analyzing large-scale videos. However, accurate temporal segmentation of activities remains challenging. Detecting daily-living activities is even harder because of their high intra-class and low inter-class variation and the complex temporal relationships among sub-activities performed in realistic settings. To tackle these problems, we propose an online activity detection framework based on the discovery of sub-activities. We consider a long-term activity as a sequence of short-term sub-activities. Our contributions can be summarized as follows:

  • We introduce a new online frame-level activity detection pipeline that uses a single fixed-size window. A weakly supervised classifier is trained directly on sub-activities discovered by clustering and is applied to test videos to capture the sub-activities of long videos within a fixed temporal window.

  • To reduce noisy detections, especially at activity boundaries, we propose a novel greedy post-processing method based on Markov models.

  • We have extensively evaluated the proposed method on untrimmed videos from the DAHLIA [68] and GAADRD [77] datasets and achieved state-of-the-art performance.
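The post-processing idea in the second contribution can be illustrated with a small sketch. The greedy algorithm itself is not specified in this section, so the code below swaps in a standard technique, Viterbi decoding over a first-order Markov chain with a "sticky" transition model, purely to show how a Markov assumption can suppress isolated noisy frame labels. The function name `smooth_labels`, the `stay_prob` parameter and its value are assumptions made for illustration only.

```python
import math


def smooth_labels(scores, stay_prob=0.9):
    """Viterbi decoding with a sticky first-order Markov transition model.

    This is a stand-in for illustration, not the greedy method of the report.

    scores:    list of per-frame lists; scores[t][k] is the (strictly
               positive) classifier probability of label k at frame t.
    stay_prob: assumed probability of keeping the same label between two
               consecutive frames (must satisfy 0 < stay_prob < 1).
    """
    n_labels = len(scores[0])
    switch_prob = (1.0 - stay_prob) / max(n_labels - 1, 1)
    log_stay, log_switch = math.log(stay_prob), math.log(switch_prob)

    # best[k]: best log-probability of any label path ending in label k.
    best = [math.log(p) for p in scores[0]]
    back = []  # back[t][k]: best previous label when frame t + 1 has label k
    for frame in scores[1:]:
        new_best, pointers = [], []
        for k in range(n_labels):
            candidates = [
                best[j] + (log_stay if j == k else log_switch)
                for j in range(n_labels)
            ]
            j_best = max(range(n_labels), key=candidates.__getitem__)
            new_best.append(candidates[j_best] + math.log(frame[k]))
            pointers.append(j_best)
        best = new_best
        back.append(pointers)

    # Backtrack from the most probable final label.
    label = max(range(n_labels), key=best.__getitem__)
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Hypothetical two-class scores: frame 2 is a noisy detection of class 1.
scores = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1],
          [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]
raw = [max(range(2), key=frame.__getitem__) for frame in scores]
smoothed = smooth_labels(scores)
```

With these made-up scores, the per-frame argmax gives `[0, 0, 1, 0, 1, 1, 1]`, while the smoothed sequence is `[0, 0, 0, 0, 1, 1, 1]`: the isolated spurious label is removed and the true boundary is kept.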

Proposed Method:

Our framework produces frame-level activity labels online in two major steps, followed by a novel greedy post-processing technique. To handle long activities, each activity is decomposed into a sequence of fixed-length overlapping temporal clips, from which we extract deep features. We propose a person-centric feature (PC-CNN) based on the SSD detector that meets the processing-efficiency requirements of online systems. We then propose a weakly supervised method for discovering the sub-activities of long-term activities, which relies on clustering and model selection to find the optimal sub-activities of the given activities. To characterize each activity by its constituent sub-activities, we cluster that activity's clips with K-means and construct an activity-specific sub-activity dictionary, so that there is one dictionary per main activity. We represent an activity sequence by its sub-activity assignments under the trained dictionary. Then, for each activity class, we train a binary SVM classifier (one versus all) on its sub-activities (Figure 20). The trained classifiers are applied simultaneously within a sliding-window architecture to produce frame-level activity labels. Note that, unlike multi-scale sliding-window methods, we use only a single fixed-size temporal window, which is possible because the recognized sub-activities have a fixed length. Finally, assuming a temporal progression of sub-activities, we develop a greedy algorithm based on Markov models to refine noisy sub-activity proposals in the middle and boundary regions of long activities. We evaluated the proposed method on two daily-living activity datasets and achieved state-of-the-art performance.
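As a rough illustration of the first steps described above, the following sketch splits a video into fixed-length overlapping clips and maps clip features to the nearest entry of a sub-activity dictionary. It is a minimal sketch under stated assumptions: the function names, the clip length and stride, the toy two-dimensional features, and the use of a normalized histogram as the sequence representation are all hypothetical, and the dictionary is taken as given (in the described method it would come from K-means centroids).

```python
def split_into_clips(n_frames, clip_len, stride):
    """Return (start, end) frame ranges of fixed-length overlapping clips.

    Clips overlap whenever stride < clip_len; `end` is exclusive.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    return [(s, s + clip_len) for s in range(0, n_frames - clip_len + 1, stride)]


def assign_sub_activity(feature, dictionary):
    """Index of the dictionary centroid closest to `feature` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(dictionary)), key=lambda k: sq_dist(feature, dictionary[k]))


def encode_sequence(clip_features, dictionary):
    """Normalized histogram of sub-activity assignments (an assumed encoding)."""
    counts = [0] * len(dictionary)
    for feature in clip_features:
        counts[assign_sub_activity(feature, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical example: a 10-frame video, 4-frame clips, stride of 2 frames.
clips = split_into_clips(n_frames=10, clip_len=4, stride=2)
# → [(0, 4), (2, 6), (4, 8), (6, 10)]

# Toy dictionary with two centroids and one made-up feature per clip.
dictionary = [[0.0, 0.0], [10.0, 10.0]]
clip_features = [[1.0, 1.0], [9.0, 9.0], [8.0, 10.0], [0.0, 1.0]]
histogram = encode_sequence(clip_features, dictionary)
# → [0.5, 0.5]
```

A vector such as `histogram` is the kind of input a one-versus-all classifier could be trained on. Because all clips share one length, a single window size suffices at test time.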

Table 1. Activity detection results obtained on the DAHLIA dataset. Values in bold represent the best performance.
        | ELS                  | Max Subgraph Search  | DOHT (HOG)           | Sub Activity
        | FA_1  F_score  IoU   | FA_1  F_score  IoU   | FA_1  F_score  IoU   | FA_1  F_score  IoU
View 1  | 0.18  0.18     0.11  | -     0.25     0.15  | 0.80  0.77     0.64  | 0.85  0.81     0.73
View 2  | 0.27  0.26     0.16  | -     0.18     0.10  | 0.81  0.79     0.66  | 0.87  0.82     0.75
View 3  | 0.52  0.55     0.39  | -     0.44     0.31  | 0.80  0.77     0.65  | 0.82  0.76     0.69

Table 2. Detection results obtained on the GAADRD dataset.
Method                          | FA_1  F_score  IoU
simple sliding window (HOG)     | 0.68  0.52     0.40
simple sliding window (PC-CNN)  | 0.61  0.55     0.44

Tables 1 and 2 show the results of applying the developed framework to DAHLIA and GAADRD, respectively. On the DAHLIA dataset, our method (Sub Activity) outperforms the state-of-the-art methods [71], [61], [60] on every metric and camera view, except for the F-score on camera view 3 (0.76 versus 0.77 for DOHT). For the GAADRD dataset, we report results with two types of features, HOG and PC-CNN. As can be seen, even with hand-crafted features our framework produces comparable results. In future work, we will improve the sub-activity discovery algorithm so that it can distinguish similar sub-activities that occur in two different activities.
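For readers who want to reproduce this kind of comparison, the three reported metrics can be computed from frame-level labels roughly as follows. The metric definitions are not given in this section, so the sketch assumes that FA_1 is the fraction of correctly labeled frames and that the F-score and IoU are computed per class and then averaged over classes. The function names and the toy label sequences are hypothetical.

```python
def frame_accuracy(truth, pred):
    """Fraction of frames whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def mean_f_score_and_iou(truth, pred):
    """Class-averaged F-score and IoU computed from frame-level labels."""
    classes = sorted(set(truth) | set(pred))
    f_scores, ious = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        # Every class in `classes` occurs at least once, so tp + fp + fn > 0.
        f_scores.append(2 * tp / (2 * tp + fp + fn))
        ious.append(tp / (tp + fp + fn))
    return sum(f_scores) / len(classes), sum(ious) / len(classes)


# Made-up labels for a 6-frame video with two activity classes.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
accuracy = frame_accuracy(truth, pred)            # 5 of 6 frames correct
f_score, iou = mean_f_score_and_iou(truth, pred)  # averaged over both classes
```

In this toy case a single misplaced boundary frame lowers the IoU (about 0.71) more than the F-score (about 0.83), which mirrors the ordering of the F_score and IoU columns in the tables.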