Name | Scale | Length per clip (sec) | Resolution | Annotation | Scenario |
---|---|---|---|---|---|
BEHAVE | 4 videos (171 clips) | 0.24~61.92 | 640x480 | frame-level | acted fights |
RE-DID | 30 videos | 20~240 | 1280x720 | frame-level | natural |
VSD | 18 movies (1317 clips) | 55.3~829.4 | variable | frame-level | movies |
CCTV-Fights | 1000 clips | 5~720 | variable | frame-level | surveillance, mobile cameras |
Hockey Fight | 1000 clips | 1.6~1.96 | 360x288 | video-level | Hockey |
Movies Fight | 200 clips | 1.6~2 | 720x480 | video-level | movies, sports |
Crowd Violence | 246 clips | 1.04~6.52 | variable | video-level | natural |
SBU Kinect Interaction | 264 clips | 0.67~3 | 640x480 | video-level | acted fights |
Violent-Flows | 246 clips | 1.04~6.52 | 320x240 | video-level | streets/school/sports |
Avenue | 37 videos | only normal videos are present in training set | |||
——————— | ——————— | —————— | —————— | —————— | ——————— |
UCF-Crime | 1900 clips | 60~600 | variable | video-level | surveillance |
UCFCrime2Local | | | | frame-level | |
UCF-Crime annotated | | | | frame-level | |
RWF-2000 | 2000 clips | 5 | variable | video-level | surveillance |
fight-detection-surv | 300 videos | 2 | variable | video-level | surveillance |
RLV | 2000 clips | 3-7 | 397x511(?) | video-level | natural (?) |
XD-Violence | 4754 videos | 1~240 (mostly) | variable | video-level, multiple label | movies, sports, games, hand-held cameras, surveillance, car cameras, etc… |
Shanghai-Tech | 437 videos | | | | surveillance |
UCSD | 98 videos | | | frame-level | surveillance |
——————— | ——————— | —————— | —————— | —————— | ——————— |
YouTube-Small (ours) | 58/41 clips (fight/non-fight) | 2~3 | variable | video-level | natural |
- we have the bold ones on HAL
paper link: https://arxiv.org/pdf/1801.04264.pdf
webpage: https://webpages.charlotte.edu/cchen62/dataset.html
- contains 1,900 untrimmed real-world street and indoor surveillance videos with a total duration of 128 hours.
- the training set contains 1,610 videos with video-level labels, and the test set contains 290 videos with frame-level labels.
- 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism.
Model | Reported on Conference/Journal | Supervision | Feature | Encoder-based | 32 Segments | AUC (%) | FAR@0.5 on Normal (%) |
---|---|---|---|---|---|---|---|
ST-Graph | ACM MM 20 | Un | - | √ | X | 72.7 | |
Sultani et al. | CVPR 18 | Weakly | C3D RGB | X | √ | 75.41 | 1.9 |
IBL | ICIP 19 | Weakly | C3D RGB | X | √ | 78.66 | - |
Motion-Aware | BMVC 19 | Weakly | PWC Flow | X | √ | 79.0 | - |
Background-Bias | ACM MM 19 | Fully | NLN RGB | √ | X | 82.0 | - |
GCN-Anomaly | CVPR 19 | Weakly | TSN RGB | √ | X | 82.12 | 0.1 |
MIST | CVPR 21 | Weakly | I3D RGB | √ | X | 82.30 | 0.13 |
MSL | AAAI 22 | Weakly | C3D RGB | √ | X | 82.85 | - |
CLAWS | ECCV 20 | Weakly | C3D RGB | √ | X | 83.03 | - |
RTFM | ICCV 21 | Weakly | I3D RGB | X | √ | 84.03 | - |
CRFD | TIP 21 | Weakly | I3D RGB | X | √ | 84.89 | - |
MSL | AAAI 22 | Weakly | I3D RGB | √ | X | 85.30 | - |
WSAL | TIP 21 | Weakly | I3D RGB | X | √ | 85.38 | - |
MSL | AAAI 22 | Weakly | VideoSwin-RGB | √ | X | 85.62 | - |
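
The two metrics in the table above can be sketched in plain Python. This is a minimal illustration, not the evaluation code of any listed paper. It assumes per-frame anomaly scores in [0, 1], per-frame binary labels (1 = anomalous), and that the last column is the false alarm rate at a score threshold of 0.5. The function names `frame_level_auc` and `false_alarm_rate` are hypothetical. The false alarm rate here is taken over all frames labeled normal; papers may restrict it to frames from normal videos only.

```python
def frame_level_auc(scores, labels):
    """ROC AUC from the rank-sum (Mann-Whitney U) statistic.

    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both normal and anomalous frames")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the end of the current group of tied scores.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def false_alarm_rate(scores, labels, threshold=0.5):
    """Fraction of normal frames whose score exceeds the threshold."""
    normal_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not normal_scores:
        raise ValueError("no normal frames to evaluate")
    return sum(1 for s in normal_scores if s > threshold) / len(normal_scores)
```

To express the results as percentages, as in the table, multiply both return values by 100.
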
paper link: https://arxiv.org/pdf/1911.05913.pdf
webpage: https://github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection
- contains 2,000 videos captured by surveillance cameras in real-world scenes.
paper link: https://arxiv.org/pdf/2007.04687.pdf
- contains 4,754 untrimmed videos (2405 violent and 2349 non-violent) with a total duration of 217 hours.
- 6 physically violent classes: abuse, car accident, explosion, fighting, riot, shooting
- collected from multiple sources, such as movies, sports, surveillance cameras, and CCTV, with audio signals.
- the training set contains 3,954 videos with video-level labels, and the test set contains 800 videos with frame-level labels.
- multiple violent labels (1~3) for each violent video.
Model | Reported on Conference/Journal | Supervision | Feature | Encoder-based | 32 Segments | AP(%) |
---|---|---|---|---|---|---|
Wu et al. | ECCV 2020 | Weakly | C3D-RGB | X | X | 67.19 |
Sultani et al. | ECCV 2020 (reported by Wu) | Weakly | I3D-RGB | X | √ | 73.20 |
MSL | AAAI 2022 | Weakly | C3D-RGB | X | X | 75.53 |
CRFD | TIP 2021 | Weakly | I3D-RGB | X | √ | 75.90 |
RTFM | ICCV 2021 | Weakly | I3D-RGB | X | √ | 77.81 |
MSL | AAAI 2022 | Weakly | I3D-RGB | X | X | 78.28 |
MSL | AAAI 2022 | Weakly | VideoSwin-RGB | X | X | 78.59 |
Wu et al. | ECCV 2020 | Weakly | I3D-RGB+Audio | X | X | 78.64 |
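
The AP column above is an average-precision score over frame-level predictions. The sketch below shows one common, non-interpolated way to compute it, assuming per-frame scores and binary labels (1 = violent). It is an illustration only: the function name `average_precision` is hypothetical, ties between equal scores are broken by input order, and the exact protocol used by each paper may differ.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision.

    Frames are ranked by descending score; precision is accumulated at the
    rank of every positive frame and averaged over the number of positives.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("AP needs at least one positive frame")

    # Indices sorted by descending score (stable sort keeps input order on ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / n_pos
```
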
webpage: https://svip-lab.github.io/dataset/campus_dataset.html; https://github.com/desenzhou/ShanghaiTechDataset
- contains 437 campus surveillance videos with 130 abnormal events in 13 scenes.
- all the videos in the training set are normal.
Model | Reported on Conference/Journal | Supervision | Feature | Encoder-based | AUC(%) | FAR@0.5 (%) |
---|---|---|---|---|---|---|
Conv-AE | CVPR 16 | Un | - | √ | 60.85 | - |
stacked-RNN | ICCV 17 | Un | - | √ | 68.0 | - |
MNAD | CVPR 20 | Un | - | √ | 70.5 | - |
Mem-AE | ICCV 19 | Un | - | √ | 71.2 | - |
FramePred | CVPR 18 | Un | - | √ | 72.8 | - |
FramePred* | IJCAI 19 | Un | - | √ | 73.4 | - |
AMMC | AAAI 21 | Un | - | √ | 73.7 | - |
ST-Graph | ACM MM 20 | Un | - | √ | 74.7 | - |
VEC | ACM MM 20 | Un | - | √ | 74.8 | - |
MLEP | IJCAI 19 | 10% test vids with Video Anno | - | √ | 75.6 | - |
HF2-VAD | ICCV 21 | Un | - | √ | 76.2 | - |
GCN-Anomaly | CVPR 19 | Weakly (Re-Organized Dataset) | C3D-RGB | √ | 76.44 | - |
ROADMAP | TNNLS 21 | Un | - | √ | 76.6 | - |
MLEP | IJCAI 19 | 10% test vids with Frame Anno | - | √ | 76.8 | - |
BDPN | AAAI 22 | Un | - | √ | 78.1 | - |
CAC | ACM MM 20 | Un | - | √ | 79.3 | |
IBL | ICME 2020 | Weakly (Re-Organized Dataset) | I3D-RGB | X | 82.5 | 0.10 |
GCN-Anomaly | CVPR 19 | Weakly (Re-Organized Dataset) | TSN-Flow | √ | 84.13 | - |
GCN-Anomaly | CVPR 19 | Weakly (Re-Organized Dataset) | TSN-RGB | √ | 84.44 | - |
Sultani et al. | ICME 2020 | Weakly (Re-Organized Dataset) | C3D-RGB | X | 86.3 | 0.15 |
CLAWS | ECCV 20 | Weakly (Re-Organized Dataset) | C3D-RGB | √ | 89.67 | |
SSMT | CVPR 21 | Un | - | √ | 90.2 | - |
AR-Net | ICME 20 | Weakly (Re-Organized Dataset) | I3D-RGB & I3D Flow | X | 91.24 | 0.10 |
MSL | AAAI 22 | Weakly (Re-Organized Dataset) | C3D-RGB | X | 94.81 | - |
MIST | CVPR 21 | Weakly (Re-Organized Dataset) | I3D-RGB | √ | 94.83 | 0.05 |
MSL | AAAI 22 | Weakly (Re-Organized Dataset) | I3D-RGB | X | 96.08 | - |
RTFM | ICCV 21 | Weakly (Re-Organized Dataset) | I3D-RGB | X | 97.21 | - |
MSL | AAAI 22 | Weakly (Re-Organized Dataset) | VideoSwin-RGB | X | 97.32 | - |
CRFD | TIP 21 | Weakly (Re-Organized Dataset) | I3D-RGB | X | 97.48 | - |
paper link: http://www.cse.cuhk.edu.hk/leojia/papers/abnormaldect_iccv13.pdf
webpage: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html
- contains 16 training and 21 testing video clips. The videos were captured on the CUHK campus avenue, with 30,652 frames in total (15,328 training, 15,324 testing).
- The training videos capture normal situations. Testing videos include both normal and abnormal events.
Model | Reported on Conference/Journal | Supervision | Feature | End2End | AUC(%) |
---|---|---|---|---|---|
Conv-AE | CVPR 16 | Un | - | √ | 70.2 |
ConvLSTM-AE | ICME 17 | Un | - | √ | 77.0 |
Conv-AE* | CVPR 18 | Un | - | √ | 80.0 |
Unmasking | ICCV 17 | Un | 3D gradients+VGG conv5 | X | 80.6 |
stacked-RNN | ICCV 17 | Un | - | √ | 81.7 |
Mem-AE | ICCV 19 | Un | - | √ | 83.3 |
DeepAppearance | ICAIP 17 | Un | - | √ | 84.6 |
FramePred | CVPR 18 | Un | - | √ | 85.1 |
AMMC | AAAI 21 | Un | - | √ | 86.6 |
Appearance-Motion Correspondence | ICCV 19 | Un | - | √ | 86.9 |
CAC | ACM MM 20 | Un | - | √ | 87.0 |
ROADMAP | TNNLS 21 | Un | - | √ | 88.3 |
MNAD | CVPR 20 | Un | - | √ | 88.5 |
FramePred* | IJCAI 19 | Un | - | √ | 89.2 |
ST-Graph | ACM MM 20 | Un | - | √ | 89.6 |
VEC | ACM MM 20 | Un | - | √ | 90.2 |
AEP | TNNLS 21 | Un | - | √ | 90.2 |
Causal | AAAI 22 | Un | I3D-RGB | X | 90.3 |
BDPN | AAAI 22 | Un | - | √ | 90.3 |
HF2-VAD | ICCV 21 | Un | - | √ | 91.1 |
MLEP | IJCAI 19 | 10% test vids with Video Anno | - | √ | 91.3 |
SSMT | CVPR 21 | Un | - | √ | 92.8 |
MLEP | IJCAI 19 | 10% test vids with Frame Anno | - | √ | 92.8 |
paper link:
webpage: http://www.svcl.ucsd.edu/projects/anomaly/dataset.html
- acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways.
- the crowd density in the walkways was variable, ranging from sparse to very crowded.
- in the normal setting, the video contains only pedestrians. Abnormal events are due to either the circulation of non-pedestrian entities in the walkways or anomalous pedestrian motion patterns.
- Commonly occurring anomalies include bikers, skaters, small carts, and people walking across a walkway or in the grass that surrounds it. A few instances of people in wheelchairs were also recorded.
- The data was split into 2 subsets, each corresponding to a different scene. The video footage recorded from each scene was split into various clips of around 200 frames.
  - Peds1: clips of groups of people walking towards and away from the camera, with some amount of perspective distortion. Contains 34 training video samples and 36 testing video samples.
  - Peds2: scenes with pedestrian movement parallel to the camera plane. Contains 16 training video samples and 12 testing video samples.
- For each clip, the ground truth annotation includes a binary flag per frame, indicating whether an anomaly is present at that frame. In addition, a subset of 10 clips for Peds1 and 12 clips for Peds2 are provided with manually generated pixel-level binary masks, which identify the regions containing anomalies.
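
Per-frame binary flags like the ones described above are often easier to inspect as frame intervals. The sketch below converts a list of flags into inclusive (start, end) frame ranges. It assumes the flags have already been loaded into a Python list of 0/1 values with 0-based frame indices; the function name `flags_to_intervals` is hypothetical and the dataset's own file format is not handled here.

```python
def flags_to_intervals(flags):
    """Convert per-frame 0/1 anomaly flags into inclusive (start, end) intervals."""
    intervals = []
    start = None  # index where the current anomalous run began, if any
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # a new anomalous run begins
        elif not flag and start is not None:
            intervals.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        # The clip ends while an anomaly is still present.
        intervals.append((start, len(flags) - 1))
    return intervals
```

For example, `flags_to_intervals([0, 1, 1, 0, 0, 1])` yields two intervals, one covering frames 1 to 2 and one covering frame 5 alone.
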
paper link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9304905
code link: https://github.com/DegardinBruno/human-self-learning-anomaly
webpage: http://socia-lab.di.ubi.pt/EventDetection/