Violence Datasets

Overview

| Name | Scale | Length per clip (sec) | Resolution | Annotation | Scenario |
| --- | --- | --- | --- | --- | --- |
| BEHAVE | 4 videos (171 clips) | 0.24~61.92 | 640x480 | frame-level | acted fights |
| RE-DID | 30 videos | 20~240 | 1280x720 | frame-level | natural |
| VSD | 18 movies (1317 clips) | 55.3~829.4 | variable | frame-level | movies |
| CCTV-Fights | 1000 clips | 5~720 | variable | frame-level | surveillance, mobile cameras |
| Hockey Fight | 1000 clips | 1.6~1.96 | 360x288 | video-level | hockey |
| Movies Fight | 200 clips | 1.6~2 | 720x480 | video-level | movies, sports |
| Crowd Violence | 246 clips | 1.04~6.52 | variable | video-level | natural |
| SBU Kinect Interaction | 264 clips | 0.67~3 | 640x480 | video-level | acted fights |
| Violent-Flows | 246 clips | 1.04~6.52 | 320x240 | video-level | streets/school/sports |
| Avenue | 37 videos | | | | only normal videos are present in training set |
| ——————— | ——————— | —————— | —————— | —————— | ——————— |
| UCF-Crime | 1900 clips | 60~600 | variable | video-level | surveillance |
| UCFCrime2Local | | | | frame-level | |
| UCF-Crime annotated | | | | frame-level | |
| RWF-2000 | 2000 clips | 5 | variable | video-level | surveillance |
| fight-detection-surv | 300 videos | 2 | variable | video-level | surveillance |
| RLV | 2000 clips | 3-7 | 397x511(?) | video-level | natural (?) |
| XD-Violence | 4754 videos | 1~240 (mostly) | variable | video-level, multiple label | movies, sports, games, hand-held cameras, surveillance, car cameras, etc… |
| Shanghai-Tech | 437 videos | | | | surveillance |
| UCSD | 98 videos | | | frame-level | surveillance |
| ——————— | ——————— | —————— | —————— | —————— | ——————— |
| YouTube-Small (ours) | 58 - 41 clips (fight/non-fight) | 2~3 | variable | video-level | natural |
  • we have the bold ones on HAL

UCF-Crime

paper link: https://arxiv.org/pdf/1801.04264.pdf

webpage: https://webpages.charlotte.edu/cchen62/dataset.html

Details

  1. contains 1,900 untrimmed real-world street and indoor surveillance videos with a total duration of 128 hours.
  2. the training set contains 1,610 videos with video-level labels, and the test set contains 290 videos with frame-level labels.
  3. 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism.

Leader Board

Model Reported on Conference/Journal Supervised Feature Encoder-based 32 Segments AUC (%) FAR@0.5 on Normal (%)
ST-Graph ACM MM 20 Un - X 72.7
Sultani et al. CVPR 18 Weakly C3D RGB X 75.41 1.9
IBL ICIP 19 Weakly C3D RGB X 78.66 -
Motion-Aware BMVC 19 Weakly PWC Flow X 79.0 -
Background-Bias ACM MM 19 Fully NLN RGB X 82.0 -
GCN-Anomaly CVPR 19 Weakly TSN RGB X 82.12 0.1
MIST CVPR 21 Weakly I3D RGB X 82.30 0.13
MSL AAAI 22 Weakly C3D RGB X 82.85 -
CLAWS ECCV 20 Weakly C3D RGB X 83.03 -
RTFM ICCV 21 Weakly I3D RGB X 84.03 -
CRFD TIP 21 Weakly I3D RGB X 84.89 -
MSL AAAI 22 Weakly I3D RGB X 85.30 -
WSAL TIP 21 Weakly I3D RGB X 85.38 -
MSL AAAI 22 Weakly VideoSwin-RGB X 85.62 -
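
The two metric columns above can be illustrated with a minimal sketch. It assumes that AUC is computed over per-frame anomaly scores against frame-level labels, and that the last column is the false alarm rate on normal test videos at a score threshold of 0.5; both readings are assumptions, and the function names `roc_auc` and `false_alarm_rate` are hypothetical helpers, not taken from any of the listed papers.

```python
def roc_auc(scores, labels):
    """Rank-based ROC AUC: probability that a random anomalous frame
    scores higher than a random normal frame (ties count as 0.5).
    Quadratic in the number of frames, so only suitable as a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def false_alarm_rate(scores_on_normal, threshold=0.5):
    """Fraction of frames from normal videos scored above the threshold."""
    if not scores_on_normal:
        raise ValueError("no frames given")
    alarms = sum(1 for s in scores_on_normal if s > threshold)
    return alarms / len(scores_on_normal)


# Toy example: four frames, the last two are anomalous.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
far = false_alarm_rate([0.1, 0.2, 0.6, 0.3])
print(f"AUC = {auc:.2f}, FAR@0.5 = {far:.2f}")
```

In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic loop; the pure-Python version is used here only to keep the sketch dependency-free.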

RWF-2000

paper link: https://arxiv.org/pdf/1911.05913.pdf

webpage: https://github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection

Details

  1. contains 2,000 videos captured by surveillance cameras in real-world scenes.

Leader Board

XD-Violence

paper link: https://arxiv.org/pdf/2007.04687.pdf

webpage: https://roc-ng.github.io/XD-Violence/

Details

  1. contains 4,754 untrimmed videos (2405 violent and 2349 non-violent) with a total duration of 217 hours.
  2. 6 physically violent classes: abuse, car accident, explosion, fighting, riot, shooting
  3. collected from multiple sources, such as movies, sports, surveillance footage, and CCTV, with audio signals.
  4. the training set contains 3,954 videos with video-level labels, and the test set contains 800 videos with frame-level labels.
  5. multiple violent labels (1~3) for each violent video.
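
As a rough illustration of item 5, the multiple labels of a video can be represented as a multi-hot vector over the six classes from item 2. The class order, the `multi_hot` helper, and the choice to encode a non-violent video as an all-zero vector are assumptions made for this sketch, not the dataset's official label format.

```python
# Class order is an assumption for illustration; the dataset may order them differently.
XD_CLASSES = ["abuse", "car accident", "explosion", "fighting", "riot", "shooting"]


def multi_hot(labels):
    """Encode 0 to 3 class names as a 6-dimensional 0/1 vector.
    An empty list stands for a non-violent video (assumption)."""
    if len(labels) > 3:
        raise ValueError("a violent video carries at most 3 labels")
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate labels")
    vector = [0] * len(XD_CLASSES)
    for name in labels:
        if name not in XD_CLASSES:
            raise ValueError(f"unknown class: {name}")
        vector[XD_CLASSES.index(name)] = 1
    return vector


print(multi_hot(["fighting", "shooting"]))  # [0, 0, 0, 1, 0, 1]
print(multi_hot([]))                        # [0, 0, 0, 0, 0, 0]
```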

Leader Board

Model Reported on Conference/Journal Supervision Feature Encoder-based 32 Segments AP(%)
Wu et al. ECCV 2020 Weakly C3D-RGB X X 67.19
Sultani et al. ECCV 2020 (reported by Wu) Weakly I3D-RGB X 73.20
MSL AAAI 2022 Weakly C3D-RGB X X 75.53
CRFD TIP 2021 Weakly I3D-RGB X 75.90
RTFM ICCV 2021 Weakly I3D-RGB X 77.81
MSL AAAI 2022 Weakly I3D-RGB X X 78.28
MSL AAAI 2022 Weakly VideoSwin-RGB X X 78.59
Wu et al. ECCV 2020 Weakly I3D-RGB+Audio X X 78.64
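
The AP column can be read with the following minimal sketch, which assumes AP is the average precision over per-frame anomaly scores ranked from highest to lowest. The helper name `average_precision` is hypothetical, and ties are broken by input order here, which may differ from library implementations such as scikit-learn's `average_precision_score`.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at each positive item
    when items are ranked by descending score."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Stable sort: items with equal scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example: positives land at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(f"AP = {ap:.4f}")
```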

Shanghai-Tech

paper link: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhang_Single-Image_Crowd_Counting_CVPR_2016_paper.pdf

webpage: https://svip-lab.github.io/dataset/campus_dataset.html; https://github.com/desenzhou/ShanghaiTechDataset

Details

  1. contains 437 campus surveillance videos with 130 abnormal events in 13 scenes.
  2. all the videos in the training set are normal.

Leader Board

Model Reported on Conference/Journal Supervision Feature Encoder-based AUC(%) FAR@0.5 (%)
Conv-AE CVPR 16 Un - 60.85 -
stacked-RNN ICCV 17 Un - 68.0 -
MNAD CVPR 20 Un - 70.5 -
Mem-AE ICCV 19 Un - 71.2 -
FramePred CVPR 18 Un - 72.8 -
FramePred* IJCAI 19 Un - 73.4 -
AMMC AAAI 21 Un - 73.7 -
ST-Graph ACM MM 20 Un - 74.7 -
VEC ACM MM 20 Un - 74.8 -
MLEP IJCAI 19 10% test vids with Video Anno - 75.6 -
HF2-VAD ICCV 21 Un - 76.2 -
GCN-Anomaly CVPR 19 Weakly (Re-Organized Dataset) C3D-RGB 76.44 -
ROADMAP TNNLS 21 Un - 76.6 -
MLEP IJCAI 19 10% test vids with Frame Anno - 76.8 -
BDPN AAAI 22 Un - 78.1 -
CAC ACM MM 20 Un - 79.3
IBL ICME 2020 Weakly (Re-Organized Dataset) I3D-RGB X 82.5 0.10
GCN-Anomaly CVPR 19 Weakly (Re-Organized Dataset) TSN-Flow 84.13 -
GCN-Anomaly CVPR 19 Weakly (Re-Organized Dataset) TSN-RGB 84.44 -
Sultani et al. ICME 2020 Weakly (Re-Organized Dataset) C3D-RGB X 86.3 0.15
CLAWS ECCV 20 Weakly (Re-Organized Dataset) C3D-RGB 89.67
SSMT CVPR 21 Un - 90.2 -
AR-Net ICME 20 Weakly (Re-Organized Dataset) I3D-RGB & I3D Flow X 91.24 0.10
MSL AAAI 22 Weakly (Re-Organized Dataset) C3D-RGB X 94.81 -
MIST CVPR 21 Weakly (Re-Organized Dataset) I3D-RGB 94.83 0.05
MSL AAAI 22 Weakly (Re-Organized Dataset) I3D-RGB X 96.08 -
RTFM ICCV 21 Weakly (Re-Organized Dataset) I3D-RGB X 97.21 -
MSL AAAI 22 Weakly (Re-Organized Dataset) VideoSwin-RGB X 97.32 -
CRFD TIP 21 Weakly (Re-Organized Dataset) I3D-RGB X 97.48 -

Avenue

paper link: http://www.cse.cuhk.edu.hk/leojia/papers/abnormaldect_iccv13.pdf

webpage: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html

Details

  1. contains 16 training and 21 testing video clips. The videos were captured on the CUHK campus avenue, with 30652 frames in total (15328 training, 15324 testing).
  2. The training videos capture normal situations. Testing videos include both normal and abnormal events.

Leader Board

Model Reported on Conference/Journal Supervision Feature End2End AUC(%)
Conv-AE CVPR 16 Un - 70.2
ConvLSTM-AE ICME 17 Un - 77.0
Conv-AE* CVPR 18 Un - 80.0
Unmasking ICCV 17 Un 3D gradients+VGG conv5 X 80.6
stacked-RNN ICCV 17 Un - 81.7
Mem-AE ICCV 19 Un - 83.3
DeepAppearance ICAIP 17 Un - 84.6
FramePred CVPR 18 Un - 85.1
AMMC AAAI 21 Un - 86.6
Appearance-Motion Correspondence ICCV 19 Un - 86.9
CAC ACM MM 20 Un - 87.0
ROADMAP TNNLS 21 Un - 88.3
MNAD CVPR 20 Un - 88.5
FramePred* IJCAI 19 Un - 89.2
ST-Graph ACM MM 20 Un - 89.6
VEC ACM MM 20 Un - 90.2
AEP TNNLS 21 Un - 90.2
Causal AAAI 22 Un I3D-RGB X 90.3
BDPN AAAI 22 Un - 90.3
HF2-VAD ICCV 21 Un - 91.1
MLEP IJCAI 19 10% test vids with Video Anno - 91.3
SSMT CVPR 21 Un - 92.8
MLEP IJCAI 19 10% test vids with Frame Anno - 92.8

UCSD

paper link:

webpage: http://www.svcl.ucsd.edu/projects/anomaly/dataset.html

Details

  1. acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways.

  2. the crowd density in the walkways was variable, ranging from sparse to very crowded.

  3. in the normal setting, the video contains only pedestrians. Abnormal events are due to either the circulation of non-pedestrian entities in the walkways or anomalous pedestrian motion patterns.

  4. Commonly occurring anomalies include bikers, skaters, small carts, and people walking across a walkway or in the grass that surrounds it. A few instances of people in wheelchairs were also recorded.

  5. The data was split into 2 subsets, each corresponding to a different scene. The video footage recorded from each scene was split into various clips of around 200 frames.

    Peds1: clips of groups of people walking towards and away from the camera, and some amount of perspective distortion. Contains 34 training video samples and 36 testing video samples.

    Peds2: scenes with pedestrian movement parallel to the camera plane. Contains 16 training video samples and 12 testing video samples.

  6. For each clip, the ground truth annotation includes a binary flag per frame, indicating whether an anomaly is present at that frame. In addition, a subset of 10 clips for Peds1 and 12 clips for Peds2 are provided with manually generated pixel-level binary masks, which identify the regions containing anomalies.
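
The per-frame binary flags described in item 6 can be sketched as follows. The input format (a list of 1-indexed, inclusive anomalous frame ranges per clip) and the helper name `frame_flags` are assumptions for illustration; the actual annotation files may be laid out differently.

```python
def frame_flags(num_frames, anomalous_ranges):
    """Expand 1-indexed inclusive (start, end) ranges into a list with
    one 0/1 flag per frame, where 1 marks a frame containing an anomaly."""
    flags = [0] * num_frames
    for start, end in anomalous_ranges:
        if not 1 <= start <= end <= num_frames:
            raise ValueError(f"invalid range: ({start}, {end})")
        for frame in range(start - 1, end):  # convert to 0-indexed
            flags[frame] = 1
    return flags


# A 10-frame toy clip with anomalies in frames 3-5 and 9-10.
flags = frame_flags(10, [(3, 5), (9, 10)])
print(flags)  # [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
```

Flags in this form can be concatenated across test clips and compared directly against per-frame anomaly scores.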

Leader Board

UBI-Fights

paper link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9304905

code link: https://github.com/DegardinBruno/human-self-learning-anomaly

webpage: http://socia-lab.di.ubi.pt/EventDetection/

Details

Leader Board

https://paperswithcode.com/sota/abnormal-event-detection-in-video-on-ubi?p=weakly-and-partially-supervised-learning