Violence Datasets

Overview

Name	Scale	Length per clip (sec)	Resolution	Annotation	Scenario
BEHAVE	4 videos (171 clips)	0.24~61.92	640x480	frame-level	acted fights
RE-DID	30 videos	20~240	1280x720	frame-level	natural
VSD	18 movies (1317 clips)	55.3~829.4	variable	frame-level	movies
CCTV-Fights	1000 clips	5~720	variable	frame-level	surveillance, mobile cameras
Hockey Fight	1000 clips	1.6~1.96	360x288	video-level	Hockey
Movies Fight	200 clips	1.6~2	720x480	video-level	movies, sports
Crowd Violence	246 clips	1.04~6.52	variable	video-level	natural
SBU Kinect Interaction	264 clips	0.67~3	640x480	video-level	acted fights
Violent-Flows	246 clips	1.04~6.52	320x240	video-level	streets/school/sports
Avenue	37 videos			only normal videos are present in training set
———————	———————	——————	——————	——————	———————
UCF-Crime	1900 clips	60~600	variable	video-level	surveillance
UCFCrime2Local				frame-level
UCF-Crime annotated				frame-level
RWF-2000	2000 clips	5	variable	video-level	surveillance
fight-detection-surv	300 videos	2	variable	video-level	surveillance
RLV	2000 clips	3-7	397x511(?)	video-level	natural (?)
XD-Violence	4754 videos	1~240 (mostly)	variable	video-level, multiple label	movies, sports, games, hand-held cameras, surveillance, car cameras, etc…
Shanghai-Tech	437 videos				surveillance
UCSD	98 videos			frame-level	surveillance
———————	———————	——————	——————	——————	———————
YouTube-Small (ours)	58 / 41 clips (fight / non-fight)	2~3	variable	video-level	natural

we have the bold ones on HAL

UCF-Crime

paper link: https://arxiv.org/pdf/1801.04264.pdf

webpage: https://webpages.charlotte.edu/cchen62/dataset.html

Details

contains 1,900 untrimmed real-world street and indoor surveillance videos with a total duration of 128 hours.
the training set contains 1,610 videos with video-level labels, and the test set contains 290 videos with frame-level labels.
13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism.
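
Because the training videos carry only video-level labels, weakly supervised methods in the style of Sultani et al. commonly reduce each video to a fixed number of temporal segments before scoring them (the "32 Segments" column in the leaderboard). The sketch below shows one way to do that pooling. It assumes snippet features are already extracted as a (T, D) array; the function name and the boundary rounding are illustrative choices, not taken from any specific paper's code.

```python
import numpy as np

def pool_to_segments(features, num_segments=32):
    """Average-pool a (T, D) array of snippet features into (num_segments, D)."""
    features = np.asarray(features, dtype=np.float32)
    num_snippets = features.shape[0]
    # Evenly spaced segment boundaries, expressed as snippet indices.
    bounds = np.linspace(0, num_snippets, num_segments + 1).round().astype(int)
    pooled = np.zeros((num_segments, features.shape[1]), dtype=np.float32)
    for i in range(num_segments):
        start, end = bounds[i], bounds[i + 1]
        if end <= start:
            # Fewer snippets than segments: reuse the nearest single snippet.
            start = min(start, num_snippets - 1)
            end = start + 1
        pooled[i] = features[start:end].mean(axis=0)
    return pooled
```

Every video then has the same shape regardless of its duration, which is what lets a multiple-instance ranking loss treat each video as a bag of 32 instances.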

Leader Board

Model	Reported on Conference/Journal	Supervision	Feature	Encoder-based	32 Segments	AUC (%)	FAR@0.5 on Normal (%)
ST-Graph	ACM MM 20	Un	-	√	X	72.7	-
Sultani et al.	CVPR 18	Weakly	C3D RGB	X	√	75.41	1.9
IBL	ICIP 19	Weakly	C3D RGB	X	√	78.66	-
Motion-Aware	BMVC 19	Weakly	PWC Flow	X	√	79.0	-
Background-Bias	ACM MM 19	Fully	NLN RGB	√	X	82.0	-
GCN-Anomaly	CVPR 19	Weakly	TSN RGB	√	X	82.12	0.1
MIST	CVPR 21	Weakly	I3D RGB	√	X	82.30	0.13
MSL	AAAI 22	Weakly	C3D RGB	√	X	82.85	-
CLAWS	ECCV 20	Weakly	C3D RGB	√	X	83.03	-
RTFM	ICCV 21	Weakly	I3D RGB	X	√	84.03	-
CRFD	TIP 21	Weakly	I3D RGB	X	√	84.89	-
MSL	AAAI 22	Weakly	I3D RGB	√	X	85.30	-
WSAL	TIP 21	Weakly	I3D RGB	X	√	85.38	-
MSL	AAAI 22	Weakly	VideoSwin-RGB	√	X	85.62	-
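
The two metric columns above can be sketched as follows. This is a minimal NumPy version, assuming per-frame anomaly scores in [0, 1] and per-frame binary labels concatenated over the test set. The function names are hypothetical, and reading FAR@0.5 as "fraction of frames from normal videos scored above 0.5" is an assumption about how the papers compute it.

```python
import numpy as np

def frame_level_auc(labels, scores):
    """ROC AUC from per-frame labels (1 = anomalous) and anomaly scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both normal and anomalous frames")
    # Average 1-based rank of every score; tied scores share one rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    ranks = (last_rank - (counts - 1) / 2.0)[inverse]
    # Mann-Whitney U statistic of the anomalous frames, normalised to [0, 1].
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))

def false_alarm_rate(normal_scores, threshold=0.5):
    """Fraction of frames from normal videos scored above the threshold."""
    normal_scores = np.asarray(normal_scores, dtype=float)
    return float((normal_scores > threshold).mean())
```

Multiply both results by 100 to compare with the table. For real evaluations, `sklearn.metrics.roc_auc_score` computes the same AUC and is the more common choice.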

RWF-2000

paper link: https://arxiv.org/pdf/1911.05913.pdf

webpage: https://github.com/mchengny/RWF2000-Video-Database-for-Violence-Detection

Details

contains 2,000 videos captured by surveillance cameras in real-world scenes.

Leader Board

XD-Violence

paper link: https://arxiv.org/pdf/2007.04687.pdf

webpage: https://roc-ng.github.io/XD-Violence/

Details

contains 4,754 untrimmed videos (2405 violent and 2349 non-violent) with a total duration of 217 hours.
6 physically violent classes: abuse, car accident, explosion, fighting, riot, shooting
collected from multiple sources, such as movies, sports, and surveillance (CCTV) footage; the videos include audio signals.
the training set contains 3,954 videos with video-level labels, and the test set contains 800 videos with frame-level labels.
multiple violent labels (1~3) for each violent video.
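
Since a violent video can carry up to three of the six classes, a multi-hot vector is a natural target encoding. The sketch below assumes labels are already available as plain class-name strings; it does not parse the dataset's own annotation or file-naming format, and the names `CLASSES` and `multi_hot` are illustrative.

```python
# Class order follows the list above; the ordering itself is an arbitrary choice.
CLASSES = ["abuse", "car accident", "explosion", "fighting", "riot", "shooting"]

def multi_hot(video_labels):
    """Encode 0 to 3 class names as a 6-dimensional 0/1 vector."""
    labels = set(video_labels)
    unknown = labels - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    if len(labels) > 3:
        raise ValueError("a video carries at most 3 violent labels")
    # A non-violent video has no labels and maps to the all-zero vector.
    return [1 if name in labels else 0 for name in CLASSES]
```

For example, `multi_hot(["fighting", "shooting"])` sets the fourth and sixth entries, and `multi_hot([])` represents a non-violent video.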

Leader Board

Model	Reported on Conference/Journal	Supervision	Feature	Encoder-based	32 Segments	AP(%)
Wu et al.	ECCV 2020	Weakly	C3D-RGB	X	X	67.19
Sultani et al.	ECCV 2020 (reported by Wu)	Weakly	I3D-RGB	X	√	73.20
MSL	AAAI 2022	Weakly	C3D-RGB	X	X	75.53
CRFD	TIP 2021	Weakly	I3D-RGB	X	√	75.90
RTFM	ICCV 2021	Weakly	I3D-RGB	X	√	77.81
MSL	AAAI 2022	Weakly	I3D-RGB	X	X	78.28
MSL	AAAI 2022	Weakly	VideoSwin-RGB	X	X	78.59
Wu et al.	ECCV 2020	Weakly	I3D-RGB+Audio	X	X	78.64
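
The AP column is frame-level average precision. A minimal sketch, assuming per-frame binary labels and violence scores concatenated over the test videos; the function name is hypothetical, and tied scores are simply taken in sort order here, whereas library implementations group them by threshold.

```python
import numpy as np

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive frame."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if not labels.any():
        raise ValueError("AP needs at least one positive frame")
    # Rank frames from most to least suspicious.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    # Precision after each ranked frame: positives so far / frames so far.
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].sum() / hits.sum())
```

Multiply by 100 to compare with the table. `sklearn.metrics.average_precision_score` is the usual choice for real evaluations.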

Shanghai-Tech

paper link: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhang_Single-Image_Crowd_Counting_CVPR_2016_paper.pdf

webpage: https://svip-lab.github.io/dataset/campus_dataset.html; https://github.com/desenzhou/ShanghaiTechDataset

Details

contains 437 campus surveillance videos with 130 abnormal events in 13 scenes.
all the videos in the training set are normal.

Leader Board

Model	Reported on Conference/Journal	Supervision	Feature	Encoder-based	AUC(%)	FAR@0.5 (%)
Conv-AE	CVPR 16	Un	-	√	60.85	-
stacked-RNN	ICCV 17	Un	-	√	68.0	-
MNAD	CVPR 20	Un	-	√	70.5	-
Mem-AE	ICCV 19	Un	-	√	71.2	-
FramePred	CVPR 18	Un	-	√	72.8	-
FramePred*	IJCAI 19	Un	-	√	73.4	-
AMMC	AAAI 21	Un	-	√	73.7	-
ST-Graph	ACM MM 20	Un	-	√	74.7	-
VEC	ACM MM 20	Un	-	√	74.8	-
MLEP	IJCAI 19	10% test vids with Video Anno	-	√	75.6	-
HF2-VAD	ICCV 21	Un	-	√	76.2	-
GCN-Anomaly	CVPR 19	Weakly (Re-Organized Dataset)	C3D-RGB	√	76.44	-
ROADMAP	TNNLS 21	Un	-	√	76.6	-
MLEP	IJCAI 19	10% test vids with Frame Anno	-	√	76.8	-
BDPN	AAAI 22	Un	-	√	78.1	-
CAC	ACM MM 20	Un	-	√	79.3	-
IBL	ICME 2020	Weakly (Re-Organized Dataset)	I3D-RGB	X	82.5	0.10
GCN-Anomaly	CVPR 19	Weakly (Re-Organized Dataset)	TSN-Flow	√	84.13	-
GCN-Anomaly	CVPR 19	Weakly (Re-Organized Dataset)	TSN-RGB	√	84.44	-
Sultani et al.	ICME 2020	Weakly (Re-Organized Dataset)	C3D-RGB	X	86.3	0.15
CLAWS	ECCV 20	Weakly (Re-Organized Dataset)	C3D-RGB	√	89.67	-
SSMT	CVPR 21	Un	-	√	90.2	-
AR-Net	ICME 20	Weakly (Re-Organized Dataset)	I3D-RGB & I3D Flow	X	91.24	0.10
MSL	AAAI 22	Weakly (Re-Organized Dataset)	C3D-RGB	X	94.81	-
MIST	CVPR 21	Weakly (Re-Organized Dataset)	I3D-RGB	√	94.83	0.05
MSL	AAAI 22	Weakly (Re-Organized Dataset)	I3D-RGB	X	96.08	-
RTFM	ICCV 21	Weakly (Re-Organized Dataset)	I3D-RGB	X	97.21	-
MSL	AAAI 22	Weakly (Re-Organized Dataset)	VideoSwin-RGB	X	97.32	-
CRFD	TIP 21	Weakly (Re-Organized Dataset)	I3D-RGB	X	97.48	-

Avenue

paper link: http://www.cse.cuhk.edu.hk/leojia/papers/abnormaldect_iccv13.pdf

webpage: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html

Details

contains 16 training and 21 testing video clips. The videos are captured in CUHK campus avenue with 30652 (15328 training, 15324 testing) frames in total.
The training videos capture normal situations. Testing videos include both normal and abnormal events.

Leader Board

Model	Reported on Conference/Journal	Supervision	Feature	End2End	AUC(%)
Conv-AE	CVPR 16	Un	-	√	70.2
ConvLSTM-AE	ICME 17	Un	-	√	77.0
Conv-AE*	CVPR 18	Un	-	√	80.0
Unmasking	ICCV 17	Un	3D gradients+VGG conv5	X	80.6
stacked-RNN	ICCV 17	Un	-	√	81.7
Mem-AE	ICCV 19	Un	-	√	83.3
DeepAppearance	ICAIP 17	Un	-	√	84.6
FramePred	CVPR 18	Un	-	√	85.1
AMMC	AAAI 21	Un	-	√	86.6
Appearance-Motion Correspondence	ICCV 19	Un	-	√	86.9
CAC	ACM MM 20	Un	-	√	87.0
ROADMAP	TNNLS 21	Un	-	√	88.3
MNAD	CVPR 20	Un	-	√	88.5
FramePred*	IJCAI 19	Un	-	√	89.2
ST-Graph	ACM MM 20	Un	-	√	89.6
VEC	ACM MM 20	Un	-	√	90.2
AEP	TNNLS 21	Un	-	√	90.2
Causal	AAAI 22	Un	I3D-RGB	X	90.3
BDPN	AAAI 22	Un	-	√	90.3
HF2-VAD	ICCV 21	Un	-	√	91.1
MLEP	IJCAI 19	10% test vids with Video Anno	-	√	91.3
SSMT	CVPR 21	Un	-	√	92.8
MLEP	IJCAI 19	10% test vids with Frame Anno	-	√	92.8

UCSD

paper link:

webpage: http://www.svcl.ucsd.edu/projects/anomaly/dataset.html

Details

acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways.
the crowd density in the walkways was variable, ranging from sparse to very crowded.
in the normal setting, the video contains only pedestrians. Abnormal events are due to either the circulation of non-pedestrian entities in the walkways or anomalous pedestrian motion patterns.
Commonly occurring anomalies include bikers, skaters, small carts, and people walking across a walkway or in the grass that surrounds it. A few instances of people in wheelchairs were also recorded.
The data was split into 2 subsets, each corresponding to a different scene. The video footage recorded from each scene was split into various clips of around 200 frames.

Peds1: clips of groups of people walking towards and away from the camera, and some amount of perspective distortion. Contains 34 training video samples and 36 testing video samples.

Peds2: scenes with pedestrian movement parallel to the camera plane. Contains 16 training video samples and 12 testing video samples.
For each clip, the ground truth annotation includes a binary flag per frame, indicating whether an anomaly is present at that frame. In addition, a subset of 10 clips for Peds1 and 12 clips for Peds2 are provided with manually generated pixel-level binary masks, which identify the regions containing anomalies.
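
The two annotation levels are related: a frame is anomalous exactly when its pixel mask marks at least one pixel. A minimal sketch, assuming the masks of one clip are already loaded as a (num_frames, H, W) array with non-zero values on anomalous pixels; file loading is omitted and the function name is hypothetical.

```python
import numpy as np

def frame_flags_from_masks(masks):
    """Collapse (num_frames, H, W) pixel masks into one boolean flag per frame."""
    masks = np.asarray(masks)
    # Flatten each frame and check whether any pixel is marked.
    return masks.reshape(masks.shape[0], -1).astype(bool).any(axis=1)
```

This can serve as a consistency check between the pixel-level masks and the per-frame flags on the clips that provide both.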

Leader Board

UBI-Fights

paper link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9304905

code link: https://github.com/DegardinBruno/human-self-learning-anomaly

webpage: http://socia-lab.di.ubi.pt/EventDetection/

Details

Leader Board

https://paperswithcode.com/sota/abnormal-event-detection-in-video-on-ubi?p=weakly-and-partially-supervised-learning
