HumanPoseSurvey

Deep learning methods for 3D human pose estimation under different supervision manners: A survey

The rise of deep learning has broadly promoted the practical application of artificial intelligence in production and daily life. In computer vision, many human-centered applications, such as video surveillance, human-computer interaction, and digital entertainment, rely heavily on accurate and efficient human pose estimation. Inspired by the remarkable achievements in learning-based 2D human pose estimation, numerous studies are devoted to 3D human pose estimation via deep learning. Against this backdrop, this paper provides an extensive survey of recent literature on deep learning methods for 3D human pose estimation, in order to show how this research has developed, track the latest trends, and analyze the characteristics of the different types of methods. The literature is reviewed along the general pipeline of 3D human pose estimation, which consists of human body modeling, learning-based pose estimation, and regularization for refinement. Unlike existing reviews of the same topic, this paper focuses on deep learning-based methods. Learning-based pose estimation is discussed in two categories, single-person and multi-person, and each is further divided by data type into image-based and video-based methods. Moreover, because data is so significant for learning-based methods, this paper surveys 3D human pose estimation methods according to a taxonomy of supervision forms. Finally, this paper lists the current, widely used datasets and compares the performance of the reviewed methods. Based on this survey, it can be concluded that each branch of 3D human pose estimation starts with fully-supervised methods, and that there is still much room for multi-person pose estimation based on other supervision forms, from both images and video. Despite the significant development of 3D human pose estimation via deep learning, the inherent ambiguity and occlusion problems remain challenging issues that need to be better addressed.

Keywords: 3D human pose estimation; deep learning; unsupervised; fully-supervised; weakly-supervised; semi-supervised

Single-person and Multi-person

Single-person 3D pose estimation falls into two categories: two-stage and one-stage methods.
- Two-stage methods involve two steps: first, 2D joint locations are obtained by a 2D keypoint detection model; then the 2D keypoints are lifted to 3D keypoints by a deep learning method.
- One-stage methods regress 3D joint locations directly from an RGB image. These methods require a large amount of training data with 3D annotations, but manual annotation is costly and demanding.
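The two-stage idea can be sketched as two functions chained together. This is only an illustrative sketch: `detect_2d_keypoints`, `LinearLifter`, `two_stage_pose`, and the 17-joint skeleton are hypothetical names and choices, the detector is a placeholder, and the lifter is a single untrained linear layer standing in for a learned network.

```python
import random
from typing import List, Tuple

Keypoint2D = Tuple[float, float]
Keypoint3D = Tuple[float, float, float]


def detect_2d_keypoints(image: List[List[float]], num_joints: int = 17) -> List[Keypoint2D]:
    """Stage 1 stand-in (hypothetical): a real system would run a 2D keypoint detector."""
    height, width = len(image), len(image[0])
    # Placeholder output: spread the joints along the image diagonal.
    return [
        (width * (j + 1) / (num_joints + 1), height * (j + 1) / (num_joints + 1))
        for j in range(num_joints)
    ]


class LinearLifter:
    """Stage 2 stand-in: one untrained linear layer mapping 2J inputs to 3J outputs."""

    def __init__(self, num_joints: int = 17, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_joints = num_joints
        # Random weights; a real lifter would learn these from 2D-3D training pairs.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(2 * num_joints)]
            for _ in range(3 * num_joints)
        ]
        self.bias = [0.0] * (3 * num_joints)

    def __call__(self, keypoints_2d: List[Keypoint2D]) -> List[Keypoint3D]:
        flat = [coord for kp in keypoints_2d for coord in kp]
        out = [
            sum(w * x for w, x in zip(row, flat)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        # Regroup the flat output into one (x, y, z) triple per joint.
        return [(out[3 * j], out[3 * j + 1], out[3 * j + 2]) for j in range(self.num_joints)]


def two_stage_pose(image: List[List[float]], lifter: LinearLifter) -> List[Keypoint3D]:
    """Run 2D detection, then lift the detected 2D keypoints to 3D."""
    keypoints_2d = detect_2d_keypoints(image, lifter.num_joints)
    return lifter(keypoints_2d)
```

A one-stage method would replace both functions with a single model that maps the image directly to the 3D joints.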
Multi-person 3D pose estimation is divided into two categories: top-down and bottom-up methods.
- Top-down methods first detect the human candidates and then apply single-person pose estimation to each of them.
- Bottom-up methods first detect all keypoints and then group them into different people.
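The difference between the two multi-person strategies is mainly the order of operations, which can be sketched as follows. All function names here are hypothetical, the detectors are passed in as callables, and the nearest-root grouping rule is a toy assumption chosen for brevity, not a method taken from the surveyed papers.

```python
from math import hypot


def top_down(image, detect_people, estimate_single_pose):
    """Detect person boxes first, then run a single-person estimator on each box."""
    return [estimate_single_pose(image, box) for box in detect_people(image)]


def bottom_up(image, detect_all_keypoints, group_keypoints):
    """Detect every keypoint in the image first, then group them into people."""
    return group_keypoints(detect_all_keypoints(image))


def group_by_nearest_root(keypoints):
    """Toy grouping rule (assumption): attach each joint to the closest 'root' keypoint.

    Each keypoint is a (joint_name, x, y) tuple.
    """
    roots = [kp for kp in keypoints if kp[0] == "root"]
    if not roots:
        return []
    people = [{"root": (x, y)} for _, x, y in roots]
    for name, x, y in keypoints:
        if name == "root":
            continue
        nearest = min(
            range(len(roots)),
            key=lambda i: hypot(x - roots[i][1], y - roots[i][2]),
        )
        people[nearest][name] = (x, y)
    return people
```

In the top-down case the cost grows with the number of detected people, while in the bottom-up case the difficulty shifts to the grouping step.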

Input paradigm

RGB image-based methods take static images as input and consider only spatial context, which distinguishes them from video-based methods.
Video-based methods face more challenges than image-based methods, such as processing temporal information, relating spatial information to temporal information, and handling motion changes across frames.
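One common way to give a model temporal context is to feed it a window of neighboring frames around each target frame. The sketch below is a minimal illustration under assumptions: `temporal_windows` and `receptive_field` are hypothetical names, and padding the sequence by repeating its border frames is just one possible choice.

```python
def temporal_windows(frames, receptive_field):
    """Return one window of `receptive_field` frames centered on each input frame.

    The sequence is padded by repeating its first and last frames so that
    every frame, including the borders, gets a full-length window.
    """
    if receptive_field < 1 or receptive_field % 2 == 0:
        raise ValueError("receptive_field must be a positive odd number")
    if not frames:
        return []
    half = receptive_field // 2
    padded = [frames[0]] * half + list(frames) + [frames[-1]] * half
    return [padded[t:t + receptive_field] for t in range(len(frames))]
```

For example, `temporal_windows(["f0", "f1", "f2", "f3"], 3)` pairs each frame with its previous and next frame, whereas an image-based method corresponds to `receptive_field=1`.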

Supervision form

Unsupervised methods require no multi-view image data, 3D skeletons, or 2D-3D point correspondences, and do not use previously learned 3D priors during training. Self-supervised methods, which also address the scarcity of 3D data, have become popular in recent years. Self-supervised learning is a form of unsupervised learning in which the data itself provides the supervision.
Fully-supervised methods rely on large training sets annotated with ground-truth 3D positions coming from multi-view motion capture systems.
Weakly-supervised methods draw on multiple cues for weak supervision, such as: a) paired 2D ground truth, b) unpaired 3D ground truth (3D poses without the corresponding images), c) multi-view image pairs, and d) camera parameters in a multi-view setup.
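To illustrate how paired 2D ground truth and camera parameters can supervise a 3D prediction, the sketch below projects predicted 3D joints into the image and compares them with 2D annotations. It assumes a simple pinhole camera with a single focal length and principal point `(cx, cy)`; the function names and this particular loss are illustrative assumptions, not the formulation of any specific surveyed method.

```python
def project(point_3d, focal, cx, cy):
    """Pinhole projection of a camera-space 3D point onto the image plane."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)


def reprojection_loss(pred_3d, gt_2d, focal, cx, cy):
    """Mean squared 2D distance between projected 3D predictions and 2D labels."""
    total = 0.0
    for point, (u_gt, v_gt) in zip(pred_3d, gt_2d):
        u, v = project(point, focal, cx, cy)
        total += (u - u_gt) ** 2 + (v - v_gt) ** 2
    return total / len(pred_3d)
```

Because many 3D poses project to the same 2D pose, a loss like this is usually combined with further cues such as those listed above.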
Semi-supervised methods use only a fraction of the annotated data (e.g. 10 percent of the 3D labels), reflecting settings in which labeled training data is scarce.
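A semi-supervised setup can be sketched as splitting the training set into a small labeled subset and a larger unlabeled one, then combining a supervised loss with a label-free loss. The helper names, the random split, and the `weight` parameter below are assumptions made for illustration only.

```python
import random


def split_labeled_unlabeled(num_samples, labeled_fraction=0.1, seed=0):
    """Randomly choose which sample indices keep their 3D labels."""
    if not 0.0 < labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be in (0, 1]")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_labeled = max(1, round(num_samples * labeled_fraction))
    return sorted(indices[:num_labeled]), sorted(indices[num_labeled:])


def semi_supervised_loss(supervised_losses, unsupervised_losses, weight=1.0):
    """Mean supervised loss plus a weighted mean of the label-free loss."""
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(supervised_losses) + weight * mean(unsupervised_losses)
```

With `labeled_fraction=0.1`, only one in ten samples contributes to the supervised term; the remaining samples contribute only through the label-free term.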

Taxonomy

Single-person and multi-person 3D pose estimation, combined with the different supervision forms, yield the branches shown in the figure below.

The result is an unbalanced tree describing deep learning-based 3D human pose estimation. Multi-person 3D pose estimation has received less attention than single-person 3D pose estimation, and video-based 3D pose estimation is less studied than image-based 3D pose estimation. Another notable observation is that fully-supervised methods are present in every sub-category, which may indicate that fully-supervised methods help open up a research area in its early stages.

Summary on datasets

We present the state-of-the-art results on several datasets, including Human3.6M, MPI-INF-3DHP, MuPoTS-3D, Shelf, and Campus.

Summary of the state-of-the-art methods on the Human3.6M dataset.

Title	Year	Supervision	Type	URL
3d human pose estimation from monocular images with deep convolutional neural network	2014	fully-supervised	monocular	-
Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video	2016	weakly-supervised	monocular	code
Structured Prediction of 3D Human Pose with Deep Neural Networks	2016	fully-supervised	monocular	code
Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image	2017	weakly-supervised	monocular	code
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach	2017	weakly-supervised	monocular	code
End-to-End Recovery of Human Shape and Pose	2017	weakly-supervised	monocular	code
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training	2018	semi-supervised	monocular	code
Ordinal Depth Supervision for 3D Human Pose Estimation	2018	weakly-supervised	monocular	code
Occlusion-Aware Networks for 3D Human Pose Estimation in Video	2019	semi-supervised	monocular	-
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation	2019	weakly-supervised	monocular	code
HoloPose: Holistic 3D Human Reconstruction In-The-Wild	2019	weakly-supervised	monocular	-
Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition	2020	fully-supervised	monocular	code
3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training	2020	semi-supervised	monocular	-
Multi-View Pose Generator Based on Deep Learning for Monocular 3D Human Pose Estimation	2020	fully-supervised	monocular	-
A Simple Yet Effective Baseline for 3d Human Pose Estimation	2017	fully-supervised	multi-view	-
Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry	2019	semi-supervised	multi-view	code
Learnable Triangulation of Human Pose	2019	fully-supervised	multi-view	code
Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation	2020	fully-supervised	multi-view	-
Epipolar Transformers	2020	weakly-supervised	multi-view	code

Summary of the state-of-the-art methods on the MPI-INF-3DHP dataset.

Title	Year	Supervision	Type	URL
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision	2017	fully-supervised	monocular	code
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach	2017	weakly-supervised	monocular	code
3D Human Pose Estimation in the Wild by Adversarial Learning	2018	semi-supervised	monocular	-
3d human pose estimation with 2d marginal heatmaps	2019	weakly-supervised	monocular	code
Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop	2019	weakly-supervised	monocular	code
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation	2019	weakly-supervised	monocular	code
Unsupervised 3D Pose Estimation With Geometric Self-Supervision	2019	unsupervised	monocular	-
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition	2021	fully-supervised	monocular	code
Generalizing Monocular 3D Human Pose Estimation in the Wild	2019	weakly-supervised	multi-view	-

Summary of the state-of-the-art multi-person 3D pose estimation methods on the MuPoTS-3D dataset.

Title	Year	Supervision	Type	URL
LCR-Net: Localization-Classification-Regression for Human Pose	2017	weakly-supervised	monocular	code
Single-shot multi-person 3d pose estimation from monocular rgb	2018	fully-supervised	monocular	code
Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image	2019	fully-supervised	monocular	code
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera	2020	semi-supervised	monocular	code
Lcr-net++: Multi-person 2d and 3d pose detection in natural images	2019	weakly-supervised	monocular	code
Multi-person 3d human pose estimation from monocular images	2019	weakly-supervised	monocular	-

Summary of the state-of-the-art multi-person 3D pose estimation methods on the Campus dataset.

Title	Year	Supervision	Type	URL
3D Pictorial Structures for Multiple Human Pose Estimation	2014	fully-supervised	multi-view	-
Multiple human pose estimation with temporally consistent 3D pictorial structures	2014	weakly-supervised	multi-view	-
3d pictorial structures revisited: Multiple human pose estimation	2015	fully-supervised	multi-view	-
Multiple human 3d pose estimation from multiview images	2018	weakly-supervised	multi-view	-
Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views	2019	weakly-supervised	multi-view	code
Multi-Person 3D Pose Estimation and Tracking in Sports	2019	unsupervised	multi-view	code
VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment	2020	fully-supervised	multi-view	code
Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS	2020	unsupervised	multi-view	code

Summary of the state-of-the-art multi-person 3D pose estimation methods on the Shelf dataset.

Title	Year	Supervision	Type	URL
3D Pictorial Structures for Multiple Human Pose Estimation	2014	fully-supervised	multi-view	-
Multiple human pose estimation with temporally consistent 3D pictorial structures	2014	weakly-supervised	multi-view	-
3d pictorial structures revisited: Multiple human pose estimation	2015	fully-supervised	multi-view	-
Multiple human 3d pose estimation from multiview images	2018	weakly-supervised	multi-view	-
Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views	2019	weakly-supervised	multi-view	code
Multi-Person 3D Pose Estimation and Tracking in Sports	2019	unsupervised	multi-view	code
VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment	2020	fully-supervised	multi-view	code
Light3DPose: Real-time Multi-Person 3D Pose Estimation from Multiple Views	2021	weakly-supervised	multi-view	-
Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS	2020	unsupervised	multi-view	code

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61802355 and 61702350) and the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2019B04).

djzgroup/HumanPoseSurvey
