djzgroup/HumanPoseSurvey

HumanPoseSurvey

Deep learning methods for 3D human pose estimation under different supervision manners: A survey

The rise of deep learning technology has broadly promoted the practical application of artificial intelligence in production and daily life. In computer vision, many human-centered applications, such as video surveillance, human-computer interaction, and digital entertainment, rely heavily on accurate and efficient human pose estimation techniques. Inspired by the remarkable achievements in learning-based 2D human pose estimation, numerous studies have been devoted to 3D human pose estimation via deep learning methods. Against this backdrop, this paper provides an extensive survey of recent literature on deep learning methods for 3D human pose estimation, in order to show how this research has developed, track the latest research trends, and analyze the characteristics of the different types of methods devised. The literature is reviewed following the general pipeline of 3D human pose estimation, which consists of human body modeling, learning-based pose estimation, and regularization for refinement. Unlike existing reviews of the same topic, this paper focuses on deep learning-based methods. Learning-based pose estimation is discussed in two categories, single-person and multi-person, each of which is further divided by data type into image-based and video-based methods. Moreover, because data is so important to learning-based methods, this paper surveys the 3D human pose estimation methods according to a taxonomy of supervision forms. Finally, this paper lists the current and widely used datasets and compares the performance of the reviewed methods. Based on this survey, it can be concluded that each branch of 3D human pose estimation starts with fully-supervised methods, and that there is still much room for multi-person pose estimation, from both images and video, based on other forms of supervision.
Despite the significant development of 3D human pose estimation via deep learning, the inherent ambiguity and occlusion problems remain challenging issues that need to be better addressed.

Keywords: 3D human pose estimation; deep learning; unsupervised; fully-supervised; weakly-supervised; semi-supervised

Single-person and Multi-person

  • Single-person 3D pose estimation falls into two categories: two-stage and one-stage methods.
    • Two-stage methods involve two steps: first, 2D joint locations are obtained by a 2D keypoint detection model; then, the 2D keypoints are lifted to 3D keypoints by a deep learning model.
    • One-stage methods regress 3D joint locations directly from an RGB image. These methods require large amounts of training data with 3D annotations, but manual annotation is costly and demanding.
  • Multi-person 3D pose estimation is divided into two categories: top-down and bottom-up methods.
    • Top-down methods first detect the human candidates and then apply single-person pose estimation for each of them.
    • Bottom-up methods first detect all keypoints followed by grouping them into different people.
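The two-stage single-person pipeline described above can be sketched as follows. This is a minimal, untrained illustration that assumes a 17-joint skeleton; `detect_2d_keypoints` is a hypothetical stub standing in for a real 2D keypoint detector, and `Lifter` is a hypothetical two-layer MLP with random weights that only shows the shape of the 2D-to-3D mapping, not any specific published model.

```python
import numpy as np

NUM_JOINTS = 17  # assumption: a 17-joint skeleton


def detect_2d_keypoints(image: np.ndarray) -> np.ndarray:
    """Hypothetical stage-1 stub: returns (NUM_JOINTS, 2) pixel coordinates.

    A real system would run a trained 2D pose network here; random
    coordinates inside the image are used only so the sketch runs.
    """
    h, w = image.shape[:2]
    rng = np.random.default_rng(0)
    return rng.uniform([0, 0], [w, h], size=(NUM_JOINTS, 2))


class Lifter:
    """Hypothetical stage-2 MLP mapping 2D keypoints to 3D joints.

    Weights are random here; in practice they are learned from 2D-3D pairs.
    """

    def __init__(self, hidden: int = 256, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.01, (NUM_JOINTS * 2, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.01, (hidden, NUM_JOINTS * 3))
        self.b2 = np.zeros(NUM_JOINTS * 3)

    def __call__(self, kp2d: np.ndarray) -> np.ndarray:
        # Normalize: center on the mean joint, scale to unit extent.
        centered = kp2d - kp2d.mean(axis=0)
        scale = max(float(np.abs(centered).max()), 1e-8)
        x = (centered / scale).reshape(-1)            # (NUM_JOINTS * 2,)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)    # ReLU hidden layer
        out = h @ self.w2 + self.b2                   # (NUM_JOINTS * 3,)
        return out.reshape(NUM_JOINTS, 3)


def two_stage_pipeline(image: np.ndarray, lifter: Lifter) -> np.ndarray:
    kp2d = detect_2d_keypoints(image)  # stage 1: image -> 2D keypoints
    return lifter(kp2d)                # stage 2: 2D keypoints -> 3D joints


image = np.zeros((480, 640, 3))
pose3d = two_stage_pipeline(image, Lifter())
print(pose3d.shape)  # (17, 3)
```

Keeping the two stages separate is what lets such methods reuse off-the-shelf 2D detectors, while the lifter itself only needs 2D-3D pose pairs for training.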

Input paradigm

  • RGB image-based methods take static images as input and therefore only take spatial context into account, unlike video-based methods.
  • Video-based methods face more challenges than image-based methods, such as processing temporal information, establishing correspondence between spatial and temporal information, and handling motion changes across frames.
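One way video-based methods expose temporal context is to give the model a window of neighboring frames for every target frame. The sketch below assumes per-frame 2D poses have already been detected; `temporal_windows` is a hypothetical helper, and repeating the first and last frame at the sequence boundaries (edge padding) is just one common choice.

```python
import numpy as np


def temporal_windows(poses_2d: np.ndarray, radius: int) -> np.ndarray:
    """Build a window of 2*radius+1 frames centered on every frame.

    poses_2d: (T, J, 2) sequence of per-frame 2D poses.
    Returns:  (T, 2*radius+1, J, 2). Frames outside the sequence are
    filled by repeating the first/last frame (edge padding), so every
    target frame receives the same amount of context.
    """
    padded = np.pad(poses_2d, ((radius, radius), (0, 0), (0, 0)), mode="edge")
    length = 2 * radius + 1
    # Window t covers padded[t : t + length], centered on original frame t.
    return np.stack([padded[t:t + length] for t in range(poses_2d.shape[0])])


# Toy sequence: 5 frames, 17 joints, each frame filled with its own index.
seq = np.arange(5, dtype=float).reshape(5, 1, 1) * np.ones((1, 17, 2))
windows = temporal_windows(seq, radius=2)
print(windows.shape)                  # (5, 5, 17, 2)
print(windows[0, :, 0, 0].tolist())   # [0.0, 0.0, 0.0, 1.0, 2.0]
```

Each window would then be passed to a temporal model that predicts the 3D pose of its center frame; the choice of model is outside the scope of this sketch.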

Supervision form

  • Unsupervised methods do not require multi-view image data, 3D skeletons, or 2D-3D point correspondences, nor do they use previously learned 3D priors during training. Self-supervised methods, which also address the scarcity of 3D data, have become popular in recent years. Self-supervised learning is a form of unsupervised learning in which the data itself provides the supervision.
  • Fully-supervised methods rely on large training sets annotated with ground-truth 3D positions obtained from multi-view motion capture systems.
  • Weakly-supervised methods exploit various cues for weak supervision, such as a) paired 2D ground truth, b) unpaired 3D ground truth (3D poses without the corresponding images), c) multi-view image pairs, d) camera parameters in a multi-view setup, etc.
  • Semi-supervised methods use only part of the annotated data (e.g., 10 percent of the 3D labels), i.e., labeled training data is scarce.
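As an illustration of cues a) and d), a weakly-supervised method with paired 2D ground truth and known camera parameters can penalize the distance between its projected 3D prediction and the 2D labels. The sketch below assumes a simple pinhole camera with joints already expressed in camera coordinates; `project` and `reprojection_loss` are hypothetical helpers, not the loss of any particular method listed here.

```python
import numpy as np


def project(joints_3d: np.ndarray, focal: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Pinhole projection of camera-space joints (J, 3) to pixels (J, 2).

    Assumes every joint has positive depth (z > 0).
    """
    z = joints_3d[:, 2:3]
    return joints_3d[:, :2] / z * focal + center


def reprojection_loss(pred_3d, gt_2d, focal, center, visible=None) -> float:
    """Mean squared pixel error between the projected prediction and 2D labels."""
    err = np.sum((project(pred_3d, focal, center) - gt_2d) ** 2, axis=1)  # (J,)
    if visible is not None:
        err = err[visible]  # ignore unlabeled or occluded joints
    return float(err.mean())


focal = np.array([1000.0, 1000.0])   # assumed intrinsics (fx, fy)
center = np.array([320.0, 240.0])    # assumed principal point (cx, cy)
pred = np.array([[0.0, 0.0, 2.0], [0.2, -0.1, 2.0]])
gt = project(pred, focal, center) + np.array([3.0, 4.0])  # labels off by (3, 4) px

print(reprojection_loss(pred, gt, focal, center))  # 25.0
# Scaling the pose along the camera rays leaves the projection unchanged:
print(reprojection_loss(2.0 * pred, project(pred, focal, center), focal, center))  # 0.0
```

Because this loss only sees the ratios x/z and y/z, it cannot resolve depth on its own, which is one reason weakly-supervised methods typically combine it with other cues such as unpaired 3D poses or multi-view consistency.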

Taxonomy

Combining single-person and multi-person 3D pose estimation with the different supervision forms yields various branches, as shown in the figure below.

The result is an unbalanced tree describing deep learning-based 3D human pose estimation. Multi-person 3D pose estimation has received less attention than single-person 3D pose estimation, and video-based 3D pose estimation is less studied than image-based 3D pose estimation. Another interesting observation is that fully-supervised methods appear in every sub-category, which may indicate that fully-supervised methods help open up a research area in its early stages.

Summary on datasets

We present state-of-the-art results on several datasets: Human3.6M, MPI-INF-3DHP, MuPoTS-3D, Shelf, and Campus.

Summary of state-of-the-art methods on the Human3.6M dataset.

| Title | Year | Supervision | Type | URL |
|---|---|---|---|---|
| 3d human pose estimation from monocular images with deep convolutional neural network | 2014 | fully-supervised | monocular | - |
| Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video | 2016 | weakly-supervised | monocular | code |
| Structured Prediction of 3D Human Pose with Deep Neural Networks | 2016 | fully-supervised | monocular | code |
| Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image | 2017 | weakly-supervised | monocular | code |
| Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach | 2017 | weakly-supervised | monocular | code |
| End-to-End Recovery of Human Shape and Pose | 2017 | weakly-supervised | monocular | code |
| 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training | 2018 | semi-supervised | monocular | code |
| Ordinal Depth Supervision for 3D Human Pose Estimation | 2018 | weakly-supervised | monocular | code |
| Occlusion-Aware Networks for 3D Human Pose Estimation in Video | 2019 | semi-supervised | monocular | - |
| RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation | 2019 | weakly-supervised | monocular | code |
| HoloPose: Holistic 3D Human Reconstruction In-The-Wild | 2019 | weakly-supervised | monocular | - |
| Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition | 2020 | fully-supervised | monocular | code |
| 3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training | 2020 | semi-supervised | monocular | - |
| Multi-View Pose Generator Based on Deep Learning for Monocular 3D Human Pose Estimation | 2020 | fully-supervised | monocular | - |
| A Simple Yet Effective Baseline for 3d Human Pose Estimation | 2017 | fully-supervised | multi-view | - |
| Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry | 2019 | semi-supervised | multi-view | code |
| Learnable Triangulation of Human Pose | 2019 | fully-supervised | multi-view | code |
| Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation | 2020 | fully-supervised | multi-view | - |
| Epipolar Transformers | 2020 | weakly-supervised | multi-view | code |

Summary of state-of-the-art methods on the MPI-INF-3DHP dataset.

| Title | Year | Supervision | Type | URL |
|---|---|---|---|---|
| Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision | 2017 | fully-supervised | monocular | code |
| Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach | 2017 | weakly-supervised | monocular | code |
| 3D Human Pose Estimation in the Wild by Adversarial Learning | 2018 | semi-supervised | monocular | - |
| 3d human pose estimation with 2d marginal heatmaps | 2019 | weakly-supervised | monocular | code |
| Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop | 2019 | weakly-supervised | monocular | code |
| RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation | 2019 | weakly-supervised | monocular | code |
| Unsupervised 3D Pose Estimation With Geometric Self-Supervision | 2019 | unsupervised | monocular | - |
| Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition | 2021 | fully-supervised | monocular | code |
| Generalizing Monocular 3D Human Pose Estimation in the Wild | 2019 | weakly-supervised | multi-view | - |

Summary of state-of-the-art multi-person 3D pose estimation methods on the MuPoTS-3D dataset.

| Title | Year | Supervision | Type | URL |
|---|---|---|---|---|
| LCR-Net: Localization-Classification-Regression for Human Pose | 2017 | weakly-supervised | monocular | code |
| Single-shot multi-person 3d pose estimation from monocular rgb | 2018 | fully-supervised | monocular | code |
| Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image | 2019 | fully-supervised | monocular | code |
| XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera | 2020 | semi-supervised | monocular | code |
| Lcr-net++: Multi-person 2d and 3d pose detection in natural images | 2019 | weakly-supervised | monocular | code |
| Multi-person 3d human pose estimation from monocular images | 2019 | weakly-supervised | monocular | - |

Summary of state-of-the-art multi-person 3D pose estimation methods on the Campus dataset.

| Title | Year | Supervision | Type | URL |
|---|---|---|---|---|
| 3D Pictorial Structures for Multiple Human Pose Estimation | 2014 | fully-supervised | multi-view | - |
| Multiple human pose estimation with temporally consistent 3D pictorial structures | 2014 | weakly-supervised | multi-view | - |
| 3d pictorial structures revisited: Multiple human pose estimation | 2015 | fully-supervised | multi-view | - |
| Multiple human 3d pose estimation from multiview images | 2018 | weakly-supervised | multi-view | - |
| Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views | 2019 | weakly-supervised | multi-view | code |
| Multi-Person 3D Pose Estimation and Tracking in Sports | 2019 | unsupervised | multi-view | code |
| VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment | 2020 | fully-supervised | multi-view | code |
| Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS | 2020 | unsupervised | multi-view | code |

Summary of state-of-the-art multi-person 3D pose estimation methods on the Shelf dataset.

| Title | Year | Supervision | Type | URL |
|---|---|---|---|---|
| 3D Pictorial Structures for Multiple Human Pose Estimation | 2014 | fully-supervised | multi-view | - |
| Multiple human pose estimation with temporally consistent 3D pictorial structures | 2014 | weakly-supervised | multi-view | - |
| 3d pictorial structures revisited: Multiple human pose estimation | 2015 | fully-supervised | multi-view | - |
| Multiple human 3d pose estimation from multiview images | 2018 | weakly-supervised | multi-view | - |
| Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views | 2019 | weakly-supervised | multi-view | code |
| Multi-Person 3D Pose Estimation and Tracking in Sports | 2019 | unsupervised | multi-view | code |
| VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment | 2020 | fully-supervised | multi-view | code |
| Light3DPose: Real-time Multi-Person 3D Pose Estimation from Multiple Views | 2021 | weakly-supervised | multi-view | - |
| Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS | 2020 | unsupervised | multi-view | code |

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61802355 and 61702350) and the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2019B04).
