This README is not complete.


We develop a method for pedestrian pose estimation and intent prediction. The source code is built on top of PifPaf with very few modifications. The work was done at the EPFL VITA laboratory under Professor Alexandre Alahi. The paper has been accepted at Transportation Research Part C: Emerging Technologies.



What we used to develop the system

  • Python 3
  • PyTorch 1.0.1
  • OpenCV
  • Ubuntu 18.04.2



Brief Project Structure


  ├── datasets                        : Where the datasets are stored. See setup for more details.
  ├── Pedestrian-Intention-Prediction : Cloned project root
      ├── openpifpaf      
          ├── decoder                 : Scripts to decode the Pif and Paf fields into skeletons.
          ├── encoder                 : Scripts to preprocess the ground truth for the Pif and Paf heads.              
          ├── network                 : Scripts to build the base and head encoder networks.                
          ├──             : Script containing the data loaders.          
          ├──                 : Script to generate the training and validation curves.
          ├──                : Script to train the model.
          ├──         : Script to generate the video with the predicted activity map.
          ├──           : Script to generate the video with the predicted poses.

      ├──             : Shell script that runs
      ├──               : Shell script that runs
      ├──                    : Shell script that runs

      ├── outputs     : Where the models and logs are stored
      ├── plots (check)
      ├── scripts (check)
      ├── tests (check) 
      ├── Report.pdf          


Setup

  • Create an Anaconda environment and install PyTorch and OpenCV
conda create -n env python=3.6 anaconda
conda activate env
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
pip install opencv-python
  • Clone the repository and install the source code
git clone <repository-url> Pedestrian-Intention-Prediction
cd Pedestrian-Intention-Prediction
pip install --editable '.[train,test]'
  • Download and unzip the MSCOCO 2017 Train images [118K/18GB] and 2017 Val images [5K/1GB] into the images folder, and the 2017 Train/Val annotations [241MB] into the annotations folder.
  • Download the JAAD clips (UNRESIZED) and unzip them into the videos folder.
  • Run the script to convert the JAAD videos into frames. The frames of each video are placed in their own folder under the scene folder. Note that this takes 169 GB of disk space.
  • Download and unzip the JAAD annotation files into the annotations folder.
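The frame-conversion script is not named here, so as a rough sketch of what that step involves, the Python below decodes each clip with OpenCV and writes one PNG per frame into scene/<video id>/. The helper names (frame_path, extract_frames, extract_all) are hypothetical, and the 1-based, 4-digit frame numbering is an assumption inferred from the example path scene/0001/0001.png in the datasets tree; the repository's own script may name frames differently.

```python
import os


def frame_path(scene_root: str, video_file: str, frame_index: int) -> str:
    """Map a 0-based frame index to scene_root/<video id>/<NNNN>.png.

    Assumption: frames are numbered from 1 with 4-digit zero padding,
    as in the example path scene/0001/0001.png.
    """
    video_id = os.path.splitext(os.path.basename(video_file))[0]  # "0001.mp4" -> "0001"
    return os.path.join(scene_root, video_id, "{:04d}.png".format(frame_index + 1))


def extract_frames(video_file: str, scene_root: str) -> int:
    """Decode one clip with OpenCV, save every frame as PNG, return the frame count."""
    import cv2  # imported here so frame_path can be used without OpenCV installed

    capture = cv2.VideoCapture(video_file)
    if not capture.isOpened():
        raise IOError("cannot open " + video_file)
    count = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # no more frames in the clip
                break
            out_path = frame_path(scene_root, video_file, count)
            os.makedirs(os.path.dirname(out_path), exist_ok=True)
            cv2.imwrite(out_path, frame)
            count += 1
    finally:
        capture.release()
    return count


def extract_all(videos_dir: str, scene_root: str) -> None:
    """Run extract_frames on every .mp4 clip in videos_dir."""
    for name in sorted(os.listdir(videos_dir)):
        if name.endswith(".mp4"):
            n_frames = extract_frames(os.path.join(videos_dir, name), scene_root)
            print(name, n_frames, "frames")
```

For example, extract_all("datasets/jaad/videos", "datasets/jaad/scene") would process every clip in the videos folder.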

What the datasets folder should look like

  ├── datasets  
      ├── coco
          ├── annotations 
              ├── captions_train2017.json
          ├── images   
              ├── train2017
                  ├── 000000000009.jpg
              ├── val2017
                  ├── 000000000139.jpg
      ├── jaad  
          ├── annotations 
              ├── singletxt_train_1s
                  ├── train.txt
              ├── singletxt_val_1s
                  ├── val.txt
          ├── videos 
              ├── 0001.mp4
          ├── scene 
              ├── 0001
                  ├── 0001.png
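Before training, it can help to confirm that this layout is in place. The hypothetical helper below (not part of the repository) checks for the directories read off the tree above; it deliberately leaves out the JAAD annotation subfolders, whose names depend on the splits you pass to the training command.

```python
from pathlib import Path
from typing import List

# Directories taken from the example tree, relative to the datasets root.
EXPECTED = [
    "coco/annotations",
    "coco/images/train2017",
    "coco/images/val2017",
    "jaad/annotations",
    "jaad/videos",
    "jaad/scene",
]


def missing_dirs(root: str) -> List[str]:
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_dir()]
```

Calling missing_dirs("datasets") and getting an empty list means the top-level structure matches; any entries returned name the folders still to be created.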


Training

  • Download PifPaf's resnet50 model from a direct link or from openpifpaf's pretrained models, rename it to resnet50block5-pif-paf-edge401.pkl, and place it in outputs/. Note that the current version only works with resnet50.
  • Run ./ which contains the following command
CUDA_VISIBLE_DEVICES="0,1,3" python3 -m openpifpaf.train \
  --pre-lr=1e-5 \
  --lr=1e-5 \
  --momentum=0.95 \
  --epochs=20 \
  --lr-decay 60 70 \
  --jaad-batch-size=3 \
  --coco-batch-size=6 \
  --basenet=resnet50block5 \
  --head-quad=1 \
  --headnets pif paf crm \
  --square-edge=401 \
  --regression-loss=laplace \
  --lambdas 30 2 2 50 3 3 \
  --freeze-base=1 \
  --jaad_train "singletxt_train_3s" --jaad_val "singletxt_val_3s" --jaad_pre_train "singletxt_pre_train_3s"
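The learning-rate options in this command interact roughly as sketched below. The sketch assumes that, with --freeze-base=1, epoch 0 trains only the head nets at --pre-lr (as the pre-lr description in the argument list states), after which the base is unfrozen and --lr applies, multiplied by a factor at each epoch listed in --lr-decay. The step-decay form and the 0.1 factor are assumptions, and lr_schedule is a hypothetical name.

```python
from typing import List, Tuple


def lr_schedule(epoch: int, pre_lr: float, lr: float, lr_decay: List[int],
                freeze_base: bool, gamma: float = 0.1) -> Tuple[float, bool]:
    """Return (learning rate, base_frozen) for a 0-based epoch index.

    Assumed behaviour: with freeze_base, epoch 0 trains only the head nets at
    pre_lr. Afterwards the base is trainable and lr is multiplied by gamma once
    for every milestone in lr_decay that the epoch has reached (step decay;
    gamma=0.1 is an assumption, not taken from the repository).
    """
    if freeze_base and epoch == 0:
        return pre_lr, True
    n_decays = sum(1 for milestone in lr_decay if epoch >= milestone)
    return lr * gamma ** n_decays, False
```

Under these assumptions, the values in the command above (--epochs=20 with --lr-decay 60 70) never reach a decay milestone, so the learning rate would stay at 1e-5 for the whole run.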

The arguments are as follows

  • CUDA_VISIBLE_DEVICES: To control which CUDA devices the program should use.
  • pre-lr: Learning rate when the base net is frozen during the first epoch to initialize the head nets.
  • lr: Learning rate after the base net is unfrozen.
  • momentum: Adam parameter.
  • epochs: Number of epochs to train the model for.
  • lr-decay: Epochs at which the learning rate is decayed (as in PifPaf).
  • jaad-batch-size: Batch size for the JAAD dataset.
  • coco-batch-size: Batch size for the COCO dataset.
  • basenet: PifPaf pretrained base network. Currently only works for resnet50.
  • head-quad: Number of pixel shuffle layers in the head net. Each layer is hardcoded to upsample its input by a factor of 2, e.g. from H/8 × W/8 to H/4 × W/4.
  • headnets: Head nets to use. The code only works with pif paf crm.
  • square-edge: Edge length, in pixels, of the square input images used for training (a preprocessing parameter, as in PifPaf).
  • regression-loss: Loss used for the vector components in PifPaf.
  • lambdas: Loss weights for Pif's confidence, regression and scale heads and for Paf's confidence and two regression heads.
  • freeze-base: 1 if the base network should be frozen to initialize the task heads and 0 if not.
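  • jaad_train: JAAD training split, given as the name of an annotation folder under datasets/jaad/annotations (e.g. singletxt_train_3s).
  • jaad_val: JAAD validation split, given in the same way (e.g. singletxt_val_3s).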
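The head-quad option relies on pixel shuffle, which trades channels for resolution: a (C·r², H, W) feature map becomes (C, H·r, W·r). The pure-Python sketch below shows only that rearrangement, following the element ordering documented for torch.nn.PixelShuffle; the convolution that normally precedes it in a head net is omitted, and both pixel_shuffle and output_stride are hypothetical illustrations rather than this repository's code.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r).

    Output element [c][y*r + i][x*r + j] comes from input channel
    c*r*r + i*r + j at position [y][x], as in torch.nn.PixelShuffle.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for col in range(w):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out


def output_stride(base_stride: int, head_quad: int) -> int:
    """Stride of the head output when each of head_quad layers upsamples by 2."""
    return base_stride // (2 ** head_quad)
```

With the example from the argument list, output_stride(8, 1) gives 4, i.e. features at H/8 × W/8 come out at H/4 × W/4.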

At any point in time, the training and validation curves can be visualized as follows:

python3 -m openpifpaf.logs \
  outputs/model_name.pkl.log
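Returning to the --regression-loss=laplace and --lambdas options: as a sketch modeled on PifPaf's Laplace loss (the exact code in this repository may differ), each predicted 2D vector x with target t and predicted log-spread log b is penalized by ln 2 + log b + ‖x − t‖ / b, and the per-head losses are then summed with one lambda per term, in the order given for lambdas in the argument list. Both functions are hypothetical illustrations.

```python
import math
from typing import Sequence


def laplace_loss(x1: float, x2: float, logb: float, t1: float, t2: float) -> float:
    """Laplace negative log-likelihood for one 2D regression vector.

    norm is the Euclidean length of the error (x - t) and b = exp(logb)
    is the predicted spread; a larger b lowers the error term but pays log b.
    """
    norm = math.hypot(x1 - t1, x2 - t2)
    return math.log(2.0) + logb + norm * math.exp(-logb)


def total_loss(losses: Sequence[float], lambdas: Sequence[float]) -> float:
    """Weighted sum of the loss terms, one lambda per term."""
    if len(losses) != len(lambdas):
        raise ValueError("need one lambda per loss term")
    return sum(w * l for w, l in zip(lambdas, losses))
```

With the weights from the training command (30 2 2 50 3 3), a unit loss on every term would give 30 + 2 + 2 + 50 + 3 + 3 = 90, so the Pif and Paf confidence terms carry most of the weight.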


Testing

  • Either use the model you trained or download a trained one here and place it in the outputs folder.

  • Generate the video with the predicted poses by running ./ which contains the following command

CUDA_VISIBLE_DEVICES="0" python3 -m openpifpaf.video_pose --batch-size 1 --jaad-batch-size 1 \
--checkpoint outputs/resnet50block5-pif-paf-crm-edge401-190525-002719.pkl.epoch006
  • Generate the video with the predicted action activity map by running ./ which contains the following command
CUDA_VISIBLE_DEVICES="0" python3 -m openpifpaf.video_crm --batch-size 1 --jaad-batch-size 1 \
--checkpoint outputs/resnet50block5-pif-paf-crm-edge401-190525-002719.pkl.epoch006
  • Generate the video showing the results of guided backpropagation by running
  • Evaluate precision and recall
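Checkpoints carry an .epochNNN suffix, as in the --checkpoint paths above. If you trained your own model, a small helper like the hypothetical latest_checkpoint below can pick the highest-epoch file for a given model prefix; it assumes only that naming pattern.

```python
import os
import re
from typing import Optional

# Matches names such as
# resnet50block5-pif-paf-crm-edge401-190525-002719.pkl.epoch006
EPOCH_RE = re.compile(r"\.pkl\.epoch(\d+)$")


def latest_checkpoint(outputs_dir: str, prefix: str) -> Optional[str]:
    """Return the path of the highest-epoch checkpoint whose name starts with prefix."""
    best, best_epoch = None, -1
    for name in os.listdir(outputs_dir):
        match = EPOCH_RE.search(name)
        if name.startswith(prefix) and match:
            epoch = int(match.group(1))
            if epoch > best_epoch:
                best, best_epoch = os.path.join(outputs_dir, name), epoch
    return best
```

For example, latest_checkpoint("outputs", "resnet50block5-pif-paf-crm-edge401") returns a path that can be passed to --checkpoint, or None if nothing matches; .log files are ignored because they lack the epoch suffix.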