A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge

Second place in the Ego4D Natural Language Queries challenge at ECCV 2022. Our arXiv version (arXiv:2211.08704) can be found at this link. We invite our audience to try out the code.

Introduction

This code repo implements an ActionFormer variant for single-stage temporal sentence grounding on the Ego4D NLQ challenge. Our model differs from ActionFormer in the following aspects:

  • An additional transformer-based text encoder.
  • Transformer-based classification and regression heads.
  • Attention-based fusion of video and text features.
  • A frame-level contrastive loss (the fusion and the loss are sketched below).
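
These pieces are not spelled out in this README, so here is a minimal PyTorch sketch of the last two items, under our own assumptions: cross-attention in which video clips query the text tokens, and a multi-positive InfoNCE loss over frames inside the ground-truth span. The names (CrossModalFusion, frame_contrastive_loss) and the temperature tau are illustrative, not the repo's actual API.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    # Hypothetical sketch: video clip features attend to text token features.
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vid, txt, txt_pad_mask=None):
        # vid: (B, T, D) video features; txt: (B, L, D) text features
        # txt_pad_mask: (B, L), True at padded text positions
        fused, _ = self.attn(query=vid, key=txt, value=txt,
                             key_padding_mask=txt_pad_mask)
        return self.norm(vid + fused)  # residual connection + layer norm

def frame_contrastive_loss(frame_feats, text_emb, inside_mask, tau=0.07):
    # frame_feats: (T, D); text_emb: (D,); inside_mask: (T,) bool,
    # True for frames that fall inside the ground-truth span.
    f = F.normalize(frame_feats, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    sim = (f @ t) / tau                              # (T,) cosine / temperature
    pos = torch.logsumexp(sim[inside_mask], dim=0)   # mass on positive frames
    total = torch.logsumexp(sim, dim=0)              # mass on all frames
    return total - pos                               # = -log(pos / total)

In this reading, each video clip gathers query-relevant text context, and the contrastive term pushes frames inside the annotated span toward the sentence embedding relative to the rest of the video.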

Code Overview

The structure of this code repo is heavily inspired by ActionFormer. Some of the main components are:

  • ./libs/core: Parameter configuration module.
  • ./libs/datasets: Data loader and IO module.
  • ./libs/model: Our main model with all its building blocks.
  • ./libs/utils: Utility functions for training, inference, and postprocessing.

Installation

  • Follow INSTALL.md to install the necessary dependencies and compile the code.

Dataset

Ego4D NLQ

These are EgoVLP features extracted with the official EgoVLP code, using clips of 16 frames and a stride of 16 frames. We reshaped these features to align with the official Ego4D SlowFast features.
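
The README does not say how the reshaping is done; a common way to align two temporal feature grids (and purely an assumption here) is linear interpolation along the time axis, e.g.:

import torch
import torch.nn.functional as F

def align_temporal_length(feats, target_len):
    # feats: (T, D) EgoVLP clip features; returns (target_len, D) so the
    # sequence length matches the SlowFast feature grid.
    x = feats.t().unsqueeze(0)  # (1, D, T): interpolate expects (N, C, L)
    x = F.interpolate(x, size=target_len, mode='linear', align_corners=False)
    return x.squeeze(0).t()     # back to (target_len, D)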

Quick Start

  • Follow data/DATA_README.md to prepare video features.
  • Unpack the file under ./data/ego4d (or elsewhere and link to ./data).
  • The folder structure should look like this (a quick layout check is sketched after the tree):
This folder
│   README.md
│   ...
│
└───data/
│   └───ego4d/
│   │   └───annotations
│   │   └───video_features
│   │   │   └───ego_vlp_reshape
│   │   │   └───official_slowfast
│   │   │   └───official_omnivore
│   │   │   └───fusion
│   └───...
│
└───libs
│
│   ...
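
Before training, you can sanity-check the layout with a few lines of Python (a hypothetical helper, not part of the repo):

import os

expected = [
    'data/ego4d/annotations',
    'data/ego4d/video_features/ego_vlp_reshape',
    'data/ego4d/video_features/official_slowfast',
    'data/ego4d/video_features/official_omnivore',
    'data/ego4d/video_features/fusion',
]
missing = [p for p in expected if not os.path.isdir(p)]
print('Missing folders:' if missing else 'Layout looks good.', *missing)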

Training and Evaluation

  • Train our model on the Ego4D dataset. This will create an experiment folder under ./log that stores training config, logs, and checkpoints.
python ./train.py --config configs/ego4d.yaml -n ego4d -g 0
  • [Optional] Monitor the training using TensorBoard
tensorboard --logdir=./log/ego4d/
  • Evaluate the trained model. The Rank@1, mIoU@0.3 metric for Ego4D should be around 15.5% (a sketch of the metric follows this list).
python ./eval.py -n ego4d -c last -ema 
  • Generate the submission file for the Ego4D NLQ challenge.
python ./submit.py -n ego4d -c last -ema 
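
For reference, Rank@1 at an IoU threshold counts a query as correct when the top-scoring predicted window overlaps the ground truth with temporal IoU at or above the threshold. A minimal sketch of the computation (a hypothetical helper, not the repo's evaluation code):

def temporal_iou(pred, gt):
    # pred, gt: (start, end) windows in seconds
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def rank1_at_iou(top1_preds, gts, thresh=0.3):
    # top1_preds: top-scoring (start, end) per query; gts: ground truths
    hits = sum(temporal_iou(p, g) >= thresh for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)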

Reproduce Our Results

  • Our checkpoint can be downloaded from here. You can download the checkpoint file, move it to the ./log folder, and use the following command to reproduce our results. (If the data format is correct, the result should be: Rank@1, mIoU@0.3 = 17.58 and Rank@1, mIoU@0.5 = 9.76.)
python ./eval.py -n ego4d -c 08 -ema 

Contact

Sicheng Mo ([email protected])

References

If you are using our code, please consider citing our paper.

@misc{mo2022simple,
      title={A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge}, 
      author={Sicheng Mo and Fangzhou Mu and Yin Li},
      year={2022},
      eprint={2211.08704},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
