A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge

Second place in the Ego4D Natural Language Queries (NLQ) Challenge at ECCV 2022. The arXiv version of our report can be found at this link. We invite readers to try out the code.

Introduction

This code repo implements an ActionFormer variant for single-stage temporal sentence grounding on the Ego4D NLQ challenge. Our model differs from ActionFormer in the following aspects:

An additional transformer-based text encoder.
Transformer-based classification and regression heads.
Attention-based fusion of video and text features.
A frame-level contrastive loss.
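To make the idea of attention-based fusion concrete, here is a minimal single-head cross-attention sketch in which each video frame attends over the text tokens. This is not the repository's implementation. The identity query/key/value projections, the single head, the residual addition, and the name `cross_attention_fuse` are all simplifying assumptions for illustration. Plain Python lists are used so the sketch runs without any framework.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps (frames) or text tokens


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(video: Matrix, text: Matrix) -> Matrix:
    """Hypothetical fusion: each video frame (query) attends over all text
    tokens (keys and values), and the attended text context is added back to
    the frame feature. Identity projections and a single head are simplifying
    assumptions; both inputs must share the feature dimension d."""
    d = len(video[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for frame in video:
        # Scaled dot-product score between this frame and every text token.
        scores = [scale * sum(f * t for f, t in zip(frame, tok)) for tok in text]
        weights = softmax(scores)
        # Attention-weighted sum of text tokens, one value per feature channel.
        context = [sum(w * tok[j] for w, tok in zip(weights, text)) for j in range(d)]
        # Residual connection keeps the original video signal.
        fused.append([f + c for f, c in zip(frame, context)])
    return fused


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, d = 2
    text = [[1.0, 0.0], [0.0, 1.0]]               # 2 tokens, d = 2
    for row in cross_attention_fuse(video, text):
        print([round(x, 3) for x in row])
```

In a real model the projections are learned and multiple heads are used; the sketch only shows how text information is routed to each frame.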

Code Overview

The structure of this code repo is heavily inspired by ActionFormer. Its main components are:

./libs/core: Parameter configuration module.
./libs/datasets: Data loader and IO module.
./libs/model: Our main model with all its building blocks.
./libs/utils: Utility functions for training, inference, and postprocessing.

Installation

Follow INSTALL.md to install the necessary dependencies and compile the code.

Dataset

Ego4D NLQ

Download ego_vlp_reshape.zip from this Google Drive link. This file contains the EgoVLP features in .pt format.
Download the official SlowFast and Omnivore features from the official Ego4D repo.

Details: These are EgoVLP features extracted using the official EgoVLP code. The features are extracted from clips of 16 frames with a stride of 16 frames. We reshaped these features to align them with the official Ego4D SlowFast features.
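Because each feature summarizes a fixed clip of frames, converting between feature indices and timestamps is simple arithmetic. The sketch below assumes videos are decoded at 30 fps, which this README does not state; the 16-frame window and stride come from the text above. The function names are hypothetical.

```python
from typing import Tuple

# Assumption: videos are decoded at 30 fps (not stated in this README).
FPS = 30.0
WINDOW = 16  # frames per clip, as stated above
STRIDE = 16  # frames between consecutive clip starts, as stated above


def feature_span_seconds(i: int) -> Tuple[float, float]:
    """Start and end time (seconds) of the clip summarized by feature i."""
    start_frame = i * STRIDE
    return start_frame / FPS, (start_frame + WINDOW) / FPS


def seconds_to_feature_index(t: float) -> int:
    """Index of the feature whose clip contains time t. Since the window
    equals the stride, clips tile the video without overlap."""
    return int(t * FPS // STRIDE)


if __name__ == "__main__":
    print(FPS / STRIDE)  # features per second -> 1.875
    print(feature_span_seconds(18))
    print(seconds_to_feature_index(10.0))
```

Under these assumptions the features form a 1.875 Hz sequence, so a query answer window of a few seconds spans only a handful of feature steps.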

Quick Start

Follow data/DATA_README.md to prepare the video features.
Unpack the file under ./data/ego4d (or elsewhere and link to ./data).
The folder structure should look like this:

This folder
│   README.md
│   ...
│
└───data/
│    └───ego4d/
│    │    └───annotations
│    │    └───video_features
│    │        └───ego_vlp_reshape
│    │        └───official_slowfast
│    │        └───official_omnivore
│    │        └───fusion
│    └───...
│
└───libs
│
│   ...
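Before training, it can help to confirm that the layout above is in place. This is a small helper sketch, not part of the repo; the sub-folder list is copied from the tree, and whether you need every feature folder depends on the config you use (an assumption). The name `missing_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List

# Sub-folders taken from the tree above. Which feature folders are actually
# required depends on the training config (an assumption).
EXPECTED = [
    "annotations",
    "video_features/ego_vlp_reshape",
    "video_features/official_slowfast",
    "video_features/official_omnivore",
    "video_features/fusion",
]


def missing_dirs(root: str = "./data/ego4d") -> List[str]:
    """Return the expected sub-folders that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Folder layout looks complete.")
```

Run it from the repo root; a symlinked ./data directory is followed transparently by `Path.is_dir`.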

Training and Evaluation

Train our model on the Ego4D dataset. This will create an experiment folder under ./log that stores the training config, logs, and checkpoints.

python ./train.py --config configs/ego4d.yaml -n ego4d -g 0

[Optional] Monitor the training using TensorBoard

tensorboard --logdir=./log/ego4d/

Evaluate the trained model. The Rank@1, IoU@0.3 metric on Ego4D should be around 15.5%.

python ./eval.py -n ego4d -c last -ema
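For reference, Rank@k at an IoU threshold m is commonly defined as the fraction of queries for which at least one of the top-k predicted windows has a temporal IoU of at least m with the ground-truth window. The sketch below is a generic re-implementation of that definition, not the code in eval.py, which remains the source of the reported numbers. Rank@1, IoU@0.3 corresponds to `k=1, iou_threshold=0.3`.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two 1-D time windows."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def rank_at_k(
    predictions: Sequence[Sequence[Interval]],
    ground_truth: Sequence[Interval],
    k: int,
    iou_threshold: float,
) -> float:
    """Fraction of queries whose top-k ranked windows include at least one
    with tIoU >= iou_threshold against the ground-truth window."""
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(predictions, ground_truth)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    preds = [[(0.0, 10.0), (20.0, 30.0)], [(5.0, 6.0)]]  # ranked, best first
    gt = [(0.0, 8.0), (0.0, 10.0)]
    print(rank_at_k(preds, gt, k=1, iou_threshold=0.3))  # 0.5
```

In the example, the first query's top window has tIoU 0.8 (a hit) and the second query's has tIoU 0.1 (a miss), giving 0.5.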

Generate a submission file for the Ego4D NLQ challenge.

python ./submit.py -n ego4d -c last -ema
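submit.py is the authoritative way to produce the file. As a rough illustration only, the sketch below wraps ranked predictions in the JSON layout that, to our understanding, the Ego4D NLQ evaluation server expects (keys `version`, `challenge`, `results`, and per-query `clip_uid`, `annotation_uid`, `query_idx`, `predicted_times`). Verify these keys against the official Ego4D NLQ instructions before relying on them; `build_submission` and the input dictionary layout are hypothetical.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical input: (clip_uid, annotation_uid, query_idx) -> ranked
# [start_sec, end_sec] windows, best first.
Predictions = Dict[Tuple[str, str, int], List[List[float]]]


def build_submission(preds: Predictions) -> dict:
    """Wrap predictions in the JSON layout we understand the Ego4D NLQ
    evaluation server to expect (verify against the official instructions)."""
    results = [
        {
            "clip_uid": clip_uid,
            "annotation_uid": annotation_uid,
            "query_idx": query_idx,
            "predicted_times": windows,
        }
        for (clip_uid, annotation_uid, query_idx), windows in preds.items()
    ]
    return {"version": "1.0", "challenge": "ego4d_nlq_challenge", "results": results}


if __name__ == "__main__":
    demo = {("clip-0", "ann-0", 0): [[12.0, 15.5], [40.0, 42.0]]}
    print(json.dumps(build_submission(demo), indent=2))
```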

Reproduce Our Results

Our checkpoint can be downloaded from here. Download the checkpoint file, move it to the ./log folder, and run the following command to reproduce our results. If the data format is correct, the results should be Rank@1, IoU@0.1 = 17.58 and Rank@1, IoU@0.3 = 9.76.

python ./eval.py -n ego4d -c 08 -ema

Contact

Sicheng Mo (smo3@wisc.edu)

References

If you are using our code, please consider citing our paper.

@misc{mo2022simple,
      title={A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge}, 
      author={Sicheng Mo and Fangzhou Mu and Yin Li},
      year={2022},
      eprint={2211.08704},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
