This repo hosts the official implementation of STAMP: an open heterogeneous multi-agent collaborative perception framework for autonomous driving.
(Figure: feature maps before and after collaborative feature alignment, CFA.)
Our framework supports the following forms of heterogeneity (see the configuration sketch after this list):
- Heterogeneous Modalities: Each agent can be equipped with sensors of different modalities.
  - LiDAR
  - Camera
  - LiDAR + Camera
- Heterogeneous Model Architectures and Parameters: Each agent can be equipped with a different model architecture and its own parameters.
  - Encoders
    - PointPillars (LiDAR)
    - SECOND (LiDAR)
    - PIXOR (LiDAR)
    - VoxelNet (LiDAR)
    - Pointformer (LiDAR)
    - Lift-Splat-Shoot [ResNet] (Camera)
    - Lift-Splat-Shoot [EfficientNet] (Camera)
  - Fusion models
    - Window Attention, first proposed by V2X-ViT (ECCV 2022)
    - Pyramid Fusion, first proposed by HEAL (ICLR 2024)
    - Fused Axial Attention, first proposed by CoBEVT (CoRL 2022)
    - Cross-Vehicle Aggregation, first proposed by V2VNet (ECCV 2020)
- Heterogeneous Downstream Tasks: Each agent can be trained toward a different downstream task (training objective).
  - 3D Object Detection
  - BEV Segmentation
- Multiple Datasets:
  - OPV2V (data preparation follows HEAL)
  - V2V4Real
We are committed to expanding our framework's capabilities. Future updates will include support for:
- Additional modalities
- New model architectures
- Diverse downstream tasks
- More datasets
For data and environment preparation, please refer to the HEAL repository.
To reproduce our results, run the following commands:

```bash
# Heterogeneous 3D object detection training
bash train_object_detection.sh

# Training on the V2V4Real dataset
bash train_v2v4real.sh

# Task-agnostic training across heterogeneous downstream tasks
bash task_agnostic.sh
```
We are in the process of preparing model checkpoints for release. Please stay tuned for updates.
This project builds upon the excellent work of HEAL. We extend our sincere gratitude to their team for their outstanding contributions to the field.
As this work is under double-blind review, author contact information will be released after the review process.
For any questions or concerns, please open an issue in this repository, and we'll be happy to assist you.