Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognition (ACM MM MEGC2024)
Micro-expressions are subtle facial movements that reveal hidden emotions, but their fleeting and involuntary nature makes them difficult to detect. This paper addresses two critical tasks in micro-expression analysis: spotting and recognition. We integrate the VideoMAE V2 framework with a temporal-information adapter and multi-scale feature fusion. The adapter captures local temporal context within video frames, improving the efficiency of feature extraction, while a multi-scale image pyramid captures motion features ranging from broad movements to subtle details. Combining these multi-scale features strengthens the model on both tasks and helps it cope with environmental variations, involuntary facial movements, and dataset imbalance, leading to improved accuracy in Micro-Expression Spotting-then-Recognition.
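The two core components can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact implementation in this repository: the module names (`TemporalAdapter`, `fuse_pyramid_features`), the bottleneck width, the pyramid scales, and the mean-fusion rule are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAdapter(nn.Module):
    """Bottleneck adapter with a depthwise temporal convolution.

    Sketch only: the dimensions and the placement inside the backbone
    are assumptions, not the exact configuration used by this repo.
    """

    def __init__(self, dim: int, bottleneck: int = 64, kernel_size: int = 3):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        # Depthwise 1-D conv over the time axis mixes each channel with
        # its neighboring frames (the "local temporal context").
        self.temporal = nn.Conv1d(
            bottleneck, bottleneck, kernel_size,
            padding=kernel_size // 2, groups=bottleneck,
        )
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, tokens, dim) -- per-frame patch tokens.
        b, t, n, d = x.shape
        h = self.down(x)                                 # (b, t, n, c)
        h = h.permute(0, 2, 3, 1).reshape(b * n, -1, t)  # (b*n, c, t)
        h = F.gelu(self.temporal(h))                     # temporal mixing
        h = h.reshape(b, n, -1, t).permute(0, 3, 1, 2)   # (b, t, n, c)
        return x + self.up(h)                            # residual update


def fuse_pyramid_features(frames, backbone, scales=(1.0, 0.5, 0.25)):
    """Run the backbone on an image pyramid and average the features.

    frames: (batch, time, channels, height, width). Coarser scales are
    resized back to the original resolution so the backbone always sees
    the same input size; mean fusion is an assumed combination rule.
    """
    b, t, c, h, w = frames.shape
    feats = []
    for s in scales:
        x = frames.reshape(b * t, c, h, w)
        if s != 1.0:
            # Downsample to suppress fine detail, then restore the size:
            # low scales emphasize broad motion, 1.0 keeps subtle cues.
            x = F.interpolate(x, scale_factor=s, mode="bilinear",
                              align_corners=False)
            x = F.interpolate(x, size=(h, w), mode="bilinear",
                              align_corners=False)
        feats.append(backbone(x.reshape(b, t, c, h, w)))
    return torch.stack(feats, dim=0).mean(dim=0)


# Example: adapter output keeps the token shape, so it can be dropped
# residually between transformer blocks.
adapter = TemporalAdapter(dim=768)
tokens = torch.randn(2, 16, 196, 768)   # 16 frames of 14x14 ViT-B tokens
assert adapter(tokens).shape == tokens.shape
```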
To train the model, run:
```bash
python train.py
```
You may need to edit `datasets.py` and `config.py` to suit your dataset.
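As a rough illustration of what typically needs changing (the field names below are hypothetical; check `config.py` for the names it actually defines):

```python
# config.py -- hypothetical sketch; the real file's variable names may differ.
DATA_ROOT = "/path/to/your/dataset"   # root directory of cropped face videos
ANNOTATION_FILE = "annotations.csv"   # onset/apex/offset labels for spotting
NUM_FRAMES = 16                       # clip length fed to VideoMAE V2
IMG_SIZE = 224                        # spatial input resolution
NUM_CLASSES = 3                       # recognition classes in your dataset
BATCH_SIZE = 8
LEARNING_RATE = 1e-4
```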