Question about temporal part #15

iseunghoon · 2025-02-08T15:56:29Z

Thank you for the great resource. I have a question: it appears that the features are fused independently for each frame, in the form [bs*t, c, h, w], without considering the temporal axis. Is that correct? In other words, apart from the Video Swin backbone, there is no fusion of information between frames. Did I understand that correctly?

bo-miao · 2025-02-12T13:09:27Z

Yes, you are correct.
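
For readers following along, here is a minimal NumPy sketch of what folding time into the batch axis implies. It is not the repository's code: `per_frame_fuse` is a hypothetical stand-in (a 1x1 channel mixing) for any layer that operates on a [bs*t, c, h, w] tensor, and the shapes are made up for illustration. It perturbs one frame and checks which output frames change.

```python
import numpy as np

def per_frame_fuse(x, weight):
    """Hypothetical per-frame layer: 1x1 channel mixing.

    x:      [bs*t, c, h, w]  (time folded into the batch axis)
    weight: [c_out, c]
    """
    return np.einsum("oc,nchw->nohw", weight, x)

rng = np.random.default_rng(0)
bs, t, c, h, w = 2, 4, 3, 5, 5          # illustrative sizes only
video = rng.standard_normal((bs, t, c, h, w))
weight = rng.standard_normal((c, c))

def run(clip):
    # Fold time into batch, apply the layer, then unfold again.
    flat = clip.reshape(bs * t, c, h, w)
    return per_frame_fuse(flat, weight).reshape(bs, t, c, h, w)

out = run(video)

# Perturb only frame 0 of every clip and rerun.
perturbed = video.copy()
perturbed[:, 0] += 1.0
out_perturbed = run(perturbed)

# A frame's output changes only if that frame's own input changed.
frames_changed = [
    not np.allclose(out[:, i], out_perturbed[:, i]) for i in range(t)
]
print(frames_changed)
```

Under these assumptions only the perturbed frame's output differs, which is the sense in which frames do not exchange information once they are treated as batch elements. Any cross-frame mixing would have to come from a component that sees the temporal axis, such as the video backbone mentioned above.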
