
Cannot Reproduce Results. Release Evaluation Code #15

Open
wjc2830 opened this issue Aug 12, 2024 · 5 comments

Comments

@wjc2830

wjc2830 commented Aug 12, 2024

Thank you for your work on this paper.
However, I am unable to reproduce the main results reported in your paper, including the FID, onset detection accuracy, and AP.
For evaluating FID, I used the SpecVQGAN code, and for onset performance, I used the CondFoleyGen code.
Despite using these resources, the results obtained from this repository's inference code do not match those reported in your paper.
Could you please release your evaluation scripts to facilitate further investigation and ensure reproducibility?

@wjc2830 wjc2830 changed the title Release Evaluation Code Cannot Reproduce Results. Release Evaluation Code Aug 17, 2024
@ymzhang0319
Collaborator

Hi @wjc2830, thanks for your interest.

We use the same evaluation tools.
Could you please provide more details about your evaluation settings (we used semantic weight 1.0, temporal weight 0.2, ... in our experiments)?
Then I can try to help you figure it out.

@wjc2830
Author

wjc2830 commented Aug 19, 2024

Yes, ip_adapter_weight is set to 1.0 and controlnet_conditioning_scale is set to 0.2. I opted not to use a class prompt like "machine gun shooting" because I observed that the results generated without it were superior to those with it. With these settings, on AVSync15 (1500 samples) I got: onset acc: 0.1213, detection acc: 0.1347, detection AP: 0.6893, FID: 47.498411865917966, MKL: 5.17888476451238, KID: [0.046363522010469276 - 1.8900277153235862e-07].
Regarding FID computation, I want to clarify whether the reported FID score is averaged across all 15 classes, or computed over all samples with no notion of class.
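For reference on the class-averaging question: below is a minimal, hypothetical sketch of the standard Fréchet distance that FID-style metrics compute from embedding statistics. Whether the statistics are pooled over all samples or computed per class and then averaged is a choice of the evaluation script; the function names here are illustrative and are not taken from the SpecVQGAN codebase.

```python
# Sketch (assumption): FID as the Frechet distance between Gaussian fits
# of real vs. generated embedding sets. Names are illustrative only.
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))


def fid_from_features(feats_real, feats_fake):
    """Pooled FID: one Gaussian fit per set, no per-class averaging."""
    mu1, sigma1 = feats_real.mean(0), np.cov(feats_real, rowvar=False)
    mu2, sigma2 = feats_fake.mean(0), np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu1, sigma1, mu2, sigma2)
```

A per-class variant would instead call `fid_from_features` once per class and average the results, which generally yields a different (typically higher) number than the pooled version, so the two conventions are not directly comparable.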

@ymzhang0319
Collaborator

Thanks for your information! The evaluation experiments are conducted on the AVSync15 test set (150 samples). You can refer to the official link of AVSync15.
If you have any other questions, please feel free to contact us.

@wjc2830
Author

wjc2830 commented Aug 19, 2024

Thank you for the prompt reply. I have re-implemented the evaluation and obtained the following results: FID: 33.87400189673342, MKL: 5.159568889935811, KID: [0.053455384736237746 - 1.3165596698642787e-07], onset acc: 0.1007, detection acc: 0.1209, detection AP: 0.6936.
There are still discrepancies in metrics such as MKL and detection accuracy. To ensure consistency, I recommend releasing your evaluation script.

@Gloria2tt

@wjc2830 Can you provide your test code? I am also trying to do the same thing you did.
