AST for multi-label audio tagging? #142
Antoine101
Hi,
I am trying to describe acoustic scenes from audio samples by listing every source present in each sample, drawn from a fixed set of learnt labels.
I have read your paper and used your model, mainly through the Hugging Face hub, for single-label classification.
Does it work for multi-label classification as well (one audio sample = possibly multiple labels)?
Here you say this checkpoint can classify an audio clip into one of the AudioSet classes.
![image](https://private-user-images.githubusercontent.com/48209504/396167229-2e28a508-de8d-48e8-b222-29bb82e9def6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzOTA1OTIsIm5iZiI6MTczOTM5MDI5MiwicGF0aCI6Ii80ODIwOTUwNC8zOTYxNjcyMjktMmUyOGE1MDgtZGU4ZC00OGU4LWIyMjItMjliYjgyZTlkZWY2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDE5NTgxMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRjYzIzZjFiZTRiZjE1MjM0NDhjMDNiZDk5ZjA4ZjM3ZWIzOWE1NGVjMGI2NGFhYzEwMTdlMTY1YmFlMTk2YzMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.HF2XGIvGfM9ykKR8XoRnjqK_JLg14XdQancqs9d_Avc)
In your paper, however, you mention results obtained on the FSD50K dataset, which is a multi-label dataset (correct me if I'm wrong).
![image](https://private-user-images.githubusercontent.com/48209504/396167738-5769e855-e871-4147-bc50-e3ce8dbc797a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzOTA1OTIsIm5iZiI6MTczOTM5MDI5MiwicGF0aCI6Ii80ODIwOTUwNC8zOTYxNjc3MzgtNTc2OWU4NTUtZTg3MS00MTQ3LWJjNTAtZTNjZThkYmM3OTdhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDE5NTgxMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBlODQ0OWVjMTkyNjAxYTNlOTc4ZjM4YmI3MGZjNWQ0Y2I0YTE0OGY3NTNiN2M0ZjEwNzQwMjE4MDE5NzEwYmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.p8F_9BlFn-k0NJfVIXF0Ll-wGhMjfmb3PGbaUWgUAHs)
I have come across the LwLRAP metric, which seems to be suited to multi-label tasks. Did you use this metric specifically for fine-tuning your model on FSD50K? Or did you tweak FSD50K to turn it into a single-label dataset?
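To make the question concrete, here is a minimal sketch of LwLRAP (label-weighted label-ranking average precision) as I understand its usual definition: for every true label of every clip, take the fraction of true labels among the classes ranked at or above it, then average over all true labels. The function name `lwlrap` and the list-of-lists input format are my own assumptions, not something taken from the AST code.

```python
def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision (sketch).

    truth:  rows of 0/1 flags, one multi-hot row per clip (assumed format)
    scores: rows of model scores with the same shape as truth
    """
    precisions = []
    for t_row, s_row in zip(truth, scores):
        # Class indices sorted from highest to lowest score.
        order = sorted(range(len(s_row)), key=lambda i: -s_row[i])
        hits = 0
        for rank, idx in enumerate(order, start=1):
            if t_row[idx]:
                hits += 1
                # Precision at the rank of this true label.
                precisions.append(hits / rank)
    # Every true label counts equally, whichever clip it belongs to.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy example: two clips, three classes.
example_truth = [[1, 0, 1], [0, 1, 0]]
example_scores = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.1]]
print(lwlrap(example_truth, example_scores))
```

In the toy example the three true labels get precisions 1, 2/3 and 1, so the result is 8/9.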
And finally, would it be possible to fine-tune AST on my multi-label downstream task starting from the Hugging Face checkpoint? Does it only require the appropriate label arrays and metric, or is there more to it?
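For reference, this is what I assume "more to it" would mean in practice: multi-hot targets, an independent sigmoid per class with binary cross-entropy instead of a softmax over classes, and a decision threshold at inference. The sketch below uses only the standard library, and all names (`multi_hot`, `bce_with_logits`, `predict_labels`, the 0.5 threshold, the toy label list) are hypothetical. In PyTorch the loss would correspond to `torch.nn.BCEWithLogitsLoss`; whether the Hugging Face AST head switches to it via the config's `problem_type="multi_label_classification"` is something I have not confirmed.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multi_hot(label_names, all_labels):
    # One 0/1 entry per class; several entries may be 1 for the same clip.
    return [1.0 if name in label_names else 0.0 for name in all_labels]


def bce_with_logits(logits, targets):
    # Mean binary cross-entropy over classes, computed from raw logits
    # with the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict_labels(logits, all_labels, threshold=0.5):
    # Each class is decided independently, so zero, one or many labels can fire.
    return [name for z, name in zip(logits, all_labels) if sigmoid(z) >= threshold]


# Toy example with a hypothetical three-class label set.
ALL_LABELS = ["dog", "rain", "speech"]
target = multi_hot({"dog", "speech"}, ALL_LABELS)
logits = [2.0, -3.0, 0.5]  # stand-in for the classifier head's output
print(target, bce_with_logits(logits, target), predict_labels(logits, ALL_LABELS))
```

With a softmax head the class scores compete and sum to 1, which is why I suspect the single-label checkpoint cannot be reused for multi-label output without at least changing the loss and the decision rule.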
Thanks a lot in advance.
Antoine