
Custom Drum Datasets #91

Open

thenapking opened this issue Aug 12, 2020 · 2 comments

@thenapking
I assembled a dataset of approximately 9,000 bass drum samples from my own recordings, about 250 MB in total. This is much larger than the drum training set you used, but less varied, since it's just one type of drum. The samples are very short: most last about 250 ms. So I trained for around 20,000 steps using the following options:

```
--data_first_slice --data_pad_end --data_fast_wav --wavegan_dim 32 --data_num_channels 1 --data_sample_rate=22050
```
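For anyone reproducing this, those flags are passed straight to the training script; a sketch of the full command (the directory paths here are placeholders, and the entry point and mode argument should be double-checked against the repo's README):

```sh
python train_wavegan.py train ./train_dir \
  --data_dir ./bass_drum_wavs \
  --data_first_slice --data_pad_end --data_fast_wav \
  --wavegan_dim 32 --data_num_channels 1 --data_sample_rate 22050
```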
After 5,000–10,000 steps the output was good: the sounds were recognisably bass drums, but had too much random high-frequency (HF) noise. Unfortunately, by 20,000 steps the output was mainly noise and had lost those characteristics, so I stopped training; this had taken roughly 24 hours over a couple of days on a Google Colab instance.
I tried adding the options:
```
--data_slice_len=16384 --wavegan_batchnorm --data_normalize
```
However, this made the situation even worse (although training was much quicker). I've considered rewriting your code to allow a data_slice_len of 8192, which would suit my dataset better. However, I'm concerned that the dataset itself is the problem, given how good your results were.
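A quick back-of-envelope check (using the 250 ms / 22050 Hz figures above; the exact numbers are illustrative) shows why 8192 might be a better fit: with `--data_pad_end`, most of each 16384-sample slice would be zero padding.

```python
# How much of each training slice is real audio vs. zero padding?
# Assumes 250 ms bass drum clips at 22050 Hz, as described above.

SAMPLE_RATE = 22050
CLIP_SECONDS = 0.25

clip_samples = int(SAMPLE_RATE * CLIP_SECONDS)  # ~5512 samples of real audio

for slice_len in (16384, 8192):
    duration = slice_len / SAMPLE_RATE
    padding = 1 - clip_samples / slice_len
    print(f"slice_len={slice_len}: {duration:.3f} s per slice, "
          f"~{padding:.0%} zero padding per 250 ms clip")
```

So at the default 16384 roughly two-thirds of every example the model sees is silence, versus about one-third at 8192.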
Unfortunately this isn't so much an issue as a request for advice, from @chrisdonahue and others who have used this project. I see that others have raised issues with small datasets (#77), high-frequency components (#88), etc. I'm a noob at this and don't yet understand how to interpret the scores and graphs the training outputs, but when I do I'll see if I can add more detail to this issue.

@chrisdonahue
Owner

Hey there. Sorry you're not getting the results you want. It's possible that --wavegan_dim 32 is the culprit; this will result in a model with far fewer parameters than the models we trained in the paper. Is there a particular reason you chose to reduce the size of the model?
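To make the size difference concrete: in a DCGAN-style stack like WaveGAN's, layer channel counts are multiples of the `dim` flag, so conv-layer parameter counts scale roughly with `dim` squared, and halving `dim` cuts them by about 4x. A rough sketch (a simplified 5-layer stack, not the exact WaveGAN generator):

```python
# Rough scaling of conv-layer parameter counts with the model `dim` flag.
# Simplified transposed-conv stack; not the exact WaveGAN architecture.

KERNEL = 25  # WaveGAN uses length-25 1-D kernels


def approx_conv_params(dim, layers=5):
    """Sum kernel * in_ch * out_ch over a stack whose widths are multiples of dim."""
    total = 0
    ch = 16 * dim  # widest layer
    for _ in range(layers):
        total += KERNEL * ch * (ch // 2)
        ch //= 2
    return total


p64 = approx_conv_params(64)
p32 = approx_conv_params(32)
print(f"dim=64: ~{p64 / 1e6:.1f}M conv params; "
      f"dim=32: ~{p32 / 1e6:.1f}M ({p64 / p32:.0f}x fewer)")
```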

@thenapking
Author

Hi @chrisdonahue, thanks for your reply! There wasn't a particular reason for specifying 32 dimensions. I've since tried the following options: `--data_first_slice --data_pad_end --data_fast_wav --wavegan_genr_pp --data_sample_rate=22050 --data_slice_len=16384 --data_normalize`. I found batch normalisation was making things worse, but the results still don't sound good. By 3,000 steps I actually had some fairly good results with good HF definition, but by 20,000 steps these had all disappeared and the output kept degrading. Inception score was around 1, but the loss functions looked weird when I graphed them. I will try with `--wavegan_disc_phaseshuffle 2` though.
I guess my real question is: how do I assess the dataset? There's more variety in your set, as it includes sounds from multiple drums. How important is that variety? Or is it the size of the set that matters, especially if there is less variety?
[Attached image: discriminator loss (D loss) graph]
