Training New Models #3

Open · lyellr opened this issue Jun 21, 2020 · 20 comments

lyellr commented Jun 21, 2020

I was wondering if you had any advice or code you'd be willing to share for training other models. I have a couple of pieces of hardware I'd love to play around with using your inference implementation. Thanks!!

ljuvela (Collaborator) commented Jul 12, 2020

Sorry, but we can't share the training code directly at this point.
I had a look at https://github.com/teddykoker/pedalnet and it's mostly correct. (I also see you've forked it 🙂)

As for tips: check that the convolutions actually use a causal padding mode. That's not available in PyTorch by default and is easy to get slightly wrong (the default 'valid' mode trims both ends symmetrically, whereas a causal layer must pad on the left only).

Next tip: be careful when permuting PyTorch tensor dimensions for JSON export. We originally used TensorFlow, and its convolution weight tensors are laid out differently.

GuitarML (Contributor) commented:

So for 10 dilated layers the left-side padding would actually be quite large compared to the input, correct? Assuming pad = (kernel_size - 1) * dilation.

ljuvela (Collaborator) commented Jul 24, 2020

You're correct. The total amount of zero padding will then match the receptive field of the network. If you're concerned about the "invalid" samples at the model output, you can always trim the target and model output signals to the valid length.
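
To put numbers on it, here's a quick sketch (assuming kernel size 3 and dilations doubling from 1 to 512, which is one plausible 10-layer configuration, not necessarily PedalNet's):

# Total left padding for a stack of dilated causal convolutions,
# using pad = (kernel_size - 1) * dilation per layer.
kernel_size = 3
dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512

total_pad = sum((kernel_size - 1) * d for d in dilations)
print(total_pad)      # 2046
print(total_pad + 1)  # receptive field: 2047 samples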

I find it convenient to create a "CausalConv1D" module that follows the standard Conv1D semantics but zero-pads internally. Other methods may be more efficient, but I think this is the least error-prone.
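
A minimal sketch of such a module (not the original training code, just one way to write it in PyTorch):

import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d with left-only zero padding, so output[t] never sees input beyond t."""

    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time); pad the time axis on the left only.
        return self.conv(F.pad(x, (self.pad, 0)))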

GuitarML (Contributor) commented:

Thanks for the help. I haven't been able to figure out the causal padding implementation yet, but I was able to convert a model trained with PedalNet into a format that WaveNetVA can build a plugin from. The result was listenable but still pretty far off from the TS9 pedal samples I recorded.

I used zeros for the layer values I couldn't infer from the PyTorch model: the -1 input layer weights and biases, and the remaining weight values for layer 0 (PyTorch only had 96 values here, so I padded with zeros to get to 1536). Also, the linear mix layer weights were really high compared to the wavenet1.json models, so I scaled them down by a factor of 3 to get something that would run in my DAW.

Any tips would be appreciated. I'm new to AI programming, so my guesses at the model conversion are probably not great, but being able to virtualize some of my music equipment would be pretty cool.

ljuvela (Collaborator) commented Jul 31, 2020

You could try this approach to implement a causal convolution: pytorch/pytorch#1333 (comment)

It sounds like there is some kind of size mismatch between the PedalNet and WaveNetVA configurations. I think it's best to identify exactly what that is, as it's really hard to adjust the weights manually post hoc.

Another thing that's easy to get wrong when exporting is the ordering of the convolution weights. PyTorch uses (out_channels, in_channels, kernel_size), while the plugin (and TensorFlow, which we used originally) uses (kernel_size, in_channels, out_channels). When exporting from PyTorch, you'll need to permute the weights accordingly.
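
A sketch of that permutation on the export side (export_conv_weight is a made-up helper name):

def export_conv_weight(conv):
    # conv.weight: (out_channels, in_channels, kernel_size) in PyTorch.
    # The plugin (and TensorFlow) wants (kernel_size, in_channels, out_channels).
    w = conv.weight.data.permute(2, 1, 0)
    return w.contiguous().flatten().tolist()  # ready for json.dump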

For reference, this function reads a flattened weight array into a convolution kernel:

void Convolution::setKernel(std::vector<float> W)
{
    assert(W.size() == inputChannels * outputChannels * filterWidth);
    // W is laid out as (kernel_size, in_channels, out_channels), flattened.
    // Note the time reversal: file tap k lands in kernel[filterWidth - 1 - k].
    size_t i = 0;
    for (size_t k = 0; k < filterWidth; ++k)
        for (size_t row = 0; row < inputChannels; ++row)
            for (size_t col = 0; col < outputChannels; ++col)
            {
                kernel[filterWidth - 1 - k](row, col) = W[i];
                i += 1;
            }
}

As a fun side note, I remember playing around with the plugin with different random weights, and most of the time it sounded like some kind of usable distortion effect. That property seems to fall out of the WaveNet structure somehow.

GuitarML (Contributor) commented:

That helps a lot, thanks! I'll share the code once I get it working right. I did add an analysis script to my PedalNet fork to compare predicted vs. actual WAV files, if anyone is interested. It would be interesting to see what other types of hardware this model works well on, such as compressors or microphones.

GuitarML (Contributor) commented Aug 6, 2020

Well, I think I've accounted for all the obvious differences in my converter, but the sound still doesn't match the original PedalNet model when loaded in the WaveNetVA plugin. I needed to add an input layer to get the layer sizes to match up, and the large weights on the linear mix layer were due to training on int16 audio data rather than float32. The code as it stands is available in my fork of PedalNet, along with my trained and converted models, if anyone wants to take a look.
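
For anyone hitting the same scaling issue: the usual fix is to normalize int16 samples to float32 in [-1, 1] before training, along these lines (a sketch using SciPy; not necessarily how PedalNet loads audio, and the file name is a placeholder):

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("input.wav")
if data.dtype == np.int16:
    # int16 spans [-32768, 32767]; dividing by 32768 gives float32 in [-1, 1).
    data = data.astype(np.float32) / 32768.0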

ljuvela (Collaborator) commented Aug 8, 2020

Good catches!

I think something might be going wrong where you slice the residual output and skip values on lines 96 and 99 in
https://github.com/keyth72/pedalnet/blob/1cf03f73a8a5f60d157422849cf43a75dfb7f6ef/model.py#L81-L101
With causal padding done the way you did it, shouldn't the outputs already be the same size?

Have you tried to match the numerics in a very minimal example (one hidden layer, small dimensions)? There's no need to even train the model: just export random weights and test with a few input samples of a linear ramp or something similar. A debug build in standalone mode, with printing to std::cerr, is useful here.
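
For instance, a throwaway script along these lines (purely illustrative) would give you reference numbers to compare against the plugin's std::cerr output:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible "random" weights

# One tiny layer: 1 input channel, 2 output channels, kernel size 3.
conv = nn.Conv1d(1, 2, kernel_size=3)

# A linear ramp as test input, left-padded by kernel_size - 1 for causality.
x = torch.arange(8, dtype=torch.float32).reshape(1, 1, 8)
y = conv(F.pad(x, (2, 0)))

print(conv.weight.data.flatten().tolist())  # weights to export to the plugin
print(conv.bias.data.tolist())
print(y.flatten().tolist())                 # reference output, sample by sample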

GuitarML (Contributor) commented:

Got the converter working. I had to combine the tanh and sigm layers into one layer in the PedalNet model. Stepping through the WaveNetVA code in debug mode really helped. Thanks again!
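
For anyone doing the same conversion, the merge boils down to concatenating the two weight tensors along the output-channel axis (conv_tanh and conv_sigm are hypothetical names; the plugin's gated layer expects one convolution with twice the channels, split into a tanh half and a sigmoid half at inference time):

import torch

# Both convs must share in_channels, kernel_size and dilation.
combined_weight = torch.cat([conv_tanh.weight.data, conv_sigm.weight.data], dim=0)
combined_bias = torch.cat([conv_tanh.bias.data, conv_sigm.bias.data], dim=0)

Which half the plugin treats as tanh and which as sigmoid is worth double-checking against the WaveNetVA source.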

ljuvela (Collaborator) commented Aug 15, 2020

Awesome! We should add a pointer to your PedalNet fork in the Readme!

GuitarML (Contributor) commented:

Go for it! I think what I'd like to do next is combine this with traditional modeling to add things like drive/tone/reverb controls, and also look into lowering the network latency for live guitar playing. For the time being, though, it will be nice to have the sound of an amp "cranked to 12" for my recordings without waking up the whole neighborhood!

damskaggep (Owner) commented:

Hey, nice job getting the exporter working! About lowering the network latency: the model itself shouldn't introduce any latency. Any latency in the processing is due to the buffer size used by JUCE. The buffer size can be changed in the settings of the compiled standalone version, or in your DAW's settings if you're using the plugin version.

ljuvela (Collaborator) commented Aug 19, 2020

Another potential source of latency is your recording setup. An easy way to sync the recorded input-target pairs is to use a physical loopback connection for the input.

GuitarML (Contributor) commented:

Changing the buffer size in my DAW fixed it; it sounds great now. Out of curiosity, are there any plans to release the RNN plugin code?

yudashuixiao1 commented:

Hi, does it work on Windows? I got some serious noise when I built the project and ran the sound input through the sound card. Could something else be causing this? Thanks!

GuitarML (Contributor) commented:

I'm running it on Windows. I hear some occasional clicks, but adjusting the settings in the DAW helps, and it's a low-end computer. I'm going through a separate audio interface and using the VST plugin, though; I haven't tried going directly into the sound card.

ljuvela (Collaborator) commented Aug 21, 2020

To run this (or pretty much any other audio plugin) on Windows, it's best to have an external audio interface with ASIO support. Windows Audio drivers and internal sound cards won't allow low enough latency for real-time playing, plus you're likely to get some very annoying buffer-grind noise.

GuitarML (Contributor) commented Sep 8, 2020

If anyone on this thread is interested, I added two guitar plugins built from the WaveNet model:
https://github.com/keyth72/SmartGuitarPedal
https://github.com/keyth72/SmartGuitarAmp
If anyone wants to add new models or features (or point out bugs in my code), I'd be happy to incorporate them.

yudashuixiao1 commented:

Amazing! It works well on PC. I plan to port the model to embedded devices. Would the computing power of an embedded chip support this model?

GuitarML (Contributor) commented:

@ljuvela I made an LSTM model in Keras based on the research paper, but there are a few things I'm not sure about from reading it. If you can't comment on it, I understand, but I opened this issue to try to interpret the paper more accurately: GuitarML/GuitarLSTM#8

Thanks!
