artifacts at the verge between existing and extended frequency bands #23
Comments
I found that the "model" parameter in the official pretrained model is set to "hdemucs-snake-ftb-lstm-peg-concat", but the code does not support this value. The only supported values for this parameter are "Aero" and "SEAnet".
I had the same issue, although I am not converting .flac to .wav, so I would rule that out. I have partially fixed the issue by introducing variance in the way I downsample my training examples (I now use 10 different downsampling methods). It works nicely for clean voice samples, but as soon as there is a bit of noise or distortion in the voice, I again get a frequency boost at the verge between the existing and extended frequency bands. Any update on this issue?
I also see "a line of artifacts at the verge between existing and extended frequency bands using the model I trained". Did you solve this problem, or do you have any suggestions?
Could you tell me how you are "implementing variance in the way I downsample my training examples (Now using 10 different downsampling methods)"? I ran into this problem too.
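The ten downsampling methods mentioned above were not shared in this thread, so the following is only a minimal sketch of one way to vary the downsampling, not the commenter's actual code. It assumes the variance comes from randomizing the anti-aliasing low-pass filter (cutoff, length, and window) before decimation. The function name `random_lowpass_downsample` and all parameter ranges are hypothetical.

```python
import numpy as np

def random_lowpass_downsample(wave, factor, rng):
    """Low-pass with a randomly drawn windowed-sinc FIR filter, then decimate.

    All ranges below are illustrative assumptions, not values from the repo.
    """
    # Cutoff as a fraction of the ORIGINAL Nyquist frequency (1.0 = Nyquist).
    # Drawn between 85% and 100% of the new Nyquist (1 / factor).
    cutoff = rng.uniform(0.85, 1.0) / factor
    # Random filter length (odd, so the filter has a single center tap).
    num_taps = int(rng.choice([31, 63, 127, 255]))
    # Random window shape, which changes the transition band and stopband ripple.
    window_fn = [np.hamming, np.hanning, np.blackman][rng.integers(3)]

    # Windowed-sinc design: ideal low-pass impulse response times the window.
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = cutoff * np.sinc(cutoff * n) * window_fn(num_taps)
    taps /= taps.sum()  # normalize to unity gain at DC

    filtered = np.convolve(wave, taps, mode="same")
    return filtered[::factor]  # keep every `factor`-th sample
```

Calling this with a fresh random draw for each training example (for instance `random_lowpass_downsample(wave, 2, np.random.default_rng())`) gives the model inputs with slightly different roll-off near the band edge, which is presumably the point of varying the method.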
Just to be clear, what do these artifacts sound like? I've been using my modified version of this project, and even though I've primarily been doing 44.1->44.1 conversion, I'm getting buzzing/sibilance in the 7-8 kHz range for my AM radio upscale project.
So, having done more experimentation this weekend, it seems like in my case this issue correlates with the weight of the STFT loss. Those coefficients are adjustable in the base version, so try lowering them and see if it helps. I'm not 100% convinced the STFT loss is the underlying cause of the artifacts (it seems odd that it would affect such specific ranges differently than others), but it does seem to make it worse.
I've done more work on this in my fork of the project. Right now, I'm testing a modified version of the STFT loss that allows for restricting the frequency range of the comparison (in terms of code, it zeros out parts of the STFT results above/below specified points), which should force the loss to focus on that particular frequency range. It shows some promise, but I'll need to keep increasing the weights to see how effective it really is.
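For readers who want to try the idea, here is a rough sketch of a frequency-restricted STFT loss as described above. It is not the code from the fork: it assumes a plain L1 distance between magnitude spectrograms, and it uses NumPy for brevity, so a real training loss would need the same masking in a differentiable framework. The names `stft_magnitude` and `band_limited_stft_loss` and the default `n_fft` and `hop` values are hypothetical.

```python
import numpy as np

def stft_magnitude(wave, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] for i in range(num_frames)]
    )
    return np.abs(np.fft.rfft(frames * window, axis=-1))

def band_limited_stft_loss(pred, target, sample_rate, f_lo, f_hi,
                           n_fft=512, hop=128):
    """Mean L1 distance between magnitudes, restricted to [f_lo, f_hi] Hz."""
    # Center frequency (Hz) of each STFT bin.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Boolean mask: True inside the band, False (zeroed out) elsewhere.
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    pred_mag = stft_magnitude(pred, n_fft, hop) * mask
    target_mag = stft_magnitude(target, n_fft, hop) * mask
    # Average only over the bins that were kept, across all frames.
    return np.abs(pred_mag - target_mag).sum() / (mask.sum() * pred_mag.shape[0])
```

For the 7-8 kHz buzzing mentioned earlier in the thread, one would call it with something like `f_lo=7000, f_hi=8000`, so that errors outside that band contribute nothing to the loss.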
I trained the model myself and found a line of artifacts at the verge between the existing and extended frequency bands in its output. I read the paper related to this repo, and none of the three reasons it gives for this phenomenon applied to my training process. I don't know why.
My guess is that it is caused by how I converted the VCTK dataset from .flac to .wav. Could you please tell me how you converted .flac to .wav?
Thank you!