artifacts at the verge between existing and extended frequency bands #23

Open
Cunfu-Zhuge opened this issue Jan 2, 2024 · 7 comments
Comments

@Cunfu-Zhuge

I trained the model myself and found a line of artifacts at the verge between the existing and extended frequency bands in the output of my trained model. I read the paper related to this repo, and none of the three reasons it gives for this phenomenon apply to my training process, so I don't know why it happens.
My guess is that it comes from how I converted the VCTK dataset from .flac to .wav. Could you please tell me how you converted .flac to .wav?
Thank you!

@Cunfu-Zhuge
Author

I found that the "model" parameter in the official pretrained model is set to "hdemucs-snake-ftb-lstm-peg-concat", but the code does not support this value. The only supported values for this parameter are "Aero" and "SEAnet".

@At0nale

At0nale commented Feb 12, 2024

I had the same issue, although I am not converting .flac to .wav, so I would rule that out. I have partially fixed it by varying the way I downsample my training examples (I now use 10 different downsampling methods).

It works nicely for clean voice samples, but as soon as there is a bit of noise or distortion in the voice, I get a frequency boost at the verge between the existing and extended frequency bands again.

Any update on this issue?
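
For readers wondering what varying the downsampling could look like in practice, here is a minimal NumPy sketch. It is not the set of 10 methods mentioned above, which the comment does not describe. The function names, the pool of windows and filter lengths, and the cutoff jitter range are all assumptions chosen for illustration. The idea is only that each training example is low-passed with a slightly different filter before decimation, so the model does not overfit to one specific roll-off shape at the band edge.

```python
import numpy as np

def lowpass_kernel(cutoff, num_taps, window):
    """Windowed-sinc low-pass FIR; cutoff is a fraction of Nyquist (0..1)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = cutoff * np.sinc(cutoff * n)   # ideal low-pass impulse response
    h = h * window(num_taps)           # taper to a finite length
    return h / h.sum()                 # normalize to unity gain at DC

def random_downsample(x, factor, rng):
    """Low-pass with a randomly chosen filter, then keep every factor-th sample."""
    # Hypothetical pool of filter variants (3 windows x 3 lengths).
    window = [np.hanning, np.hamming, np.blackman][rng.integers(3)]
    num_taps = int(rng.choice([31, 63, 127]))
    # Jitter the cutoff slightly below the new Nyquist frequency.
    cutoff = (1.0 / factor) * rng.uniform(0.85, 1.0)
    h = lowpass_kernel(cutoff, num_taps, window)
    y = np.convolve(x, h, mode="same")
    return y[::factor]

rng = np.random.default_rng(0)
hi_res = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
lo_res = random_downsample(hi_res, factor=2, rng=rng)
```

Calling `random_downsample` once per training example draws a fresh filter each time. Other resamplers (for example polyphase or FFT-based ones from an audio library) could be added to the pool in the same way.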

@yezhangyinge

> I trained the model myself and found a line of artifacts at the verge between the existing and extended frequency bands in the output of my trained model. I read the paper related to this repo, and none of the three reasons it gives for this phenomenon apply to my training process, so I don't know why it happens. My guess is that it comes from how I converted the VCTK dataset from .flac to .wav. Could you please tell me how you converted .flac to .wav? Thank you!

I also see "a line of artifacts at the verge between existing and extended frequency bands" with the model I trained. Did you solve this problem? Any suggestions?

@yezhangyinge

> I had the same issue, although I am not converting .flac to .wav, so I would rule that out. I have partially fixed it by varying the way I downsample my training examples (I now use 10 different downsampling methods).
>
> It works nicely for clean voice samples, but as soon as there is a bit of noise or distortion in the voice, I get a frequency boost at the verge between the existing and extended frequency bands again.
>
> Any update on this issue?

Could you tell me how you went about "implementing variance in the way I downsample my training examples (Now using 10 different downsampling methods)"? I have run into this problem too.

@pokepress

Just to be clear, what do these artifacts sound like? I've been using my modified version of this project, and even though I've primarily been doing 44.1->44.1 conversion, I'm getting buzzing/sibilance in the 7-8 kHz range for my AM radio upscale project.

@pokepress

So, having done more experimentation this weekend, it seems like in my case this issue correlates with the weight of the STFT loss. Those coefficients are adjustable in the base version, so try lowering them and see if it helps. I'm not 100% convinced the STFT loss is the underlying cause of the artifacts (it seems odd that it would affect such specific ranges differently than others), but it does seem to make it worse.
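
To make the role of those coefficients concrete, here is a rough NumPy sketch of a total loss in which the STFT terms are scaled by adjustable weights. It is an illustration only and not this repo's loss code: the names `sc_weight` and `mag_weight`, the default values, and the single STFT resolution are assumptions.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    frames = np.stack([x[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def total_loss(pred, target, sc_weight=0.5, mag_weight=0.5, eps=1e-7):
    """Time-domain L1 plus weighted spectral-convergence and log-magnitude terms."""
    l1 = np.mean(np.abs(pred - target))
    p, t = stft_mag(pred), stft_mag(target)
    # Spectral convergence: relative Frobenius-norm error of the magnitudes.
    sc = np.linalg.norm(t - p) / (np.linalg.norm(t) + eps)
    # Mean absolute difference of the log magnitudes.
    log_mag = np.mean(np.abs(np.log(t + eps) - np.log(p + eps)))
    return l1 + sc_weight * sc + mag_weight * log_mag
```

Lowering `sc_weight` and `mag_weight` shrinks the share of the gradient that comes from the spectral terms, which is the knob the comment above suggests experimenting with.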

@pokepress

I've done more work on this in my fork of the project. Right now, I'm testing a modified version of the STFT loss that allows for restricting the frequency range of the comparison (in terms of code, it zeros out parts of the STFT results above/below specified points), which should force the loss to focus on that particular frequency range. It shows some promise, but I'll need to keep increasing the weights to see how effective it really is.
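
The masking idea described above can be sketched in a few lines of NumPy. This is not the code from the fork, which is not shown in this thread. The function name `band_limited_stft_loss`, its parameters, and the choice of an L1 distance on magnitudes are assumptions made for illustration.

```python
import numpy as np

def band_limited_stft_loss(pred, target, sr, f_lo, f_hi, n_fft=512, hop=128):
    """L1 distance between STFT magnitudes, with bins outside [f_lo, f_hi] zeroed."""
    window = np.hanning(n_fft)

    def mag(x):
        starts = range(0, len(x) - n_fft + 1, hop)
        frames = np.stack([x[s:s + n_fft] * window for s in starts])
        return np.abs(np.fft.rfft(frames, axis=1))

    # Centre frequency in Hz of each rfft bin.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    # Zero out everything outside the band; broadcasts over frames.
    p = mag(pred) * mask
    t = mag(target) * mask
    # The mean runs over all bins, so zeroed bins contribute nothing to the sum.
    return np.mean(np.abs(t - p))

sr = 16000
time = np.arange(sr) / sr
target = np.sin(2 * np.pi * 1000 * time)
pred = target + 0.5 * np.sin(2 * np.pi * 6000 * time)  # error only near 6 kHz
near_error = band_limited_stft_loss(pred, target, sr, 5000, 7000)
away_from_error = band_limited_stft_loss(pred, target, sr, 500, 1500)
```

In this toy example the error sits near 6 kHz, so `near_error` is much larger than `away_from_error`. Setting the band around the verge between the existing and extended bands would concentrate the loss there.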
