How STFT Discriminator Output Logits are used in Hinge Loss? #74

Open
steventan0110 opened this issue Oct 28, 2023 · 0 comments
Labels
question Further information is requested

Comments

@steventan0110
❓ Questions

Hi authors, thanks for open-sourcing the project code as well as your implementation of the STFT discriminator. I was able to train a good codec model with your STFT implementation.

However, I see that the output logits from the discriminator have the shape (b c t w): b is the batch size, c the number of channels, t the number of timesteps/frames, and w the number of frequency bins. To compute the hinge loss, D(x) should return a tensor of shape (b c) or (b c t), right? If so, how do you aggregate the information across the frequency dimension (the last one)?

My current implementation sums directly over the last dimension, and I suppose I could also use an nn.Linear layer to map the last dimension from w to 1. Could you provide more details on how the discriminator logits are used during training? Thanks!
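To make the question concrete, here is a minimal sketch of two ways the (b c t w) logits could be reduced to a scalar hinge loss. This is my own illustration, not code from this repository: the function names and tensor sizes are hypothetical, and I am assuming the standard discriminator hinge loss, max(0, 1 - D(x)) on real inputs plus max(0, 1 + D(G(x))) on generated ones.

```python
import torch
import torch.nn.functional as F


def hinge_d_loss_summed(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """What I do now: sum over the frequency axis w, then apply the hinge.

    Inputs are assumed to have shape (b, c, t, w).
    """
    real = real_logits.sum(dim=-1)  # (b, c, t)
    fake = fake_logits.sum(dim=-1)  # (b, c, t)
    return F.relu(1.0 - real).mean() + F.relu(1.0 + fake).mean()


def hinge_d_loss_elementwise(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Alternative: treat every (t, w) logit as its own patch-level decision.

    The hinge is applied per element and averaged over all axes,
    so no explicit aggregation over w is needed.
    """
    return F.relu(1.0 - real_logits).mean() + F.relu(1.0 + fake_logits).mean()


# Hypothetical sizes: batch of 2, 1 channel, 50 frames, 8 frequency bins.
torch.manual_seed(0)
real_logits = torch.randn(2, 1, 50, 8)
fake_logits = torch.randn(2, 1, 50, 8)

loss_summed = hinge_d_loss_summed(real_logits, fake_logits)
loss_elementwise = hinge_d_loss_elementwise(real_logits, fake_logits)
```

Both variants return a scalar, but they scale differently: summing over w makes the margin of 1 apply to the whole frequency column, while the elementwise version applies it to each bin separately. I am not sure which of these, if either, matches your training setup.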

@steventan0110 steventan0110 added the question Further information is requested label Oct 28, 2023