Hi authors, thanks for open-sourcing the project code as well as your implementation of the STFT discriminator. I was able to train a good codec model with your STFT implementation.

However, I see that the output logits from the discriminator have the shape (b, c, t, w): b for batch size, c for the number of channels, t for timesteps/frames, and w for frequency bins. To compute the hinge loss, D(x) should return a shape of (b, c) or (b, c, t), right? So I am wondering how you aggregate the information along the frequency dimension (the last dimension).

My current implementation directly sums over the last dimension, and I suppose I could also use an nn.Linear layer to project the last dimension from w to 1. Could you provide more details on how the discriminator logits are used during training? Thanks!
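For reference, here is a minimal sketch of the two aggregation options mentioned above (summing over the frequency dimension vs. a learned `nn.Linear` projection), together with standard hinge GAN losses applied to the per-frame logits. The shapes and layer sizes are illustrative assumptions, not the authors' actual configuration:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator output of shape (b, c, t, w); the values
# and sizes below are placeholders for illustration only.
b, c, t, w = 2, 1, 50, 64
logits = torch.randn(b, c, t, w)

# Option 1: sum over the frequency dimension -> (b, c, t).
# (A mean would work the same way and keeps the scale independent of w.)
agg_sum = logits.sum(dim=-1)

# Option 2: learned projection from w to 1 via nn.Linear -> (b, c, t).
proj = nn.Linear(w, 1)
agg_lin = proj(logits).squeeze(-1)

# Standard hinge GAN losses, averaged over all remaining elements:
def d_hinge_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Discriminator hinge loss: push real logits above +1, fake below -1."""
    return torch.relu(1 - real_logits).mean() + torch.relu(1 + fake_logits).mean()

def g_hinge_loss(fake_logits: torch.Tensor) -> torch.Tensor:
    """Generator hinge loss: maximize the discriminator's fake logits."""
    return (-fake_logits).mean()
```

Note that because the hinge loss is ultimately reduced with a mean, keeping the (b, c, t) shape (a "patch"-style discriminator over frames) is also common, so the aggregation mainly affects how much the frequency axis is pooled before the loss.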