Skip to content
This repository has been archived by the owner on Sep 1, 2024. It is now read-only.

Ask for help about the net_vocal #24

Open
dengyuanjie opened this issue Jun 26, 2022 · 0 comments
Open

Ask for help about the net_vocal #24

dengyuanjie opened this issue Jun 26, 2022 · 0 comments

Comments

@dengyuanjie
Copy link

Hello, I observed the effect of net_vocal_attributes in the whole model framework.

At present, the embedding extracted from the predicted sound, the distance of the negative sample pair (audio_embedding_A1_pred and audio_embedding_B1_pred) can reach 2, and the distance of the positive sample pair (audio_embedding_A1_pred and audio_embedding_A2_pred) can reach about 0.

But after I changed the input of net_vocal to pure real sound, the distance between negative sample pairs (audio_embedding_A1_gt and audio_embedding_B_gt) can only reach 1. That is to say, the sound feature extraction is not good when I train the net_vocal alone.

It stands to reason that pure ground voices are easier to extract features than predicted voices. I modified the parameters of the training (batch, learning rate, etc.) but none solved the problem. May I know what is the reason?

Looking forward to your reply!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant