Question about validate_cut_set in qa.py #1457

Open
t13m opened this issue Feb 21, 2025 · 2 comments

Comments

@t13m

t13m commented Feb 21, 2025

The validator validate_cut_set ensures that there are no duplicate ids in a CutSet. If I understand correctly, this is because CutSet maintains a dict-like interface. My question is: how should I duplicate some of the samples (cuts)? Would it work if I simply commented out the whole validate_cut_set function and duplicated the corresponding lines in cutset.jsonl?

In my experimental setup, the amount of data per speaker varies widely. To achieve a minimum level of balance, the data from speakers with less data needs to be duplicated. Is there a more idiomatic lhotse way to achieve that?

@pzelasko
Collaborator

That used to be a constraint, but at some point we dropped it. I may have missed that validation still checks for this. If you could make a PR to remove this check, it would be greatly appreciated. Thanks.
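
For anyone who wants to balance speakers without relying on duplicate ids at all, one option is to give every copy its own id. The sketch below is plain Python, not lhotse API: it assumes each cut is a dict parsed from one line of cutset.jsonl, and the names `oversample_by_speaker`, `speaker_of`, and the `_dup` id suffix are hypothetical choices made for illustration.

```python
import copy
from collections import defaultdict


def oversample_by_speaker(cuts, speaker_of, min_per_speaker):
    """Return a new list in which every speaker has at least
    `min_per_speaker` cuts, duplicating cuts round-robin as needed.

    `cuts` is a list of dicts with an "id" key (assumed layout).
    `speaker_of` is a caller-supplied function mapping a cut to its speaker.
    """
    by_speaker = defaultdict(list)
    for cut in cuts:
        by_speaker[speaker_of(cut)].append(cut)

    balanced = list(cuts)
    for group in by_speaker.values():
        missing = max(min_per_speaker - len(group), 0)
        for i in range(missing):
            src = group[i % len(group)]
            dup = copy.deepcopy(src)
            # Suffix with the pass number so repeated copies of the same
            # source cut still receive distinct ids.
            dup["id"] = f"{src['id']}_dup{i // len(group)}"
            balanced.append(dup)
    return balanced


# Toy data: speaker A has three cuts, speaker B has one.
cuts = [
    {"id": "a1", "speaker": "A"},
    {"id": "a2", "speaker": "A"},
    {"id": "a3", "speaker": "A"},
    {"id": "b1", "speaker": "B"},
]
balanced = oversample_by_speaker(cuts, lambda c: c["speaker"], min_per_speaker=3)
```

In real lhotse manifests the speaker usually lives on a cut's supervisions rather than on a top-level key, so `speaker_of` would need to be adapted. The suffix scheme also assumes that no original id already ends in `_dup` followed by a number.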

t13m added a commit to t13m/lhotse that referenced this issue Feb 22, 2025
@t13m
Author

t13m commented Feb 22, 2025

Sure, will do it.
