
The tfrecord files #23

Open
xjwla opened this issue Jan 11, 2021 · 8 comments

Comments

@xjwla

xjwla commented Jan 11, 2021

Hi,
The dataset I downloaded is incomplete, so when I use write_records_tcd.py to convert the data into .tfrecord files, I cannot generate the complete set of files. Could you send the .tfrecord files to my email address: [email protected]?
Thank you very much.
xjw

@georgesterpu
Owner

georgesterpu commented Jan 11, 2021

Hi @xjwla

Thanks for opening the issue.
Could you please paste the error message here?

On my local copy of the TCD-TIMIT dataset I made a few corrections, most of them for speaker 42M.
That script uses the fixed dataset partitioning, as listed here. I suspect that your download is actually complete and that you only need to rename a few files.
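When the script stops with a FileNotFoundError, it can help to list every file from the fixed partitioning that is missing before re-running it. The following is a hypothetical sketch, not part of this repository: `check_partition` and the `expected` list of relative paths (for example, read from the partition lists) are assumptions, and it only suggests rename targets whose names differ in letter case, which is just one plausible kind of mismatch.

```python
from pathlib import Path
from typing import Dict, List, Optional


def check_partition(root: str, expected: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each missing expected path to a likely rename source.

    `expected` holds paths relative to `root`, e.g.
    "volunteers/01M/Clips/straightcam/si2077.wav". Files that exist are not
    reported. For a missing file, the value is an existing file in the same
    directory whose name matches case-insensitively, or None if there is none.
    """
    base = Path(root)
    report: Dict[str, Optional[str]] = {}
    for rel in expected:
        target = base / rel
        if target.is_file():
            continue  # the file is exactly where the script expects it
        candidate: Optional[str] = None
        if target.parent.is_dir():
            for entry in target.parent.iterdir():
                # Assumption: a name differing only in letter case is a rename candidate.
                if entry.is_file() and entry.name.lower() == target.name.lower():
                    candidate = str(entry.relative_to(base))
                    break
        report[rel] = candidate
    return report
```

Entries mapped to None are files that are genuinely absent from the local copy; entries with a candidate only need renaming.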

It would not be possible for me to share raw or processed data because it is licensed.

@clarahohohoho

Hi,

I am also facing the same issue when I run write_records_tcd.py. The script runs until it reaches this error:

FileNotFoundError: [Errno 2] No such file or directory: './datasets/tcdtimit/volunteers/01M/Clips/straightcam/si2077.wav'

Am I supposed to download a local copy of the TCD-TIMIT dataset before running the script?

Thank you in advance for your help!

@georgesterpu
Owner

@clarahohohoho
Yes, you would have to manually download any dataset you would like to use this project with.
The script is only meant to serve as an example for writing your data into a format compatible with the code.

Most datasets that I've worked with required me to sign up with an academic email or fill in license agreements. I did not see an easy way to automate this process. After all, data preparation is still one of the most time-consuming aspects of machine learning.

@clarahohohoho

Hi George,

Appreciate your quick reply on this! Unfortunately, the download link provided at http://www.mee.tcd.ie/~sigmedia/Resources/TCD-TIMIT is currently unavailable because the servers are being upgraded. Would you be able to share the raw TCD-TIMIT dataset with me?

Thank you!

@georgesterpu
Owner

@clarahohohoho
Sorry, I am not aware of an alternative download page for TCD-TIMIT.
I contacted the administrator of the webpage you mentioned. They are still working on finding a new server.

This GitHub repository is not related to the management of the TCD-TIMIT dataset, and I recommend that you contact my supervisor for any further assistance on this topic.

@CindyZyxxxxxx

CindyZyxxxxxx commented Mar 3, 2021

Sorry, another question: I don't know which file is the correct label_file in write_records.py for the LRS2 dataset. I couldn't find any file in the whole LRS2 dataset that would "contain pairs of (example name - transcription) on each line, delimited by a space". If I have misunderstood something, please pardon me.
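A file in the quoted format can be generated from the per-clip transcripts rather than found in the download. The following is a hypothetical sketch, not the repository's code: it assumes that each LRS2 clip has a `.txt` file whose first line starts with `Text:` followed by the transcription, and that the example name is the clip's path relative to the dataset root without its extension. Both assumptions should be checked against write_records.py and a local copy of LRS2; the function names are illustrative.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def read_lrs2_transcript(txt_path: Path) -> str:
    # Assumption: the first line of each clip's .txt looks like "Text:  SOME WORDS".
    first = txt_path.read_text(encoding="utf-8").splitlines()[0]
    if not first.startswith("Text:"):
        raise ValueError(f"unexpected first line in {txt_path}: {first!r}")
    return first[len("Text:"):].strip()


def iter_pairs(root: str) -> Iterator[Tuple[str, str]]:
    # Hypothetical naming: "<speaker_or_video_dir>/<clip>" relative to root, no extension.
    base = Path(root)
    for txt in sorted(base.rglob("*.txt")):
        name = txt.relative_to(base).with_suffix("").as_posix()
        yield name, read_lrs2_transcript(txt)


def write_label_file(root: str, out_path: str) -> None:
    # One "example_name transcription" pair per line, delimited by the first space.
    with open(out_path, "w", encoding="utf-8") as f:
        for name, text in iter_pairs(root):
            f.write(f"{name} {text}\n")


def parse_label_file(path: str) -> Dict[str, str]:
    # Inverse of write_label_file: split each line on the first space only,
    # so transcriptions may themselves contain spaces.
    labels: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                name, _, text = line.partition(" ")
                labels[name] = text
    return labels
```

The output file should be written outside the dataset root, since `iter_pairs` collects every `.txt` file under it.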

Thank you!

@georgesterpu
Owner

@ZhangYX-bin
Sorry, I missed your issue on the Taris repository. Will reply there.

@xjwla
Author

xjwla commented Jun 13, 2021

Hi, unfortunately I have the same problem: the download link provided at http://www.mee.tcd.ie/~sigmedia/Resources/TCD-TIMIT is unavailable. May I ask whether you have found a solution? Thank you very much.
