Any non-English document images in the dataset? #23

Open
jocelynguo opened this issue Apr 8, 2020 · 1 comment

Comments

@jocelynguo

This is a nice dataset for research on NLP and CV. Thank you for making it publicly available.
I am wondering whether any non-English document images are included in the PubLayNet dataset.

@zhxgj
Contributor

zhxgj commented Apr 8, 2020

@jocelynguo Thanks. It is a good question. I do not have statistics, but I think nearly all the documents are in English. There may be a few documents that contain some non-English characters in medicine names.
