GitHub - kongfnajie/SUST-and-RUST-datasets-for-Uyghur-STR

# SUST-and-RUST-datasets-for-Uyghur-STR

SUST is a synthetic dataset containing 600,000 Uyghur images. Each image contains one Uyghur word of at most 15 characters, and each word consists only of the 32 basic letters of the Uyghur alphabet, with no other symbols. This dataset is therefore suitable only for training word-level Uyghur scene text recognition models. The synthesis tool used is TextRecognitionDataGenerator. The background of each image is randomly selected from more than 8,000 background materials, the font is randomly selected from more than 300 Uyghur fonts, and the text in each image is tilted, blurred, and distorted to random degrees, as shown in the figure. The backgrounds, fonts, and corpus in SUST are sourced from the internet.
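The word constraints and per-image randomization described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline and not the TextRecognitionDataGenerator API. The names `UYGHUR_LETTERS`, `is_valid_word`, `RenderConfig`, and `sample_render_config` are hypothetical, and the effect ranges are placeholder assumptions. The sketch also assumes that word length is counted in Unicode code points and that the hamza carrier (U+0626) is not part of the 32-letter set, neither of which the description above specifies.

```python
import random
from dataclasses import dataclass

# Assumption: the 32 basic letters of the Uyghur Arabic-script alphabet,
# written as Unicode escapes (8 vowels and 24 consonants).
UYGHUR_LETTERS = frozenset(
    "\u0627\u06d5\u0628\u067e\u062a\u062c\u0686\u062e"
    "\u062f\u0631\u0632\u0698\u0633\u0634\u063a\u0641"
    "\u0642\u0643\u06af\u06ad\u0644\u0645\u0646\u06be"
    "\u0648\u06c7\u06c6\u06c8\u06cb\u06d0\u0649\u064a"
)
MAX_WORD_LENGTH = 15  # maximum word length stated for SUST


def is_valid_word(word, alphabet=UYGHUR_LETTERS, max_len=MAX_WORD_LENGTH):
    """Return True if the word is non-empty, short enough, and uses only allowed letters."""
    return 0 < len(word) <= max_len and all(ch in alphabet for ch in word)


@dataclass(frozen=True)
class RenderConfig:
    """Hypothetical per-image settings; field names are illustrative only."""
    background: str
    font: str
    skew_degrees: int
    blur_radius: int
    distortion: str


def sample_render_config(backgrounds, fonts, rng):
    """Pick a random background, font, and effect strengths for one image."""
    return RenderConfig(
        background=rng.choice(backgrounds),
        font=rng.choice(fonts),
        skew_degrees=rng.randint(-5, 5),   # placeholder range, not from the source
        blur_radius=rng.randint(0, 2),     # placeholder range, not from the source
        distortion=rng.choice(["none", "sine", "cosine"]),  # placeholder options
    )
```

A corpus could be filtered with `[w for w in words if is_valid_word(w)]`, and each surviving word paired with `sample_render_config(backgrounds, fonts, random.Random(seed))` before being passed to whichever rendering tool is used. Passing an explicit `random.Random` instance keeps the sampling reproducible.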

RUST is a real dataset photographed in Xinjiang, China, as shown in Fig. 5. It contains a total of 4,000 real Uyghur scene text images.
