
kongfnajie/SUST-and-RUST-datasets-for-Uyghur-STR


# SUST-and-RUST-datasets-for-Uyghur-STR

SUST&RUST Download

SUST is a synthetic dataset of 600,000 Uyghur images. Each image contains a single Uyghur word of at most 15 characters, composed only of the 32 basic letters of the Uyghur alphabet with no other symbols. The dataset is therefore suitable only for training word-level Uyghur scene text recognition models. The images were synthesized with TextRecognitionDataGenerator. The background of each image is randomly selected from more than 8,000 background images, the font is randomly selected from more than 300 Uyghur fonts, and the text is tilted, blurred, and distorted to random degrees, as shown in the figure below. The backgrounds, fonts, and corpus used in SUST were sourced from the internet.

SUST
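The SUST label constraints (at most 15 characters, drawn only from the 32 basic letters) map naturally onto a fixed-size character vocabulary for a word-level recognizer. The sketch below is a hypothetical helper, not part of this repository. It assumes labels are stored as plain Unicode strings using the standard 32-letter Uyghur Arabic-script alphabet at the code points listed in `UYGHUR_LETTERS`, and it reserves index 0 as a padding/CTC-blank symbol; the names `encode_label`, `decode_label`, and `is_valid_label` are illustrative only. Note that vowel-initial Uyghur words are normally written with the hamza carrier ئ (U+0626). This README does not say whether SUST labels contain it, so you may need to add it to the alphabet.

```python
"""Hypothetical label helpers for word-level Uyghur STR (not part of this repo)."""

# ASSUMPTION: labels use the standard 32-letter Uyghur Arabic-script alphabet
# at these Unicode code points; the README does not publish a charset file.
UYGHUR_LETTERS = (
    "\u0627\u06D5\u0628\u067E\u062A\u062C\u0686\u062E"
    "\u062F\u0631\u0632\u0698\u0633\u0634\u063A\u0641"
    "\u0642\u0643\u06AF\u06AD\u0644\u0645\u0646\u06BE"
    "\u0648\u06C7\u06C6\u06C8\u06CB\u06D0\u0649\u064A"
)
assert len(UYGHUR_LETTERS) == 32 and len(set(UYGHUR_LETTERS)) == 32

MAX_LEN = 15  # maximum word length stated for SUST
BLANK = 0     # index reserved for padding / CTC blank (our choice)

# Letters map to indices 1..32 so that 0 stays free for the blank.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(UYGHUR_LETTERS)}


def is_valid_label(word: str) -> bool:
    """Return True if `word` has 1..15 characters, all from the alphabet."""
    return 0 < len(word) <= MAX_LEN and all(ch in CHAR_TO_INDEX for ch in word)


def encode_label(word: str) -> list[int]:
    """Map a word to a fixed-length list of indices, right-padded with BLANK."""
    if not is_valid_label(word):
        raise ValueError(f"label violates SUST constraints: {word!r}")
    indices = [CHAR_TO_INDEX[ch] for ch in word]
    return indices + [BLANK] * (MAX_LEN - len(indices))


def decode_label(indices: list[int]) -> str:
    """Invert encode_label by dropping BLANK padding and mapping indices back."""
    return "".join(UYGHUR_LETTERS[i - 1] for i in indices if i != BLANK)


if __name__ == "__main__":
    word = "\u0628\u0627\u063A"  # three letters: b + a + gh
    print(encode_label(word))  # [3, 1, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With this encoding every label becomes a length-15 integer vector, so a batch of SUST labels can be stacked into a fixed-shape array, and a CTC output layer built on it would have 33 classes (32 letters plus the blank).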

RUST is a real dataset photographed in Xinjiang, China, containing a total of 4,000 real Uyghur scene text images, as shown in Fig. 5 and in the figure below.

RUST
