Datasets for benchmarking detection #93

wanghaisheng · 2018-04-23T10:54:35Z

No description provided.

wanghaisheng · 2018-04-23T10:55:31Z

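Medical records

* A dataset collected from a mutual-aid platform, for evaluating text localization and recognition on photos taken with mobile phones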
https://github.com/wanghaisheng/huzhucases
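When a dataset like this is used to evaluate recognition, one common metric is the character error rate (CER). The sketch below is an assumption about how one might score it, not the metric the dataset defines. It computes the Levenshtein edit distance between each reference string and the recognized string, then divides the total by the total reference length. The function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references are empty")
    errors = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return errors / total_chars
```

For example, `character_error_rate(["病历记录"], ["病例记录"])` is 0.25, because one of the four characters was substituted.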

wanghaisheng · 2018-04-26T05:52:35Z

www.icst.pku.edu.cn/cpdp/data/marmot_data.htm
Dataset for table recognition
In total, 2000 pages in PDF format were collected and the corresponding ground-truths were extracted utilizing our semi-automatic ground-truthing tool "Marmot".
The dataset is composed of Chinese and English pages at the proportion of about 1:1.

The Chinese pages were selected from over 120 e-Books with diverse subject areas provided by Founder Apabi library, and no more than 15 pages were selected from each book.
The English pages were crawled from Citeseer website.

The pages show great variety in language, page layout, and table style. Over 1,500 conference and journal papers were crawled, covering various fields and spanning publications from 1970 to 2011.
The e-Book pages are mostly in one-column layout, while the English pages are mixed with both one-column and two-column layouts.

wanghaisheng · 2018-05-01T04:57:41Z

Open Images dataset & challenge:

https://storage.googleapis.com/openimages/web/index.html

wanghaisheng · 2018-05-10T04:57:49Z

https://github.com/cs-chan/Total-Text-Dataset

Total-Text Dataset (ICDAR 2017). It consists of 1,555 images containing text in three different orientations (horizontal, multi-oriented, and curved), which makes it one of a kind.

wanghaisheng · 2018-05-10T05:20:35Z

CTW dataset: https://ctwdataset.github.io/
In this paper we provide details of a newly created dataset of Chinese text with about 1 million Chinese characters annotated by experts in over 30 thousand street view images. This is a challenging dataset with good diversity. It contains planar text, raised text, text in cities, text in rural areas, text under poor illumination, distant text, partially occluded text, etc. For each character in the dataset, the annotation includes its underlying character, its bounding box, and 6 attributes. The attributes indicate whether it has a complex background, whether it is raised, whether it is handwritten or printed, etc.

32,285 high resolution images
1,018,402 character instances
3,850 character categories
6 kinds of attributes
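The published totals work out to about 31.5 characters per image on average (1,018,402 / 32,285). To show how per-character annotations of this kind (character, bounding box, attribute flags) can be tallied into such statistics, here is a sketch. It uses a hypothetical record layout: the class, field names, box convention and attribute labels are assumptions and do not reproduce the official CTW schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CharAnnotation:
    """One annotated character (hypothetical layout, not the official CTW schema)."""
    text: str                                           # the underlying character
    bbox: tuple[float, float, float, float]             # (x, y, width, height), assumed convention
    attributes: set[str] = field(default_factory=set)   # e.g. {"raised", "handwritten"}


def summarize(images: dict[str, list[CharAnnotation]]) -> dict:
    """Counts images, character instances, categories and attribute frequency."""
    chars = [c for anns in images.values() for c in anns]
    attr_counts = Counter(a for c in chars for a in c.attributes)
    return {
        "images": len(images),
        "characters": len(chars),
        "categories": len({c.text for c in chars}),
        "chars_per_image": len(chars) / len(images) if images else 0.0,
        "attribute_counts": dict(attr_counts),
    }
```

Keeping attributes as a set of labels, rather than six fixed booleans, lets a subset (e.g. only raised, handwritten characters) be selected with a simple set test.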

wanghaisheng · 2018-05-15T11:41:31Z

http://rrc.cvc.uab.es/?com=introduction
"Robust Reading" refers to the research area dealing with the interpretation of written communication in unconstrained settings. Typically Robust Reading is linked to the detection and recognition of textual information in scene images, but in the wider sense it refers to techniques and methodologies that have been developed specifically for text containers other than scanned paper documents, including born-digital images and videos, to mention a few.

Robust Reading is at the meeting point between camera based document analysis and scene interpretation, and serves as common ground between the document analysis community and the wider computer vision community.

The ICDAR Robust Reading Competition has been held five times [1-5], in 2003, 2005, 2011, 2013 and 2015. The competition is organized around challenges that represent specific application domains for robust reading. Challenges are selected to cover a wide range of real-world situations. Each challenge is set up around different tasks.

ICDAR2017
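Detection results on benchmarks like these are usually reported as precision, recall and F-measure, computed by matching detected boxes to ground-truth boxes. The sketch below uses a common simplification: axis-aligned boxes and greedy one-to-one matching at an IoU threshold of 0.5. It is not the official RRC evaluation. The official scripts differ per challenge and handle details such as polygon regions and "don't care" areas that are omitted here.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_prf(gts: list[Box], dets: list[Box], thresh: float = 0.5):
    """Greedy one-to-one matching: each detection, in the given order, claims the
    unmatched ground-truth box with the highest IoU if that IoU >= thresh."""
    matched: set[int] = set()
    tp = 0
    for d in dets:
        best, best_iou = None, 0.0
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(d, g)
            if v > best_iou:
                best, best_iou = i, v
        if best is not None and best_iou >= thresh:
            matched.add(best)
            tp += 1
    precision = tp / len(dets) if dets else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Sorting detections by confidence before calling `detection_prf` is the usual way to make the greedy order meaningful.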

wanghaisheng · 2018-06-05T12:45:06Z

The Text Recognition Algorithm Independent Evaluation (TRAIT)
https://nvlpubs.nist.gov/nistpubs/ir/2017/NIST.IR.8199.pdf

wanghaisheng · 2018-06-07T03:20:57Z

Link: https://pan.baidu.com/s/12Wstdz_u8iwr7NEJGQtnZg Password: 7p2m
HWDB2.2 handwriting (VOC); help yourself if you need it.

mvprasad58 · 2018-10-25T13:08:25Z

In the Marmot dataset, the table bounding boxes (BBOX) do not match the original images.

cloudfool · 2018-12-25T13:07:42Z

I'd like to ask: are there any Chinese or English text-line datasets, like the synthetic ones from caffe-ocr?

wanghaisheng · 2018-12-25T15:23:11Z

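@cloudfool Everyone builds their own by applying existing generation tools to the scenarios they actually work on.
For real-world scenes there are quite a few English datasets but fewer Chinese ones, though you can synthesize data from other sources (for example, if you are processing paper-style documents).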

cloudfool · 2018-12-26T05:27:01Z

Which open-source English text-line datasets are there? I've looked at many, but they are all word-level (e.g. ICDAR). I want sentence-level data.

wanghaisheng · 2018-12-26T08:43:17Z

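@cloudfool Have you gone through everything I listed above?
https://github.com/NVlabs/ocroseg/tree/master/testdata
For sentence level, what kind of sentences do you need? You can generate them directly from the .txt files of Project Gutenberg e-books (novels, poetry, and so on), using numpy and similar tools.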
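The first step of that suggestion, turning a plain-text book into sentence-level line labels, can be sketched in standard-library Python. This is an assumption-laden sketch: the regex sentence splitter is naive (English punctuation only), the `max_chars`/`min_chars` limits are illustrative, and rendering the lines into images (fonts, backgrounds, distortions) is left out.

```python
import re
import textwrap


def text_lines_from_corpus(raw: str, max_chars: int = 40, min_chars: int = 5) -> list[str]:
    """Split plain text (e.g. a Project Gutenberg .txt) into line-length strings
    that can serve as labels for synthetic text-line images."""
    text = re.sub(r"\s+", " ", raw).strip()        # collapse hard line breaks and indentation
    sentences = re.split(r"(?<=[.!?])\s+", text)   # naive English sentence split
    lines = []
    for s in sentences:
        # Break long sentences at word boundaries so each line fits the image width.
        for chunk in textwrap.wrap(s, width=max_chars):
            if len(chunk) >= min_chars:            # drop fragments too short to be useful
                lines.append(chunk)
    return lines
```

Wrapping at word boundaries keeps every label a readable run of whole words, which matches what a sentence-level recognizer is expected to output.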

mttbx · 2019-06-09T07:38:04Z

@wanghaisheng Hi, I sent an email to the 163.com address shown on your GitHub profile. I need your help, bro!

wanghaisheng · 2019-06-09T13:42:06Z

@mttbx I can't find the original file anymore.

LinnaWang76 · 2019-11-06T08:50:34Z

Link: https://pan.baidu.com/s/12Wstdz_u8iwr7NEJGQtnZg Password: 7p2m
HWDB2.2 handwriting (VOC); help yourself if you need it.

Bro, the link has expired!

wanghaisheng · 2019-11-08T16:24:24Z

@LinnaWang76 Sorry, I've forgotten the file name, so I can't find the file on Baidu Pan to share it again.

chixma · 2019-11-21T08:51:11Z

In the Marmot dataset, the table bounding boxes (BBOX) do not match the original images.

I am faced with the same issue. Do you have any idea about it later?
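One possible cause of such a mismatch is a coordinate-system difference. The sketch assumes the ground-truth boxes are in PDF user-space units (1 point = 1/72 inch, origin at the bottom-left of the page), while the page images use pixels with the origin at the top-left. Under that assumption a box has to be scaled by the rendering DPI and flipped vertically using the page height. Whether Marmot's BBox values follow this convention, and how they are encoded in its XML, is not confirmed in this thread. Check the Marmot documentation before relying on it. The function name is hypothetical.

```python
def pdf_box_to_pixels(box, page_height_pt, dpi=72.0):
    """Convert (x0, y0, x1, y1) in PDF points (origin bottom-left) into
    (left, top, right, bottom) in image pixels (origin top-left).
    Assumes the image was rendered from the whole page at `dpi`."""
    x0, y0, x1, y1 = box
    scale = dpi / 72.0                           # 1 PDF point = 1/72 inch
    left, right = sorted((x0 * scale, x1 * scale))
    # Flip the y-axis: PDF y grows upward, image rows grow downward.
    top = (page_height_pt - max(y0, y1)) * scale
    bottom = (page_height_pt - min(y0, y1)) * scale
    return left, top, right, bottom
```

For a US Letter page (792 pt tall) rendered at 144 DPI, the box `(72, 72, 144, 144)` maps to `(144.0, 1296.0, 288.0, 1440.0)`. If the converted boxes are still off, the rendering DPI or a cropped page box is the next thing to check.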
