Reorganize README (czi-hca-comp-tools#7)
* Reorganize README

inspired by https://github.com/sindresorhus/awesome

* Add readme

* Reorganize everything

* Reorganize datasets folder

* Add organs for tabula muris

* Deconstruct front page

* make beefier example

* Add pull request template
olgabot authored and batson committed Apr 26, 2018
1 parent 815f853 commit 3241d68
Showing 4 changed files with 77 additions and 3 deletions.
8 changes: 8 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,8 @@
Thank you for contributing to `easy-data`! Please make sure your markdown file explaining the dataset has the following:

- [ ] a description of the dataset, including a link to the appropriate publication or reference.
- [ ] direct links to download the count matrix in an easy-to-load file, such as an `rds` file containing a sparse matrix for R, and an [AnnData](https://github.com/theislab/anndata) `hdf5` file or an `mtx` file for Python.
- [ ] direct links to download the metadata (in a `csv` with rows indexed by cell names).
- [ ] sample loading code for R and Python.
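As a rough illustration of the requested formats, the sketch below writes a toy sparse count matrix in Matrix Market (`mtx`) format with SciPy and the metadata as a `csv` indexed by cell names with pandas, then reads both back. The gene names, cell names, `organ` column, and the genes-by-cells orientation are made up for illustration; real contributions would write to files rather than in-memory buffers.

```python
import io

import numpy as np
import pandas as pd
import scipy.io
import scipy.sparse

# Hypothetical toy data: 3 genes (rows) x 4 cells (columns), mostly zeros.
genes = ["Actb", "Gapdh", "Malat1"]
cells = ["cell_1", "cell_2", "cell_3", "cell_4"]
counts = scipy.sparse.csr_matrix(
    np.array([[0, 3, 0, 1],
              [2, 0, 0, 0],
              [0, 0, 5, 0]])
)

# Write the sparse counts in Matrix Market format.
# The BytesIO buffer stands in for a file such as "matrix.mtx".
mtx_buffer = io.BytesIO()
scipy.io.mmwrite(mtx_buffer, counts)

# Write the metadata as CSV with one row per cell, indexed by cell name.
metadata = pd.DataFrame({"organ": ["liver", "liver", "lung", "lung"]}, index=cells)
metadata.index.name = "cell"
csv_text = metadata.to_csv()

# Read both back the way a user's loading code would.
mtx_buffer.seek(0)
loaded_counts = scipy.io.mmread(mtx_buffer).tocsr()
loaded_metadata = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Passing `index_col=0` when reading the `csv` is what turns the cell names back into the row index.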
32 changes: 29 additions & 3 deletions README.md
@@ -2,10 +2,36 @@

Easy access to a small collection of benchmark datasets for methods development.

# Instructions

<p align="center">
<a href="benchmarks.md">Why do we need data benchmarks?</a>&nbsp;&nbsp;&nbsp;
<a href="contributing.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
</p>

## Contents

- [Imaging](#imaging)
- [Multiomics](#multiomics)
- [RNA-Seq](#rna-seq)


## Datasets

Instructions for downloading and loading each dataset are in text files in the `datasets` folder.

For example, Tabula Muris is described in [datasets/tabula_muris.md](datasets/tabula_muris.md).
### RNA-Seq

- [Loyale Developing mouse retina](datasets/developing_mouse_retina.md) - 10x single-cell RNA-seq data from the developing mouse retina. [Code repo](https://github.com/gofflab/developing_mouse_retina_scRNASeq) | [Download instructions](datasets/developing_mouse_retina.md) #retina #mouse #umi #10x #droplet
- [Tabula Muris](datasets/tabula_muris.md) - 20 different mouse organs, profiled with both full-length transcript sequencing (Smart-seq2) and UMI-based droplet counting (10x Genomics). [Code repo](https://github.com/czbiohub/tabula-muris) | [Vignette repo](https://github.com/czbiohub/tabula-muris-vignettes) | [Interactive website](http://tabula-muris.ds.czbiohub.org/) | [Download instructions](datasets/tabula_muris.md) #mouse #aorta #bladder #brain #diaphragm #fat #heart #kidney #large_intestine #muscle #liver #lung #mammary_gland #marrow #pancreas #skin #spleen #thymus #tongue #rnaseq #smartseq2 #10x #umi #droplet
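Each entry above follows the same one-line pattern: a link to the dataset's markdown file, a short summary, and a run of trailing hashtags. A minimal sketch of how those hashtags could be used to filter datasets, assuming every entry keeps this format; `parse_entries` and `filter_by_tag` are hypothetical helpers, not part of the repository.

```python
import re

# A dataset entry is assumed to be one bullet line of the form
# "- [Name](path) - summary ... #tag1 #tag2", as in the lists above.
ENTRY_RE = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<path>[^)]+)\)")
# The hashtags are assumed to form a run of "#word" tokens at the end of the line.
TRAILING_TAGS_RE = re.compile(r"(?:\s#\w+)+\s*$")


def parse_entries(markdown_text):
    """Return (name, path, tags) for every dataset bullet in `markdown_text`."""
    entries = []
    for line in markdown_text.splitlines():
        match = ENTRY_RE.match(line)
        # Skip non-dataset bullets such as table-of-contents links ("#imaging").
        if not match or match["path"].startswith("#"):
            continue
        tag_run = TRAILING_TAGS_RE.search(line)
        tags = set(re.findall(r"#(\w+)", tag_run.group())) if tag_run else set()
        entries.append((match["name"], match["path"], tags))
    return entries


def filter_by_tag(entries, tag):
    """Keep the entries carrying `tag` (written without the leading '#')."""
    return [entry for entry in entries if tag in entry[2]]
```

Applied to this README's text, `filter_by_tag(parse_entries(readme_text), "10x")` would return the developing mouse retina and Tabula Muris entries, since both carry `#10x`.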



### Imaging


- [Example dataset](datasets/example.md) - 1-2 sentence summary of dataset. [Code repo](https://github.com/) | [Vignette repo](https://github.com/) | [Interactive website](https://github.com/) | [Download instructions](datasets/example.md) #mouse


### Multiomics

If you would like to add a dataset, follow the instructions in [CONTRIBUTING.md](CONTRIBUTING.md).
- [Example dataset](datasets/example.md) - 1-2 sentence summary of dataset. [Code repo](https://github.com/) | [Vignette repo](https://github.com/) | [Interactive website](https://github.com/) | [Download instructions](datasets/example.md) #mouse
38 changes: 38 additions & 0 deletions datasets/example.md
@@ -0,0 +1,38 @@
# Example dataset

Long-form description of the data: the organs and species collected, and any perturbations. May link to the appropriate publication or reference.

This dataset contains 100,000 cells from the mouse *organ*.

## How to download the metadata

[Here](https://example.com) is a direct link to the metadata as a `csv`, where rows are indexed by cell names.

## How to download the counts data

You can download complete count files as sparse matrices in `.rds` format for easy loading into `R`. Download [this](https://example.com) file and unzip it.
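The download-and-unzip step can also be scripted. A minimal sketch using only the Python standard library; `download_and_unzip` is a hypothetical helper, and the URL in the usage line is the same placeholder used above.

```python
import pathlib
import tempfile
import urllib.request
import zipfile


def download_and_unzip(url, dest_dir):
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Returns the sorted list of member names that were extracted.
    """
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Keep the archive itself in a temporary directory so only the
    # extracted files remain in `dest_dir` afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = pathlib.Path(tmp) / "archive.zip"
        urllib.request.urlretrieve(url, archive_path)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest)
            return sorted(archive.namelist())
```

Usage: `download_and_unzip("https://example.com/counts.zip", "data")` (placeholder URL). `urlretrieve` also accepts `file://` URLs, which makes the helper easy to try on a local archive.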

## Example code

Here are some code snippets for loading the data.

### Python

Here is a code snippet for loading the data in Python:

```python
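import pandas as pd
from anndata import read_h5ad

# Metadata rows are indexed by cell names, so use the first column as the index.
metadata = pd.read_csv('data/metadata.csv', index_col=0)
# .T transposes the matrix, assuming the file stores genes as rows;
# AnnData conventionally puts cells in rows (observations).
data = read_h5ad('data/matrix.h5ad').T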
```
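Because the metadata rows are indexed by cell names, they can be put in the same order as the cells in the count matrix before any joint analysis. A minimal sketch with pandas, assuming the matrix's cell names are available as a list (for an AnnData object with cells as observations, `obs_names`) and that the metadata index is unique; `align_metadata`, the toy cell names, and the `tissue` column are hypothetical.

```python
import pandas as pd


def align_metadata(metadata, cell_names):
    """Return `metadata` reordered to match `cell_names`.

    Assumes the metadata index (cell names) is unique, and raises a
    ValueError if any cell in the matrix has no metadata row.
    """
    missing = pd.Index(cell_names).difference(metadata.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} cells lack metadata, e.g. {missing[0]!r}")
    return metadata.loc[list(cell_names)]


# Hypothetical example: metadata rows arrive in a different order than the matrix.
metadata = pd.DataFrame(
    {"tissue": ["lung", "liver", "liver"]},
    index=pd.Index(["cell_3", "cell_1", "cell_2"], name="cell"),
)
matrix_cells = ["cell_1", "cell_2", "cell_3"]
aligned = align_metadata(metadata, matrix_cells)
print(aligned["tissue"].tolist())  # ['liver', 'liver', 'lung']
```

Raising on missing cells, rather than using `reindex` (which would silently fill them with `NaN`), makes a mismatch between the two downloads obvious.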

### R

```R
library(tidyverse)  # provides read_csv()

# Sparse count matrix saved from R
counts <- readRDS("data/matrix.rds")
# Metadata with one row per cell
metadata <- read_csv("data/metadata.csv")
```
2 changes: 2 additions & 0 deletions datasets/tabula_muris.md
@@ -19,6 +19,8 @@ Version-controlled metadata are available on [github](https://github.com/czbioh
You can download complete count files as sparse matrices in `.rds` format for easy loading into `R`. Unzip [TabulaMuris.zip](https://s3.amazonaws.com/czbiohub-tabula-muris/TabulaMuris.zip). Load:

```R
library(tidyverse)

tm.droplet.matrix = readRDS("TM_droplet_mat.rds")
tm.droplet.metadata = read_csv("TM_droplet_metadata.csv")
```