Reorganize README (czi-hca-comp-tools#7)

* Reorganize README inspired by https://github.com/sindresorhus/awesome
* Add readme
* Reorganize everything
* Reorganize datasets folder
* Add organs for tabula muris
* Deconstruct front page
* make beefier example
* Add pull request template
sam-morris · Apr 26, 2018 · 3241d68
1 parent 815f853

Showing 4 changed files with 77 additions and 3 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,8 @@
+Thank you for contributing to `easy-data`! Please make sure your markdown file explaining the dataset has the following:
+
+- [ ] a description of the dataset including a link to the appropriate publication or
+reference.
+- [ ] direct links to download the count matrix (in an easy-to-load format, such as an `rds` file containing a sparse matrix for R, and an [AnnData](https://github.com/theislab/anndata) `hdf5` file or
+  an `mtx` file for Python).
+- [ ] direct links to download the metadata (in a `csv` with rows indexed by cell names).
+- [ ] sample loading code for R and Python.
diff --git a/README.md b/README.md
@@ -2,10 +2,36 @@
 
 Easy access to a small collection of benchmark datasets for methods development.
 
-# Instructions
+
+<p align="center">
+	<a href="benchmarks.md">Why do we need data benchmarks?</a>&nbsp;&nbsp;&nbsp;
+	<a href="contributing.md">Contribution guide</a>&nbsp;&nbsp;&nbsp;
+</p>
+
+## Contents
+
+- [Imaging](#imaging)
+- [Multiomics](#multiomics)
+- [RNA-Seq](#rna-seq)
+
+
+## Datasets
 
 Instructions for downloading and loading each dataset are in text files in the `datasets` folder.
 
-For example, Tabula Muris is described in [datasets/tabula_muris.md](datasets/tabula_muris.md).
+### RNA-Seq
+
+- [Loyale Developing mouse retina](datasets/developing_mouse_retina.md) - 10x single cell RNA-seq data from the developing mouse retina. [Code repo](https://github.com/gofflab/developing_mouse_retina_scRNASeq) | [Download instructions](datasets/developing_mouse_retina.md) #retina #mouse #umi #10x #droplet
+- [Tabula Muris](datasets/tabula_muris.md) - 20 different mouse organs, both full transcript (SmartSeq2) and UMI-based droplet counting (10x Genomics). [Code repo](https://github.com/czbiohub/tabula-muris) | [Vignette repo](https://github.com/czbiohub/tabula-muris-vignettes) | [Interactive website](http://tabula-muris.ds.czbiohub.org/) | [Download instructions](datasets/tabula_muris.md) #mouse #aorta #bladder #brain #diaphragm #fat #heart #kidney #large_intestine #muscle #liver #lung #mammary_gland #marrow #pancreas #skin #spleen #thymus #tongue #rnaseq #smartseq2 #10x #umi #droplet
+
+
+
+### Imaging
+
+
+- [Example dataset](datasets/example.md) - 1-2 sentence summary of dataset. [Code Repo](https://github.com/) | [Vignette Repo](https://github.com/) | [Interactive Website](https://github.com/) | [Download instructions](datasets/example.md) #mouse
+
+
+### Multiomics
 
-If you would like to add a dataset, follow the instructions in [CONTRIBUTING.md](CONTRIBUTING.md).
+- [Example dataset](datasets/example.md) - 1-2 sentence summary of dataset. [Code Repo](https://github.com/) | [Vignette Repo](https://github.com/) | [Interactive Website](https://github.com/) | [Download instructions](datasets/example.md) #mouse
diff --git a/datasets/example.md b/datasets/example.md
@@ -0,0 +1,38 @@
+# Example dataset
+
+Long-form description of the data, the organs and species collected, and any perturbations of the data. May link to the appropriate publication or reference.
+
+This dataset contains 100,000 cells from the mouse *organ*.
+
+## How to download the metadata
+
+[Here](https://example.com) is a direct link to the metadata as a `csv`, where rows are indexed by cell names.
+
+## How to download the counts data
+
+You can download complete count files as sparse matrices in `.rds` format for easy loading into `R`. Download [this](https://example.com) file and unzip it.
+
+## Example code
+
+Here are some code snippets for loading the data.
+
+### Python
+
+Here is a code snippet for loading the data in Python:
+
+```python
+import pandas as pd
+from anndata import read_h5ad
+
+metadata = pd.read_csv('data/metadata.csv', index_col=0)  # rows are indexed by cell names
+data = read_h5ad('data/matrix.h5ad').T
+```
+
+### R
+
+```R
+library(tidyverse)
+
+matrix = readRDS("TM_droplet_mat.rds")
+metadata = read_csv("TM_droplet_metadata.csv")
+```
diff --git a/datasets/tabula_muris.md b/datasets/tabula_muris.md
@@ -19,6 +19,8 @@ Version-controlled metadata are available on  [github](https://github.com/czbioh
 You can download complete count files as sparse matrices in `.rds` format for easy loading into `R`. Unzip [TabulaMuris.zip](https://s3.amazonaws.com/czbiohub-tabula-muris/TabulaMuris.zip). Load:
 
 ```R
+library(tidyverse)
+
 tm.droplet.matrix = readRDS("TM_droplet_mat.rds")
 tm.droplet.metadata = read_csv("TM_droplet_metadata.csv")
 ```
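
The checklist in `.github/PULL_REQUEST_TEMPLATE.md` could also be checked mechanically before review. The following is a minimal sketch, not part of this commit: it assumes dataset pages follow the layout of `datasets/example.md` (a title, a description, level-2 sections for metadata and counts, and fenced loading code). The function names `split_sections`, `fence_languages`, and `missing_items`, and the heuristics themselves, are hypothetical and use only the Python standard library.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid nesting fences
LINK = re.compile(r"\[[^\]]+\]\([^)\s]+\)")  # inline Markdown link: [text](target)


def split_sections(text):
    """Split a page into its intro and its level-2 ("## ") sections.

    Returns (intro, sections), where sections maps each lower-cased
    heading to the text beneath it.
    """
    intro, sections, current, in_fence = [], {}, None, False
    for line in text.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        if not in_fence and line.startswith("## "):
            current = line[3:].strip().lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        elif in_fence or not line.startswith("# "):
            intro.append(line)  # keep everything except the page title
    bodies = {heading: "\n".join(lines) for heading, lines in sections.items()}
    return "\n".join(intro).strip(), bodies


def fence_languages(text):
    """Collect the lower-cased language tags of all fenced code blocks."""
    languages, in_fence = set(), False
    for line in text.splitlines():
        if line.startswith(FENCE):
            if not in_fence:
                languages.add(line[len(FENCE):].strip().lower())
            in_fence = not in_fence
    return languages


def missing_items(text):
    """Return the checklist items that the page does not appear to satisfy."""
    intro, sections = split_sections(text)
    languages = fence_languages(text)

    def has_link(keyword):
        # Assumption: the section heading mentions the keyword and its body has a link.
        return any(keyword in h and LINK.search(b) for h, b in sections.items())

    checks = {
        "description": bool(intro),
        "count matrix link": has_link("count"),
        "metadata link": has_link("metadata"),
        "R loading code": "r" in languages,
        "Python loading code": "python" in languages,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    import sys

    # Usage: python check_dataset_page.py datasets/example.md
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(path, missing_items(handle.read()))
```

These heuristics only detect that a section, link, or code fence is present. They cannot tell whether a link actually points to a publication or a downloadable file, so a reviewer would still need to confirm the checklist by hand.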