First sections of methods #38

Merged: 12 commits, Feb 28, 2024

allyhawkins
Member

Here I'm adding the first few sections of the methods.

  • I combined the data generation and data processing into one section, since how data was generated dictates who processed it.
  • In that same section, I wasn't quite sure how much detail to go into regarding processing. We could just say all libraries were processed in the contributor's lab and remove the rest if we think it's too much?
  • Here, I added the sections for processing with alevin-fry and post-processing of just regular single-cell/single-nuclei RNA-seq libraries. Is this level of detail okay?

I want to break this up so it's not one giant PR, so I'll file issues to track completing the rest of the sections.

Manuscript build: associated with commit 479f968.

Manuscript build: associated with commit 16079cf.

@jaclyn-taroni
Member

I will review for the level of detail. When we sprint plan, I want to discuss potentially spreading methods pull request review across the team. We want to avoid too many cooks at first, of course, but this seems like the section that is safest to have multiple reviewers, and I anticipate that it would make me much less of a bottleneck.

Member

@jaclyn-taroni jaclyn-taroni left a comment

Level of detail looks good 🎉

Let's have someone who has spent more time with the scpca-nf and scpcaTools code bases take the final look.

Resolved review threads on content/04.methods.md (outdated).
Manuscript build: associated with commit 6519ca6.

@allyhawkins
Member Author

I addressed both of @jaclyn-taroni's comments. I'm going to send this over to @jashapiro for review now.

Manuscript build: associated with commit 371f392.

Member

@jashapiro jashapiro left a comment

Overall I think this looks good! My comments are mostly nitpicky edits and some attempts to smooth flow. I think we could reduce the alevin-fry description just a bit, but that was really the biggest thing I saw.

- Parameter choices for alevin-fry

To quantify each single-cell and single-nuclei RNA-seq gene expression, `scpca-nf` uses `salmon alevin` [@doi:10.1186/s13059-020-02151-8] and `alevin-fry`[@doi:10.1038/s41592-022-01408-3] to generate a gene by cell counts matrix.
Prior to mapping, we generated an index using transcripts from both spliced cDNA and intronic regions, denoted as the `splici` index.
Member

Just noting that we do include flanking regions too. I might just say "spliced and unspliced cDNA sequences" but also we might want to cite something about splici?

Raw data was generated, and sample metadata was compiled by each lab and institution contributing to the Portal.
Single-cell or single-nuclei libraries were generated using one of the commercially available kits from 10X Genomics.
For bulk RNA-seq, RNA was collected and sequenced using either paired-end or single-end sequencing.
For spatial transcriptomics, cDNA libraries were generated using the Visium kit from 10X Genomics.
Member

Suggested change:
- For spatial transcriptomics, cDNA libraries were generated using the Visium kit from 10X Genomics.
+ For spatial transcriptomics, cDNA libraries were generated using the Visium kit from 10x Genomics.

This output is read into R to create a `SingleCellExperiment` using the `fishpond::load_fry()` function.
The resulting `SingleCellExperiment` contains a `counts` assay with a gene-by-cell counts matrix where all spliced and unspliced reads for a given gene are totaled together.
We also include a `spliced` assay, which includes a gene-by-cell counts matrix for only spliced reads.
These matrices include all potential cells, including empty droplets, and are provided in the unfiltered objects included in downloads from the Portal.
Member

I wonder if we might want to put "unfiltered" and "filtered" and "processed" in quotes to make it clear that these are labels more than anything?

Suggested change:
- These matrices include all potential cells, including empty droplets, and are provided in the unfiltered objects included in downloads from the Portal.
+ These matrices include all potential cells, including empty droplets, and are provided in the "unfiltered" objects included in downloads from the Portal.
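
For readers of this thread, the assay layout described in the quoted methods text (a `counts` assay that totals spliced and unspliced reads per gene, plus a `spliced` assay with only spliced reads) can be illustrated with a minimal sketch. This is not how `fishpond::load_fry()` is implemented: the function name `build_assays`, the nested-list matrices, and the toy values are all hypothetical, and the sketch assumes only two read categories, as the quoted text describes.

```python
# Minimal sketch under stated assumptions: matrices are nested lists with
# genes as rows and barcodes as columns. `build_assays` is a hypothetical
# helper for illustration, not part of fishpond, scpca-nf, or scpcaTools.

def build_assays(spliced, unspliced):
    """Return a dict with a `counts` assay (spliced + unspliced) and a
    `spliced` assay (spliced only), both gene-by-barcode."""
    same_shape = len(spliced) == len(unspliced) and all(
        len(s_row) == len(u_row) for s_row, u_row in zip(spliced, unspliced)
    )
    if not same_shape:
        raise ValueError("spliced and unspliced matrices must have the same shape")

    # Total spliced and unspliced reads element-wise for each gene/barcode.
    counts = [
        [s + u for s, u in zip(s_row, u_row)]
        for s_row, u_row in zip(spliced, unspliced)
    ]
    # Copy the spliced matrix so the two assays do not share rows.
    return {"counts": counts, "spliced": [row[:] for row in spliced]}


# Toy example: 2 genes x 3 barcodes (values are made up).
spliced = [[3, 0, 1], [0, 2, 0]]
unspliced = [[1, 0, 0], [4, 1, 0]]
assays = build_assays(spliced, unspliced)
```

Both assays keep every barcode at this stage, including those that may be empty droplets, which matches the "unfiltered" label discussed in this thread.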

Manuscript build: associated with commit 45344ec.

Manuscript build: associated with commit 15f4f75.

@allyhawkins
Member Author

> Just noting that we do include flanking regions too. I might just say "spliced and unspliced cDNA sequences" but also we might want to cite something about splici?

I updated the text to reflect this comment and added the ref for the alevin-fry paper, since that's where they introduce the splici index. Alternatively, we could link to the tutorial that describes the splici index, but I think the paper is better?

> I wonder if we might want to put "unfiltered" and "filtered" and "processed" in quotes to make it clear that these are labels more than anything?

I'm torn on this. If we do it here, I think we need to do it every time we refer to these objects throughout the manuscript. Are we okay with that?

Also for the gene by cell/droplets, I changed to gene by barcode? What do you think of that?
I removed the hyphens throughout, because I think in the figure legends I've been looking at we aren't using hyphens and I can also remove any ones that have slipped into the results so far.

And then I updated the alevin-fry description of the parameters based on your suggestions, so this should be ready for another look.

Manuscript build: associated with commit 4bf0b99.

Member

@jashapiro jashapiro left a comment

LGTM.

I personally think the quotes for "unfiltered" etc. are fine here, even if we don't use them everywhere, but it might depend a bit on the context, which I haven't looked at yet! This might also be something where other people have stronger opinions than mine.

An alternative might be to rephrase to say something like "objects labeled as unfiltered." But that gets a bit clunky.

I also suggested a more specific update to the alevin-fry section. Feel free to update/modify that to your taste.

- Parameter choices for alevin-fry

To quantify RNA-seq gene expression for each cell or nucleus in a library, `scpca-nf` uses `salmon alevin` [@doi:10.1186/s13059-020-02151-8] and `alevin-fry`[@doi:10.1038/s41592-022-01408-3] to generate a gene by barcode counts matrix.
Prior to mapping, we generated an index using transcripts from both spliced cDNA and unspliced cDNA sequences, denoted as the `splici` index [@doi:10.1038/s41592-022-01408-3].
Member

I agree the paper is the better reference here.

- HVG selection
- PCA and UMAP calculation

The output from running `alevin-fry` includes a gene by cell counts matrix, with reads from both spliced and unspliced reads.
Member

Do we want to say "barcode" here too, and then convert this to "cell" after filtering?
That or we should just always use "gene by cell" and maybe add a note that some of the "cells" correspond to barcodes that were not actually observed before this point.

Member Author

I made an update to use "gene by cell" throughout, but including a caveat that the output from alevin-fry is all potential cell barcodes. I think this should be sufficient?

Member

Looks good to me.

Co-authored-by: Joshua Shapiro <[email protected]>
Manuscript build: associated with commit dd17e57.

@allyhawkins
Member Author

> I personally think the quotes for "unfiltered" etc. are fine here, even if we don't use them everywhere, but it might depend a bit on the context, which I haven't looked at yet! This might also be something where other people have stronger opinions than mine.

Just noting that I'm going to leave the quotes for now, but I think we can gather opinions on them when going through everything.

Manuscript build: associated with commit b4b962c.

@allyhawkins allyhawkins merged commit de8bf69 into main Feb 28, 2024
1 check passed
@allyhawkins allyhawkins deleted the allyhawkins/methods-data-processing-and-single-cell branch February 28, 2024 21:11
3 participants