Draft of results - portal overview #26
Merged
Commits (11):
28fc875 add scpca overview to results (allyhawkins)
ffdef5f Add JNT edits to overview section of results (jaclyn-taroni)
8a2b9f1 Merge pull request #46 from AlexsLemonade/jaclyn-taroni/overview-resu… (allyhawkins)
1d0f5e6 actually describe figure 1A and 1B (allyhawkins)
01d558b wording (allyhawkins)
a5eaf5a Merge branch 'main' into allyhawkins/draft-results-overview (allyhawkins)
73a6a73 Merge branch 'main' into allyhawkins/draft-results-overview (allyhawkins)
b99cf72 Apply suggestions from code review (allyhawkins)
d76f191 indicate which ontology was used (allyhawkins)
93f6cce add citations for ontologies (allyhawkins)
03ac73c missing spaces (allyhawkins)
@@ -2,25 +2,30 @@

## The Single-cell Pediatric Cancer Atlas Portal

1. History and overview of the Portal
- In 2022, the Childhood Cancer Data Lab launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal to make uniformly processed, summarized single-cell and single-nuclei RNA-seq data and de-identified metadata available for download.
- The Portal currently holds X samples from X tumor types.
- Data available on the Portal was obtained through two mechanisms: ALSF-funded investigators submitted raw data, or investigators used our open-source pipeline to produce summarized gene expression data for inclusion on the Portal.
- In addition to providing summarized gene expression data, we collect a core set of metadata that is provided on the Portal for all samples, including age, sex, diagnosis, subdiagnosis (if applicable), tissue location, and disease stage.
- All metadata provided by the submitter is reviewed and standardized as much as possible. We also use ontology IDs where possible.
- Fig. 1A shows how many samples we have from each type of tumor. For each diagnosis, we also indicate what proportion of the samples come from each disease stage (e.g., initial diagnosis, recurrence, post-mortem).
- The samples available on the Portal are mostly from patient tumors, although some are from patient-derived xenografts and human cell lines.
- In addition to single-cell and single-nuclei RNA-seq, many samples have associated bulk RNA-seq, ADT data (CITE-seq), cell hashing, or spatial transcriptomics.
- Fig. 1B summarizes the total number of samples that are single-cell vs. single-nuclei. Additionally, we show how many of the samples on the Portal also have bulk, CITE-seq, cell hashing, or spatial data.
- Supplemental Table 1 shows a breakdown of how many samples of each modality are found in each project.

2. Obtaining additional project information
- On the Portal, samples are organized by project. Each project is a collection of similar samples from a single investigator.
- To select projects of interest, users can filter based on diagnosis, included modality, suspension type (single-cell or single-nuclei), and 10X version. Additionally, users will be able to filter based on whether the project includes cell line samples or xenografts.
- A summary of each project, including a list of samples found in each project, is displayed on the Portal.
- Fig. 1C shows an example of this summary, which includes an abstract, links to any external information about the project, such as associated publications, and links to external repositories where data may be stored, such as SRA or GEO.
- If a project includes bulk, CITE-seq, spatial, or multiplexed data, this will also be indicated on the project card.
In March 2022, the Childhood Cancer Data Lab launched the Single-cell Pediatric Cancer Atlas (ScPCA) Portal to make uniformly processed, summarized single-cell and single-nuclei RNA-seq data and de-identified metadata from pediatric tumor samples available for download.
Today, the Portal contains data from 500 samples and over 50 tumor types.
Data available on the Portal was obtained through two mechanisms.
Either ALSF-funded investigators submitted raw data, which was processed using our open-source pipeline, `scpca-nf`, or investigators processed their own raw data using `scpca-nf` and submitted the resulting summarized gene expression data for inclusion on the Portal.

All samples on the Portal include a core set of metadata obtained from investigators, including age, sex, diagnosis, subdiagnosis (if applicable), tissue location, and disease stage.
Some investigators submitted additional metadata, such as treatment and tumor stage, which is also available on the Portal.
All submitted metadata was standardized as much as possible to maintain consistency across projects before being added to the Portal.
In addition to providing a human-readable value for the submitted metadata, we also provide an ontology term ID, if applicable.
The total number of samples for each diagnosis is shown in Figure 1A, along with a breakdown of the proportion of samples from each disease stage within a diagnosis group.
allyhawkins marked this conversation as resolved.

Figure 1A summarizes all samples from patient tumors or patient-derived xenografts currently available on the Portal.
In addition to these samples, the Portal contains a small number of samples from human tumor cell lines.

Each available sample has, at minimum, summarized gene expression data from either single-cell or single-nuclei RNA-seq.
However, some samples include additional data, such as quantified data from tagging cells with antibody-derived tags (ADT), as in CITE-seq [@doi:10.1038/nmeth.4380], or from multiplexing samples with hashtag oligonucleotides (HTO) [@doi:10.1186/s13059-018-1603-1].
In some cases, multiple libraries were collected from the same sample to perform bulk RNA-seq or spatial transcriptomics.
Downloading a sample from the Portal includes sequencing data from all associated libraries, including data from any of the additional modalities mentioned here.
A summary of the number of samples with each additional modality is shown in Figure 1B, and a detailed summary of the total samples with each sequencing method, broken down by project, is available in Supplemental Table 1.
Review comment: Similar to the comment above, how you're talking about the figure in the main text should add additional benefits for the audience, not just summarize the legend. For example, what proportion of samples in the Portal have an additional modality?

Samples on the Portal are organized by project, where each project is a collection of similar samples from a single investigator.
Users can download all samples in a project or navigate to projects of interest and choose individual samples to download.
To identify projects of interest, users can filter based on diagnosis, included modalities (e.g., CITE-seq, bulk RNA-seq), 10X Genomics version (e.g., 10Xv2, 10Xv3), and whether a project includes samples from patient-derived xenografts or cell lines.
The project card displays an abstract, the total number of samples included, a list of diagnoses for all samples included in the project, and links to external information associated with the project, such as publications and external data repositories like SRA or GEO (Figure 1C).
The project card also indicates the type(s) of sequencing performed, including the 10X Genomics kit version, the suspension type (cell or nucleus), and whether additional sequencing is present, such as bulk RNA-seq or multiplexing.
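
The project filters described above can be illustrated with a minimal sketch. This is not the Portal's implementation or API: the `Project` record, its field names, the `filter_projects` helper, and the example values are all hypothetical and exist only to show how the filters (diagnosis, modality, 10X Genomics version, xenografts, cell lines) combine so that a project must satisfy every filter a user supplies.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical record; these field names are illustrative only
    # and are not the Portal's actual schema.
    project_id: str
    diagnoses: set
    modalities: set
    kit_versions: set
    has_xenografts: bool = False
    has_cell_lines: bool = False


def filter_projects(projects, diagnosis=None, modality=None, kit_version=None,
                    has_xenografts=None, has_cell_lines=None):
    """Return the projects that satisfy every filter that was supplied.

    A filter left as None is ignored.
    """
    selected = []
    for project in projects:
        if diagnosis is not None and diagnosis not in project.diagnoses:
            continue
        if modality is not None and modality not in project.modalities:
            continue
        if kit_version is not None and kit_version not in project.kit_versions:
            continue
        if has_xenografts is not None and project.has_xenografts != has_xenografts:
            continue
        if has_cell_lines is not None and project.has_cell_lines != has_cell_lines:
            continue
        selected.append(project)
    return selected


# Made-up example projects, for illustration only.
projects = [
    Project("project_a", {"Neuroblastoma"}, {"CITE-seq"}, {"10Xv3"}),
    Project("project_b", {"Osteosarcoma"}, {"bulk RNA-seq"}, {"10Xv2"},
            has_xenografts=True),
    Project("project_c", {"Neuroblastoma"}, {"bulk RNA-seq", "CITE-seq"},
            {"10Xv2", "10Xv3"}, has_cell_lines=True),
]

matches = filter_projects(projects, diagnosis="Neuroblastoma", modality="bulk RNA-seq")
print([p.project_id for p in matches])  # prints ['project_c']
```

Treating each filter as optional and combining them with a logical AND is an assumption about how such filtering would typically behave, not a statement about the Portal itself.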

## Uniform processing of data available on the ScPCA Portal

@@ -91,4 +96,4 @@
- `scpca-nf` is able to quantify both of these additional sequencing methods.
- Bulk RNA-seq FASTQ files are first trimmed using `fastp` and then aligned using `salmon`. The bulk output is a single TSV file containing the sample-by-gene matrix for all samples in that project.
- For spatial transcriptomics, the spatial RNA FASTQ files and slide image are input into `scpca-nf` and quantified using `spaceranger`. The output includes the spot-by-gene matrix along with a summary report produced by `spaceranger`.
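
As a rough illustration of the single bulk TSV output described above, the sketch below combines per-sample gene-level values into one tab-separated matrix. It is not taken from `scpca-nf`: the `build_matrix_tsv` helper, the input dictionaries, the choice of genes as rows and samples as columns, and filling missing genes with 0 are all assumptions made for the example.

```python
import csv
import io


def build_matrix_tsv(counts_by_sample):
    """Combine {sample_id: {gene_id: value}} into one TSV string.

    Genes are rows and samples are columns (an assumed orientation);
    a gene missing from a sample is written as 0 (also an assumption).
    """
    samples = sorted(counts_by_sample)
    genes = sorted({gene for counts in counts_by_sample.values() for gene in counts})

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *samples])
    for gene in genes:
        writer.writerow([gene, *(counts_by_sample[s].get(gene, 0) for s in samples)])
    return buffer.getvalue()


# Made-up values for two hypothetical samples in one project.
counts = {
    "sample_b": {"GENE2": 3},
    "sample_a": {"GENE1": 5, "GENE2": 1},
}
tsv = build_matrix_tsv(counts)
print(tsv, end="")
```

Sorting sample and gene identifiers keeps the column and row order stable between runs, which makes files from different projects easier to compare.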
Yes, and it would be helpful to specify which ontologies are used, too, in my opinion. You're underselling the value-add (not to mention the work that went into the metadata) the way this is currently written.
Reading the first methods PR reminded me that we should probably file an issue to track methods for the ontologies!