Made changes for SnapATAC2 v2.6.0 #1273

Merged on May 23, 2024 (22 commits)

Commits:
801c59f Made changes for SnapATAC2 v2.6.0 (ekiernan, Apr 25, 2024)
b69f57b updated min tsse parameter (ekiernan, Apr 25, 2024)
7f31bf6 updated docker to include one with bgzip (ekiernan, Apr 25, 2024)
2416c9a calculate snap metrics earlier (ekiernan, Apr 29, 2024)
ff85447 adding tsse metrics back into ATAC workflow (ekiernan, May 3, 2024)
259784c took out annotations gtf from parsebarcodes call (ekiernan, May 3, 2024)
d50aebc added anndata to atac (ekiernan, May 6, 2024)
df1729a update snapatac2 dockers to v2.6.2 (ekiernan, May 9, 2024)
64a59f5 Merge branch 'develop' into lk-PD-2608-snapatac2 (ekiernan, May 10, 2024)
f48ef54 added more threads (ekiernan, May 13, 2024)
58c0818 updated snapatac docker to snapatac v2.6.3 to fix bug (ekiernan, May 16, 2024)
c528e23 Update PairedTag.changelog.md (ekiernan, May 16, 2024)
a61798f Update PairedTag.changelog.md (ekiernan, May 16, 2024)
480ab57 Update PairedTag.wdl (ekiernan, May 16, 2024)
5e39adb Merge branch 'develop' into lk-PD-2608-snapatac2 (ekiernan, May 16, 2024)
0cc6de1 Merge branch 'develop' into lk-PD-2608-snapatac2 (ekiernan, May 20, 2024)
7a84034 added annotations gtf back into atac preindex (ekiernan, May 20, 2024)
e9c8e00 updated changelogs for snapatac2 (ekiernan, May 21, 2024)
b55654d updated slideseq version (ekiernan, May 21, 2024)
088f95f Update pipelines/skylab/paired_tag/PairedTag.changelog.md (ekiernan, May 21, 2024)
08af91f fixing multiome and optimus versions and changelogs (ekiernan, May 22, 2024)
09b681f Merge branch 'lk-PD-2608-snapatac2' of https://github.com/broadinstit… (ekiernan, May 22, 2024)

pipelines/skylab/multiome/Multiome.changelog.md (8 additions, 3 deletions)
@@ -1,14 +1,19 @@
# 3.4.5
# 5.0.0
2024-05-20 (Date of Last Commit)

* Updated SnapATAC2 docker to SnapATAC2 v2.6.3; this impacts the workflow output metrics

# 4.0.2
2024-05-14 (Date of Last Commit)

* Updated the Paired-tag Demultiplex task so that some intermediate input names have been renamed; this change does not impact the Multiome workflow

# 3.4.4
# 4.0.1
2024-05-10 (Date of Last Commit)

* Updated the Paired-tag Demultiplex task; this change does not impact the Multiome workflow

# 3.4.3
# 4.0.0
2024-04-24 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering

pipelines/skylab/multiome/Multiome.wdl (2 additions, 2 deletions)
@@ -7,7 +7,7 @@ import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/c

workflow Multiome {

String pipeline_version = "3.4.5"
String pipeline_version = "5.0.0"

input {
String input_id
@@ -79,10 +79,10 @@ workflow Multiome {
read3_fastq_gzipped = atac_r3_fastq,
input_id = input_id + "_atac",
tar_bwa_reference = tar_bwa_reference,
annotations_gtf = annotations_gtf,
chrom_sizes = chrom_sizes,
whitelist = atac_whitelist,
adapter_seq_read1 = adapter_seq_read1,
annotations_gtf = annotations_gtf,
adapter_seq_read3 = adapter_seq_read3
}
call H5adUtils.JoinMultiomeBarcodes as JoinBarcodes {

pipelines/skylab/multiome/atac.changelog.md (5 additions, 0 deletions)
@@ -1,3 +1,8 @@
# 2.0.0
2024-05-20 (Date of Last Commit)

* Updated SnapATAC2 docker to SnapATAC2 v2.6.3; this impacts the workflow output metrics

# 1.2.3
2024-05-14 (Date of Last Commit)


pipelines/skylab/multiome/atac.wdl (15 additions, 9 deletions)
@@ -29,10 +29,10 @@ workflow ATAC {
Int mem_size_bwa = 512
String cpu_platform_bwa = "Intel Ice Lake"

# GTF for SnapATAC2 to calculate TSS sites of fragment file
File annotations_gtf
# Text file containing chrom_sizes for genome build (i.e. hg38)
File chrom_sizes
#File for annotations for calculating ATAC TSSE
File annotations_gtf
# Whitelist
File whitelist

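The comments on `annotations_gtf` describe it as the GTF that SnapATAC2 uses to locate TSS sites for the TSS enrichment (TSSE) metric. As a rough illustration of that idea only, the sketch below pulls transcription start sites out of GTF lines with the standard library: the TSS is a record's start coordinate on the `+` strand and its end coordinate on the `-` strand. The function `tss_from_gtf` and its `feature` parameter are hypothetical; this is not how `snap.metrics.tsse` parses annotations internally.

```python
def tss_from_gtf(lines, feature="transcript"):
    """Return sorted, unique (chrom, tss, strand) tuples from GTF lines.

    Hypothetical sketch: uses the 1-based GTF start for '+' records and the
    1-based end for '-' records of the chosen feature type.
    """
    sites = set()
    for line in lines:
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != feature:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # The TSS is the 5' end: 'start' on '+', 'end' on '-'.
        # Unstranded ('.') records are treated like '+' here (an assumption).
        tss = end if strand == "-" else start
        sites.add((chrom, tss, strand))
    return sorted(sites)
```

Passing `feature="gene"` instead of the default switches the sketch to gene-level start sites; which feature level SnapATAC2 itself uses is not stated in this PR.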
@@ -41,7 +41,7 @@ workflow ATAC {
String adapter_seq_read3 = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
}

String pipeline_version = "1.2.3"
String pipeline_version = "2.0.0"

parameter_meta {
read1_fastq_gzipped: "read 1 FASTQ file as input for the pipeline, contains read 1 of paired reads"
@@ -436,21 +436,21 @@ task BWAPairedEndAlignment {
task CreateFragmentFile {
input {
File bam
File annotations_gtf
File chrom_sizes
File annotations_gtf
Boolean preindex
Int disk_size = 500
Int mem_size = 16
Int nthreads = 1
Int nthreads = 4
String cpuPlatform = "Intel Cascade Lake"
}

String bam_base_name = basename(bam, ".bam")

parameter_meta {
bam: "Aligned bam with CB in CB tag. This is the output of the BWAPairedEndAlignment task."
annotations_gtf: "GTF for SnapATAC2 to calculate TSS sites of fragment file."
chrom_sizes: "Text file containing chrom_sizes for genome build (i.e. hg38)."
annotations_gtf: "GTF for SnapATAC2 to calculate TSS sites of fragment file."
disk_size: "Disk size used in create fragment file step."
mem_size: "The size of memory used in create fragment file."
}
@@ -461,10 +461,10 @@ task CreateFragmentFile {
python3 <<CODE

# set parameters
atac_gtf = "~{annotations_gtf}"
bam = "~{bam}"
bam_base_name = "~{bam_base_name}"
chrom_sizes = "~{chrom_sizes}"
atac_gtf = "~{annotations_gtf}"
preindex = "~{preindex}"

# calculate chrom size dictionary based on text file
@@ -477,6 +477,7 @@ task CreateFragmentFile {
# use snap atac2
import snapatac2.preprocessing as pp
import snapatac2 as snap
import anndata as ad

# extract CB or BB (if preindex is true) tag from bam file to create fragment file
if preindex == "true":
@@ -487,13 +488,18 @@ task CreateFragmentFile {

# calculate quality metrics; note min_num_fragments and min_tsse are set to 0 instead of default
# those settings allow us to retain all barcodes
pp.import_data("~{bam_base_name}.fragments.tsv", file="~{bam_base_name}.metrics.h5ad", chrom_size=chrom_size_dict, gene_anno="~{annotations_gtf}", min_num_fragments=0, min_tsse=0)
pp.import_data("~{bam_base_name}.fragments.tsv", file="temp_metrics.h5ad", chrom_sizes=chrom_size_dict, min_num_fragments=0)
atac_data = ad.read_h5ad("temp_metrics.h5ad")
# calculate tsse metrics
snap.metrics.tsse(atac_data, atac_gtf)
# Write new atac file
atac_data.write_h5ad("~{bam_base_name}.metrics.h5ad")

CODE
>>>

runtime {
docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.4-2.3.1"
docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.9-2.6.3-1715865353"
disks: "local-disk ${disk_size} SSD"
memory: "${mem_size} GiB"
cpu: nthreads
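The CreateFragmentFile hunks skip the lines that turn the `chrom_sizes` text file into `chrom_size_dict`, which the new `pp.import_data` call receives as `chrom_sizes=chrom_size_dict`. A minimal standard-library sketch of that step follows, assuming the usual chrom sizes layout of one chromosome per line (name, then length in bases, whitespace-separated). The names `parse_chrom_sizes` and `read_chrom_sizes` are hypothetical and are not the task's actual code.

```python
def parse_chrom_sizes(lines):
    """Build {chrom: length} from chrom sizes lines (hypothetical helper)."""
    sizes = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Column 1 is the chromosome name, column 2 its length in bases.
        sizes[fields[0]] = int(fields[1])
    return sizes


def read_chrom_sizes(path):
    """Read a chrom sizes file from disk (hypothetical helper)."""
    with open(path) as handle:
        return parse_chrom_sizes(handle)
```

In the task, the result would take the place of `chrom_size_dict`, e.g. `chrom_size_dict = read_chrom_sizes(chrom_sizes)`. Splitting parsing from file access keeps the parser testable on plain lists of strings.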

pipelines/skylab/optimus/Optimus.changelog.md (6 additions, 1 deletion)
@@ -1,4 +1,9 @@
# 6.6.2
# 7.1.0
2024-05-20 (Date of Last Commit)

* Updated SnapATAC2 docker to SnapATAC2 v2.6.3; this does not impact the Optimus workflow

# 7.0.0
2024-04-24 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering

pipelines/skylab/optimus/Optimus.wdl (1 addition, 1 deletion)
@@ -65,7 +65,7 @@ workflow Optimus {
# version of this pipeline


String pipeline_version = "6.6.2"
String pipeline_version = "7.1.0"


# this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays

pipelines/skylab/paired_tag/PairedTag.changelog.md (6 additions, 3 deletions)
@@ -1,13 +1,16 @@
# 0.7.0
2024-05016 (Date of Last Commit)
# 0.7.0
2024-05-20

* Updated SnapATAC2 docker and tasks to run SnapATAC v2.6.3
* Added testing infrastructure for paired-tag plumbing data and example data sets

* Added Paired-tag testing infrastructure and example test inputs

# 0.6.1
2024-05-14 (Date of Last Commit)

* Updated the demultiplex task so that some intermediate input names have been renamed. There is no change to the outputs.


# 0.6.0
2024-05-10 (Date)


pipelines/skylab/paired_tag/PairedTag.wdl (1 addition, 1 deletion)
@@ -86,11 +86,11 @@ workflow PairedTag {
read3_fastq_gzipped = demultiplex.fastq3,
input_id = input_id + "_atac",
tar_bwa_reference = tar_bwa_reference,
annotations_gtf = annotations_gtf,
chrom_sizes = chrom_sizes,
whitelist = atac_whitelist,
adapter_seq_read1 = adapter_seq_read1,
adapter_seq_read3 = adapter_seq_read3,
annotations_gtf = annotations_gtf,
preindex = preindex
}


pipelines/skylab/slideseq/SlideSeq.changelog.md (5 additions, 0 deletions)
@@ -1,3 +1,8 @@
# 3.1.6
2024-05-20 (Date of Last Commit)

* Updated SnapATAC2 docker to SnapATAC2 v2.6.3; this does not impact the SlideSeq workflow

# 3.1.5
2024-04-12 (Date of Last Commit)


pipelines/skylab/slideseq/SlideSeq.wdl (1 addition, 1 deletion)
@@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge

workflow SlideSeq {

String pipeline_version = "3.1.5"
String pipeline_version = "3.1.6"

input {
Array[File] r1_fastq

tasks/skylab/H5adUtils.wdl (4 additions, 2 deletions)
@@ -223,6 +223,7 @@ task JoinMultiomeBarcodes {
# import anndata to manipulate h5ad files
import anndata as ad
import pandas as pd
import snapatac2 as snap
print("Reading ATAC h5ad:")
print("~{atac_h5ad}")
print("Read ATAC fragment file:")
@@ -234,7 +235,7 @@ task JoinMultiomeBarcodes {
atac_tsv = pd.read_csv("~{atac_fragment}", sep="\t", names=['chr','start', 'stop', 'barcode','n_reads'])
whitelist_gex = pd.read_csv("~{gex_whitelist}", header=None, names=["gex_barcodes"])
whitelist_atac = pd.read_csv("~{atac_whitelist}", header=None, names=["atac_barcodes"])

# get dataframes
df_atac = atac_data.obs
df_gex = gex_data.obs
@@ -261,6 +262,7 @@ task JoinMultiomeBarcodes {
# set gene_data.obs to new dataframe
print("Setting Optimus obs to new dataframe")
gex_data.obs = df_gex

# write out the files
gex_data.write("~{gex_base_name}.h5ad")
atac_data.write_h5ad("~{atac_base_name}.h5ad")
@@ -277,7 +279,7 @@ task JoinMultiomeBarcodes {
>>>

runtime {
docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.4-2.3.1-1700590229"
docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.9-2.6.3-1715865353"
disks: "local-disk ~{disk} HDD"
memory: "${machine_mem_mb} MiB"
cpu: nthreads

tasks/skylab/PairedTagUtils.wdl (4 additions, 2 deletions)
@@ -227,11 +227,13 @@ task ParseBarcodes {
# import anndata to manipulate h5ad files
import anndata as ad
import pandas as pd
import snapatac2 as snap
print("Reading ATAC h5ad:")
atac_data = ad.read_h5ad("~{atac_h5ad}")
print("Reading ATAC fragment file:")
test_fragment = pd.read_csv("~{atac_fragment}", sep="\t", names=['chr','start', 'stop', 'barcode','n_reads'])



# Separate out CB and preindex in the h5ad and identify sample barcodes assigned to more than one cell barcode
print("Setting preindex and CB columns in h5ad")
df_h5ad = atac_data.obs
@@ -271,7 +273,7 @@ task ParseBarcodes {
>>>

runtime {
docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.4-2.3.1-1700590229"
docker: "us.gcr.io/broad-gotc-prod/snapatac2:1.0.9-2.6.3-1715865353"
disks: "local-disk ~{disk} HDD"
memory: "${machine_mem_mb} MiB"
cpu: nthreads
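ParseBarcodes and JoinMultiomeBarcodes both read the ATAC fragment file as a headerless, tab-separated table with the columns `chr`, `start`, `stop`, `barcode` and `n_reads`. The sketch below uses that layout to illustrate what `min_num_fragments=0` in CreateFragmentFile means: barcodes are filtered on their fragment count, so a threshold of 0 keeps every barcode present in the file. It is a standard-library approximation that assumes each row is one unique fragment and that the cutoff is inclusive. It is not SnapATAC2's implementation, and `count_fragments_per_barcode` and `barcodes_passing` are hypothetical names.

```python
import csv
from collections import Counter


def count_fragments_per_barcode(lines):
    """Count fragment rows per barcode in a chr/start/stop/barcode/n_reads TSV.

    Each row counts once; the n_reads (duplicate count) column is ignored.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip empty rows, '#' comment rows, and rows missing a barcode column.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        counts[row[3]] += 1
    return counts


def barcodes_passing(counts, min_num_fragments):
    """Return sorted barcodes whose fragment count is >= the threshold."""
    return sorted(bc for bc, n in counts.items() if n >= min_num_fragments)
```

With a threshold of 0, `barcodes_passing` returns every barcode that appears in the fragment file, which matches the intent stated in the task comment ("those settings allow us to retain all barcodes").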