From 49e4c1aa1aa736e9af219f497ba37df4457d5647 Mon Sep 17 00:00:00 2001
From: Antonio Gonzalez
Date: Fri, 13 Sep 2024 11:16:26 -0600
Subject: [PATCH] doc improvemnts (#3434)

---
 CHANGELOG.md | 2 +-
 .../processingdata/processing-recommendations.rst | 2 +-
 .../doc/source/processingdata/woltka_pairedend.rst | 13 ++++++-------
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e69df9a9a..03ad1593d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,7 +11,7 @@ Deployed on September 23rd, 2024
 * Initial changes in `qiita_client` to have more accurate variable names: `QIITA_SERVER_CERT` -> `QIITA_ROOTCA_CERT`. Thank you @charles-cowart!
 * Added `get_artifact_html_summary` to `qiita_client` to retrieve the summary file of an artifact.
 * Re-added github actions to `https://github.com/qiita-spots/qiita_client`.
-* `Woltka v0.1.4, paired-end` superseded `Woltka v0.1.4` in `qp-woltka`; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/woltka_pairedend.html). Thank you to @qiyunzhu for the benchmarks!
+* `Woltka v0.1.6, paired-end` superseded `Woltka v0.1.6` in `qp-woltka`; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/woltka_pairedend.html). Thank you to @qiyunzhu for the benchmarks!
 * Other general fixes, like [#3424](https://github.com/qiita-spots/qiita/pull/3424), [#3425](https://github.com/qiita-spots/qiita/pull/3425).


diff --git a/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst b/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst
index 14ce87f0f..0abc62e51 100755
--- a/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst
+++ b/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst
@@ -125,7 +125,7 @@ Note that the command produces up to 5 output artifacts based on the aligner and

 .. note::

- Woltka 0.1.4 only produces per-genome, per-gene and functional profiles as we are moving
+ Woltka 0.1.6 only produces per-genome, per-gene and functional profiles as we are moving
  to Operational Genomic Units (OGUs), which have higher resolution than taxonomic units for
  community ecology, and were shown to deliver stronger biological signals in downstream
  analyses. For more information please read: `Phylogeny-Aware Analysis of
diff --git a/qiita_pet/support_files/doc/source/processingdata/woltka_pairedend.rst b/qiita_pet/support_files/doc/source/processingdata/woltka_pairedend.rst
index fe154b6d3..0084c61dc 100644
--- a/qiita_pet/support_files/doc/source/processingdata/woltka_pairedend.rst
+++ b/qiita_pet/support_files/doc/source/processingdata/woltka_pairedend.rst
@@ -6,16 +6,16 @@ Benchmarks created by Qiyun Zhu (@qiyunzhu) on Aug 1, 2024.

 Summary
 -------
-I tested alternative read pairing schemes in the analysis of shotgun metagenomic sequencing data. Sequencing reads were aligned against a reference microbial genome database as unpaired or paired, with or without singleton and/or discordant alignments suppressed. A series of synthetic datasets were used in the analysis.
+I tested alternative read pairing schemes in the analysis of shotgun metagenomic sequencing data. Sequencing reads were aligned against a reference microbial genome database as unpaired or paired. A series of synthetic datasets were used in the analysis.

-The results reveal that treating reads as paired is always advantageous over unpaired. Suppressing singleton alignments further increases the accuracy of results, despite the cost of lower mapping rate. Suppressing discordant alignments has no obvious impact on the result. Regardless of accuracy, the downstream community ecology analyses are not obviously impacted by the choice of parameters.
+The results reveal that treating reads as paired is always advantageous over unpaired. Regardless of accuracy, the downstream community ecology analyses are not obviously impacted by the choice of parameters.

-Therefore, I recommend the general adoption of paired alignments as a standard procedure. I also endorse suppressing singleton and discordant alignments, but note the favor of further tests on whether they may reduce sensitivity with complex communities.
+Therefore, I recommend the general adoption of paired alignments as a standard procedure.

 Alignment parameters
 --------------------

-Sequencing data were aligned using Bowtie2 v2.5.1 in the “very sensitive” mode against the WoL2 database. They were treated as either unpaired or paired-end:
+Sequencing data were aligned using Bowtie2 v2.5.1 in the "very sensitive" mode against the WoL2 database. They were treated as either unpaired or paired-end:

 - SE: Reads are treated as unpaired (Bowtie2 input: -U merged.fq)
 - PE: Reads are treated as paired (Bowtie2 input: -1 fwd.fq, -2 rev.fq)
@@ -30,11 +30,10 @@ Five synthetic datasets were generated with 25 samples each consisting of random

 The results of the five Bowtie2 parameter sets were compared using nine metrics:

-Three metrics that only rely on each result.
+Two metrics that only rely on each result.

 - Mapping rate (%)
 - Number of taxa
-- Entropy (i.e., Shannon index, but without subsampling)

 Six metrics that rely on comparing each result against the ground truth (higher is better):

@@ -59,4 +58,4 @@ The results revealed:
 #. PE outperforms SE in all metrics. Most importantly, it reduces false positive rate (higher precision) while retaining mapping rate. Meanwhile, the sensitivity (recall) of identifying true taxa is not obviously compromised (note the y-axis scale).
 #. PE.NU the two additional parameters had minimum effect on the result and make the alignment step faster. This may suggest that the additional parameters are safe to use.

-Therefore, I would recommend adopting paired alignment in preference to unpaired alignment. I may suggest no mixing as it has improved accuracy, but the potential adverse effect of lower mapping rate may be further explored before making a compelling recommendation. Although not having a visible effect, no discordance may be added for logical coherency.
+Therefore, I would recommend adopting paired alignment in preference to unpaired alignment.