diff --git a/.github/workflows/qiita-ci.yml b/.github/workflows/qiita-ci.yml
index bbc6c25f5..6c3b06be1 100644
--- a/.github/workflows/qiita-ci.yml
+++ b/.github/workflows/qiita-ci.yml
@@ -154,6 +154,8 @@ jobs:
         echo "5. Setting up qiita"
         conda activate qiita
+        # adapt environment_script for private qiita plugins from travis to github actions.
+        sed 's#export PATH="/home/travis/miniconda3/bin:$PATH"; source #source /home/runner/.profile; conda #' -i qiita_db/support_files/patches/54.sql
         qiita-env make --no-load-ontologies
         qiita-test-install
         qiita plugins update
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 24ad9a78a..c21fa43e7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,18 @@
 # Qiita changelog
 
+Version 2024.02
+---------------
+
+Deployed on February 27th, 2024
+
+* Default workflows now accept commands with multiple inputs.
+* The loading time of the main study page was improved [#3350](https://github.com/qiita-spots/qiita/pull/3350).
+* SPP improvements - mainly @charles-cowart, thank you! Errors are now shown to the user in the GUI [#127](https://github.com/biocore/mg-scripts/pull/127), admins can restart jobs [#129](https://github.com/biocore/mg-scripts/pull/129), adapter-trimmer files are now stored and their sequence counts are part of the prep-info [#126](https://github.com/biocore/mg-scripts/pull/126), and there is now support for per instrument/data-type configuration [#123](https://github.com/biocore/mg-scripts/pull/123).
+* The internal Sequence Processing Pipeline is now using the https://www.gencodegenes.org human transcripts v44 for Metatranscriptomic data - in addition to the human pan-genome reference, the GRCh38 genome + PhiX, and the T2T-CHM13v2.0 genome - for human host filtering.
+* Added a command to qp-woltka: 'Calculate RNA Copy Counts'.
+* Other fixes - mainly by @sjanssen2, thank you!: [#3345](https://github.com/qiita-spots/qiita/pull/3345), [#3224](https://github.com/qiita-spots/qiita/pull/3224), [#3357](https://github.com/qiita-spots/qiita/pull/3357), [#3358](https://github.com/qiita-spots/qiita/pull/3358), [#3359](https://github.com/qiita-spots/qiita/pull/3359), [#3362](https://github.com/qiita-spots/qiita/pull/3362), [#3364](https://github.com/qiita-spots/qiita/pull/3364).
+
+
 Version 2023.12
 ---------------
 
@@ -14,6 +27,7 @@ Deployed on January 8th, 2024
 * Updated the Adapter and host filtering plugin (qp-fastp-minimap2) to v2023.12 addressing a bug in adapter filtering; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/qp-fastp-minimap2.html).
 * Other fixes: [3334](https://github.com/qiita-spots/qiita/pull/3334), [3338](https://github.com/qiita-spots/qiita/pull/3338). Thank you @sjanssen2.
 * The internal Sequence Processing Pipeline is now using the human pan-genome reference, together with the GRCh38 genome + PhiX and T2T-CHM13v2.0 genome for human host filtering.
+* Added two new commands to qp-woltka: 'SynDNA Woltka' & 'Calculate Cell Counts'.
 
 Version 2023.10
 
diff --git a/INSTALL.md b/INSTALL.md
index f23e85f38..89b63cabb 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -162,9 +162,9 @@ Navigate to the cloned directory and ensure your conda environment is active:
 cd qiita
 source activate qiita
 ```
-If you are using Ubuntu or a Windows Subsystem for Linux (WSL), you will need to ensure that you have a C++ compiler and that development libraries and include files for PostgreSQL are available. Type `cc` into your system to ensure that it doesn't result in `program not found`. The following commands will install a C++ compiler and `libpq-dev`:
+If you are using Ubuntu or a Windows Subsystem for Linux (WSL), you will need to ensure that you have a C++ compiler and that development libraries and include files for PostgreSQL are available. Type `cc` into your system to ensure that it doesn't result in `program not found`. If you use the GNU Compiler Collection, make sure to have `gcc` and `g++` available. The following commands will install a C++ compiler and `libpq-dev`:
 ```bash
-sudo apt install gcc # alternatively, you can install clang instead
+sudo apt install gcc g++ # alternatively, you can install clang instead
 sudo apt-get install libpq-dev
 ```
 Install Qiita (this occurs through setuptools' `setup.py` file in the qiita directory):
@@ -178,7 +178,7 @@ At this point, Qiita will be installed and the system will start. However, you
 will need to install plugins in order to process any kind of data. For a list
 of available plugins, visit the [Qiita Spots](https://github.com/qiita-spots)
 github organization. Each of the plugins have their own installation instructions, we
-suggest looking at each individual .travis.yml file to see detailed installation
+suggest looking at each individual .github/workflows/qiita-plugin-ci.yml file to see detailed installation
 instructions. Note that the most common plugins are:
 - [qtp-biom](https://github.com/qiita-spots/qtp-biom)
 - [qtp-sequencing](https://github.com/qiita-spots/qtp-sequencing)
@@ -224,15 +224,15 @@ export REDBIOM_HOST=http://my_host.com:7379
 
 ## Configure NGINX and supervisor
 
-(NGINX)[https://www.nginx.com/] is not a requirement for Qiita development but it's highly recommended for deploys as this will allow us
-to have multiple workers. Note that we are already installing (NGINX)[https://www.nginx.com/] within the Qiita conda environment; also,
-that Qiita comes with an example (NGINX)[https://www.nginx.com/] config file: `qiita_pet/nginx_example.conf`, which is used in the Travis builds.
+[NGINX](https://www.nginx.com/) is not a requirement for Qiita development but it's highly recommended for deploys as this will allow us
+to have multiple workers. Note that we are already installing [NGINX](https://www.nginx.com/) within the Qiita conda environment; also,
+that Qiita comes with an example [NGINX](https://www.nginx.com/) config file: `qiita_pet/nginx_example.conf`, which is used in the Travis builds.
 
-Now, (supervisor)[https://github.com/Supervisor/supervisor] will allow us to start all the workers we want based on its configuration file; and we
-need that both the (NGINX)[https://www.nginx.com/] and (supervisor)[https://github.com/Supervisor/supervisor] config files to match. For our Travis
+Now, [supervisor](https://github.com/Supervisor/supervisor) will allow us to start all the workers we want based on its configuration file; and we
+need both the [NGINX](https://www.nginx.com/) and [supervisor](https://github.com/Supervisor/supervisor) config files to match. For our Travis
 testing we are creating 3 workers: 21174 for master and 21175-6 as regular workers.
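To make the port matching concrete, the sketch below shows one way the supervisor side could be laid out for the three workers described above. It is only an illustration: the `${CONDA_PREFIX}/etc/` target path and the `--port` option of `qiita pet webserver start` are assumptions, and the config files used in the CI builds (e.g. `qiita_pet/nginx_example.conf`) remain the authoritative reference.
```bash
# Minimal sketch, not the shipped configuration: three supervisor programs whose
# ports must line up with the upstream servers declared in the NGINX config
# (21174 for the master worker, 21175-21176 for the regular workers).
# The target path and the --port invocation are assumptions for illustration.
cat > ${CONDA_PREFIX}/etc/supervisor_qiita.conf <<'EOF'
[program:qiita-master]
command=qiita pet webserver start --port 21174

[program:qiita-worker-1]
command=qiita pet webserver start --port 21175

[program:qiita-worker-2]
command=qiita pet webserver start --port 21176
EOF
```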
-If you are using (NGINX)[https://www.nginx.com/] via conda, you are going to need to create the NGINX folder within the environment; thus run:
+If you are using [NGINX](https://www.nginx.com/) via conda, you are going to need to create the NGINX folder within the environment; thus run:
 
 ```bash
 mkdir -p ${CONDA_PREFIX}/var/run/nginx/
@@ -256,7 +256,7 @@ Start the qiita server:
 qiita pet webserver start
 ```
 
-If all the above commands executed correctly, you should be able to access Qiita by going in your browser to https://localhost:21174 if you are not using NGINX, or https://localhost:8383 if you are using NGINX, to login use `test@foo.bar` and `password` as the credentials. (In the future, we will have a *single user mode* that will allow you to use a local Qiita server without logging in. You can track progress on this on issue [#920](https://github.com/biocore/qiita/issues/920).)
+If all the above commands executed correctly, you should be able to access Qiita by pointing your browser to https://localhost:21174 if you are not using NGINX, or https://localhost:8383 if you are using NGINX; to log in, use `test@foo.bar` and `password` as the credentials. (Log in as `admin@foo.bar` with `password` to see admin functionality. In the future, we will have a *single user mode* that will allow you to use a local Qiita server without logging in. You can track progress on this on issue [#920](https://github.com/biocore/qiita/issues/920).)
diff --git a/notebooks/resource-allocation/generate-allocation-summary-arrays.py b/notebooks/resource-allocation/generate-allocation-summary-arrays.py
new file mode 100644
index 000000000..8bca68742
--- /dev/null
+++ b/notebooks/resource-allocation/generate-allocation-summary-arrays.py
@@ -0,0 +1,239 @@
+from qiita_core.util import MaxRSS_helper
+from qiita_db.software import Software
+import datetime
+from io import StringIO
+from subprocess import check_output
+import pandas as pd
+from os.path import join
+
+# This is an example script to collect the data we need from SLURM, the plan
+# is that in the near future we will clean up and add these to Qiita's main
+# code and then have cronjobs to run them.
+
+# at the time of writing we have:
+# qp-spades spades
+# (*) qp-woltka Woltka v0.1.4
+# qp-woltka SynDNA Woltka
+# qp-woltka Calculate Cell Counts
+# (*) qp-meta Sortmerna v2.1b
+# (*) qp-fastp-minimap2 Adapter and host filtering v2023.12
+# ... and the admin plugin
+# (*) qp-klp
+# Here we are only going to create summaries for (*)
+
+
+sacct = ['sacct', '-p',
+         '--format=JobName,JobID,ElapsedRaw,MaxRSS,ReqMem', '-j']
+# for the non admin jobs, we will use jobs from the last six months
+six_months = datetime.date.today() - datetime.timedelta(weeks=6*4)
+
+print('The current "software - commands" that use job-arrays are:')
+for s in Software.iter():
+    if 'ENVIRONMENT="' in s.environment_script:
+        for c in s.commands:
+            print(f"{s.name} - {c.name}")
+
+# 1.
Command: woltka + +fn = join('/panfs', 'qiita', 'jobs_woltka.tsv.gz') +print(f"Generating the summary for the woltka jobs: {fn}.") + +cmds = [c for s in Software.iter(False) + if 'woltka' in s.name for c in s.commands] +jobs = [j for c in cmds for j in c.processing_jobs if j.status == 'success' and + j.heartbeat.date() > six_months and j.input_artifacts] + +data = [] +for j in jobs: + size = sum([fp['fp_size'] for fp in j.input_artifacts[0].filepaths]) + jid, mjid = j.external_id.strip().split() + rvals = StringIO(check_output(sacct + [jid]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + jmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + jwt = _d.ElapsedRaw.max() + + rvals = StringIO(check_output(sacct + [mjid]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + mmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + mwt = _d.ElapsedRaw.max() + + data.append({ + 'jid': j.id, 'sjid': jid, 'mem': jmem, 'wt': jwt, 'type': 'main', + 'db': j.parameters.values['Database'].split('/')[-1]}) + data.append( + {'jid': j.id, 'sjid': mjid, 'mem': mmem, 'wt': mwt, 'type': 'merge', + 'db': j.parameters.values['Database'].split('/')[-1]}) +df = pd.DataFrame(data) +df.to_csv(fn, sep='\t', index=False) + +# 2. qp-meta Sortmerna + +fn = join('/panfs', 'qiita', 'jobs_sortmerna.tsv.gz') +print(f"Generating the summary for the woltka jobs: {fn}.") + +# for woltka we will only use jobs from the last 6 months +cmds = [c for s in Software.iter(False) + if 'minimap2' in s.name.lower() for c in s.commands] +jobs = [j for c in cmds for j in c.processing_jobs if j.status == 'success' and + j.heartbeat.date() > six_months and j.input_artifacts] + +data = [] +for j in jobs: + size = sum([fp['fp_size'] for fp in j.input_artifacts[0].filepaths]) + jid, mjid = j.external_id.strip().split() + rvals = StringIO(check_output(sacct + [jid]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + jmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + jwt = _d.ElapsedRaw.max() + + rvals = StringIO(check_output(sacct + [mjid]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + mmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + mwt = _d.ElapsedRaw.max() + + data.append({ + 'jid': j.id, 'sjid': jid, 'mem': jmem, 'wt': jwt, 'type': 'main'}) + data.append( + {'jid': j.id, 'sjid': mjid, 'mem': mmem, 'wt': mwt, 'type': 'merge'}) +df = pd.DataFrame(data) +df.to_csv(fn, sep='\t', index=False) + + +# 3. Adapter and host filtering. 
Note that there is a new version deployed on +# Jan 2024 so the current results will not be the most accurate + +fn = join('/panfs', 'qiita', 'jobs_adapter_host.tsv.gz') +print(f"Generating the summary for the woltka jobs: {fn}.") + +# for woltka we will only use jobs from the last 6 months +cmds = [c for s in Software.iter(False) + if 'meta' in s.name.lower() for c in s.commands] +jobs = [j for c in cmds if 'sortmerna' in c.name.lower() + for j in c.processing_jobs if j.status == 'success' and + j.heartbeat.date() > six_months and j.input_artifacts] + +data = [] +for j in jobs: + size = sum([fp['fp_size'] for fp in j.input_artifacts[0].filepaths]) + jid, mjid = j.external_id.strip().split() + rvals = StringIO(check_output(sacct + [jid]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + jmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + jwt = _d.ElapsedRaw.max() + + rvals = StringIO(check_output(sacct + [mjid]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + mmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + mwt = _d.ElapsedRaw.max() + + data.append({ + 'jid': j.id, 'sjid': jid, 'mem': jmem, 'wt': jwt, 'type': 'main'}) + data.append( + {'jid': j.id, 'sjid': mjid, 'mem': mmem, 'wt': mwt, 'type': 'merge'}) +df = pd.DataFrame(data) +df.to_csv(fn, sep='\t', index=False) + + +# 4. The SPP! + +fn = join('/panfs', 'qiita', 'jobs_spp.tsv.gz') +print(f"Generating the summary for the SPP jobs: {fn}.") + +# for the SPP we will look at jobs from the last year +year = datetime.date.today() - datetime.timedelta(days=365) +cmds = [c for s in Software.iter(False) + if s.name == 'qp-klp' for c in s.commands] +jobs = [j for c in cmds for j in c.processing_jobs if j.status == 'success' and + j.heartbeat.date() > year] + +# for the SPP we need to find the jobs that were actually run, this means +# looping throught the existing slurm jobs and finding them +max_inter = 2000 + +data = [] +for job in jobs: + jei = int(job.external_id) + rvals = StringIO( + check_output(sacct + [str(jei)]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + mem = _d.MaxRSS.apply( + lambda x: x if type(x) is not str else MaxRSS_helper(x)).max() + wt = _d.ElapsedRaw.max() + # the current "easy" way to determine if amplicon or other is to check + # the file extension of the filename + stype = 'other' + if job.parameters.values['sample_sheet']['filename'].endswith('.txt'): + stype = 'amplicon' + rid = job.parameters.values['run_identifier'] + data.append( + {'jid': job.id, 'sjid': jei, 'mem': mem, 'stype': stype, 'wt': wt, + 'type': 'main', 'rid': rid, 'name': _d.JobName[0]}) + + # let's look for the convert job + for jid in range(jei + 1, jei + max_inter): + rvals = StringIO(check_output(sacct + [str(jid)]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + if [1 for x in _d.JobName.values if x.startswith(job.id)]: + cjid = int(_d.JobID[0]) + mem = _d.MaxRSS.apply( + lambda x: x if type(x) is not str else MaxRSS_helper(x)).max() + wt = _d.ElapsedRaw.max() + + data.append( + {'jid': job.id, 'sjid': cjid, 'mem': mem, 'stype': stype, + 'wt': wt, 'type': 'convert', 'rid': rid, + 'name': _d.JobName[0]}) + + # now let's look for the next step, if amplicon that's fastqc but + # if other that's qc/nuqc + for jid in range(cjid + 1, cjid + max_inter): + rvals = StringIO( + check_output(sacct + [str(jid)]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + if [1 for x in _d.JobName.values if x.startswith(job.id)]: + qc_jid = _d.JobIDRaw.apply( + 
lambda x: int(x.split('.')[0])).max() + qcmem = _d.MaxRSS.apply( + lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + qcwt = _d.ElapsedRaw.max() + + if stype == 'amplicon': + data.append( + {'jid': job.id, 'sjid': qc_jid, 'mem': qcmem, + 'stype': stype, 'wt': qcwt, 'type': 'fastqc', + 'rid': rid, 'name': _d.JobName[0]}) + else: + data.append( + {'jid': job.id, 'sjid': qc_jid, 'mem': qcmem, + 'stype': stype, 'wt': qcwt, 'type': 'qc', + 'rid': rid, 'name': _d.JobName[0]}) + for jid in range(qc_jid + 1, qc_jid + max_inter): + rvals = StringIO(check_output( + sacct + [str(jid)]).decode('ascii')) + _d = pd.read_csv(rvals, sep='|') + if [1 for x in _d.JobName.values if x.startswith( + job.id)]: + fqc_jid = _d.JobIDRaw.apply( + lambda x: int(x.split('.')[0])).max() + fqcmem = _d.MaxRSS.apply( + lambda x: x if type(x) is not str + else MaxRSS_helper(x)).max() + fqcwt = _d.ElapsedRaw.max() + data.append( + {'jid': job.id, 'sjid': fqc_jid, + 'mem': fqcmem, 'stype': stype, + 'wt': fqcwt, 'type': 'fastqc', + 'rid': rid, 'name': _d.JobName[0]}) + break + break + break + +df = pd.DataFrame(data) +df.to_csv(fn, sep='\t', index=False) diff --git a/notebooks/resource-allocation/generate-allocation-summary.py b/notebooks/resource-allocation/generate-allocation-summary.py index e081a5d12..7c8634293 100644 --- a/notebooks/resource-allocation/generate-allocation-summary.py +++ b/notebooks/resource-allocation/generate-allocation-summary.py @@ -5,6 +5,7 @@ from json import loads from os.path import join +from qiita_core.util import MaxRSS_helper from qiita_db.exceptions import QiitaDBUnknownIDError from qiita_db.processing_job import ProcessingJob from qiita_db.software import Software @@ -117,19 +118,8 @@ print('Make sure that only 0/K/M exist', set( df.MaxRSS.apply(lambda x: str(x)[-1]))) - -def _helper(x): - if x[-1] == 'K': - y = float(x[:-1]) * 1000 - elif x[-1] == 'M': - y = float(x[:-1]) * 1000000 - else: - y = float(x) - return y - - # Generating new columns -df['MaxRSSRaw'] = df.MaxRSS.apply(lambda x: _helper(str(x))) +df['MaxRSSRaw'] = df.MaxRSS.apply(lambda x: MaxRSS_helper(str(x))) df['ElapsedRawTime'] = df.ElapsedRaw.apply( lambda x: timedelta(seconds=float(x))) diff --git a/qiita_core/__init__.py b/qiita_core/__init__.py index 349b460d8..7fca55a7c 100644 --- a/qiita_core/__init__.py +++ b/qiita_core/__init__.py @@ -6,4 +6,4 @@ # The full license is in the file LICENSE, distributed with this software. 
# ----------------------------------------------------------------------------- -__version__ = "2023.12" +__version__ = "2024.02" diff --git a/qiita_core/tests/test_util.py b/qiita_core/tests/test_util.py index a3fef6942..dd33e902c 100644 --- a/qiita_core/tests/test_util.py +++ b/qiita_core/tests/test_util.py @@ -10,7 +10,7 @@ from qiita_core.util import ( qiita_test_checker, execute_as_transaction, get_qiita_version, - is_test_environment, get_release_info) + is_test_environment, get_release_info, MaxRSS_helper) from qiita_db.meta_util import ( generate_biom_and_metadata_release, generate_plugin_releases) import qiita_db as qdb @@ -82,6 +82,20 @@ def test_get_release_info(self): self.assertEqual(biom_metadata_release, ('', '', '')) self.assertNotEqual(archive_release, ('', '', '')) + def test_MaxRSS_helper(self): + tests = [ + ('6', 6.0), + ('6K', 6000), + ('6M', 6000000), + ('6G', 6000000000), + ('6.9', 6.9), + ('6.9K', 6900), + ('6.9M', 6900000), + ('6.9G', 6900000000), + ] + for x, y in tests: + self.assertEqual(MaxRSS_helper(x), y) + if __name__ == '__main__': main() diff --git a/qiita_core/util.py b/qiita_core/util.py index b3f6b4142..9692f210b 100644 --- a/qiita_core/util.py +++ b/qiita_core/util.py @@ -151,3 +151,15 @@ def get_release_info(study_status='public'): archive_release = ((md5sum, filepath, timestamp)) return (biom_metadata_release, archive_release) + + +def MaxRSS_helper(x): + if x[-1] == 'K': + y = float(x[:-1]) * 1000 + elif x[-1] == 'M': + y = float(x[:-1]) * 1000000 + elif x[-1] == 'G': + y = float(x[:-1]) * 1000000000 + else: + y = float(x) + return y diff --git a/qiita_db/__init__.py b/qiita_db/__init__.py index c2b6f34a3..63fbc26fc 100644 --- a/qiita_db/__init__.py +++ b/qiita_db/__init__.py @@ -27,7 +27,7 @@ from . import user from . import processing_job -__version__ = "2023.12" +__version__ = "2024.02" __all__ = ["analysis", "artifact", "archive", "base", "commands", "environment_manager", "exceptions", "investigation", "logger", diff --git a/qiita_db/artifact.py b/qiita_db/artifact.py index 2604ee6ef..e4ac92a34 100644 --- a/qiita_db/artifact.py +++ b/qiita_db/artifact.py @@ -1277,10 +1277,8 @@ def _add_edge(edges, src, dest): qdb.sql_connection.TRN.add(sql, [self.id]) sql_edges = qdb.sql_connection.TRN.execute_fetchindex() - lineage = nx.DiGraph() - edges = set() - nodes = {} - if sql_edges: + # helper function to reduce code duplication + def _helper(sql_edges, edges, nodes): for jid, pid, cid in sql_edges: if jid not in nodes: nodes[jid] = ('job', @@ -1291,9 +1289,29 @@ def _add_edge(edges, src, dest): nodes[cid] = ('artifact', qdb.artifact.Artifact(cid)) edges.add((nodes[pid], nodes[jid])) edges.add((nodes[jid], nodes[cid])) + + lineage = nx.DiGraph() + edges = set() + nodes = dict() + extra_edges = set() + extra_nodes = dict() + if sql_edges: + _helper(sql_edges, edges, nodes) else: nodes[self.id] = ('artifact', self) lineage.add_node(nodes[self.id]) + # if this is an Analysis we need to check if there are extra + # edges/nodes as there is a chance that there are connecions + # between them + if self.analysis is not None: + roots = [a for a in self.analysis.artifacts + if not a.parents and a != self] + for r in roots: + # add the root to the options then their children + extra_nodes[r.id] = ('artifact', r) + qdb.sql_connection.TRN.add(sql, [r.id]) + sql_edges = qdb.sql_connection.TRN.execute_fetchindex() + _helper(sql_edges, extra_edges, extra_nodes) # The code above returns all the jobs that have been successfully # executed. 
We need to add all the jobs that are in all the other @@ -1329,8 +1347,10 @@ def _add_edge(edges, src, dest): # need to check both the input_artifacts and the # pending properties for in_art in n_obj.input_artifacts: - _add_edge(edges, nodes[in_art.id], - nodes[n_obj.id]) + iid = in_art.id + if iid not in nodes and iid in extra_nodes: + nodes[iid] = extra_nodes[iid] + _add_edge(edges, nodes[iid], nodes[n_obj.id]) pending = n_obj.pending for pred_id in pending: diff --git a/qiita_db/handlers/tests/test_plugin.py b/qiita_db/handlers/tests/test_plugin.py index 0194ea608..f5291c277 100644 --- a/qiita_db/handlers/tests/test_plugin.py +++ b/qiita_db/handlers/tests/test_plugin.py @@ -18,7 +18,7 @@ class UtilTests(TestCase): def test_get_plugin(self): - obs = _get_plugin("QIIME", "1.9.1") + obs = _get_plugin("QIIMEq2", "1.9.1") exp = qdb.software.Software(1) self.assertEqual(obs, exp) @@ -27,7 +27,7 @@ def test_get_plugin(self): _get_plugin("QiIME", "1.9.1") def test_get_command(self): - obs = _get_command('QIIME', '1.9.1', 'Split libraries FASTQ') + obs = _get_command('QIIMEq2', '1.9.1', 'Split libraries FASTQ') exp = qdb.software.Command(1) self.assertEqual(obs, exp) @@ -38,18 +38,18 @@ def test_get_command(self): class PluginHandlerTests(OauthTestingBase): def test_get_plugin_does_not_exist(self): - obs = self.get('/qiita_db/plugins/QIIME/1.9.0/', headers=self.header) + obs = self.get('/qiita_db/plugins/QIIMEq2/1.9.0/', headers=self.header) self.assertEqual(obs.code, 404) def test_get_no_header(self): - obs = self.get('/qiita_db/plugins/QIIME/1.9.0/') + obs = self.get('/qiita_db/plugins/QIIMEq2/1.9.0/') self.assertEqual(obs.code, 400) def test_get(self): - obs = self.get('/qiita_db/plugins/QIIME/1.9.1/', headers=self.header) + obs = self.get('/qiita_db/plugins/QIIMEq2/1.9.1/', headers=self.header) self.assertEqual(obs.code, 200) exp = { - 'name': 'QIIME', + 'name': 'QIIMEq2', 'version': '1.9.1', 'description': 'Quantitative Insights Into Microbial Ecology ' '(QIIME) is an open-source bioinformatics pipeline ' @@ -85,10 +85,10 @@ def test_post(self): 'param2': '2.4', 'param3': 'False'}}) } - obs = self.post('/qiita_db/plugins/QIIME/1.9.1/commands/', data=data, + obs = self.post('/qiita_db/plugins/QIIMEq2/1.9.1/commands/', data=data, headers=self.header) self.assertEqual(obs.code, 200) - obs = _get_command('QIIME', '1.9.1', 'New Command') + obs = _get_command('QIIMEq2', '1.9.1', 'New Command') self.assertEqual(obs.name, 'New Command') self.assertFalse(obs.analysis_only) @@ -106,10 +106,10 @@ def test_post(self): 'default_parameter_sets': dumps({'dflt1': {'param1': 'test'}}), 'analysis_only': True } - obs = self.post('/qiita_db/plugins/QIIME/1.9.1/commands/', data=data, + obs = self.post('/qiita_db/plugins/QIIMEq2/1.9.1/commands/', data=data, headers=self.header) self.assertEqual(obs.code, 200) - obs = _get_command('QIIME', '1.9.1', 'New analysis command') + obs = _get_command('QIIMEq2', '1.9.1', 'New analysis command') self.assertEqual(obs.name, 'New analysis command') self.assertTrue(obs.analysis_only) self.assertEqual(obs.merging_scheme, @@ -125,12 +125,12 @@ def test_get_command_does_not_exist(self): def test_get_no_header(self): obs = self.get( - '/qiita_db/plugins/QIIME/1.9.1/commands/Split%20libraries/') + '/qiita_db/plugins/QIIMEq2/1.9.1/commands/Split%20libraries/') self.assertEqual(obs.code, 400) def test_get(self): obs = self.get( - '/qiita_db/plugins/QIIME/1.9.1/commands/Split%20libraries/', + '/qiita_db/plugins/QIIMEq2/1.9.1/commands/Split%20libraries/', headers=self.header) 
self.assertEqual(obs.code, 200) exp = {'name': 'Split libraries', @@ -195,20 +195,20 @@ def test_get(self): class CommandActivateHandlerTests(OauthTestingBase): def test_post_command_does_not_exist(self): - obs = self.post('/qiita_db/plugins/QIIME/1.9.1/commands/' + obs = self.post('/qiita_db/plugins/QIIMEq2/1.9.1/commands/' 'UNKNOWN/activate/', headers=self.header, data={}) self.assertEqual(obs.code, 404) def test_post_no_header(self): - obs = self.post('/qiita_db/plugins/QIIME/1.9.1/commands/' + obs = self.post('/qiita_db/plugins/QIIMEq2/1.9.1/commands/' 'Split%20libraries/activate/', data={}) self.assertEqual(obs.code, 400) def test_post(self): qdb.software.Software.deactivate_all() self.assertFalse(qdb.software.Command(2).active) - obs = self.post('/qiita_db/plugins/QIIME/1.9.1/commands/' + obs = self.post('/qiita_db/plugins/QIIMEq2/1.9.1/commands/' 'Split%20libraries/activate/', headers=self.header, data={}) self.assertEqual(obs.code, 200) diff --git a/qiita_db/handlers/tests/test_processing_job.py b/qiita_db/handlers/tests/test_processing_job.py index 09e4d39ce..5ef82669a 100644 --- a/qiita_db/handlers/tests/test_processing_job.py +++ b/qiita_db/handlers/tests/test_processing_job.py @@ -277,7 +277,8 @@ class ProcessingJobAPItestHandlerTests(OauthTestingBase): def test_post_processing_job(self): data = { 'user': 'demo@microbio.me', - 'command': dumps(['QIIME', '1.9.1', 'Pick closed-reference OTUs']), + 'command': dumps(['QIIMEq2', '1.9.1', + 'Pick closed-reference OTUs']), 'parameters': dumps({"reference": 1, "sortmerna_e_value": 1, "sortmerna_max_pos": 10000, @@ -298,7 +299,8 @@ def test_post_processing_job(self): def test_post_processing_job_status(self): data = { 'user': 'demo@microbio.me', - 'command': dumps(['QIIME', '1.9.1', 'Pick closed-reference OTUs']), + 'command': dumps(['QIIMEq2', '1.9.1', + 'Pick closed-reference OTUs']), 'status': 'running', 'parameters': dumps({"reference": 1, "sortmerna_e_value": 1, diff --git a/qiita_db/metadata_template/prep_template.py b/qiita_db/metadata_template/prep_template.py index f39aaacb7..4a4cc98fa 100644 --- a/qiita_db/metadata_template/prep_template.py +++ b/qiita_db/metadata_template/prep_template.py @@ -793,16 +793,26 @@ def _get_node_info(workflow, node): def _get_predecessors(workflow, node): # recursive method to get predecessors of a given node pred = [] - for pnode in workflow.graph.predecessors(node): + + parents = list(workflow.graph.predecessors(node)) + for pnode in parents: pred = _get_predecessors(workflow, pnode) cxns = {x[0]: x[2] for x in workflow.graph.get_edge_data( pnode, node)['connections'].connections} data = [pnode, node, cxns] if pred is None: - pred = [data] - else: - pred.append(data) + pred = [] + + # making sure that if the node has extra parents they are + # generated first + parents.remove(pnode) + if parents: + for pnode in parents: + # [-1] just adding the parent and not its ancestors + pred.extend([_get_predecessors(workflow, pnode)[-1]]) + + pred.append(data) return pred # Note: we are going to use the final BIOMs to figure out which @@ -864,7 +874,8 @@ def _get_predecessors(workflow, node): if wk_params['sample']: df = ST(self.study_id).to_dataframe(samples=list(self)) for k, v in wk_params['sample'].items(): - if k not in df.columns or v not in df[k].unique(): + if k not in df.columns or (v != '*' and v not in + df[k].unique()): reqs_satisfied = False else: total_conditions_satisfied += 1 @@ -872,7 +883,8 @@ def _get_predecessors(workflow, node): if wk_params['prep']: df = self.to_dataframe() for k, v 
in wk_params['prep'].items(): - if k not in df.columns or v not in df[k].unique(): + if k not in df.columns or (v != '*' and v not in + df[k].unique()): reqs_satisfied = False else: total_conditions_satisfied += 1 @@ -890,117 +902,139 @@ def _get_predecessors(workflow, node): # let's just keep one, let's give it preference to the one with the # most total_conditions_satisfied - workflows = sorted(workflows, key=lambda x: x[0], reverse=True)[:1] + _, wk = sorted(workflows, key=lambda x: x[0], reverse=True)[0] + GH = wk.graph missing_artifacts = dict() - for _, wk in workflows: - missing_artifacts[wk] = dict() - for node, degree in wk.graph.out_degree(): - if degree != 0: - continue - mscheme = _get_node_info(wk, node) - if mscheme not in merging_schemes: - missing_artifacts[wk][mscheme] = node - if not missing_artifacts[wk]: - del missing_artifacts[wk] + for node, degree in GH.out_degree(): + if degree != 0: + continue + mscheme = _get_node_info(wk, node) + if mscheme not in merging_schemes: + missing_artifacts[mscheme] = node if not missing_artifacts: # raises option b. raise ValueError('This preparation is complete') # 3. - for wk, wk_data in missing_artifacts.items(): - previous_jobs = dict() - for ma, node in wk_data.items(): - predecessors = _get_predecessors(wk, node) - predecessors.reverse() - cmds_to_create = [] - init_artifacts = None - for i, (pnode, cnode, cxns) in enumerate(predecessors): - cdp = cnode.default_parameter - cdp_cmd = cdp.command - params = cdp.values.copy() - - icxns = {y: x for x, y in cxns.items()} - reqp = {x: icxns[y[1][0]] - for x, y in cdp_cmd.required_parameters.items()} - cmds_to_create.append([cdp_cmd, params, reqp]) - - info = _get_node_info(wk, pnode) - if info in merging_schemes: - if set(merging_schemes[info]) >= set(cxns): - init_artifacts = merging_schemes[info] - break - if init_artifacts is None: - pdp = pnode.default_parameter - pdp_cmd = pdp.command - params = pdp.values.copy() - # verifying that the workflow.artifact_type is included - # in the command input types or raise an error - wkartifact_type = wk.artifact_type - reqp = dict() - for x, y in pdp_cmd.required_parameters.items(): - if wkartifact_type not in y[1]: - raise ValueError(f'{wkartifact_type} is not part ' - 'of this preparation and cannot ' - 'be applied') - reqp[x] = wkartifact_type - - cmds_to_create.append([pdp_cmd, params, reqp]) - - if starting_job is not None: - init_artifacts = { - wkartifact_type: f'{starting_job.id}:'} - else: - init_artifacts = {wkartifact_type: self.artifact.id} - - cmds_to_create.reverse() - current_job = None - loop_starting_job = starting_job - for i, (cmd, params, rp) in enumerate(cmds_to_create): - if loop_starting_job is not None: - previous_job = loop_starting_job - loop_starting_job = None - else: - previous_job = current_job - if previous_job is None: - req_params = dict() - for iname, dname in rp.items(): - if dname not in init_artifacts: - msg = (f'Missing Artifact type: "{dname}" in ' - 'this preparation; this might be due ' - 'to missing steps or not having the ' - 'correct raw data.') - # raises option c. 
+ previous_jobs = dict() + for ma, node in missing_artifacts.items(): + predecessors = _get_predecessors(wk, node) + predecessors.reverse() + cmds_to_create = [] + init_artifacts = None + for i, (pnode, cnode, cxns) in enumerate(predecessors): + cdp = cnode.default_parameter + cdp_cmd = cdp.command + params = cdp.values.copy() + + icxns = {y: x for x, y in cxns.items()} + reqp = {x: icxns[y[1][0]] + for x, y in cdp_cmd.required_parameters.items()} + cmds_to_create.append([cdp, cdp_cmd, params, reqp]) + + info = _get_node_info(wk, pnode) + if info in merging_schemes: + if set(merging_schemes[info]) >= set(cxns): + init_artifacts = merging_schemes[info] + break + if init_artifacts is None: + pdp = pnode.default_parameter + pdp_cmd = pdp.command + params = pdp.values.copy() + # verifying that the workflow.artifact_type is included + # in the command input types or raise an error + wkartifact_type = wk.artifact_type + reqp = dict() + for x, y in pdp_cmd.required_parameters.items(): + if wkartifact_type not in y[1]: + raise ValueError(f'{wkartifact_type} is not part ' + 'of this preparation and cannot ' + 'be applied') + reqp[x] = wkartifact_type + + cmds_to_create.append([pdp, pdp_cmd, params, reqp]) + + if starting_job is not None: + init_artifacts = { + wkartifact_type: f'{starting_job.id}:'} + else: + init_artifacts = {wkartifact_type: self.artifact.id} + + cmds_to_create.reverse() + current_job = None + loop_starting_job = starting_job + previous_dps = dict() + for i, (dp, cmd, params, rp) in enumerate(cmds_to_create): + if loop_starting_job is not None: + previous_job = loop_starting_job + loop_starting_job = None + else: + previous_job = current_job + + req_params = dict() + if previous_job is None: + for iname, dname in rp.items(): + if dname not in init_artifacts: + msg = (f'Missing Artifact type: "{dname}" in ' + 'this preparation; this might be due ' + 'to missing steps or not having the ' + 'correct raw data.') + # raises option c. 
+ raise ValueError(msg) + req_params[iname] = init_artifacts[dname] + if len(dp.command.required_parameters) > 1: + for pn in GH.predecessors(node): + info = _get_node_info(wk, pn) + n, cnx, _ = GH.get_edge_data( + pn, node)['connections'].connections[0] + if info not in merging_schemes or \ + n not in merging_schemes[info]: + msg = ('This workflow contains a step with ' + 'multiple inputs so it cannot be ' + 'completed automatically, please add ' + 'the commands by hand.') raise ValueError(msg) - req_params[iname] = init_artifacts[dname] - else: - req_params = dict() - connections = dict() + req_params[cnx] = merging_schemes[info][n] + else: + if len(dp.command.required_parameters) == 1: + cxns = dict() for iname, dname in rp.items(): req_params[iname] = f'{previous_job.id}{dname}' - connections[dname] = iname - params.update(req_params) - job_params = qdb.software.Parameters.load( - cmd, values_dict=params) - - if params in previous_jobs.values(): - for x, y in previous_jobs.items(): - if params == y: - current_job = x + cxns[dname] = iname + connections = {previous_job: cxns} + else: + connections = dict() + for pn in GH.predecessors(node): + pndp = pn.default_parameter + n, cnx, _ = GH.get_edge_data( + pn, node)['connections'].connections[0] + _job = previous_dps[pndp.id] + req_params[cnx] = f'{_job.id}{n}' + connections[_job] = {n: cnx} + params.update(req_params) + job_params = qdb.software.Parameters.load( + cmd, values_dict=params) + + if params in previous_jobs.values(): + for x, y in previous_jobs.items(): + if params == y: + current_job = x + else: + if workflow is None: + PW = qdb.processing_job.ProcessingWorkflow + workflow = PW.from_scratch(user, job_params) + current_job = [ + j for j in workflow.graph.nodes()][0] else: - if workflow is None: - PW = qdb.processing_job.ProcessingWorkflow - workflow = PW.from_scratch(user, job_params) - current_job = [ - j for j in workflow.graph.nodes()][0] + if previous_job is None: + current_job = workflow.add( + job_params, req_params=req_params) else: - if previous_job is None: - current_job = workflow.add( - job_params, req_params=req_params) - else: - current_job = workflow.add( - job_params, req_params=req_params, - connections={previous_job: connections}) - previous_jobs[current_job] = params + current_job = workflow.add( + job_params, req_params=req_params, + connections=connections) + previous_jobs[current_job] = params + previous_dps[dp.id] = current_job return workflow diff --git a/qiita_db/metadata_template/test/test_prep_template.py b/qiita_db/metadata_template/test/test_prep_template.py index 41ec15077..81769082d 100644 --- a/qiita_db/metadata_template/test/test_prep_template.py +++ b/qiita_db/metadata_template/test/test_prep_template.py @@ -1477,6 +1477,46 @@ def test_artifact_setter(self): "the parameters are the same as jobs"): pt.add_default_workflow(qdb.user.User('test@foo.bar')) + # Then, let's clean up again and add a new command/step with 2 + # BIOM input artifacts + for pj in wk.graph.nodes: + pj._set_error('Killed') + cmd = qdb.software.Command.create( + qdb.software.Software(1), "Multiple BIOM as inputs", "", { + 'req_artifact_1': ['artifact:["BIOM"]', None], + 'req_artifact_2': ['artifact:["BIOM"]', None], + }, outputs={'MB-output': 'BIOM'}) + cmd_dp = qdb.software.DefaultParameters.create("", cmd) + # creating the new node for the cmd and linking it's two inputs with + # two inputs + sql = f""" + INSERT INTO qiita.default_workflow_node ( + default_workflow_id, default_parameter_set_id) + VALUES (1, {cmd_dp.id}); + 
INSERT INTO qiita.default_workflow_edge ( + parent_id, child_id) + VALUES (8, 10); + INSERT INTO qiita.default_workflow_edge ( + parent_id, child_id) + VALUES (9, 10); + INSERT INTO qiita.default_workflow_edge_connections ( + default_workflow_edge_id, parent_output_id, child_input_id) + VALUES (6, 3, 99); + INSERT INTO qiita.default_workflow_edge_connections ( + default_workflow_edge_id, parent_output_id, child_input_id) + VALUES (7, 3, 100) + """ + qdb.sql_connection.perform_as_transaction(sql) + wk = pt.add_default_workflow(qdb.user.User('test@foo.bar')) + self.assertEqual(len(wk.graph.nodes), 6) + self.assertEqual(len(wk.graph.edges), 5) + self.assertCountEqual( + [x.command.name for x in wk.graph.nodes], + # we should have 2 split libraries and 3 close reference + ['Split libraries FASTQ', 'Split libraries FASTQ', + 'Pick closed-reference OTUs', 'Pick closed-reference OTUs', + 'Pick closed-reference OTUs', 'Multiple BIOM as inputs']) + # now let's test that an error is raised when there is no valid initial # input data; this moves the data type from FASTQ to taxa_summary for # the default_workflow_id = 1 diff --git a/qiita_db/processing_job.py b/qiita_db/processing_job.py index 9c9aa1186..a1f7e5baa 100644 --- a/qiita_db/processing_job.py +++ b/qiita_db/processing_job.py @@ -1020,7 +1020,7 @@ def submit(self, parent_job_id=None, dependent_jobs_list=None): # names to know if it should be executed differently and the # plugin should let Qiita know that a specific command should be ran # as job array or not - cnames_to_skip = {'Calculate Cell Counts'} + cnames_to_skip = {'Calculate Cell Counts', 'Calculate RNA Copy Counts'} if 'ENVIRONMENT' in plugin_env_script and cname not in cnames_to_skip: # the job has to be in running state so the plugin can change its` # status @@ -2392,6 +2392,20 @@ def add(self, dflt_params, connections=None, req_params=None, with qdb.sql_connection.TRN: self._raise_if_not_in_construction() + # checking that the new number of artifacts is not above + # max_artifacts_in_workflow + current_artifacts = sum( + [len(j.command.outputs) for j in self.graph.nodes()]) + to_add_artifacts = len(dflt_params.command.outputs) + total_artifacts = current_artifacts + to_add_artifacts + max_artifacts = qdb.util.max_artifacts_in_workflow() + if total_artifacts > max_artifacts: + raise ValueError( + "Cannot add new job because it will create more " + f"artifacts (current: {current_artifacts} + new: " + f"{to_add_artifacts} = {total_artifacts}) that what is " + f"allowed in a single workflow ({max_artifacts})") + if connections: # The new Job depends on previous jobs in the workflow req_params = req_params if req_params else {} diff --git a/qiita_db/study.py b/qiita_db/study.py index 367b846a1..3c4989f3b 100644 --- a/qiita_db/study.py +++ b/qiita_db/study.py @@ -1175,17 +1175,37 @@ def has_access(self, user, no_public=False): Whether user has access to study or not """ with qdb.sql_connection.TRN: - # if admin or superuser, just return true + # return True if the user is one of the admins if user.level in {'superuser', 'admin'}: return True - if no_public: - study_set = user.user_studies | user.shared_studies - else: - study_set = user.user_studies | user.shared_studies | \ - self.get_by_status('public') + # if no_public is False then just check if the study is public + # and return True + if not no_public and self.status == 'public': + return True + + # let's check if the study belongs to this user or has been + # shared with them + sql = """SELECT EXISTS ( + SELECT study_id + FROM 
qiita.study + JOIN qiita.study_portal USING (study_id) + JOIN qiita.portal_type USING (portal_type_id) + WHERE email = %s AND portal = %s AND study_id = %s + UNION + SELECT study_id + FROM qiita.study_users + JOIN qiita.study_portal USING (study_id) + JOIN qiita.portal_type USING (portal_type_id) + WHERE email = %s AND portal = %s AND study_id = %s + ) + """ + qdb.sql_connection.TRN.add( + sql, [user.email, qiita_config.portal, self.id, + user.email, qiita_config.portal, self.id]) + result = qdb.sql_connection.TRN.execute_fetchlast() - return self in study_set + return result def can_edit(self, user): """Returns whether the given user can edit the study diff --git a/qiita_db/support_files/patches/90.sql b/qiita_db/support_files/patches/90.sql new file mode 100644 index 000000000..a0b5d58c9 --- /dev/null +++ b/qiita_db/support_files/patches/90.sql @@ -0,0 +1,6 @@ +-- Jan 9, 2024 +-- add control of max artifacts in analysis to the settings +-- using 35 as default considering that a core div creates ~17 so allowing +-- for 2 of those + 1 +ALTER TABLE settings + ADD COLUMN IF NOT EXISTS max_artifacts_in_workflow INT DEFAULT 35; diff --git a/qiita_db/support_files/patches/91.sql b/qiita_db/support_files/patches/91.sql new file mode 100644 index 000000000..eb05c079c --- /dev/null +++ b/qiita_db/support_files/patches/91.sql @@ -0,0 +1,15 @@ +-- Feb 19, 2024 +-- update qp-target-gene command name to "QIIMEq2" to be in sync with plugin repo +-- When setting up a new instance of Qiita, we end up using qiita-env make +-- which creates entries in the postgress database, also for qiita.software. +-- One of these entries belongs to the qp-target-gene plugin with name "QIIME" +-- and version "1.9.1". However, with +-- qp_target_gene/support_files/patches/171029_QIIME_v191_to_QIIMEq2_v191.sql +-- the plugin was renamed into QIIMEq2, but this change was not relected in +-- the qiita.software table. Thus, updating plugin information finds a mismatch +-- between old (QIIME) and new (QIIMEq2) names and therefor creates a new +-- command. However, the also provided default workflows hold command_ids to +-- the old version and subsequently the commands for artifact processing +-- will result in an empty list, even though the plugin is available. +-- Therefore, this patch updates the name of QIIME. +UPDATE qiita.software SET name = 'QIIMEq2' WHERE name = 'QIIME'; diff --git a/qiita_db/test/test_artifact.py b/qiita_db/test/test_artifact.py index b9ba78149..dd44739b4 100644 --- a/qiita_db/test/test_artifact.py +++ b/qiita_db/test/test_artifact.py @@ -469,10 +469,10 @@ def test_merging_scheme(self): ('Split libraries FASTQ | N/A', 'N/A')) self.assertEqual(qdb.artifact.Artifact(4).merging_scheme, ('Pick closed-reference OTUs | Split libraries FASTQ', - 'QIIME v1.9.1')) + 'QIIMEq2 v1.9.1')) self.assertEqual(qdb.artifact.Artifact(5).merging_scheme, ('Pick closed-reference OTUs | Split libraries FASTQ', - 'QIIME v1.9.1')) + 'QIIMEq2 v1.9.1')) def test_jobs(self): # Returning all jobs @@ -1406,6 +1406,42 @@ def test_has_human(self): self.assertTrue(artifact.has_human) + def test_descendants_with_jobs(self): + # let's tests that we can connect two artifacts with different root + # in the same analysis + # 1. make sure there are 3 nodes + a = qdb.artifact.Artifact(8) + self.assertEqual(len(a.descendants_with_jobs.nodes), 3) + self.assertEqual(len(a.analysis.artifacts), 2) + # 2. 
add a new root and make sure we see it + c = qdb.artifact.Artifact.create( + self.filepaths_root, "BIOM", analysis=a.analysis, + data_type="16S") + self.assertEqual(len(a.analysis.artifacts), 3) + # 3. add jobs conencting the new artifact to the other root + # a -> job -> b + # c + # job1 connects b & c + # job2 connects a & c + cmd = qdb.software.Command.create( + qdb.software.Software(1), + "CommandWithMultipleInputs", "", { + 'input_b': ['artifact:["BIOM"]', None], + 'input_c': ['artifact:["BIOM"]', None]}, {'out': 'BIOM'}) + params = qdb.software.Parameters.load( + cmd, values_dict={'input_b': a.children[0].id, 'input_c': c.id}) + job1 = qdb.processing_job.ProcessingJob.create( + qdb.user.User('test@foo.bar'), params) + params = qdb.software.Parameters.load( + cmd, values_dict={'input_b': a.id, 'input_c': c.id}) + job2 = qdb.processing_job.ProcessingJob.create( + qdb.user.User('test@foo.bar'), params) + + jobs = [j[1] for e in a.descendants_with_jobs.edges + for j in e if j[0] == 'job'] + self.assertIn(job1, jobs) + self.assertIn(job2, jobs) + @qiita_test_checker() class ArtifactArchiveTests(TestCase): diff --git a/qiita_db/test/test_meta_util.py b/qiita_db/test/test_meta_util.py index 93e96f10b..1f8a30a23 100644 --- a/qiita_db/test/test_meta_util.py +++ b/qiita_db/test/test_meta_util.py @@ -381,15 +381,15 @@ def test_generate_biom_and_metadata_release(self): 'processed_data/1_study_1001_closed_reference_otu_table.biom\t' '%s\t%s\t4\tIllumina\t16S rRNA\t' 'Pick closed-reference OTUs | Split libraries FASTQ\t' - 'QIIME v1.9.1\tQIIME v1.9.1\n' % (fn_sample, fn_prep), + 'QIIMEq2 v1.9.1\tQIIMEq2 v1.9.1\n' % (fn_sample, fn_prep), 'processed_data/1_study_1001_closed_reference_otu_table.biom\t' '%s\t%s\t5\tIllumina\t16S rRNA\t' 'Pick closed-reference OTUs | Split libraries FASTQ\t' - 'QIIME v1.9.1\tQIIME v1.9.1\n' % (fn_sample, fn_prep), + 'QIIMEq2 v1.9.1\tQIIMEq2 v1.9.1\n' % (fn_sample, fn_prep), 'processed_data/1_study_1001_closed_reference_otu_table_Silva.bio' 'm\t%s\t%s\t6\tIllumina\t16S rRNA\t' 'Pick closed-reference OTUs | Split libraries FASTQ\t' - 'QIIME v1.9.1\tQIIME v1.9.1' % (fn_sample, fn_prep)] + 'QIIMEq2 v1.9.1\tQIIMEq2 v1.9.1' % (fn_sample, fn_prep)] self.assertEqual(txt_obs, txt_exp) # whatever the configuration was, we will change to settings so we can @@ -465,15 +465,15 @@ def test_generate_biom_and_metadata_release(self): 'processed_data/1_study_1001_closed_reference_otu_table.biom\t' '%s\t%s\t4\tIllumina\t16S rRNA\t' 'Pick closed-reference OTUs | Split libraries FASTQ\t' - 'QIIME v1.9.1\tQIIME v1.9.1\n' % (fn_sample, fn_prep), + 'QIIMEq2 v1.9.1\tQIIMEq2 v1.9.1\n' % (fn_sample, fn_prep), 'processed_data/1_study_1001_closed_reference_otu_table.biom\t' '%s\t%s\t5\tIllumina\t16S rRNA\t' 'Pick closed-reference OTUs | Split libraries FASTQ\t' - 'QIIME v1.9.1\tQIIME v1.9.1\n' % (fn_sample, fn_prep), + 'QIIMEq2 v1.9.1\tQIIMEq2 v1.9.1\n' % (fn_sample, fn_prep), 'processed_data/1_study_1001_closed_reference_otu_table_Silva.bio' 'm\t%s\t%s\t6\tIllumina\t16S rRNA\t' 'Pick closed-reference OTUs | Split libraries FASTQ' - '\tQIIME v1.9.1\tQIIME v1.9.1' % (fn_sample, fn_prep)] + '\tQIIMEq2 v1.9.1\tQIIMEq2 v1.9.1' % (fn_sample, fn_prep)] self.assertEqual(txt_obs, txt_exp) # returning configuration diff --git a/qiita_db/test/test_processing_job.py b/qiita_db/test/test_processing_job.py index b3b8be2c6..4940e039c 100644 --- a/qiita_db/test/test_processing_job.py +++ b/qiita_db/test/test_processing_job.py @@ -1205,6 +1205,18 @@ def test_add_error(self): 
qdb.exceptions.QiitaDBOperationNotPermittedError): qdb.processing_job.ProcessingWorkflow(1).add({}, None) + # test that the qdb.util.max_artifacts_in_workflow + with qdb.sql_connection.TRN: + qdb.sql_connection.perform_as_transaction( + "UPDATE settings set max_artifacts_in_workflow = 1") + with self.assertRaisesRegex( + ValueError, "Cannot add new job because it will create " + "more artifacts "): + qdb.processing_job.ProcessingWorkflow(2).add( + qdb.software.DefaultParameters(1), + req_params={'input_data': 1}, force=True) + qdb.sql_connection.TRN.rollback() + def test_remove(self): exp_command = qdb.software.Command(1) json_str = ( diff --git a/qiita_db/test/test_software.py b/qiita_db/test/test_software.py index a9e3dc5c3..0b5094f27 100644 --- a/qiita_db/test/test_software.py +++ b/qiita_db/test/test_software.py @@ -342,8 +342,8 @@ def test_create_error(self): self.outputs) # the output type doesn't exist - with self.assertRaisesRegex(ValueError, "Error creating QIIME, Split " - "libraries - wrong output, This is a " + with self.assertRaisesRegex(ValueError, "Error creating QIIMEq2, Split" + " libraries - wrong output, This is a " "command for testing - Unknown " "artifact_type: BLA!"): qdb.software.Command.create( @@ -567,7 +567,7 @@ def tearDown(self): remove(f) def test_from_name_and_version(self): - obs = qdb.software.Software.from_name_and_version('QIIME', '1.9.1') + obs = qdb.software.Software.from_name_and_version('QIIMEq2', '1.9.1') exp = qdb.software.Software(1) self.assertEqual(obs, exp) @@ -578,13 +578,13 @@ def test_from_name_and_version(self): # Wrong name with self.assertRaises(qdb.exceptions.QiitaDBUnknownIDError): - qdb.software.Software.from_name_and_version('QiIME', '1.9.1') + qdb.software.Software.from_name_and_version('QiIMEq2', '1.9.1') # Wrong version with self.assertRaises(qdb.exceptions.QiitaDBUnknownIDError): - qdb.software.Software.from_name_and_version('QIIME', '1.9.0') + qdb.software.Software.from_name_and_version('QIIMEq2', '1.9.0') def test_name(self): - self.assertEqual(qdb.software.Software(1).name, "QIIME") + self.assertEqual(qdb.software.Software(1).name, "QIIMEq2") def test_version(self): self.assertEqual(qdb.software.Software(1).version, "1.9.1") @@ -694,7 +694,7 @@ def test_from_file(self): self._clean_up_files.append(fp) with open(fp, 'w') as f: f.write(CONF_TEMPLATE % - ('QIIME', '1.9.1', + ('QIIMEq2', '1.9.1', 'Quantitative Insights Into Microbial Ecology (QIIME) ' 'is an open-source bioinformatics pipeline for ' 'performing microbiome analysis from raw DNA ' @@ -711,7 +711,7 @@ def test_from_file(self): self._clean_up_files.append(fp) with open(fp, 'w') as f: f.write(CONF_TEMPLATE % - ('QIIME', '1.9.1', 'Different description', + ('QIIMEq2', '1.9.1', 'Different description', 'source activate qiime', 'start_qiime', 'artifact transformation', '[["10.1038/nmeth.f.303", "20383131"]]', client_id, @@ -719,10 +719,10 @@ def test_from_file(self): with warnings.catch_warnings(record=True) as warns: obs = qdb.software.Software.from_file(fp) obs_warns = [str(w.message) for w in warns] - exp_warns = ['Plugin "QIIME" version "1.9.1" config file does not ' - 'match with stored information. Check the config file' - ' or run "qiita plugin update" to update the plugin ' - 'information. Offending values: description, ' + exp_warns = ['Plugin "QIIMEq2" version "1.9.1" config file does ' + 'not match with stored information. Check the config ' + 'file or run "qiita plugin update" to update the ' + 'plugin information. 
Offending values: description, ' 'environment_script, start_script'] self.assertCountEqual(obs_warns, exp_warns) @@ -829,7 +829,7 @@ def test_from_file(self): self.assertEqual(obs.client_secret, 'client_secret') def test_exists(self): - self.assertTrue(qdb.software.Software.exists("QIIME", "1.9.1")) + self.assertTrue(qdb.software.Software.exists("QIIMEq2", "1.9.1")) self.assertFalse(qdb.software.Software.exists("NewPlugin", "1.9.1")) self.assertFalse(qdb.software.Software.exists("QIIME", "2.0.0")) diff --git a/qiita_db/test/test_util.py b/qiita_db/test/test_util.py index 112cb3e6b..33766f3e4 100644 --- a/qiita_db/test/test_util.py +++ b/qiita_db/test/test_util.py @@ -45,6 +45,11 @@ def test_max_preparation_samples(self): obs = qdb.util.max_preparation_samples() self.assertEqual(obs, 800) + def test_max_artifacts_in_workflow(self): + """Test that we get the correct max_artifacts_in_workflow""" + obs = qdb.util.max_artifacts_in_workflow() + self.assertEqual(obs, 35) + def test_filepath_id_to_object_id(self): # filepaths 1, 2 belongs to artifact 1 self.assertEqual(qdb.util.filepath_id_to_object_id(1), 1) diff --git a/qiita_db/util.py b/qiita_db/util.py index df7153bb4..76d27e90b 100644 --- a/qiita_db/util.py +++ b/qiita_db/util.py @@ -416,6 +416,20 @@ def max_preparation_samples(): return qdb.sql_connection.TRN.execute_fetchlast() +def max_artifacts_in_workflow(): + r"""Returns the max number of artifacts allowed in a single workflow + + Returns + ------- + int + The max number of artifacts allowed in a single workflow + """ + with qdb.sql_connection.TRN: + qdb.sql_connection.TRN.add( + "SELECT max_artifacts_in_workflow FROM settings") + return qdb.sql_connection.TRN.execute_fetchlast() + + def compute_checksum(path): r"""Returns the checksum of the file pointed by path diff --git a/qiita_pet/__init__.py b/qiita_pet/__init__.py index 349b460d8..7fca55a7c 100644 --- a/qiita_pet/__init__.py +++ b/qiita_pet/__init__.py @@ -6,4 +6,4 @@ # The full license is in the file LICENSE, distributed with this software. 
 # -----------------------------------------------------------------------------
 
-__version__ = "2023.12"
+__version__ = "2024.02"
diff --git a/qiita_pet/handlers/analysis_handlers/base_handlers.py b/qiita_pet/handlers/analysis_handlers/base_handlers.py
index bc3de16d0..bd9c208e2 100644
--- a/qiita_pet/handlers/analysis_handlers/base_handlers.py
+++ b/qiita_pet/handlers/analysis_handlers/base_handlers.py
@@ -196,7 +196,9 @@ def analyisis_graph_handler_get_request(analysis_id, user):
         # This should never happen, but worth having a useful message
         raise ValueError('More than one workflow in a single analysis')
 
-    return {'edges': edges, 'nodes': nodes, 'workflow': wf_id,
+    # the list(set()) is to remove any duplicated nodes
+    return {'edges': list(set(edges)), 'nodes': list(set(nodes)),
+            'workflow': wf_id,
             'artifacts_being_deleted': artifacts_being_deleted}
 
 
diff --git a/qiita_pet/handlers/analysis_handlers/tests/test_base_handlers.py b/qiita_pet/handlers/analysis_handlers/tests/test_base_handlers.py
index 735c6db1d..dec54fb9c 100644
--- a/qiita_pet/handlers/analysis_handlers/tests/test_base_handlers.py
+++ b/qiita_pet/handlers/analysis_handlers/tests/test_base_handlers.py
@@ -43,17 +43,17 @@ def test_analysis_description_handler_get_request(self):
             'artifacts': {
                 4: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'}),
                 5: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'}),
                 6: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'})},
             'analysis_reservation': '',
@@ -72,17 +72,17 @@ def test_analysis_description_handler_get_request(self):
             'artifacts': {
                 4: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'}),
                 5: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'}),
                 6: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'})},
             'alert_msg': 'An artifact is being deleted from this analysis',
@@ -103,17 +103,17 @@ def test_analysis_description_handler_get_request(self):
             'artifacts': {
                 4: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'}),
                 5: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'}),
                 6: (1, 'Identification of the Microbiomes for Cannabis '
                        'Soils', ('Pick closed-reference OTUs | Split '
-                       'libraries FASTQ', 'QIIME v1.9.1'), [
+                       'libraries FASTQ', 'QIIMEq2 v1.9.1'), [
                     '1.SKB7.640196', '1.SKB8.640193', '1.SKD8.640184',
                     '1.SKM4.640180', '1.SKM9.640192'], {'1'})},
             'alert_msg': 'Error deleting artifact',
diff --git a/qiita_pet/handlers/api_proxy/__init__.py b/qiita_pet/handlers/api_proxy/__init__.py
index 85dace443..1f747aba1 100644
--- a/qiita_pet/handlers/api_proxy/__init__.py
+++ b/qiita_pet/handlers/api_proxy/__init__.py
@@ -38,7 +38,7 @@
 from .user import (user_jobs_get_req)
 from .util import check_access, check_fp
 
-__version__ = "2023.12"
+__version__ = "2024.02"
 
 __all__ = ['prep_template_summary_get_req', 'data_types_get_req',
            'study_get_req', 'sample_template_filepaths_get_req',
diff --git a/qiita_pet/handlers/api_proxy/studies.py b/qiita_pet/handlers/api_proxy/studies.py
index 431583c57..547c7621f 100644
--- a/qiita_pet/handlers/api_proxy/studies.py
+++ b/qiita_pet/handlers/api_proxy/studies.py
@@ -11,6 +11,8 @@
 from qiita_core.exceptions import IncompetentQiitaDeveloperError
 from qiita_core.util import execute_as_transaction
 from qiita_core.qiita_settings import r_client
+from qiita_db.artifact import Artifact
+from qiita_db.sql_connection import TRN
 from qiita_db.user import User
 from qiita_db.study import Study
 from qiita_db.exceptions import QiitaDBColumnError, QiitaDBLookupError
@@ -114,8 +116,8 @@ def study_get_req(study_id, user_id):
     study_info['has_access_to_raw_data'] = study.has_access(
         User(user_id), True) or study.public_raw_download
 
-    study_info['show_biom_download_button'] = 'BIOM' in [
-        a.artifact_type for a in study.artifacts()]
+    study_info['show_biom_download_button'] = len(
+        study.artifacts(artifact_type='BIOM')) != 0
     study_info['show_raw_download_button'] = any([
         True for pt in study.prep_templates() if pt.artifact is not None])
 
@@ -201,53 +203,71 @@ def study_prep_get_req(study_id, user_id):
     access_error = check_access(study_id, user_id)
     if access_error:
         return access_error
-    # Can only pass ids over API, so need to instantiate object
+
     study = Study(int(study_id))
-    prep_info = defaultdict(list)
+    prep_info = {dtype: [] for dtype in study.data_types}
     editable = study.can_edit(User(user_id))
-    for dtype in study.data_types:
-        dtype_infos = list()
-        for prep in study.prep_templates(dtype):
-            if prep.status != 'public' and not editable:
+    with TRN:
+        sql = """SELECT prep_template_id, pt.name as name, data_type,
+                        artifact_id,
+                        creation_timestamp, modification_timestamp, visibility,
+                        (SELECT COUNT(sample_id)
+                         FROM qiita.prep_template_sample
+                         WHERE prep_template_id = spt.prep_template_id)
+                            as total_samples,
+                        (SELECT COUNT(sample_id)
+                         FROM qiita.prep_template_sample
+                         WHERE prep_template_id = spt.prep_template_id
+                            AND ebi_experiment_accession != '')
+                            as ebi_experiment
+                 FROM qiita.study_prep_template spt
+                 LEFT JOIN qiita.prep_template pt USING (prep_template_id)
+                 LEFT JOIN qiita.data_type USING (data_type_id)
+                 LEFT JOIN qiita.artifact USING (artifact_id)
+                 LEFT JOIN qiita.visibility USING (visibility_id)
+                 WHERE study_id = %s
+                 GROUP BY prep_template_id, pt.name, data_type, artifact_id,
+                          creation_timestamp, modification_timestamp,
+                          visibility
+                 ORDER BY creation_timestamp"""
+
+        TRN.add(sql, [study_id])
+        for row in TRN.execute_fetchindex():
+            row = dict(row)
+            if row['visibility'] != 'public' and not editable:
                 continue
-            start_artifact = prep.artifact
+            # for those preps that have no artifact
+            if row['visibility'] is None:
+                row['visibility'] = 'sandbox'
+
             info = {
-                'name': prep.name,
-                'id': prep.id,
-                'status': prep.status,
-                'total_samples': len(prep),
-                'creation_timestamp': prep.creation_timestamp,
-                'modification_timestamp': prep.modification_timestamp
+                'name': row['name'],
+                'id': row['prep_template_id'],
+                'status': row['visibility'],
+                'total_samples': row['total_samples'],
+                'creation_timestamp': row['creation_timestamp'],
+                'modification_timestamp': row['modification_timestamp'],
+                'start_artifact': None,
+                'start_artifact_id': None,
+                'youngest_artifact': None,
+                'num_artifact_children': 0,
+                'youngest_artifact_name': None,
+                'youngest_artifact_type': None,
+                'ebi_experiment': row['ebi_experiment']
             }
-            if start_artifact is not None:
-                youngest_artifact = prep.artifact.youngest_artifact
+            if row['artifact_id'] is not None:
+                start_artifact = Artifact(row['artifact_id'])
+                youngest_artifact = start_artifact.youngest_artifact
                 info['start_artifact'] = start_artifact.artifact_type
-                info['start_artifact_id'] = start_artifact.id
+                info['start_artifact_id'] = row['artifact_id']
                 info['num_artifact_children'] = len(start_artifact.children)
                 info['youngest_artifact_name'] = youngest_artifact.name
                 info['youngest_artifact_type'] = \
                     youngest_artifact.artifact_type
                 info['youngest_artifact'] = '%s - %s' % (
                     youngest_artifact.name, youngest_artifact.artifact_type)
-                info['ebi_experiment'] = len(
-                    [v for _, v in prep.ebi_experiment_accessions.items()
-                     if v is not None])
-            else:
-                info['start_artifact'] = None
-                info['start_artifact_id'] = None
-                info['youngest_artifact'] = None
-                info['num_artifact_children'] = 0
-                info['youngest_artifact_name'] = None
-                info['youngest_artifact_type'] = None
-                info['ebi_experiment'] = 0
-
-            dtype_infos.append(info)
-
-        # default sort is in ascending order of creation timestamp
-        sorted_info = sorted(dtype_infos,
-                             key=lambda k: k['creation_timestamp'],
-                             reverse=False)
-        prep_info[dtype] = sorted_info
+
+            prep_info[row['data_type']].append(info)
 
     return {'status': 'success',
             'message': '',
diff --git a/qiita_pet/handlers/api_proxy/tests/test_processing.py b/qiita_pet/handlers/api_proxy/tests/test_processing.py
index 63b52d1b8..cd8079b7e 100644
--- a/qiita_pet/handlers/api_proxy/tests/test_processing.py
+++ b/qiita_pet/handlers/api_proxy/tests/test_processing.py
@@ -103,7 +103,7 @@ def test_job_ajax_get_req(self):
                'command': 'Split libraries FASTQ',
                'command_description': 'Demultiplexes and applies quality '
                                       'control to FASTQ data',
-               'software': 'QIIME',
+               'software': 'QIIMEq2',
                'software_version': '1.9.1'}
         self.assertEqual(obs, exp)
diff --git a/qiita_pet/handlers/api_proxy/tests/test_studies.py b/qiita_pet/handlers/api_proxy/tests/test_studies.py
index ec938099f..4c574ecfd 100644
--- a/qiita_pet/handlers/api_proxy/tests/test_studies.py
+++ b/qiita_pet/handlers/api_proxy/tests/test_studies.py
@@ -229,8 +229,7 @@ def test_study_prep_get_req_failed_EBI(self):
 
         # actual test
         obs = study_prep_get_req(study.id, user_email)
-        temp_info = defaultdict(list)
-        temp_info['16S'] = [
+        temp_info = {'16S': [
             {"status": 'sandbox', 'name': 'Prep information %d' % pt.id,
              'start_artifact': None,
              'youngest_artifact': None,
@@ -241,7 +240,7 @@ def test_study_prep_get_req_failed_EBI(self):
              'num_artifact_children': 0,
              'youngest_artifact_name': None,
              'youngest_artifact_type': None,
-             'total_samples': 3}]
+             'total_samples': 3}]}
 
         exp = {
             'info': temp_info,
diff --git a/qiita_pet/handlers/artifact_handlers/tests/test_base_handlers.py b/qiita_pet/handlers/artifact_handlers/tests/test_base_handlers.py
index f386b3c24..b1f283700 100644
--- a/qiita_pet/handlers/artifact_handlers/tests/test_base_handlers.py
+++ b/qiita_pet/handlers/artifact_handlers/tests/test_base_handlers.py
@@ -258,7 +258,7 @@ def test_artifact_summary_get_request(self):
                 'rev_comp_mapping_barcodes': 'False',
                 'min_per_read_length_fraction': '0.75',
                 'barcode_type': 'golay_12'},
-            'software_version': '1.9.1', 'software': 'QIIME'},
+            'software_version': '1.9.1', 'software': 'QIIMEq2'},
             'files': exp_files,
             'is_from_analysis': False,
             'summary': None,
diff --git a/qiita_pet/handlers/study_handlers/prep_template.py b/qiita_pet/handlers/study_handlers/prep_template.py
index e02d2c477..167f981bd 100644
--- a/qiita_pet/handlers/study_handlers/prep_template.py
+++ b/qiita_pet/handlers/study_handlers/prep_template.py
@@ -5,12 +5,13 @@
 #
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
-from os.path import join
+from os.path import join, relpath
 
 from tornado.web import authenticated
 from tornado.escape import url_escape
 import pandas as pd
 
+from qiita_core.qiita_settings import qiita_config
 from qiita_pet.handlers.util import to_int
 from qiita_pet.handlers.base_handlers import BaseHandler
 from qiita_db.util import (get_files_from_uploads_folders, get_mountpoint,
@@ -79,6 +80,14 @@ def get(self):
             fp = res['creation_job'].parameters.values['sample_sheet']
             res['creation_job_filename'] = fp['filename']
             res['creation_job_filename_body'] = fp['body']
+            summary = None
+            if res['creation_job'].status == 'success':
+                if res['creation_job'].outputs:
+                    # [0] is the id, [1] is the filepath
+                    _file = res['creation_job'].outputs[
+                        'output'].html_summary_fp[1]
+                    summary = relpath(_file, qiita_config.base_data_dir)
+            res['creation_job_artifact_summary'] = summary
 
         self.render('study_ajax/prep_summary.html', **res)
diff --git a/qiita_pet/handlers/study_handlers/tests/test_processing.py b/qiita_pet/handlers/study_handlers/tests/test_processing.py
index 0a7f616a6..53e607dcf 100644
--- a/qiita_pet/handlers/study_handlers/tests/test_processing.py
+++ b/qiita_pet/handlers/study_handlers/tests/test_processing.py
@@ -99,7 +99,7 @@ def test_get(self):
                'command': 'Split libraries FASTQ',
                'command_description': 'Demultiplexes and applies quality '
                                       'control to FASTQ data',
-               'software': 'QIIME',
+               'software': 'QIIMEq2',
                'software_version': '1.9.1'}
         self.assertEqual(loads(response.body), exp)
diff --git a/qiita_pet/nginx_example.conf b/qiita_pet/nginx_example.conf
index 400963dc7..36c3d7809 100644
--- a/qiita_pet/nginx_example.conf
+++ b/qiita_pet/nginx_example.conf
@@ -13,6 +13,12 @@ http {
         server localhost:21177;
     }
 
+    # define variables for the actions that shall be taken for websocket handshake
+    map $http_upgrade $connection_upgrade {
+        default upgrade;
+        '' close;
+    }
+
     # listening to 8080 and redirecting to https
     server {
         listen 8080;
@@ -56,6 +62,19 @@ http {
             alias /Users/username/qiita/qiita_db/support_files/test_data/;
         }
 
+        # enables communication through websockets.
+        # Currently, only endpoints /consumer/, /analysis/selected/socket/, and /study/list/socket/ use websockets
+        location ~ ^/(consumer|analysis/selected/socket|study/list/socket)/ {
+            proxy_pass $scheme://mainqiita;
+            proxy_set_header Host $http_host;
+            proxy_redirect http:// https://;
+            proxy_http_version 1.1;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header Upgrade $http_upgrade;
+            proxy_set_header Connection $connection_upgrade;
+            proxy_set_header X-Forwarded-Host $http_host;
+        }
+
         location / {
             proxy_pass $scheme://mainqiita;
             proxy_redirect off;
diff --git a/qiita_pet/templates/study_ajax/prep_summary.html b/qiita_pet/templates/study_ajax/prep_summary.html
index a943a8fdb..f7eeaf1ad 100644
--- a/qiita_pet/templates/study_ajax/prep_summary.html
+++ b/qiita_pet/templates/study_ajax/prep_summary.html
@@ -435,6 +435,9 @@
   {{name}} - ID {{prep_id}} ({{data_type}})
   {% if user_level in ('admin', 'wet-lab admin') and creation_job is not None %}
     SampleSheet
+    {% if creation_job_artifact_summary is not None %}
+      Creation Job Output
+    {% end %}
   {% end %}
   Edit name
   Prep info
diff --git a/qiita_ware/__init__.py b/qiita_ware/__init__.py
index 349b460d8..7fca55a7c 100644
--- a/qiita_ware/__init__.py
+++ b/qiita_ware/__init__.py
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2023.12"
+__version__ = "2024.02"
diff --git a/qiita_ware/commands.py b/qiita_ware/commands.py
index 83efcf5a4..249864ebe 100644
--- a/qiita_ware/commands.py
+++ b/qiita_ware/commands.py
@@ -215,40 +215,64 @@ def submit_EBI(artifact_id, action, send, test=False, test_size=False):
         LogEntry.create(
             'Runtime', 'The submission: %d is larger than allowed (%d), will '
             'try to fix: %d' % (artifact_id, max_size, total_size))
-        # transform current metadata to dataframe for easier curation
-        rows = {k: dict(v) for k, v in ebi_submission.samples.items()}
-        df = pd.DataFrame.from_dict(rows, orient='index')
-        # remove unique columns and same value in all columns
-        nunique = df.apply(pd.Series.nunique)
-        nsamples = len(df.index)
-        cols_to_drop = set(
-            nunique[(nunique == 1) | (nunique == nsamples)].index)
-        # maximize deletion by removing also columns that are almost all the
-        # same or almost all unique
-        cols_to_drop = set(
-            nunique[(nunique <= int(nsamples * .01)) |
-                    (nunique >= int(nsamples * .5))].index)
-        cols_to_drop = cols_to_drop - {'taxon_id', 'scientific_name',
-                                       'description', 'country',
-                                       'collection_date'}
-        all_samples = ebi_submission.sample_template.ebi_sample_accessions
-        samples = [k for k in ebi_submission.samples if all_samples[k] is None]
-        if samples:
-            ebi_submission.write_xml_file(
-                ebi_submission.generate_sample_xml(samples, cols_to_drop),
-                ebi_submission.sample_xml_fp)
+        def _reduce_metadata(low=0.01, high=0.5):
+            # helper function to
+            # transform current metadata to dataframe for easier curation
+            rows = {k: dict(v) for k, v in ebi_submission.samples.items()}
+            df = pd.DataFrame.from_dict(rows, orient='index')
+            # remove unique columns and same value in all columns
+            nunique = df.apply(pd.Series.nunique)
+            nsamples = len(df.index)
+            cols_to_drop = set(
+                nunique[(nunique == 1) | (nunique == nsamples)].index)
+            # maximize deletion by removing also columns that are almost all
+            # the same or almost all unique
+            cols_to_drop = set(
+                nunique[(nunique <= int(nsamples * low)) |
+                        (nunique >= int(nsamples * high))].index)
+            cols_to_drop = cols_to_drop - {'taxon_id', 'scientific_name',
+                                           'description', 'country',
+                                           'collection_date'}
+            all_samples = ebi_submission.sample_template.ebi_sample_accessions
+
+            if action == 'ADD':
+                samples = [k for k in ebi_submission.samples
+                           if all_samples[k] is None]
+            else:
+                samples = [k for k in ebi_submission.samples
+                           if all_samples[k] is not None]
+            if samples:
+                ebi_submission.write_xml_file(
+                    ebi_submission.generate_sample_xml(samples, cols_to_drop),
+                    ebi_submission.sample_xml_fp)
+
+        # let's try with the default parameters
+        _reduce_metadata()
 
         # now let's recalculate the size to make sure it's fine
         new_total_size = sum([stat(tr).st_size
                               for tr in to_review if tr is not None])
         LogEntry.create(
-            'Runtime', 'The submission: %d after cleaning is %d and was %d' % (
+            'Runtime',
+            'The submission: %d after default cleaning is %d and was %d' % (
                 artifact_id, total_size, new_total_size))
         if new_total_size > max_size:
-            raise ComputeError(
-                'Even after cleaning the submission: %d is too large. Before '
-                'cleaning: %d, after: %d' % (
+            LogEntry.create(
+                'Runtime', 'Submission %d still too big, will try more '
+                'stringent parameters' % (artifact_id))
+
+            _reduce_metadata(0.05, 0.4)
+            new_total_size = sum([stat(tr).st_size
+                                  for tr in to_review if tr is not None])
+            LogEntry.create(
+                'Runtime',
+                'The submission: %d after stringent cleaning is %d and was '
+                '%d' % (
                     artifact_id, total_size, new_total_size))
+            if new_total_size > max_size:
+                raise ComputeError(
+                    'Even after cleaning the submission: %d is too large. '
+                    'Before cleaning: %d, after: %d' % (
+                        artifact_id, total_size, new_total_size))
 
     st_acc, sa_acc, bio_acc, ex_acc, run_acc = None, None, None, None, None
     if send:
diff --git a/setup.py b/setup.py
index 7649c4fc5..caa72a46f 100644
--- a/setup.py
+++ b/setup.py
@@ -10,7 +10,7 @@
 from setuptools import setup
 from glob import glob
 
-__version__ = "2023.12"
+__version__ = "2024.02"
 
 classes = """