From f751481123860449eeda987dea121abbc7ecfbd2 Mon Sep 17 00:00:00 2001 From: "41898282+github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 21 Jan 2025 16:12:09 +0000 Subject: [PATCH] Deployed fa18d60 to dev with MkDocs 1.6.1 and mike 2.1.3 --- dev/404.html | 2 +- dev/index.html | 2 +- dev/sitemap.xml.gz | Bin 127 -> 127 bytes dev/user-guide/contributions/index.html | 2 +- dev/user-guide/getting-started/index.html | 2 +- dev/user-guide/output/index.html | 2 +- dev/user-guide/preparing-files/index.html | 2 +- dev/user-guide/run/index.html | 2 +- dev/user-guide/test/index.html | 2 +- dev/user-guide/troubleshooting/index.html | 2 +- 10 files changed, 9 insertions(+), 9 deletions(-) diff --git a/dev/404.html b/dev/404.html index 8808946..a54227d 100644 --- a/dev/404.html +++ b/dev/404.html @@ -1 +1 @@ - MAPLE

404 - Not found

\ No newline at end of file + MAPLE

404 - Not found

\ No newline at end of file diff --git a/dev/index.html b/dev/index.html index fd5c281..834ac52 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1 +1 @@ - MAPLE

Background

MAPLE ([M]NaseSeq [A]nalysis [P]ipe[l]i[n]e) was developed in support of NIH's Dr. Zhurkin Laboratory. It has been developed and tested solely on NIH HPC Biowulf.

\ No newline at end of file + MAPLE

Background

MAPLE ([M]NaseSeq [A]nalysis [P]ipe[l]i[n]e) was developed in support of NIH's Dr. Zhurkin Laboratory. It has been developed and tested solely on NIH HPC Biowulf.

\ No newline at end of file diff --git a/dev/sitemap.xml.gz b/dev/sitemap.xml.gz index b2e54f8279754a64d32038ed513795c657fa8da3..8ddcf7d47487a36114e02ffbfd29ad5ab01187fd 100644 GIT binary patch delta 13 Ucmb=gXP58h;AnW Contributions - MAPLE
Skip to content
\ No newline at end of file + Contributions - MAPLE
Skip to content
\ No newline at end of file diff --git a/dev/user-guide/getting-started/index.html b/dev/user-guide/getting-started/index.html index d3b45a0..802302d 100644 --- a/dev/user-guide/getting-started/index.html +++ b/dev/user-guide/getting-started/index.html @@ -1,4 +1,4 @@ - 1. Getting Started - MAPLE
Skip to content

Overview

The MAPLE (**M**NaseSeq **A**nalysis **P**ipe**l**i**n**e) GitHub repository is stored locally and is used for project deployment. Multiple projects can be deployed from this single location simultaneously, without concern.

1. Getting Started

1.1 Introduction

MAPLE begins with raw FASTQ files and performs adapter trimming, assembly, and alignment. BED files are created and, depending on user input, selected regions of interest may be used. Fragment centers (DYADs) are then determined, and histograms of occurrences are created. QC reports are also generated with each project.

The following are sub-commands used within MNaseSeq:

  • init: initialize the pipeline
  • dryrun: perform a dry run, previewing the pipeline's jobs without executing them
  • run: execute the pipeline on the Biowulf HPC
  • runlocal: execute the pipeline in a local, interactive session
  • unlock: unlock the working directory
  • reset: delete the workdir and re-initialize it

1.2 Setup Dependencies

MNaseSeq has several dependencies, listed below. These dependencies can be installed by a sysadmin; all of them are loaded automatically when running on Biowulf.

  • bedtools: "bedtools/2.30.0"
  • bowtie2: "bowtie/2-2.4.2"
  • cutadapt: "cutadapt/1.18"
  • pear: "pear/0.9.11"
  • python: "python/3.7"
  • R: "R/4.0.3"
  • samtools: "samtools/1.11"

1.3 Login to the cluster

MAPLE has been tested exclusively on the Biowulf HPC. Log in to the cluster's head node and move into the pipeline location.

# ssh into cluster's head node
+ 1. Getting Started - MAPLE      

Overview

The MAPLE (**M**NaseSeq **A**nalysis **P**ipe**l**i**n**e) GitHub repository is stored locally and is used for project deployment. Multiple projects can be deployed from this single location simultaneously, without concern.

1. Getting Started

1.1 Introduction

MAPLE begins with raw FASTQ files and performs adapter trimming, assembly, and alignment. BED files are created and, depending on user input, selected regions of interest may be used. Fragment centers (DYADs) are then determined, and histograms of occurrences are created. QC reports are also generated with each project.

The following are sub-commands used within MNaseSeq:

  • init: initialize the pipeline
  • dryrun: perform a dry run, previewing the pipeline's jobs without executing them
  • run: execute the pipeline on the Biowulf HPC
  • runlocal: execute the pipeline in a local, interactive session
  • unlock: unlock the working directory
  • reset: delete the workdir and re-initialize it
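Assuming the `./run` wrapper documented in section 3.1 (flag spellings follow its Usage string; the paths are placeholders, not defaults), the sub-commands above are typically chained like this:

```
# Illustrative only -- a typical first-pass session
./run --runmode=init --workdir=/path/to/workdir      # initialize the working directory
./run --runmode=dryrun --workdir=/path/to/workdir    # preview the jobs to be run
./run --runmode=run --workdir=/path/to/workdir       # submit to the Biowulf HPC
```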

1.2 Setup Dependencies

MNaseSeq has several dependencies, listed below. These dependencies can be installed by a sysadmin; all of them are loaded automatically when running on Biowulf.

  • bedtools: "bedtools/2.30.0"
  • bowtie2: "bowtie/2-2.4.2"
  • cutadapt: "cutadapt/1.18"
  • pear: "pear/0.9.11"
  • python: "python/3.7"
  • R: "R/4.0.3"
  • samtools: "samtools/1.11"

1.3 Login to the cluster

MAPLE has been tested exclusively on the Biowulf HPC. Log in to the cluster's head node and move into the pipeline location.

# ssh into cluster's head node
 ssh -Y $USER@biowulf.nih.gov
 

1.4 Load an interactive session

An interactive session should be started before performing any of the pipeline sub-commands, even if the pipeline is to be executed on the cluster.

# Grab an interactive node
 srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=8gb  --cpus-per-task=4 --pty bash
diff --git a/dev/user-guide/output/index.html b/dev/user-guide/output/index.html
index 421edac..b46ac73 100644
--- a/dev/user-guide/output/index.html
+++ b/dev/user-guide/output/index.html
@@ -1 +1 @@
- 4. Expected Output - MAPLE      

4. Expected Outputs

The following directories are created under the output_directory, depending on the pass of the pipeline:

First Pass (first_pass)

  • 01_trim: this directory includes trimmed FASTQ files
  • 02_assembled: this directory includes assembled FASTQ files
  • 03_aligned: this directory includes aligned BAM files and BED files
    • 01_bam: BAM files after alignment
    • 02_bed: converted bed files
    • 03_histograms: histograms of bed files

Second Pass (second_pass)

  • 04_dyads: this directory contains DYAD calculation files
    • 01_DYADs: this includes the direct DYAD calculations
    • 02_histograms: this includes histogram occurrences
    • 03_CSV: this includes the occurrence data in CSV format

Third Pass (third_pass)

  • /path/to/output/contrast: this includes the contrast file for each sample provided in the contrasts.tsv manifest

All Passes

  • log: this includes log files
    • [date of run]: the slurm output files of the pipeline sorted by pipeline start time; copies of config and manifest files used in this specific pipeline run; error reporting script
\ No newline at end of file + 4. Expected Output - MAPLE

4. Expected Outputs

The following directories are created under the output_directory, depending on the pass of the pipeline:

First Pass (first_pass)

  • 01_trim: this directory includes trimmed FASTQ files
  • 02_assembled: this directory includes assembled FASTQ files
  • 03_aligned: this directory includes aligned BAM files and BED files
    • 01_bam: BAM files after alignment
    • 02_bed: converted bed files
    • 03_histograms: histograms of bed files

Second Pass (second_pass)

  • 04_dyads: this directory contains DYAD calculation files
    • 01_DYADs: this includes the direct DYAD calculations
    • 02_histograms: this includes histogram occurrences
    • 03_CSV: this includes the occurrence data in CSV format

Third Pass (third_pass)

  • /path/to/output/contrast: this includes the contrast file for each sample provided in the contrasts.tsv manifest

All Passes

  • log: this includes log files
    • [date of run]: the slurm output files of the pipeline sorted by pipeline start time; copies of config and manifest files used in this specific pipeline run; error reporting script
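The first-pass layout above can be sketched as a directory tree (a minimal illustration; the actual parent is whatever output_directory is set to in config.yaml):

```shell
# Recreate the documented first-pass directory layout under a placeholder root
mkdir -p output/01_trim output/02_assembled \
         output/03_aligned/01_bam output/03_aligned/02_bed output/03_aligned/03_histograms
find output -type d | sort    # lists the directories created above
```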
\ No newline at end of file diff --git a/dev/user-guide/preparing-files/index.html b/dev/user-guide/preparing-files/index.html index b2ce009..1fda458 100644 --- a/dev/user-guide/preparing-files/index.html +++ b/dev/user-guide/preparing-files/index.html @@ -1,4 +1,4 @@ - 2. Preparing Files - MAPLE

2. Preparing Files

The pipeline is controlled through editing configuration and manifest files. Defaults are found in the /WORKDIR/ after initialization.

2.1 Configs

The configuration files control parameters and software of the pipeline. These files are listed below:

  • resources/cluster.yaml
  • resources/tools.yaml
  • config.yaml

2.1.1 Cluster YAML (REQUIRED)

The cluster configuration file dictates the resources to be used during submission to the Biowulf HPC. There are two ways to control these parameters: first, by changing the default settings, and second, by creating or editing entries for individual rules. These parameters should be edited with caution, and only after significant testing.

2.1.2 Tools YAML (REQUIRED)

The tools configuration file dictates the version of each tool that is being used. Updating the versions may break specific rules if versions are not backwards compatible with the defaults listed.

2.1.3 Config YAML (REQUIRED)

There are several groups of parameters the user can edit to control the various aspects of the pipeline. These are:

  • Folders and Paths
    • These parameters include the input and output files of the pipeline, as well as all manifest names.
  • User parameters
    • These parameters control pipeline features, including thresholds and whether to perform optional processes.

2.2 Preparing Manifests

There are two manifests used for the pipeline. These files describe information on the samples and desired contrasts. The paths of these files are defined in the config.yaml file. These files are:

  • sampleManifest (REQUIRED for all Passes)
  • contrastManifest (REQUIRED for third_pass)

2.2.1 Samples Manifest

This manifest contains sample-level information. It includes the following column headers: sampleName type path_to_R1_fastq path_to_R2_fastq

  • sampleName: a unique sample ID associated with the FASTQ files. This may be a shorthand name, and it will be used throughout the analysis.
  • type: demographic information regarding the sample; for example, 'tumor'
  • path_to_R1_fastq: the full path to the R1.fastq.gz file
  • path_to_R2_fastq: the full path to the R2.fastq.gz file

An example sampleManifest file is shown below:

sampleName  type    path_to_R1_fastq                path_to_R2_fastq
+ 2. Preparing Files - MAPLE      

2. Preparing Files

The pipeline is controlled through editing configuration and manifest files. Defaults are found in the /WORKDIR/ after initialization.

2.1 Configs

The configuration files control parameters and software of the pipeline. These files are listed below:

  • resources/cluster.yaml
  • resources/tools.yaml
  • config.yaml

2.1.1 Cluster YAML (REQUIRED)

The cluster configuration file dictates the resources to be used during submission to the Biowulf HPC. There are two ways to control these parameters: first, by changing the default settings, and second, by creating or editing entries for individual rules. These parameters should be edited with caution, and only after significant testing.
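As a sketch only (the rule name and values here are assumptions, not the shipped defaults), cluster configs of this kind usually pair a `__default__` block with per-rule overrides:

```yaml
__default__:        # settings applied to every rule unless overridden
  mem: 8g
  threads: 4
  time: "04:00:00"
some_rule:          # hypothetical per-rule override
  mem: 16g
  threads: 8
```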

2.1.2 Tools YAML (REQUIRED)

The tools configuration file dictates the version of each tool that is being used. Updating the versions may break specific rules if versions are not backwards compatible with the defaults listed.
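Based on the dependency versions listed in section 1.2, the entries plausibly take a tool-to-module shape such as the following (the key names are an assumption; the versions are those documented above):

```yaml
bedtools: "bedtools/2.30.0"
bowtie2: "bowtie/2-2.4.2"
cutadapt: "cutadapt/1.18"
pear: "pear/0.9.11"
python: "python/3.7"
R: "R/4.0.3"
samtools: "samtools/1.11"
```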

2.1.3 Config YAML (REQUIRED)

There are several groups of parameters the user can edit to control the various aspects of the pipeline. These are:

  • Folders and Paths
    • These parameters include the input and output files of the pipeline, as well as all manifest names.
  • User parameters
    • These parameters control pipeline features, including thresholds and whether to perform optional processes.

2.2 Preparing Manifests

There are two manifests used for the pipeline. These files describe information on the samples and desired contrasts. The paths of these files are defined in the config.yaml file. These files are:

  • sampleManifest (REQUIRED for all Passes)
  • contrastManifest (REQUIRED for third_pass)

2.2.1 Samples Manifest

This manifest contains sample-level information. It includes the following column headers: sampleName type path_to_R1_fastq path_to_R2_fastq

  • sampleName: a unique sample ID associated with the FASTQ files. This may be a shorthand name, and it will be used throughout the analysis.
  • type: demographic information regarding the sample; for example, 'tumor'
  • path_to_R1_fastq: the full path to the R1.fastq.gz file
  • path_to_R2_fastq: the full path to the R2.fastq.gz file

An example sampleManifest file is shown below:

sampleName  type    path_to_R1_fastq                path_to_R2_fastq
 Sample1     tumor   /path/to/sample1.R1.fastq.gz    /path/to/sample1.R2.fastq.gz
 Sample2     tumor   /path/to/sample2.R1.fastq.gz    /path/to/sample2.R2.fastq.gz
 Sample3     tumor   /path/to/sample3.R1.fastq.gz    /path/to/sample3.R2.fastq.gz
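As a hedged illustration (the file name and paths are placeholders, not pipeline defaults), a manifest like the one above can be generated and sanity-checked from the shell:

```shell
# Write a minimal sampleManifest (tab-separated); paths are placeholders
printf 'sampleName\ttype\tpath_to_R1_fastq\tpath_to_R2_fastq\n'  > sampleManifest.tsv
printf 'Sample1\ttumor\t/path/to/sample1.R1.fastq.gz\t/path/to/sample1.R2.fastq.gz\n' >> sampleManifest.tsv

# Confirm every row has exactly four tab-separated columns
awk -F'\t' 'NF != 4 { bad = 1 } END { print (bad ? "INVALID" : "OK") }' sampleManifest.tsv
```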
diff --git a/dev/user-guide/run/index.html b/dev/user-guide/run/index.html
index 53ae14d..6753e7b 100644
--- a/dev/user-guide/run/index.html
+++ b/dev/user-guide/run/index.html
@@ -1,4 +1,4 @@
- 3. Running the Pipeline - MAPLE      

3. Running the Pipeline

3.1 Pipeline Overview

The Snakemake workflow has multiple options:

Usage:
+ 3. Running the Pipeline - MAPLE      

3. Running the Pipeline

3.1 Pipeline Overview

The Snakemake workflow has multiple options:

Usage:
     ./run -m/--runmode=<RUNMODE> -w/--workdir=<WORKDIR>
 
     Required Arguments:
diff --git a/dev/user-guide/test/index.html b/dev/user-guide/test/index.html
index 4c80858..f748bf1 100644
--- a/dev/user-guide/test/index.html
+++ b/dev/user-guide/test/index.html
@@ -1,4 +1,4 @@
- 5. Running Example Data - MAPLE      

5. Pipeline Tutorial

Welcome to the MNaseSeq Pipeline Tutorial!

5.1 Getting Started

Review the information on the Getting Started page for a complete overview of the pipeline. The tutorial below uses test data available on the NIH Biowulf HPC only. All example code assumes you are running v1.0 of the pipeline, from the shared [tobedetermined] storage directory, using test_1 data.

A. Change working directory to the MAPLE repository

# general format
+ 5. Running Example Data - MAPLE      

5. Pipeline Tutorial

Welcome to the MNaseSeq Pipeline Tutorial!

5.1 Getting Started

Review the information on the Getting Started page for a complete overview of the pipeline. The tutorial below uses test data available on the NIH Biowulf HPC only. All example code assumes you are running v1.0 of the pipeline, from the shared [tobedetermined] storage directory, using test_1 data.

A. Change working directory to the MAPLE repository

# general format
 cd /path/to/pipeline/[version number]
 
 # example
diff --git a/dev/user-guide/troubleshooting/index.html b/dev/user-guide/troubleshooting/index.html
index 0f47661..60cba49 100644
--- a/dev/user-guide/troubleshooting/index.html
+++ b/dev/user-guide/troubleshooting/index.html
@@ -1,4 +1,4 @@
- Troubleshooting - MAPLE      

Troubleshooting

Recommended steps to troubleshoot the pipeline.

1.1 Email

Check your inbox for an email regarding the pipeline failure. You will receive an email from slurm@biowulf.nih.gov with the subject: Slurm Job_id=[#] Name=ccbr1214 Failed, Run time [time], FAILED, ExitCode 1

1.2 Error Report

Run the error report script:

cd /[output_dir]/log/[time_of_run]
+ Troubleshooting - MAPLE      

Troubleshooting

Recommended steps to troubleshoot the pipeline.

1.1 Email

Check your inbox for an email regarding the pipeline failure. You will receive an email from slurm@biowulf.nih.gov with the subject: Slurm Job_id=[#] Name=ccbr1214 Failed, Run time [time], FAILED, ExitCode 1

1.2 Error Report

Run the error report script:

cd /[output_dir]/log/[time_of_run]
 sh 00_create_error_report.sh
 cat error.log
 

Review the report for the rules that failed and the corresponding sample information. An example report is listed below:

The following error(s) were found in rules: