HoMi is your friend for Host-Microbiome dual transcriptome data processing.
HoMi is a pipeline developed to streamline processing host-microbiome dual transcriptome data, but it can also work with solely metagenomic or metatranscriptomic data for host filtering and mapping-based processing to taxonomic and functional profiles.
HoMi manages software environments and deployment on a Slurm-managed compute cluster, but it can also be run locally. For statistics tools, see the R package HoMiStats.
HoMi is currently under development! Please feel free to raise an issue or contribute.
mamba create -n HoMi python=3.11
conda activate HoMi
pip install homi-pipeline
HoMi.py <config_file> --cores <n_cores> --profile <profile_name>
An example config file is provided in tests/example_config.yaml
. This config file should contain paths to relevant files, like the metadata and databases. It can also be used to alter rule-specific resource requirements (details in example config).
Do you want to make sure your config file fits the requirements? Run check_config.py <config_filepath>
to find out! (See src/check_config.py
). HoMi will also automatically check your config before running the pipeline.
An example metadata file is provided in tests/example_metadata.csv
.
Metadata files should contain (at the minimum) a Sample column (named Sample
), a forward reads filepath column (column name specified in the config file under fwd_reads_path
), and a reverse reads filepath column (column name specified in the config file under rev_reads_path
). These filepaths should be relative to the directory from which you are running HoMi
.
If running HoMi on a cluster with SLURM, please setup a Snakemake SLURM profile. This will handle submitting batch jobs for each sample for each step of the pipeline. Then, pass this the name of this profile to HoMi.py <config_file> --profile <profile_name>
, no cores need to be passed.
src/profile_setup.py
is a script that can be used to setup a cluster profile for Slurm integration, with options for clusters with and without hyperthreaded cores.
If conda environments have already been built, and you'd like snakemake to not build them, pass the argument --conda_prebuilt
. This is particularly useful if running HoMi on a system with ARM architecture, like a Mac with M1/M2 chip.
The working directory for HoMi can be set using the --workdir
flag. This will set the "home base" for any relative filepaths within HoMi or your config, but it will not affect the filepath of the config or profile provided to HoMi.
# Create the mock community (too big for github)
python benchmarking/synthetic/create_mock_community.py tests/mock_community_sim_params.csv --work_dir tests/mock_community/
# Use the example config and example metadata provided
HoMi.py tests/example_config.yaml --cores 1
Sometimes, when snakemake unexpectedly exits (e.g., due to a server connection timeout), the directory may be locked. Pass the argument --unlock
to unlock the directory before running HoMi.py
.
- Create a symbolic link to the sequencing files.
- Trim reads using Trimmomatic
- Trims adapter sequences from reads
- Trims reads with a starting PHRED quality below
readstart_qual_min
- Trims reads with an ending PHRED quality below
readend_qual_min
- Trims reads with wherever there's a 4-base sliding window average PHRED quality score below 20
- Removes reads with a length below
min_readlen
- Generates quality report with FastQC + MultiQC
- Trim read ends using Seqtk
- Consider this a second pass, in case Trimmomatic didn't catch something
- Generates a second pass quality report with FastQC + MultiQC after the second trimming step
-
Remove host reads using Hostile
- User should pass a database in the config file. Currently supported options are
human-t2t-hla
andhuman-t2t-hla-argos985
. - OR users can pass a filepath to a bowtie2 index without the
.bt
extensions (e.g.,index/example_index
, where files exist namedindex/example_index.bt1
,index/example_index.bt2
, etc.).
- User should pass a database in the config file. Currently supported options are
-
Microbial read profiling
- (A) Run HUMAnN pipeline on nonhost reads to profile microbial reads
- Makes microshades taxa barplot from MetaPhlan and HUMAnN outputs
- (B) Run Kraken + Bracken on nonhost reads to profile microbial taxa
- Second taxonomy method for redundancy/validation/comparison
- (A) Run HUMAnN pipeline on nonhost reads to profile microbial reads
-
Align all reads against the host genome
- HoMi will by default download the GRCh38 human reference genome, but you can provide an alternative genome (fna + gtf) if it's already downloaded
- Either BBmap or HISAT2 is used to map the reads (HISAT2 by default), and featureCounts is used to generate a read count table
To use this pipeline for metagenomics/metatranscriptomics, you can add an (optional) column in the metadata, titled map_host
. If map_host
doesn't exist in your metadata, the entire pipeline (including mapping to the host genome) will be performed for all samples. If map_host
exists, it should only contain boolean values (True/False), and the host genome will only be mapped for samples where map_host
is True. This will still run host decontamination before microbial taxonomic/functional profiling.
Example:
Sample,map_host
sample_nohost,False
sample_withhost,True
snakefile
contains the bulk of the pipelinesrc/
containsHoMi.py
, a wrapper controlling the behavior of the snakemake pipeline, as well as other auxilliary utility scripts.conda_envs/
contains the conda environments for each rule in the snakemake pipelinedata/
contains relevant data, such as adapter sequences to be removed during trimming.
src/HoMi_cleanup.py
contains a script that can be used to clean up unecessary intermediate files, if you decide you don't want them. Current functionality deletes temporary files from failed HUMAnN runs and HUMANnN.bam
files across all samples provided in the metadata.src/profile_setup.py
is a script that can be used to setup a cluster profile for Slurm integration, with options for clusters with and without hyperthreaded cores. Researchers at CU Boulder/Anschutz using Alpine should run this script asprofile_setup.py --cluster_type slurm-nosmt
to setup a Slurm snakemake profile compatible with Alpine, which does not have hyperthreaded cores.