- Raw read files in fastq format from NIH SRA database BioProject PRJNA910585
- ./Fasta/SARS2-FP_with_flank.fa: Fusion peptide sequences with 21 nt upstream (5' flank)
- ./Fasta/FP_ref.fa: Reference (i.e. wild type) amino acid sequences (primer regions not included)
- ./data/Tree_aa_fitness.csv: S mutational fitness estimated from phylogeny by Bloom & Neher. The original data are from here.
- ./data/BA1_DMS_muteffects_observed.csv: S mutational fitness measured by a pseudovirus system by Dadonaite et al. The original data are from here.
-
Generating forward (NNK + internal barcode) and reverse (constant) primers
python3 script/lib_primer_design.py
- Input file:
- Output files:
-
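As a rough illustration of the NNK step above, the sketch below replaces one codon at a time with `NNK` in a template that carries a 21 nt 5' flank. The function name `nnk_forward_primers` and its parameters are hypothetical, the coding region is assumed to start in frame right after the flank, and the internal barcode is not modeled. It is not taken from `script/lib_primer_design.py`.

```python
def nnk_forward_primers(template, flank_len=21, n_codons=None):
    """Return {codon_index: sequence} with that codon replaced by NNK.

    template  : 5' flank followed by the coding region (assumed in frame)
    flank_len : length of the 5' flank (21 nt per the FASTA description)
    n_codons  : number of codons to randomize (default: all complete codons)
    """
    template = template.upper()
    coding_len = len(template) - flank_len
    if n_codons is None:
        n_codons = coding_len // 3
    primers = {}
    for i in range(n_codons):
        start = flank_len + 3 * i
        # Keep everything else constant and degenerate only this codon.
        primers[i + 1] = template[:start] + "NNK" + template[start + 3:]
    return primers
```

Real primers would normally be trimmed to a practical length around the randomized codon; that trimming is left out here.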
Generating barcode file
python3 script/check_barcode.py
- Input files:
- Output file:
-
Merge overlapping paired-end reads using PEAR
pear -f [FASTQ FILE FOR FORWARD READ] -r [FASTQ FILE FOR REVERSE READ] -o [OUTPUT FASTQ FILE]
- Output files should be placed in the fastq_merged/ folder and named as described in ./doc/filename_merged_fastq.tsv
-
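When many samples need merging, the PEAR call above can be assembled in a loop. This is a minimal sketch: the sample name and input paths are made up for illustration, and the real output names should follow ./doc/filename_merged_fastq.tsv. Only the `-f`, `-r`, and `-o` options shown above are used.

```python
import shlex

def pear_command(forward_fastq, reverse_fastq, output_prefix):
    """Build the PEAR argument list for one paired-end sample."""
    return ["pear", "-f", forward_fastq, "-r", reverse_fastq, "-o", output_prefix]

# Hypothetical sample table; replace with the real file names.
samples = {
    "sample1": ("fastq/sample1_R1.fastq", "fastq/sample1_R2.fastq"),
}

for name, (r1, r2) in samples.items():
    cmd = pear_command(r1, r2, f"fastq_merged/{name}")
    # Print the command; to execute it, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```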
Counting variants based on nucleotide sequences
python3 script/FP_fastq2count.py
- Input files:
- Merged read files in fastq_merged/ folder
- Output files:
- result/FP_DMS_count_nuc.tsv
-
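The core of nucleotide-level counting can be sketched as tallying identical sequences across the merged reads. The function name `count_fastq_sequences` is hypothetical, and any primer, length, or barcode filtering that `script/FP_fastq2count.py` may perform is not reproduced here.

```python
from collections import Counter

def count_fastq_sequences(lines):
    """Count identical sequences in FASTQ text given as an iterable of lines."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each 4-line record is the sequence
            counts[line.strip()] += 1
    return counts
```

A typical call would be `with open(path) as fh: counts = count_fastq_sequences(fh)`, where `path` points to one merged FASTQ file.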
Convert nucleotide sequences to amino acid mutations
python3 script/FP_count_nuc2aa.py
- Input files:
- ./data/barcodes.tsv
- ./Fasta/FP_ref.fa
- result/FP_DMS_count_nuc.tsv
- Output files:
-
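Converting a nucleotide variant to amino acid mutations amounts to translating it and comparing it with the reference peptide. The sketch below uses the standard genetic code; the names `translate` and `call_aa_mutations`, the `first_position` parameter, and the `A2G`-style labels joined by `-` are assumptions for illustration, not the conventions of `script/FP_count_nuc2aa.py`.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(nuc):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    nuc = nuc.upper()
    end = len(nuc) - len(nuc) % 3
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X") for i in range(0, end, 3))

def call_aa_mutations(ref_aa, nuc, first_position=1):
    """Return e.g. 'A2G-L5P' for a nucleotide variant, or 'WT' if unchanged."""
    pep = translate(nuc)
    if len(pep) != len(ref_aa):
        raise ValueError("variant and reference differ in length")
    muts = [f"{r}{pos}{m}"
            for pos, (r, m) in enumerate(zip(ref_aa, pep), start=first_position)
            if r != m]
    return "-".join(muts) if muts else "WT"
```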
Convert nucleotide sequences to codon variants
python3 script/FP_count_nuc2codon.py
- Input files:
- ./data/barcodes.tsv
- ./Fasta/FP_ref.fa
- result/FP_DMS_count_nuc.tsv
- Output files:
-
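At the codon level, a variant can be described by the codons that differ from a reference nucleotide sequence. This sketch assumes a nucleotide reference is available (the README lists only the amino acid reference), and the name `call_codon_variants` and its output format are hypothetical.

```python
def call_codon_variants(ref_nuc, nuc, first_position=1):
    """Return a list of (position, reference_codon, observed_codon) tuples."""
    ref_nuc, nuc = ref_nuc.upper(), nuc.upper()
    if len(ref_nuc) != len(nuc) or len(nuc) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    variants = []
    for i in range(0, len(nuc), 3):
        ref_codon, codon = ref_nuc[i:i + 3], nuc[i:i + 3]
        if ref_codon != codon:
            variants.append((first_position + i // 3, ref_codon, codon))
    return variants
```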
Compute fitness
python3 script/FP_count2fit.py
- Input files:
- Output file:
-
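The README does not state the fitness formula used by `script/FP_count2fit.py`. One common choice in deep mutational scanning, shown here purely as an assumption, is the log10 enrichment of a variant between pre- and post-selection counts, normalized to the wild type. The function name and the pseudocount of 1 are hypothetical.

```python
import math

def enrichment_fitness(pre, post, wt_pre, wt_post, pseudocount=1):
    """log10 of (variant post/pre ratio) divided by (wild-type post/pre ratio)."""
    variant_ratio = (post + pseudocount) / (pre + pseudocount)
    wt_ratio = (wt_post + pseudocount) / (wt_pre + pseudocount)
    return math.log10(variant_ratio / wt_ratio)
```

Under this definition a value of 0 means the variant behaves like wild type, and negative values indicate depletion during selection.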
Convert B factor in PDB file into mean fitness value
python3 script/convert_Bfactor_to_fit.py
- Input file:
- Output file:
-
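Storing a per-residue value in the B-factor field lets structure viewers color by it. The sketch below overwrites columns 61-66 (the temperature factor) of ATOM and HETATM records, looking up the value by the residue number in columns 23-26, as defined by the PDB format. The function name `set_bfactor` and the `default` used for residues without a value are assumptions.

```python
def set_bfactor(pdb_lines, value_by_resnum, default=0.0):
    """Return PDB lines with the B-factor field replaced by a per-residue value."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            resnum = int(line[22:26])            # columns 23-26: residue number
            value = value_by_resnum.get(resnum, default)
            # columns 61-66: temperature factor, written as a 6-character float
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        out.append(line)
    return out
```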
Plot correlation between replicates and compare silent/missense/nonsense
Rscript script/plot_QC.R
- Input file:
- Output files:
- graph/QC_*.png
-
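A typical quality check is that silent variants score near wild type while nonsense variants score lowest. A minimal sketch of that grouping follows; the function names and the `(ref_aa, mut_aa, fitness)` record layout are assumptions, and `script/plot_QC.R` may organize the data differently.

```python
from collections import defaultdict
from statistics import mean

def classify_mutation(ref_aa, mut_aa):
    """Classify a single-codon change by its effect on the encoded residue."""
    if mut_aa == ref_aa:
        return "silent"
    if mut_aa == "*":
        return "nonsense"
    return "missense"

def mean_fitness_by_class(records):
    """records: iterable of (ref_aa, mut_aa, fitness) tuples."""
    groups = defaultdict(list)
    for ref_aa, mut_aa, fitness in records:
        groups[classify_mutation(ref_aa, mut_aa)].append(fitness)
    return {cls: mean(values) for cls, values in groups.items()}
```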
Plot correlation between fitness measurements in this study and those in previous studies
python3 script/plot_cor_measures.py
- Input file:
- Output files:
- graph/cor_*.png
-
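Comparing two fitness datasets requires restricting both to the mutations they share before computing a correlation. The sketch below does this with dictionaries keyed by a mutation label and computes the Pearson coefficient by hand. The function names and the label format are hypothetical; `script/plot_cor_measures.py` may use a different statistic or library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlate_shared(this_study, other_study):
    """Return (r, n) over mutations present in both {mutation: value} dicts."""
    shared = sorted(set(this_study) & set(other_study))
    if len(shared) < 2:
        raise ValueError("need at least two shared mutations")
    r = pearson([this_study[m] for m in shared],
                [other_study[m] for m in shared])
    return r, len(shared)
```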
Plot heatmap for the fitness of individual mutations
Rscript script/plot_heatmap_fit.R
- Input file:
- Output file:
- graph/FP_fit_heatmap_*.png
-
Plot mean fitness (i.e. mutational tolerance) of individual residue positions
Rscript script/plot_mean_fit.R
- Input file:
- Output file:
-
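Mean fitness per residue position can be sketched as averaging the fitness of all substitutions at that position. The sketch assumes mutation labels of the form `A5G`, and it excludes silent and (optionally) nonsense changes; both the label format and the exclusion rule are assumptions, not confirmed behavior of the scripts in this repository.

```python
import re
from collections import defaultdict
from statistics import mean

def mean_fitness_per_position(fitness_by_mutation, skip_nonsense=True):
    """Average fitness over substitutions at each position.

    fitness_by_mutation: dict such as {"A5G": 0.8, "A5*": -1.2}
    """
    by_position = defaultdict(list)
    for mutation, fitness in fitness_by_mutation.items():
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z*])", mutation)
        if match is None:
            continue  # ignore labels that are not single substitutions
        ref_aa, position, mut_aa = match.groups()
        if ref_aa == mut_aa or (skip_nonsense and mut_aa == "*"):
            continue
        by_position[int(position)].append(fitness)
    return {pos: mean(values) for pos, values in sorted(by_position.items())}
```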
Plot mean fitness on structure
pymol script/plot_Bfactor_as_fit.pml
- Input file:
- Output file:
-
Plot heatmap for the antibody escape of individual mutations
Rscript script/plot_heatmap_escape.R
- Input file:
- Output file:
- graph/FP_escape_*.png
-
Plot heatmap for the codon variants
Rscript script/plot_heatmap_codon_freq.R
- Input file:
- Output file:
- graph/FP_codon_freq_*.png
-
Identify the number of amino acid mutations in each merged read
python3 script/analyze_mut_rate.py
- Input file:
- Merged read files in fastq_merged/ folder
- ./Fasta/FP_ref.fa
- Output file:
-
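Once each merged read has been translated, the number of amino acid mutations per read is the count of positions that differ from the reference. The sketch below builds a histogram of those counts; the function name is hypothetical, translation is assumed to have been done already, and reads whose length differs from the reference are simply skipped.

```python
from collections import Counter

def mutation_count_histogram(ref_aa, peptides):
    """Return Counter {number_of_mutations: number_of_reads}."""
    histogram = Counter()
    for peptide in peptides:
        if len(peptide) != len(ref_aa):
            continue  # skip reads with insertions, deletions, or truncation
        n_mut = sum(r != p for r, p in zip(ref_aa, peptide))
        histogram[n_mut] += 1
    return histogram
```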
Plot mutation rate
Rscript script/plot_lib_mut_count.R
- Input file:
- Output file: