RNA-MosaicHunter

1. Software requirement

java 1.8 (https://java.com/en/)
UCSC blat (http://genome.ucsc.edu/FAQ/FAQblat.html#blat3)
STAR 2.5.4
Picard 1.138
GATK 3.6
SAMtools 0.1.19
BEDTools 2.23.0

2. Prepare reference

RNA-MosaicHunter requires a fasta file (.fasta, .fa) for your reference genome. It is better to make sure that your reference file has the same name and order of contigs to your .bam file(s) and .bed file(s). Reads aligned to any contigs which do not appear in the reference file will be ignored. When running RNA-MosaicHunter, you need to set the top parameter reference_file.

3. Prepare data for RNA-MosaicHunter Input

RNA-MosaicHunter currently accepts aligned reads from RNA sequencing (RNA-seq). We recommend STAR for read mapping and GATK/Picard for read pre-processing. These pre-processing steps are important in order to reduce false positives in identifying mosaic sites. The following is an example of how the reads are pre-processed.

Note that default config file and resources files are designed for GRCh37 (hg19). To run on other versions (e.g. GRCh38), users should prepare hg38 reference files and change the parameters "valid_references", "chr_x_name", "chr_y_name" in the config file to the corresponding chromosome names (e.g. with "chr" for GRCh38).

3.1 Alignment -- STAR

STAR --runThreadN ${THREAD_NUM} --twopassMode Basic --outSAMattributes All --outSAMtype BAM Unsorted --readFilesCommand zcat --genomeDir ${REFERENCE_DIR}/human_v37_contig_hg19_hs37d5_STAR_index10_gencode19 --readFilesIn "rnaseq/fastq/${IND_NAME}/${SAMPLE_ID}_1.fastq.gz" "rnaseq/fastq/${IND_NAME}/${SAMPLE_ID}_2.fastq.gz" --outFileNamePrefix "$OUTPUT_DIR/${IND_NAME}/${SAMPLE_ID}/"

3.2 Remove duplicates -- Picard

java -jar picard.jar AddOrReplaceReadGroups INPUT=rnaseq/STAR/${IND_NAME}/${SAMPLE_ID}/Aligned.out.bam OUTPUT=rnaseq/Picard/${IND_NAME}_${SAMPLE_ID}.sorted.bam SO=coordinate ID=${IND_NAME}_${SAMPLE_ID} LB=unknown PL=illumina SM=${IND_NAME}_${SAMPLE_ID} PU=unknown
java -jar picard.jar MarkDuplicates INPUT=rnaseq/Picard/${IND_NAME}_${SAMPLE_ID}.sorted.bam OUTPUT=rnaseq/Picard/${IND_NAME}_${SAMPLE_ID}.masked.bam M=rnaseq/Picard/${IND_NAME}_${SAMPLE_ID}.matrix REMOVE_DUPLICATES=true AS=true VALIDATION_STRINGENCY=SILENT
samtools index rnaseq/Picard/${IND_NAME}_${SAMPLE_ID}.masked.bam

3.3 Split'N'Trim and mapping quality reassignment -- GATK

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ${REFERENCE_DIR}/human_v37_contig_hg19_hs37d5/human_v37_contig_hg19_hs37d5.fasta -I rnaseq/Picard/${IND_NAME}_${SAMPLE_ID}.masked.f.bam -o rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

3.4 Local realignment -- GATK

java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -nt ${THREAD_NUM} -R ${REFERENCE_DIR}/human_v37_contig_hg19_hs37d5/human_v37_contig_hg19_hs37d5.fasta -I rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.split.bam -o rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.intervals

java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ${REFERENCE_DIR}/human_v37_contig_hg19_hs37d5/human_v37_contig_hg19_hs37d5.fasta -I rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.split.bam -targetIntervals rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.intervals -o rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.realigned.bam
echo "Job ended at $(date)"

3.5 Read calibration -- GATK

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ${REFERENCE_DIR}/human_v37_contig_hg19_hs37d5/human_v37_contig_hg19_hs37d5.fasta -I rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.realigned.bam -knownSites ${REFERENCE_DIR}/dbsnp_137.b37.vcf -o rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.recal_data.grp

java -jar GenomeAnalysisTK.jar -T PrintReads -nct ${THREAD_NUM} -R ${REFERENCE_DIR}/human_v37_contig_hg19_hs37d5/human_v37_contig_hg19_hs37d5.fasta -I rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.realigned.bam -BQSR rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.recal_data.grp -o rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.BQSR.bam

3.6 Remove improper-paired and multiple-hit reads -- Samtools

samtools view -f 2 -F 768 -h rnaseq/GATK/${IND_NAME}_${SAMPLE_ID}.BQSR.bam | samtools view -Sb - > rnaseq/bam_ready/${IND_NAME}_${SAMPLE_ID}.final.bam
samtools index rnaseq/bam_ready/${IND_NAME}_${SAMPLE_ID}.final.bam

3.7 Calculate coverage -- Bedtools (not necessary)

/BEDtools-2.23.0/bin/coverageBed -abam rnaseq/bam_ready/${IND_NAME}_${SAMPLE_ID}.final.bam -b ${REFERENCE_DIR}/human_gene_gencode19.b37.exon.bed -hist | awk '$1=="all"' > rnaseq/coverageBed/${IND_NAME}_${SAMPLE_ID}.tsv

4. Example data for RNA-MosaicHunter Input

{Path to example data}

5. Run RNA-MosaicHunter

5.1 RNA-MosaicHunter

java -Xmx64G -jar ${MOSAICHUNTER_DIR}/build/mosaichunter.jar -C ${MOSAICHUNTER_CONFIG} -P input_file=rnaseq/bam_ready/${IND_NAME}_${SAMPLE_ID}.final.bam -P mosaic_filter.sex=${SEX} -P output_dir=${OUTPUT_DIR} -P common_site_filter.bed_file=${ERROR_PRONE_BED} -P misaligned_reads_filter.max_NM=3

cat rnaseq/MosaicHunter/${IND_NAME}/${SAMPLE_ID}/final.passed.tsv | awk '$3==$7||$3==$9' | awk '$11~"N/A"&&$11~"1.0"' > rnaseq/MosaicHunter/${IND_NAME}/${SAMPLE_ID}/final.clean.tsv

5.2 RNA-editing filter

We have two different mode of RNA editint filters:

full-removal: removes all A>G / T>C mutations

Rscript RNA_Editing_Filter_MH.R final.clean.tsv full-removal final.clean.filter_RNA_edit.tsv

filter-based-removal: removes RNA editing sites reported in DARNED [1] and REDIportal [2], and A>G mutations on transcribed strand and T>C mutations on untranscribed strand

Rscript RNA_Editing_Filter_MH.R final.clean.tsv filter-based-removal final.clean.filter_RNA_edit.tsv

6. Output format

All of the output files are in the specified directory output_dir, including the final candidate list final.passed.tsv, the filtered and remained remaining variant lists of each filters (.filtered.tsv, .passed.tsv), and other temporary files, such as blat input and output.

The final.clean.filter_RNA_edit.tsv files are in tab-separated format, whose columns' meanings for single mode are listed below:

Contig / chromosome name
Position / coordinate on the contig (1-based)
Base of reference allele
Total depth of this site
Pileuped sequencing bases at this site
Pileuped sequencing baseQs at this site
Base of major allele
Depth of major allele
Base of minor allele
Depth of minor allele
dbSNP allele frequency of major and minor alleles
log10 prior probability of major-homozygous genotype
log10 prior probability of heterozygous genotype
log10 prior probability of minor-homozygous genotype
log10 prior probability of mosaic genotype
log10 likelihood of major-homozygous genotype
log10 likelihood of heterozygous genotype
log10 likelihood of minor-homozygous genotype
log10 likelihood of mosaic genotype
log10 posterior probability of major-homozygous genotype
log10 posterior probability of heterozygous genotype
log10 posterior probability of minor-homozygous genotype
log10 posterior probability of mosaic genotype
Mosaic posterior probability

You can get a high-confidence candidate list of mosaic sites by filtering the 24th column (mosaic posterior probability) or sorting this column from high to low.

References

[1] Kiran,A. and Baranov,P.V. (2010) DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics, 26, 1772–1776.
[2] Picardi,E., D’Erchia,A.M., Lo Giudice,C. and Pesole,G. (2017) REDIportal: a comprehensive database of A-to-I RNA editing events in humans. Nucleic Acids Res., 45, D750–D757.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RNA-MosaicHunter

1. Software requirement

2. Prepare reference

3. Prepare data for RNA-MosaicHunter Input

3.1 Alignment -- STAR

3.2 Remove duplicates -- Picard

3.3 Split'N'Trim and mapping quality reassignment -- GATK

3.4 Local realignment -- GATK

3.5 Read calibration -- GATK

3.6 Remove improper-paired and multiple-hit reads -- Samtools

3.7 Calculate coverage -- Bedtools (not necessary)

4. Example data for RNA-MosaicHunter Input

5. Run RNA-MosaicHunter

5.1 RNA-MosaicHunter

5.2 RNA-editing filter

6. Output format

References

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
build		build
conf		conf
lib		lib
resources		resources
src/cn/edu/pku/cbi/mosaichunter		src/cn/edu/pku/cbi/mosaichunter
.gitattributes		.gitattributes
README.md		README.md
RNA_Editing_Filter_MH.R		RNA_Editing_Filter_MH.R
build.xml		build.xml

AugustHuang/RNA-MosaicHunter

Folders and files

Latest commit

History

Repository files navigation

RNA-MosaicHunter

1. Software requirement

2. Prepare reference

3. Prepare data for RNA-MosaicHunter Input

3.1 Alignment -- STAR

3.2 Remove duplicates -- Picard

3.3 Split'N'Trim and mapping quality reassignment -- GATK

3.4 Local realignment -- GATK

3.5 Read calibration -- GATK

3.6 Remove improper-paired and multiple-hit reads -- Samtools

3.7 Calculate coverage -- Bedtools (not necessary)

4. Example data for RNA-MosaicHunter Input

5. Run RNA-MosaicHunter

5.1 RNA-MosaicHunter

5.2 RNA-editing filter

6. Output format

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages