REdiscoverTE

IMPORTANT. THIS IS NOT MY SOFTWARE.

IT IS IMPORTED (and modified) FROM NON-GITHUB SOURCES.

https://www.nature.com/articles/s41467-019-13035-2

http://research-pub.gene.com/REdiscoverTEpaper/

module load r
R

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.15")

BiocManager::install(c('tibble','readr','dplyr','Biobase','edgeR','parallel','EDASeq','ggplot2','RColorBrewer','pheatmap','gridExtra','grid','gtable','biomaRt'))

Create Salmon Index

zcat original/REdiscoverTE/rollup_annotation/REdiscoverTE_whole_transcriptome_hg38-20/*.fa.gz > genome.fasta

salmon index \
	-t genome.fasta \
	--threads 64 \
	-i REdiscoverTE
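
Before building the index, it may be worth confirming that the concatenated genome.fasta holds as many sequences as the split, gzipped pieces it came from. The following is a minimal sketch under that assumption; `count_fasta_records` is a hypothetical helper name and is not part of REdiscoverTE or Salmon.

```shell
# Hypothetical helper (not part of REdiscoverTE or Salmon):
# count FASTA records, i.e. header lines starting with ">",
# across any mix of plain and gzip-compressed files.
# The body runs in a subshell so its variables do not leak into the caller.
count_fasta_records() (
	total=0
	for f in "$@"; do
		case "$f" in
			*.gz) n=$(gzip -dc "$f" | grep -c '^>' || true) ;;
			*)    n=$(grep -c '^>' "$f" || true) ;;
		esac
		total=$((total + n))
	done
	echo "$total"
)
```

Running `count_fasta_records original/REdiscoverTE/rollup_annotation/REdiscoverTE_whole_transcriptome_hg38-20/*.fa.gz` and `count_fasta_records genome.fasta` should print the same number if the concatenation is complete.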

Align Samples to Salmon Index

for f in "${SAMPLE_DIR}"/???.fastq.gz ; do

	echo "$f"

	base=${f%.fastq.gz}
	echo "$base"

	salmon quant --seqBias --gcBias \
		--index REdiscoverTE \
		--libType A --unmatedReads "${f}" \
		--validateMappings \
		-o "${base}.salmon.REdiscoverTE" \
		--threads 8

done
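
After the loop finishes, a quick check that every FASTQ file produced a non-empty quant.sf can catch failed runs before the rollup step. This is a sketch only: `list_missing_quant` is a hypothetical helper name, it assumes the `<base>.salmon.REdiscoverTE` output naming used in the loop above, and it matches every `*.fastq.gz` file rather than only three-character names.

```shell
# Hypothetical helper (not part of REdiscoverTE or Salmon):
# print each FASTQ in a directory whose Salmon output directory
# has no non-empty quant.sf. Exit status is 1 if any are missing.
# The body runs in a subshell, so "exit" only ends the function.
list_missing_quant() (
	dir=$1
	missing=0
	for f in "$dir"/*.fastq.gz; do
		[ -e "$f" ] || continue   # the glob matched nothing
		base=${f%.fastq.gz}
		if [ ! -s "${base}.salmon.REdiscoverTE/quant.sf" ]; then
			echo "$f"
			missing=1
		fi
	done
	exit "$missing"
)
```

For example, `list_missing_quant "${SAMPLE_DIR}"` prints nothing and returns 0 when every sample has been quantified.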

Rollup / Aggregate Alignments to RE repName

echo -e "sample\tquant_sf_path" > "${SAMPLE_DIR}/REdiscoverTE.tsv"
ls -1 "${SAMPLE_DIR}"/*.salmon.REdiscoverTE/quant.sf \
		| awk -F/ '{split($(NF-1),a,".");print a[1]"\t"$0}' \
		>> "${SAMPLE_DIR}/REdiscoverTE.tsv"
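
Since rollup.R reads its inputs from this metadata file, it can help to validate the file before starting the rollup. The sketch below assumes only the two-column, tab-separated layout and the `sample` / `quant_sf_path` header written above; `check_metadata` is a hypothetical helper name, not something shipped with REdiscoverTE.

```shell
# Hypothetical helper (not part of REdiscoverTE):
# verify the header line and that every row names a sample
# and points at an existing, non-empty quant.sf file.
# Problems are reported on stderr; exit status is 1 if any are found.
check_metadata() (
	tsv=$1
	tab=$(printf '\t')
	status=0
	{
		IFS= read -r header
		if [ "$header" != "sample${tab}quant_sf_path" ]; then
			echo "unexpected header: $header" >&2
			status=1
		fi
		while IFS=$tab read -r sample path; do
			if [ -z "$sample" ] || [ ! -s "$path" ]; then
				echo "bad row: $sample $path" >&2
				status=1
			fi
		done
	} < "$tsv"
	exit "$status"
)
```

For example, `check_metadata "${SAMPLE_DIR}/REdiscoverTE.tsv" && echo ok` prints `ok` only when the header and every row pass.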

REdiscoverTE/rollup.R \
		--metadata=${SAMPLE_DIR}/REdiscoverTE.tsv \
		--datadir=REdiscoverTE/rollup_annotation/ \
		--nozero --threads=64 --assembly=hg38 \
		--outdir=${SAMPLE_DIR}/REdiscoverTE_rollup/

Analyze Results with edgeR

View TCGA matrix

dat <- readRDS("original/REdiscoverTEdata/inst/Fig4_data/eset_TCGA_TE_intergenic_logCPM.RDS")
head(dat)

ExpressionSet (storageMode: lockedEnvironment)
assayData: 6 features, 7353 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: TCGA-OR-A5KX-01A-11R-A29S-07
    TCGA-EJ-7784-01A-11R-2118-07 ... TCGA-CV-7416-01A-11R-2081-07 (7353
    total)
  varLabels: indication patient_ID ... paired (5 total)
  varMetadata: labelDescription
featureData
  featureNames: 7SK 7SLRNA ... AluJb (6 total)
  fvarLabels: repName repClass repFam
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  

> as.data.frame(dat)[1:3,1:3]
                                  X7SK    X7SLRNA      ACRO1
TCGA-OR-A5KX-01A-11R-A29S-07 -1.386203 -0.7561825  2.2834249
TCGA-EJ-7784-01A-11R-2118-07 -2.439085 -0.2542860 -0.2175945
TCGA-BW-A5NQ-01A-11R-A27V-07 -3.079350 -1.3472844 -1.5813776

#	Column names that begin with a digit get an X prefix (7SK becomes X7SK) because data.frame() sanitizes names with make.names() by default (check.names = TRUE).


> dim(as.data.frame(dat))
[1] 7353 1209

#	7353 samples
#	~1209 RE/TE features (logCPM values, per the file name; the column total likely also includes the 5 phenoData columns)

> fData(dat)[1:5,]
            repName  repClass repFam
7SK             7SK       RNA    RNA
7SLRNA       7SLRNA    srpRNA srpRNA
ACRO1         ACRO1 Satellite   acro
ALR/Alpha ALR/Alpha Satellite  centr
Alu             Alu      SINE    Alu

#	Row names are not sanitized the same way, so feature names keep the leading digit (7SK) while the matching data frame column is X7SK.

References

This software originally comes from https://www.nature.com/articles/s41467-019-13035-2

http://research-pub.gene.com/REdiscoverTEpaper/

http://research-pub.gene.com/REdiscoverTEpaper/data/

http://research-pub.gene.com/REdiscoverTEpaper/data/REdiscoverTEdata_README.html

http://research-pub.gene.com/REdiscoverTEpaper/data/REdiscoverTEdata_1.0.1.tar.gz

http://research-pub.gene.com/REdiscoverTEpaper/software/

http://research-pub.gene.com/REdiscoverTEpaper/software/REdiscoverTE_README.html

http://research-pub.gene.com/REdiscoverTEpaper/software/REdiscoverTE_1.0.1.tar.gz

These files were downloaded and retained in the original/ directory. They were untarred, and the largest files were then compressed or split as shown below, to minimize file size.

gzip original/REdiscoverTE/EXPECTED_OUTPUT_FILES/Step_2_salmon_counts/quant.sf

mkdir -p original/REdiscoverTE/rollup_annotation/REdiscoverTE_whole_transcriptome_hg38-20
faSplit sequence original/REdiscoverTE/rollup_annotation/REdiscoverTE_whole_transcriptome_hg38.fa 20 original/REdiscoverTE/rollup_annotation/REdiscoverTE_whole_transcriptome_hg38-20/
gzip original/REdiscoverTE/rollup_annotation/REdiscoverTE_whole_transcriptome_hg38-20/*.fa
