Try DeepVariant as replacement for freebayes #321

Open
6 of 12 tasks
marc-sturm opened this issue Dec 19, 2024 · 1 comment

marc-sturm commented Dec 19, 2024

  • try DeepVariant
    • benchmark performance with NA12878x2_93
    • benchmark runtime (1, 5, 10 threads, maybe on GPU)
      • test -num_shards option of DeepVariant and check CPU usage
      • test deepvariant-gpu
      • benchmark runtime and performance for gpu version
    • check if gVCF can be created and used to perform multi-sample analysis
      • gVCF can be created

        • gVCF merging with GLnexus (https://github.com/dnanexus-rnd/GLnexus) recommended

            /glnexus_cli --config DeepVariant multisample_test/*.gvcf | /mnt/storage2/megSAP/tools/bcftools-1.20/bcftools view | bgzip -@ 4 -c > dv_merged_trio.vcf.gz
          
        • DeepVariant gVCFs cannot be merged with GATK, which fails with:

            The list of input alleles must contain <NON_REF> as an allele but that is not the case within DeepVariant output gVCFs
          
      • check if dragen gVCFs can be merged with GLnexus

      • validate_NA12878.php on child VCF before and after merging

      • compare Mendelian error rate of merged deepvar gVCFs with vc_freebayes trio output

      • compare performance (validate_NA12878.php) of freebayes, dragen and deepvar on child

      • check if it can be used to perform multi-sample analysis

KilianIlius commented Jan 9, 2025

Runtime/memory (variant calling on NA12878x2_93 WGS):

| Algorithm | Threads | Runtime (hh:mm:ss) | RAM |
|---|---|---|---|
| DeepVariant CPU | 10 | 07:32:48 | 33.956G |
| DeepVariant CPU | 5 | 21:50:02 | 14.197G |
| DeepVariant CPU | 1 | 61:45:49 | 18.904G |
| Freebayes | 15 | 00:50:44 | n/a |
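
To make the thread scaling easier to read, the DeepVariant runtimes above can be turned into speedup and parallel efficiency relative to the single-thread run. This is only a small arithmetic sketch; the helper names `to_seconds` and `scaling` are made up for illustration and are not part of megSAP or DeepVariant.

```python
# Runtimes copied from the table above (threads -> hh:mm:ss).
RUNTIMES = {1: "61:45:49", 5: "21:50:02", 10: "07:32:48"}


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scaling(runtimes: dict) -> dict:
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    baseline = to_seconds(runtimes[1])
    result = {}
    for threads, hms in sorted(runtimes.items()):
        speedup = baseline / to_seconds(hms)
        result[threads] = (speedup, speedup / threads)
    return result


if __name__ == "__main__":
    for threads, (speedup, efficiency) in scaling(RUNTIMES).items():
        print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

With these numbers, the 10-thread run comes out at roughly 8.2x the single-thread speed, while the 5-thread run reaches only about 2.8x.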

Validation Performance

Freebayes vs DeepVariant on NA12878x2_93

| Name | Options | Date | Average Depth | Expected SNVs | Expected Indels | SNV Sensitivity | SNV PPV | SNV F1 | SNV Genotyping Accuracy | Indel Sensitivity | Indel PPV | Indel F1 | Indel Genotyping Accuracy | All Sensitivity | All PPV | All F1 | All Genotyping Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Freebayes | NA12878x2_93_var | 2025-01-10 18:13:04 | 40.98 | 3,256,373 | 495,309 | 0.9934 | 0.9944 | 0.9939 | 0.9993 | 0.9754 | 0.9938 | 0.9845 | 0.9847 | 0.9911 | 0.9943 | 0.9927 | 0.9974 |
| DeepVariant | vc_deep_NA12878x2_93_t10 | 2025-01-09 00:47:46 | 40.98 | 3,256,373 | 495,309 | 0.9941 | 0.9993 | 0.9967 | 0.9997 | 0.9910 | 0.9970 | 0.9940 | 0.9986 | 0.9937 | 0.9990 | 0.9963 | 0.9996 |

Freebayes vs DeepVariant vs merged gVCFs of NA12878x2_80/NA12891_14/NA12892_18 trio

| Name | Date | Avg Depth | Expected SNVs | Expected Indels | SNV Sens. | SNV PPV | SNV F1 | SNV Genotyping Acc. | Indel Sens. | Indel PPV | Indel F1 | Indel Genotyping Acc. | All Sens. | All PPV | All F1 | All Genotyping Acc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| glnexus_merged_child_only_no_wt | 2025-01-28 16:55:20 | 113.07 | 36057 | 2260 | 0.9852 | 0.9978 | 0.9914 | 0.9981 | 0.8805 | 0.8908 | 0.8856 | 0.9950 | 0.9790 | 0.9915 | 0.9852 | 0.9979 |
| NA12878x2_80_deepvar_var | 2025-01-17 20:50:25 | 113.07 | 36057 | 2260 | 0.9859 | 0.9984 | 0.9921 | 0.9992 | 0.9695 | 0.9874 | 0.9783 | 0.9963 | 0.9849 | 0.9978 | 0.9913 | 0.9990 |
| NA12878x2_80_freebayes_var | 2023-08-14 15:37:28 | 113.07 | 36057 | 2260 | 0.9842 | 0.9832 | 0.9837 | 0.9972 | 0.9593 | 0.9377 | 0.9484 | 0.9673 | 0.9828 | 0.9805 | 0.9816 | 0.9954 |
| NA12878x2_80_dragen | 2023-08-16 09:06:47 | 112.82 | 36057 | 2260 | 0.9859 | 0.9863 | 0.9861 | 0.9976 | 0.9221 | 0.9337 | 0.9279 | 0.9933 | 0.9821 | 0.9832 | 0.9827 | 0.9974 |
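
As a sanity check on the validation tables, F1 can be recomputed from sensitivity and PPV, since it is their harmonic mean and therefore always lies between the two. The sketch below is just that arithmetic; `f1_score` is an illustrative helper and not a function from validate_NA12878.php.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


if __name__ == "__main__":
    # SNV sensitivity and PPV of the DeepVariant run on NA12878x2_93.
    print(f"{f1_score(0.9941, 0.9993):.4f}")
    # Indel sensitivity and PPV of the freebayes run on NA12878x2_93.
    print(f"{f1_score(0.9754, 0.9938):.4f}")
```

Because the tables report values rounded to four decimals, a recomputed F1 can differ from the reported one in the last digit.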

Mendelian Error Rate Comparison (gonosomes were excluded)

| Method | Child/Parent1/Parent2 | Mendelian Error Rate | Total Variants | Mendelian Errors |
|---|---|---|---|---|
| GLnexus merged gVCFs (DeepVariant) | NA12878x2_80 / NA12891_14 / NA12892_18 | 6.31% | 190,410 | 12,018 |
| vc_freebayes (all.vcf.gz) | NA12878x2_80 / NA12891_14 / NA12892_18 | 1.73% | 201,148 | 3,481 |
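
For reference, a Mendelian error rate like the one above is the fraction of trio sites where the child's genotype cannot be formed from one allele of each parent. The thread does not say which tool or filters produced these numbers, so the following is only a minimal sketch under stated assumptions: autosomal diploid genotypes, no missing calls, and de novo variants counted as errors. Both function names are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, parent1, parent2) -> bool:
    """True if the child genotype can be built from one allele of each parent.

    Genotypes are pairs of allele indices as in a VCF GT field, e.g. (0, 1).
    Assumes diploid autosomal sites without missing alleles.
    """
    target = sorted(child)
    return any(sorted((a, b)) == target for a, b in product(parent1, parent2))


def mendelian_error_rate(trios) -> float:
    """Fraction of (child, parent1, parent2) genotype triples that are inconsistent."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(*trio) for trio in trios)
    return errors / len(trios)


if __name__ == "__main__":
    sites = [
        ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from parent 1, 1 from parent 2
        ((1, 1), (0, 0), (0, 1)),  # error: parent 1 cannot pass on allele 1
    ]
    print(mendelian_error_rate(sites))
    # Rates in the table are errors / total variants, e.g. the GLnexus row:
    print(f"{12018 / 190410:.2%}")
```

A real comparison would also have to decide how to treat missing genotypes and low-quality calls, which can shift the rate noticeably.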
