MBias plots #473
If you see dramatic biases in the M-bias plot, they are typically indicative of either technical issues or a consequence of the type of library preparation and/or procedure. Insofar as these methylation values do not reflect true methylation values, they introduce spurious methylation calls, and thus noise. Arguably, if you are looking for very strong effects you might get away with a bit more noise in the system, but ideally you would want to start your downstream analysis with data that are as clean as possible (that is at least my opinion).

Sometimes you may end up with fairly easy-to-remedy technical artefacts, such as end-repair fill-in biases (https://sequencing.qcfail.com/articles/library-end-repair-reaction-introduces-methylation-biases-in-paired-end-pe-bisulfite-seq-applications/), which can simply be corrected by using [...]. Some other techniques or kits, e.g. PBAT, single-cell applications, Zymo Pico-Methyl or Accel Swift, to name just a few, introduce their own biases (see e.g. here: https://sequencing.qcfail.com/articles/mispriming-in-pbat-libraries-causes-methylation-bias-and-poor-mapping-efficiencies/ or here for trimming recommendations: https://github.com/FelixKrueger/Bismark/tree/master/Docs#ix-notes-about-different-library-types-and-commercial-kits).

In your specific case, Read 1 looks like the one you would hope to get (assuming this is a plant species?). Read 2 certainly has a somewhat spiky methylation pattern over the first 8-10 bp (?), which is quite clearly much lower than for the rest of the read. Whether you want to hard-clip the reads (e.g. with Trim Galore) [...]. I would be somewhat more alarmed by the fact that your Read 1 methylation levels are around 30/15/3% in CpG/CHG/CHH context, compared with 45/25/10% for Read 2. That difference is arguably much bigger than the biases observed at the 5' end of Read 2. The easiest explanation for this would be that the reads do not belong to the same sample, which would be great.
If they are from the same sample, you would be in the awkward position of having to decide how to proceed: do you want to use just R1, just R2, or simply both and see what you get? You could also go back to the sequencing facility to see if anything looked odd, check which kind of sequencer your data was generated on (overcalling of Gs for Read 2?), etc. But that is kind of yet another question...
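One way to make the Read 1 versus Read 2 comparison explicit is to sum methylated and unmethylated calls per context for each read and compare the resulting percentages. The sketch below is a minimal illustration under stated assumptions, not a Bismark feature: `ContextCounts`, `discordant_contexts` and the 5-percentage-point threshold are hypothetical, and the counts would have to be summed from your own per-read M-bias tables.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextCounts:
    """Methylated/unmethylated calls summed over all positions for one context."""
    methylated: int
    unmethylated: int

    @property
    def percent(self) -> float:
        total = self.methylated + self.unmethylated
        return 100.0 * self.methylated / total if total else float("nan")


def discordant_contexts(r1: dict[str, ContextCounts],
                        r2: dict[str, ContextCounts],
                        max_abs_diff: float = 5.0) -> dict[str, float]:
    """Map each context shared by R1 and R2 whose methylation differs by more
    than max_abs_diff percentage points to the difference (R2 minus R1)."""
    flagged = {}
    for context in sorted(r1.keys() & r2.keys()):
        diff = r2[context].percent - r1[context].percent
        # NaN (a context without any calls) never exceeds the threshold.
        if abs(diff) > max_abs_diff:
            flagged[context] = diff
    return flagged


if __name__ == "__main__":
    # Hypothetical counts chosen to reproduce the percentages quoted in the thread.
    r1 = {"CpG": ContextCounts(30, 70), "CHG": ContextCounts(15, 85),
          "CHH": ContextCounts(3, 97)}
    r2 = {"CpG": ContextCounts(45, 55), "CHG": ContextCounts(25, 75),
          "CHH": ContextCounts(10, 90)}
    print(discordant_contexts(r1, r2))
    # {'CHG': 10.0, 'CHH': 7.0, 'CpG': 15.0}
```

With these illustrative counts all three contexts are flagged, R2 being higher by 15, 10 and 7 percentage points in CpG, CHG and CHH context respectively. A consistent offset across every context like this points to a whole-read (or whole-sample) difference rather than a positional bias that trimming could fix.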
Thanks @FelixKrueger for your response. I guess in my case I will proceed with the analysis using only R1. I understand I will lose some coverage. These sequences were generated a long time ago, so tracking down where the problem in the library arose is a bit tricky. Yes, these are sequences from a plant species.
Duplicate post (see #673).
Hey @FelixKrueger, I have been struggling to understand why Read 2 from an Illumina library shows biased methylation calls, and how to correct them. I understand you can ignore a few bases at the 5' or 3' end, but what about cases like this? [M-bias plot; image not preserved]

The read quality was okay (if the quality had been poor, that could have been a possible reason). I don't really know how to correct it. Is it even relevant to correct these plots so that they at least look like [M-bias plot; image not preserved] (Read 1), or should I just ignore them and carry on with the downstream analysis? What would be the impact on the downstream analysis?

I have looked for literature explaining the cause of these biases but have found none. Please comment.
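To put a number on how many 5' bases of Read 2 look affected, one could compare each position's methylation level against the read's overall plateau. The sketch below assumes per-position methylated/unmethylated counts for one context (as listed in Bismark's M-bias output); `positions_to_ignore`, using the median as the plateau, and the 5-percentage-point tolerance are all hypothetical choices, not something Bismark does itself. The resulting number could then be passed to e.g. Trim Galore's `--clip_R2` or `bismark_methylation_extractor --ignore_r2`.

```python
from __future__ import annotations

from statistics import median


def positions_to_ignore(methylated: list[int], unmethylated: list[int],
                        max_dev: float = 5.0, min_coverage: int = 1) -> int:
    """Return how many leading (5') read positions deviate from the plateau.

    methylated[i] and unmethylated[i] are the call counts at read position i + 1
    for one context (e.g. CpG in Read 2). The plateau is taken to be the median
    methylation level over all sufficiently covered positions (an assumption).
    """
    if len(methylated) != len(unmethylated):
        raise ValueError("count lists must have the same length")

    # Per-position methylation level in percent; None marks under-covered positions.
    min_cov = max(min_coverage, 1)  # avoid division by zero
    levels = []
    for m, u in zip(methylated, unmethylated):
        coverage = m + u
        levels.append(100.0 * m / coverage if coverage >= min_cov else None)

    covered = [level for level in levels if level is not None]
    if not covered:
        return 0
    plateau = median(covered)

    # Walk in from the 5' end; stop at the first covered position within tolerance.
    # Under-covered positions before that point are counted as "ignore".
    n = 0
    for level in levels:
        if level is not None and abs(level - plateau) <= max_dev:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative counts only: 20 positions, coverage 100 each, first 3 biased low.
    meth = [10, 12, 15] + [45] * 17
    unmeth = [90, 88, 85] + [55] * 17
    print(positions_to_ignore(meth, unmeth))  # 3
```

The scan stops at the first position that falls back within tolerance, so a spiky biased region may be underestimated; it is worth checking the suggestion against the plot itself. Using the median as the plateau also assumes the biased stretch is short relative to the read length.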