Interpreting log output #11
Comments
Both the average and the max statistics are calculated from the number of unique UMIs at each alignment position. Unique UMIs are counted by exact identity (no error tolerance). This count differs slightly from the number of grouped/collapsed UMIs at each alignment position, because grouping clusters UMIs that may contain errors. Error-tolerant collapsing is performed only after the unique UMIs have been counted. These statistics are reported because they help identify whether error-tolerant grouping/collapsing could be the speed bottleneck.
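To make the two statistics concrete, here is a minimal sketch of how they could be computed from (position, UMI) pairs. The function name `umi_stats` and the toy reads are made up for illustration, and the average is assumed to be a plain mean over positions; this is not the tool's actual code.

```python
from collections import defaultdict

def umi_stats(reads):
    """Hypothetical helper: reads is an iterable of (alignment_position, umi) pairs."""
    umis_at = defaultdict(set)
    for pos, umi in reads:
        # A set keeps each UMI once per position: exact identity, no error tolerance.
        umis_at[pos].add(umi)
    counts = [len(umis) for umis in umis_at.values()]
    return {
        "positions": len(counts),              # number of unique alignment positions
        "average": sum(counts) / len(counts),  # assumed to be a plain mean
        "max": max(counts),                    # largest unique-UMI count at any one position
    }

# Toy data (made up): position 100 has 3 unique UMIs, position 250 has 1.
reads = [
    (100, "ACGT"), (100, "ACGT"), (100, "ACGA"), (100, "TTTT"),
    (250, "GGGG"),
]
stats = umi_stats(reads)
print(stats)
```

As a rough consistency check on the log in this issue, 126.28195488721805 × 266 ≈ 33591. If the average is indeed a plain mean, that would be the total number of unique UMIs summed over all positions, which sits between the 33672 unremoved reads and the 32818 reads left after deduplicating.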
I see. So is the number of reads after deduplicating counted before or after grouping/collapsing?
Deduplicating typically means the whole process. There are two steps: (1) find the unique UMIs, and (2) group the unique UMIs in an error-tolerant way. The term "collapsing" is sometimes used because only one UMI from each group is kept (the group is collapsed). Sorry, I'm not always clear when I use these terms.
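The two steps can be sketched for a single alignment position as follows. This is a simplified stand-in, not the tool's actual algorithm: the greedy grouping, the Hamming-distance threshold `max_mismatches`, and the names `hamming` and `dedup_position` are all assumptions chosen for illustration, and UMIs are assumed to have equal length.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching characters between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))

def dedup_position(umis, max_mismatches=1):
    """Hypothetical two-step deduplication for one alignment position."""
    # Step 1: find the unique UMIs by exact identity, with their read counts.
    unique = Counter(umis)

    # Step 2: group unique UMIs in an error-tolerant way. Here: visit UMIs from
    # most to least frequent, and start a new group only if the UMI is not
    # within max_mismatches of an already kept representative.
    representatives = []
    for umi, _count in sorted(unique.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(hamming(umi, rep) <= max_mismatches for rep in representatives):
            representatives.append(umi)

    # Each group is "collapsed" to its single representative.
    return len(unique), representatives

# Toy data (made up): "ACGA" is one mismatch away from the more frequent "ACGT".
umis = ["ACGT", "ACGT", "ACGT", "ACGA", "TTTT"]
n_unique, kept = dedup_position(umis)
print(n_unique, kept)
```

In this toy case there are 3 unique UMIs but only 2 groups remain after collapsing, which is the kind of gap between the unique-UMI count and the deduplicated count described above.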
Dear Daniel Lu,
Does "Max number of UMIs over all alignment positions" mean the maximum number of UMIs recovered at any single alignment position?
Done reading input file into memory!
Number of input reads 8779688
Number of removed unmapped reads 8746016
Number of unremoved reads 33672
Number of unique alignment positions 266
Average number of UMIs per alignment position 126.28195488721805
Max number of UMIs over all alignment positions 5466
Number of reads after deduplicating 32818