Interpreting log output #11

Open
abracarambar opened this issue Jun 26, 2021 · 3 comments

@abracarambar

Dear Daniel Lu,
Does "Max number of UMIs over all alignment positions" mean the maximum number of UMIs recovered at any single alignment position?

Done reading input file into memory!
Number of input reads 8779688
Number of removed unmapped reads 8746016
Number of unremoved reads 33672
Number of unique alignment positions 266
Average number of UMIs per alignment position 126.28195488721805
Max number of UMIs over all alignment positions 5466
Number of reads after deduplicating 32818

@Daniel-Liu-c0deb0t
Owner

Daniel-Liu-c0deb0t commented Jun 26, 2021

Both the average and the max statistics are calculated from the number of unique UMIs at each alignment position. Unique UMIs are counted by exact identity (no error tolerance). This differs slightly from the number of grouped/collapsed UMIs at each alignment position, because grouping clusters UMIs that may contain errors. Error-tolerant collapsing is performed after the unique UMIs are counted.

These statistics are reported because they help identify whether error-tolerant grouping/collapsing could be the speed bottleneck.
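To make the two statistics concrete, here is a minimal Python sketch of how such counts could be derived. It is an illustration only, not the tool's actual code: the function name `umi_stats` and the `(position, umi)` input format are assumptions made for this example.

```python
from collections import defaultdict


def umi_stats(reads):
    """Summarize unique UMIs per alignment position.

    `reads` is an iterable of (alignment_position, umi) pairs.
    This input format is hypothetical, chosen for the example.
    """
    umis_at = defaultdict(set)
    for pos, umi in reads:
        # Exact-match uniqueness: no error tolerance at this stage.
        umis_at[pos].add(umi)

    counts = [len(umis) for umis in umis_at.values()]
    n_positions = len(counts)
    avg_umis = sum(counts) / n_positions
    max_umis = max(counts)
    return n_positions, avg_umis, max_umis


# Toy data: position 100 has three distinct UMIs, position 250 has one.
reads = [
    (100, "AAAA"),
    (100, "AAAA"),
    (100, "AAAT"),
    (100, "CCCC"),
    (250, "GGGG"),
]
n_positions, avg_umis, max_umis = umi_stats(reads)
```

With this toy input there are 2 alignment positions, an average of 2.0 unique UMIs per position, and a maximum of 3. A large maximum means at least one position has many UMIs to compare against each other, which is where error-tolerant grouping would spend most of its time.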

@abracarambar
Author

abracarambar commented Jun 26, 2021

I see. So is the number of reads after deduplicating counted before or after grouping/collapsing?

@Daniel-Liu-c0deb0t
Owner

Deduplicating typically means the whole process. There are two steps: (1) find the unique UMIs, and (2) group the unique UMIs in an error-tolerant way. The term "collapsing" is sometimes used because only one UMI from each group is kept (the group is collapsed). Sorry, I'm not always clear when I use these terms.
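The two steps can be sketched for a single alignment position as follows. This is a simplified illustration under stated assumptions, not the tool's real algorithm: it assumes equal-length UMIs, uses Hamming distance, and groups greedily by abundance. The names `hamming`, `dedup_position`, and `max_edits` are made up for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching characters between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def dedup_position(umis, max_edits=1):
    """Deduplicate the UMIs observed at one alignment position.

    Step 1: find the unique UMIs by exact identity.
    Step 2: group them in an error-tolerant way and keep one UMI per group.
    The greedy grouping here is an assumption made for illustration.
    """
    unique = Counter(umis)  # step 1: exact-match unique UMIs with counts

    # Step 2: visit UMIs from most to least abundant. A UMI within
    # `max_edits` mismatches of an already kept UMI is collapsed into it.
    representatives = []
    ordered = sorted(unique.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, _count in ordered:
        if not any(hamming(umi, rep) <= max_edits for rep in representatives):
            representatives.append(umi)

    return len(unique), representatives


umis = ["AAAA", "AAAA", "AAAA", "AAAT", "CCCC", "CCCC", "CCCG"]
n_unique, kept = dedup_position(umis)
```

In this toy example there are 4 unique UMIs but only 2 groups survive (`AAAA` and `CCCC`), since `AAAT` and `CCCG` are each one mismatch away from a more abundant UMI. The first number corresponds to what the average and max statistics count; the second corresponds to what remains after the whole deduplication process.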
