Sensitivity SCOP benchmark question #8

Open
bioinsilico opened this issue Jan 23, 2025 · 0 comments
bioinsilico commented Jan 23, 2025

I have a question about how sensitivity is calculated in the SCOPe benchmark. In particular, it concerns the total number of positives used for the different levels: family, super-family, and fold.

In the evaluation script, I understand that famCnt[id2fam[i]] - 1 removes the self-comparison.

famVal=foundFam[i]/(famCnt[id2fam[i]] - 1);

However, in the super-family and fold cases, subtracting (famCnt[id2fam[i]] - 1) leaves the self-comparison in the total count.

sfamVal=foundSFam[i]/(sfamCnt[id2sfam[i]] - (famCnt[id2fam[i]] - 1));

For example, for the SCOPe domain d1eu1a1 (b.52.2.2), there are 9 domains in the same family (including itself):

grep b.52.2.2 scop.tsv 
d1g8ka1	b.52.2.2
d1eu1a1	b.52.2.2
d2iv2x1	b.52.2.2
d1kqfa1	b.52.2.2
d2fug31	b.52.2.2
d2jioa1	b.52.2.2
d1ogya1	b.52.2.2
d1y5ia1	b.52.2.2
d1ti6a1	b.52.2.2

grep b.52.2.2 scop.tsv | wc -l
       9

then, famCnt['b.52.2.2'] - 1 = 8

If we now consider the super-family case, i.e., domains in the same super-family but not in the same family, we find 7 domains:

grep b.52.2 scop.tsv | grep -v b.52.2.2                  
d1ppya_	b.52.2.1
d1e32a1	b.52.2.3
d1cr5a1	b.52.2.3
d1qcsa1	b.52.2.3
d1cz4a1	b.52.2.3
d1wlfa2	b.52.2.3
d3ouga_	b.52.2.0

grep b.52.2 scop.tsv | grep -v b.52.2.2 | wc -l
       7

the total number of domains for the super-family b.52.2 is 16

grep b.52.2 scop.tsv | wc -l
      16
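The counts above can also be reproduced without grep. The following is a minimal Python sketch, assuming only the 16 domain/classification pairs listed in this issue; the names `SCOP`, `superfamily`, `fam_cnt`, and `sfam_cnt` are illustrative and merely mirror `famCnt` and `sfamCnt` from the evaluation script, they are not taken from it.

```python
from collections import Counter

# Domain -> SCOPe classification, copied from the grep output in this issue.
SCOP = {
    "d1g8ka1": "b.52.2.2", "d1eu1a1": "b.52.2.2", "d2iv2x1": "b.52.2.2",
    "d1kqfa1": "b.52.2.2", "d2fug31": "b.52.2.2", "d2jioa1": "b.52.2.2",
    "d1ogya1": "b.52.2.2", "d1y5ia1": "b.52.2.2", "d1ti6a1": "b.52.2.2",
    "d1ppya_": "b.52.2.1", "d1e32a1": "b.52.2.3", "d1cr5a1": "b.52.2.3",
    "d1qcsa1": "b.52.2.3", "d1cz4a1": "b.52.2.3", "d1wlfa2": "b.52.2.3",
    "d3ouga_": "b.52.2.0",
}


def superfamily(classification: str) -> str:
    """Drop the last field: 'b.52.2.2' -> 'b.52.2'."""
    return classification.rsplit(".", 1)[0]


# Hypothetical analogues of famCnt and sfamCnt: members per family / super-family.
fam_cnt = Counter(SCOP.values())
sfam_cnt = Counter(superfamily(c) for c in SCOP.values())

print(fam_cnt["b.52.2.2"])                       # 9 (includes d1eu1a1 itself)
print(sfam_cnt["b.52.2"])                        # 16 (includes d1eu1a1 itself)
print(sfam_cnt["b.52.2"] - fam_cnt["b.52.2.2"])  # 7 super-family-only domains
```

Grouping on the exact classification string also avoids relying on the grep pattern, in which each `.` matches any character.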

and thus,

(sfamCnt['b.52.2'] - (famCnt['b.52.2.2'] - 1)) = 16 - (9 - 1) = 8

which is one more than the 7 super-family-only domains listed above.

I think the discrepancy comes from subtracting (famCnt['b.52.2.2'] - 1), which should be just famCnt['b.52.2.2']: the query is itself a member of its family, so subtracting the full family count already removes the self hit from sfamCnt['b.52.2'], giving 16 - 9 = 7.
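To make the difference concrete, here is a small Python sketch of the two super-family denominators using the counts from this example. The function names are hypothetical and only restate the arithmetic discussed here; this is not the evaluation script itself.

```python
def sfam_denominator_current(sfam_cnt: int, fam_cnt: int) -> int:
    """Denominator as written in the script: sfamCnt - (famCnt - 1)."""
    return sfam_cnt - (fam_cnt - 1)


def sfam_denominator_proposed(sfam_cnt: int, fam_cnt: int) -> int:
    """Proposed denominator: sfamCnt - famCnt (the query leaves with its family)."""
    return sfam_cnt - fam_cnt


# Counts for query d1eu1a1 (b.52.2.2), taken from the grep output in this issue.
SFAM_CNT = 16  # all domains in super-family b.52.2, query included
FAM_CNT = 9    # all domains in family b.52.2.2, query included

current = sfam_denominator_current(SFAM_CNT, FAM_CNT)
proposed = sfam_denominator_proposed(SFAM_CNT, FAM_CNT)

# Best case: every super-family-only domain is found before the first false positive.
max_sensitivity_current = proposed / current

print(current)                  # 8
print(proposed)                 # 7
print(max_sensitivity_current)  # 0.875
```

With the current denominator, a query that recovers all 7 super-family-only domains is still capped at 7/8 for this example.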

This results in none of the methods evaluated having queries with 100% sensitivity for super-family and fold.
