[WIP] Restraints API #1043
🚨 API breaking changes detected! 🚨
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1043 +/- ##
==========================================
- Coverage 94.46% 89.52% -4.95%
==========================================
Files 135 152 +17
Lines 10090 10849 +759
==========================================
+ Hits 9532 9713 +181
- Misses 558 1136 +578
h1_eval = EvaluateHostAtoms1(
    g1g2h0_atoms,
    host_atom_pool,
    minimum_distance,
Let's double check this is indeed 0.5 nm (or higher)
raise ValueError(errmsg)

# Set the equilibrium values as those of the final frame
u.trajectory[-1]
Raise an issue / TODO here to experiment with which frame or values to pick: ideally some kind of mean, using the frame closest to the mean values.
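One way to read this suggestion, as a minimal sketch: average each restraint coordinate over the trajectory, then pick the frame whose values lie closest to that average. The function name `closest_frame_to_mean` and the `(n_frames, n_values)` array layout are assumptions for illustration and are not part of this PR. The sketch also treats all values as non-periodic, so angles and dihedrals would need a circular mean in practice.

```python
import numpy as np


def closest_frame_to_mean(values: np.ndarray) -> int:
    """Return the index of the frame closest to the per-column mean.

    values: hypothetical array of shape (n_frames, n_values), one row per frame.
    Assumes non-periodic values. Columns are scaled by their standard
    deviation so that no single coordinate dominates the distance.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    deviations = (values - mean) / std
    return int(np.argmin(np.linalg.norm(deviations, axis=1)))
```

The returned index could then replace `-1` when selecting the frame, with the equilibrium values taken either from that frame or from the means themselves.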
Thanks @IAlibay, just a few small comments from a first quick pass. I left out the boresch.py and the utils script for now as discussed. Will do another pass after the break!
No API break detected ✅
if len(atom_pool) < 3:
    ring_atoms_only = False
    heavy_atoms = get_heavy_atom_idxs(rdmol)
    atom_pool = set(heavy_atoms[rmsf[heavy_atoms] < rmsf_cutoff])
This is currently failing. Something like this could work:
- atom_pool = set(heavy_atoms[rmsf[heavy_atoms] < rmsf_cutoff])
+ atom_pool = set([i for (i, v) in zip(heavy_atoms, rmsf[heavy_atoms] < rmsf_cutoff) if v])
Thanks, there's something else odd in this code: atom_pool is set[tuple[int]], which is nonsensical.
I just made heavy_atoms an ndarray, a bit more expensive but less code.
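A small sketch of why the ndarray conversion helps, assuming the original failure came from `heavy_atoms` being a plain Python list (the thread does not state the exact error). The RMSF values and atom indices below are made up for illustration.

```python
import numpy as np

rmsf = np.array([0.2, 1.5, 0.4, 0.9, 0.1])  # made-up per-atom RMSF values
rmsf_cutoff = 1.0
heavy_atoms_list = [0, 1, 3, 4]  # made-up heavy atom indices

mask = rmsf[heavy_atoms_list] < rmsf_cutoff  # array([True, False, True, True])

# A plain list cannot be indexed with a boolean array.
try:
    heavy_atoms_list[mask]
    list_indexing_works = True
except TypeError:
    list_indexing_works = False

# Converting to an ndarray makes the boolean mask a valid index.
heavy_atoms = np.asarray(heavy_atoms_list)
atom_pool = {int(i) for i in heavy_atoms[mask]}
```

Casting each entry to `int` keeps the pool as a set of plain integers rather than NumPy scalars, which matches a `set[int]` annotation more closely.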
structures = []  # container for all contiguous secondary structure units
structure_residue_counts = {'H': 0, 'E': 0, '-': 0}
for frag in protein_ag.fragments:
If the structure has a small part that is not connected to the rest of the protein (e.g. a loop that is only partially modeled, or a small protein that interacts with the main one, as in jnk1), the DSSP calculation will fail and unsuitable atoms (e.g. atom index 0) are likely returned.
Could one either add a check here so that a fragment is skipped if len(frag.residues) < (trim_chain_start + trim_chain_end)? Or implement something so that, if the DSSP fails, it would at least not pick atoms right at the start of the protein?
Yeah, we should skip fragments with fewer than trim_chain_start + trim_chain_end + min_structure_size residues.
> Implement something that if the DSSP fails that it would at least not pick atoms at the start of the protein
I think I have an idea for this; it'll mean effectively implementing a backup selection method.
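The fragment-length check discussed here could look roughly like the following. This is only a sketch: the helper name `fragment_is_long_enough` is hypothetical, and it takes a residue count rather than an MDAnalysis fragment so that it stays self-contained.

```python
def fragment_is_long_enough(
    n_residues: int,
    trim_chain_start: int,
    trim_chain_end: int,
    min_structure_size: int,
) -> bool:
    """Return True if a fragment can still hold one structure after trimming.

    A fragment is skipped when it has fewer residues than
    trim_chain_start + trim_chain_end + min_structure_size.
    """
    required = trim_chain_start + trim_chain_end + min_structure_size
    return n_residues >= required


# Example with made-up fragment sizes and trim values.
fragment_sizes = [350, 12, 28]
kept = [n for n in fragment_sizes if fragment_is_long_enough(n, 10, 10, 8)]
```

Inside the loop over `protein_ag.fragments`, the same test would be applied to `len(frag.residues)` before running DSSP on that fragment.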
for _, group_iter in groupby(dssp_results, lambda x: x[0]):
    group = list(group_iter)
    if len(group) > min_structure_size:
Maybe >= as suggested by the name?
This filter currently leads to failures for galectin and hiv integrase from the industry benchmarking systems. Setting this to >= finds atoms successfully.
Potentially loosen the filters to min_structure_size = 6 and/or trim_structure_ends=2?
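To illustrate the difference between `>` and `>=` in this filter, here is a simplified sketch that groups a plain string of DSSP codes instead of the `dssp_results` tuples used in the PR. The function name `contiguous_structures` and the example string are made up.

```python
from itertools import groupby


def contiguous_structures(dssp: str, min_structure_size: int, inclusive: bool):
    """Return (code, start, length) for each run that passes the size filter."""
    kept = []
    start = 0
    for code, group_iter in groupby(dssp):
        length = len(list(group_iter))
        if inclusive:
            passes = length >= min_structure_size
        else:
            passes = length > min_structure_size
        if passes:
            kept.append((code, start, length))
        start += length
    return kept


# An 8-residue helix followed by a 9-residue strand.
dssp = "--" + "H" * 8 + "---" + "E" * 9
strict = contiguous_structures(dssp, 8, inclusive=False)
loose = contiguous_structures(dssp, 8, inclusive=True)
```

With the strict comparison, a helix of exactly `min_structure_size` residues is dropped and only the strand survives; with `>=`, both runs are kept.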
For the system mcs_docking_set/hne (with the defaults 8 and 3) this finds two residues in a beta sheet, but it is later unable to pick suitable atoms from that selection.
min_structure_size: int = 8,
trim_structure_ends: int = 3,
Potentially change these to be less strict? (e.g. 6 and 2?)
Will do! We should also make this user-controllable. It'll make for a giant settings object, but this is the kind of thing where you'll need "per system" flexibility.
# then we allow picking from beta-sheets too.
allowed_structures = ['H']
if structure_residue_counts['H'] < structure_residue_counts['E']:
    allowed_structures.append('E')
Maybe add a check on allowed_structures to check whether those have a structure_residue_counts that is not zero.
I'm not sure I fully understand the need for this here.
Possibly explaining my current view on this:
- If 'H' is zero and 'E' is non-zero, then you can pick 'E' residues so that's good.
- If 'H' is nonzero and 'E' is zero, then this is the intended behaviour of only picking from 'H'.
- If 'H' is zero and 'E' is zero, then you don't have anything to pick from, so the intended behaviour is you have an empty selection.
The only case I could think of here is if we want to alter the approach and set a minimum number limit to 'H' residues where if there are too few residues we should also add 'E' "just in case".
Alternatively we could just pick from 'H' and 'E' - I suspect in most cases it'll be sufficiently stable.
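The "minimum number of 'H' residues" variant mentioned above might be sketched like this. The function name `pick_allowed_structures` and the `min_helix_residues` parameter are hypothetical; with the default of 0 the function reproduces the current `'H' < 'E'` rule.

```python
def pick_allowed_structures(
    counts: dict[str, int],
    min_helix_residues: int = 0,
) -> list[str]:
    """Choose which DSSP codes restraint atoms may be picked from.

    Helices ('H') are always allowed. Beta-sheets ('E') are added when
    there are fewer helix than sheet residues, or when the helix count
    falls below the hypothetical min_helix_residues threshold.
    """
    allowed = ['H']
    if counts['H'] < counts['E'] or counts['H'] < min_helix_residues:
        allowed.append('E')
    return allowed
```

If both counts are zero the result is still `['H']`, so the downstream selection is empty, which matches the intended behaviour described in the three cases above.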
Checklist
- news entry
- Developers certificate of origin