[WIP] Restraints API #1043

Open
wants to merge 43 commits into
base: main
IAlibay
Member

@IAlibay IAlibay commented Dec 9, 2024

Checklist

  • Added a news entry

Developers certificate of origin

github-actions bot commented Dec 9, 2024

🚨 API breaking changes detected! 🚨

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 46.84211% with 404 lines in your changes missing coverage. Please review.

Project coverage is 89.52%. Comparing base (915d110) to head (e166c70).

Files with missing lines                               Patch %  Lines
openfe/protocols/restraint_utils/geometry/utils.py     26.49%   111 Missing ⚠️
...protocols/restraint_utils/geometry/boresch/host.py  18.34%   89 Missing ⚠️
...protocols/restraint_utils/openmm/omm_restraints.py  39.09%   81 Missing ⚠️
...rotocols/restraint_utils/geometry/boresch/guest.py  14.92%   57 Missing ⚠️
...ocols/restraint_utils/geometry/boresch/geometry.py  38.70%   38 Missing ⚠️
...e/protocols/restraint_utils/geometry/flatbottom.py  50.00%   18 Missing ⚠️
...nfe/protocols/restraint_utils/geometry/harmonic.py  50.00%   8 Missing ⚠️
openfe/protocols/restraint_utils/settings.py           96.29%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1043      +/-   ##
==========================================
- Coverage   94.46%   89.52%   -4.95%     
==========================================
  Files         135      152      +17     
  Lines       10090    10849     +759     
==========================================
+ Hits         9532     9713     +181     
- Misses        558     1136     +578     
Flag Coverage Δ
fast-tests 89.52% <46.84%> (?)
slow-tests ?

h1_eval = EvaluateHostAtoms1(
    g1g2h0_atoms,
    host_atom_pool,
    minimum_distance,
Member Author

Let's double check this is indeed 0.5 nm (or higher)

raise ValueError(errmsg)

# Set the equilibrium values as those of the final frame
u.trajectory[-1]
Member Author

Raise an issue / TODO here to experiment with which frame / values to pick; ideally some kind of mean, using the frame closest to the mean values.
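
One way to prototype the "frame closest to the mean" idea is sketched below. It assumes the restraint coordinates have already been computed per frame and stored in a NumPy array of shape (n_frames, n_coordinates). The helper name `closest_frame_to_mean` is hypothetical and not part of this PR, and the sketch deliberately ignores the periodicity of angles and dihedrals, which a real implementation would need to handle (for example with circular means).

```python
import numpy as np


def closest_frame_to_mean(values: np.ndarray) -> int:
    """Return the index of the frame whose values lie closest to the mean.

    `values` has shape (n_frames, n_coordinates). Hypothetical helper;
    periodic coordinates (angles, dihedrals) are NOT handled here.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    # Scale each coordinate by its spread so that quantities with different
    # units contribute comparably; guard against zero spread.
    scale = values.std(axis=0)
    scale[scale == 0] = 1.0
    deviations = np.linalg.norm((values - mean) / scale, axis=1)
    return int(np.argmin(deviations))


# Toy data: 4 frames, 2 coordinates (e.g. a distance and an angle).
frames = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [10.0, 100.0],
])
best = closest_frame_to_mean(frames)
```

The returned index could then be used in place of `-1` when selecting the trajectory frame, with the mean values themselves used as the equilibrium values if that turns out to work better.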

@hannahbaumann hannahbaumann self-requested a review December 19, 2024 09:23
Contributor

@hannahbaumann hannahbaumann left a comment

Thanks @IAlibay, just a few small comments from a first quick pass. I left out boresch.py and the utils script for now, as discussed. Will do another pass after the break!

@hannahbaumann hannahbaumann self-assigned this Jan 14, 2025

No API break detected ✅

if len(atom_pool) < 3:
    ring_atoms_only = False
    heavy_atoms = get_heavy_atom_idxs(rdmol)
    atom_pool = set(heavy_atoms[rmsf[heavy_atoms] < rmsf_cutoff])
Contributor

@hannahbaumann hannahbaumann Jan 24, 2025

This is currently failing. Something like this could work:

Suggested change
- atom_pool = set(heavy_atoms[rmsf[heavy_atoms] < rmsf_cutoff])
+ atom_pool = set([i for (i, v) in zip(heavy_atoms, rmsf[heavy_atoms] < rmsf_cutoff) if v])

Member Author

Thanks. There's something else odd in this code: atom_pool is set[tuple[int]], which is nonsensical.

Member Author

I just made heavy_atoms an ndarray; it's a bit more expensive, but less code.
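
For readers following the thread, here is a small standalone sketch of the two fixes discussed. It assumes the failure came from `heavy_atoms` being a plain Python list, which cannot be indexed with a boolean NumPy array. The toy `rmsf` values and indices are made up for illustration.

```python
import numpy as np

rmsf = np.array([0.5, 2.0, 3.0, 0.8, 1.5])  # toy per-atom RMSF values
rmsf_cutoff = 1.0
heavy_atoms_list = [0, 1, 3]  # toy heavy-atom indices as a plain list

# Integer-list indexing into an ndarray works and gives a boolean ndarray.
mask = rmsf[heavy_atoms_list] < rmsf_cutoff

# A plain list cannot be indexed with a boolean array.
try:
    heavy_atoms_list[mask]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# Option 1 (the suggested change): filter with zip.
pool_zip = set(i for i, keep in zip(heavy_atoms_list, mask) if keep)

# Option 2: convert to an ndarray so boolean masking works directly.
heavy_atoms = np.asarray(heavy_atoms_list)
pool_array = set(heavy_atoms[mask].tolist())
```

Both options select the same atoms; the ndarray route costs one extra array conversion but keeps the original one-line masking expression.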


structures = [] # container for all contiguous secondary structure units
structure_residue_counts = {'H': 0, 'E': 0, '-': 0}
for frag in protein_ag.fragments:
Contributor

@hannahbaumann hannahbaumann Jan 24, 2025

If the structure has a small part that is not connected to the rest of the protein (e.g. a loop that is only partially modeled, or a small protein that interacts with the main one, as in jnk1), the DSSP calculation will fail and unsuitable atoms (e.g. atom index 0) are likely to be returned.
Could we either add a check here that skips a fragment if len(frag.residues) < (trim_chain_start + trim_chain_end), or implement a fallback so that, if DSSP fails, it at least does not pick atoms right at the start of the protein?

Member Author

Yeah, we should skip when the fragment has < trim_chain_start + trim_chain_end + min_structure_size residues.

> Implement something that if the DSSP fails that it would at least not pick atoms at the start of the protein

I think I have an idea for this; it'll effectively mean implementing a backup selection method.
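
The length check agreed on above could look roughly like the following. This is a sketch only: `fragment_is_long_enough` is a hypothetical helper, fragments are represented by their residue counts rather than MDAnalysis objects, and the trim values of 10 are illustrative rather than the PR's defaults.

```python
def fragment_is_long_enough(
    n_residues: int,
    trim_chain_start: int,
    trim_chain_end: int,
    min_structure_size: int,
) -> bool:
    """True if a fragment can still hold one structure unit after trimming."""
    required = trim_chain_start + trim_chain_end + min_structure_size
    return n_residues >= required


# Toy fragment sizes: a main chain, a short peptide, a partially modeled loop.
fragment_sizes = [250, 12, 5]
kept = [
    n for n in fragment_sizes
    if fragment_is_long_enough(
        n, trim_chain_start=10, trim_chain_end=10, min_structure_size=8
    )
]
```

With these illustrative values only the 250-residue fragment survives, so DSSP would never be run on the short, disconnected pieces.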


for _, group_iter in groupby(dssp_results, lambda x: x[0]):
    group = list(group_iter)
    if len(group) > min_structure_size:
Contributor

@hannahbaumann hannahbaumann Jan 24, 2025

Maybe >= as suggested by the name?

Contributor

This filter currently leads to failures for galectin and HIV integrase from the industry benchmarking systems. Setting this to >= finds atoms successfully.
Potentially loosen the filters to min_structure_size=6 and/or trim_structure_ends=2?

Contributor

For the system mcs_docking_set/hne (with the defaults 8 and 3) this finds two residues in a beta sheet, but it is later unable to pick suitable atoms from that selection.
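
To make the `>` versus `>=` difference concrete, here is a minimal sketch using `itertools.groupby` on a toy DSSP string. The `contiguous_runs` helper and the string input are assumptions for illustration; the PR groups on the first element of each DSSP result rather than on characters.

```python
from itertools import groupby


def contiguous_runs(dssp: str, min_structure_size: int, inclusive: bool):
    """Return (code, start, length) for runs that pass the size filter."""
    runs = []
    position = 0
    for code, group_iter in groupby(dssp):
        length = len(list(group_iter))
        if inclusive:
            passes = length >= min_structure_size
        else:
            passes = length > min_structure_size
        if passes:
            runs.append((code, position, length))
        position += length
    return runs


# Toy assignment: an 8-residue helix and a 6-residue strand.
dssp = "--" + "H" * 8 + "--" + "E" * 6 + "-"
strict = contiguous_runs(dssp, min_structure_size=8, inclusive=False)
relaxed = contiguous_runs(dssp, min_structure_size=8, inclusive=True)
```

With the strict comparison the 8-residue helix is rejected even though it matches `min_structure_size` exactly; the inclusive comparison keeps it.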

Comment on lines +487 to +488
min_structure_size: int = 8,
trim_structure_ends: int = 3,
Contributor

Potentially change these to be less strict? (e.g. 6 and 2?)

Member Author

Will do! We should also make this user-controllable. It'll make for a giant settings object, but this is the kind of thing where you'll need "per system" flexibility.
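
As a rough illustration of what exposing these two parameters could look like, here is a sketch using a plain dataclass. The class name `HostSelectionSettings` is hypothetical, and the project's actual settings classes may look quite different. The final validation rule assumes that `trim_structure_ends` residues are removed from both ends of each structure, which is an assumption rather than something stated in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSelectionSettings:
    """Hypothetical per-system knobs for host atom selection."""

    min_structure_size: int = 8
    trim_structure_ends: int = 3

    def __post_init__(self):
        if self.min_structure_size < 1:
            raise ValueError("min_structure_size must be at least 1")
        if self.trim_structure_ends < 0:
            raise ValueError("trim_structure_ends cannot be negative")
        # Assumption: trimming applies to both ends of every structure, so
        # the smallest allowed structure must keep at least one residue.
        if 2 * self.trim_structure_ends >= self.min_structure_size:
            raise ValueError(
                "trimming both ends would leave no residues in the "
                "smallest allowed structure"
            )


defaults = HostSelectionSettings()
relaxed = HostSelectionSettings(min_structure_size=6, trim_structure_ends=2)
```

Under that assumption, both the current defaults (8 and 3) and the looser values suggested in review (6 and 2) leave two residues in the smallest allowed structure.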

# then we allow picking from beta-sheets too.
allowed_structures = ['H']
if structure_residue_counts['H'] < structure_residue_counts['E']:
    allowed_structures.append('E')
Contributor

Maybe add a check that every entry in allowed_structures has a non-zero count in structure_residue_counts.

Member Author

I'm not sure I fully understand the need for this here.

Possibly explaining my current view on this:

  1. If 'H' is zero and 'E' is non-zero, then you can pick 'E' residues so that's good.
  2. If 'H' is nonzero and 'E' is zero, then this is the intended behaviour of only picking from 'H'.
  3. If 'H' is zero and 'E' is zero, then you don't have anything to pick from, so the intended behaviour is you have an empty selection.

The only case I can think of is if we want to alter the approach and set a minimum number of 'H' residues, below which we also add 'E' "just in case".

Alternatively, we could just pick from both 'H' and 'E'; I suspect in most cases it'll be sufficiently stable.
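
The options discussed in this thread can be compared side by side with a small sketch. `pick_allowed_structures` is a hypothetical function that reproduces the current `'H'`-first logic, adds the reviewer's non-zero check as a final filter, and offers an `always_include_sheets` flag for the "just pick from both" alternative.

```python
def pick_allowed_structures(
    counts: dict[str, int], always_include_sheets: bool = False
) -> list[str]:
    """Choose which secondary structure types atoms may be picked from."""
    allowed = ['H']
    # Current behaviour: add sheets only when they outnumber helices.
    if always_include_sheets or counts['H'] < counts['E']:
        allowed.append('E')
    # Reviewer's suggested check: drop types that have no residues at all.
    return [s for s in allowed if counts[s] > 0]


only_sheets = pick_allowed_structures({'H': 0, 'E': 5, '-': 20})
only_helices = pick_allowed_structures({'H': 5, 'E': 0, '-': 20})
nothing = pick_allowed_structures({'H': 0, 'E': 0, '-': 20})
both = pick_allowed_structures({'H': 5, 'E': 3, '-': 20}, always_include_sheets=True)
```

In the three enumerated cases the filter only changes what the list contains, not which residues end up selected, which is consistent with the author's point that the check is not strictly needed for correctness.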

2 participants