Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph theoretic fragmentation via graphgen. #86

Merged
merged 46 commits into from
Jan 17, 2025
Merged

Conversation

ShaunWeatherly
Copy link
Contributor

@ShaunWeatherly ShaunWeatherly commented Jan 14, 2025

The idea is rather straightforward:

graphgen generalizes the BE n fragmentation scheme to arbitrary fragment sizes (I've tested BE1-BE9 so far) by using a graph theoretic heuristic. In it, atoms are assigned to nodes in an adjacency graph and edges are weighted by some distance metric. For a given fragment center site, Dijkstra's algorithm is used to find the shortest path from that center to its neighbors. The number of nodes visited on that shortest path determines the degree of separation of the corresponding neighbor. I.e., all atoms whose shortest paths from the center site visit at most 1 node must be direct neighbors (adjacent to the center site), which gives BE2-type fragments; all atoms whose shortest paths visit at most 2 nodes must then be second-order neighbors, hence BE3; and so on. This depends on the NetworkX library.

Major points to note:

  1. Works for arbitrary n in BE n. (Yields near identical fragments to autogen for n=1->3)
  2. Does not rely on heuristic bond lengths or atom identities.
  3. Much simpler code, which I think we can all appreciate.
  4. And most importantly - the distance metric for assigning edge weights in the adjacency graph can easily be changed. For example, I'm currently looking into fragmentation via entanglement (orbital pair mutual info).

EXTRA: now adds FragmentMap data class.
EXTRA2: there are 14 new unittests covering both autogen and graphgen

Copy link
Contributor

@mcocdawc mcocdawc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement! Thank you!

I have a few strongly requested changes and some other comments.

src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/fragment.py Show resolved Hide resolved
@mcocdawc
Copy link
Contributor

oh and the most important comment, I really would add a test, probably something like molbe_octane_test.py just with this fragmentation. this could ensure, that they indeed produce the same results for BE1-3

@ShaunWeatherly
Copy link
Contributor Author

Alright, unittests have been added for both autogen and graphgen, likewise I think I've addressed all of your other comments, but do let me know if there's anything I've missed!

Copy link
Contributor

@mcocdawc mcocdawc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good now.
The only big question mark I still have is about the euclidean_norm function.

A general comment just for the record and potential improvements for the future; we talked about it in person.
I think there is unnecessary casting between lists, tuples and sets.
This graph operation can most likely be completely done with sets and frozenset with only one cast to an ordered container such as list in the end.
In the current implementation there is more book-keeping necessary to avoid duplicates and unnecessary frequent casts.
But the code works, is really well tested, and readable. If you want to keep it like this, then I am also fine.

.ruff.toml Show resolved Hide resolved
src/quemb/molbe/autofrag.py Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
src/quemb/molbe/autofrag.py Show resolved Hide resolved
src/quemb/molbe/fragment.py Outdated Show resolved Hide resolved
src/quemb/molbe/lo.py Show resolved Hide resolved
src/quemb/molbe/solver.py Outdated Show resolved Hide resolved
tests/fragmentation_test.py Show resolved Hide resolved
src/quemb/molbe/autofrag.py Show resolved Hide resolved
src/quemb/molbe/autofrag.py Outdated Show resolved Hide resolved
@ShaunWeatherly ShaunWeatherly merged commit 8c410c4 into main Jan 17, 2025
4 checks passed
@ShaunWeatherly ShaunWeatherly deleted the new_graphgen branch January 17, 2025 21:20
@mscho527 mscho527 mentioned this pull request Jan 19, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants