Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issues with graphgen #90

Open
4 tasks
mcocdawc opened this issue Jan 19, 2025 · 1 comment · May be fixed by #91
Open
4 tasks

issues with graphgen #90

mcocdawc opened this issue Jan 19, 2025 · 1 comment · May be fixed by #91
Assignees
Labels
bug Something isn't working documentation Improvements or additions to documentation maintenance things that should make our life easier in the future
Milestone

Comments

@mcocdawc
Copy link
Contributor

mcocdawc commented Jan 19, 2025

While comparing the output of the different fragmentation codes, i.e. "autogen", "graphgen" and my upcoming "chemgen" (#85) for larger molecules I noticed a few problems with "graphgen".

  • Incomplete Initialization
  • Fails at molecule
  • Bad scaling
  • Misleading name in code

Incomplete Initialization

The graphgen fragmentation exposes its result via the fragpart class and looks as if it can be used as a replacement for autogen.
Unfortunately a few components are not initialized although the information should available.
graphgen should either initialize them or make it explicit that it is incompletely initialized compared to autogen.

Let's say the user wanted to visualize the fragments and loops over Frag_atom, then it gives unexpected results for graphgen, because it is still an empty list in the case of graphgen.

graph_frags = fragpart(mol=mol, frag_type='graphgen', be_type='be2')
auto_frags = fragpart(mol=mol, frag_type='autogen', be_type='be2')

for fragment in auto_frags.Frag_atom:
    # visualize fragment
 
# The following loop is empty, because `graph_frags.Frag_atom == []`
for fragment in graph_frags.Frag_atom:
    # visualize fragment

Since this is (?) relevant for the mixed basis code. I added @lweisburn and @mscho527 as well to this issue.

Fails at molecule

Testing graphgen at a large (342 atoms) molecule of the project of @lweisburn and @mscho527 fails with.
I cannot append the xyz file here, but will send it to you @ShaunWeatherly

File ~/code/quemb/src/quemb/molbe/autofrag.py:277, in graphgen(mol, be_type, frozen_core, remove_nonunique_frags, frag_prefix, connectivity, iao_valence_basis)
...
File ~/code/quemb/src/quemb/molbe/autofrag.py:277, in graphgen(mol, be_type, frozen_core, remove_nonunique_frags, frag_prefix, connectivity, iao_valence_basis)
    273 # Update relative center site indices (centerf_idx) and weights
    274 # for center site contributions to the energy (ebe_weights):
    275 for adx, center in enumerate(fragment_map.center):
    276     centerf_idx = tuple(
--> 277         set([fragment_map.fsites[adx].index(cdx) for cdx in center])
    278     )
    279     ebe_weight = (1.0, tuple(centerf_idx))
    280     fragment_map.centerf_idx.append(centerf_idx)

ValueError: tuple.index(x): x not in tuple

To reproduce just call on the xyz file

import pyscf

mol = pyscf.gto.M(path, basis="sto-3g", charge=1)
graph_frag_map = graphgen(mol, be_type = 'be2')

Bad scaling

The aforementioned error happens after 36 s, i.e. the fragmentation is not finished after 36 s.
On the same molecule both "autogen" and "chemgen" need less than a second.
Apparently graphgen has a far worse scaling behaviour than the other algorithms since it is equally fast for small molecules such as octane.
If it is documented as a more or less drop-in-replacement, it should preferably have a similar scaling behaviour.
Or it should be documented, that it is a considerably slower algorithm.

Misleading name and interface

Coming back to our discussion from the review.
The interface and name of euclidean_norm is misleading and should be changed.

  • The arguments are of type Vector and not just arbitrary tensors, (the np.asarray is fully redundant in both cases)
  • np.floating[Any] is unnecessary from my understanding; it can be just np.floating
  • But most importantly it is a metric/distance; calling it a norm is misleading. When one sees code such as euclidean_norm(x, y) they could assume it is the norm of one 2D vector with scalar components x, y while in reality it is the distance between vectors x and y.

All in all

def euclidean_norm(
    i_coord: np.ndarray,
    j_coord: np.ndarray,
) -> np.floating[Any]:
    return norm(np.asarray(i_coord - j_coord))

is better named and typed as

def euclidean_metric(i_coord: Vector], j_coord: Vector) -> np.floating:
    return norm(i_coord - j_coord)
@mcocdawc mcocdawc added bug Something isn't working documentation Improvements or additions to documentation maintenance things that should make our life easier in the future labels Jan 19, 2025
@mcocdawc mcocdawc added this to the v0.1 milestone Jan 19, 2025
@mscho527
Copy link
Member

I haven't taken a closer look at either #85 or #86, so please correct me if I am wrong. Based on the excerpt of conversations I have seen on this issue and elsewhere, here are my 2cts:

  1. There is a scenario where we agree to make it clear in the documentation and the paper that a) graphgen provides an alternative way of automatically generating fragments, b) it excels in some explicitly specified cases, and c) supporting other cases (like the ones @mcocdawc mentioned in this issue) with graphgen will happen later. Here, I think we can just mark this issue as 'will fix later' and move on for now.
  2. Otherwise, if we want to brand it as a drop-in alternative, I think we should definitely fix the 'incomplete initialization' bit and the 'bad scaling' bit. I am okay with graphgen failing for mixed-basis, since this code lives in a separate repository anyway. @lweisburn and I can just make a note there that using autogen is a necessity.

@ShaunWeatherly ShaunWeatherly linked a pull request Jan 20, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation maintenance things that should make our life easier in the future
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants