Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the best practice to build 3c2e tensors? #300

Open
John-zzh opened this issue Jan 7, 2025 · 4 comments
Open

What's the best practice to build 3c2e tensors? #300

John-zzh opened this issue Jan 7, 2025 · 4 comments

Comments

@John-zzh
Copy link

John-zzh commented Jan 7, 2025

Hi developers,

I'm trying to implement TDDFT-ris with density fitting. In general, I need T_pqP,
which is contraced through

" Puv, up,vq - >   Ppq" , eri3c, coeff_p, coeff_q

coeff_p, coeff_q referes to either occupied or virtual blocks of coefficeint matrix, C[:,:nocc], C[:,nocc:]. (could be both)
P being the auxbf dimension, smallest dimension here (compared to u v p q)

How to implement it on GPU?

CPU code is relatively easy, I can build full eri3c given enough memory. Or slice it along the P dimension with limied memory.
I belive GPU memory is more valuable, so I guess there must be some memory-efficeint way to code up this procesure?

@wxj6000
Copy link
Collaborator

wxj6000 commented Jan 7, 2025

Hi Zehao,
Yes, GPU memory is quite limited. In GPU4PySCF, CDERI is stored in a sparse format, and is automatically transferred to CPU memory. When it is needed, the sparse CDERI is unpacked into CDERI slices for usage.

Here is an example of performing Lji,jk->Lki.
https://github.com/pyscf/gpu4pyscf/blob/master/gpu4pyscf/df/df_jk.py#L293

@John-zzh
Copy link
Author

Thx!

I figured out. And certained it can be faster and more memory efficient. For example, how to control the slicing behavior?

`def compute_Ppq_on_gpu(mol, auxmol, C_p, C_q, aosym=True, omega=None):
"""
(3c2e_{Puv}, C_{up}, C_{vq} -> Ppq)。

Parameters:
    mol: pyscf.gto.Mole
    auxmol: pyscf.gto.Mole
    C_p: cupy.ndarray (nao, p)
    C_q: cupy.ndarray  (nao, q)

Returns:
    Ppq: cupy.ndarray (naux, nao, nao)
"""

intopt = VHFOpt(mol, auxmol, 'int2e')
intopt.build(aosym=aosym, group_size=BLKSIZE, group_size_aux=BLKSIZE)

nao = mol.nao
naux = auxmol.nao
# print('nao', nao)
# print('naux', naux)
# print('auxmol.nbas', auxmol.nbas) 

siz_p = C_p.shape[1]
siz_q = C_q.shape[1]

Ppq = cp.zeros((naux, siz_p, siz_q), dtype=cp.float32)

for cp_kl_id, _ in enumerate(intopt.aux_log_qs):
    k0, k1 = intopt.aux_ao_loc[cp_kl_id], intopt.aux_ao_loc[cp_kl_id+1]

    int3c_slice = cp.zeros((k1 - k0, nao, nao), dtype=cp.float32, order='C')
    # print('int3c_slice.shape', int3c_slice.shape)

    for cp_ij_id, _ in enumerate(intopt.log_qs):
        # print('cp_ij_id', cp_ij_id)
        cpi = intopt.cp_idx[cp_ij_id]
        cpj = intopt.cp_jdx[cp_ij_id]
        li = intopt.angular[cpi]
        lj = intopt.angular[cpj]
     
        int3c_nlk_cart = get_int3c2e_slice(intopt, cp_ij_id, cp_kl_id, omega=omega)
        if not mol.cart:
            int3c_nlk_cart = cart2sph(int3c_nlk_cart, axis=1, ang=lj)
            int3c_nlk_cart = cart2sph(int3c_nlk_cart, axis=2, ang=li)

        i0, i1 = intopt.ao_loc[cpi], intopt.ao_loc[cpi+1]
        j0, j1 = intopt.ao_loc[cpj], intopt.ao_loc[cpj+1]  
        
        int3c_slice[:,j0:j1, i0:i1] = int3c_nlk_cart

    if aosym:
        row, col = cp.tril_indices(nao)
        int3c_slice[:, row, col] = int3c_slice[:, col, row]


    unsorted_ao_index = cp.argsort(intopt._ao_idx)
    int3c_slice = int3c_slice[:, unsorted_ao_index, :]
    int3c_slice = int3c_slice[:, :, unsorted_ao_index]
    print('check int3c_slice symmetry', cp.allclose(int3c_slice, int3c_slice.transpose((0, 2, 1))))

    tmp = cp.einsum('Puv,up->Ppv', int3c_slice, C_p)
    Ppq[k0:k1,:,:] = cp.einsum('Ppv,vq->Ppq', tmp, C_q)

unsorted_aux_ao_index = cp.argsort(intopt._aux_ao_idx)
Ppq = Ppq[unsorted_aux_ao_index,:,:]


# eri_3c2e = get_int3c2e(mol, auxmol)
# tmp = cp.einsum('Puv,up->Ppv', eri_3c2e, C_p)
# Ppq = cp.einsum('Ppv,vq->Ppq', tmp, C_q)

return Ppq`

@wxj6000
Copy link
Collaborator

wxj6000 commented Jan 17, 2025

Are you saying the control of block size? You can change group_size and group_size_aux in the following function.

intopt.build(aosym=aosym, group_size=BLKSIZE, group_size_aux=BLKSIZE)

They control the maximum size of int3c in the AO direction and AUX_AO direction.

@John-zzh
Copy link
Author

John-zzh commented Jan 17, 2025

thx, that helps!
My Wechat :-)
jojozzh00

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants