Hey there! I recently began using xarray.

**Minimal Working Example (MWE)**

Consider the following setup:

```python
import numpy as np
import xarray as xr
import numba

A = np.random.rand(20, 5)  # 20 samples, 5 features
b = np.random.rand(20, 2)  # 20 samples, (fixed) extra dimension


@numba.njit(fastmath=True, parallel=True)
def foo_nb(A, b, n_out: int = 3):
    n_samples, n_features = A.shape
    res1 = np.empty((n_samples, n_features, n_out))
    res2 = np.empty((n_samples, n_out))
    res3 = np.empty((n_samples,))
    for i in range(n_samples):
        # here the numba arithmetic happens
        # note that in practice X will have a dimensionality of (n_samples_reduced, n_features)
        # with n_samples_reduced ~ 20% of n_samples
        X = A * np.sum(b[i] ** 2)
        # ...
        U, s, VT = np.linalg.svd(X)
        res1[i] = VT[:n_out].T
        res2[i] = s[:n_out]
        res3[i] = np.sum(s)
    return res1, res2, res3


foo_nb(A, b, n_out=3)
```

This works flawlessly! However, I run into trouble when trying to adapt this code to work with xarray:

```python
A = xr.DataArray(A, dims=["sample", "feature"])
b = xr.DataArray(b, dims=["sample", "extra_dim"])

# Attempt to parallelize over samples, so -> core dimensions
xr.apply_ufunc(
    foo_nb,
    A,
    b,
    input_core_dims=[["sample"], ["sample"]],
    output_core_dims=[["sample"], ["sample"], ["sample"]],
    # dask="parallelized",
)
```

**Questions**
For context: in some real-world scenarios I anticipate handling datasets ranging from thousands to hundreds of thousands of samples. Additionally, the
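For what it's worth, here is a sketch (not authoritative) of one way to make the call above go through. Since `foo_nb` consumes the full 2-D arrays and handles the loop over samples itself, *all* of the dimensions the function touches need to be listed as input core dimensions, not just `sample`, and the new output dimension (called `mode` below, a name chosen purely for illustration) must appear in `output_core_dims`. The numba decorator is omitted here so the sketch runs with plain NumPy, but the call is the same with the jitted version:

```python
import numpy as np
import xarray as xr


# Plain-NumPy stand-in for foo_nb from the MWE; in practice you would
# keep the @numba.njit decorator.
def foo_nb(A, b, n_out=3):
    n_samples, n_features = A.shape
    res1 = np.empty((n_samples, n_features, n_out))
    res2 = np.empty((n_samples, n_out))
    res3 = np.empty((n_samples,))
    for i in range(n_samples):
        X = A * np.sum(b[i] ** 2)
        U, s, VT = np.linalg.svd(X)
        res1[i] = VT[:n_out].T
        res2[i] = s[:n_out]
        res3[i] = np.sum(s)
    return res1, res2, res3


A = xr.DataArray(np.random.rand(20, 5), dims=["sample", "feature"])
b = xr.DataArray(np.random.rand(20, 2), dims=["sample", "extra_dim"])

# Every dim the function operates on is a core dim; "mode" is a new
# output dimension of length n_out.
res1, res2, res3 = xr.apply_ufunc(
    foo_nb,
    A,
    b,
    input_core_dims=[["sample", "feature"], ["sample", "extra_dim"]],
    output_core_dims=[
        ["sample", "feature", "mode"],
        ["sample", "mode"],
        ["sample"],
    ],
)
print(res1.dims, res2.dims, res3.dims)
# ('sample', 'feature', 'mode') ('sample', 'mode') ('sample',)
```

Note that with this layout `apply_ufunc` hands the whole arrays to `foo_nb` in one call, so the parallelism still comes from numba's `parallel=True`, not from xarray.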
Replies: 1 comment 1 reply
Look here: https://tutorial.xarray.dev/advanced/apply_ufunc/apply_ufunc.html and let us know how it goes. If you see opportunities to improve that material, PRs are very welcome!
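To sketch the general pattern that tutorial teaches, under the assumption that the work for each sample can be phrased as a function of the full `A` matrix plus a single row of `b` (the per-sample function `foo_single` below is a hypothetical illustration, not part of the original MWE): let `apply_ufunc` do the looping with `vectorize=True`. Renaming `A`'s `sample` dimension keeps it from clashing with the dimension being looped over, and this layout is also what would later let you chunk `b` along `sample` and pass `dask="parallelized"`:

```python
import numpy as np
import xarray as xr


def foo_single(A, b_i, n_out=3):
    # Work for ONE sample: the full A matrix plus one row of b.
    X = A * np.sum(b_i ** 2)
    U, s, VT = np.linalg.svd(X)
    return VT[:n_out].T, s[:n_out], np.sum(s)


A = xr.DataArray(np.random.rand(20, 5), dims=["sample", "feature"])
b = xr.DataArray(np.random.rand(20, 2), dims=["sample", "extra_dim"])

res1, res2, res3 = xr.apply_ufunc(
    foo_single,
    # rename so A's sample axis is not confused with the loop dimension
    A.rename({"sample": "obs"}),
    b,
    input_core_dims=[["obs", "feature"], ["extra_dim"]],
    output_core_dims=[["feature", "mode"], ["mode"], []],
    vectorize=True,  # loop over the remaining "sample" dim of b
)
```

The results come back with `sample` as the leading (loop) dimension. One caveat of this sketch: every per-sample call receives the full `A`, so with dask each task carries a copy of it.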