is there a variant of load! that accumulates? #252

Closed
haakon-e opened this issue Mar 12, 2024 · 2 comments

@haakon-e
Contributor

I was curious whether there exists (or whether it is even possible to implement) a variant of load! that accumulates into the array instead of overwriting it?

E.g. if x = ones(3,3) and ds["var"][:, :] == ones(3,3), then load_acc!(variable(ds, "var"), x, :, :) would give all(x .== 2). I can get around this by first reading the data into a buffer and then accumulating into x, but I was curious whether there's a direct way of doing this. It could be quite useful for quickly accumulating statistics.

My current code looks something like this:

function load_accumulate!(file, data, var, buf = similar(data))
    NCDataset(file) do ds
        NCDatasets.load!(variable(ds, var), buf, :, :)
        data .+= buf
    end
end

# and is useful for operations like this:
function data_mean(files, data, var)
    buf = similar(data)
    for file in files
        load_accumulate!(file, data, var, buf)
    end
    data ./= length(files)
end

... Thinking about this a bit more, I suppose that in principle what I'm suggesting above could be generalized to handle any type of metric, like

function load_accumulate!(file, data, var, func::Function, buf = similar(data))
    NCDataset(file) do ds
        NCDatasets.load!(variable(ds, var), buf, :, :)
        data .+= func(buf)
    end
end

# e.g.:
load_accumulate!("data.nc", data, "T", x -> x .^ 2)

but I don't actually know if any of that is possible to do without (secretly?) allocating a buffer. So maybe my local implementation is the way to go?
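For what it's worth, here is a minimal sketch of the accumulate step on its own, assuming the function passed in is elementwise. Writing the update as a fused broadcast (`data .+= func.(buf)`) means the only extra array is the one reusable buffer, with no temporary created for the transformed values. The names `accumulate_with!` and `fill_buffer!` are hypothetical and not part of NCDatasets; `fill_buffer!` stands in for whatever overwrites the buffer (for instance a closure around NCDatasets.load!), and plain arrays stand in for files so the snippet runs without any NetCDF data.

```julia
# Hypothetical helper (not part of NCDatasets): `fill_buffer!` is any function
# that overwrites `buf` in place, e.g. a closure around NCDatasets.load!.
function accumulate_with!(fill_buffer!, data, buf, func = identity)
    fill_buffer!(buf)        # overwrite the reusable buffer in place
    data .+= func.(buf)      # fused broadcast: no temporary array for func.(buf)
    return data
end

# Stand-in "files": plain arrays copied into the buffer instead of NetCDF reads.
sources = [fill(1.0, 3, 3), fill(2.0, 3, 3)]

data = zeros(3, 3)           # accumulator, must start at zero
buf = similar(data)          # allocated once, reused for every source

for src in sources
    accumulate_with!(b -> copyto!(b, src), data, buf, x -> x^2)
end
```

Note that `func` here takes a scalar (`x -> x^2`) rather than an array (`x -> x .^ 2`), which is what lets the broadcast fuse into a single loop.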

@Alexander-Barth
Member

I don't think this is possible without allocating an additional buffer, as nc_get_var overwrites the buffer it receives.

For your information, there are some groupby and aggregation functions defined here: https://juliageo.org/CommonDataModel.jl/stable/tutorial1/#Grouping-and-reducing

@haakon-e
Contributor Author

Thank you! I am doing the looping because I aggregate data from many different files, and I've found the multifile-reading to be quite slow in some instances (but perhaps I'll try to file a separate issue on that if I can).

I'll try experimenting more with groupby+agg, which seems fast so far!
