Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compatibility with LazyArrays #925

Open
tiemvanderdeure opened this issue Feb 7, 2025 · 15 comments
Open

compatibility with LazyArrays #925

tiemvanderdeure opened this issue Feb 7, 2025 · 15 comments

Comments

@tiemvanderdeure
Copy link
Collaborator

It could be nice to make dimensionaldata and LazyArrays.jl work together. It would probably need an extension. Currently everything I've tried with the @d macro returns errors somewhere and I don't really understand how their @~ macro works. But just to demonstrate some potential use cases I've implemented a wrapper to generate a DimArray with a LazyArray parent:

using DimensionalData, LazyArrays, BenchmarkTools

_maybe_reshape(x::AbstractDimArray, dims) = parent(DimensionalData._maybe_insert_length_one_dims(x, dims))
_maybe_reshape(x, _) = x

function LazyDimArray(a)
    dimargs = filter(x -> x isa AbstractDimArray, a.args)
    isempty(dimargs) && error("No dimensional arguments found")
    dest_dims = DimensionalData.combinedims(dimargs...)
    _args = map(x -> _maybe_reshape(x, dest_dims), dimargs)
    data = LazyArray(@~ a.f.(_args...))
    rebuild(first(dimargs); data, dims = dest_dims)
end

Something like this could be a neat way of getting some really big dimarrays with minimal overhead:

da1 = rand(X(1:1000))
da2 = rand(Y(1:1000))

@btime sum(LazyDimArray(@~ $da1 .* $da2))
@btime sum(@d $da1 .* $da2)
A = parent(da1); B = parent(da2)'
@btime sum(LazyArray(@~ $A .* $B))
  105.847 μs (2 allocations: 96 bytes)
  1.399 ms (5 allocations: 7.63 MiB)
  105.815 μs (0 allocations: 0 bytes)
@rafaqz
Copy link
Owner

rafaqz commented Feb 7, 2025

Probably we want to apply this wrapper automatically, so we need to add methods to whatever the constructor of your a object is that rewrap as a DimArray.

So that: (@~ $da1 .* $da2) isa DimArray

(The principle here is that DD aggressively rewraps other constructors to be the outer wrapper wherever it can, without user intervention)

@tiemvanderdeure
Copy link
Collaborator Author

Ideally I think something like LazyArray(@d @~ da1 .* da2) would work. But I think we might need some additional dispatches.

What @~ does is basically it stops the materialize. So @~ A .* B is a Base.Broadcast.Broadcasted

@rafaqz
Copy link
Owner

rafaqz commented Feb 7, 2025

I don't think @d @~ will compose correctly, why not the other way around?

@tiemvanderdeure
Copy link
Collaborator Author

why not the other way around

The other way around doesn't work because the expression @d returns is a let block which @~ cannot do anything with.

Another option is to handle all of this within the @d macro and just add a keyword like lazy.

So @d A .* B lazy=true would just return a DimArray with a LazyArray parent. It's pretty clean syntax and we don't have to make LazyArray return a DimArray, but of course it comes at the cost of adding another keyword, and diverging from LazyArrays syntax.

It'd be very easy to implement, we'd just need to add something like

if lazy 
    broadcast_expr = quote
        bc = Base.Broadcast.instantiate(LazyArrays.lazy.($broadcast_expr))
        rebuild(first(vars); data = LazyArray(bc), dims)
    end
end

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2025

It's also a bit weird as we mostly have disk arrays for lazyness, and this laziness may be broken on disk arrays depending how the broadcast styles work out. People will think it means lazy on disk too, but it doesnt

@tiemvanderdeure
Copy link
Collaborator Author

In Rasters, yes. But now that @d makes broadcasting across dims so easy, I think this is also a pretty logical use case.

I agree this could potentially be confusing, though.

@tiemvanderdeure
Copy link
Collaborator Author

Another thing we need for this to work is to move some code from Broadcast.copy to Broadcast.instantiate (like dims comparisons, and unwrapping). Just spent ages debugging because the axes of a lazy dimensional array I made in this way were broken because there is no Broadcast.instantiate(bc::Broadcasted{DimensionalStyle{S}})

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2025

And YAX...

We can totally integrate with LazyArrays, but there is no way lazy will be a built in keyword for the d macro that means "LazyArrays".

@asinghvi17
Copy link
Collaborator

Maybe materialize=false, if it's just a naming issue?

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2025

Name is one thing, but also putting extensions into default keywords is kinda weird.

materialize also makes no sense if the array parent is already a disk array. Then materialise=false would break the already nonmatrializing behaviour.

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2025

Is there a reason not to do modify(LazyArray, A) on you DimArrays? Or DimArray(LazyArray(A), dims) ?

Won't the broadcasts end up lazy?

@tiemvanderdeure
Copy link
Collaborator Author

No unfortunately not, because we also need to functionality of @~, which prevents the broadcast from materializing. I still haven't figured out what the best way is to get both @~ and @d.

Once we have that the wrapping of a LazyArray in a DimArray is the easy bit - which we can do with a dispatch in an extension or just have people do manually.

@rafaqz
Copy link
Owner

rafaqz commented Feb 10, 2025

Hm... I'm just wondering why LazyArray wrappers don't do that with Broadcast styles. That's what DiskArrays does and its a lot more elegant and composable than needing a special macro. Macros compose badly and always lead to these problems.

(e.g. we only have the @d macro because it actually changes the result of the broadcast. LazyArrays doesn't need a macro except for convenience)

@tiemvanderdeure
Copy link
Collaborator Author

Yeah it's quite annoying. The design choice is that you can't just wrap an Array in LazyArray to make any operation on it automatically (like with DiskArrays). Instead you have to construct a LazyArray is a result of a broadcast operation. It's not something we're going to be able to change, I guess, but it makes life a lot harder for us.

The way to do this without the macro is to go LazyArrays.lazy.(A .* B) (which is what the macro does), and this almost works with @d, except for that @d wraps the result of A .* B in a _maybe_dimensional_broadcast as well, and so it again materializes.

Maybe this is something we can change? Is there a reason that in @d f.(g.(A,B)), the result of g.(A,B) should be wrapped in _maybe_dimensional_broadcast, if both A and B are anyway?

Otherwise I suppose the conclusion is that this needs thought/isn't possible right now, I can use my bandaid solution for my own use case, and we now have this issue as documentation for anyone who's trying the same thing.

@rafaqz
Copy link
Owner

rafaqz commented Feb 11, 2025

Ok, yeah that's not great design, I would make an issue there too. It should work like DiskArrays, whicu is really composable even though it does so much more than LazyArrays

We could limit how much _maybe_dimensional_broadcast is applied by passing out a flag for if we found a broadcast lower down. Or maybe it can let more things through without wrapping. It will just take some care

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants