
Support Multiple render target (MRT) #2930

Open · 11 tasks
Zyko0 opened this issue Mar 16, 2024 · 20 comments
Comments

@Zyko0 (Contributor) commented Mar 16, 2024

Operating System

  • Windows
  • macOS
  • Linux
  • FreeBSD
  • OpenBSD
  • Android
  • iOS
  • Nintendo Switch
  • PlayStation 5
  • Xbox
  • Web Browsers

What feature would you like to be added?

I believe it would be a great addition if Ebitengine could support MRT.

At the moment we can write to a single dst image and pass multiple src images to a DrawTriangles/DrawTrianglesShader call; writing to multiple dst images within the same function call (and the same internal draw call) would be useful for some specific use cases.

On the API side, something like (note that a shader parameter would be needed, as with DrawTrianglesShader):

DrawTrianglesShadersMRT(dst []*ebiten.Image, vertices []ebiten.Vertex, indices []uint16, shader *ebiten.Shader, opts *ebiten.DrawTrianglesShaderOptions)

On kage side:

func Fragment(dst vec4, src vec2, color vec4) (vec4, vec4, vec4) {
    // Heavy shared calculations (maths, geometry, etc.)
    common := HeavyCalculations(dst, src, someUniforms)
    // Three destination textures
    mask0 := Mask0(common)
    colorOut := ColorOut(common)
    dataOut := GetData(common)

    return mask0, colorOut, dataOut
}
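To make the intent concrete outside of a GPU context, here is a minimal CPU-side sketch in Go (all names are hypothetical and not part of Ebitengine's API): one shared heavy computation per pixel is fanned out to three output buffers in a single pass, which is what MRT would let a single fragment invocation do.

```go
package main

import "fmt"

// heavyCalc stands in for the expensive per-pixel work that is
// shared by every destination (a hypothetical analogue of the
// shader's HeavyCalculations above).
func heavyCalc(x, y int) float64 {
	return float64(x*31+y*17) / 255.0
}

// renderMRT fills three destination buffers in a single pass,
// the CPU analogue of one draw call with three attachments.
func renderMRT(w, h int) (mask, colorOut, data []float64) {
	mask = make([]float64, w*h)
	colorOut = make([]float64, w*h)
	data = make([]float64, w*h)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			c := heavyCalc(x, y) // computed once, reused three times
			i := y*w + x
			mask[i] = 0.5 * c // "Mask0"
			colorOut[i] = c   // "ColorOut"
			data[i] = c * c   // "GetData"
		}
	}
	return mask, colorOut, data
}

func main() {
	mask, colorOut, data := renderMRT(4, 4)
	fmt.Println(mask[0], colorOut[0], data[0])
}
```

Without MRT, the loop body (including heavyCalc) would have to run once per destination; here the shared work runs once per pixel regardless of how many outputs there are.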

Why is this needed?

I've had many use cases where 80% of the heavy computation in a shader invocation is needed for multiple destination images.
Being able to reuse the same vertices, the same shader draw call, and the same ~80% of initial work common to all destination images would open up new possibilities.

So far, in order to do so, we need:

  • To write multiple shaders with repeated code, or a single one with uniform branches
  • To make a distinct draw call for each destination image (and for each of these shaders/new uniforms, so no batching), even though we use the same vertices and mostly the same shader code and intermediate results => this forces the fragment shader to be re-invoked 2-3 times for the same pixels, with all the computation that entails

In terms of use cases:

  • One of the most common uses nowadays is in 3D pipelines, where you process geometry or any kind of maths for meshes/triangles and want to write to multiple offscreens at once (https://learnopengl.com/Advanced-Lighting/Deferred-Shading):
    • Diffuse (albedo), Normal, Depth, Specular, UVs, etc.
  • Any scenario that would normally require multiple passes over the same geometry!
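As a reference for the deferred-shading layout mentioned above, each MRT attachment corresponds to one G-buffer channel. A minimal Go sketch of such a layout (the type and field names are illustrative, not from any real pipeline):

```go
package main

import "fmt"

// GBuffer sketches the per-pixel outputs a deferred pipeline
// typically writes in one MRT pass (names are hypothetical).
type GBuffer struct {
	Albedo   [3]float32 // diffuse color
	Normal   [3]float32 // surface normal
	Depth    float32    // distance from the camera
	Specular float32    // specular intensity
}

func main() {
	// A single fragment invocation would fill every channel at once;
	// without MRT, each field needs its own draw pass.
	px := GBuffer{
		Albedo:   [3]float32{1, 0, 0},
		Normal:   [3]float32{0, 0, 1},
		Depth:    0.25,
		Specular: 0.8,
	}
	fmt.Println(px.Depth)
}
```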

If this can be supported, it would certainly unlock new rendering potential for Ebitengine, even for 2D/2.5D workflows, I believe.
Some existing game rendering pipelines could be optimized on the user side, or enriched with new effects (almost for free), and in general it would offer a new, advanced way of designing a richer Ebitengine application.

Potential hints:

Proof of concept PR: #2953

@hajimehoshi (Owner) commented Mar 16, 2024

Wouldn't we need a depth buffer or a stencil buffer first, perhaps?

@Zyko0 (Contributor, Author) commented Mar 16, 2024

Yes, maybe!
Also, there's a case with a depth buffer + MRT where you might want to override the depth with a custom value (https://registry.khronos.org/OpenGL-Refpages/gl4/html/gl_FragDepth.xhtml)

I'm thinking of cases where we want to write to some destination textures but sometimes discard() the write to others, based on runtime conditions (e.g. one wants to overwrite the pixel in the depth buffer only if some conditions are met) => this would complicate the MRT feature a bit

Depth buffering would be great indeed, but I have no idea how we would want to support it (especially since the usual depth buffer is a floating-point texture?)

edit:

Wouldn't we need a depth buffer or a stencil buffer first, perhaps?

But neither feature (MRT or new buffer types) requires the other; each can add value individually

@hajimehoshi (Owner)

Depth buffering would be great indeed, but I have no idea how we would like to support it (especially since the usual depth buffer is a floating-point texture?)

As Ebitengine is a 2D game engine, supporting a depth buffer sounds a little odd. I'm not familiar with it, so it might be useful even for a 2D game engine, but I'm not sure.

@Zyko0 (Contributor, Author) commented Apr 6, 2024

So it's technically easy to support at the graphics driver level: #2953 is a minimal working example (for MRT at least) for OpenGL and DirectX 11 (both tested on Windows only).

I think what remains is an API design issue (probably more internal than public) that still needs investigation and discussion on whether it's something we'd like to support (and if so, how), since:

  • It only makes sense (and is only doable) to render to multiple targets when the destinations are separate (unmanaged) textures => a fragment is bound to a single destination location, so it can write to the same location on multiple textures, but not to different locations within a single texture
  • Based on the previous point, it's possible this can't be generalized, which would mean a different "draw path" for triangles using MRT => that could add a maintenance cost for a not-so-required feature in a 2D engine

@hajimehoshi (Owner)

It is possible to make destination images separate from atlases dynamically (and actually Ebitengine does so when necessary), but this would degrade performance, right?

@Zyko0 (Contributor, Author) commented Apr 6, 2024

It is possible to make destination images separate from atlases dynamically (and actually Ebitengine does so when necessary)

I think this would defeat the purpose a little, yes, but I wasn't aware of that, actually!

Unless it is stated somewhere that:
"Images passed to this method will be made unmanaged if they are not already, which might prevent them from being batched with other commands."

I hadn't considered it, but it's true that in this case it shouldn't even matter to the user (the fact that an image is made unmanaged), and it should be acceptable, since the usage of this function would be a bit special by nature.
The risk of a user being affected by losing the batching capability of an image used in an MRT pipeline should be quite low.

but this would degrade performance, right?

I mentioned "made unmanaged" to cover the performance part, assuming that once an image is made unmanaged by Ebitengine, it will never be moved back to an atlas or merged with other atlases.
In that case, the cost would only be paid once, so it should be okay!

However, if you meant that images can be moved out solely to ensure a draw call can be performed, but then moved back to atlases, that's not good (we would like this operation to happen at most once).

edit: This would solve the primary (and most important) issue, but then it should also be stated (plus a panic()) that ebiten.SubImages are not accepted => which is probably okay too!

@hajimehoshi (Owner)

I hadn't considered it, but it's true that in this case it shouldn't even matter to the user (the fact that an image is made unmanaged), and it should be acceptable, since the usage of this function would be a bit special by nature.
The risk of a user being affected by losing the batching capability of an image used in an MRT pipeline should be quite low.

I'm not sure I understand what you mean. I assume the destination textures for MRT are later used as multiple source textures for one shader draw call; in that case, even if the textures are separate, this should be efficient. Is that correct?

I mentioned "made unmanaged" to cover the performance part, assuming that once an image is made unmanaged by Ebitengine, it will never be moved back to an atlas or merged with other atlases.
However, if you meant that images can be moved out solely to ensure a draw call can be performed, but then moved back to atlases, that's not good (we would like this operation to happen at most once).

If an image is unmanaged (NewImageOptions.Unmanaged), right, the image never goes to an atlas. If an image is managed, it might go back to an atlas under some conditions (e.g. the image is used as a source for a while and is not used as a destination).

@Zyko0 (Contributor, Author) commented Apr 7, 2024

I'm not sure I understand what you mean. I assume the destination textures for MRT are later used as multiple source textures for one shader draw call; in that case, even if the textures are separate, this should be efficient. Is that correct?

Yes! (Faster than batched triangles multiplied over N regions of a single texture, since here it would be a single region and just N writes from the same shader invocation.)

If an image is unmanaged (NewImageOptions.Unmanaged), right, the image never goes to an atlas. If an image is managed, it might go back to an atlas under some conditions (e.g. the image is used as a source for a while and is not used as a destination).

Okay, then it's acceptable I think; I understand what you mean.
Setting an image as unmanaged for more control over performance should be a user-side tweak then!

  • How to handle passing subimages as destinations? I suggest we reject those 👀

@hajimehoshi (Owner) commented Jun 14, 2024

I think we have already discussed this in Discord, but the agreement we reached is that:

  • A fragment function in Kage will return multiple values
  • The function for MRT will be a global function rather than a Draw* style method
  • The function takes multiple destinations, which must be unmanaged images
    • A tricky thing with MRT is that the destination position is shared among all the destinations, so atlases are not usable.
  • The destination images must not be sub-images
    • For the same reason as above.

Is that correct?

@hajimehoshi (Owner) commented Jun 14, 2024

As we discussed in Discord:

  • All the destination images must have the same bounds
  • All the parent images of the destination images must have the same bounds
    • A parent image is the image itself if the image is not a sub-image, or the original image if the image is a sub-image
    • A parent image might be able to have a different size for MRT, but for simplicity, let's keep this restriction and revisit it later if we really need to.
  • All the destination images must be unmanaged
    • Ebitengine might be able to convert a managed image to unmanaged automatically, but for simplicity, let's keep this restriction and revisit it later if we really need to.

@tinne26 commented Jun 14, 2024

By the way, slightly off topic, but I have found a use case for this feature in a 2D game, so I'll share it here:

  • Say you have this game: https://tinne26.github.io/mipix-examples/gametest/.
  • If you look closely, some objects can look very slightly disconnected from the ground (YMMV depending on display size, resolution and so on, but even if you don't see it, just trust me). This is because rendering is done in a logical canvas for the world (back), then in a second pass at high resolution for the smoothly moving character, and then in a final pass with the logical rendering of elements in front of the player. Once everything is scaled, two contiguous pixels in logical space are sometimes not fully opaque in high resolution due to projection filters, so they are slightly translucent and you see a hole between them that shouldn't really be there.
  • There are some low-level solutions to this, like extending your graphical assets to cover the ground with one extra pixel, but that's kind of annoying. And here's where MRT can be useful: besides drawing the game elements to the main logical canvas, you also keep a "connectivity" logical canvas (the second render target). Having this connectivity canvas lets you detect which elements are fully adjacent in logical space and use that information when projecting to high resolution, without leaving gaps.

It sounds a bit convoluted, but it's a nice, purely 2D use case. There are decent alternative ways around it in this case, though.

@Zyko0 (Contributor, Author) commented Jun 14, 2024

@tinne26 Very cool!!
And as a bonus you also render it only once, even though the gain might not be massive!

@hajimehoshi (Owner)

@tinne26 Hmm? I still don't understand how MRT resolves the gap issue
(attached screenshot)

@tinne26 commented Jun 15, 2024

Those disconnected graphics are on the back and front layers respectively, drawn on a logical canvas of 256x144. The reason they appear disconnected is that I have a separate high-resolution draw in the middle, so I need to project the logical canvas first, before the high-res draw, and then do the same for the front layer after the high-res draw. One idea to solve this is to use MRT to make the logical draws to two canvases, both of size 256x144. One will be used for the regular graphics, and the other will be used to keep track of the connectivity of the elements drawn at logical size. So, on the third draw pass, during the front layer's logical draw, I have a clean canvas with the front layer and another that also includes the previous data (what I'm calling the "connectivity canvas"). I can use this connectivity canvas in the {logical => high res} projection to "correct" these gaps (theoretically). There are many different strategies, though, both with and without MRT, but MRT seems to make life easier in this case.

In any case, I'm not particularly arguing in favor of MRT or anything; you all know I'm more interested in depth buffers than MRT, but it's still an interesting example of how MRT might have some uses even in 2D. In fact, more uses would come for MRT if we actually had depth buffers too, as isometric games can absolutely use depth information for many things, and if you can draw that at the same time as the main tiles, that's great.

@hajimehoshi (Owner) commented Jun 15, 2024

So instead of:

  • Draw the back layer to the high-res canvas
  • Draw the front layer to the high-res canvas

you mean we can change them with MRT into something like:

  • Draw the back/front layers to the two low-res canvases at the same time
  • Draw the two low-res canvases to the high-res canvas

?

If the back and front layers differ very much, would the MRT shader be efficient? Why not use one low-res canvas?

Maybe I don't understand this sentence:

One will be used for the regular graphics, and the other will be used to keep track of the connectivity of the elements drawn at logical size.

@hajimehoshi (Owner) commented Jun 15, 2024

(attached screenshot)

OK, so I missed the middle layer, but I still don't understand what MRT resolves and how. Please list the draw calls before and after MRT, thanks!

@hajimehoshi (Owner)

@Zyko0 By the way, how much would the performance be improved by your experimental PR?

@Zyko0 (Contributor, Author) commented Jun 17, 2024

I actually paused my side project to focus on this, also not knowing originally whether this feature would get accepted.
So I haven't tested yet, but I'm excited to; it just means quite a big refactor, so I haven't tried it yet, but I can later if you want!

I also paused it because replicating draw calls and the same costly operations wasn't sustainable for the new effects I wanted to add.
So I decided to stop at an arbitrary number of features.

These features require tracing/image information (which could come for free with MRT) that is just impossible to add to the current load, so I haven't implemented them yet.

@Zyko0 (Contributor, Author) commented Jul 4, 2024

By the way, how much would the performance be improved by your experimental PR?

@hajimehoshi I made two frame captures using RenderDoc to see the differences between the current implementation and the MRT one:

The difference is that the EID=88 call (0.6 ms) from the first screenshot is no longer necessary in the new version.

Current, no MRT (2.46 ms?):
(RenderDoc capture screenshot)

With MRT (1.95 ms?) (single tracing, multiple outputs + a deferred rendering merging pass):
(RenderDoc capture screenshot)

hajimehoshi pushed a commit that referenced this issue Jul 8, 2024
@hajimehoshi (Owner)

I'm happy that there seems to be an improvement with MRT!

By the way, I was wondering if there are other potential users or use cases besides @Zyko0's. As this would be a pretty big change and a big maintenance burden, I'd like to know about those.

@hajimehoshi hajimehoshi removed this from the v2.8.0 milestone Jul 17, 2024
3 participants