Solari: Dynamic realtime global illumination #10000

Closed
wants to merge 104 commits into from

JMS55
Contributor

@JMS55 JMS55 commented Oct 2, 2023

WARNING: Highly experimental, will not be merged anytime soon.

Here's a pretty image so the PR isn't entirely text, but keep in mind there's still a lot of work needed to get this to a generally usable and artifact/bug-free state.

Overview

Solari is an implementation of fully dynamic, fully realtime raytraced global illumination. All lights and objects can move and mutate, contribute indirect lighting (including emissive meshes), and receive indirect light bounces. It's comparable in scope to Unreal Engine 5's Lumen (Fortnite, Lumen in the Land of Nanite), or Nvidia's RTXGI (Cyberpunk 2077).

This is a high end GPU-intensive rendering technique. All testing was done on an RTX 3080 GPU, at both 1080p and 4k, with an initial target budget of around 4ms of GPU time (ideally we can get this down to ~2ms with further optimizations). As it relies on hardware raytracing support, it only works on GPUs such as Nvidia's RTX 2000 series+, and AMD's RX 6000 series+. Currently, this PR is using an experimental fork of wgpu that implements hardware raytracing on the Vulkan backend only. Metal and DirectX12 are not currently supported.

Note that Solari only (currently) does indirect diffuse lighting. Indirect specular lighting (reflections) will come later, and raytraced direct diffuse and specular lighting much later as part of a separate sub-plugin of Solari, likely using a different technique.

Current Technique

The following is a simple high-level overview of Solari's current rendering, as details are extremely subject to change and I don't want to have to rewrite this frequently. A detailed breakdown will probably come once everything is actually finished. See the literature section of this PR description for links to each of the techniques.

Like Lumen and, more specifically, GI-1.0, Solari uses a multi-level radiance/irradiance cache. The high-level process is as follows:

  • Light probes capture the incoming radiance from all directions at points directly visible from the camera. These probes are reprojected between frames, forming the "screen cache".
  • In order to get incoming radiance for a probe, a ray is traced out of the probe, hitting another point. In order to get the irradiance at that point, the "world cache" is queried.
  • The world cache is a pre-allocated, persistent hashmap storing irradiance for a large chunk of the world called a cell. In order to determine the irradiance for the cell, rays are traced per-cell towards light sources (and optionally other cells).
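As a rough illustration of the world cache lookup described above, here is a minimal CPU-side sketch in Rust. It is not Solari's actual code (the real cache lives in a GPU storage buffer and is accessed from shaders). The names `CacheSlot`, `WorldCache`, `quantize`, and `hash_cell`, and the constants `CELL_SIZE`, `CACHE_CAPACITY`, and `MAX_PROBES`, are all assumptions made for this example, as are the FNV-1a hash and the linear probing scheme.

```rust
// Hypothetical constants; the PR does not state the real values.
const CELL_SIZE: f32 = 1.0; // world units per cell edge (assumed)
const CACHE_CAPACITY: usize = 1 << 10; // pre-allocated slot count, power of two (assumed)
const MAX_PROBES: usize = 8; // linear-probing steps before giving up (assumed)

/// One slot of the pre-allocated table. `coords == None` marks an empty slot.
#[derive(Clone, Copy, Default)]
struct CacheSlot {
    coords: Option<[i32; 3]>,
    irradiance: [f32; 3],
}

struct WorldCache {
    slots: Vec<CacheSlot>,
}

/// Maps a world-space position to the integer coordinates of its cell.
fn quantize(position: [f32; 3]) -> [i32; 3] {
    position.map(|p| (p / CELL_SIZE).floor() as i32)
}

/// FNV-1a hash over the cell coordinates (an assumed choice of hash).
fn hash_cell(coords: [i32; 3]) -> u32 {
    let mut h: u32 = 2166136261;
    for v in coords.iter() {
        for byte in v.to_le_bytes().iter() {
            h ^= *byte as u32;
            h = h.wrapping_mul(16777619);
        }
    }
    h
}

impl WorldCache {
    /// Allocates every slot up front; the table never grows afterwards.
    fn new() -> Self {
        Self { slots: vec![CacheSlot::default(); CACHE_CAPACITY] }
    }

    /// Returns the slot index of the cell containing `position`, claiming an
    /// empty slot if the cell is not cached yet. Returns `None` when no
    /// matching or empty slot is found within `MAX_PROBES` steps.
    fn find_or_insert(&mut self, position: [f32; 3]) -> Option<usize> {
        let coords = quantize(position);
        let start = hash_cell(coords) as usize;
        for step in 0..MAX_PROBES {
            let slot = start.wrapping_add(step) & (CACHE_CAPACITY - 1);
            match self.slots[slot].coords {
                Some(existing) if existing == coords => return Some(slot),
                Some(_) => continue, // occupied by another cell, keep probing
                None => {
                    self.slots[slot].coords = Some(coords);
                    return Some(slot);
                }
            }
        }
        None
    }

    /// Queries the cached irradiance at a ray hit point, or black on failure.
    fn irradiance_at(&mut self, hit_point: [f32; 3]) -> [f32; 3] {
        match self.find_or_insert(hit_point) {
            Some(slot) => self.slots[slot].irradiance,
            None => [0.0; 3],
        }
    }
}
```

With these assumed constants, two hit points inside the same 1-unit cell (for example `[0.2, 0.3, 0.4]` and `[0.9, 0.1, 0.5]`) resolve to the same slot, so every probe ray landing in that cell shares one irradiance value.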

The final path is as such: camera -> visible point on screen (probe placed) -> a different point in the world (world cache queried) -> light sources. GI-1.0 has a great illustration of this:
[Image: illustration from GI-1.0 of the path described above]

A more detailed, technical breakdown is as follows:

  1. Screen probes are allocated as a set of 4 cascades of octahedral probes. Compared to the cascade before it, each cascade places its probes twice as far apart, has twice the directional resolution (so bigger probes), has half as many total probes, and traces a radiance interval that is twice as long.
  2. The world cache is allocated a single persistent storage buffer.
  3. Each frame:
     1. Go over the active world cache cells, and decay a life value (reset on each access) by 1. If the life of a cell reaches 0, it is turned back into an empty cell.
     2. A series of passes compact the alive world cache cells into a buffer, in order to improve occupancy for the next step.
     3. Each alive world cache cell traces a ray towards light sources, and optionally an extra ray in a random direction. While the first ray gets light from an analytic light source or emissive mesh, the optional extra ray queries the world cache itself (other cells) for radiance, forming a multi-bounce feedback effect that is particularly important for indoor scenes.
     4. The contribution of the newly traced rays is then temporally blended into the current irradiance for each associated cell.
     5. Screen probes (for each cascade) are reprojected using motion vectors and the current and previous frames' depth buffers. Screen probes are placed directly on visible points in the world, reconstructed using the rasterized depth buffer.
     6. Per cell (octahedral map texel) of each probe for each cascade, a new ray is traced, and radiance is computed by querying the world cache at the hit point. Each probe traces within a specific radiance interval. The new radiance values are temporally blended into the probes.
     7. Starting from the highest cascade, each cascade recursively merges downwards onto cascade 0, the lowest resolution but highest density cascade.
     8. Each probe in the merged cascade 0 is converted to spherical harmonics, which filters out higher (noisy) frequencies and makes reading them cheaper in the next pass.
     9. Finally, each pixel on screen interpolates from the spherical harmonics to form the final GI texture. The main lighting passes read from this texture per-pixel as an additional form of indirect light.
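The per-frame world cache steps above (life decay, compaction of alive cells, and temporal blending) can be sketched on the CPU roughly as follows. This is only an illustration under stated assumptions, not the PR's shader code. The names `CacheCell`, `touch`, `decay`, `compact_alive`, `blend`, and `update_world_cache` are hypothetical, as are the `INITIAL_LIFE` and `BLEND_FACTOR` values. The PR only says new samples are "temporally blended", so the exponential moving average used here is an assumption, and the GPU version compacts with a series of passes rather than a sequential filter.

```rust
// Hypothetical tuning values; the PR does not specify them.
const INITIAL_LIFE: u32 = 10; // frames a cell survives without being accessed (assumed)
const BLEND_FACTOR: f32 = 0.1; // weight of the newest sample (assumed)

/// A world cache cell. `life == 0` marks an empty cell.
#[derive(Clone, Copy, Default)]
struct CacheCell {
    life: u32,
    irradiance: [f32; 3],
}

/// Called whenever a cell is accessed: resets its life value.
fn touch(cell: &mut CacheCell) {
    cell.life = INITIAL_LIFE;
}

/// Decays the life of every active cell by 1. A cell whose life reaches 0
/// is turned back into an empty cell.
fn decay(cells: &mut [CacheCell]) {
    for cell in cells.iter_mut() {
        if cell.life > 0 {
            cell.life -= 1;
            if cell.life == 0 {
                *cell = CacheCell::default();
            }
        }
    }
}

/// Gathers the indices of alive cells into a dense list, so that later work
/// only iterates over cells that actually need updating.
fn compact_alive(cells: &[CacheCell]) -> Vec<usize> {
    cells
        .iter()
        .enumerate()
        .filter(|(_, cell)| cell.life > 0)
        .map(|(index, _)| index)
        .collect()
}

/// Blends a newly traced sample into the stored irradiance using an
/// exponential moving average (an assumed blend, see the note above).
fn blend(cell: &mut CacheCell, sample: [f32; 3]) {
    for (old, new) in cell.irradiance.iter_mut().zip(sample.iter()) {
        *old += (*new - *old) * BLEND_FACTOR;
    }
}

/// Runs one frame of cache maintenance. `trace` stands in for the ray
/// tracing step and returns the new sample for a given cell index.
/// Returns the number of alive cells that were updated.
fn update_world_cache(cells: &mut [CacheCell], trace: impl Fn(usize) -> [f32; 3]) -> usize {
    decay(cells);
    let alive = compact_alive(cells);
    for &index in &alive {
        let sample = trace(index);
        blend(&mut cells[index], sample);
    }
    alive.len()
}
```

A caller would `touch` each cell that a screen probe ray lands in, then call `update_world_cache` once per frame with a closure that traces rays for the given cell. Cells that go untouched for `INITIAL_LIFE` frames drop out of the compacted list and free their slot.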

TODOs

See crates/bevy_pbr/src/solari/todo.txt.

Expect lots of bugs, ghosting, light leaks, quality issues, etc. Especially in non-toy scenes. Anything in bevy_pbr::solari::scene should be treated as temporary, and only written in order to let me work on the actual shader techniques.

This is a draft PR, and will likely remain that way for a long while. I've been working on this basically solo for months already, and expect to continue doing so for many more months. That said, I'm putting this out there in the hopes of raising additional help developing Solari. If you're interested in helping with the development, please reach out on the rendering-dev channel in Bevy's discord!

Blockers

All of these are currently worked around either in Bevy (along with other hacks), or hacked around in forked dependencies.

Other missing nice-to-haves for better performance:

  • Subgroup operation support in WGSL
  • Async compute for building acceleration structures
  • Async compute

Literature

Thanks

Finally, I'd like to thank Alexander Sannikov for his help understanding and implementing his radiance cascades technique, @daniel-keitel for their work implementing raytracing support in wgpu, and countless others in the Bevy discord, wgpu/naga communities, and other graphics forums that have given me advice on this project. Far too many people to name individually :)

@alice-i-cecile alice-i-cecile added C-Feature A new feature, making something new possible A-Rendering Drawing game state to the screen D-Complex Quite challenging from either a design or technical perspective. Ask for help! labels Oct 2, 2023
@JMS55 JMS55 added the S-Blocked This cannot move forward until something else changes label Oct 3, 2023
@JMS55 JMS55 mentioned this pull request Oct 3, 2023
@entropylost
Contributor

@JMS55 What happened? Did it just get too far behind or something?

@JMS55
Contributor Author

JMS55 commented Aug 9, 2024

A couple of things:

  • Wgpu raytracing never got finished, and maintaining a fork of wgpu/naga/naga_oil/bevy got more and more painful
  • The algorithms/implementations I was trying out were flawed. Nowadays I feel like ReSTIR based methods have a lot more promise than screen space probes, radiance cascades or otherwise. The probes were just too finicky. Additionally my world-space radiance cache was poorly/incorrectly implemented. If I were to start over the project, I would forgo the world space cache until I was confident in the final gather and denoising, and only then start working on it.
  • I ended up devoting 95% of my Bevy-development time to meshlets, which is making a lot more progress and isn't blocked (anymore) on wgpu features.

I am really looking forward to coming back to this in the future, but wgpu needs to support raytracing first. Realtime RT-GI is super cool, and since I started this project there's been even more exciting papers, but I don't have the time or motivation to keep trying to patch wgpu and bevy.

If anyone would like to see this work continue, please contribute raytracing support to wgpu along with the tests it needs to get merged.

@baadc0de

baadc0de commented Jan 7, 2025

I'm working on pushing RT support in wgpu with express motivation to get some ReSTIR lighting approaches into bevy.

@JMS55
Contributor Author

JMS55 commented Jan 7, 2025

Hey, glad to see more people interested in RT!

Wgpu's current RT support is actually good enough for me to start working on RT DI/GI. The main blockers are no longer RT related:

  • Wgpu needs proper bindless texture/buffer support, where you can mutate a bind group to add more resources to the binding array without recreating it.
  • Naga oil needs to support transferring SpecialTypes between modules (see "Handle naga's SpecialTypes", naga_oil#54)

When these are resolved I plan to revive Solari.

@baadc0de

baadc0de commented Jan 8, 2025

I'm still studying the internals and how wgpu works. My plan is to help mature the RT stack on the wgpu side first (compaction, in-place update of accel structures), then work my way up to binding arrays.
