Use multi_draw_indirect_count where available, in preparation for two-phase occlusion culling. (#17211)

This commit allows Bevy to use `multi_draw_indirect_count` for drawing
meshes. The `multi_draw_indirect_count` feature works just like
`multi_draw_indirect`, but it takes the number of indirect parameters
from a GPU buffer rather than specifying it on the CPU.
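
As a rough illustration of the difference, the following CPU-side sketch simulates both calls. It is not the `wgpu` or Bevy API: `DrawCommand` and both functions are hypothetical stand-ins, and the clamping to `max_count` reflects how the underlying graphics APIs are generally understood to behave.

```rust
/// Hypothetical stand-in for one entry in the indirect parameters buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DrawCommand {
    pub index_count: u32,
    pub instance_count: u32,
}

/// Simulates `multi_draw_indirect`: the CPU states how many commands to run.
pub fn multi_draw_indirect(commands: &[DrawCommand], cpu_count: u32) -> Vec<DrawCommand> {
    commands.iter().copied().take(cpu_count as usize).collect()
}

/// Simulates `multi_draw_indirect_count`: the number of commands is read from
/// a (simulated) GPU buffer and clamped to a maximum supplied by the CPU.
pub fn multi_draw_indirect_count(
    commands: &[DrawCommand],
    count_buffer: &[u32],
    count_offset: usize,
    max_count: u32,
) -> Vec<DrawCommand> {
    let gpu_count = count_buffer[count_offset].min(max_count);
    multi_draw_indirect(commands, gpu_count)
}
```

In this model the CPU only needs to know an upper bound on the number of draws; the actual number can be decided by an earlier GPU pass that writes `count_buffer`.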

Currently, the CPU constructs the list of indirect draw parameters with
the instance count for each batch set to zero, uploads the resulting
buffer to the GPU, and dispatches a compute shader that bumps the
instance count for each mesh that survives culling. Unfortunately, this
is inefficient once `multi_draw_indirect_count` is in use: draw commands
for meshes whose instances were all culled remain in the list passed to
`multi_draw_indirect_count`, and each of these empty commands still
causes overhead. Proper use of `multi_draw_indirect_count` requires
eliminating these empty draw commands.
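
The problem with the current approach can be seen in a small CPU-side simulation. This is only a sketch: `DrawCommand`, `build_commands_on_cpu`, `cull`, and `empty_command_count` are hypothetical names, and the culling result is passed in as a plain list rather than computed.

```rust
/// Hypothetical simplified draw command: one per batch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DrawCommand {
    pub batch: u32,
    pub instance_count: u32,
}

/// CPU side: emit one command per batch with the instance count zeroed.
pub fn build_commands_on_cpu(batch_count: u32) -> Vec<DrawCommand> {
    (0..batch_count)
        .map(|batch| DrawCommand { batch, instance_count: 0 })
        .collect()
}

/// Simulated culling pass: `surviving` holds the batch index of every
/// instance that survived, so each entry bumps one instance count.
pub fn cull(commands: &mut [DrawCommand], surviving: &[u32]) {
    for &batch in surviving {
        commands[batch as usize].instance_count += 1;
    }
}

/// Counts the commands that would draw nothing but are still in the list.
pub fn empty_command_count(commands: &[DrawCommand]) -> usize {
    commands.iter().filter(|c| c.instance_count == 0).count()
}
```

With four batches of which only two have surviving instances, the list still holds four commands, two of them empty.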

To address this inefficiency, this PR makes Bevy fully construct the
indirect draw commands on the GPU instead of on the CPU. Instead of
writing instance counts to the draw command buffer, the mesh
preprocessing shader now writes them to a separate *indirect metadata
buffer*. A second compute dispatch known as the *build indirect
parameters* shader runs after mesh preprocessing and converts the
indirect draw metadata into actual indirect draw commands for the GPU.
The build indirect parameters shader operates on a batch at a time,
rather than an instance at a time, so each thread writes either zero or
one indirect draw command. This simplifies the logic in
`mesh_preprocessing`, which currently needs special cases for the
first mesh in each batch. The build indirect parameters shader emits
draw commands in a tightly packed manner, enabling maximally efficient
use of `multi_draw_indirect_count`.
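
The compaction step described above can be sketched on the CPU as follows. This is an illustration under assumptions, not the shader from this PR: `BatchMetadata`, `DrawCommand`, and `build_indirect_parameters` are hypothetical, the field set is simplified, and each loop iteration stands in for one GPU thread.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical per-batch record written by the mesh preprocessing pass.
#[derive(Clone, Copy, Debug)]
pub struct BatchMetadata {
    pub index_count: u32,
    pub first_index: u32,
    /// Number of instances in this batch that survived culling.
    pub instance_count: u32,
}

/// Hypothetical simplified indexed draw command.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct DrawCommand {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
}

/// One "thread" per batch: emit zero or one command, tightly packed.
pub fn build_indirect_parameters(
    metadata: &[BatchMetadata],
    draw_count: &AtomicU32,
    commands: &mut [DrawCommand],
) {
    for meta in metadata {
        // A fully culled batch emits nothing at all.
        if meta.instance_count == 0 {
            continue;
        }
        // Reserve the next free slot; the counter doubles as the draw count.
        let slot = draw_count.fetch_add(1, Ordering::Relaxed) as usize;
        commands[slot] = DrawCommand {
            index_count: meta.index_count,
            instance_count: meta.instance_count,
            first_index: meta.first_index,
        };
    }
}
```

After the pass, `draw_count` holds the number of non-empty commands, which is the value `multi_draw_indirect_count` would read from the GPU buffer.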

Along the way, this patch switches mesh preprocessing to dispatch one
compute invocation per render phase per view, instead of dispatching one
compute invocation per view. This is preparation for two-phase occlusion
culling, in which we will have two mesh preprocessing stages. In that
scenario, the first mesh preprocessing stage must only process opaque
and alpha tested objects, so the work items must be separated into those
that are opaque or alpha tested and those that aren't. Thus this PR
splits out the work items into a separate buffer for each phase. As this
patch rewrites so much of the mesh preprocessing infrastructure, it was
simpler to just fold the change into this patch instead of deferring it
to the forthcoming occlusion culling PR.
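
A minimal sketch of the per-phase split, assuming hypothetical `Phase`, `WorkItem`, `split_by_phase`, and `in_first_stage` names that do not appear in the PR:

```rust
use std::collections::BTreeMap;

/// Hypothetical render phases for one view.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Phase {
    Opaque,
    AlphaMask,
    Transparent,
}

/// Hypothetical work item: one mesh instance to preprocess.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct WorkItem {
    pub mesh_index: u32,
    pub phase: Phase,
}

/// Splits a view's work items into one buffer per phase, so that each
/// phase can be dispatched separately.
pub fn split_by_phase(items: &[WorkItem]) -> BTreeMap<Phase, Vec<u32>> {
    let mut buffers: BTreeMap<Phase, Vec<u32>> = BTreeMap::new();
    for item in items {
        buffers.entry(item.phase).or_default().push(item.mesh_index);
    }
    buffers
}

/// Phases that a first occlusion culling stage would process.
pub fn in_first_stage(phase: Phase) -> bool {
    matches!(phase, Phase::Opaque | Phase::AlphaMask)
}
```

With separate buffers, a first stage can dispatch only the buffers for which `in_first_stage` returns true, without filtering individual work items.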

Finally, this patch changes mesh preprocessing so that it runs
separately for indexed and non-indexed meshes. This is because draw
commands for indexed and non-indexed meshes have different sizes and
layouts. *The existing code is actually broken for non-indexed meshes*,
as it attempts to overlay the indirect parameters for non-indexed meshes
on top of those for indexed meshes. Consequently, right now the
parameters will be read incorrectly when multiple non-indexed meshes are
multi-drawn together. *This is a bug fix* and, as with the change to
dispatch phases separately noted above, was easiest to include in this
patch as opposed to separately.
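
The size difference is easy to check. The structs below mirror the standard indirect argument layouts used by graphics APIs (four 32-bit fields for non-indexed draws, five for indexed draws); they are written out here for illustration rather than taken from the PR, and `command_offset` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Indirect arguments for a non-indexed draw: 4 x 32 bits.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct DrawIndirectArgs {
    pub vertex_count: u32,
    pub instance_count: u32,
    pub first_vertex: u32,
    pub first_instance: u32,
}

/// Indirect arguments for an indexed draw: 5 x 32 bits.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
pub struct DrawIndexedIndirectArgs {
    pub index_count: u32,
    pub instance_count: u32,
    pub first_index: u32,
    pub base_vertex: i32,
    pub first_instance: u32,
}

/// Byte offset of the `n`th command in a tightly packed array of `T`.
pub fn command_offset<T>(n: usize) -> usize {
    n * size_of::<T>()
}
```

Because the strides differ (16 bytes versus 20 bytes), the second and later commands in a multi-draw land at different offsets depending on which layout is assumed. That is one plausible way in which overlaying the two layouts in a single buffer leads to misread parameters.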

## Migration Guide

* Systems that add custom phase items now need to populate the indirect
drawing-related buffers. See the `specialized_mesh_pipeline` example for
how this is done.
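
One small part of this migration is visible in the diff: batch set keys now report whether their meshes are indexed. The sketch below uses a local stand-in trait with the same method as `PhaseItemBatchSetKey`, so that it compiles without Bevy; `CustomBatchSetKey` and its `u32` slab ID are hypothetical. It does not cover populating the indirect buffers themselves.

```rust
/// Local stand-in for the `PhaseItemBatchSetKey` trait shown in this diff,
/// declared here only so the sketch is self-contained.
pub trait PhaseItemBatchSetKey {
    /// Returns true if the meshes in this batch set use an index buffer.
    fn indexed(&self) -> bool;
}

/// Hypothetical batch set key for a custom phase item.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct CustomBatchSetKey {
    /// Slab holding the index data, or `None` for non-indexed meshes.
    /// A plain `u32` stands in for a real slab ID type.
    pub index_slab: Option<u32>,
}

impl PhaseItemBatchSetKey for CustomBatchSetKey {
    fn indexed(&self) -> bool {
        self.index_slab.is_some()
    }
}
```

Deriving indexedness from the presence of an index slab follows the pattern the diff uses for `Opaque3dBatchSetKey`.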
pcwalton authored Jan 14, 2025
1 parent e53c8e0 commit 35101f3
Showing 32 changed files with 2,257 additions and 630 deletions.
34 changes: 27 additions & 7 deletions crates/bevy_core_pipeline/src/core_2d/mod.rs
@@ -35,6 +35,7 @@ use core::ops::Range;
use bevy_asset::UntypedAssetId;
use bevy_render::{
batching::gpu_preprocessing::GpuPreprocessingMode,
render_phase::PhaseItemBatchSetKey,
view::{ExtractedView, RetainedViewEntity},
};
use bevy_utils::{HashMap, HashSet};
@@ -132,7 +133,7 @@ pub struct Opaque2d {
///
/// Objects in a single batch set can potentially be multi-drawn together,
/// if it's enabled and the current platform supports it.
pub batch_set_key: (),
pub batch_set_key: BatchSetKey2d,
/// The key, which determines which can be batched.
pub bin_key: Opaque2dBinKey,
/// An entity from which data will be fetched, including the mesh if
@@ -198,7 +199,7 @@ impl PhaseItem for Opaque2d {
impl BinnedPhaseItem for Opaque2d {
// Since 2D meshes presently can't be multidrawn, the batch set key is
// irrelevant.
type BatchSetKey = ();
type BatchSetKey = BatchSetKey2d;

type BinKey = Opaque2dBinKey;

@@ -219,6 +220,20 @@ impl BinnedPhaseItem for Opaque2d {
}
}

/// 2D meshes aren't currently multi-drawn together, so this batch set key only
/// stores whether the mesh is indexed.
#[derive(Clone, Copy, PartialEq, PartialOrd, Eq, Ord, Hash)]
pub struct BatchSetKey2d {
/// True if the mesh is indexed.
pub indexed: bool,
}

impl PhaseItemBatchSetKey for BatchSetKey2d {
fn indexed(&self) -> bool {
self.indexed
}
}

impl CachedRenderPipelinePhaseItem for Opaque2d {
#[inline]
fn cached_pipeline(&self) -> CachedRenderPipelineId {
@@ -232,7 +247,7 @@ pub struct AlphaMask2d {
///
/// Objects in a single batch set can potentially be multi-drawn together,
/// if it's enabled and the current platform supports it.
pub batch_set_key: (),
pub batch_set_key: BatchSetKey2d,
/// The key, which determines which can be batched.
pub bin_key: AlphaMask2dBinKey,
/// An entity from which data will be fetched, including the mesh if
@@ -297,9 +312,7 @@ impl PhaseItem for AlphaMask2d {
}

impl BinnedPhaseItem for AlphaMask2d {
// Since 2D meshes presently can't be multidrawn, the batch set key is
// irrelevant.
type BatchSetKey = ();
type BatchSetKey = BatchSetKey2d;

type BinKey = AlphaMask2dBinKey;

@@ -335,6 +348,9 @@ pub struct Transparent2d {
pub draw_function: DrawFunctionId,
pub batch_range: Range<u32>,
pub extra_index: PhaseItemExtraIndex,
/// Whether the mesh in question is indexed (uses an index buffer in
/// addition to its vertex buffer).
pub indexed: bool,
}

impl PhaseItem for Transparent2d {
@@ -387,6 +403,10 @@ impl SortedPhaseItem for Transparent2d {
// radsort is a stable radix sort that performed better than `slice::sort_by_key` or `slice::sort_unstable_by_key`.
radsort::sort_by_key(items, |item| item.sort_key().0);
}

fn indexed(&self) -> bool {
self.indexed
}
}

impl CachedRenderPipelinePhaseItem for Transparent2d {
@@ -411,7 +431,7 @@ pub fn extract_core_2d_camera_phases(
}

// This is the main 2D camera, so we use the first subview index (0).
let retained_view_entity = RetainedViewEntity::new(main_entity.into(), 0);
let retained_view_entity = RetainedViewEntity::new(main_entity.into(), None, 0);

transparent_2d_phases.insert_or_clear(retained_view_entity);
opaque_2d_phases.insert_or_clear(retained_view_entity, GpuPreprocessingMode::None);
27 changes: 25 additions & 2 deletions crates/bevy_core_pipeline/src/core_3d/mod.rs
@@ -68,6 +68,7 @@ use core::ops::Range;
use bevy_render::{
batching::gpu_preprocessing::{GpuPreprocessingMode, GpuPreprocessingSupport},
mesh::allocator::SlabId,
render_phase::PhaseItemBatchSetKey,
view::{NoIndirectDrawing, RetainedViewEntity},
};
pub use camera_3d::*;
@@ -269,6 +270,12 @@ pub struct Opaque3dBatchSetKey {
pub lightmap_slab: Option<NonMaxU32>,
}

impl PhaseItemBatchSetKey for Opaque3dBatchSetKey {
fn indexed(&self) -> bool {
self.index_slab.is_some()
}
}

/// Data that must be identical in order to *batch* phase items together.
///
/// Note that a *batch set* (if multi-draw is in use) contains multiple batches.
@@ -430,6 +437,9 @@ pub struct Transmissive3d {
pub draw_function: DrawFunctionId,
pub batch_range: Range<u32>,
pub extra_index: PhaseItemExtraIndex,
/// Whether the mesh in question is indexed (uses an index buffer in
/// addition to its vertex buffer).
pub indexed: bool,
}

impl PhaseItem for Transmissive3d {
@@ -493,6 +503,11 @@ impl SortedPhaseItem for Transmissive3d {
fn sort(items: &mut [Self]) {
radsort::sort_by_key(items, |item| item.distance);
}

#[inline]
fn indexed(&self) -> bool {
self.indexed
}
}

impl CachedRenderPipelinePhaseItem for Transmissive3d {
@@ -509,6 +524,9 @@ pub struct Transparent3d {
pub draw_function: DrawFunctionId,
pub batch_range: Range<u32>,
pub extra_index: PhaseItemExtraIndex,
/// Whether the mesh in question is indexed (uses an index buffer in
/// addition to its vertex buffer).
pub indexed: bool,
}

impl PhaseItem for Transparent3d {
@@ -560,6 +578,11 @@ impl SortedPhaseItem for Transparent3d {
fn sort(items: &mut [Self]) {
radsort::sort_by_key(items, |item| item.distance);
}

#[inline]
fn indexed(&self) -> bool {
self.indexed
}
}

impl CachedRenderPipelinePhaseItem for Transparent3d {
@@ -594,7 +617,7 @@ pub fn extract_core_3d_camera_phases(
});

// This is the main 3D camera, so use the first subview index (0).
let retained_view_entity = RetainedViewEntity::new(main_entity.into(), 0);
let retained_view_entity = RetainedViewEntity::new(main_entity.into(), None, 0);

opaque_3d_phases.insert_or_clear(retained_view_entity, gpu_preprocessing_mode);
alpha_mask_3d_phases.insert_or_clear(retained_view_entity, gpu_preprocessing_mode);
@@ -662,7 +685,7 @@ pub fn extract_camera_prepass_phase(
});

// This is the main 3D camera, so we use the first subview index (0).
let retained_view_entity = RetainedViewEntity::new(main_entity.into(), 0);
let retained_view_entity = RetainedViewEntity::new(main_entity.into(), None, 0);

if depth_prepass || normal_prepass || motion_vector_prepass {
opaque_3d_prepass_phases.insert_or_clear(retained_view_entity, gpu_preprocessing_mode);
7 changes: 7 additions & 0 deletions crates/bevy_core_pipeline/src/prepass/mod.rs
@@ -35,6 +35,7 @@ use bevy_ecs::prelude::*;
use bevy_math::Mat4;
use bevy_reflect::{std_traits::ReflectDefault, Reflect};
use bevy_render::mesh::allocator::SlabId;
use bevy_render::render_phase::PhaseItemBatchSetKey;
use bevy_render::sync_world::MainEntity;
use bevy_render::{
render_phase::{
@@ -184,6 +185,12 @@ pub struct OpaqueNoLightmap3dBatchSetKey {
pub index_slab: Option<SlabId>,
}

impl PhaseItemBatchSetKey for OpaqueNoLightmap3dBatchSetKey {
fn indexed(&self) -> bool {
self.index_slab.is_some()
}
}

// TODO: Try interning these.
/// The data used to bin each opaque 3D object in the prepass and deferred pass.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
3 changes: 3 additions & 0 deletions crates/bevy_gizmos/src/pipeline_2d.rs
@@ -340,6 +340,7 @@ fn queue_line_gizmos_2d(
sort_key: FloatOrd(f32::INFINITY),
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: false,
});
}

@@ -360,6 +361,7 @@ fn queue_line_gizmos_2d(
sort_key: FloatOrd(f32::INFINITY),
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: false,
});
}
}
@@ -418,6 +420,7 @@ fn queue_line_joint_gizmos_2d(
sort_key: FloatOrd(f32::INFINITY),
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: false,
});
}
}
3 changes: 3 additions & 0 deletions crates/bevy_gizmos/src/pipeline_3d.rs
@@ -369,6 +369,7 @@ fn queue_line_gizmos_3d(
distance: 0.,
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: true,
});
}

@@ -390,6 +391,7 @@ fn queue_line_gizmos_3d(
distance: 0.,
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: true,
});
}
}
@@ -484,6 +486,7 @@ fn queue_line_joint_gizmos_3d(
distance: 0.,
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: true,
});
}
}
2 changes: 2 additions & 0 deletions crates/bevy_pbr/src/lib.rs
@@ -102,6 +102,8 @@ pub mod graph {
GpuPreprocess,
/// Label for the screen space reflections pass.
ScreenSpaceReflections,
/// Label for the indirect parameters building pass.
BuildIndirectParameters,
}
}

12 changes: 7 additions & 5 deletions crates/bevy_pbr/src/material.rs
@@ -851,6 +851,9 @@ pub fn queue_material_meshes<M: Material>(
}
};

// Fetch the slabs that this mesh resides in.
let (vertex_slab, index_slab) = mesh_allocator.mesh_slabs(&mesh_instance.mesh_asset_id);

match mesh_key
.intersection(MeshPipelineKey::BLEND_RESERVED_BITS | MeshPipelineKey::MAY_DISCARD)
{
@@ -865,13 +868,12 @@ pub fn queue_material_meshes<M: Material>(
distance,
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: index_slab.is_some(),
});
} else if material.properties.render_method == OpaqueRendererMethod::Forward {
let (vertex_slab, index_slab) =
mesh_allocator.mesh_slabs(&mesh_instance.mesh_asset_id);
let batch_set_key = Opaque3dBatchSetKey {
draw_function: draw_opaque_pbr,
pipeline: pipeline_id,
draw_function: draw_opaque_pbr,
material_bind_group_index: Some(material.binding.group.0),
vertex_slab: vertex_slab.unwrap_or_default(),
index_slab,
@@ -903,10 +905,9 @@ pub fn queue_material_meshes<M: Material>(
distance,
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: index_slab.is_some(),
});
} else if material.properties.render_method == OpaqueRendererMethod::Forward {
let (vertex_slab, index_slab) =
mesh_allocator.mesh_slabs(&mesh_instance.mesh_asset_id);
let batch_set_key = OpaqueNoLightmap3dBatchSetKey {
draw_function: draw_alpha_mask_pbr,
pipeline: pipeline_id,
@@ -938,6 +939,7 @@ pub fn queue_material_meshes<M: Material>(
distance,
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: index_slab.is_some(),
});
}
}
4 changes: 2 additions & 2 deletions crates/bevy_pbr/src/prepass/mod.rs
@@ -966,13 +966,13 @@ pub fn queue_prepass_material_meshes<M: Material>(
}
};

let (vertex_slab, index_slab) = mesh_allocator.mesh_slabs(&mesh_instance.mesh_asset_id);

match mesh_key
.intersection(MeshPipelineKey::BLEND_RESERVED_BITS | MeshPipelineKey::MAY_DISCARD)
{
MeshPipelineKey::BLEND_OPAQUE | MeshPipelineKey::BLEND_ALPHA_TO_COVERAGE => {
if deferred {
let (vertex_slab, index_slab) =
mesh_allocator.mesh_slabs(&mesh_instance.mesh_asset_id);
opaque_deferred_phase.as_mut().unwrap().add(
OpaqueNoLightmap3dBatchSetKey {
draw_function: opaque_draw_deferred,