MultiDrawIndirect with state changes

Started by
2 comments, last by Valakor 2 years, 3 months ago

Hello there!

I am working on a rendering engine using Vulkan. My engine follows the GPU-driven approach. At the beginning of the frame a culling compute shader is executed. It iterates through the instance buffer, performs frustum and occlusion culling (with a depth pyramid), and writes the resulting indirect command buffer. Then vkCmdDrawIndexedIndirectCount is called. Material parameters are fetched from another buffer using InstanceIndex (the descriptorIndexing feature is also used).
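Here is a CPU-side model of the compaction step such a culling shader performs. It is a sketch under assumptions: `Instance`, `cullInstance`, and the `visible` flag (a stand-in for the frustum and depth-pyramid tests) are hypothetical names, and `DrawIndexedIndirectCommand` only mirrors the field layout of Vulkan's `VkDrawIndexedIndirectCommand`. Each visible instance reserves a slot with an atomic increment (in GLSL, an `atomicAdd` on the count that vkCmdDrawIndexedIndirectCount reads) and stores its own index in `firstInstance`. In Vulkan, `gl_InstanceIndex` includes `firstInstance`, which is one common way to make InstanceIndex usable as a per-instance lookup key; non-zero `firstInstance` in indirect draws requires the `drawIndirectFirstInstance` device feature.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Same field layout as Vulkan's VkDrawIndexedIndirectCommand (5 x 32-bit, 20 bytes).
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical per-instance record in the instance buffer.
struct Instance {
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    bool     visible;  // placeholder for the frustum + depth-pyramid tests
};

// Models one compute-shader invocation (one per instance). A visible instance
// reserves a unique slot via an atomic increment of the draw count, then writes
// a single-instance draw whose firstInstance is the instance's own index.
void cullInstance(uint32_t i, const std::vector<Instance>& instances,
                  std::vector<DrawIndexedIndirectCommand>& commands,
                  std::atomic<uint32_t>& drawCount) {
    const Instance& inst = instances[i];
    if (!inst.visible) {
        return;  // culled: no command written, count unchanged
    }
    // Relaxed ordering is enough here: only the uniqueness of the slot matters.
    const uint32_t slot = drawCount.fetch_add(1, std::memory_order_relaxed);
    assert(slot < commands.size());  // buffer sized for the worst case (all visible)
    commands[slot] = {inst.indexCount, 1u, inst.firstIndex, inst.vertexOffset, i};
}
```

On a GPU the slot order depends on scheduling, so the order of the written commands is not deterministic; only the set of commands and the final count are.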

Now I need to do much more complex rendering using different (possibly arbitrary) pipelines (e.g. alpha blending, tessellation, etc.), and I'm stuck on how to handle this with GPU-driven indirect rendering.

I think about two approaches:

  1. Allocate a separate indirect command buffer for every pipeline and add a pipeline ID to every entity in the instance buffer. My doubts are about how to predict the space needed for each buffer (in the worst case it would be numberofinstances * numberofpipelines). It also requires more atomic increments in divergent branches of the culling shader. Finally, it requires executing multiple DrawIndexedIndirect calls (one for each pipeline) even when the actual number of draws is zero, since the CPU doesn't know anything about the culling results.
  2. Do a dispatch (overwriting the previous indirect buffer) followed by a draw for every shader in the scene. This requires starting a new render pass for every pipeline (since vkCmdDispatch cannot be recorded inside a render pass), so I don't like that idea at all.
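On the sizing doubt in approach 1: if every instance carries exactly one pipeline ID, bucket p can never receive more commands than the number of instances assigned to pipeline p, and the CPU already knows those counts because it assigns the IDs. A minimal sketch under that assumption, with hypothetical names (`PipelineInstance`, `bucketOffsets`, `cullInstanceBucketed`) and the same `VkDrawIndexedIndirectCommand`-shaped struct: an exclusive prefix sum over the per-pipeline instance counts gives each bucket a fixed base offset, so the whole command buffer needs numberofinstances entries rather than numberofinstances * numberofpipelines, and the culling shader keeps one atomic counter per pipeline.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Same field layout as Vulkan's VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical instance record carrying the pipeline ID from approach 1.
struct PipelineInstance {
    uint32_t pipelineId;
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    bool     visible;  // placeholder for the culling tests
};

// CPU side, whenever instances are added/removed or change pipeline: exclusive
// prefix sum of the instance count per pipeline. Bucket p owns the slots
// [offsets[p], offsets[p + 1]); offsets[pipelineCount] == instances.size().
std::vector<uint32_t> bucketOffsets(const std::vector<PipelineInstance>& instances,
                                    uint32_t pipelineCount) {
    std::vector<uint32_t> offsets(pipelineCount + 1, 0);
    for (const PipelineInstance& inst : instances) {
        ++offsets[inst.pipelineId + 1];  // count instances per pipeline
    }
    for (uint32_t p = 0; p < pipelineCount; ++p) {
        offsets[p + 1] += offsets[p];  // turn counts into running offsets
    }
    return offsets;
}

// GPU side (modelled on the CPU): one atomic counter per pipeline. A visible
// instance appends a command to its own pipeline's bucket only.
void cullInstanceBucketed(uint32_t i, const std::vector<PipelineInstance>& instances,
                          const std::vector<uint32_t>& offsets,
                          std::vector<DrawIndexedIndirectCommand>& commands,
                          std::vector<std::atomic<uint32_t>>& counts) {
    const PipelineInstance& inst = instances[i];
    if (!inst.visible) {
        return;
    }
    const uint32_t p = inst.pipelineId;
    const uint32_t slot = offsets[p] + counts[p].fetch_add(1, std::memory_order_relaxed);
    assert(slot < offsets[p + 1]);  // cannot overflow: capacity == instances using p
    commands[slot] = {inst.indexCount, 1u, inst.firstIndex, inst.vertexOffset, i};
}
```

The per-pipeline counters can live in a single buffer of uint32 values, which lines up with the countBuffer/countBufferOffset parameters of vkCmdDrawIndexedIndirectCount (counter p at byte offset 4 * p). This does not remove the extra atomics or the possibly-empty draw calls, only the memory blow-up.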

Another option I see is to give up and do GPU-driven rendering only for the single most frequently used pipeline, and use traditional CPU-driven rendering for all other instances (losing the ability to use depth pyramid culling for them). I don't want to do this because I would then need to support both paths (indirect and direct) at the same time.

What would you do? How do large GPU-driven renderers handle it? Thank you for your answers!


Unless you are going to implement every potential render state as shader logic, which would turn out to be a nightmare, it is nigh impossible to have a 100% GPU-driven pipeline for any moderately complex use case. Even then, I would question the efficiency of such an approach, since some GPU render state functionality is handled by fixed-function hardware without the added hassle of programmability. A hybrid approach would give you more flexibility, as you can map functionality to either approach as you see fit instead of trying to shoehorn everything into one.

Few (if any) GPU-driven pipelines can draw without any state changes, which means you can't generally get away with a single call to DrawIndexedIndirect. Something close might be possible if you bucket all your materials into a small number of “uber” shaders (e.g. “opaque”, “transparent”, “cutout”, etc.) with state changes between each of these buckets, but with everything inside each bucket fully GPU-driven and relying upon runtime feature checks in each shader.
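To make the "uber" shader bucketing concrete, here is a small sketch with hypothetical names (`BlendMode`, `MaterialFeature`, `bucketFor`, `shadeCutout`): only state that genuinely needs a different pipeline (here, the blend mode) selects the bucket, while optional features are bits in a per-material mask that the shader tests at runtime. `shadeCutout` is a CPU stand-in for a fragment shader, not real GLSL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical material model. Only the blend mode needs different fixed-function
// state, so it alone selects the bucket (pipeline).
enum class BlendMode : uint32_t { Opaque = 0, Cutout = 1, Transparent = 2 };

// Optional features handled inside a bucket's uber shader via runtime checks.
enum MaterialFeature : uint32_t {
    kFeatureNormalMap = 1u << 0,
    kFeatureEmissive  = 1u << 1,
};

struct Material {
    BlendMode blendMode;
    uint32_t  featureMask;  // OR of MaterialFeature bits
    float     alphaCutoff;  // used by the cutout bucket only
};

// Bucket index == pipeline to bind; state changes happen only between buckets.
uint32_t bucketFor(const Material& m) {
    return static_cast<uint32_t>(m.blendMode);
}

// CPU stand-in for the cutout bucket's fragment shader. Returns false where GLSL
// would 'discard'. The emissive term is a runtime branch on the feature mask,
// so emissive and non-emissive materials share one pipeline.
bool shadeCutout(const Material& m, float alpha, float& emissiveOut) {
    if (alpha < m.alphaCutoff) {
        return false;
    }
    emissiveOut = (m.featureMask & kFeatureEmissive) ? 1.0f : 0.0f;
    return true;
}
```

The trade-off is the one described above: fewer pipelines and state changes, in exchange for branches inside the shaders.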

If this isn't feasible, a common approach is to assume the worst case: submit 1 DrawIndexedIndirect for every unique material in the scene (obviously bucketing as many draws into each material as possible) with state changes between each, and have the GPU fill in the draw count. You'll likely end up with buckets that end up with 0 draws, which kind of sucks and has a measurable performance impact, but there isn't much you can do if you need to insert state changes in between draws. If you're on a platform with a very high degree of control (e.g. consoles) you can get around this by hacking the command stream to conditionally insert jumps to skip 0-sized draws entirely, but that's probably unlikely to have wide support on PC.
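The "one indirect draw per material" recording loop might look roughly like this. Assumptions: each material's commands occupy a contiguous range `offsets[p]` to `offsets[p + 1]` of one indirect buffer, each material has one uint32 counter in a count buffer that is cleared to zero every frame (e.g. with `vkCmdFillBuffer`) before the culling dispatch, and the Vulkan call is replaced by a hypothetical `IndirectCountDraw` record so the sketch runs without a device. The stride of 20 bytes is `sizeof(VkDrawIndexedIndirectCommand)`. Materials with no instances at all are skipped on the CPU; materials whose instances were all culled still get a call whose GPU-side count is 0, which is the cost mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte size of one VkDrawIndexedIndirectCommand (5 x 32-bit fields).
constexpr uint32_t kCommandStride = 20;

// Hypothetical record of the arguments passed to vkCmdDrawIndexedIndirectCount
// for one material bucket (the VkBuffer handles are omitted).
struct IndirectCountDraw {
    uint32_t material;      // which pipeline/material to bind first
    uint64_t offset;        // byte offset of the bucket's first command
    uint64_t countOffset;   // byte offset of the bucket's uint32 draw count
    uint32_t maxDrawCount;  // bucket capacity; the GPU-written count is clamped to it
    uint32_t stride;
};

// offsets[p]..offsets[p + 1] is material p's command range in the indirect buffer.
std::vector<IndirectCountDraw> recordMaterialDraws(const std::vector<uint32_t>& offsets) {
    std::vector<IndirectCountDraw> draws;
    for (uint32_t p = 0; p + 1 < offsets.size(); ++p) {
        const uint32_t capacity = offsets[p + 1] - offsets[p];
        if (capacity == 0) {
            continue;  // no instances use this material: the CPU can skip it
        }
        // Real code would call vkCmdBindPipeline(...) and then
        // vkCmdDrawIndexedIndirectCount(cmd, drawBuffer, offset, countBuffer,
        //                               countOffset, maxDrawCount, stride).
        draws.push_back({p,
                         uint64_t(offsets[p]) * kCommandStride,
                         uint64_t(p) * uint64_t(sizeof(uint32_t)),
                         capacity,
                         kCommandStride});
    }
    return draws;
}
```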

This topic is closed to new replies.
