Vulkan 3D DirectX OpenGL Intermediate Windows C++

Comparing GPU Resource Update Strategies in Diligent Engine

EgorYusov

5.00/5 (1 vote)

Sep 12, 2018

CPOL

11 min read

8224

This article describes several strategies to update GPU resources in Diligent Engine (a modern low-level graphic library) as well as important internal details and performance implications related to each method.

Disclaimer: This article uses material published on Diligent Engine web site.

Download latest source (Github)

Introduction

Efficiently supplying data to the graphics processing unit (GPU) is essential for a 3D renderer or any other application that utilizes the power of modern GPUs. As CPU and GPU usually have separate memory systems and perform operations in different timelines, it is not always as straightforward as simply writing bytes at a given address. In fact, the optimal way depends on the expected usage scenario. This article describes different ways a resource can be updated in Diligent Engine as well as important internal details and performance implications related to each method.

Background

Diligent Engine is a modern cross-platform low-level graphic library that has Direct3D11, Direct3D12, OpenGL/GLES and Vulkan backends and supports Windows, Linux, MacOS, iOS and Android platforms. This article gives an introduction to the engine.

Buffers

Buffers represent linear memory and are the most basic resource type. The data can be written to a buffer during initialization, as well as at run time using one of the methods described in this paragraph.

Buffer Initialization

The most basic way to supply data into a buffer is to provide it during the buffer initialization:

// Create index buffer
BufferDesc IndBuffDesc;
IndBuffDesc.Name = "Cube index buffer";
IndBuffDesc.Usage = USAGE_STATIC;
IndBuffDesc.BindFlags = BIND_INDEX_BUFFER;
IndBuffDesc.uiSizeInBytes = sizeof(Indices);
BufferData IBData;
IBData.pData = Indices;
IBData.DataSize = sizeof(Indices);
pDevice->CreateBuffer(IndBuffDesc, IBData, &m_CubeIndexBuffer);

The buffer usage defines how often its content is expected to change. USAGE_STATIC buffers cannot be updated after being created and initial data must always be provided. USAGE_DEFAULT buffers are expected to be updated occasionally, while USAGE_DYNAMIC buffers are optimized for very frequent updates.

To initialize the buffer, Diligent Engine performs the steps described below.

OpenGL/GLES backend

The operation directly translates to glBufferData.

Direct3D11 backend

In Direct3D11 backend, the initial data is passed over to ID3D11Device::CreateBuffer method.

Direct3D12/Vulkan backend

In next-gen backends, USAGE_STATIC and USAGE_DEFAULT buffers are allocated in GPU-only memory that is not directly accessible by CPU. To initialize the buffer, Diligent Engine creates a temporary staging buffer that is allocated in a memory visible to CPU, copies the data to this temporary buffer and then issues a GPU-side copy command. As soon as the command completes, temporary buffer is released and the staging memory is returned to the system. USAGE_DYNAMIC cannot be initialized this way and are described later in this paragraph.

Updating Buffers with IBuffer::UpdateData()

The first way to update a buffer contents at run time is to use IBuffer::UpdateData() method. Only buffers created with USAGE_DEFAULT flag can use this way. This method writes new data to a given buffer subregion, as in the example below:

m_CubeVertexBuffer[BufferIndex]->UpdateData(
    m_pImmediateContext, // Device context to use for the operation
    FirstVertToUpdate * sizeof(Vertex), // Start offset in bytes
    NumVertsToUpdate  * sizeof(Vertex), // Data size in bytes
    Vertices // Data pointer
);

Under the hood, Diligent Engine translates this call into the following operations:

OpenGL/GLES Backend

The operation directly translates to glBufferSubData.

Direct3D11 Backend

The operation directly translates to ID3D11DeviceContext::UpdateSubresource.

Direct3D12/Vulkan Backend

Default buffers are allocated in GPU-only accessible memory, so the data cannot be written directly. To perform the operation, the engine first allocates temporary storage in a CPU-visible memory, copies the data to this temporary storage and then issues GPU command to copy the data from the storage to the final destination. It also performs necessary resource state transitions (such as shader resource -> copy destination).

Performance

IBuffer::UpdateData() is currently the only way to update data in a default (GPU-only) buffer. The operation involves two copy operations. However, the main and not so obvious performance issue with this method is state transitions. Every time when a buffer is used in a copy operation, it needs to be transitioned to copy destination state. Every time it is used in a shader, it needs to be transitioned to shader resource state. Transitioning back and forth stalls the GPU pipeline and degrades performance dramatically.

This method should be used when a buffer content stays constant most of the time and only needs to be updated occasionally, usually no more often than once in a frame, for example, when reusing existing buffer to write new mesh data (vertices/indices). This method should not be used for high frequency updates such as animation or constant buffer updates.

Limitations

Inefficient for frequent updates.

Updating Dynamic Buffers via Mapping

When buffer contents need to be updated frequently (once or more times per frame), the buffer should be created with USAGE_DYNAMIC flag. Dynamic buffers cannot be updated with IBuffer::UpdateData() . Instead, they need to be mapped to obtain a pointer that can be used to write data directly to the buffer, as in the example below:

Vertex* Vertices = nullptr;
VertexBuffer->Map(m_pImmediateContext, MAP_WRITE, MAP_FLAG_DISCARD,
                  reinterpret_cast<PVoid&>(Vertices));
for(Uint32 v=0; v < _countof(CubeVerts); ++v)
{
    const auto& SrcVert = CubeVerts[v];
    Vertices[v].uv  = SrcVert.uv;
    Vertices[v].pos = SrcVert.pos;
}
VertexBuffer->Unmap(m_pImmediateContext, MAP_WRITE, MAP_FLAG_DISCARD);

OpenGL/GLES Backend

The operation translates to glMapBufferRange with GL_MAP_WRITE_BIT and GL_MAP_INVALIDATE_BUFFER_BIT flags set.

Direct3D11 Backend

In Direct3D11 backend, this call directly translates to ID3D11DeviceContext::Map with D3D11_MAP_WRITE_DISCARD flag.

Direct3D12/Vulkan Backend

When dynamic buffer is created in Direct3D12 or Vulkan backend, no memory is allocated. Instead, both backends have special dynamic storage which is a buffer created in CPU-accessible memory that is persistently mapped. When dynamic buffer is mapped, a region is reserved in this buffer. This operation boils down to simply moving current offset and is very cheap. A pointer is then returned that references this memory and the application can write data directly, avoiding all copies. When a dynamic buffer is used for rendering, internal dynamic buffer is bound instead and the proper offset is applied. Internal dynamic buffer is pre-transitioned to read-only state and no transitions are ever performed at run time. The engine takes care of synchronization making sure that a region in the buffer is never given to the application while being used by the GPU.

Performance

In Direct3D12/Vulkan backends, mapping dynamic buffers with MAP_FLAG_DISCARD flag is very cheap as it only involves updating current offset. It is hard to say what exactly Direct3D11 and OpenGL do under the hood, but most likely something similar. There is one significant difference however: Direct3D11 and OpenGL preserve contents of dynamic buffers between frames while Direct3D12 and Vulkan backends do not. As a result, mapping is many times more efficient in next-gen backends.

Dynamic buffers should be used for content that changes often, typically multiple time per frame. The most common example is a constant buffer that is updated with different transformation matrices before every draw call. Dynamic buffers should not be used for constant data that never changes.

Limitations

Only the entire buffer can currently be mapped with MAP_FLAG_DISCARD flag.

In Direct3D12 and Vulkan backends, the contents of all dynamic resources are lost at the end of every frame. A dynamic buffer must be mapped in every frame before its first use.

The total amount of CPU-accessible memory can be limited. Besides, access from the GPU may be slower compared to GPU-only memory, so dynamic buffers should not be used to store resources that are constant or change infrequently.

Streaming Buffer

Streaming buffer is not an API object, but rather a strategy that allows uploading variable amounts of data to the GPU in an efficient manner. The idea of streaming buffer can be summarized as follows:

Create dynamic buffer large enough to encompass the maximum amount of data that can be uploaded to GPU.
First time, map the buffer with MAP_FLAG_DISCARD flag.
- This will discard previous buffer contents and allocate new memory.
Set the current buffer offset to zero, write data to the buffer and update offset accordingly.
Unmap the buffer and issue draw command.
- Note that in Direct3D12 and Vulkan backends, unmapping the buffer is not required and can be safely skipped to improve performance.
When mapping the buffer next time, check if the remaining space is enough to encompass the new polygon data.
If there is enough space, map the buffer with MAP_FLAG_DO_NOT_SYNCHRONIZE flag.
- This will tell the system to return previously allocated memory. It is the responsibility of the application to not overwrite the memory that is in use by the GPU.
- Write new data at current offset (which guarantees that bytes previously written and currently used by the GPU will not be affected) and update the offset.
If there is not enough space, reset the offset to zero and map the buffer with MAP_FLAG_DISCARD flag to request new chunk of memory.

Textures

While buffers are simply linear regions of memory, textures are optimized for efficient sampling operations and use opaque layouts that are typically not exposed to the application. As a result, only the driver knows how to write data to the texture. Linear texture layouts are allowed in Direct3D12 and Vulkan, but they are less efficient.

Texture Initialization

Similar to buffers, initial data can be supplied to textures at creation time. For USAGE_STATIC textures, this is the only way.

TexDesc TexDesc;
TexDesc.Type = RESOURCE_DIM_TEX_2D;
TexDesc.Format = TEX_FORMAT_RGBA8_UNORM_SRGB;
TexDesc.Width = 1024;
TexDesc.Height = 1024;
TexDesc.MipLevels = 1;
TexDesc.BindFlags = BIND_SHADER_RESOURCE;
TexDesc.Usage = USAGE_STATIC;

TextureData InitData;
// Pointer to subresouce data, one for every mip level
InitData.pSubResources = subresources;
InitData.NumSubresources = _countof(subresources);

RefCntAutoPtr<ITexture> Texture;
Device->CreateTexture(TexDesc, InitData, &Texture);

Texture initialization is performed similar to buffer initialization. In Direct3D11 and OpenGL/GLES backends, there are corresponding native API calls. In Direct3D12/Vulkan backends, the engine creates temporary staging texture in a CPU-writable memory, copies the data to this memory and then issues a GPU copy command.

Updating Textures with ITexture::UpdateData()

The first way to update a texture at run time is to use ITexture::UpdateData() method. The method works similar to IBuffer::UpdateData() and writes new data to a given texture region:

Box UpdateBox;
Uint32 Width = 128;
Uint32 Height = 64;
UpdateBox.MinX = 16;
UpdateBox.MinY = 32;
UpdateBox.MaxX = UpdateBox.MinX + Width;
UpdateBox.MaxY = UpdateBox.MinY + Height;

TextureSubResData SubresData;
SubresData.Stride = Width * 4;
SubresData.pData = Data.data();
Uint32 MipLevel = 0;
Uint32 ArraySlice = 0;
Texture->UpdateData(m_pImmediateContext, MipLevel,
                    ArraySlice, UpdateBox, SubresData);

Under the hood, this maps to the following native API commands:

OpenGL/GLES Backend

The operation directly translates to glTexSubImage** family of functions.

Direct3D11 Backend

As with buffer updates, in Direct3D11 backend, this call directly maps to ID3D11DeviceContext::UpdateSubresource.

Direct3D12/Vulkan Backend

As with buffers, to update a texture the next-gen backends first allocate region in a CPU-accessible memory and copy client data to this region. They then perform necessary state transitions and issue GPU copy command that writes pixels to the texture using GPU-specific layout.

Performance

Usage scenarios are similar to buffer updates: the operation should be used for textures whose contents stay mostly constant and only occasionally requires updates.

Limitations

As the operation involves two copies and state transitions, it is not efficient for frequent texture updates.

Mapping Textures

Mapping a texture is a second way to update its contents. From the API side, mapping textures looks similar to mapping buffers:

Uint32 MipLevel = 0;
Uint32 ArraySlice = 0;
MappedTextureSubresource MappedSubres;
Box MapRegion;
Uint32 Width = 128;
Uint32 Height = 256;
MapRegion.MinX = 32;
MapRegion.MinY = 64;
MapRegion.MaxX = MapRegion.MinX + Width;
MapRegion.MaxY = MapRegion.MinY + Height;
Texture->Map(m_pImmediateContext, MipLevel, ArraySlice,
             MAP_WRITE, MAP_FLAG_DISCARD,
             &MapRegion, MappedSubres);
WriteTextureData((Uint8*)MappedSubres.pData, Width,
                 Height, MappedSubres.Stride);
Texture.Unmap(m_pImmediateContext, 0, 0);

What happens under the hood is very different compared to buffers.

OpenGL/GLES Backend

Mapping textures is currently not supported in OpenGL/GLES backends.

Direct3D11 Backend

In Direct3D11 backend, this call directly maps to ID3D11DeviceContext::Map with D3D11_MAP_WRITE_DISCARD flag.

Direct3D12/Vulkan Backend

There are no dynamic textures in next-gen backends in a way similar to dynamic buffers. While buffers can easily be suballocated from another buffer by binding parent buffer and applying an offset, there is no similar way for textures. So even if the required memory was suballocated from the dynamic buffer, there would be no way to treat this memory as a texture. Binding the memory to an existing texture is also not allowed. As a result, mapping textures in Direct3D12/Vulkan backend does not differ significantly from updating textures with ITexture::UpdateData(). When mapping a texture, the engine returns the pointer to the CPU-accessible memory directly that avoids one copy. However, GPU-side copy and most importantly state transitions are still performed.

Performance

It is not exactly clear what Direct3D11 does under the hood. The two most likely options are either creating linear-layout texture and suballocating it from CPU-accessible memory every time Map is called, or performing the same operations as Diligent's next-gen backends.

Mapping dynamic textures is not as efficient as mapping dynamic buffers, and typical usage scenarios are similar to ITexture::UpdateData().

There is no simple way to implement high-frequency texture updates across all APIs, so Diligent expects that this will be implemented by the application using low-level API interoperability. For Direct3D12 and Vulkan backends, one possible way is to create a number of linear-layout textures in CPU-writable memory and use them in a round-robin fashion. As this method is very application-specific, Diligent Engine does not expose it through common API.

Limitations

Texture mapping is not currently implemented in OpenGL/GLES backend.

In Direct3D11, only the entire texture level can be mapped with D3D11_MAP_WRITE_DISCARD flag.

In Direct3D12/Vulkan backends, mapping dynamic textures is not as efficient as mapping dynamic buffers. In fact, it is very similar to updating textures with ITexture::UpdateData() and only avoids one CPU-side copy.

Summary

The following table summarizes update methods for buffers:

Update scenario	Usage	Update Method	Comment
Constant data	`USAGE_STATIC`	n/a	Data can only be written during buffer initialization
< Once per frame	`USAGE_DEFAULT`	`IBuffer::UpdateData()`
>= Once per frame	`USAGE_DYNAMIC`	`IBuffer::Map()`	The content of dynamic buffers is invalidated at the end of every frame

The following table summarizes update methods for textures:

Update scenario	Usage/Update Method	Comment
Constant data	`USAGE_STATIC` / n/a	Data can only be written during texture initialization
< Once per frame	`USAGE_DEFAULT` + `ITexture::UpdateData()` or `USAGE_DYNAMIC` + `ITexture::Map()`
>= Once per frame	Implemented by the application	Dynamic textures cannot be implemented the same way as dynamic buffers

Source Code

Full engine source code is available for download at GitHub.

The following tutorials illustrate the ideas described in this article:

Tutorial 10 - Data Streaming

Tutorial 11 - Resource Updates