Managing Descriptor Heaps in Direct3D12

EgorYusov

Apr 6, 2017

CPOL

Descriptors and descriptor heaps are key components of a new resource binding paradigm introduced in Direct3D12. This article describes an efficient system for managing descriptor heaps.

Disclaimer: This article is a repost of material originally published on the Diligent Engine web site.

Background

This article is not an introduction to descriptor heaps in D3D12. Though we will give a brief description of what descriptor heaps are, it is assumed that the reader has an understanding of basic D3D12 concepts. The system described below uses Simple Variable-Size Memory Block Allocator and is related to the resource binding model presented in this post.

Introduction

Resource descriptors and descriptor heaps are key concepts of the new resource binding model introduced in Direct3D12. A descriptor is a small block of data that fully describes an object to the GPU in a GPU-specific opaque format. A descriptor heap is essentially an array of descriptors. Every pipeline state incorporates a root signature that defines how shader registers are mapped to the descriptors in the bound descriptor heaps. Resource binding is a two-stage process: a shader register is first mapped to a descriptor in a descriptor heap as defined by the root signature. The descriptor (which may be an SRV, UAV, CBV or Sampler) then references the resource in GPU memory. The picture below illustrates a simplified view of the D3D12 resource binding model.

There are four types of descriptor heaps in D3D12:

Constant Buffer/Shader Resource/Unordered Access view (D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV)
Sampler (D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER)
Render Target View (D3D12_DESCRIPTOR_HEAP_TYPE_RTV)
Depth Stencil View (D3D12_DESCRIPTOR_HEAP_TYPE_DSV)

For the GPU to be able to access descriptors in a heap, the heap must be shader-visible. Only the first two heap types (CBV_SRV_UAV and SAMPLER) can be shader-visible; RTV and DSV heaps are CPU-visible only. The size of a CPU-only descriptor heap is limited only by the available CPU memory. The size of a shader-visible descriptor heap is more strictly limited: while a CBV_SRV_UAV heap can hold 1,000,000 descriptors or more, a shader-visible sampler heap can hold at most 2048 (see D3D12 Hardware Tiers on MSDN). As a result, not all descriptor handles can be stored in a shader-visible descriptor heap, and it is the responsibility of the D3D12 application to make sure that all descriptor handles required for rendering are in GPU-visible heaps. This article describes the descriptor heap management system implemented in Diligent Engine 2.0.

Overview

The descriptor heap management system in Diligent Engine consists of five main classes:

DescriptorHeapAllocation is a helper class that represents a descriptor heap allocation, which is simply a range of descriptors
DescriptorHeapAllocationManager is the main workhorse class that manages allocations in a D3D12 descriptor heap using the variable-size GPU allocations manager
CPUDescriptorHeap implements a CPU-only descriptor heap that is used as storage for resource view descriptor handles
GPUDescriptorHeap implements a shader-visible descriptor heap that holds descriptor handles used by GPU commands
DynamicSuballocationsManager is responsible for allocating short-lived dynamic descriptor handles used in the current frame only

Each class, as well as their interactions, is described in detail below.

Descriptor Heap Allocation

DescriptorHeapAllocation, the first class used by the Diligent Engine descriptor heap management system, represents a descriptor heap allocation. It can be initialized as a single descriptor or as a continuous range of descriptors in the specified heap.

Note that the descriptor heap allocation only references a range in the heap. It contains the first CPU handle in CPU virtual address space, and, if the heap is shader-visible, the first GPU handle in GPU virtual address space. The class prohibits copies and only allows transfer of ownership through move semantics. The class is defined as shown below:

class DescriptorHeapAllocation
{
public:
    // Creates null allocation
    DescriptorHeapAllocation();

    // Initializes non-null allocation
    DescriptorHeapAllocation( IDescriptorAllocator *pAllocator,
                              ID3D12DescriptorHeap *pHeap,
                              D3D12_CPU_DESCRIPTOR_HANDLE CpuHandle,
                              D3D12_GPU_DESCRIPTOR_HANDLE GpuHandle,
                              Uint32 NHandles,
                              Uint16 AllocationManagerId );

    // Move constructor (copy is not allowed)
    DescriptorHeapAllocation(DescriptorHeapAllocation &&Allocation);

    // Move assignment (copy assignment is not allowed)
    DescriptorHeapAllocation& operator = (DescriptorHeapAllocation &&Allocation);

    // Destructor automatically releases this allocation through the allocator
    ~DescriptorHeapAllocation()
    {
        if(!IsNull() && m_pAllocator)
            m_pAllocator->Free(std::move(*this));
    }

    // Returns CPU descriptor handle at the specified offset
    D3D12_CPU_DESCRIPTOR_HANDLE GetCpuHandle(Uint32 Offset = 0) const
    {
        D3D12_CPU_DESCRIPTOR_HANDLE CPUHandle = m_FirstCpuHandle;
        if (Offset != 0)
            CPUHandle.ptr += m_DescriptorSize * Offset;
        return CPUHandle;
    }

    // Returns GPU descriptor handle at the specified offset
    D3D12_GPU_DESCRIPTOR_HANDLE GetGpuHandle(Uint32 Offset = 0) const
    {
        D3D12_GPU_DESCRIPTOR_HANDLE GPUHandle = m_FirstGpuHandle;
        if (Offset != 0)
            GPUHandle.ptr += m_DescriptorSize * Offset;
        return GPUHandle;
    }

    // Returns pointer to the descriptor heap that contains this allocation
    ID3D12DescriptorHeap *GetDescriptorHeap(){return m_pDescriptorHeap;}

    size_t GetNumHandles()const{return m_NumHandles;}

    bool IsNull() const { return m_FirstCpuHandle.ptr == 0; }
    bool IsShaderVisible() const { return m_FirstGpuHandle.ptr != 0; }
    size_t GetAllocationManagerId(){return m_AllocationManagerId;}
    UINT GetDescriptorSize()const{return m_DescriptorSize;}

private:
    // No copies, only moves are allowed
    DescriptorHeapAllocation(const DescriptorHeapAllocation&) = delete;
    DescriptorHeapAllocation& operator= (const DescriptorHeapAllocation&) = delete;

    // First CPU descriptor handle in this allocation
    D3D12_CPU_DESCRIPTOR_HANDLE m_FirstCpuHandle = {0};
   
    // First GPU descriptor handle in this allocation
    D3D12_GPU_DESCRIPTOR_HANDLE m_FirstGpuHandle = {0};

    // Pointer to the descriptor heap allocator that created this allocation
    IDescriptorAllocator* m_pAllocator = nullptr;

    // Pointer to the D3D12 descriptor heap that contains descriptors in this allocation
    ID3D12DescriptorHeap* m_pDescriptorHeap = nullptr;
   
    // Number of descriptors in the allocation
    Uint32 m_NumHandles = 0;

    // Allocation manager ID
    Uint16 m_AllocationManagerId = static_cast<Uint16>(-1);
   
    // Descriptor size
    Uint16 m_DescriptorSize = 0;
};

One field that requires some clarification is m_AllocationManagerId. As we will discuss later, a descriptor heap object may contain several allocation managers. This field is used to identify the manager within the descriptor heap that was used to create this allocation.
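The ownership rules above can be tried out without D3D12. The following is a minimal sketch, not the engine's code: `CpuHandle`, `CountingAllocator` and `RangeAllocation` are hypothetical stand-ins for `D3D12_CPU_DESCRIPTOR_HANDLE`, `IDescriptorAllocator` and `DescriptorHeapAllocation`. It shows the two properties the class relies on: a handle at offset `i` is the first handle plus `i` times the descriptor size, and a moved-from allocation becomes null so that the range is returned to the allocator exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE (a single address).
struct CpuHandle { std::size_t ptr; };

// Hypothetical allocator that only counts the descriptors returned to it.
struct CountingAllocator
{
    std::uint32_t NumFreed = 0;
    void Free(std::uint32_t NumHandles) { NumFreed += NumHandles; }
};

// Move-only owner of a contiguous descriptor range.
class RangeAllocation
{
public:
    RangeAllocation() = default; // null allocation
    RangeAllocation(CountingAllocator* pAllocator, CpuHandle First,
                    std::uint32_t NumHandles, std::uint32_t DescriptorSize)
        : m_pAllocator(pAllocator), m_First(First),
          m_NumHandles(NumHandles), m_DescriptorSize(DescriptorSize) {}

    RangeAllocation(const RangeAllocation&) = delete;
    RangeAllocation& operator=(const RangeAllocation&) = delete;

    // Moving transfers ownership and leaves the source null.
    RangeAllocation(RangeAllocation&& Other) noexcept { Swap(Other); }
    RangeAllocation& operator=(RangeAllocation&& Other) noexcept
    {
        if (this != &Other) { Release(); Swap(Other); }
        return *this;
    }

    ~RangeAllocation() { Release(); }

    bool IsNull() const { return m_First.ptr == 0; }

    // Handle at the given offset; must only be called on a non-null allocation.
    CpuHandle GetCpuHandle(std::uint32_t Offset = 0) const
    {
        assert(Offset < m_NumHandles);
        return CpuHandle{m_First.ptr + static_cast<std::size_t>(m_DescriptorSize) * Offset};
    }

private:
    // Returns the range to the allocator (if any) and resets to null.
    void Release()
    {
        if (!IsNull() && m_pAllocator)
            m_pAllocator->Free(m_NumHandles);
        m_pAllocator = nullptr;
        m_First = CpuHandle{};
        m_NumHandles = 0;
        m_DescriptorSize = 0;
    }

    void Swap(RangeAllocation& Other) noexcept
    {
        std::swap(m_pAllocator, Other.m_pAllocator);
        std::swap(m_First, Other.m_First);
        std::swap(m_NumHandles, Other.m_NumHandles);
        std::swap(m_DescriptorSize, Other.m_DescriptorSize);
    }

    CountingAllocator* m_pAllocator = nullptr;
    CpuHandle          m_First{};
    std::uint32_t      m_NumHandles = 0;
    std::uint32_t      m_DescriptorSize = 0;
};
```

For example, a four-descriptor allocation starting at address 0x1000 with 32-byte descriptors reports 0x1040 for offset 2. After it is moved into another object, the source reports `IsNull()`, and the allocator sees the four descriptors only when the new owner is destroyed.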

Descriptor Heap Allocation Manager

The second class that makes up the descriptor heap management system is DescriptorHeapAllocationManager. This class uses the variable-size GPU allocations manager to handle allocations within the descriptor heap.

Every allocation that the class creates is represented by an instance of the DescriptorHeapAllocation class. The list of free descriptors is managed by the m_FreeBlockManager member. The class declaration is given in the listing below:

class DescriptorHeapAllocationManager
{
public:
    // Creates a new D3D12 descriptor heap
    DescriptorHeapAllocationManager(IMemoryAllocator &Allocator,
                                    RenderDeviceD3D12Impl *pDeviceD3D12Impl,
                                    IDescriptorAllocator *pParentAllocator,
                                    size_t ThisManagerId,
                                    const D3D12_DESCRIPTOR_HEAP_DESC &HeapDesc);

    // Uses subrange of descriptors in the existing D3D12 descriptor heap
    // that starts at offset FirstDescriptor and uses NumDescriptors descriptors
    DescriptorHeapAllocationManager(IMemoryAllocator &Allocator,
                                    RenderDeviceD3D12Impl *pDeviceD3D12Impl,
                                    IDescriptorAllocator *pParentAllocator,
                                    size_t ThisManagerId,
                                    ID3D12DescriptorHeap *pd3d12DescriptorHeap,
                                    Uint32 FirstDescriptor,
                                    Uint32 NumDescriptors);

    // Move constructor
    DescriptorHeapAllocationManager(DescriptorHeapAllocationManager&& rhs);

    // No copies or move-assignments
    DescriptorHeapAllocationManager& operator = (DescriptorHeapAllocationManager&& rhs) = delete;
    DescriptorHeapAllocationManager(const DescriptorHeapAllocationManager&) = delete;
    DescriptorHeapAllocationManager& operator = (const DescriptorHeapAllocationManager&) = delete;

    ~DescriptorHeapAllocationManager();

    // Allocates Count descriptors
    DescriptorHeapAllocation Allocate( uint32_t Count );
   
    // Releases descriptor heap allocation. Note
    // that the allocation is not released immediately, but
    // added to the release queue in the allocations manager
    void Free(DescriptorHeapAllocation&& Allocation);
   
    // Releases all stale allocation
    void ReleaseStaleAllocations(Uint64 NumCompletedFrames);

    size_t GetNumAvailableDescriptors()const{return m_FreeBlockManager.GetFreeSize();}

private:
    // Allocations manager used to handle descriptor allocations within the heap
    VariableSizeGPUAllocationsManager m_FreeBlockManager;
   
    // Heap description
    D3D12_DESCRIPTOR_HEAP_DESC m_HeapDesc;

    // Strong reference to D3D12 descriptor heap object
    CComPtr<ID3D12DescriptorHeap> m_pd3d12DescriptorHeap;
   
    // First CPU descriptor handle in the available descriptor range
    D3D12_CPU_DESCRIPTOR_HANDLE m_FirstCPUHandle = {0};
   
    // First GPU descriptor handle in the available descriptor range
    D3D12_GPU_DESCRIPTOR_HANDLE m_FirstGPUHandle = {0};

    UINT m_DescriptorSize = 0;

    // Number of descriptors in the allocation.
    // If this manager was initialized as a subrange in the existing heap,
    // this value may be different from m_HeapDesc.NumDescriptors
    Uint32 m_NumDescriptorsInAllocation = 0;

    std::mutex m_AllocationMutex;
    RenderDeviceD3D12Impl *m_pDeviceD3D12Impl = nullptr;
    IDescriptorAllocator *m_pParentAllocator = nullptr;
   
    // External ID assigned to this descriptor allocations manager
    size_t m_ThisManagerId = static_cast<size_t>(-1);
};

The class provides two constructors. The first constructor creates a new D3D12 descriptor heap and addresses the entire available space. The second constructor uses a subrange of descriptors in an existing D3D12 heap. This allows a number of allocation managers to share the same D3D12 descriptor heap, which is essential for GPU-visible heaps.
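To make the subrange idea concrete, here is a small sketch of the address arithmetic involved. `DescriptorSubRange` and `MakeSubRange` are hypothetical names introduced for illustration. In a real application the heap start would come from `ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart()` and the descriptor size from `ID3D12Device::GetDescriptorHandleIncrementSize()`; plain integers are used here so the sketch builds on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of the part of a heap that one manager owns.
struct DescriptorSubRange
{
    std::size_t   FirstCpuPtr;    // address of the first descriptor in the subrange
    std::uint32_t NumDescriptors; // number of descriptors in the subrange
};

// Computes the subrange [FirstDescriptor, FirstDescriptor + NumDescriptors)
// of a heap that starts at HeapStartPtr and holds HeapNumDescriptors descriptors.
inline DescriptorSubRange MakeSubRange(std::size_t   HeapStartPtr,
                                       std::uint32_t DescriptorSize,
                                       std::uint32_t HeapNumDescriptors,
                                       std::uint32_t FirstDescriptor,
                                       std::uint32_t NumDescriptors)
{
    assert(DescriptorSize > 0);
    // The subrange must lie entirely inside the heap.
    assert(FirstDescriptor <= HeapNumDescriptors &&
           NumDescriptors <= HeapNumDescriptors - FirstDescriptor);
    return DescriptorSubRange{
        HeapStartPtr + static_cast<std::size_t>(FirstDescriptor) * DescriptorSize,
        NumDescriptors};
}
```

Splitting a 1024-descriptor heap into a 768-descriptor part and a 256-descriptor part yields two subranges that touch but never overlap, so two managers can work on the same D3D12 heap independently.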

The allocation routine, DescriptorHeapAllocationManager::Allocate(), allocates the requested number of descriptors in the heap and returns a DescriptorHeapAllocation object representing the allocation. If the request cannot be satisfied, it returns a null allocation.

DescriptorHeapAllocation DescriptorHeapAllocationManager::Allocate(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);

    // Use variable-size GPU allocations manager to allocate the requested number of descriptors
    auto DescriptorHandleOffset = m_FreeBlockManager.Allocate(Count);
    if (DescriptorHandleOffset == VariableSizeGPUAllocationsManager::InvalidOffset)
        return DescriptorHeapAllocation();

    // Compute the first CPU and GPU descriptor handles in the allocation by
    // offsetting the first CPU and GPU descriptor handle in the range
    auto CPUHandle = m_FirstCPUHandle;
    CPUHandle.ptr += DescriptorHandleOffset * m_DescriptorSize;

    auto GPUHandle = m_FirstGPUHandle;
    if(m_HeapDesc.Flags & D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE)
        GPUHandle.ptr += DescriptorHandleOffset * m_DescriptorSize;

    return DescriptorHeapAllocation( m_pParentAllocator, m_pd3d12DescriptorHeap, 
                                     CPUHandle, GPUHandle, Count, 
                                     static_cast<Uint16>(m_ThisManagerId) );
}

Similarly, the deallocation routine, DescriptorHeapAllocationManager::Free(), takes a DescriptorHeapAllocation object and releases the allocation. Note that since GPU commands are executed asynchronously, the allocation cannot be released immediately. Instead, the manager adds it to a queue along with the current frame number and releases all stale allocations later, when the frame has been completed by the GPU (which is detected by a signaled fence).

void DescriptorHeapAllocationManager::Free(DescriptorHeapAllocation&& Allocation)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    auto DescriptorOffset = (Allocation.GetCpuHandle().ptr - m_FirstCPUHandle.ptr) / m_DescriptorSize;
    // Note that the allocation is not released immediately, but added to the 
    // release queue in the allocations manager
    m_FreeBlockManager.Free(DescriptorOffset, Allocation.GetNumHandles(), 
                            m_pDeviceD3D12Impl->GetCurrentFrame());
    // Clear the allocation
    Allocation = DescriptorHeapAllocation();
}

The ReleaseStaleAllocations() method must be called at the end of every frame to actually release all stale allocations from previous frames:

void DescriptorHeapAllocationManager::ReleaseStaleAllocations(Uint64 NumCompletedFrames)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    m_FreeBlockManager.ReleaseCompletedFrames(NumCompletedFrames);
}
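The deferred release logic lives inside the variable-size GPU allocations manager, which is not listed in this article. The sketch below is a simplified assumption of how such a queue can work, not the engine's implementation: `DeferredReleaseQueue` is a hypothetical class that tracks only the total free size rather than actual free blocks, and it assumes that frames are numbered from zero, so frame `F` is finished once `F < NumCompletedFrames`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of a free-space tracker with deferred release.
class DeferredReleaseQueue
{
public:
    explicit DeferredReleaseQueue(std::size_t NumDescriptors) : m_FreeSize(NumDescriptors) {}

    // Marks Size descriptors as used (block placement is not modeled here).
    void Allocate(std::size_t Size)
    {
        assert(Size <= m_FreeSize);
        m_FreeSize -= Size;
    }

    // Does not free anything yet: the block is only queued together with the
    // number of the frame in which it was released.
    void Free(std::size_t Offset, std::size_t Size, std::uint64_t FrameNumber)
    {
        // Frame numbers never decrease, so the queue stays sorted by frame.
        assert(m_Stale.empty() || m_Stale.back().Frame <= FrameNumber);
        m_Stale.push_back({Offset, Size, FrameNumber});
    }

    // Reclaims every block that was released in a frame the GPU has finished.
    void ReleaseCompletedFrames(std::uint64_t NumCompletedFrames)
    {
        while (!m_Stale.empty() && m_Stale.front().Frame < NumCompletedFrames)
        {
            m_FreeSize += m_Stale.front().Size;
            m_Stale.pop_front();
        }
    }

    std::size_t GetFreeSize() const { return m_FreeSize; }

private:
    struct StaleBlock
    {
        std::size_t   Offset;
        std::size_t   Size;
        std::uint64_t Frame;
    };

    std::deque<StaleBlock> m_Stale;
    std::size_t            m_FreeSize;
};
```

Because blocks are queued in frame order, the purge only ever inspects the front of the queue, which keeps the end-of-frame cost proportional to the number of blocks actually reclaimed.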

CPU Descriptor Heap

The next part of the descriptor heap management system is the CPU descriptor heap. CPU descriptor heaps are used by the engine to store resource views when a new resource is created. Since there are four descriptor heap types in total, the system maintains four CPUDescriptorHeap instances (the heaps are part of the render device). Every CPU descriptor heap keeps a pool of descriptor heap allocation managers and a list of managers that have unused descriptors:

// Pool of descriptor heap managers
std::vector<DescriptorHeapAllocationManager> m_HeapPool;
// Indices of available descriptor heap managers
std::set<size_t> m_AvailableHeaps;

The following figure gives an example of the contents of the CPU descriptor heap object:

When allocating a new descriptor, the CPUDescriptorHeap class goes through the list of managers that have available descriptors and tries to process the request using each manager in turn. If there are no available managers or no manager was able to handle the request, the function creates a new descriptor heap manager and lets it handle the request. The source code of the allocation function is given in the listing below:

DescriptorHeapAllocation CPUDescriptorHeap::Allocate( uint32_t Count )
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    DescriptorHeapAllocation Allocation;
    // Go through all descriptor heap managers that have free descriptors
    auto AvailableHeapIt = m_AvailableHeaps.begin();
    while (AvailableHeapIt != m_AvailableHeaps.end())
    {
        // Remember the next element first: erasing the current element
        // below invalidates AvailableHeapIt
        auto NextHeapIt = AvailableHeapIt;
        ++NextHeapIt;
        // Try to allocate descriptors using the current descriptor heap manager
        Allocation = m_HeapPool[*AvailableHeapIt].Allocate(Count);
        // Remove the manager from the list of available managers if it has no more free descriptors
        if(m_HeapPool[*AvailableHeapIt].GetNumAvailableDescriptors() == 0)
            m_AvailableHeaps.erase(AvailableHeapIt);

        // Terminate the loop if descriptor was successfully allocated, otherwise
        // go to the next manager
        if(Allocation.GetCpuHandle().ptr != 0)
            break;
        AvailableHeapIt = NextHeapIt;
    }

    // If there were no available descriptor heap managers or no manager was able
    // to satisfy the allocation request, create a new manager
    if(Allocation.GetCpuHandle().ptr == 0)
    {
        // Make sure the heap is large enough to accommodate the requested number of descriptors
        m_HeapDesc.NumDescriptors = std::max(m_HeapDesc.NumDescriptors, static_cast<UINT>(Count));
        // Create a new descriptor heap manager. Note that this constructor creates a new D3D12
        // descriptor heap and references the entire heap. Pool index is used as manager ID
        m_HeapPool.emplace_back( m_MemAllocator, m_pDeviceD3D12Impl, this, 
                                 m_HeapPool.size(), m_HeapDesc );
        auto NewHeapIt = m_AvailableHeaps.insert(m_HeapPool.size()-1);

        // Use the new manager to allocate descriptor handles
        Allocation = m_HeapPool[*NewHeapIt.first].Allocate(Count);
    }

    m_CurrentSize += (Allocation.GetCpuHandle().ptr != 0) ? Count : 0;
    m_MaxHeapSize = std::max(m_MaxHeapSize, m_CurrentSize);

    return Allocation;
}

For instance, if we request a new allocation of five descriptors, the function will first ask manager [1] to handle the request, but it will fail because that manager has at most two consecutive free descriptors. The function will then ask manager [2], which will be able to handle the request.

If after that we ask to allocate three more descriptors, no manager will be able to handle the request, so the function will add a new manager to the pool and use it to handle the request.
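The pool behaviour described above can be reproduced in isolation. The following sketch is an illustration under simplifying assumptions, not the engine's code: `BumpManager` is a hypothetical stand-in for DescriptorHeapAllocationManager that only counts used descriptors, and `ManagerPool::Allocate()` returns the index of the manager that served the request instead of an allocation object. It also shows one safe way to erase from a `std::set` while iterating over it, namely continuing from the iterator that `erase()` returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a descriptor heap allocation manager:
// it never reuses space and only tracks how many descriptors are taken.
struct BumpManager
{
    std::uint32_t Capacity;
    std::uint32_t Used;

    bool TryAllocate(std::uint32_t Count)
    {
        if (Count > Capacity - Used)
            return false;
        Used += Count;
        return true;
    }
    std::uint32_t NumAvailable() const { return Capacity - Used; }
};

class ManagerPool
{
public:
    explicit ManagerPool(std::uint32_t DefaultCapacity) : m_DefaultCapacity(DefaultCapacity)
    {
        assert(DefaultCapacity > 0);
    }

    // Returns the index of the manager that served the request.
    std::size_t Allocate(std::uint32_t Count)
    {
        assert(Count > 0);
        for (auto It = m_Available.begin(); It != m_Available.end();)
        {
            const std::size_t Index  = *It;
            const bool        Served = m_Pool[Index].TryAllocate(Count);
            // erase() returns the iterator that follows the erased element,
            // so the loop never touches an invalidated iterator.
            if (m_Pool[Index].NumAvailable() == 0)
                It = m_Available.erase(It);
            else
                ++It;
            if (Served)
                return Index;
        }

        // No existing manager could serve the request: add one that is
        // large enough and let it handle the request.
        m_Pool.push_back(BumpManager{std::max(m_DefaultCapacity, Count), 0});
        const std::size_t NewIndex = m_Pool.size() - 1;
        m_Pool[NewIndex].TryAllocate(Count);
        if (m_Pool[NewIndex].NumAvailable() > 0)
            m_Available.insert(NewIndex);
        return NewIndex;
    }

    std::size_t NumManagers() const { return m_Pool.size(); }

private:
    std::uint32_t            m_DefaultCapacity;
    std::vector<BumpManager> m_Pool;      // all managers
    std::set<std::size_t>    m_Available; // indices of managers with free descriptors
};
```

With a default capacity of eight, requests of 6, 5, 2 and 3 descriptors are served by managers 0, 1, 0 and 1 respectively, at which point both managers are full and a further request creates manager 2.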

The deallocation routine calls the Free() method of the appropriate allocation manager. Recall that the routine is called from the destructor of DescriptorHeapAllocation. Note that the function uses GetAllocationManagerId() to retrieve the index of the manager that created this allocation:

void CPUDescriptorHeap::Free(DescriptorHeapAllocation&& Allocation)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    auto ManagerId = Allocation.GetAllocationManagerId();
    m_CurrentSize -= static_cast<Uint32>(Allocation.GetNumHandles());
    m_HeapPool[ManagerId].Free(std::move(Allocation));
}

Finally, there is the usual method that must be called at the end of every frame to release all stale allocations once it is safe to do so. Note that it is this method that returns a manager to the list of available managers: a manager can only accept new requests after its descriptors have actually been released.

void CPUDescriptorHeap::ReleaseStaleAllocations(Uint64 NumCompletedFrames)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocationMutex);
    for (size_t HeapManagerInd = 0; HeapManagerInd < m_HeapPool.size(); ++HeapManagerInd)
    {
        m_HeapPool[HeapManagerInd].ReleaseStaleAllocations(NumCompletedFrames);
        // Return the manager to the pool of available managers if it has available descriptors
        if(m_HeapPool[HeapManagerInd].GetNumAvailableDescriptors() > 0)
            m_AvailableHeaps.insert(HeapManagerInd);
    }
}

GPU Descriptor Heap

The main goal of the CPU descriptor heap is to provide storage for resource view descriptors. For the GPU to be able to access the descriptors, they must reside in a shader-visible descriptor heap. Only one CBV_SRV_UAV heap and one SAMPLER heap can be bound to the GPU at the same time. Source descriptors may be scattered across several CPU-only descriptor heaps, but must be consolidated in the same CBV_SRV_UAV or SAMPLER heap before a draw command can be executed. As a result, a GPUDescriptorHeap object contains only a single D3D12 descriptor heap.

The space is broken into two parts. The first part is intended to keep rarely changing descriptor handles (corresponding to static and mutable variables). The second part is used to hold dynamic descriptor handles, i.e., temporary handles that live during the current frame only. While the first part is shared between all threads, it would be very inefficient to organize the second part the same way. Dynamic descriptor handle allocation can potentially be a very frequent operation, and if several threads record commands simultaneously, allocating dynamic descriptor handles from the same pool would be a bottleneck.

To avoid this problem, dynamic descriptor handle allocation is a two-stage process. In the first stage, every command context that records commands allocates a chunk of descriptors from the shared dynamic part of the GPU descriptor heap. This operation requires exclusive access to the GPU heap, but happens infrequently. The second stage is suballocation from that chunk. This part is lock-free and can be done in parallel by every thread. The structure of the GPU heap can then be depicted as shown below:

There are two classes that implement the strategy described above. GPUDescriptorHeap manages the two parts of the heap, and DynamicSuballocationsManager handles suballocations within the dynamic part. As discussed above, the GPUDescriptorHeap class contains two descriptor heap allocation managers, one for static allocations and one for dynamic allocations:

DescriptorHeapAllocationManager m_HeapAllocationManager;
DescriptorHeapAllocationManager m_DynamicAllocationsManager;

Note that both these allocation managers are initialized to perform suballocations from the same D3D12 descriptor heap. Also, the first manager is assigned id 0, the second one is assigned id 1. The class provides two methods to allocate from static and dynamic parts of the heap:

DescriptorHeapAllocation GPUDescriptorHeap::Allocate(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_AllocMutex);
    DescriptorHeapAllocation Allocation = m_HeapAllocationManager.Allocate(Count);
    return Allocation;
}

DescriptorHeapAllocation GPUDescriptorHeap::AllocateDynamic(uint32_t Count)
{
    std::lock_guard<std::mutex> LockGuard(m_DynAllocMutex);
    DescriptorHeapAllocation Allocation = m_DynamicAllocationsManager.Allocate(Count);
    return Allocation;
}

There is only one Free() method, because the manager ID tells whether an allocation belongs to the static or the dynamic part:

void GPUDescriptorHeap::Free(DescriptorHeapAllocation&& Allocation)
{
    auto MgrId = Allocation.GetAllocationManagerId();
    if(MgrId == 0)
    {
        std::lock_guard<std::mutex> LockGuard(m_AllocMutex);
        m_HeapAllocationManager.Free(std::move(Allocation));
    }
    else
    {
        std::lock_guard<std::mutex> LockGuard(m_DynAllocMutex);
        m_DynamicAllocationsManager.Free(std::move(Allocation));
    }
}

Note that all methods lock mutexes to acquire exclusive access to the allocation managers. AllocateDynamic() method is solely used by the DynamicSuballocationsManager class to allocate a chunk of heap to perform suballocations from. The class maintains a list of chunks allocated from the main GPU descriptor heap as well as the offset within the current chunk:

std::vector<DescriptorHeapAllocation> m_Suballocations;
Uint32 m_CurrentSuballocationOffset = 0;

During every frame, allocations are performed in a linear fashion. The allocation method first checks if there is enough space for the requested number of descriptors in the current chunk. If there is not, the method requests a new chunk from the main GPU descriptor heap. The suballocation then happens from the new chunk:

DescriptorHeapAllocation DynamicSuballocationsManager::Allocate(Uint32 Count)
{
    // Check if there are no chunks or the last chunk does not have enough space
    if( m_Suballocations.empty() ||
        m_CurrentSuballocationOffset + Count > m_Suballocations.back().GetNumHandles() )
    {
        // Request new chunk from the GPU descriptor heap
        auto SuballocationSize = std::max(m_DynamicChunkSize, Count);
        auto NewDynamicSubAllocation = m_ParentGPUHeap.AllocateDynamic(SuballocationSize);
        m_Suballocations.emplace_back(std::move(NewDynamicSubAllocation));
        m_CurrentSuballocationOffset = 0;
    }

    // Perform suballocation from the last chunk
    auto &CurrentSuballocation = m_Suballocations.back();
   
    auto ManagerId = CurrentSuballocation.GetAllocationManagerId();
    DescriptorHeapAllocation Allocation( 
        this,
        CurrentSuballocation.GetDescriptorHeap(),
        CurrentSuballocation.GetCpuHandle(m_CurrentSuballocationOffset),
        CurrentSuballocation.GetGpuHandle(m_CurrentSuballocationOffset),
        Count,
        static_cast<Uint16>(ManagerId) );
    m_CurrentSuballocationOffset += Count;

    return Allocation;
}

Note that this method is lock-free, as every context has its own suballocations manager. The thread may only be blocked when a new chunk is requested from the main GPU descriptor heap, but this happens infrequently.
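The two-stage scheme can be sketched without any D3D12 types. In the example below, `SharedDynamicHeap`, `Chunk` and `LinearSuballocator` are hypothetical names: the shared heap hands out chunks under a mutex and never recycles them, while each suballocator owns its chunks and bumps an offset without taking any lock. Descriptors are identified by their index in the heap rather than by CPU and GPU handles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical chunk of the dynamic part of the heap.
struct Chunk
{
    std::uint32_t FirstDescriptor;
    std::uint32_t Size;
};

// Stage one: shared between contexts, therefore protected by a mutex.
class SharedDynamicHeap
{
public:
    Chunk AllocateChunk(std::uint32_t Size)
    {
        std::lock_guard<std::mutex> Lock(m_Mutex);
        const Chunk NewChunk{m_NextDescriptor, Size};
        m_NextDescriptor += Size;
        ++m_NumChunkRequests;
        return NewChunk;
    }

    std::uint32_t NumChunkRequests() const
    {
        std::lock_guard<std::mutex> Lock(m_Mutex);
        return m_NumChunkRequests;
    }

private:
    mutable std::mutex m_Mutex;
    std::uint32_t      m_NextDescriptor   = 0;
    std::uint32_t      m_NumChunkRequests = 0;
};

// Stage two: one instance per context, so no locking is needed here.
class LinearSuballocator
{
public:
    LinearSuballocator(SharedDynamicHeap& Parent, std::uint32_t ChunkSize)
        : m_Parent(Parent), m_ChunkSize(ChunkSize) {}

    // Returns the heap-wide index of the first allocated descriptor.
    std::uint32_t Allocate(std::uint32_t Count)
    {
        assert(Count > 0);
        if (m_Chunks.empty() || m_Offset + Count > m_Chunks.back().Size)
        {
            // Oversized requests get a chunk of exactly the requested size.
            m_Chunks.push_back(m_Parent.AllocateChunk(std::max(m_ChunkSize, Count)));
            m_Offset = 0;
        }
        const std::uint32_t First = m_Chunks.back().FirstDescriptor + m_Offset;
        m_Offset += Count;
        return First;
    }

    // Drops all chunks at once; individual suballocations are never freed.
    void DiscardAll()
    {
        m_Chunks.clear();
        m_Offset = 0;
    }

private:
    SharedDynamicHeap& m_Parent;
    std::uint32_t      m_ChunkSize;
    std::vector<Chunk> m_Chunks;
    std::uint32_t      m_Offset = 0;
};
```

With a chunk size of four, two requests for two descriptors share one chunk, so the mutex is taken once for the pair. Only the request that no longer fits, or one that is larger than a whole chunk, goes back to the shared heap.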

Suballocations are not released individually, so the DynamicSuballocationsManager::Free() method does nothing. Instead, all allocations are discarded when the command list from this context has been recorded and executed by the render device:

void DynamicSuballocationsManager::DiscardAllocations(Uint64 FrameNumber)
{
    m_Suballocations.clear();
}

Clearing the vector destroys all DescriptorHeapAllocation objects it holds. Their destructors call the GPUDescriptorHeap::Free() method of the parent GPU descriptor heap, which adds each allocation to the release queue. The allocations are actually released a few frames later.

The Big Picture

Now that we have presented every individual component, we can describe how they interact with each other and with the rest of the system. There are four shared CPU-only descriptor heaps (CBV_SRV_UAV, SAMPLER, RTV and DSV) implemented by the CPUDescriptorHeap class, and two shader-visible (GPU) descriptor heaps (CBV_SRV_UAV and SAMPLER) implemented by the GPUDescriptorHeap class. Every device context that is used for recording commands contains two dynamic suballocation managers (corresponding to the two shader-visible descriptor heap types) represented by the DynamicSuballocationsManager class. CPU descriptor heaps are used when a new resource view is created. GPU descriptor heaps are used by the shader resource binding system to allocate storage for shader-visible descriptors. They are also used for allocation of dynamic descriptors.

Usage Scenarios

Let's now talk about a few scenarios where descriptor heaps are involved.

Creating Resource View

Let's first consider how resource views are created using the example of creating a shader resource view (SRV) of a texture. The process proceeds as follows:

An allocation containing a single descriptor handle is requested from the CBV_SRV_UAV CPU-only descriptor heap. The descriptor heap allocation goes, as discussed above, through the following steps:
- The CPUDescriptorHeap::Allocate() method acquires exclusive access to the CPU descriptor heap object
- The method iterates over descriptor heap managers that have available descriptor handles and requests a one-descriptor allocation
  - Since only one descriptor handle is requested, the very first manager will be able to handle the request
- If there are no available managers, a new manager (and a new D3D12 descriptor heap) is created to handle the request
The D3D12 render device is used to initialize the shader resource view in the allocated descriptor (see ID3D12Device::CreateShaderResourceView on MSDN)
The DescriptorHeapAllocation object is kept as part of the resource view object and is destroyed when the resource view object is released. At this point:
- The destructor of the DescriptorHeapAllocation object calls CPUDescriptorHeap::Free(), which locks the heap and calls the DescriptorHeapAllocationManager::Free() method of the allocation manager that created the allocation
- The manager inserts the allocation attributes (offset and size) along with the frame number into the deletion queue
- A few frames later, when the frame completion fence is signaled, the allocation is actually released by the CPUDescriptorHeap::ReleaseStaleAllocations() method

Creating all types of texture views (SRV, RTV, DSV and UAV) as well as all types of buffer views is done in the same way.

Allocating Dynamic Descriptor

Let's now recap how dynamic descriptors are allocated:

The context that needs a dynamic descriptor uses one of its two dynamic suballocation managers (CBV_SRV_UAV or SAMPLER) to request the desired type of descriptor handle
- The suballocation manager checks if the last chunk contains enough space to satisfy the allocation request. In most situations, that will be the case and the descriptor handles will be suballocated from this chunk
- If there is not enough space, the suballocation manager requests the main GPU descriptor heap to allocate a new chunk of descriptor handles. The handles are then suballocated from the new chunk
At the end of the frame, the suballocation manager disposes of all chunks, which go back to the GPU descriptor heap
- The GPU descriptor heap inserts all chunks along with the frame number into the release queue
- A few frames later, when the frame completion fence is signaled, the chunks are actually released and the space becomes available for new allocations

Shader Resource Binding

Diligent Engine uses a shader resource binding model that includes three types of shader resources based on the frequency of change (static, mutable and dynamic) as well as a shader resource binding object. When a new shader resource binding object is created, it allocates space in the GPU descriptor heap for its mutable and static resources. The allocation is kept by the shader resource binding object and is released when the owning object is destroyed. This topic will be discussed in detail in a separate post.

Multithreading and GPU-Safety Concerns

The descriptor heap management system is designed to be correct, safe and efficient in a multithreaded environment. All three types of allocations (CPU descriptor, static/mutable GPU descriptor and dynamic GPU descriptor) proceed through thread-safe paths. The CPU and static/mutable descriptor allocation functions (CPUDescriptorHeap::Allocate(), GPUDescriptorHeap::Allocate()) acquire exclusive access to descriptor heap objects and may potentially block other threads. However, descriptor allocation is fast and constitutes only a tiny portion of the work associated with resource creation, so this is not a problem. Dynamic descriptor allocation (DynamicSuballocationsManager::Allocate()) takes no locks because every context owns its suballocation managers, so different contexts can allocate in parallel without contention (the same context must not be used by several threads simultaneously). The only blocking function on this path is GPUDescriptorHeap::AllocateDynamic(), but it is only called occasionally.

Deallocation is more complicated as besides CPU-side safety the system must also make sure that descriptors are not used by the GPU. CPU-side safety is achieved by protecting the deallocation methods (CPUDescriptorHeap::Free(), and GPUDescriptorHeap::Free()) with mutexes. GPU-side safety is assured by recording the command list number when the allocation is destroyed. For CPU and static/mutable GPU descriptors, it does not matter which thread releases the allocation. As long as there are no more references, the allocation can never be used again in any new GPU command, but it may be referenced by the commands pending execution by GPU. So at the moment when allocation is released, it is added by the deleting thread into the deletion queue along with the current command list number. Deletion queues are purged once at the end of each frame by the render device. The device knows how many command lists have actually been completed by the GPU and can release all allocations that are referenced by completed commands.

For dynamic descriptors, deallocation happens when command list from the context is closed and executed. It does not matter which thread recorded the list. As long as it has been sent to the command queue for execution (from any thread), all dynamic descriptors are stale and can be discarded. So the context returns all chunks back to the GPU descriptor heap object, which adds them to the release queue. For a deferred context that means that until it is executed, all dynamic descriptors are unavailable for use by other contexts.

Discussion

In the current implementation, the same CPU descriptor heap objects are used to allocate resource view descriptor handles on all threads. We did not notice this to be a problem, as descriptor heap allocation and deallocation are very fast unless a new CPU descriptor heap needs to be created. That case should not be a problem either, as the descriptor heap manager size can be specified at initialization time to meet the application's demands. The system provides methods to query the maximum size that every heap reached during the application run time.

A careful reader may have noticed that the GPUDescriptorHeap class uses the generic DescriptorHeapAllocationManager to allocate dynamic chunks of equal sizes. The only situation in which the chunk size may differ is when the number of requested descriptors is larger than the default chunk size. This, however, is a very atypical situation, so a more efficient fixed-size block allocator could be used instead of the variable-size allocations manager.

Diligent Engine currently supports only a single GPU descriptor heap of each type (CBV_SRV_UAV and SAMPLER). While the first heap can contain a large number of descriptor handles (1,000,000+), the sampler heap size is limited to 2048 descriptors, which can potentially lead to heap exhaustion. However, in most cases, the type of the sampler in the shader is known in advance and never changes. D3D12 introduced the concept of static samplers to handle such cases, which is also exposed by Diligent Engine. Static samplers should be used whenever possible, as they are defined in the root signature and do not occupy slots in the sampler descriptor heap. The sampler descriptor heap is then used only to keep descriptor handles of samplers that change at run time, which is a less typical situation.