DirectX 11 Compute Shaders

22 Feb 2013, CPOL, 2 min read
HPC via Compute Shaders (GPGPU).

Introduction

This article introduces GPGPU via DirectX 11 Compute Shaders.

GPGPU (General-Purpose Computing on Graphics Processing Units) uses the GPU to perform the same calculation repeatedly over many data elements, taking advantage of the large array of processing elements the GPU provides.

This article will demonstrate a very simple trigonometric calculation executed on the GPU.

Additional attached code shows a classic use of GPGPU: squaring a square matrix (multiplying it by itself) by spawning nrow*nrow GPU threads. This example was chosen because each output element can be calculated independently of the others.
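
To make that independence concrete, here is a minimal CPU-side sketch of the same computation in plain C++. The names squareMatrix and computeElement are hypothetical and are not taken from the attached code. Each call to computeElement stands in for the work one GPU thread would do for its (row, col) pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes one output element of A*A for a row-major
// nrow x nrow matrix. It only reads from `a`, so every (row, col) pair can be
// evaluated independently of all the others.
float computeElement(const std::vector<float>& a, std::size_t nrow,
                     std::size_t row, std::size_t col)
{
    float sum = 0.0f;
    for (std::size_t k = 0; k < nrow; ++k)
        sum += a[row * nrow + k] * a[k * nrow + col];
    return sum;
}

// CPU reference: the two loops stand in for the nrow*nrow GPU threads.
std::vector<float> squareMatrix(const std::vector<float>& a, std::size_t nrow)
{
    assert(a.size() == nrow * nrow);
    std::vector<float> out(nrow * nrow);
    for (std::size_t row = 0; row < nrow; ++row)
        for (std::size_t col = 0; col < nrow; ++col)
            out[row * nrow + col] = computeElement(a, nrow, row, col);
    return out;
}
```

Because no iteration depends on the result of another, the loop body can be handed to one thread per output element without any synchronization.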

Background

GPGPU has been around for more than a year, with NVIDIA introducing CUDA, AMD introducing Close to Metal and AMD Stream, and many enthusiasts trying to achieve GPGPU with DirectX 9 pixel shaders.

Using the code

The attached code was compiled with VS2010 Beta 1 against libraries from the DirectX SDK (August 2009) on Windows 7 RC. It will not run on Windows XP, since DirectX 11 is not available for Windows XP. Some parts of the source code are taken from the DirectX SDK August 2009 samples and adapted to suit the program.

The entry point of the code is Start(void*). The program is divided into the following parts:

Creation of a device (the easiest part)

Use D3D_DRIVER_TYPE_REFERENCE for emulation, and D3D_DRIVER_TYPE_HARDWARE to run code on GPU (you will require hardware support for this).

C++
D3D11CreateDevice( NULL,D3D_DRIVER_TYPE_REFERENCE/*D3D_DRIVER_TYPE_HARDWARE*/, 
  NULL, D3D11_CREATE_DEVICE_SINGLETHREADED|D3D11_CREATE_DEVICE_DEBUG, 
  NULL, 0,D3D11_SDK_VERSION, &pDeviceOut, &flOut, &pContextOut );

Load the GPU

The tough part is that the programmer must create the buffers on the GPU and load the data into them for processing. The attached source code sheds more light on this:

C++
// Creates a structured buffer that can be bound both as shader input (SRV)
// and as shader output (UAV)
HRESULT CreateStructuredBufferOnGPU( ID3D11Device* pDevice, 
        UINT uElementSize, UINT uCount, VOID* pInitData, 
        ID3D11Buffer** ppBufOut )
{

    *ppBufOut = NULL;
    D3D11_BUFFER_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );

    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    desc.ByteWidth = uElementSize * uCount;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = uElementSize;

    if ( pInitData )
    {
        D3D11_SUBRESOURCE_DATA InitData;
        InitData.pSysMem = pInitData;
        return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
    }
    else
        return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
}

//for input buffer
HRESULT CreateBufferSRV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11ShaderResourceView** ppSRVOut )
{

    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );
    D3D11_SHADER_RESOURCE_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
    desc.BufferEx.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
        desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
    }
    else if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
    {
        // This is a Structured Buffer
        desc.Format = DXGI_FORMAT_UNKNOWN;
        desc.BufferEx.NumElements = 
            descBuf.ByteWidth / descBuf.StructureByteStride;
    }
    else
    {
        return E_INVALIDARG;
    }
    return pDevice->CreateShaderResourceView( pBuffer, &desc, ppSRVOut );
}

//for output buffer    
HRESULT CreateBufferUAV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, 
                         ID3D11UnorderedAccessView** ppUAVOut )
{
    D3D11_BUFFER_DESC descBuf;
    ZeroMemory( &descBuf, sizeof(descBuf) );
    pBuffer->GetDesc( &descBuf );

    D3D11_UNORDERED_ACCESS_VIEW_DESC desc;
    ZeroMemory( &desc, sizeof(desc) );
    desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
    desc.Buffer.FirstElement = 0;

    if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
    {
        // This is a Raw Buffer
        desc.Format = DXGI_FORMAT_R32_TYPELESS;
        // Format must be DXGI_FORMAT_R32_TYPELESS,
        // when creating Raw Unordered Access View

        desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
        desc.Buffer.NumElements = descBuf.ByteWidth / 4; 
    }
    else if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
    {
        // This is a Structured Buffer
        desc.Format = DXGI_FORMAT_UNKNOWN;
        // Format must be DXGI_FORMAT_UNKNOWN
        // when creating a View of a Structured Buffer

        desc.Buffer.NumElements = 
            descBuf.ByteWidth / descBuf.StructureByteStride; 
    }
    else
    {
        return E_INVALIDARG;
    }
    return pDevice->CreateUnorderedAccessView( pBuffer, &desc, ppUAVOut );
}
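
The size arithmetic used by the three functions above can be checked in isolation. The following is a small portable sketch, not part of the attached code. ExampleElement and the helper names are hypothetical; the attached code defines its own BufType.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element type; the attached code defines its own BufType.
struct ExampleElement
{
    int   i;
    float f;
};

// Total buffer size in bytes, as assigned to desc.ByteWidth.
std::uint32_t byteWidth(std::uint32_t elementSize, std::uint32_t count)
{
    return elementSize * count;
}

// Element count of a structured-buffer view: one element per stride.
std::uint32_t structuredNumElements(std::uint32_t bytes, std::uint32_t stride)
{
    assert(stride != 0 && bytes % stride == 0);
    return bytes / stride;
}

// Element count of a raw-buffer view: one element per 32-bit word.
std::uint32_t rawNumElements(std::uint32_t bytes)
{
    assert(bytes % 4 == 0);
    return bytes / 4;
}
```

For a structured buffer the view's element count equals the count passed at creation. A raw view over the same bytes reports more elements whenever the stride is larger than four bytes.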

Run

This command launches the compute shader on the GPU as X * Y * Z thread groups, which process the data on the available processing elements. Its performance depends directly on the hardware and driver support (this applies to a device created using D3D_DRIVER_TYPE_HARDWARE).

C++
pd3dImmediateContext->Dispatch( X, Y, Z );
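
Choosing the Dispatch arguments usually comes down to a rounded-up division. The sketch below assumes a one-dimensional workload; the helper name groupCount and the constant name are hypothetical, while the limit of 65535 thread groups per dimension is the Direct3D 11 limit.

```cpp
#include <cassert>
#include <cstdint>

// Direct3D 11 allows at most 65535 thread groups per Dispatch dimension.
const std::uint32_t kMaxGroupsPerDimension = 65535;

// Hypothetical helper: number of thread groups needed to cover elementCount
// items when each group runs threadsPerGroup threads. Rounds up so that a
// partial last group is still launched.
std::uint32_t groupCount(std::uint32_t elementCount, std::uint32_t threadsPerGroup)
{
    assert(threadsPerGroup != 0);
    std::uint32_t groups = elementCount / threadsPerGroup
                         + (elementCount % threadsPerGroup != 0 ? 1 : 0);
    assert(groups <= kMaxGroupsPerDimension);
    return groups;
}
```

Assuming a shader declared with 64 threads per group, the call would be Dispatch( groupCount( n, 64 ), 1, 1 ). Because the count rounds up, the shader should compare its thread index against n before writing to the output buffer.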

Read output buffer

Earlier, with DirectX 9, this was the most painful part, but with DirectX 11 Compute Shaders it has become a lot easier.

First, create a temporary read-back buffer with usage D3D11_USAGE_STAGING and the CPU access flag set to D3D11_CPU_ACCESS_READ. Then copy the output buffer into it and map it to a pointer as shown below:

C++
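// Copy the GPU output buffer into the CPU-readable buffer
pd3dImmediateContext->CopyResource( debugbuf, pBuffer );
D3D11_MAPPED_SUBRESOURCE MappedResource;
pd3dImmediateContext->Map( debugbuf, 0, D3D11_MAP_READ, 0, &MappedResource );
BufType *p = (BufType*)MappedResource.pData; // p points to the output data
// ... read the results through p ...
pd3dImmediateContext->Unmap( debugbuf, 0 ); // release the mapping when done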

Points of interest

With Compute Shaders, we can implement physics-based simulations involving liquids (probably my next project).

I have also implemented a compute shader version using Vulkan: https://bitbucket.org/asif_bahrainwala/matrix-multiply/src/master/

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Instructor / Trainer
India
Hi,
I have been working with computers since the eighth grade, when I programmed the ZX Spectrum. I have always had an interest in assembly language and computer theory (which is still the reason I take tons of online courses), and I actively code in C/C++ on Windows (using VS) and Linux (using Qt).

I also provide training on data structures, algorithms, the Parallel Patterns Library, graphics (DX11), GPGPU (DX11-CS, AMP), and programming for performance on x86.
Feel free to call me at 0091-9823018914 (UTC +5:30)



(All views expressed here do not reflect the views of my employer).
