Brought to you by the Intel® Visual Computing Source
Download the source code and watch the demo

Introduction

This sample demonstrates how to render large-scale terrains on Intel® microarchitecture codename Sandy Bridge in real time by efficiently distributing the tasks between the CPU cores and the processor graphics unit. The sample pre-processes an input height map into a hierarchical quadtree representation which is used to render the terrain with adaptively selected level of detail (LOD). The adaptive simplified triangulation calculated during the pre-processing is compactly encoded to save runtime processing and memory space. LOD construction is asynchronously performed by the CPU cores while rendering is done by the processor graphics unit.

Application

Terrain Rendering is an application using DXUT and Microsoft DirectX* 11 with D3D_FEATURE_LEVEL_10_0. The application handles all rendering, user interaction and GUI. Upon initialization, the application loads all models, allocates resources and compiles shaders. On the first run, the application pre-calculates triangulation, which can take some time (up to one minute), and stores it on the disk. On subsequent runs, the application loads the data from the disk.

Overview

Terrain rendering demonstrates how to render large-scale terrains on Intel microarchitecture codename Sandy Bridge in real time by efficiently distributing the tasks between the CPU cores and the processor graphics unit. The terrain rendering can be optimized by constructing simplified triangulation, which is adaptive to the terrain surface characteristics. Such triangulation contains more primitives in sharp regions with high-frequency details and allocates a small number of large triangles for flat areas. This significantly reduces the total triangle count while providing almost the same visual quality (compare fig. 1 and 2).

319399/image001.gif

Figure 1: Full-detail triangulation, 1 million triangles, 40 fps on Intel® microarchitecture codename Sandy Bridge, 1280x1024 screen resolution

319399/image002.gif

Figure 2: Adaptive simplified triangulation, 200k triangles, 112 fps on Intel® microarchitecture codename Sandy Bridge, 1280x1024 screen resolution

Full-resolution triangulation (fig. 1) is more than 5x redundant compared to adaptive (fig. 2), which results in nearly 3x lower performance with almost the same visual quality.

Pre-computing adaptive triangulation is a complex and computation-intensive task. Doing this at runtime could require a significant amount of time and would lead to perceptible stalls or delays. To solve this problem, the application pre-computes adaptive triangulation for the whole terrain at the pre-process stage and stores the resulting data on the disk in a compact representation (see section 4). For example, the whole encoded triangulation for an 8192×8192 terrain consumes approximately 6 MB (compare this to 128 MB required to store the height map). At the runtime stage, this data is used to efficiently construct the triangulation.

The triangulation must be adaptive not only to the terrain surface features, but also to the camera position, because distant terrain regions can be rendered using coarser representation without loss of visual fidelity. To support multiple levels of detail (LODs), the input height map is pre-processed and a patch quadtree data structure [3] is constructed (fig. 3). At runtime, the level of detail selection is done on a per-patch level, not on the triangle level. If the adaptive triangulation were constructed every frame, it would require intensive data transfer between CPU and GPU memory, which is not efficient. With patch-based LOD selection, the data is uploaded to the GPU memory only when new patches are created, which happens once in a number of frames.

319399/image003.gif

Figure 3: Three levels of a patch quadtree

Each patch in the hierarchy is a height map of a fixed size with additional vertices required to seamlessly connect neighboring patches. Each patch covers the same area as its four direct children, but approximates the terrain with lower accuracy. Each patch is assigned a unique adaptive triangulation which is pre-computed and encoded as described in section 4.

For each patch in the hierarchy, a world-space geometric error τ is pre-computed during the pre-processing. This error shows the maximum geometric deviation of the patch polygonal surface to the height map samples at the finest resolution covered by the patch. Given τ and camera position, we can estimate the patch screen-space error, which is the maximum visible deviation of the simplified model to the samples of the original full-detail height map, using the following standard formula [3]:

where W and H are width and height of the view port, are horizontal and vertical fields of view and dist(V,c) is the distance from the camera c to the nearest point on the patch bounding box V.

At the runtime stage, an unbalanced patch quadtree is maintained with leaf nodes satisfying the given screen-space error bound . Each patch in this tree stores height map, normal map and adaptive triangulation indices. On each frame, a recursive procedure is executed which updates the tree with respect to new camera location (fig. 4). The procedure creates new patches and allocates resources for terrain regions where additional accuracy is required (), and coarsens the representation where LOD has become unreasonably high ().

319399/image010.gif

Figure 4: Updating unbalanced patch quadtree.

Constructing a triangulation from the encoded representation is efficiently handled by the CPU cores while rendering is performed by the graphics unit. This technique is useful because it significantly reduces the GPU burden by constructing a simplified adaptive triangulation. The most useful application of this technique would be a system that has excess CPU computational power, but is utilizing all of its GPU power.

Adaptive Triangulation

As it was mentioned above, each patch in the quadtree is assigned its own unique triangulation which is adaptive to local terrain surface features. This triangulation is computed at the pre-process stage and stored on the disk. The adaptive triangulation construction exploits the method described in [1] and [2]. To build the triangulation, all samples of the patch are assigned to different levels as shown in figure 5. Note that to construct an adaptive triangulation, a quadtree data structure is also used. To distinguish this quadtree from the patch quadtree described above, we refer to it as the vertices quadtree.

319399/image011.gif

Figure 5: Assignment of height map samples to different levels of a vertices quadtree

To guarantee that a triangulation constructed from a vertices quadtree does not contain cracks, the quadtree is restricted with the dependency graph shown in figure 6. Every vertex depends on two other vertices of the same or the next finest level in the vertices quadtree hierarchy. Border vertices depend only on one other vertex. This means that if the vertex is selected for triangulation, then the related ones must be selected too.

319399/image012.gif

Figure 6: Dependency relations on the vertices of the restricted quadtree.

The described above vertices quadtree is tightly bound to the triangle binary tree. The root of the triangle binary tree (level 0) is represented by a single right triangle. Level n of the tree is obtained by bisecting each triangle in level n-1 along its longest edge (fig. 7).

319399/image013.gif

Figure 7: Triangle binary tree first four levels.

The coarsest possible patch triangulation is represented by two right triangles, which are the roots of two triangle binary trees. The vertex in the middle of the triangles’ longest edge is called base. If some vertex is included into the restricted quadtree, it is called enabled, or disabled otherwise. Now, if we have the correct set of enabled vertices (with all dependencies properly kept), a crack-free triangulation can be constructed at runtime using the following simple recursive procedure which starts from the root:

1.        if( triangle base vertex is enabled ) 

2.        { 

3.        bisect the triangle 

4.        process two new triangles 

5.        } 

6.        else 

7.        output current triangle to the list

Algorithm 1: Constructing adaptive triangulation using set of enabled flags

319399/image014.gif

Figure 8: Enabled vertices and corresponding triangulation.

To determine a set of enabled vertices, the following bottom-up algorithm is executed during the pre-process:

1.        Clear enabled_array[] with false 

2.        for ( quadtree level l = finest resolution to coarsest resolution ) 

3.        for ( each vertex v in level l ) 

4.        { 

5.        if ( for all vertices d which v is dependent on: enabled_array[d] == false ) 

6.        { 

7.        merge two triangles for which v is base vertex 

8.        calculate the coalesced triangle world space approximation error e 

9.        if ( e < threshold ) 

10.     enabled_array[v] = false 

11.     else 

12.     enabled_array[v] = true 

13.     } 

14.     else 

15.     enabled_array[v] = true 

16.     }

Algorithm 2: Determining enabled vertices.

Triangle world-space approximation error is the maximum vertical distance of all vertices covered by the triangle to the triangle plane. Adaptive triangulation constructed with Algorithm 2 described above guarantees that the maximum geometric deviation of the simplified triangulation from the original height map is below the given threshold.

It is now clear that the whole patch triangulation is thoroughly described by the set of flags indicating whether or not the vertex is enabled. These flags can be efficiently encoded during the recursive traversal by outputting 1-bit flags. This is done with the following algorithm, very similar to Algorithm 1:

1.        if ( triangle base vertex is enabled ) 

2.        { 

3.        output 1 

4.        bisect the triangle 

5.        process two new triangles 

6.        } 

7.        else 

8.        output 0

Algorithm 3: Encoding enabled flags.

Thus during the pre-processing, Algorithm 2 is first executed to determine the set of enabled vertices, which is followed by Algorithm 3 encoding them. At runtime, Algorithm 1 is executed, which uses pre-computed data to construct the adaptive triangulation.

Implementation Details

The terrain rendering system is built up as a collection of logically independent components. The principal architecture of the system is shown in figure 9.

319399/image015.gif

Figure 9: Terrain rendering system architecture principal scheme

The system contains an elevation data source (implemented with CElevationDataSource class) and an encoded triangulation data source (implemented as CTriangDataSource class). The former provides access to the elevation data through the GetElevData() method that creates an instance of CPatchElevationData class and returns pointer to it. The CPatchElevationData class provides access to the stored height map data through the GetDataPtr() method.

The triangulation data source follows the same philosophy: when it is necessary to create an adaptive triangulation for some patch, it creates an instance of CRQTTriangulation class with the call to DecodeTriangulation(). The triangulation indices are generated by the call to GenerateIndices(). CRQTTriangulation class is also responsible for determining enabled vertices and implements the described above Algorithm 2 by the CTriangDataSource::CreateAdaptiveTriangulation() method.

Algorithms 1 and 3 are implemented by the CRQTTriangulation::RecursiveGenerateIndices() method. The method can operate in two modes, encoding and decoding (indicated by the m_bIsEncodingMode flag). In the first mode, the method encodes the triangulation using enabled flags (pre-computed by Algorithm 2) as input. Depending on the triangle level and orientation, the method determines which vertex is its base. It then reads its enabled/disabled status from the array and outputs a corresponding 1-bit flag into the output bit stream.

In decoding mode, the method reads the flags from the bit stream (which is loaded from the disk) and sets appropriate enable/disable status in the enabled flags array. At the same time, the method generates the triangulation.

All patch Microsoft Direct3D* resources are stored in an instance of CTerrainPatch class. The resource management is handled by the D3D resource cache. When a new texture is required, the cache attempts to find an appropriate unused resource. If there are no spare resources, the cache creates a new one. When resource is no longer needed, it is not released, but placed into the cache. The cache is thread-safe so a number of threads can access it simultaneously.

The quadtree construction is implemented by the CBlockBasedAdaptiveModel class, while Microsoft DirectX* 11-specific methods are separated in CAdaptiveModelDX11Render.

Asynchronous Task Execution

At the runtime stage, a quadtree-based adaptive view-dependent terrain model is maintained. For this purpose, required patch elevation data and adaptive triangulation are extracted from the corresponding data sources, and terrain patches are created. To hide processing time required to perform these tasks and eliminate stalls, the level-of-detail processing is done asynchronously. Since LOD can be either increased or decreased, there are two types of tasks that can be performed by the system, which are implemented by CIncreaseLODTask and CDecreaseLODTask classes. These classes are derived from the base class CTaskBase which exposes Execute() virtual method. To manage the tasks, the system exploits TaskMgrTbb component.

Normal Map Compression

A normal map is used to shade the terrain surface. There is no need to store the normal map on the disk because it can be calculated in runtime from the patch height map. This saves disk space but requires moderate additional computations. Besides, if the terrain is dynamic and can be deformed, the normal map cannot be statically pre-computed. To improve visual quality, normal resolution can be higher than that of the height map. For instance, if the height map has resolution (2^n+1)×(2^n+1) samples, then normal map could have 4x higher resolution (4∙(2^n+1)×4∙(2^n+1) samples). To reduce memory storage requirements, normal maps are kept in compressed form. BC3 compression format is exploited that enables storing the normal map using 1 byte per normal. The compression is done asynchronously; the DXTCompressor component is exploited for this purpose.

Performance

Figure 10 shows the performance during the recorded fly-over captured on an Intel microarchitecture codename Sandy Bridge-based machine at 1280x1024 screen resolution.

319399/image016.gif

Figure 10: Performance during the recorded fly-over for screen space thresholds

With a three-pixel error threshold, the LOD changes are not annoying, while the average performance is more than 100 fps on the Intel microarchitecture codename Sandy Bridge-based machine. With a five- pixel threshold, geometry changes become much more apparent, but the performance increases by a factor of 1.5x.

The time that is required to increase LOD primarily depends on the selected normal map level-of-detail bias. With 4x normal map up-sample factor, it takes approximately 44 ms to process one patch on one core. Thus, for a model consisting of 150 patches, it takes approximately 6.6 seconds to construct the whole model on one core. Since all patches are processed independently, the workload could be evenly distributed across available cores, and as a result the time scales well. Note also that during a typical fly-over, only a number of patches are updated during one second (usually less than 10). If the camera position drastically changes, the model will asynchronously be updated during a number of frames, which will cause no stalls. Note that until the model is updated, it will be rendered in coarse resolution.

319399/image017.gif

Figure 11: Sample screenshot. Note that here a simple method is used to colorize differences in elevation but a much better looking terrain can be obtained by improving texture quality.

References

P. Lindstrom, D. Koller, W. Ribarsky, L. F. Hodges, N. Faust, and G. A. Turner. Real-time, continuous level of detail rendering of height fields. In Proc. SIGGRAPH 96, pages 109-118. ACM SIGGRAPH, 1996.
Renato Pajarola. Large scale terrain visualization using the restricted quadtree triangulation. In Proceedings Visualization 98, pages 19-26 and 515. IEEE Computer Society Press, 1998.
Thatcher Ulrich. Rendering massive terrains using chunked level of detail control. SIGGRAPH Course Notes (2002). Volume 3, Issue 5.