To get the most out of the x86 platform, you can apply a number of performance optimizations to your project. In this guide, we will show a variety of tools, as well as features in the Unity* software, that can help you enhance the performance of your Unity project. We will discuss how to handle items like texture quality, batching, culling, light baking, and HDR effects.
By the end of this guide you will be able to identify performance issues and determine what they are bound to, apply key optimizations, and follow good methodologies for game development in Unity. First, we will go over some of the tools that make it easy to identify potential hot spots in your application.
We will explore three main tools in this guide: Unity Profiler, GPA System Analyzer, and GPA Frame Analyzer. Each tool is powerful in its own right. If you are able to use all three, you will make significant progress in streamlining and optimizing your game.
Figure 1. The Unity Profiler main screen
The Unity Profiler
The Unity Profiler (Figure 1) is an extremely powerful tool available in Unity that will help you identify issues in various subsystems used in your project. The profiler graph section has different sub-profilers that show metrics for specific hardware. The current sub-profilers available include CPU usage, GPU usage, Rendering, Memory, Audio, Physics, and Physics 2D. Each of these sub-profilers is further broken down into sections of relevant components that can be isolated to drill down into specifics. For example, CPU usage contains Rendering, Scripts, Physics, GarbageCollector, Vsync, and Others sections.
Below the graph section is the Overview window where you can see a list of metrics including timing info and memory allocations for various Unity subsystems. Everything from rendering to garbage collection is shown here, and it is a good idea to check the sections of your app that take the longest time for optimization opportunities. Clicking on any section of the graph will pause updates in the profiler and allow you to investigate the highlighted frame.
The Unity Profiler can be attached to a running app in the editor or in a standalone build. For the most accurate timings, attach to a standalone build to avoid the overhead of the editor. To do so, click the ‘Active Profiler’ button towards the top of the window and select from the instances of ‘Android Player’ that were detected over ADB (Android Debug Bridge) or from any other players found on the network.
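A standalone build that is ready for profiling can also be produced from an editor script. The following is a minimal sketch, assuming an Android target; the menu label, scene path, and output path are hypothetical placeholders, and the script is assumed to live in a folder named Editor. It uses the BuildOptions.Development and BuildOptions.ConnectWithProfiler flags so the player starts with profiler support and tries to connect to the editor's Profiler on launch.

```csharp
using UnityEditor;

// Editor-only helper; place this file in a folder named "Editor".
public static class ProfilingBuild
{
    // Hypothetical menu entry used only for illustration.
    [MenuItem("Build/Android Development Build (Profiler)")]
    public static void BuildForProfiling()
    {
        // Hypothetical scene and output paths; replace with your own.
        string[] scenes = { "Assets/Scenes/Main.unity" };
        string outputPath = "Builds/ProfilingBuild.apk";

        // Development enables profiler support in the player.
        // ConnectWithProfiler asks the player to connect to the
        // editor's Profiler window when it starts.
        BuildOptions options =
            BuildOptions.Development | BuildOptions.ConnectWithProfiler;

        BuildPipeline.BuildPlayer(scenes, outputPath, BuildTarget.Android, options);
    }
}
```

If the player does not connect automatically, it can still be selected manually from the ‘Active Profiler’ list as described above.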
Another option is to ‘Deep Profile’ the app. This option is not recommended for ordinary use because it instruments all Mono code, which can add considerable overhead when profiling. Luckily, Unity has a way to explicitly instrument any code segment you are interested in. Figure 2 shows how to instrument the code so it will appear in the profiler with whatever label you supply:
Figure 2. Setting a code segment for use in Profiler
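In script form, explicit instrumentation of this kind typically wraps the code of interest in calls to Profiler.BeginSample and Profiler.EndSample. The sketch below is a minimal illustration rather than a reproduction of the figure; the EnemySpawner class, the SpawnWave method, and the sample label are hypothetical names. On Unity versions before 5.5 the Profiler class lives directly in the UnityEngine namespace, so the second using line would not be needed there.

```csharp
using UnityEngine;
using UnityEngine.Profiling; // Unity 5.5 and later

// Hypothetical component used only for illustration.
public class EnemySpawner : MonoBehaviour
{
    float lastResult;

    void Update()
    {
        // Work between BeginSample and EndSample is reported in the
        // CPU usage sub-profiler under the label supplied here.
        Profiler.BeginSample("EnemySpawner.SpawnWave");
        SpawnWave();
        Profiler.EndSample();
    }

    // Placeholder workload standing in for the code you want to measure.
    void SpawnWave()
    {
        float total = 0f;
        for (int i = 0; i < 1000; i++)
        {
            total += Mathf.Sqrt(i);
        }
        lastResult = total;
    }
}
```

Every BeginSample call should be paired with an EndSample call on the same code path; otherwise the sample hierarchy shown in the profiler becomes misleading.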
The GPA System Analyzer
Figure 3. GPA System Analyzer real-time view
The Intel® Graphics Performance Analyzers (Intel® GPA) is a suite of graphics analysis and optimization tools to help game developers make games and other graphics-intensive applications run faster. Intel GPA provides extensive functionality to allow developers to perform in-depth analysis of graphics API calls to determine where the primary performance issues arise. Many of the experiments and metrics shown in this guide are from Intel GPA. Intel GPA lets you study the graphics workload of DirectX* apps on Windows* and OpenGL ES* apps on select Intel® processor systems running Android. While it cannot directly monitor OpenGL* API calls, you can still use GPA System Analyzer to study GPU and CPU metrics as your OpenGL game runs. Regardless of the graphics API, you can also use GPA Platform Analyzer to see the detailed CPU load, including any OpenCL™ activity. If you want a closer look, Intel GPA has an API for adding your own instrumentation. The GPA toolset works on Android as well as desktop, and you can learn more and download Intel GPA here: www.intel.com/software/GPA/
The first step is to use Intel GPA to collect real-time performance information. Intel GPA has two different modes for real-time data display (see Figures 3 and 4): the heads-up display (HUD) that runs on top of your application and the System Analyzer that connects to your test system across the network. Either tool can show metrics from the DirectX pipeline (OpenGL ES pipeline on some Intel processors), CPU utilization, and system power. On supported Intel processor graphics systems, you also get extensive GPU hardware metrics. The HUD and System Analyzer provide simple experiments to help you quickly detect performance issues. See the Intel GPA documentation for more details on the features and functionality of the HUD and System Analyzer.
Figure 4. GPA System Analyzer alternative HUD
To include a metric’s values in the analysis, simply drag it from the left sidebar into the main graphing area. The tools will work on ARM* devices, but will not have all of the metrics that are available on Intel processor-based hardware. For further information, check out the GPA tutorials for Windows and OS X. The following groups of metrics are available on Intel hardware:
- Device IO
- Execution Units
- Fragment Shader
- Vertex Shader
For CPU bottlenecks, you may find Platform Analyzer useful for DirectX and OpenGL workloads. It displays a captured trace of CPU activity. If you add instrumentation to your code, you can correlate individual tasks running on the CPU and watch their progress through DirectX, the driver, and into the GPU. To help you determine bottlenecks, GPA contains a ‘State Overrides’ section (Figure 5) that allows you to perform experiments by checking frame rate changes against changing conditions. A few examples:
Figure 5. Available overrides
- Texture 2x2
- Fetching data from high-resolution textures can be expensive. This will replace all textures used in the scene with 2x2 textures. A significant performance change after checking this option suggests that some textures could be reduced in size to improve frame rate.
- Null Hardware
- This will simulate an infinitely fast GPU. If the frame rate does not improve, your code is likely driver or CPU bound; if it rises significantly, the GPU was the limiting factor.
- Disable Draw Calls
- This will simulate a very fast driver. If the frame rate changes significantly, your code may be driver bound.
- Simple Fragment Shader
- This will replace all shaders with a very simple fragment shader. A significant frame rate change may indicate that shaders should be optimized for a performance bump.
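To make "significant change" concrete when comparing override runs, it can help to record the average frame rate with and without each override and compute the relative difference. The helper below is a minimal sketch in plain C#, not part of Intel GPA; the OverrideReport class, its method names, and the 10% default threshold are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for comparing frame rates noted down by hand
// while toggling state overrides; it does not talk to Intel GPA.
public static class OverrideReport
{
    // Relative FPS change of an override run versus the baseline run.
    public static double RelativeChange(double baselineFps, double overrideFps)
    {
        if (baselineFps <= 0)
        {
            throw new ArgumentOutOfRangeException("baselineFps");
        }
        return (overrideFps - baselineFps) / baselineFps;
    }

    // Returns the overrides whose relative change meets or exceeds the
    // threshold (an assumed 10% by default), largest change first.
    public static List<KeyValuePair<string, double>> Significant(
        double baselineFps,
        IDictionary<string, double> overrideFps,
        double threshold = 0.10)
    {
        return overrideFps
            .Select(kv => new KeyValuePair<string, double>(
                kv.Key, RelativeChange(baselineFps, kv.Value)))
            .Where(kv => Math.Abs(kv.Value) >= threshold)
            .OrderByDescending(kv => Math.Abs(kv.Value))
            .ToList();
    }
}
```

For example, with a hypothetical baseline of 30 FPS, a Texture 2x2 run at 42 FPS is a 40% change and would be flagged, while a Simple Fragment Shader run at 31 FPS is roughly a 3% change and would be treated as noise under the assumed threshold.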
Below the experiments section is the platform settings slider. This feature allows you to run the CPU at various frequencies. This will help determine bottlenecks, even if your game / app is running at max frame rate on any device you are using to test. This can also be used to verify that your game / app will run on the widest range of devices. Another great use of the CPU frequency slider is to force a specific frequency to prevent technologies like Intel® Turbo Boost from skewing test results.
Finally, you can click the camera icon towards the top of the window to take a frame capture. The System Analyzer will then record everything that goes into producing a single frame of your game / app (state changes, timings, textures, etc.). This information will be saved in a file that can be opened by the Frame Analyzer tool to enable a deeper dive.
The GPA Frame Analyzer
Figure 6. GPA Frame Analyzer showing change records and associated frame info
The Frame Analyzer tool (Figure 6) allows you to open up a single frame capture. The captured frame will contain records of all state changes, resources, timing info, and much more. At the top of the window is a graph that displays each individual draw call recorded in the frame. These draw calls are all separated per render target for easy visualization. The X and Y values of the graph can be changed via the drop-down menus on the top left. On the left is a list of the individual render targets. The lower left section shows a preview of the currently highlighted draw calls and how they appear on the frame. Various options allow you to customize the view, including highlighting the pixels drawn to or keeping them normal. You also have the option to adjust how everything that is not selected affects the preview (hidden or not). In the bottom right is a set of tabs that give more insight into the currently selected draw calls, including:
- Frame Overview
- Timing / state values broken down by stages in the GPU pipeline for the entire frame
Figure 7. Values reported in the Frame Overview section
- Timing / state values broken down by stages in the GPU pipeline for the draw calls currently selected in the graph / tree.
- Texture (Figure 8 below)
- A list of currently bound textures
- The left sidebar under the texture tab can be used to verify compression, format, mip levels, etc.
Figure 8. View of textures used in a few draw calls
- The state settings for selected draw call(s)
- Can be edited to view the effect on render target preview and timings
- API Log
- Displays all of the API calls used for the selected draw calls. This can be immensely useful in tracking down unnecessary state changes that can impact performance.