Optimizing Draw Call Batching Using
Transient Data-Guided Texture Atlases

Master’s thesis, November 25, 2017

Author

Jordy van Dortmont [1,2]

Supervisors

Bas Zalmstra [1], Wishnu Prasetya [2], Jacco Bikker [2]

[1] Abbey Games
[2] Utrecht University

Introduction

Texture mapping is a common practice in 3D games in which pixels are wrapped and mapped onto the surface of a 3D model. These pixels usually originate from a 2D image, a texture, that adds detail to the surface. For example, a floor can be textured with an image of the top of a wooden plank. In 2D games, however, 3D models are replaced by 2D sprites. A sprite is a 2D entity that can be manipulated in the game, for example by rotation or translation, and is visualized with a texture.

A game typically has to draw many different entities with different textures in a single frame, so many draw calls are issued from the CPU to the GPU per frame. This can severely impact CPU performance and make the frame rate CPU-bound. To improve performance, draw calls that use the same texture and have no other rendering pipeline changes in between are batched together into a single draw call.
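
To make the batching idea concrete, the following is a minimal sketch, assuming a simple sprite renderer in which each sprite carries a texture handle; the Sprite, TextureId, drawBatch and render names are illustrative and not taken from the thesis. Consecutive sprites that share a texture are submitted in one draw call, and every texture change ends the current batch.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    using TextureId = std::uint32_t;

    struct Sprite {
        TextureId texture;   // texture bound when this sprite is drawn
        float x, y;          // placeholder for the real per-sprite vertex data
    };

    // Stand-in for the real GPU submission: one draw call for `count` sprites
    // that all share the same texture.
    void drawBatch(TextureId texture, const Sprite*, std::size_t count)
    {
        std::printf("draw call: texture %u, %zu sprites\n",
                    static_cast<unsigned>(texture), count);
    }

    // Issue one draw call per run of consecutive sprites that share a texture.
    // Every texture switch breaks the batch and costs an extra draw call.
    void render(const std::vector<Sprite>& sprites)
    {
        std::size_t batchStart = 0;
        for (std::size_t i = 1; i <= sprites.size(); ++i) {
            const bool batchEnds =
                i == sprites.size() || sprites[i].texture != sprites[batchStart].texture;
            if (batchEnds) {
                drawBatch(sprites[batchStart].texture, &sprites[batchStart], i - batchStart);
                batchStart = i;
            }
        }
    }

    int main()
    {
        // Three sprites, two textures: two draw calls instead of three.
        render({{1, 0, 0}, {1, 10, 0}, {2, 20, 0}});
    }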

Texture atlases are used to reduce the number of times a different texture is bound. A texture atlas stores multiple textures in memory as one larger texture, so its contents can be drawn together in one draw call without changing the bound texture in between. An example of a texture atlas can be seen in figure 1. When rendering a texture that used to be separate, we look it up in the texture atlas. Optimally, all textures that are drawn together in a frame reside in a single atlas. However, texture atlases also need to be loaded into GPU memory and the maximum texture dimensions are limited. Loading a single large atlas of which only a small number of textures is actually used wastes a lot of GPU memory, which leads to poor performance. If a texture atlas would exceed the maximum texture dimensions, a single atlas cannot be created and the textures need to be divided over multiple atlases. Multiple atlases do not have to be loaded into memory at the same time and can be unloaded as soon as the textures they contain are no longer in use.

Textures are usually packed into atlases by manual selection when building the assets. Unfortunately, it can be hard to achieve optimal atlases this way: manually composing texture atlases is cumbersome for large numbers of textures, the usage of textures might not be apparent, and building texture atlases before running the game is suboptimal because the dynamic and varying usage of textures cannot always be predetermined. For example, when a player assembles a team of characters with different sets of textures, a different combination of those sets can be used each time the player assembles a new team. Although putting the textures of all characters in one atlas would reduce draw calls, they might not all fit, and even if they did, GPU memory would be wasted by loading textures of characters that are not going to be used. Because the assembly of the team cannot be predetermined, the characters may all need to be put into separate atlases to ensure consistent performance.

Figure 1: A texture atlas with an Egyptian theme from Renowned Explorers: International Society.
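
Looking a separate texture up in an atlas amounts to remapping its original texture coordinates into the sub-rectangle that the texture occupies in the atlas. A minimal sketch of that remapping, with an illustrative AtlasEntry layout that is not taken from the thesis:

    // Placement of one original texture inside an atlas, in pixels.
    struct AtlasEntry {
        int x, y;            // top-left corner of the texture inside the atlas
        int width, height;   // original texture dimensions
    };

    struct UV { float u, v; };

    // Remap a UV coordinate that addressed the original texture (0..1 in both
    // axes) to the corresponding coordinate inside the atlas texture.
    UV remapToAtlas(UV original, const AtlasEntry& entry, int atlasWidth, int atlasHeight)
    {
        UV result;
        result.u = (entry.x + original.u * entry.width)  / static_cast<float>(atlasWidth);
        result.v = (entry.y + original.v * entry.height) / static_cast<float>(atlasHeight);
        return result;
    }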

Approach

When composing texture atlases out of textures, we need to know which textures frequently break a draw call batch and which textures can be packed into the same texture atlas. To automate the traditionally manual collection of this knowledge, data is gathered while rendering a game; we call this atlas telemetry. When a draw call is issued, the bound textures are registered, because those textures are used to draw the batch. When the texture binding is the only render pipeline state that changes, the batch could be merged with the previous batch if the textures bound for the previous batch were packed into an atlas together with the textures bound for the current batch. Whenever this is the case, we form pairs of these textures and count how often each pair occurs. We should not pair all textures that satisfy this condition, however: textures with different storage formats or different numbers of mipmap levels cannot be placed in the same atlas.
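
The following is a sketch of how such telemetry could be recorded, assuming for brevity a single bound texture per draw call and a flag telling whether the texture binding was the only pipeline state change; textures are only paired when their storage format and mipmap count match. All names (AtlasTelemetry, onDrawCall, TextureInfo) are illustrative, not taken from the thesis.

    #include <cstdint>
    #include <map>
    #include <utility>

    using TextureId = std::uint32_t;

    struct TextureInfo {
        int format;     // storage format identifier
        int mipLevels;  // number of mipmap levels
    };

    class AtlasTelemetry {
    public:
        // Called for every draw call. `onlyTextureChanged` is true when the
        // texture binding was the only pipeline state change since the previous
        // draw call, i.e. the batch break could be avoided by an atlas.
        void onDrawCall(TextureId texture, const TextureInfo& info, bool onlyTextureChanged)
        {
            if (onlyTextureChanged && hasPrevious_ && previous_ != texture &&
                previousInfo_.format == info.format &&
                previousInfo_.mipLevels == info.mipLevels) {
                // Count how often this pair of textures broke a batch.
                ++pairCounts_[ordered(previous_, texture)];
            }
            previous_ = texture;
            previousInfo_ = info;
            hasPrevious_ = true;
        }

        const std::map<std::pair<TextureId, TextureId>, int>& pairCounts() const
        {
            return pairCounts_;
        }

    private:
        static std::pair<TextureId, TextureId> ordered(TextureId a, TextureId b)
        {
            return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
        }

        TextureId previous_ = 0;
        TextureInfo previousInfo_{};
        bool hasPrevious_ = false;
        std::map<std::pair<TextureId, TextureId>, int> pairCounts_;
    };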

We can extract atlas compositions from the data gathered by the atlas telemetry: we process the texture pairs and, whenever a pair exists, put both textures into the same atlas composition. However, a composition on its own might not be enough to create texture atlases optimized for draw call batching, because the textures in a composition may not fit into one texture atlas, so we also need a way to partition the textures into multiple compositions based on the atlas telemetry data. For this case we create a texture graph from the pairs, in which the textures are nodes and each pair forms an edge whose weight is based on how frequently that pair broke a draw call batch. A visualization of such a texture graph can be seen in figure 2.

Figure 2: A visualization of a texture graph.
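
Both steps can be sketched under the same assumptions as the telemetry example above: a union-find merges the textures of every recorded pair into one atlas composition, and the pair counts double as the edge weights of the texture graph. The UnionFind and buildCompositionsAndGraph names are illustrative.

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    using TextureId = std::uint32_t;
    using PairCounts = std::map<std::pair<TextureId, TextureId>, int>;

    // Minimal union-find over texture ids; each root identifies one atlas composition.
    struct UnionFind {
        std::map<TextureId, TextureId> parent;

        TextureId find(TextureId t)
        {
            auto it = parent.find(t);
            if (it == parent.end()) { parent[t] = t; return t; }
            if (it->second == t) return t;
            return parent[t] = find(it->second);   // path compression
        }

        void unite(TextureId a, TextureId b) { parent[find(a)] = find(b); }
    };

    struct TextureGraphEdge {
        TextureId a, b;
        int weight;   // how often this pair broke a draw call batch
    };

    // Every recorded pair puts both textures into the same composition and
    // becomes a weighted edge in the texture graph.
    void buildCompositionsAndGraph(const PairCounts& pairs,
                                   UnionFind& compositions,
                                   std::vector<TextureGraphEdge>& graph)
    {
        for (const auto& [pair, count] : pairs) {
            compositions.unite(pair.first, pair.second);
            graph.push_back({pair.first, pair.second, count});
        }
    }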

Using the atlas compositions and the texture graph, we automatically build data-guided texture atlases before run-time without any manual intervention. When the bin packing algorithm cannot pack the textures into one atlas, the textures are partitioned into multiple atlases, either based on the texture graph or by the bin packing algorithm itself.
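
As a sketch of the fallback in which the bin packing algorithm itself performs the partitioning, the following uses a simple shelf (next-fit) packer as a stand-in; the thesis does not prescribe a particular packing algorithm, and the shelfPack and Placement names are illustrative. A texture that no longer fits in the current atlas simply starts a new one.

    #include <algorithm>
    #include <vector>

    struct TextureSize { int id, width, height; };

    struct Placement { int atlasIndex, x, y; };

    // Simple shelf (next-fit) packer. Assumes every texture on its own fits
    // within the maximum atlas dimensions. When a texture no longer fits in
    // the current atlas, a new atlas is started, which corresponds to the
    // "partitioning by the packer itself" fallback.
    std::vector<Placement> shelfPack(const std::vector<TextureSize>& textures,
                                     int maxWidth, int maxHeight)
    {
        std::vector<Placement> placements;
        int atlas = 0, shelfY = 0, shelfHeight = 0, cursorX = 0;

        for (const TextureSize& t : textures) {
            if (cursorX + t.width > maxWidth) {        // start a new shelf
                shelfY += shelfHeight;
                cursorX = 0;
                shelfHeight = 0;
            }
            if (shelfY + t.height > maxHeight) {       // start a new atlas
                ++atlas;
                shelfY = 0;
                cursorX = 0;
                shelfHeight = 0;
            }
            placements.push_back({atlas, cursorX, shelfY});
            cursorX += t.width;
            shelfHeight = std::max(shelfHeight, t.height);
        }
        return placements;
    }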

Although we have automated the composition of texture atlases, these atlases are still not prepared for unpredictable use of textures, as in the example of assembling a team of characters with different sets of textures. To handle this we create transient data-guided texture atlases. These atlases are transient because they only exist in GPU memory while the game is running, and might not even have to exist for the whole run. While running a game we track which textures are loaded. After those textures have been loaded into GPU memory, we create new atlases from existing textures, where an existing texture can be either a texture atlas or a separate texture. We combine the loaded textures into a larger texture atlas composition, checking that their mipmap levels and storage formats match. We then pack the textures with a bin packing algorithm and create a new texture in GPU memory that can contain the larger composition. If the packed composition exceeds the maximum texture dimensions of the GPU, a partitioning algorithm splits the transient composition into multiple compositions based on the texture graph, just as for the prebuilt data-guided atlases. After valid compositions have been created, each loaded texture that was assigned to a transient texture atlas composition is copied to its transient texture atlas on the GPU. When we create transient data-guided atlases for our running example, the characters of the assembled team can be combined into one transient texture atlas.
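
The GPU-side copy at the end of this process could, for example, look as follows in OpenGL 4.3 or newer using glCopyImageSubData; this is only one possible backend, since the thesis does not prescribe a graphics API, and copyIntoTransientAtlas is an illustrative name.

    #include <algorithm>
    // OpenGL 4.3+ assumed purely for illustration; the entry points are expected
    // to be provided by the engine's own GL loader.
    #include <GL/glcorearb.h>

    // Copy one previously loaded texture, including all of its mip levels, into
    // the transient atlas texture at offset (x, y). Assumes the placement
    // offsets are aligned so that they divide evenly for every mip level.
    void copyIntoTransientAtlas(GLuint sourceTexture, GLuint atlasTexture,
                                GLint x, GLint y, GLsizei width, GLsizei height,
                                GLint mipLevels)
    {
        for (GLint level = 0; level < mipLevels; ++level) {
            glCopyImageSubData(sourceTexture, GL_TEXTURE_2D, level, 0, 0, 0,
                               atlasTexture,  GL_TEXTURE_2D, level,
                               x >> level, y >> level, 0,
                               std::max(width  >> level, 1),
                               std::max(height >> level, 1), 1);
        }
    }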

Conclusion

We automated the composition of texture atlases, and the data-guided texture atlases built from these automated compositions achieved similar or better performance than manually composed texture atlases by significantly reducing the number of draw calls. Not only is automation possible, it is also a viable approach to reducing the number of draw calls, eliminating or at least decreasing the need for cumbersome manual composition of texture atlases.

Building on this automated composition, we developed an approach that composes textures into transient texture atlases on the CPU and creates those atlases on the GPU at run-time. Transient data-guided texture atlases optimize draw call batching by automatically combining textures, which can themselves be texture atlases, into new atlases when they are loaded. These transient atlases can utilize the full maximum texture dimensions of the GPU, are created based on data, and are suitable for dynamic, unpredictable and emergent texture usage. The overhead turns out to be far smaller than the performance gain and still allows real-time rendering.