Porting Source To Linux
Overview
Topics: who this talk is for, why port, Windows->Linux, Linux tools, Direct3D->OpenGL.
Why port?
Linux is open. Linux (for gaming) is growing, and quickly. Stepping stone to mobile. Performance. Steam for Linux.
[Figure: chart comparing Linux, Mac, and Windows on a percentage axis (0%, 1%, 10%).]
Windows->Linux
Windowing issues
Consider SDL! Handles all cross-platform windowing issues, including on mobile OSes. Tight C implementation: everything you need, nothing you don't. Used for all Valve ports, and Linux Steam.
http://www.libsdl.org/
Filesystem issues
Linux filesystems are case-sensitive; Windows is not. Not a big issue for deployment (because everyone ships packs of some sort), but an issue during development, with loose files.
Solution 1: Slam all assets to lower case, including directories, then tolower all file lookups (only adjust below root).
Solution 2: Build a file cache and look for similarly named files.
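Solution 1 could be sketched roughly as follows. This is a minimal illustration, not the engine's code: the function name normalize_asset_path and the example paths are hypothetical, and it assumes '/'-separated paths with the asset root passed in by the caller. Only the part of the path below the root is lower-cased, so a mixed-case home or install directory is left alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Lower-case every character of `path` that lies below `root`.
 * Returns 1 if `path` is under `root` and was normalized in place,
 * 0 if it lies outside the root and was left untouched.
 * (Hypothetical helper; names are not from the talk.) */
static int normalize_asset_path(char *path, const char *root)
{
    assert(path != NULL && root != NULL);

    size_t root_len = strlen(root);
    if (strncmp(path, root, root_len) != 0)
        return 0;

    /* Require a directory boundary so "/game" does not match "/gamedata". */
    if (root_len > 0 && root[root_len - 1] != '/' &&
        path[root_len] != '/' && path[root_len] != '\0')
        return 0;

    for (char *p = path + root_len; *p != '\0'; ++p)
        *p = (char)tolower((unsigned char)*p);
    return 1;
}
```

A lookup layer would call this on every requested filename before touching the filesystem, which only works if the assets on disk were lower-cased as well.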
Other issues
Bad Defines
E.g. Assuming that LINUX meant DEDICATED_SERVER
Locale issues
Locale can break printf/scanf round-tripping. Solution: Set the locale to en_US.utf8 and handle internationalization internally. One problem: Not everyone has en_US.utf8, so pop up a warning in that case.
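A minimal sketch of that approach, using only the standard C library. The function names are hypothetical, and the stderr message is a stand-in for the warning dialog mentioned above. The second function checks the property that matters for data files: that a '.' decimal separator is parsed as one under the active locale.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Try to pin the process to en_US.utf8. Returns 1 on success. On failure
 * the current locale is left unchanged, a warning is printed (stand-in for
 * a real pop-up), and 0 is returned. */
static int pin_locale(void)
{
    if (setlocale(LC_ALL, "en_US.utf8") != NULL)
        return 1;
    fprintf(stderr, "warning: locale en_US.utf8 is not installed\n");
    return 0;
}

/* Data files use '.' as the decimal separator; check that the active
 * locale parses them that way. Returns 1 if "1.5" parses as 1.5. */
static int parses_dot_decimals(void)
{
    float parsed = 0.0f;
    return sscanf("1.5", "%f", &parsed) == 1 && parsed == 1.5f;
}
```

Calling pin_locale() once at startup, before any config or asset parsing, keeps the number formatting independent of the user's environment.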
Linux Tools
Telemetry
Telemetry is a performance visualization system on steroids, created by RAD Game Tools. Very low overhead (so you can leave it on all through development). Quickly identify long frames, then dig into the guts of that frame.
Telemetry Details
DX10 ~ OpenGL 3: Streamlined API, Geometry Shaders
DX11 ~ OpenGL 4: Tessellation and Compute
Direct3D Support
[Figure: chart dated Sep 2011. Categories: D3D11 GPU / D3D11 Capable OS; D3D10 GPU / D3D9 Capable OS; D3D9 (or below) GPU / All OSes. API labels: D3D11, D3D10.]
OpenGL Support
[Figure: chart dated Sep 2011 and Feb 2013. Categories: D3D11 GPU / D3D11 Capable OS; D3D10 GPU / D3D9 Capable OS; D3D9 (or below) GPU / All OSes. API labels: D3D11, D3D10, D3D9.]
togl
"to GL". A D3D9/10/11 implementation using OpenGL. Lives in the application, as a DLL. Engine code is overwhelmingly (99.9%) unaware of which API is being used, even the rendering code.
Direct3D stack: Source Engine -> Matsys / Shaderlib / ShaderAPI -> Direct3D -> GPU
togl stack: Source Engine -> Matsys / Shaderlib / ShaderAPI -> CDirect3D9 (togl) -> OpenGL -> GPU
Perf was a concern, but not a problem: this stack beats the shorter stack by ~20% in apples-to-apples testing.
Shaders
togl handles this, too!
GL / D3D differences
GL has thread local data
A thread can have at most one Context current. A Context can be current on at most one thread. Calls into the GL from a thread that has no current Context are specified to have no effect. MakeCurrent affects the relationship between the current thread and a Context.
GL / D3D differences
GL is C based, objects referenced by handle
Many functions don't take a handle at all; they act on the currently selected object. The handle is usually a GLuint.
GL extensions
NV|AMD|APPLE extensions are vendor specific (but may still be supported cross-vendor)
Ex: NV_bindless_texture
Core extensions
A core feature from a later GL version exposed as an extension to an earlier GL version.
GL tricks
When googling for GL functions, enums, etc., search with and without the leading gl or GL_. Reading specs will make you more powerful than you can possibly imagine. Don't like where GL is heading? Join Khronos Group and shape your destiny.
GL objects
GL has many objects: textures, buffers, FBOs, etc. Current object reference unit is selected using a selector, then the object is bound. Modifications then apply to the currently bound object. Most object types have a default object 0.
Core vs Compatibility
Some IHVs assert Core will be faster. No actual driver implementations have demonstrated this. Tools are starting with Core, but will add Compat features as needed. Some extensions / behaviors are outlawed by Core. Recommendation: Use what you need.
Useful extensions
EXT_direct_state_access, EXT_swap_interval (and EXT_swap_control_tear), ARB_debug_output, ARB_texture_storage, ARB_sampler_objects
EXT_direct_state_access
Common functions take an object name directly; no binding is needed for manipulation. Code is easier to read, with less switching needed. More similar to D3D usage patterns.
http://www.opengl.org/registry/specs/EXT/direct_state_access.txt
EXT_direct_state_access cont'd
GLint curTex;
glGetIntegerv( GL_TEXTURE_BINDING_2D, &curTex );
glBindTexture( GL_TEXTURE_2D, 7 );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
glBindTexture( GL_TEXTURE_2D, curTex );
Becomes
glTextureParameteriEXT( 7, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
EXT_swap_interval
Vsync, but can be changed dynamically at any time. Actually a WGL/GLX extension (WGL_EXT_swap_control / GLX_EXT_swap_control).
wglSwapIntervalEXT( 1 ); // Enable VSYNC
wglSwapIntervalEXT( 0 ); // Disable VSYNC
http://www.opengl.org/wiki/Swap_Interval
http://www.opengl.org/registry/specs/EXT/wgl_swap_control.txt
http://www.opengl.org/registry/specs/EXT/swap_control.txt
EXT_swap_control_tear
XBox-style Swap-tear for the PC.
Requested by John Carmack.
First driver support a few weeks later. All vendors supported within a few months.
ARB_debug_output
You provide a callback; when the driver detects an error, you get fed a message. When the driver is in single-threaded mode, you can see all the way back into your own stack. Supports fine-grained message control. And you can insert your own messages in the error stream from client code. Quality varies by vendor, but is getting better.
ARB_debug_output cont'd
// Our simple callback
void APIENTRY myErrorCallback( GLenum _source, GLenum _type, GLuint _id,
                               GLenum _severity, GLsizei _length,
                               const char* _message, void* _userParam )
{
    printf( "%s\n", _message );
}

// First check for GL_ARB_debug_output, then...
glDebugMessageCallbackARB( myErrorCallback, NULL );
glEnable( GL_DEBUG_OUTPUT );
GL_GREMEDY_string_marker
D3DPERF-equivalent
GL_ARB_vertex_array_bgra
better matches UINT-expectations of D3D
GL_APPLE_client_storage / GL_APPLE_texture_range
Not for linux, but useful for Mac.
GL Pitfalls
Several pitfalls along the way
Functional
Texture State, Handedness, Texture origin differences, Pixel Center Convention (D3D9->GL only)
Performance
MakeCurrent issues, Driver Serialization
Texture State
By default, GL stores information about how to access a texture in a header that is directly tied to the texture.
[Diagram: a Texture object containing Sampler Info and Image Data. Not to scale.]
ARB_sampler_objects
With ARB_sampler_objects, textures can now be accessed different ways through different units. Samplers take precedence over texture headers. If sampler 0 is bound, the texture header will be read. No shader changes required.
http://www.opengl.org/registry/specs/ARB/sampler_objects.txt
Pixel Centers
OpenGL matches D3D10+
MakeCurrent issues
Responsible for several bugs on TF2. Font rendering glitches (the thread creating text tried to update the texture page, but didn't own the context).
MakeCurrent Performance
Single-threaded is best here. MakeCurrent is very expensive; try not to call it even once or twice per frame.
MakeCurrent Fixed
Driver Serialization
Modern OpenGL drivers are dual-core / multithreaded
Your application speaks to a thin shim. The shim moves data over to another thread to prepare for submission. Similar to D3D.
Issuing certain calls causes the shim to need to flush all work, then synchronize with the server thread. This is very expensive.
Whether this gets you a Core or Compatibility context is unspecified, but most vendors give you Compatibility. Creating a robust context with a specific GL-support version requires using a WGL/GLX extension, and is trickier.
Vertex Attributes
glBindBuffer( GL_ARRAY_BUFFER, mPositions );
// glVertexAttribPointer remembers mPositions
glVertexAttribPointer( mProgram_v4Pos, 4, GL_FLOAT, GL_FALSE, 0, 0 );
glEnableVertexAttribArray( mProgram_v4Pos );

glBindBuffer( GL_ARRAY_BUFFER, mNormals );
// glVertexAttribPointer remembers mNormals
glVertexAttribPointer( mProgram_v3Normal, 3, GL_FLOAT, GL_FALSE, 0, 0 );
glEnableVertexAttribArray( mProgram_v3Normal );
ARB_vertex_attrib_binding
Separates Format from Binding. Code is easy to read.
glVertexAttribFormat( 0, 4, GL_FLOAT, GL_FALSE, 0 );
glVertexAttribBinding( 0, 0 );
glBindVertexBuffer( 0, buffer0, 0, 24 );
http://www.opengl.org/registry/specs/ARB/vertex_attrib_binding.txt
Render to Texture
Render-to-texture in GL utilizes Frame Buffer Objects (FBOs). FBOs are created like other objects, and have attachment points: many color points, one depth, one stencil, one depth-stencil. FBOs must be "framebuffer complete" to be rendered to. FBOs, like other container objects, are not shared between contexts.
http://www.opengl.org/registry/specs/ARB/framebuffer_object.txt
Frame Buffers
Spec has fantastic examples for creation, updating, etc., so not replicating here. Watch BindRenderTarget (and BindDepthStencil) etc. calls. At draw time, check whether render targets are in an existing FBO configuration (exactly) via hash lookup. If so, use it. If not, create a new FBO, bind attachments, check for completeness, and store it in the cache.
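The draw-time lookup could be sketched as below. This is an illustration under stated assumptions, not togl's implementation: all names (FboKey, FboCache, fbo_cache_get), the four-color-attachment limit, the 64-slot table, and the FNV-1a hash are hypothetical choices. The GL work on a miss (generate the FBO, attach, check completeness) is hidden behind a callback so the cache logic stands alone; counting_create is an illustration-only stand-in for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FBO_MAX_COLOR   4   /* hypothetical attachment limit */
#define FBO_CACHE_SLOTS 64  /* hypothetical capacity; must be a power of two */

typedef struct {
    uint32_t color[FBO_MAX_COLOR]; /* texture names, 0 = unused */
    uint32_t depth_stencil;        /* texture name, 0 = none */
} FboKey;

typedef struct { FboKey key; uint32_t fbo; /* 0 marks an empty slot */ } FboSlot;
typedef struct { FboSlot slots[FBO_CACHE_SLOTS]; } FboCache;

/* Stand-in for: generate FBO, bind attachments, check completeness.
 * Must return a nonzero FBO name on success, 0 on failure. */
typedef uint32_t (*FboCreateFn)(const FboKey *key, void *user);

static void fbo_cache_init(FboCache *cache)
{
    memset(cache, 0, sizeof *cache);
}

/* FNV-1a over the attachment names, one field at a time. */
static uint32_t fbo_key_hash(const FboKey *key)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i <= FBO_MAX_COLOR; ++i) {
        uint32_t v = (i < FBO_MAX_COLOR) ? key->color[i] : key->depth_stencil;
        for (int b = 0; b < 4; ++b) {
            h ^= (v >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

static int fbo_key_equal(const FboKey *a, const FboKey *b)
{
    if (a->depth_stencil != b->depth_stencil)
        return 0;
    for (int i = 0; i < FBO_MAX_COLOR; ++i)
        if (a->color[i] != b->color[i])
            return 0;
    return 1;
}

/* Returns the FBO for this exact configuration, creating and caching it
 * on a miss. Returns 0 if creation failed or the table is full. */
static uint32_t fbo_cache_get(FboCache *cache, const FboKey *key,
                              FboCreateFn create, void *user)
{
    assert(cache != NULL && key != NULL && create != NULL);
    uint32_t start = fbo_key_hash(key) & (FBO_CACHE_SLOTS - 1);
    for (uint32_t n = 0; n < FBO_CACHE_SLOTS; ++n) {
        FboSlot *slot = &cache->slots[(start + n) & (FBO_CACHE_SLOTS - 1)];
        if (slot->fbo == 0) {              /* miss: first empty slot */
            uint32_t fbo = create(key, user);
            if (fbo != 0) {
                slot->key = *key;
                slot->fbo = fbo;
            }
            return fbo;
        }
        if (fbo_key_equal(&slot->key, key)) /* hit */
            return slot->fbo;
    }
    return 0;
}

/* Illustration-only creator: hands out 1, 2, 3, ... and counts its calls. */
static uint32_t counting_create(const FboKey *key, void *user)
{
    (void)key;
    uint32_t *calls = (uint32_t *)user;
    return ++*calls;
}
```

Since FBOs are not shared between contexts, a cache like this would be kept per context. Entries also need to be dropped when an attached texture is destroyed, which this sketch does not handle.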
Shaders/Programs
In GL, Shaders are attached to a Program.
Each Shader covers a single shader stage (VS, PS, etc)
Shaders are Compiled. Programs are Linked. The Program is used. This clearly doesn't map particularly well to D3D, which supports mix-and-match.
Shaders/Programs cont'd
GL Uniforms == D3D Constants. Uniforms are part of program state.
Swapping out programs also swaps uniforms. This also maps poorly to D3D.
Uniform problem
To solve the uniform problem, consider uniform buffer objects.
Create a single buffer, bind it to all programs, and modify parameters in the buffer.
Or, keep track of global uniform state and set values just prior to draw time. If you're coming from D3D11, Uniform Buffers ARE Constant Buffers, so no problems there.
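The second option (track global state, set values just before the draw) might look roughly like this. It is a sketch under assumptions, not the talk's code: the names, the 256-register limit, and the single dirty range are all hypothetical, and the actual GL upload (for example a glUniform4fv call on the bound program) is represented by a callback. counting_upload is an illustration-only stand-in.

```c
#include <assert.h>
#include <string.h>

#define MAX_CONSTANTS 256 /* hypothetical number of vec4 registers */

typedef struct {
    float values[MAX_CONSTANTS * 4]; /* CPU-side copy of every register */
    int   used;                      /* highest register ever set, plus one */
    int   dirty_first;               /* inclusive dirty range;              */
    int   dirty_last;                /* first > last means nothing is dirty */
} ConstantState;

/* Stand-in for the GL upload of `count` vec4 registers starting at `first`. */
typedef void (*UploadFn)(int first, int count, const float *data, void *user);

static void constants_init(ConstantState *s)
{
    memset(s, 0, sizeof *s);
    s->dirty_first = MAX_CONSTANTS;
    s->dirty_last = -1;
}

/* D3D-style "set constants": record the values and widen the dirty range.
 * No GL call happens here. */
static void constants_set(ConstantState *s, int first, int count,
                          const float *data)
{
    assert(first >= 0 && count > 0 && first + count <= MAX_CONSTANTS);
    memcpy(&s->values[first * 4], data, (size_t)count * 4 * sizeof(float));
    if (first + count > s->used)           s->used = first + count;
    if (first < s->dirty_first)            s->dirty_first = first;
    if (first + count - 1 > s->dirty_last) s->dirty_last = first + count - 1;
}

/* Uniforms live in the program, so a newly bound program must receive
 * everything that has been set so far. */
static void constants_program_changed(ConstantState *s)
{
    s->dirty_first = 0;
    s->dirty_last = s->used - 1;
}

/* Call just before each draw. Returns the number of registers uploaded. */
static int constants_flush(ConstantState *s, UploadFn upload, void *user)
{
    if (s->dirty_first > s->dirty_last)
        return 0;
    int count = s->dirty_last - s->dirty_first + 1;
    upload(s->dirty_first, count, &s->values[s->dirty_first * 4], user);
    s->dirty_first = MAX_CONSTANTS;
    s->dirty_last = -1;
    return count;
}

/* Illustration-only upload: adds the register count to an int counter. */
static void counting_upload(int first, int count, const float *data,
                            void *user)
{
    (void)first;
    (void)data;
    *(int *)user += count;
}
```

A single dirty range keeps the bookkeeping cheap but can re-upload untouched registers that sit between two modified ones; whether that trade is right depends on the workload.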
http://www.opengl.org/wiki/Uniform_Buffer_Object
http://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt
Shader Translation
You have a pile of HLSL. You need to give GL GLSL.
ARB_vertex_program / ARB_fragment_program is a possible alternative, but only for DX9.
No *_tessellation_program
Performance tips
Profile Profile Profile
GL Debugging Tricks
Compare D3D to GL images. Keep them both working on the same platform. Bonus points: Have the game running on two machines, broadcast inputs to both, and compare images in realtime.
Questions?
jmcdonald at nvidia dot com
richg at valvesoftware dot com
Appendix
Some other GL gotchas/helpers
Performance tips
Force-inline is your friend: many of the functions you'll be implementing are among the most-called functions in the application. With few exceptions, you can maintain a GL:D3D call ratio of 1:1 or less.
For example, use glBindMultiTextureEXT instead of glActiveTexture/glBindTexture. glBindMultiTextureEXT(texUnit, target, texture)
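One way to act on both tips is sketched below. The FORCEINLINE macro uses the standard MSVC and GCC/Clang spellings. Everything else is a hypothetical illustration: the shadow array, the 16-unit limit, and the call counter are assumptions, and the redundant-bind filter is one possible way to get the ratio below 1:1, not something the talk prescribes. The counter marks the spot where a real layer would call glBindMultiTextureEXT.

```c
#include <assert.h>

#if defined(_MSC_VER)
#  define FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCEINLINE inline __attribute__((always_inline))
#else
#  define FORCEINLINE inline
#endif

#define MAX_UNITS 16 /* hypothetical number of texture units tracked */

static unsigned g_bound[MAX_UNITS]; /* shadow of the texture bound per unit */
static unsigned g_gl_calls;         /* counts calls that would reach GL */

/* Called for every D3D-side texture bind; skips the GL call when the
 * requested texture is already bound on that unit. */
static FORCEINLINE void bind_texture_cached(unsigned unit, unsigned texture)
{
    assert(unit < MAX_UNITS);
    if (g_bound[unit] == texture)
        return;                      /* redundant bind: no GL call */
    g_bound[unit] = texture;
    ++g_gl_calls;                    /* real code: glBindMultiTextureEXT(...) */
}
```

Such a shadow is only valid while every bind goes through it; any code that binds textures directly would need to update or invalidate the array.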
Sampler gotchas
On certain drivers, GL_TEXTURE_COMPARE_MODE (for shadow map lookups) is buggy when set via sampler. For robustness, use texture setting on those particular drivers.
Latched State
Recall that GL is very stateful. State set by an earlier call is often captured (latched) by a later call. Vertex Attributes are the prime example of this, but there are numerous other examples.
Textures (Creation)
GLuint texId = 0;
// Says "This handle is a texture"
glGenTextures( 1, &texId );

// Allocates memory
glTextureStorage2DEXT( texId, GL_TEXTURE_2D, mipCount, texFmt, mip0Width, mip0Height );

// Pushes data; note that conversion is performed if necessary
for ( GLint mipLevel = 0; mipLevel < mipCount; ++mipLevel ) {
    // mipWidth, mipHeight, and mipData describe the current level
    glTextureSubImage2DEXT( texId, GL_TEXTURE_2D, mipLevel, 0, 0, mipWidth, mipHeight, srcFmt, srcType, mipData );
}
Textures (Updating)
With TexStorage, updates are just like initial data specification (glTextureSubImage or glCompressedTextureSubImage). Texture->Texture updates are covered later. On-GPU compression is straightforward, implemented in:
https://code.google.com/p/nvidia-texture-tools/
Textures (Using)
// Binds texture 7 to texture unit 3.
glBindMultiTextureEXT( 3, GL_TEXTURE_2D, 7 );
StretchRect
Implementing StretchRect in GL involves using Read/Write FBOs. Bind the source as a read target. Bind the destination as a write target. Draw! Alternatives:
No stretching / format conversion? EXT_copy_texture. Stretching / format conversion? NV_draw_texture.
GL
-w <= x <= w
-w <= y <= w
-w <= z <= w
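For comparison, Direct3D's clip volume uses 0 <= z <= w (this half of the comparison is not on the slide text above and is stated here from the D3D documentation). Given that, a position computed with D3D conventions can be moved into GL's range by remapping z. A small sketch with hypothetical names:

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Remap clip-space z from the D3D range [0, w] to the GL range [-w, w]:
 * z_gl = 2 * z_d3d - w. x, y, and w are unchanged. */
static Vec4 d3d_clip_to_gl_clip(Vec4 p)
{
    p.z = 2.0f * p.z - p.w;
    return p;
}

/* Returns 1 if the point lies inside the GL clip volume listed above. */
static int inside_gl_clip(Vec4 p)
{
    return -p.w <= p.x && p.x <= p.w &&
           -p.w <= p.y && p.y <= p.w &&
           -p.w <= p.z && p.z <= p.w;
}
```

The same remap can be applied as the last step of a vertex shader or folded into the projection matrix; this sketch does not claim either is what the port described here does.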