Shader compile flags

We are investigating how to change shader compiler flags in Unreal.

After setting r.DumpShaderDebugWorkerCommandLine=1 in ConsoleVariables.ini, we can see in the dump that shaders are compiled using

%FXC% ShaderName.usf /E vsShaderNameTextured0 /Zpr /Gec /O3 /T vs_5_0 /Ni /FcShaderName.d3dasm

Apparently, Unreal sets the /Zpr shader compiler flag, which packs matrices in row-major order (see the documentation). To use our existing shader code, which should ideally stay unchanged, we need to control this flag.

We have identified two (and a half) ways of addressing this:

  1. Implement
static inline void ModifyCompilationEnvironment(
    const FGlobalShaderPermutationParameters& Parameters,
    FShaderCompilerEnvironment& OutEnvironment)

in the shader class and do something like

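// Read the current flag bits, clear the row-major bit, and write the result back.
auto flags = OutEnvironment.CompilerFlags.GetData();

const uint64 mask = ~static_cast<uint64>(D3DCOMPILE_PACK_MATRIX_ROW_MAJOR);
flags &= mask;

OutEnvironment.CompilerFlags = FShaderCompilerFlags(flags);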

(D3DCOMPILE_PACK_MATRIX_ROW_MAJOR is the equivalent of /Zpr.)

The problem with this approach is that Unreal only handles a small subset of flags when translating them to D3D compiler flags. This is the code from the engine:

static uint32 TranslateCompilerFlagD3D11(ECompilerFlags CompilerFlag)
{
	switch(CompilerFlag)
	{
	case CFLAG_PreferFlowControl: return D3DCOMPILE_PREFER_FLOW_CONTROL;
	case CFLAG_AvoidFlowControl: return D3DCOMPILE_AVOID_FLOW_CONTROL;
	case CFLAG_WarningsAsErrors: return D3DCOMPILE_WARNINGS_ARE_ERRORS;
	default: return 0;
	};
}

Hence, our changes do not have any effect.
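To illustrate why clearing the bit has no effect, here is a minimal, self-contained sketch of the translation step. It is not engine code: EEngineFlag, TranslateFlag and BuildD3DFlags are hypothetical names, and the assumption that the backend starts from a fixed base containing the row-major bit is ours, based on the /Zpr we see in the dump. The bit values mirror the D3DCOMPILE_* constants from d3dcompiler.h.

```cpp
#include <cstdint>
#include <initializer_list>

// Bit values as defined in d3dcompiler.h, repeated here so the sketch
// builds without the Windows SDK.
constexpr std::uint32_t kPackMatrixRowMajor = 1u << 3;   // D3DCOMPILE_PACK_MATRIX_ROW_MAJOR
constexpr std::uint32_t kAvoidFlowControl   = 1u << 9;   // D3DCOMPILE_AVOID_FLOW_CONTROL
constexpr std::uint32_t kPreferFlowControl  = 1u << 10;  // D3DCOMPILE_PREFER_FLOW_CONTROL
constexpr std::uint32_t kWarningsAreErrors  = 1u << 18;  // D3DCOMPILE_WARNINGS_ARE_ERRORS

// Hypothetical stand-in for the engine's ECompilerFlags enum.
enum class EEngineFlag { PreferFlowControl, AvoidFlowControl, WarningsAsErrors, SomethingElse };

// Same shape as TranslateCompilerFlagD3D11: anything not listed maps to 0.
std::uint32_t TranslateFlag(EEngineFlag Flag)
{
    switch (Flag)
    {
    case EEngineFlag::PreferFlowControl: return kPreferFlowControl;
    case EEngineFlag::AvoidFlowControl:  return kAvoidFlowControl;
    case EEngineFlag::WarningsAsErrors:  return kWarningsAreErrors;
    default:                             return 0;
    }
}

// Assumption: the backend starts from a fixed base that already contains the
// row-major bit and only ORs translated engine flags on top of it.
std::uint32_t BuildD3DFlags(std::initializer_list<EEngineFlag> EngineFlags)
{
    std::uint32_t Result = kPackMatrixRowMajor;
    for (EEngineFlag Flag : EngineFlags)
    {
        Result |= TranslateFlag(Flag);
    }
    return Result;
}
```

Under these assumptions, no combination of engine flags can remove the row-major bit, because flags are only ever added to the base and unknown ones translate to 0.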

  2. Add a row_major/column_major qualifier to the shader. This can be done either globally via #pragma pack_matrix(column_major) or as a type modifier on the matrix itself, e.g. column_major float4x4 mvp;

For some reason, neither of these changes has any effect. We understand that these qualifiers do not affect matrices declared locally in the shader, but they should apply to the matrix we upload.

One could of course just transpose the matrix on the CPU before uploading it (the "half" way), but we feel it would be more elegant, and probably more performant, to switch the matrix packing instead.
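For reference, the CPU-side fallback could look like the sketch below. It assumes the matrix is stored as 16 contiguous floats; Matrix4x4 and Transposed are hypothetical names, not Unreal types, and the upload itself is left out.

```cpp
#include <array>
#include <cstddef>

// Hypothetical flat storage for a 4x4 matrix: element (Row, Col) is at Row * 4 + Col.
using Matrix4x4 = std::array<float, 16>;

// Returns a transposed copy, so that data written in one packing order is
// read correctly by a shader compiled with the other one.
Matrix4x4 Transposed(const Matrix4x4& M)
{
    Matrix4x4 Result{};
    for (std::size_t Row = 0; Row < 4; ++Row)
    {
        for (std::size_t Col = 0; Col < 4; ++Col)
        {
            Result[Col * 4 + Row] = M[Row * 4 + Col];
        }
    }
    return Result;
}
```

The cost is one extra 16-element copy per upload, which is likely negligible for a handful of matrices but adds up if many are uploaded every frame.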

Does anyone have any experience with this?