It isn't related to the shader backend any longer. The nvts_enable in
the ffp code isn't quite right as well, it should be moved away once
there is a dedicated nvts fragment pipeline replacement
Calling shader_select() from inside depth_blt() isn't necessarily
safe. shader_select() assumes CompileShader() has been called for the
current shaders, but that depends on STATE_VSHADER / STATE_PIXELSHADER
being applied. That isn't always true when depth_blt() gets called,
with the result that sometimes GLSL programs could be created with no
shader objects attached.
The previous logic assumed that if NVTS or ATIFS are available they
will be used. This happens to be true for NVTS, but ATIFS is only used
if neither ARBFP nor GLSL are supported. This breaks fixed function
fragment processing on ATI r300 and newer cards
The whole control structures in directx.c get terribly confusing with
the various codepaths for texturing and different shader
implementations. It is also hard to reflect the shader model
decisions this way too. This patch moves the shader specific parts of
the caps code into the shader backend where we can set our caps
dependent of the shader model decisions and without complex caps flag
checks.
Generating the shader ID and parts of the shader prolog and epilog was
done by the common vertexshader.c / pixelshader.c, which is ugly.
This patch doesn't get rid of all the uglyness, somewhen we'll still
have to sort out the relationship of [arb|glsl]_generate_shader and
[arb|glsl]_generate_declarations.
Add a new property of the shader backend which indicates whether the
shader backend is able to dirtify single constants rather than
dirtifying vshader and pshader constants as a whole. Depending on this
a different Set*ConstantF implementation is used which marks constants
dirty. The ARB shader backend uses this and marks constants clean
after uploading.
shader_get_registers_used is delayed until compile time for some 1.x
shaders, mostly to wait for the right vertex declaration to be
set. This means that on a recompile it will be run again, adding
another instance of each local constant, which in turn causes compile
errors because of constant redeclaration. Just purging the lists
before finding the constants is a simple and reliable solution.
The GL_ARB_vertex_program extension does not define a standard value for
output texture coordinates. This makes problems when using vertex
shaders with fixed function fragment processing because fffp divides the
texture coords by its .w component. This means that gl shaders have to
write to the .w component of texture coords. Direct3D shaders however
do not.
As spotted by Christoph Bumiller, these branches are now never
reached. Also, at least in the case of WINED3DSIO_TEXM3x3SPEC and
WINED3DSIO_TEXM3x3VSPEC the old code was not quite correct, since we
can lookup rather than guess the texture type these days.
- Implement if, else, endif, rep, endrep, break
- Implement ifc, breakc, using undocumented comparison bits in the instruction token
- Fix bug in main loop processing of codes with no dst token
- Fix bug in GLSL output modifier processing of codes with no dst token
- Fix bug in loop implementation (src1 contains the integer data, src0 is aL)
- Add versioning for all the instructions above, and remove
GLSL_REQUIRED thing, which is useless and should be removed from all
opcodes in general.
- move DEF, DEFI, DEFB handling into the register counting pass
- keep track of defined constants as a linked list (because there's a
few of them)
- apply immediate constants after global constants in the constant
loading function
- both types of constants now get loaded with array notation in the
shader (into the same array)
- currently half the shader selection code (GLSL vs ARB) is in
fillGLcaps. The parts that check for software shaders are in
GetDeviceCaps. That placement, will work, but is definitely not optimal.
FillGLcaps should detect support - it should not make decision as to
what's used, because that's not what the purpose of the function is.
GetDeviceCaps should report support as it has already been selected.
Instead, select shader mode in its own function, called in the
appropriate places.
- unifying pixel and vertex shaders into a single selection is a
mistake. A software vertex shader can be coupled with a hardware arb or
glsl pixel shader, or no shader at all. Split them back into two and add
a SHADER_NONE variant.
- drawprim is doing support checks for ARB_PROGRAM, and making shader
decisions based on that - that's wrong, support has already been
checked, and decided upon, and shaders can be implemented via software,
ARB_PROGRAm or GLSL, so that support check isn't valid.
- Store the shader selected mode into the shader itself. Different types
of shaders can be combined, so this is an improvement. In fact, storing
the mode into the settings globally is a mistake as well - it should be
done per device, since different cards have different capabilities.
This fixes the translations for a few instructions in GLSL and allows
Cubemap sampling in pixel shaders < 2.0. It makes some of the
lighting on textures in Half Life 2 look better, including some of the
water effects. It's not perfect yet, but much closer now.
- Implement D3DSIO_DP2ADD, D3DSIO_TEXKILL, D3DSIO_TEXM3X3PAD
- Partially implement D3DSIO_TEXBEM, D3DSIO_TEXM3X3VSPEC (as much as
they are implemented in ARB_fragment_program at least).
- Stop copying the SHADER_PARSE_STATE struct in each ARB shader
routine - use a pointer instead.
- Separate the declaration phase of the shader string generator into
the arb and glsl specific files.
- Add declarations and recognition for application-sent constant
integers and booleans (locally defined ones will follow).
- Standardize capitilization of pixel/vertex specific variable names.
- Implemented: D3DSIO_SGN, LOOP, ENDLOOP, LOGP, LIT, DST, SINCOS
- Process instruction-based modifiers (function existed, it just
wasn't being called)
- Add loop checking to register maps.
- Renamed "sng" to "sgn" for D3DSIO_SGN - it's not handled anywhere
except for GLSL, so won't matter.
There are a total of 17 instructions without a destination token. Of
those 9 have num_params != 0, which means that we will not process any
of them correctly, because we assume the first token (if present) is a
destination token.
Those are basically all the flow control instructions, which we plan to
support very soon. They have source tokens, and no destination. Add a
flag that marks them up to the ins table. Use this flag in the trace
pass, and generation pass.
- track sampler declarations and store the sampler usage in reg_maps structure
- store a fake sampler usage for 1.X shaders (defined as 2D sampler)
- re-sync glsl TEX implementation with the ARB one (no idea why they diverged..)
- use sampler type in new TEX implementation to support 2D, 3D, and Cube sampling
- change drawprim to bind pixel shader samplers
Additional improvements:
- rename texture limit to texcoord to prevent confusion
- add sampler limit, and use that for samplers - *not* the same as texcoord above
SM 3.0 can pack multiple "semantics" into 12 generic input/output registers.
To support that, define temporaries called IN and OUT, and use those as
the output registers. At the end of the vshader, unpack the OUT temps
into the proper GL variables. At the beginning of the pshader, pack the
GL variables back into 12 IN registers.
Various cleanups:
- do not use DWORD as a bitmask, that places artificial limit of 32 on
registers
- track attributes that are used and declare only those
- move declarations function call in pshader/vshader to allow us to
insert pixel or vertex specific code between the declarations and
the rest of the code
- remove redundant 0 intializers
- remove useless continue statement
Now that the declaration function is out of the way, the tracing pass,
which is very long and 100% the same can be shared between pixel and
vertex shaders.
The new function is called shader_trace_init(), and is responsible for:
- tracing the shader
- initializing the function length
- setting the shader version [needed very early]
The new function is called in pass 2 (getister counting/maps), and
it's now in baseshader. It operates on all INPUT and OUTPUT registers,
which, in addition to the old vertex shader input declarations covers
Shader Model 3.0 vshader output and pshader input declarations. The
result is stored into the reg_map structure.
Delete the entire namedArrays code path and all its dependencies (one
of which is quite long - storeOrder in drawprim is always FALSE, for
example). Delete declaredArrays, and make its code path the default.
I missed a register mask in the move to share the shader_hw_def()
function between pixel and vertex shaders for ARB shaders. Fixed
that, and made the GLSL version use the same mask for consistency.
- Declare more variable names for GLSL programs.
- Some of these won't need to be declared eventually, but it doesn't hurt to do it for now.
- Correct output name for pixel shaders (gl_FragColor instead of glFragColor).
- Add a new file glsl_shader.c which contains almost every GLSL specific function we'll need
- Move print_glsl_info() into glsl_shader.c
- Move the shader_reg_maps struct info into the private header, and make it part of SHADER_OPCODE_ARG.
- Create a new shared ps/vs register map for float constants (future patch will make ARB programs use this, too)
Each instruction can have a predication token. Account for it in the
trace pass, register count pass, and store it in the SHADER_OPCODE_ARG
structure for generation. MSDN claims the token is at the end of the
instruction, but that's not true - testing a demo, which lets me
manipulate the shader shows the predication token is the first source
token immediately following the destination token.
Currently we hardcode a0.x, which I think is correct for shaders 1.0.
However, for shaders 2.0, we must look into the address token, and
print the register there. Handle both cases to correct the trace.
Change the trace pass, the register counting pass, and the hw
generator pass to take into account the new get_params() function. For
hw generation, store the address tokens into the SHADER_OPCODE_ARG
structure, so they're available to generator functions.
Add a new function to process parameters.
On shaders 1.0, processing parameters amounts to *pToken++.
On shaders 2.0+, we have a relative addressing token to account for.
This function should be used, instead of relying on num_params everywhere.
Share shader_dump_ins_modifiers(), and make vertex shaders use it.
The saturate modifer (_sat) is valid on vs_3_0+, and it isn't being
shown in the trace.