Based on a patch by Stefan Dösinger. This is more flexible, and allows
the shader backend implementation to be simpler, since it doesn't have
to know about specific formats. The next patch makes use of this.
Some stateblock parameters have to be compiled into the GL pixel
shader code, like lines for pixelformat fixups. This leads to problems
when applications switch those settings, requiring a recompilation of
the shader. This patch enables wined3d to have multiple GL shaders for
a D3D shader (pixel shaders only so far) to handle this more
efficiently.
This was suggested by Ivan quite a while ago, and we need it to better
handle conflicting texture format corrections and similar stateblock
value changes which until now required a recompilation of the entire
shader.
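The idea of keeping several GL shaders per D3D pixel shader can be
sketched roughly as below. This is only an illustration, not the wined3d
code: the struct names, the fields of ps_compile_args, the fixed-size
table and compile_variant() are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time parameters; the real structure differs. */
struct ps_compile_args {
    unsigned int format_fixup;     /* which pixel format fixup to bake in */
    unsigned int other_state;      /* placeholder for another stateblock value */
};

struct ps_variant {
    struct ps_compile_args args;   /* settings this variant was built for */
    unsigned int gl_id;            /* GL shader/program name */
};

#define MAX_VARIANTS 8             /* arbitrary limit for the sketch */

struct pixel_shader {
    struct ps_variant variants[MAX_VARIANTS];
    size_t variant_count;
    unsigned int compile_count;    /* only here to make recompiles visible */
};

/* Stand-in for the backend compile step; returns a fake nonzero GL id. */
static unsigned int compile_variant(struct pixel_shader *ps,
        const struct ps_compile_args *args)
{
    (void)args;
    return ++ps->compile_count;
}

/* Return the GL shader matching args, compiling it on first use.
 * Returns 0 if the variant table is full. */
static unsigned int find_gl_shader(struct pixel_shader *ps,
        const struct ps_compile_args *args)
{
    size_t i;

    assert(ps && args);
    for (i = 0; i < ps->variant_count; ++i) {
        if (ps->variants[i].args.format_fixup == args->format_fixup
                && ps->variants[i].args.other_state == args->other_state)
            return ps->variants[i].gl_id;
    }
    if (ps->variant_count == MAX_VARIANTS)
        return 0;
    ps->variants[i].args = *args;
    ps->variants[i].gl_id = compile_variant(ps, args);
    ps->variant_count++;
    return ps->variants[i].gl_id;
}
```

Switching back to settings that were seen before then costs a lookup
rather than a recompilation.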
A number of considerations contribute to this:
1) The shader backend knows best which shader(s) it needs. GLSL needs
both, ARB only one
2) The shader backend may pass some parameters to the compilation
code (e.g. which pixel format fixup to use)
3) The structures used in (2) are different in vs and ps, so a
baseshader::Compile won't work
4) The structures in (2) are wined3d-private structures, so
having a public method in the vtable won't work (it's a bad idea
anyway).
SM3.0 requires 10 four-component float varyings for passing stuff between
vertex and pixel shaders. GF7 and earlier report 8 generic varyings +
gl_Color and gl_SecondaryColor in GLSL. This patch allows us to use
gl_Color and gl_SecondaryColor to get 2 extra varyings, which some
games, like C&C3 with highest gfx settings, require.
Generating the shader ID and parts of the shader prolog and epilog was
done by the common vertexshader.c / pixelshader.c, which is ugly.
This patch doesn't get rid of all the ugliness; at some point we'll still
have to sort out the relationship of [arb|glsl]_generate_shader and
[arb|glsl]_generate_declarations.
- Replace gl_FragColor with gl_FragData[0] for GLSL pixel shader output.
- Subtract 1 more constant from total GLSL allowed float constants to
accommodate the PROJECTION matrix row that we reference.
- Implement if, else, endif, rep, endrep, break
- Implement ifc, breakc, using undocumented comparison bits in the instruction token
- Fix bug in main loop processing of codes with no dst token
- Fix bug in GLSL output modifier processing of codes with no dst token
- Fix bug in loop implementation (src1 contains the integer data, src0 is aL)
- Add versioning for all the instructions above, and remove the
GLSL_REQUIRED thing, which is useless and should be removed from all
opcodes in general.
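As a rough illustration of the ifc/breakc handling, the comparison can be
pulled out of the instruction token and turned into a GLSL operator. The
shift, the mask and the enum values below are an assumption (they are
meant to mirror the D3DSHADER_COMPARISON definitions in d3d9types.h), and
the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit layout, mirroring D3DSHADER_COMPARISON in d3d9types.h. */
#define COMPARISON_SHIFT 16
#define COMPARISON_MASK  (0x7ul << COMPARISON_SHIFT)

/* Map the comparison bits of an opcode token to a GLSL operator.
 * Returns NULL for the reserved values 0 and 7. */
static const char *comparison_to_glsl(unsigned long opcode_token)
{
    static const char *const ops[8] = {
        NULL,  /* 0: reserved */
        ">",   /* 1: GT */
        "==",  /* 2: EQ */
        ">=",  /* 3: GE */
        "<",   /* 4: LT */
        "!=",  /* 5: NE */
        "<=",  /* 6: LE */
        NULL   /* 7: reserved */
    };
    unsigned long op = (opcode_token & COMPARISON_MASK) >> COMPARISON_SHIFT;

    /* The 3-bit mask keeps the index inside the table. */
    assert(op < sizeof(ops) / sizeof(ops[0]));
    return ops[op];
}
```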
- move DEF, DEFI, DEFB handling into the register counting pass
- keep track of defined constants as a linked list (because there are
few of them)
- apply immediate constants after global constants in the constant
loading function
- both types of constants now get loaded with array notation in the
shader (into the same array)
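The load order described above could look roughly like this. It is a
sketch only: struct local_constant and load_float_constants() are
hypothetical names, and a plain float array stands in for the GL
uniform/parameter upload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One DEF-style constant found during the register counting pass. */
struct local_constant {
    unsigned int idx;              /* constant register number */
    float value[4];
    struct local_constant *next;   /* singly linked list */
};

/* Fill dst (count vec4s) from the global constants first, then let the
 * shader's own immediate constants override the matching slots. */
static void load_float_constants(float *dst, size_t count,
        const float *global, const struct local_constant *locals)
{
    const struct local_constant *c;

    assert(dst && global);
    memcpy(dst, global, count * 4 * sizeof(*dst));
    for (c = locals; c; c = c->next) {
        if (c->idx < count)
            memcpy(&dst[c->idx * 4], c->value, sizeof(c->value));
    }
}
```

Applying the immediate constants last means a value defined in the shader
always wins over whatever the application set for the same register.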
- currently half the shader selection code (GLSL vs ARB) is in
fillGLcaps. The parts that check for software shaders are in
GetDeviceCaps. That placement will work, but is definitely not optimal.
FillGLcaps should detect support - it should not make decisions as to
what's used, because that's not the purpose of the function.
GetDeviceCaps should report support as it has already been selected.
Instead, select shader mode in its own function, called in the
appropriate places.
- unifying pixel and vertex shaders into a single selection is a
mistake. A software vertex shader can be coupled with a hardware arb or
glsl pixel shader, or no shader at all. Split them back into two and add
a SHADER_NONE variant.
- drawprim is doing support checks for ARB_PROGRAM, and making shader
decisions based on that - that's wrong: support has already been
checked and decided upon, and shaders can be implemented via software,
ARB_PROGRAM or GLSL, so that support check isn't valid.
- Store the selected shader mode in the shader itself. Different types
of shaders can be combined, so this is an improvement. In fact, storing
the mode into the settings globally is a mistake as well - it should be
done per device, since different cards have different capabilities.
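A separate selection function with independent vertex and pixel modes
might be sketched like this. Only SHADER_NONE is named in the text above;
the other enum values, the two structs and the function name are
hypothetical, and the priority order (software, then GLSL, then ARB) is
an assumption made for the example.

```c
#include <assert.h>

enum shader_mode { SHADER_NONE, SHADER_SW, SHADER_ARB, SHADER_GLSL };

/* What the GL implementation supports (filled in by caps detection). */
struct gl_support {
    int arb_vertex_program;
    int arb_fragment_program;
    int glsl;
};

/* Hypothetical user/registry settings. */
struct shader_settings {
    int use_glsl;      /* GLSL allowed by the user */
    int software_vs;   /* force software vertex shaders */
};

struct shader_selection {
    enum shader_mode vs_mode;
    enum shader_mode ps_mode;
};

/* Decide the vertex and pixel shader modes independently. */
static void select_shader_mode(const struct gl_support *gl,
        const struct shader_settings *s, struct shader_selection *out)
{
    assert(gl && s && out);

    if (s->software_vs)
        out->vs_mode = SHADER_SW;
    else if (gl->glsl && s->use_glsl)
        out->vs_mode = SHADER_GLSL;
    else if (gl->arb_vertex_program)
        out->vs_mode = SHADER_ARB;
    else
        out->vs_mode = SHADER_NONE;

    /* Pixel shaders have no software variant in this sketch. */
    if (gl->glsl && s->use_glsl)
        out->ps_mode = SHADER_GLSL;
    else if (gl->arb_fragment_program)
        out->ps_mode = SHADER_ARB;
    else
        out->ps_mode = SHADER_NONE;
}
```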
- Implement D3DSIO_DP2ADD, D3DSIO_TEXKILL, D3DSIO_TEXM3X3PAD
- Partially implement D3DSIO_TEXBEM, D3DSIO_TEXM3X3VSPEC (as much as
they are implemented in ARB_fragment_program at least).
- Stop copying the SHADER_PARSE_STATE struct in each ARB shader
routine - use a pointer instead.
- Implemented: D3DSIO_SGN, LOOP, ENDLOOP, LOGP, LIT, DST, SINCOS
- Process instruction-based modifiers (function existed, it just
wasn't being called)
- Add loop checking to register maps.
- Renamed "sng" to "sgn" for D3DSIO_SGN - it's not handled anywhere
except for GLSL, so won't matter.
There are a total of 17 instructions without a destination token. Of
those, 9 have num_params != 0, which means that we will not process any
of them correctly, because we assume the first token (if present) is a
destination token.
Those are basically all the flow control instructions, which we plan to
support very soon. They have source tokens, and no destination. Add a
flag to the ins table that marks them. Use this flag in the trace
pass and the generation pass.
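A minimal sketch of such a flag, assuming a simplified instruction table:
INS_FLAG_NO_DST, struct ins_info and split_params() are hypothetical
names, and the table holds only a few sample entries.

```c
#include <assert.h>
#include <stddef.h>

#define INS_FLAG_NO_DST 0x1u   /* all parameters are source tokens */

struct ins_info {
    const char *name;
    unsigned int num_params;
    unsigned int flags;
};

/* A few sample entries; the real table is much larger. */
static const struct ins_info ins_table[] = {
    {"add",    3, 0},
    {"mov",    2, 0},
    {"if",     1, INS_FLAG_NO_DST},
    {"rep",    1, INS_FLAG_NO_DST},
    {"breakc", 2, INS_FLAG_NO_DST},
    {"endif",  0, 0},   /* no tokens at all, so the flag is irrelevant */
};

/* Work out how many destination and source tokens follow the opcode. */
static void split_params(const struct ins_info *ins,
        unsigned int *dst_count, unsigned int *src_count)
{
    assert(ins && dst_count && src_count);
    if (!ins->num_params || (ins->flags & INS_FLAG_NO_DST)) {
        *dst_count = 0;
        *src_count = ins->num_params;
    } else {
        *dst_count = 1;
        *src_count = ins->num_params - 1;
    }
}
```

Both the trace pass and the generation pass can then use the same helper
rather than assuming that the first token is a destination.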
- track sampler declarations and store the sampler usage in reg_maps structure
- store a fake sampler usage for 1.X shaders (defined as 2D sampler)
- re-sync glsl TEX implementation with the ARB one (no idea why they diverged..)
- use sampler type in new TEX implementation to support 2D, 3D, and Cube sampling
- change drawprim to bind pixel shader samplers
Additional improvements:
- rename texture limit to texcoord to prevent confusion
- add sampler limit, and use that for samplers - *not* the same as texcoord above
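For illustration, picking the GLSL lookup function from the tracked
sampler usage might look like this. The enum, the reg_maps layout, the
MAX_SAMPLERS value and both function names are hypothetical; texture2D,
texture3D and textureCube are the GLSL 1.x built-ins.

```c
#include <assert.h>
#include <stddef.h>

enum sampler_type {
    SAMPLER_UNKNOWN,   /* no declaration seen */
    SAMPLER_2D,
    SAMPLER_CUBE,
    SAMPLER_VOLUME
};

#define MAX_SAMPLERS 16   /* arbitrary limit for the sketch */

struct reg_maps {
    enum sampler_type samplers[MAX_SAMPLERS];
};

/* 1.X shaders have no sampler declarations, so record a fake 2D usage. */
static void fake_1x_sampler_usage(struct reg_maps *maps, unsigned int idx)
{
    assert(maps && idx < MAX_SAMPLERS);
    maps->samplers[idx] = SAMPLER_2D;
}

/* Pick the GLSL lookup function for a sampler, or NULL if undeclared. */
static const char *glsl_sample_function(const struct reg_maps *maps,
        unsigned int idx)
{
    assert(maps && idx < MAX_SAMPLERS);
    switch (maps->samplers[idx]) {
        case SAMPLER_2D:     return "texture2D";
        case SAMPLER_CUBE:   return "textureCube";
        case SAMPLER_VOLUME: return "texture3D";
        default:             return NULL;
    }
}
```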
SM 3.0 can pack multiple "semantics" into 12 generic input/output registers.
To support that, define temporaries called IN and OUT, and use those as
the output registers. At the end of the vshader, unpack the OUT temps
into the proper GL variables. At the beginning of the pshader, pack the
GL variables back into 12 IN registers.
Various cleanups:
- do not use DWORD as a bitmask, that places artificial limit of 32 on
registers
- track attributes that are used and declare only those
- move declarations function call in pshader/vshader to allow us to
insert pixel or vertex specific code between the declarations and
the rest of the code
- remove redundant 0 initializers
- remove useless continue statement
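The first two cleanups can be pictured with a small sketch: a per-register
array in place of a 32-bit mask, used to declare only what the shader
touches. The names and the MAX_ATTRIBS value are hypothetical.

```c
#include <assert.h>

#define MAX_ATTRIBS 64   /* arbitrary; no longer tied to the width of a DWORD */

/* One entry per register, so the limit is just the array size. */
struct attrib_usage {
    unsigned char used[MAX_ATTRIBS];
};

/* Record that the shader reads attribute register reg. */
static void mark_attrib_used(struct attrib_usage *u, unsigned int reg)
{
    assert(u && reg < MAX_ATTRIBS);
    u->used[reg] = 1;
}

/* Number of attributes that actually need a declaration. */
static unsigned int count_declared_attribs(const struct attrib_usage *u)
{
    unsigned int i, count = 0;

    assert(u);
    for (i = 0; i < MAX_ATTRIBS; ++i) {
        if (u->used[i])
            ++count;
    }
    return count;
}
```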
Now that the declaration function is out of the way, the tracing pass,
which is very long and 100% the same for both, can be shared between
pixel and vertex shaders.
The new function is called shader_trace_init(), and is responsible for:
- tracing the shader
- initializing the function length
- setting the shader version [needed very early]
The new function is called in pass 2 (register counting/maps), and
it's now in baseshader. It operates on all INPUT and OUTPUT registers,
which, in addition to the old vertex shader input declarations, covers
Shader Model 3.0 vshader output and pshader input declarations. The
result is stored into the reg_map structure.