Using GL_NV_vertex_program2_option so far. If we're really desparate we can
handle some cases without the extension by using a custom varying and texkill
in the fragment program.
This gives a small performance improvement. Don't enable NVfp for it though,
because the NVfp penalty is bigger than the gain from this patch. But if NVfp
is enabled anyway, make use of it.
This reverts patch ba35760f9f.
The original patch did not achive its goal, because CMP is a macro that is
expanded to SLT, SGE, MUL, MAD, at least on nvidia hardware. To make matters
worse, it uses a temporary register, and the assembler usually is not clever
enough to find a free temporary from the shader code. If we generate the code
outselves we can pick one of our temps for this job.
Many 2.0 and 3.0 shaders end with a "mov oC0, rx". If sRGB writing is enabled,
the ARB backend writes to a TMP_COLOR temporary, and at the end of the shader
writes the sRGB corrected color to result.color. If oC0 is not partially
rewritten after the mov, we can ignore the mov, not declare TMP_COLOR at all,
and just use the rx register as input for the sRGB correction code. This saves
a temporary and an instruction.
This reduces the number of methods in the shader backend(the instr
modifiers can be handled in that wrapper) and it will help flow
control emulation in the ARB backend.
SCS is unfortunately a fragment program only instruction. If we have the NV
extensions we can use SIN and COS. Otherwise we have to approximate sine and
cosine with a taylor series. Luckily we're provided with the necessary
constants by the application.
TMP_POS is only used in vertex shaders, declare it in the vshader
specific code. The sRGB constants are only used by pixel shaders, so
move them to the ps specific code, and avoid reading the stateblock.
To be able keep the temporary register in the type independent NRM
instruction, the vertex temporary register is renamed to TA to match
the name of a pixel shader register.
texm3x2pad knows which register the following texm3x2depth or tex instruction
will use, and it knows that this register is uninitialized. So use it for
temporary storage instead of TMP.
This is the Nth attemt to make clipping work with GLSL shaders. The patch now
uses the GLSL quirk table to handle cards that need a custom varying for
gl_ClipPos, and the code is adapted to the changed state table and shader
backend system.
This simplifies the loading code a bit. The constants were never
designed to be at the same location in all shaders, so there's no
point in using program.env. This way we don't collide with the d3d
shader constants and its easier to work together with NP2 fixups and
other shaders.
This was needed unconditionally in the past to apply fog, but since we're
using the ARBfp fog defines it is only needed if an sRGB correction is done
at the end of the shader.
ps_1_3 uses Tx to pass in texture coordinates, but also as temporary
registers. ps_1_4 and ps_2_0 only use them for texture coordinates. This patch
gets rid of the Tx = fragment.texcoord[x] assign in all shader versions, and
doesn't even declare Tx in ps_1_4 and ps_2_0.
The <=ps_1_3 instructions know which kind of input they expect from the Tx
register, so the instruction handlers now know if they have to read the
tempreg Tx or the varying fragment.texcoord[x].
shader_arb_add_src_param handled DW and TXP undid it again. Remove DZ DW from
the modifiers and handle it in the instruction. DZ cannot be handled by TXP as
is, so move the .z component to .w and make it DW-like. Using SZW+TXP is
likely more efficient than the RCP, MUL, TEX we'd get if we let
shader_arb_add_src_param do the job.
Use shader_arb_get_dst_param instead of get_register_name to find the register
name. Even though this adds support for modifiers(which aren't allowed by
native), this shouldn't hurt. If an app passes in an incorrect shader it
should be caught in the frontend.
Mostly based on the code of pshader_gen_input_modifier_line. The space-adding
behavior of shader_arb_add_src_param was removed because the plurality of
instruction handlers passes an uninitialized buffer in and expects a register
name written to its start, and only map2gl and rcp_rsq use the space-adding
stuff. I'll change rcp_rsq in a later patch anyway. I changed the name to
shader_arb_get_src_param to reflect this behavior.
Use shader_arb_add_instruction_modifiers instead. This avoids calling the
fixup function from each single instruction handler to handle shifts. It does
not yet get rid of the modifier handler in each instruction because we don't
want a separate line if we can just append _SAT to the instruction name.
This is needed to raise the number of advertised constants to the GL
limit. The ARB assembler ususally does not optimize away unused
constants, so we have to do this.
This moves the GLSL and ARB specific reserved constants out of directx.c into
the get_caps methods of the shader backends. That way the number of reserved
constants remains in the backend.
GL_LIMITS({v/p}shader_constantsF) now contains the real number of constants as
advertised by GL instead of some mixture of GL info and backend implementation
specifics. This makes it easier for backends to decide how many constants to
use.
shader_select takes care of compiling the shader since we have the shader
duplication infrastructure. However, there's still a motivation to skip the
shader_select call if the vertex shader is dirty.
The color correction cannot be done behind the back of the individual
instruction handlers because it might conflict with the instruction's
color modifications and the D3D provided writemask.
The fog settings do not depend on wether the shader writes to oFog or not,
instead they depend on the FOGVERTEXMODE and FOGTABLEMODE settings, and if a
vertex shader is bound at all.
It works the same way as with the fixed function, and having a vertex shader
is the same as using pretransformed vertices, just that the fog coord comes
from the shader instead of the specular color:
FOGTABLEMODE != NONE: The Z coord is used, oFog is ignored
FOGTABLEMODE == NONE, with VS: oFog is used
FOGTABLEMODE == NONE, no VS, XYZ: Z is used
FOGTABLEMODE == NONE, no VS, XYZRHW: diffuse color is used
Most callers work on a stateblock rather than a device, and the main fields
we check (vertexShader and pixelShader) are part of the stateblock as well.
This is a first step towards cleaning up the fog mess. The fog
parameter is added to the pixelshader compile args structure. That way
multiple pshaders are compiled for different fog settings, and the
pixel shader can remove the fog line if fog is not enabled. That way
we don't need special fog start and end settings, and this allows us
to implement EXP and EXP2 fog in the future too.
Based on a patch by Stefan Dösinger. This is more flexible, and allows
the shader backend implementation to be simpler, since it doesn't have
to know about specific formats. The next patch makes use of this.
Some stateblock parameters have to be compiled into the GL pixel
shader code, like lines for pixelformat fixups. This leads to problems
when applications switch those settings, requiring a recompilation of
the shader. This patch enables wined3d to have multiple GL shaders for
a D3D shader(pixel shaders only so far) to handle this more
efficiently.
This was suggested by Ivan quite a while ago, and we need it to better
handle conflicting texture format corrections and similar stateblock
value changes which until now required a recompilation of the entire
shader
A number of considerations contribute to this:
1) The shader backend knows best which shader(s) it needs. GLSL needs
both, arb only one
2) The shader backend may pass some parameters to the compilation
code(e.g. which pixel format fixup to use)
3) The structures used in (2) are different in vs and ps, so a
baseshader::Compile won't work
4) The structures in (2) are wined3d-private structures, so
having a public method in the vtable won't work(its a bad idea
anyway).
The code here skipped constant loading when a pixel shader was in use,
and only reloaded them on ffp use if the shader implementation used
ARB too. This way a e.g. texfactor change could get lost if GLSL
shaders are used, and the texfactor changed while a pixel shader was
in use.
GL_ATI_envmap_bumpmap provides two things: Signed V8U8 pixel formats,
and bump mapping. The extension is only supported on fglrx, and this
driver also supports GL_ARB_fragment_program. Thus the bump mapping
code is never used on any driver out there. Furthermore, if it is
used, it tends to crash the driver
The signed pixel format is used, as it can be used by pixel shaders or
the ARBfp replacement. However, the format is broken in fglrx, and
negative values are clamped to 0.0. This results in test
failures. WineD3D has an alternative codepath using scale+bias to
enable V8U8 using a standard signed RGB which works correctly on
fglrx.
The current code properly enabled/disabled GL_ARB_fragment_program
after a depth blit, but it did not restore the bound fragment program
properly. This leads to problems if a depth blit was done between two
draws without any change of the fragment processing settings.