If start = 4 and count = 3, the highest dirty constant is the one with index 6. start + count gives 7,
so it already includes the zero-based array correction. Don't add an additional 1.
Besides the inefficiency of looking at one extra constant, this causes problems if the driver
rejects loading 257 constants on the initial load. In that case no constants are loaded at all if
GL_EXT_gpu_program_parameters is used.
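A minimal sketch in C of the off-by-one, assuming the dirty range is stored as [start, end) with an exclusive end. `struct dirty_range`, `mark_constants_dirty` and `dirty_count` are hypothetical names, not the wined3d code:

```c
/* Hypothetical dirty-range tracker; the real wined3d structures may differ.
 * "end" is exclusive: it is one past the highest dirty index. */
struct dirty_range
{
    unsigned int start;
    unsigned int end;
};

static void mark_constants_dirty(struct dirty_range *r, unsigned int start, unsigned int count)
{
    /* start + count already is the exclusive end; adding 1 would cover one constant too many. */
    unsigned int end = start + count;

    if (r->start == r->end)
    {
        /* Empty range: take the new one as-is. */
        r->start = start;
        r->end = end;
        return;
    }
    if (start < r->start) r->start = start;
    if (end > r->end) r->end = end;
}

/* Number of constants the next upload has to cover. */
static unsigned int dirty_count(const struct dirty_range *r)
{
    return r->end - r->start;
}
```

With start = 4 and count = 3 the range ends at 7 and the last dirty index is 6; a full initial load of 256 constants covers exactly 256, not 257.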
If the replacement pipeline is used, ARBfp is always on. Disabling it
can break shaders or the replacement pipeline, because the shader and
ffp code assume the extension is on.
I don't like having to do this, because posFixup is in all
vertex programs, so it's at the same position and could be loaded
globally. Unfortunately, there are usually only 256 env parameters,
which makes it impossible for any shader to use c256, even if it does
not use indirect addressing, and so we can't claim 256 constant
support.
I find it helpful for debugging to have this controlled in one central place,
without having to disable the entire GL extension or manually find all the
places where GL_SUPPORT(NV_VERTEX_PROGRAM2_OPTION) controls clipplane use. It
is useful for debugging the emulation code on NV cards and for debugging Mac
driver issues.
b2f09fd204 accidentally got the
device->vs_clipping check wrong. The FFP replacement should emulate
clipping if GL can't do it natively with vertex shaders, not the
other way around. Also, don't emulate clipping if we're using fixed-function
vertex processing, because (a) GL always supports clipping in
this case, and (b) fragment.texcoord[7] is undefined (or, in the
worst case, set to something bad by the app).
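The corrected condition fits in a small predicate. This is a sketch only: the function name is hypothetical, and `vs_clipping` stands for the device->vs_clipping flag mentioned above, taken here to mean that GL clips natively while a vertex shader is active:

```c
#include <stdbool.h>

/* Hypothetical helper: should the FFP replacement fragment code emulate clipping? */
static bool ffp_replacement_emulates_clipping(bool vertex_shader_active, bool vs_clipping)
{
    /* Fixed-function vertex processing: GL always clips, and fragment.texcoord[7]
     * carries no clip data, so never emulate. */
    if (!vertex_shader_active)
        return false;
    /* With a vertex shader, emulate only when GL cannot clip natively. */
    return !vs_clipping;
}
```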
If the needed constants are available, we can support all vs_2_0 and ps_2_0
requirements with the plain ARB extensions. We cannot, however, run SM 2.0a or
SM 2.0b shaders.
ps 1.x constants are clamped to [-1, 1]; constants in ps 2.0 and later
are not. This means we have to reload constants when switching between
those shader types in ARB. In GLSL this is not a concern, because
constants are tied to program objects and are reloaded on a shader
change anyway.
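A sketch of the reload decision, assuming a cache that remembers whether the currently loaded constants were clamped. `struct arb_const_state` and both functions are hypothetical; the sketch covers only the clamping aspect, not the tracking of changed constant values:

```c
#include <stdbool.h>

/* Hypothetical cache of what is currently loaded into the ARB program parameters. */
struct arb_const_state
{
    bool valid;    /* constants have been loaded at least once */
    bool clamped;  /* loaded values were clamped to [-1, 1] */
};

/* ps 1.x semantics: constants are clamped to [-1, 1]. */
static float clamp_ps1x_const(float v)
{
    if (v < -1.0f) return -1.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}

/* Returns true if the pixel shader constants must be reloaded because the new shader
 * disagrees with the loaded values about clamping. Updates the cache. */
static bool ps_consts_need_reload(struct arb_const_state *state, unsigned int ps_major_version)
{
    bool clamp = ps_major_version < 2;
    bool reload = !state->valid || state->clamped != clamp;

    state->valid = true;
    state->clamped = clamp;
    return reload;
}
```

Switching between two ps 1.x shaders (or two ps 2.0+ shaders) keeps the loaded values; only a change of clamping mode forces a reload.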
This patch tries to find a free texture coordinate to load up to 4 clip
coordinates into the pixel shader, and uses KIL to throw away fragments
that are cut by a clipplane. If no free texture coordinate is found,
clipping is not done. If more than 4 clipplanes are used, only the first
4 are actually enabled. That should be pretty rare, though.
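The slot search and the four-plane limit could look roughly like this. The bitmasks and function names are assumptions; the sketch keeps the four lowest-numbered enabled planes, which may not match exactly how the patch picks "the first 4". In the fragment program, `KIL fragment.texcoord[n];` then discards the fragment if any component is negative, so unused components would need to hold a non-negative value:

```c
/* Hypothetical helpers; masks have one bit per texcoord or clipplane. */

/* Returns the first texture coordinate (< 32) the pixel shader does not read, or -1. */
static int find_free_texcoord(unsigned int used_texcoord_mask, unsigned int max_texcoords)
{
    unsigned int i;

    for (i = 0; i < max_texcoords && i < 32; ++i)
    {
        if (!(used_texcoord_mask & (1u << i)))
            return (int)i;
    }
    return -1; /* no free slot: clipping is skipped */
}

/* One texcoord has four components, so keep at most four enabled clipplanes
 * (here: the four lowest-numbered ones). */
static unsigned int limit_clipplanes(unsigned int enabled_mask)
{
    unsigned int kept = 0, result = 0, i;

    for (i = 0; i < 32 && kept < 4; ++i)
    {
        if (enabled_mask & (1u << i))
        {
            result |= 1u << i;
            ++kept;
        }
    }
    return result;
}
```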
This uses GL_NV_vertex_program2_option for now. If we're really desperate, we can
handle some cases without the extension by using a custom varying and texkill
in the fragment program.
This gives a small performance improvement. Don't enable NVfp just for this, though,
because the NVfp penalty is bigger than the gain from this patch. But if NVfp
is enabled anyway, make use of it.
This reverts patch ba35760f9f.
The original patch did not achieve its goal, because CMP is a macro that is
expanded to SLT, SGE, MUL, MAD, at least on NVIDIA hardware. To make matters
worse, it uses a temporary register, and the assembler usually is not clever
enough to find a free temporary in the shader code. If we generate the code
ourselves, we can pick one of our temps for this job.
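To check that those four instructions really compute CMP, here is a scalar C model of one plausible expansion. The instruction order and the two scratch values (tmp0, tmp1) are assumptions about what a driver might generate; the model matches CMP for finite inputs (an infinite b or c would turn 0 * inf into NaN):

```c
/* ARB_fragment_program CMP, per component: result = (a < 0.0) ? b : c. */
static float cmp_reference(float a, float b, float c)
{
    return a < 0.0f ? b : c;
}

/* The same selection built from SLT, SGE, MUL and MAD, modeled on scalars.
 * tmp0 and tmp1 stand for the scratch registers the expansion needs. */
static float cmp_expanded(float a, float b, float c)
{
    float tmp0 = a < 0.0f ? 1.0f : 0.0f;   /* SLT tmp0, a, 0; */
    float tmp1 = a >= 0.0f ? 1.0f : 0.0f;  /* SGE tmp1, a, 0; */
    tmp0 = tmp0 * b;                       /* MUL tmp0, tmp0, b; */
    return tmp1 * c + tmp0;                /* MAD result, tmp1, c, tmp0; */
}
```

The scratch registers are exactly what the backend can supply from its own temps when it emits the sequence itself.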
Many 2.0 and 3.0 shaders end with a "mov oC0, rx". If sRGB writing is enabled,
the ARB backend writes to a TMP_COLOR temporary, and at the end of the shader
writes the sRGB-corrected color to result.color. If oC0 is not written again after
the mov, not even partially, we can ignore the mov, not declare TMP_COLOR at all,
and just use the rx register as input for the sRGB correction code. This saves
a temporary and an instruction.
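A sketch of the check on a hypothetical, heavily simplified instruction representation (`struct sketch_instr` and `srgb_input_temp` are not wined3d names). Besides requiring that the last write to oC0 is a plain full mov, it also refuses the optimization if rX is written after that mov, which "ends with a mov" implies but a general scan has to verify:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified view of a decoded pixel shader instruction. */
struct sketch_instr
{
    int writes_oc0;    /* nonzero if the destination is oC0 (any write mask) */
    int dst_temp;      /* index of the rX written, or -1 */
    int full_mov_src;  /* X for "mov oC0, rX" with full mask and no modifiers, else -1 */
};

/* Returns X if the final oC0 value is simply rX at the end of the shader, so the
 * sRGB code can read rX and TMP_COLOR is unnecessary. Returns -1 otherwise. */
static int srgb_input_temp(const struct sketch_instr *ins, size_t count)
{
    unsigned int written_later = 0; /* temps (< 32) written after the current instruction */
    size_t i = count;

    while (i-- > 0)
    {
        if (ins[i].writes_oc0)
        {
            int src = ins[i].full_mov_src;
            /* The last oC0 write must be a plain full mov, and rX must be unchanged afterwards. */
            if (src < 0 || src >= 32 || (written_later & (1u << src)))
                return -1;
            return src;
        }
        if (ins[i].dst_temp >= 0 && ins[i].dst_temp < 32)
            written_later |= 1u << ins[i].dst_temp;
    }
    return -1; /* oC0 never written */
}
```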
This reduces the number of methods in the shader backend (the instruction
modifiers can be handled in that wrapper), and it will help flow
control emulation in the ARB backend.
SCS is unfortunately a fragment-program-only instruction. If we have the NV
extensions, we can use SIN and COS. Otherwise we have to approximate sine and
cosine with a Taylor series. Luckily, the application provides the necessary
constants.
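A sketch of such an approximation in C. It computes its own coefficients instead of using the constant layout the application passes, and assumes the input lies in [-pi, pi], as it does for SCS. Evaluating the series on x/2 and recombining with the double-angle identities keeps the error below about 2e-3 at the ends of the range; this is one reasonable scheme, not necessarily what the patch emits:

```c
/* Truncated Taylor series for sin and cos, evaluated on h = x/2 and combined with
 * the double-angle identities. Assumes x in [-pi, pi]. */
static void taylor_sincos(double x, double *sin_out, double *cos_out)
{
    double h = 0.5 * x;
    double h2 = h * h;
    /* sin(h) ~ h - h^3/3! + h^5/5! - h^7/7!, in nested (Horner) form */
    double s = h * (1.0 - h2 / 6.0 * (1.0 - h2 / 20.0 * (1.0 - h2 / 42.0)));
    /* cos(h) ~ 1 - h^2/2! + h^4/4! - h^6/6! */
    double c = 1.0 - h2 / 2.0 * (1.0 - h2 / 12.0 * (1.0 - h2 / 30.0));

    *sin_out = 2.0 * s * c;   /* sin(x) = 2 sin(x/2) cos(x/2) */
    *cos_out = c * c - s * s; /* cos(x) = cos^2(x/2) - sin^2(x/2) */
}
```

In a vertex program each step maps onto MUL/MAD instructions, with the coefficients coming from the constants.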
TMP_POS is only used in vertex shaders, so declare it in the vshader-specific
code. The sRGB constants are only used by pixel shaders, so
move them to the ps-specific code, and avoid reading the stateblock.
To be able to keep the temporary register in the type-independent NRM
instruction, the vertex temporary register is renamed to TA to match
the name of a pixel shader register.
texm3x2pad knows which register the following texm3x2depth or tex instruction
will use, and it knows that this register is uninitialized. So use it for
temporary storage instead of TMP.
This is the Nth attempt to make clipping work with GLSL shaders. The patch now
uses the GLSL quirk table to handle cards that need a custom varying for
gl_ClipPos, and the code is adapted to the changed state table and shader
backend system.
This simplifies the loading code a bit. The constants were never
designed to be at the same location in all shaders, so there's no
point in using program.env. This way we don't collide with the d3d
shader constants, and it's easier to work together with NP2 fixups and
other shaders.
This was needed unconditionally in the past to apply fog, but since we're
using the ARBfp fog defines, it is only needed if an sRGB correction is done
at the end of the shader.
ps_1_3 uses Tx to pass in texture coordinates, but also as temporary
registers. ps_1_4 and ps_2_0 only use them for texture coordinates. This patch
gets rid of the Tx = fragment.texcoord[x] assignment in all shader versions, and
doesn't declare Tx at all in ps_1_4 and ps_2_0.
The ps_1_3 and earlier instructions know which kind of input they expect from the Tx
register, so the instruction handlers now know whether they have to read the
temp register Tx or the varying fragment.texcoord[x].
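The choice the handlers now make can be sketched as a name lookup. The helper and its `wants_texcoord` flag are hypothetical; in ps_1_4 and ps_2_0 the flag would always be set, since Tx is only a texture coordinate there:

```c
#include <stdio.h>

/* Hypothetical helper: name of the ARB operand for a Tx register. wants_texcoord is
 * set by the instruction handler when it reads the interpolated coordinate rather
 * than a value written by an earlier instruction (ps <= 1.3 only). */
static int ps1x_texreg_name(char *buf, size_t size, unsigned int idx, int wants_texcoord)
{
    if (wants_texcoord)
        return snprintf(buf, size, "fragment.texcoord[%u]", idx);
    return snprintf(buf, size, "T%u", idx);
}
```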
shader_arb_add_src_param handled DW, and TXP then undid it again. Remove DZ and DW from
the modifiers and handle them in the instruction. DZ cannot be handled by TXP as
is, so move the .z component to .w and make it DW-like. Using a swizzle plus TXP is
likely more efficient than the RCP, MUL, TEX we'd get if we let
shader_arb_add_src_param do the job.
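The coordinate math behind the DZ trick, modeled in C: swizzling with .xyzz copies z into w, so TXP's divide-by-w becomes a divide-by-z. In ARB syntax this is a single instruction such as `TXP R0, fragment.texcoord[0].xyzz, texture[0], 2D;` (register names illustrative). The vector type and helpers below are assumptions for the sketch:

```c
/* Minimal vector type for this sketch. */
struct vec4
{
    float x, y, z, w;
};

/* Swizzle .xyzz: copies z into w, so a projective lookup divides by z. */
static struct vec4 swizzle_xyzz(struct vec4 v)
{
    v.w = v.z;
    return v;
}

/* Coordinate math of a projective lookup (TXP): x, y and z are divided by w. */
static struct vec4 txp_project(struct vec4 v)
{
    struct vec4 r = { v.x / v.w, v.y / v.w, v.z / v.w, v.w };
    return r;
}
```

DW maps to `txp_project(v)` directly, and DZ to `txp_project(swizzle_xyzz(v))`.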
Use shader_arb_get_dst_param instead of get_register_name to find the register
name. Even though this adds support for modifiers (which native d3d doesn't allow),
this shouldn't hurt. If an app passes in an incorrect shader, it
should be caught in the frontend.
Mostly based on the code of pshader_gen_input_modifier_line. The space-adding
behavior of shader_arb_add_src_param was removed because most
instruction handlers pass in an uninitialized buffer and expect a register
name written to its start, and only map2gl and rcp_rsq use the space-adding
behavior. I'll change rcp_rsq in a later patch anyway. I changed the name to
shader_arb_get_src_param to reflect this behavior.
Use shader_arb_add_instruction_modifiers instead. This avoids calling the
fixup function from every instruction handler to handle shifts. It does
not yet get rid of the modifier handler in each instruction because we don't
want a separate line if we can just append _SAT to the instruction name.
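A sketch of the naming decision, with hypothetical helper names and a hypothetical saturate flag. It assumes D3D's order of applying the shift before saturating, so when a shift is present, _SAT moves to the extra MUL instead of staying on the instruction:

```c
#include <stdio.h>

/* Hypothetical modifier flag for this sketch. */
#define SKETCH_MOD_SATURATE 0x1u

/* Write the ARB opcode, appending _SAT when saturation can ride on the instruction itself. */
static int arb_opcode(char *buf, size_t size, const char *opcode, unsigned int mods, int shift)
{
    int sat_here = (mods & SKETCH_MOD_SATURATE) && shift == 0;
    return snprintf(buf, size, "%s%s", opcode, sat_here ? "_SAT" : "");
}

/* Extra line applying a result shift (positive: multiply by 2^shift, negative: divide),
 * saturating there if requested. Writes "" and returns 0 when there is no shift.
 * %f assumes the "C" locale's decimal point. */
static int arb_shift_line(char *buf, size_t size, const char *dst, unsigned int mods, int shift)
{
    float scale;

    if (shift == 0)
        return snprintf(buf, size, "%s", "");
    scale = shift > 0 ? (float)(1 << shift) : 1.0f / (float)(1 << -shift);
    return snprintf(buf, size, "MUL%s %s, %s, %f;",
            (mods & SKETCH_MOD_SATURATE) ? "_SAT" : "", dst, dst, scale);
}
```

Without a shift only the opcode changes ("ADD_SAT"); with one, the instruction stays plain and "MUL_SAT R0, R0, 2.000000;" follows it.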
This is needed to raise the number of advertised constants to the GL
limit. The ARB assembler usually does not optimize away unused
constants, so we have to do this.