Implement some basic opengl accelerated blts from and to render
targets. It's not perfect yet, but enought to make some D3D apps
happy. For now the only supported operations are:
- Full screen back -> Front buffer: Just call present
- Offscreen surface -> render target
- Render target -> offscreen surface(slow)
- render target colorfill
Each instruction can have a predication token. Account for it in the
trace pass, register count pass, and store it in the SHADER_OPCODE_ARG
structure for generation. MSDN claims the token is at the end of the
instruction, but that's not true - testing a demo, which lets me
manipulate the shader shows the predication token is the first source
token immediately following the destination token.
As previously mentioned, RASTOUT is invalid on pixel shaders.
On shaders 1.x, r0 is treated as the color output register:
http://www.gamedev.net/columns/hardcore/dxshader3/page2.asp
That's what we currently do in all cases, change it not to do so
for shaders >= 2.0. Support COLOROUT/DEPTHOUT instead.
Currently we hardcode a0.x, which I think is correct for shaders 1.0.
However, for shaders 2.0, we must look into the address token, and
print the register there. Handle both cases to correct the trace.
Change the trace pass, the register counting pass, and the hw
generator pass to take into account the new get_params() function. For
hw generation, store the address tokens into the SHADER_OPCODE_ARG
structure, so they're available to generator functions.
Add a new function to process parameters.
On shaders 1.0, processing parameters amounts to *pToken++.
On shaders 2.0+, we have a relative addressing token to account for.
This function should be used, instead of relying on num_params everywhere.
Share shader_dump_ins_modifiers(), and make vertex shaders use it.
The saturate modifer (_sat) is valid on vs_3_0+, and it isn't being
shown in the trace.