Note that using GL_DEPTH_COMPONENT instead of e.g. GL_DEPTH_COMPONENT24
will work, but will create a renderbuffer with the format of the
onscreen depth buffer.
Turns out the original fix was correct for fixed function, but for the
wrong reason. The shader path was already correct. This fixes a
regression introduced by 932e95c111.
Since those surfaces are stored in blocks, the 4 pixel step doesn't only apply to surfaces < 4, but
also to surfaces bigger than that, with a non-multiple-of-4 size.
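A minimal sketch of the rounding this implies, assuming 4x4 pixel blocks. The helper name block_aligned_size() is hypothetical and not taken from the code:

```c
#include <assert.h>

/* Hypothetical helper: the number of pixels a dimension really covers
 * when the surface is stored in blocks of 4 pixels. Any size that is
 * not a multiple of 4 is rounded up to the next one. */
static unsigned int block_aligned_size(unsigned int size)
{
    assert(size > 0);
    return (size + 3u) & ~3u;
}
```

With this, a width of 2 and a width of 10 both need padding (to 4 and 12), while 16 is left alone.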
This prevents the shader path from being entered for an offscreen surface
when there is a p8 render target and fixes failures in the ddraw visual test
(with opengl rendering and RTL_READDRAW mode) and visual glitches in
Red Alert.
Clear all attachments before deleting FBOs. It should be valid to
delete FBOs that still have attachments, but for some reason the
nvidia drivers don't like it. The resulting memory corruption can be
pretty nasty, and this workaround seems clean enough.
This is the preferred format of many codecs, and for some codecs this
is the only supported output format. As usual I try to handle all the
conversion in the GPU and keep the CPU involvement minimal to gain the
full performance of PBO transfers.
GL_ARB_fragment_program and GL_ATI_fragment_shader can disable
projected textures properly, and they can also handle
D3DTTFF_PROJECTED | D3DTTFF_COUNT3 properly.
Since some of those function pointers are direct GL functions the function
prototype needs the WINE_GLAPI calling convention. This prevents
drawStridedSlow from crashing with USE_WIN32_OPENGL.
If we're heading out of the pixelshader handler early, and a pixel
shader is in use, the pixel shader may not be compiled. The vertex
shader handler then checks if the pixel shader is dirty, and calls the
shader backend to apply the shader if it isn't. Thus, in the case of
GLSL, the shader code could attempt to link an uncompiled shader into
the program. This isn't much of a problem because when the fog is
applied, the pixel shader is compiled and the program re-linked.
GL_RGBA doesn't guarantee an internal storage depth, which can cause the test
to fail if it's stored with less than 8 bits of precision. Some nVidia
drivers would actually store with 4 bits of precision.
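To illustrate why reduced storage precision breaks an exact comparison, here is a small sketch that stores an 8-bit channel with fewer bits and expands it again. The helper roundtrip_channel() is hypothetical, and the rounding a real driver applies is an assumption:

```c
#include <assert.h>

/* Hypothetical helper: quantize an 8-bit channel value to 'bits' bits
 * of storage, then expand it back to 8 bits. Rounding to nearest is an
 * assumption, real drivers may round differently. */
static unsigned char roundtrip_channel(unsigned char value, unsigned int bits)
{
    unsigned int max, q;

    assert(bits >= 1 && bits <= 8);
    max = (1u << bits) - 1u;
    q = (value * max + 127u) / 255u;                      /* store */
    return (unsigned char)((q * 255u + max / 2u) / max);  /* read back */
}
```

Under these assumptions 0x80 survives 8-bit storage unchanged, but comes back as 0x88 from 4-bit storage, so a test that compares against the written value fails.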
Although sharing FBOs across contexts is allowed by EXT_framebuffer_object
(issue 76), it causes issues with nVidia drivers. Considering the GL 3 spec
explicitly disallows sharing of FBOs across contexts (Appendix D), this
patch is probably the right thing to do.
There's no need to do that with the nvts and opengl fixed function
fragment pipelines, it's perfectly well defined in GL which one takes
effect. This removes a few more troubles when switching between
shaders and arbfp.
ARB and GLSL don't need that. If a shader backend like atifs or nvts
needs it in the future, the shader backend should deal with that rather
than the ffp pipeline.
Half Life 2 uses D3DFMT_X8R8G8B8 for the back buffer, but macos
supports aux buffers only on D3DFMT_A8R8G8B8. I think having aux
buffers is more important right now than having a precise alpha
match.
If a format is not supported natively by opengl, a shader may be able
to convert it. Up to now, CheckDeviceFormat had magic knowledge of which
GL extensions lead to which supported format. This patch adds
functions that allow CheckDeviceFormat to ask the actual
implementation for its capabilities.
DDraw can draw to the front buffer only, thus there's never a Present
call which could pass this window. Due to that a drawing-independent
method is needed.
This is a long-needed cleanup aimed at removing the ddraw_primary,
ddraw_window, ddraw_width and ddraw_height members from
IWineD3DDeviceImpl, which just do not belong there. Destination
window and screen handling is supposed to be done by swapchains.
Until now the ddraw capabilities were almost static, except for D3D
support. When overlay support is added, the caps depend on certain
settings in WineD3D or capabilities available from OpenGL and Xv. So
set those caps in wined3d as well.
Fixed function processing can only deal with values between 0 and 1
generally. Clamp the results of instructions that could produce bigger
or smaller values.
It's probably rare for higher render targets to get locked or updated
from sysmem, but this should still be more correct. It also makes the
code simpler.
This is an ATI specific format designed for compressed normal maps,
and quite a few games check for its existence. While it is an
ATI-specific "extension" in d3d9, it is a core part of
D3D10 (DXGI_FORMAT_BC5), and supported on Geforce 8 cards.
This happened to work because most cards have the same amount of
pshader and vshader constants, but for some reason this doesn't hold
true on this macbook pro here, which led to a crash due to heap
corruption.
Some drivers (the open source ones most notably) cannot satisfy all
possible D3D formats. This doesn't mean we should fall back to the
emergency fallback instantly. Instead, try to loosen the requirements
step by step.
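A rough sketch of such a stepwise search. The struct, the function names and the order in which requirements are dropped (alpha first, then stencil) are assumptions for illustration, not the actual selection code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a pixel format offered by the driver. */
struct pixel_format
{
    int color_bits, alpha_bits, depth_bits, stencil_bits;
};

/* relax == 0: everything must match; 1: alpha is optional;
 * 2: alpha and stencil are optional. The order is an assumption. */
static int format_matches(const struct pixel_format *fmt,
        const struct pixel_format *want, int relax)
{
    if (fmt->color_bits < want->color_bits) return 0;
    if (fmt->depth_bits < want->depth_bits) return 0;
    if (relax < 2 && fmt->stencil_bits < want->stencil_bits) return 0;
    if (relax < 1 && fmt->alpha_bits < want->alpha_bits) return 0;
    return 1;
}

/* Returns the index of the first acceptable format, trying the strict
 * match first and loosening one step at a time. -1 means the caller
 * has to use the emergency fallback. */
static int choose_format(const struct pixel_format *formats, size_t count,
        const struct pixel_format *want)
{
    int relax;
    size_t i;

    assert(formats != NULL && want != NULL);
    for (relax = 0; relax <= 2; ++relax)
        for (i = 0; i < count; ++i)
            if (format_matches(&formats[i], want, relax)) return (int)i;
    return -1;
}
```

The point of the nested loop is that a format missing only alpha is still preferred over giving up entirely.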
This is cleaner than the if statements in the code. Also np2 textures
should in theory support linear filtering, but fglrx doesn't seem to
like it. This needs further investigation. So far we've never used
linear filtering on np2 textures, so there should not be a
regression. Furthermore I think shader support is more important than
filtering, since NP2 textures are mostly used for 1:1 copying to the
screen.
ATI cards prior to the radeon HD series did not have unconditional non
power of two support. So far we've used texture_rectangle for that, or
created a bigger power of two texture with padding. This had the
disadvantage that we had to correct the coordinates, which causes
extreme problems with shaders (doesn't work, pretty much).
Both the MacOS and the fglrx driver have support for
GL_ARB_texture_non_power_of_two, and run it on the hardware as long as
we stay within the texture_rectangle limitations. This allows us to
have conditional non power of two textures with normalized
coordinates. This patch adds an internal extension, and the code
creates a regular GL_TEXTURE_2D texture with NP2 size, but refuses
mipmapping, filtering and texture_rectangle incompatible
operations. This makes np2 textures work with shaders on fglrx and
macos.
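For reference, a sketch of the coordinate correction that the padded power of two approach requires and that this patch avoids. Both helper names are hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: smallest power of two >= size. */
static unsigned int next_pow2(unsigned int size)
{
    unsigned int p = 1;

    assert(size > 0 && size <= 0x80000000u);
    while (p < size) p <<= 1;
    return p;
}

/* Hypothetical helper: factor that normalized texture coordinates must
 * be multiplied by when an NP2 image sits in the corner of a padded
 * power of two texture. With a real NP2 texture this is always 1. */
static float padded_coord_scale(unsigned int size)
{
    return (float)size / (float)next_pow2(size);
}
```

A 640 pixel wide image padded to 1024 needs every coordinate scaled by 0.625. Fixed function can hide that in the texture matrix, but a shader that computes its own coordinates knows nothing about it.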
The opengl extension mentioned in that code was never finished, and as
far as I know there is no way to make use of tangent data in the d3d
fixed function pipeline either.
Note that GL_ATI_envmap_bumpmap is not the same as
GL_ATI_fragment_shader. envmap_bumpmap is used together with the
regular opengl ffp pipeline and is not used (other than for
pixelformats) if GL_ATI_fragment_shader is used.
This patch adds a new field to the state templates. If this extension
field is != 0, then the line is only applied to the final state table
if the extension is supported. Once a line is applied to the final
table, all further templates for this state from the same pipeline
part are ignored. This allows removing some extension checks from the
state handlers, which cleans them up and saves a few CPU cycles when
applying the states.
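A simplified sketch of how such a template could be applied. All names here (state_template, build_state_table, EXT_A, the sample handlers) and the use of a bitmask for extensions are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define STATE_COUNT 4
#define EXT_A 0x1u   /* hypothetical extension flag */

typedef void (*state_handler)(void);

/* One template line: which state, which handler, and the extension the
 * handler depends on (0 means no extension is required). */
struct state_template
{
    unsigned int state;
    state_handler apply;
    unsigned int extension;
};

static int last_applied;  /* records which sample handler ran */
static void fog_with_ext(void) { last_applied = 1; }
static void fog_fallback(void) { last_applied = 2; }

/* The first line whose extension requirement is met wins; later lines
 * for the same state are ignored. */
static void build_state_table(state_handler table[STATE_COUNT],
        const struct state_template *tmpl, size_t count, unsigned int supported)
{
    int set[STATE_COUNT] = {0};
    size_t i;

    for (i = 0; i < count; ++i)
    {
        unsigned int s = tmpl[i].state;

        assert(s < STATE_COUNT);
        if (set[s]) continue;
        if (tmpl[i].extension && !(supported & tmpl[i].extension)) continue;
        table[s] = tmpl[i].apply;
        set[s] = 1;
    }
}
```

Listing the extension-specific line before the generic one gives the preferred handler when the extension is present and the fallback otherwise, with no check left in the handler itself.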
This patch enables texture filtering for textures using the A4R4G4B4
format, I can see no reason why it shouldn't be filtered (especially
considering X4R4G4B4 has it).
This creates an nvts version of this function, and removes the nvts
code from the original one. The nvts version is used by the nvts
pipeline implementation, the original one by the nvrc-only, atifs and
ffp one.
As long as we have the shader constants in misc, it is best to keep
all the code that affects shader constants, like bumpenvmat setting,
in there as well.
This code creates the structures and the pipeline selection, as well
as the caps filling. It does not yet move the actual code around,
since this will be a bigger task.
When a sampler is changed and unconditional NP2 textures are not
supported, the texture matrix may need adjustment. The sampler state
function checks for that, and calls the texture transform setting
function in that case. However, samplers are a misc state, and the
texture transform flags are a vertex state. Thus split up the code and
move the matrix changes to the vertex side.
Since atifs is only doing the fragment pipeline replacement right now
there is no need for the shader backend structure any longer. The ffp
private data is stored in new fragment pipeline private data (which
could potentially be set to equal the shader private data if needed).
It isn't related to the shader backend any longer. The nvts_enable in
the ffp code isn't quite right either, it should be moved away once
there is a dedicated nvts fragment pipeline replacement.
Destroying the stateblock potentially references the shader backend.
If the stateblock has active shaders when it is released, the shader's
destructor will tell the shader backend to destroy the corresponding
resources. This was exposed by my patch that moved the glsl program
lookup table into the backend's private data.
Calling shader_select() from inside depth_blt() isn't necessarily
safe. shader_select() assumes CompileShader() has been called for the
current shaders, but that depends on STATE_VSHADER / STATE_PIXELSHADER
being applied. That isn't always true when depth_blt() gets called,
with the result that sometimes GLSL programs could be created with no
shader objects attached.
For now the atifs selection sticks to the old rules, thus it is bound to
the available and selected shader capabilities. We may want to change that
in the future.
The idea of this patchset is to split the monolithic state set into 3
parts, vertex processing, fragment processing and other states (depth,
stencil, scissor, ...). The states will be provided in templates which
can be (mostly) independently combined, and are merged into a single
state table at device creation time. This way we retain the advantages
of the single state table while gaining the advantage of separated
pipeline implementations which can be combined without any manually
written glue code.
The atifs fragment processing implementation doesn't borrow a pixel shader
implementation from anywhere. It was a hack during development, but never needed.
Constant numbers start at 0, and the loading loop has a for(i; i <
dirtyconsts; i++). This means that the highest dirty constant isn't
loaded correctly. Rather than replacing the < with <=, which would
make it impossible to have no dirty constant, add 1 to the dirty
constant counter.
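A small sketch of the counter convention described above, where the counter holds the highest dirty index plus one. The function names and MAX_CONSTS are hypothetical:

```c
#include <assert.h>

#define MAX_CONSTS 8

/* Mark constant 'idx' dirty. The counter stores the highest dirty
 * index + 1, so 0 still means "nothing is dirty". */
static void mark_dirty(unsigned int *dirtyconsts, unsigned int idx)
{
    assert(idx < MAX_CONSTS);
    if (idx + 1 > *dirtyconsts) *dirtyconsts = idx + 1;
}

/* Load all constants below the counter, using the unchanged '<' loop.
 * Returns the number of constants loaded. */
static unsigned int load_dirty(float dst[MAX_CONSTS],
        const float src[MAX_CONSTS], unsigned int dirtyconsts)
{
    unsigned int i;

    for (i = 0; i < dirtyconsts; i++)
        dst[i] = src[i];
    return i;
}
```

After marking constant 2 dirty the counter is 3, so the loop loads constants 0 through 2, including the highest dirty one.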
This gets rid of depth_copy_state in the device, and instead tracks
the most up to date location per-surface. This makes things a lot
easier to follow, and allows us to make a copy when switching depth
stencils in SetDepthStencilSurface().
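The per-surface tracking could look roughly like the sketch below. The flag and function names are hypothetical, and the actual copy between locations is left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location flags for a depth stencil surface. */
#define LOC_DS_ONSCREEN  0x1u
#define LOC_DS_OFFSCREEN 0x2u

struct depth_surface
{
    unsigned int locations;  /* where the up to date depth data lives */
};

/* Drawing wrote to 'location': every other copy is now stale. */
static void modify_ds_location(struct depth_surface *s, unsigned int location)
{
    assert(s != NULL);
    s->locations = location;
}

/* Make 'location' up to date. Returns 1 if a depth copy was needed,
 * 0 if the data was already there. The copy itself is omitted. */
static int load_ds_location(struct depth_surface *s, unsigned int location)
{
    assert(s != NULL);
    if (s->locations & location) return 0;
    s->locations |= location;
    return 1;
}
```

Because the state lives in the surface rather than the device, switching depth stencils only has to ask the outgoing and incoming surfaces where their data is.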
This makes the depth copy independent of the currently attached render
targets. This is important for the next patch because it might do a
depth copy when the render targets aren't in a valid configuration
(SetDepthStencilSurface()).
SetupForBlit sets up the GL viewport and projection matrix for
screen-coordinate access to the framebuffer. These settings were not
updated if the other gl states were already set up for blitting. Guild
Wars reads back an offscreen rendered texture from the framebuffer,
which currently sets up CTXUSAGE_BLIT, then changes the render target,
and draws to the texture, which has to be reloaded from system memory
before it can be rendered to (since GW loaded some data into it). If the
two render targets had different sizes this failed.
The idea is to make setting depth attachments a bit more consistent
with set_render_target_fbo()/attach_surface_fbo(). I've also got an
upcoming patch in my tree that needs this.
Just unsetting SFLAG_INTEXTURE doesn't work for FBOs because the
drawable and texture are the same there (and ModifyLocation() is the
correct way to do this anyway). Fixes another ddraw test failure with
FBO ORM.
As far as I can tell we support post ps blending in combination with
MRTs fine. Tabula Rasa needs this cap in order to enable some of the
higher graphics settings.
Currently we only check if ARB_HALF_FLOAT_PIXEL is supported. This is
not enough, we need ARB_TEXTURE_FLOAT as well. This fixes some errors
when running the d3d9 visual test with Mesa swrast.
Currently depth formats are handled separately from the other formats,
but depth formats can support things like filtering as well, so we
should check those caps as well.
SM3.0 requires 10 4 component float varyings for passing stuff between
vertex and pixel shaders. GF7 and earlier report 8 generic varyings +
gl_Color and gl_SecondaryColor in GLSL. This patch allows us to use
gl_Color and gl_SecondaryColor to get 2 extra varyings, which some
games, like C&C3 with highest gfx settings, require.
There is no reason to do that, now that the SetGLTextureDesc bug is
fixed. This avoids an infinite recursion because PreLoad calls
ActivateContext at some point.
Fixes the screen not updating or being updated inconsistently when apps blit to
the front buffer or lock it when RenderTargetLockMode=readtex, as happens in e.g.
Red Alert 2 and also in p8_primary_test in ddraw tests.