The basic rule is that you can't call anything that takes the user32/gdi32
lock while holding the GL (winex11) lock. As a consequence, you also can't
call functions such as context_acquire() or context_destroy() under the GL
lock, since they can end up taking the user32/gdi32 lock.

The comment above the code correctly notes that this optimization does not
work if oC0 is only partially written, but the code never actually checks
for that condition.
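
A minimal sketch of the missing check, assuming the shader's destination
operands are available as a flat list and reading "written partially" as a
write to oC0 with less than a full .xyzw mask. The types `struct dst_param`,
`enum reg_type`, the `WRITEMASK_ALL` constant and `oc0_written_partially()`
are hypothetical, not wined3d's actual names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register types; oC0 is REG_COLOROUT with index 0. */
enum reg_type { REG_TEMP, REG_COLOROUT };

#define WRITEMASK_ALL 0xfu /* .xyzw, one bit per component */

/* Hypothetical destination operand of one shader instruction. */
struct dst_param
{
    enum reg_type type;
    unsigned int idx;        /* register index */
    unsigned int write_mask; /* components written by this instruction */
};

/* Returns true if any instruction writes only some components of oC0.
 * Conservative: an incomplete mask counts as a partial write even if other
 * instructions happen to write the remaining components. */
bool oc0_written_partially(const struct dst_param *dsts, size_t count)
{
    for (size_t i = 0; i < count; ++i)
    {
        if (dsts[i].type == REG_COLOROUT && dsts[i].idx == 0
                && (dsts[i].write_mask & WRITEMASK_ALL) != WRITEMASK_ALL)
            return true;
    }
    return false;
}
```

The optimization would then only be applied when `oc0_written_partially()`
returns false. Treating any incomplete mask as partial errs toward disabling
the optimization, which is the safe direction.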

If an application switches between render targets of different sizes but
with the same depth/stencil surface, it'll typically clear the depth/stencil
surface before drawing. However, for the smaller render target that wouldn't
be a full clear of the depth/stencil surface, so we'd have to do a depth copy
if we also switched between onscreen and offscreen rendering. Keeping track
of which part of the depth/stencil surface is current for onscreen and
offscreen rendering allows us to avoid most of these copies. The current
scheme requires the current/dirty rectangle to have its origin at (0,0). This
could be extended to an arbitrary rectangle, but the bookkeeping becomes more
complex in that case, and it's not clear that there would be much benefit at
this point.
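
The bookkeeping can be sketched roughly as follows. This is a simplified,
hypothetical model (none of the names are wined3d's): one location holds the
up-to-date depth/stencil data, the current region is a rectangle anchored at
(0,0), and contents outside it are treated as undefined. A clear covering the
whole render target lets us switch locations without copying.

```c
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not wined3d's API. */
enum ds_location { DS_ONSCREEN, DS_OFFSCREEN };

struct ds_surface
{
    enum ds_location location; /* location holding the up-to-date data */
    unsigned int cur_w, cur_h; /* current region is (0,0)-(cur_w,cur_h) there;
                                * contents outside it are treated as undefined */
    unsigned int copies;       /* number of (simulated) depth copies */
};

/* Does (0,0)-(w,h) contain (0,0)-(inner_w,inner_h)? */
static bool rect_contains(unsigned int w, unsigned int h,
        unsigned int inner_w, unsigned int inner_h)
{
    return w >= inner_w && h >= inner_h;
}

/* Called before drawing to a rt_w x rt_h render target in location @loc.
 * (0,0)-(clear_w,clear_h) is the depth/stencil clear issued first;
 * pass 0x0 if the draw doesn't start with a clear. */
void ds_prepare(struct ds_surface *ds, enum ds_location loc,
        unsigned int rt_w, unsigned int rt_h,
        unsigned int clear_w, unsigned int clear_h)
{
    bool full_clear = rect_contains(clear_w, clear_h, rt_w, rt_h);

    if (ds->location != loc)
    {
        if (full_clear)
        {
            /* Every pixel we draw to gets cleared: switch without a copy.
             * Only the render target's area is current afterwards. */
            ds->location = loc;
            ds->cur_w = rt_w;
            ds->cur_h = rt_h;
            return;
        }
        /* Partial or no clear: copy; the current region moves with it. */
        ++ds->copies;
        ds->location = loc;
    }

    /* Same location: a clear covering a render target that contains the
     * current region grows the region. If neither rectangle contains the
     * other, the union isn't anchored at (0,0), so keep the old region. */
    if (full_clear && rect_contains(rt_w, rt_h, ds->cur_w, ds->cur_h))
    {
        ds->cur_w = rt_w;
        ds->cur_h = rt_h;
    }
}
```

In this model, an 800x600 onscreen target and a 256x256 offscreen target that
are each cleared before drawing can be alternated without any copies; only a
switch without a covering clear costs one.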

The "attributes" vertex shader field is now derived from the input signature,
and is only used to speed up matching D3D9 vertex declaration elements to
shader inputs. D3D8 and D3D10 both specify input registers explicitly, so
they don't need it.
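
One way such a field could look, as a hedged sketch: a table built once from
the input signature records the usage and usage index held by each input
register, plus a bitmask of used registers, so matching a D3D9 declaration
element only examines registers the shader actually reads. The types,
`MAX_ATTRIBS`, `vs_attributes_init()` and `vs_find_input()` are hypothetical,
and semantic names are reduced to a small enum for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ATTRIBS 16

/* Hypothetical stand-in for D3D9 declaration usages. */
enum decl_usage { USAGE_POSITION, USAGE_NORMAL, USAGE_TEXCOORD, USAGE_COLOR };

/* Hypothetical input-signature element. */
struct sig_element
{
    enum decl_usage usage;
    unsigned int usage_idx;
    unsigned int register_idx;
};

/* Per-shader table derived from the input signature. */
struct vs_attributes
{
    uint32_t used; /* bit r set: input register r is used */
    struct { enum decl_usage usage; unsigned int usage_idx; } reg[MAX_ATTRIBS];
};

/* Build the table once, at shader creation. Fails on out-of-range registers. */
bool vs_attributes_init(struct vs_attributes *a,
        const struct sig_element *sig, size_t count)
{
    a->used = 0;
    for (size_t i = 0; i < count; ++i)
    {
        unsigned int r = sig[i].register_idx;
        if (r >= MAX_ATTRIBS)
            return false;
        a->used |= 1u << r;
        a->reg[r].usage = sig[i].usage;
        a->reg[r].usage_idx = sig[i].usage_idx;
    }
    return true;
}

/* Match a D3D9 declaration element to an input register; unused registers
 * are skipped via the mask. Returns -1 if the shader doesn't read it. */
int vs_find_input(const struct vs_attributes *a,
        enum decl_usage usage, unsigned int usage_idx)
{
    for (unsigned int r = 0; r < MAX_ATTRIBS; ++r)
    {
        if (!(a->used & (1u << r)))
            continue;
        if (a->reg[r].usage == usage && a->reg[r].usage_idx == usage_idx)
            return (int)r;
    }
    return -1;
}
```

D3D8 and D3D10 wouldn't go through `vs_find_input()` at all, since their
declarations already name the input register.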