<chapter id="implementation">
|
|
<title>Low-level Implementation</title>
|
|
<para>Details of Wine's Low-level Implementation...</para>
|
|
|
|
<sect1 id="undoc-func">
|
|
<title>Undocumented APIs</title>
|
|
|
|
<para>
|
|
Some background: On the i386 class of machines, stack entries are
|
|
usually dword (4 bytes) in size, little-endian. The stack grows
|
|
downward in memory. The stack pointer, maintained in the
|
|
<literal>esp</literal> register, points to the last valid entry;
|
|
thus, the operation of pushing a value onto the stack involves
|
|
decrementing <literal>esp</literal> and then moving the value into
|
|
the memory pointed to by <literal>esp</literal>
|
|
(i.e., <literal>push p</literal> in assembly resembles
|
|
<literal>*(--esp) = p;</literal> in C). Removing (popping)
|
|
values off the stack is the reverse (i.e., <literal>pop p</literal>
|
|
corresponds to <literal>p = *(esp++);</literal> in C).
|
|
</para>
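<para>
  The push and pop operations described above can be modelled in a few
  lines of C. This is only an illustrative sketch, not Wine code: the
  names <literal>model_stack</literal>, <literal>model_esp</literal>,
  <literal>model_push</literal> and <literal>model_pop</literal> are
  hypothetical, and a small array stands in for real stack memory.
</para>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 dwords of stack memory; model_esp plays the
 * role of the esp register. */
#define MODEL_STACK_DWORDS 16

static uint32_t model_stack[MODEL_STACK_DWORDS];

/* An empty stack has esp just past the highest address, because the
 * stack grows downward. */
static uint32_t *model_esp = model_stack + MODEL_STACK_DWORDS;

/* push p: decrement esp, then store the value at the new top. */
static void model_push(uint32_t p)
{
    assert(model_esp > model_stack);                        /* no overflow */
    *(--model_esp) = p;
}

/* pop: load the value at the top, then increment esp. */
static uint32_t model_pop(void)
{
    assert(model_esp < model_stack + MODEL_STACK_DWORDS);   /* not empty */
    return *(model_esp++);
}
```

<para>
  Pushing 40 and then 20 leaves <literal>model_esp</literal> two dwords
  below its starting point; popping twice returns 20 and then 40 and
  restores the original value.
</para>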
|
|
|
|
<para>
|
|
In the <literal>stdcall</literal> calling convention, arguments are
|
|
pushed onto the stack right-to-left. For example, the C call
|
|
<function>myfunction(40, 20, 70, 30);</function> is expressed in
|
|
Intel assembly as:
|
|
<screen>
|
|
push 30
|
|
push 70
|
|
push 20
|
|
push 40
|
|
call myfunction
|
|
</screen>
|
|
The called function is responsible for removing the arguments
from the stack. Before the call to <function>myfunction</function>, the
stack would look like:
|
|
<screen>
|
|
[local variable or temporary]
|
|
[local variable or temporary]
|
|
30
|
|
70
|
|
20
|
|
esp -> 40
|
|
</screen>
|
|
After the call returns, it should look like:
|
|
<screen>
|
|
[local variable or temporary]
|
|
esp -> [local variable or temporary]
|
|
</screen>
|
|
</para>
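<para>
  The caller and callee roles under <literal>stdcall</literal> can be
  sketched as a small simulation. Everything here is hypothetical
  (the <literal>sim_</literal> names, the array used as stack memory, and
  the placeholder return address); it only illustrates that the callee,
  not the caller, restores <literal>esp</literal>.
</para>

```c
#include <assert.h>
#include <stdint.h>

#define SIM_DWORDS 32

static uint32_t sim_stack[SIM_DWORDS];
static uint32_t *sim_esp = sim_stack + SIM_DWORDS;

static void sim_push(uint32_t v)
{
    assert(sim_esp > sim_stack);
    *(--sim_esp) = v;
}

/* Model of a four-argument stdcall callee that ends in "ret 16".
 * On entry sim_esp[0] is the return address and sim_esp[1..4] are the
 * arguments, leftmost first.  Returns the leftmost argument. */
static uint32_t sim_myfunction(void)
{
    uint32_t first = sim_esp[1];

    sim_esp += 1;   /* ret pops the return address ... */
    sim_esp += 4;   /* ... and the operand 16 removes four dword arguments */
    return first;
}

/* Model of the caller: push the arguments right-to-left, then call. */
static uint32_t sim_call(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    sim_push(d);
    sim_push(c);
    sim_push(b);
    sim_push(a);
    sim_push(0xdeadbeefu);  /* call pushes a return address; placeholder value */
    return sim_myfunction();
}
```

<para>
  After <literal>sim_call(40, 20, 70, 30)</literal> returns,
  <literal>sim_esp</literal> has the same value it had before the
  arguments were pushed, and the callee saw 40 as its first argument.
</para>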
|
|
|
|
<para>
|
|
To restore the stack to this state, the called function must know how
|
|
many arguments to remove (which is the number of arguments it takes).
|
|
This is a problem if the function is undocumented.
|
|
</para>
|
|
|
|
<para>
|
|
One way to attempt to document the number of arguments each function
|
|
takes is to create a wrapper around that function that detects the
|
|
stack offset. Essentially, each wrapper assumes that the function will
|
|
take a large number of arguments. The wrapper copies each of these
|
|
arguments into its stack, calls the actual function, and then calculates
|
|
the number of arguments by checking esp before and after the call.
|
|
</para>
|
|
|
|
<para>
|
|
The main problem with this scheme is that the function must actually
|
|
be called from another program. Many of these functions are seldom
|
|
used. An attempt was made to aggressively query each function in a
|
|
given library (<filename>ntdll.dll</filename>) by passing 64 arguments,
|
|
all 0, to each function. Unfortunately, Windows NT quickly goes to a
|
|
blue screen of death, even if the program is run from a
|
|
non-administrator account.
|
|
</para>
|
|
|
|
<para>
|
|
Another method that has been much more successful is to attempt to
figure out how many arguments each function is removing from the
stack by locating its return instruction. This instruction,
<literal>ret hhll</literal> (where
<symbol>hhll</symbol> is the number of bytes to remove, i.e. the
number of arguments times 4), contains the bytes
<literal>0xc2 ll hh</literal> in memory. It is a reasonable
|
|
assumption that few, if any, functions take more than 16 arguments;
|
|
therefore, simply searching for
|
|
<literal>hh == 0 && ll <= 0x40</literal> starting from the
|
|
address of a function yields the correct number of arguments most
|
|
of the time.
|
|
</para>
|
|
|
|
<para>
|
|
Of course, this is not without errors. <literal>ret 00ll</literal>
|
|
is not the only instruction that can have the byte sequence
|
|
<literal>0xc2 ll 0x0</literal>; for example,
|
|
<literal>push 0x000040c2</literal> has the byte sequence
|
|
<literal>0x68 0xc2 0x40 0x0 0x0</literal>, which matches
|
|
the above. Properly, the utility should look for this sequence
|
|
only on an instruction boundary; unfortunately, finding
|
|
instruction boundaries on an i386 requires implementing a full
|
|
disassembler -- quite a daunting task. Besides, the probability
|
|
of having such a byte sequence that is not the actual return
|
|
instruction is fairly low.
|
|
</para>
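<para>
  The byte search described above might look roughly like the following.
  This is a minimal sketch, not the utility actually used: the function
  name <literal>guess_arg_count</literal> and its
  <literal>max_bytes</literal> parameter (the largest
  <literal>ll</literal> value accepted) are assumptions made for
  illustration. It deliberately ignores instruction boundaries, so it
  shows the same kind of false match discussed above.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scan len bytes of code for the first sequence
 * 0xc2 ll hh with hh == 0 and ll <= max_bytes.
 * Returns ll / 4 (the guessed number of dword arguments), or -1 if no
 * such sequence is found. */
static int guess_arg_count(const unsigned char *code, size_t len,
                           unsigned int max_bytes)
{
    size_t i;

    assert(code != NULL);

    for (i = 0; i + 2 < len; i++)
    {
        if (code[i] == 0xc2 && code[i + 2] == 0x00 &&
            code[i + 1] <= max_bytes)
            return code[i + 1] / 4;
    }
    return -1;
}
```

<para>
  For the bytes <literal>0x55 0x89 0xe5 0x5d 0xc2 0x08 0x00</literal>
  (a function ending in <literal>ret 0008</literal>) the sketch reports
  two arguments. For <literal>0x68 0xc2 0x08 0x00 0x00 0xc2 0x04 0x00</literal>
  (<literal>push 0x000008c2</literal> followed by
  <literal>ret 0004</literal>) it also reports two, although the real
  answer is one, because the match falls inside the operand of the
  <literal>push</literal>.
</para>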
|
|
|
|
<para>
|
|
Much more troublesome is the non-linear flow of a function. For
|
|
example, consider the following two functions:
|
|
<screen>
|
|
somefunction1:
|
|
jmp somefunction1_impl
|
|
|
|
somefunction2:
|
|
ret 0004
|
|
|
|
somefunction1_impl:
|
|
ret 0008
|
|
</screen>
|
|
In this case, we would incorrectly detect both
|
|
<function>somefunction1</function> and
|
|
<function>somefunction2</function> as taking only a single
|
|
argument, whereas <function>somefunction1</function> really
|
|
takes two arguments.
|
|
</para>
|
|
|
|
<para>
|
|
With these limitations in mind, it is possible to implement more stubs
|
|
in Wine and, eventually, the functions themselves.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="accel-impl">
|
|
<title>Accelerators</title>
|
|
|
|
<para>
|
|
There are <emphasis>three</emphasis> differently sized
|
|
accelerator structures exposed to the user:
|
|
</para>
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>
|
|
Accelerators in NE resources. This is also the internal
|
|
layout of the global handle <type>HACCEL</type> (16 and
|
|
32) in Windows 95 and Wine. Exposed to the user as Win16
|
|
global handles <type>HACCEL16</type> and
|
|
<type>HACCEL32</type> by the Win16/Win32 API.
|
|
These are 5 bytes long, with no padding:
|
|
<programlisting>
|
|
BYTE fVirt;
|
|
WORD key;
|
|
WORD cmd;
|
|
</programlisting>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Accelerators in PE resources. They are exposed to the user
|
|
only by directly accessing PE resources.
|
|
These have a size of 8 bytes:
|
|
</para>
|
|
<programlisting>
|
|
BYTE fVirt;
|
|
BYTE pad0;
|
|
WORD key;
|
|
WORD cmd;
|
|
WORD pad1;
|
|
</programlisting>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
Accelerators in the Win32 API. These are exposed to the
|
|
user by the <function>CopyAcceleratorTable</function>
|
|
and <function>CreateAcceleratorTable</function> functions
|
|
in the Win32 API.
|
|
These have a size of 6 bytes:
|
|
</para>
|
|
<programlisting>
|
|
BYTE fVirt;
|
|
BYTE pad0;
|
|
WORD key;
|
|
WORD cmd;
|
|
</programlisting>
|
|
</listitem>
|
|
</orderedlist>
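<para>
  The three layouts listed above can be written down as C structures to
  check the stated sizes. This is a sketch under assumptions: the type
  names ending in <literal>_SKETCH</literal> and the helper
  <literal>pe_to_api</literal> are hypothetical, fixed-width integer
  types stand in for <type>BYTE</type> and <type>WORD</type>, and the
  compiler is assumed to support <literal>#pragma pack</literal> and to
  align 16-bit integers on 2-byte boundaries.
</para>

```c
#include <assert.h>
#include <stdint.h>

/* 1. NE resource layout: 5 bytes, no padding. */
#pragma pack(push, 1)
typedef struct
{
    uint8_t  fVirt;
    uint16_t key;
    uint16_t cmd;
} NE_ACCEL_SKETCH;
#pragma pack(pop)

/* 2. PE resource layout: 8 bytes. */
typedef struct
{
    uint8_t  fVirt;
    uint8_t  pad0;
    uint16_t key;
    uint16_t cmd;
    uint16_t pad1;
} PE_ACCEL_SKETCH;

/* 3. Win32 API layout: 6 bytes. */
typedef struct
{
    uint8_t  fVirt;
    uint8_t  pad0;
    uint16_t key;
    uint16_t cmd;
} API_ACCEL_SKETCH;

/* Compile-time checks of the sizes given in the text. */
static_assert(sizeof(NE_ACCEL_SKETCH) == 5, "NE accelerator is 5 bytes");
static_assert(sizeof(PE_ACCEL_SKETCH) == 8, "PE accelerator is 8 bytes");
static_assert(sizeof(API_ACCEL_SKETCH) == 6, "API accelerator is 6 bytes");

/* Hypothetical helper: convert one PE resource entry to the API layout
 * by copying the meaningful fields and dropping the trailing pad. */
static API_ACCEL_SKETCH pe_to_api(PE_ACCEL_SKETCH pe)
{
    API_ACCEL_SKETCH api;

    api.fVirt = pe.fVirt;
    api.pad0  = 0;
    api.key   = pe.key;
    api.cmd   = pe.cmd;
    return api;
}
```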
|
|
|
|
<para>
|
|
Why two types of accelerators in the Win32 API? We can only
guess, but my best bet is that the Win32 resource compiler
cannot or does not handle struct packing. Win32 <type>ACCEL</type>
is defined using <function>#pragma pack(2)</function> for the
compiler but without any packing for RC, so it will assume
<function>#pragma pack(4)</function>.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="hardware-trace">
|
|
<title>Doing A Hardware Trace</title>
|
|
|
|
<para>
|
|
The primary reason to do this is to reverse engineer a
|
|
hardware device for which you don't have documentation, but
|
|
can get to work under Wine.
|
|
</para>
|
|
<para>
|
|
This section is aimed at parallel port devices, and in particular
parallel port scanners, which are now so cheap they are
virtually being given away. The problem is that few
manufacturers will release any programming information, which
prevents drivers being written for SANE. The traditional
technique of using DOSemu to produce the traces does not work
because the scanners invariably only have drivers for Windows.
|
|
</para>
|
|
<para>
|
|
Presuming that you have compiled and installed Wine, the first
thing to do is to enable direct hardware access to your
parallel port. To do this, edit <filename>config</filename>
(usually in <filename>~/.wine/</filename>) and add the following
two lines to the ports section:
|
|
</para>
|
|
<programlisting>
|
|
read=0x378,0x379,0x37a,0x37c,0x77a
|
|
write=0x378,0x379,0x37a,0x37c,0x77a
|
|
</programlisting>
|
|
<para>
|
|
This adds the necessary access required for an SPP/PS2/EPP/ECP
parallel port on LPT1. You will need to adjust these numbers
accordingly if your parallel port is on LPT2 or LPT0.
|
|
</para>
|
|
<para>
|
|
When starting wine use the following command line, where
|
|
<literal>XXXX</literal> is the program you need to run in
|
|
order to access your scanner, and <literal>YYYY</literal> is
|
|
the file your trace will be stored in:
|
|
</para>
|
|
<programlisting>
|
|
wine -debugmsg +io XXXX 2> >(sed 's/^[^:]*:io:[^ ]* //' > YYYY)
|
|
</programlisting>
|
|
<para>
|
|
You will need large amounts of hard disk space (read hundreds
|
|
of megabytes if you do a full page scan), and for reasonable
|
|
performance a really fast processor and lots of RAM.
|
|
</para>
|
|
<para>
|
|
You will need to postprocess the output into a more manageable
|
|
format, using the <command>shrink</command> program. First
|
|
you need to compile the source (which is located at the end of
|
|
this section):
|
|
<programlisting>
|
|
cc shrink.c -o shrink
|
|
</programlisting>
|
|
</para>
|
|
<para>
|
|
Use the <command>shrink</command> program to reduce the
|
|
physical size of the raw log as follows:
|
|
</para>
|
|
<programlisting>
|
|
cat log | shrink > log2
|
|
</programlisting>
|
|
<para>
|
|
The trace has the basic form of
|
|
</para>
|
|
<programlisting>
|
|
XXXX > YY @ ZZZZ:ZZZZ
|
|
</programlisting>
|
|
<para>
|
|
where <literal>XXXX</literal> is the port in hexadecimal being
accessed, <literal>YY</literal> is the data written to (or read
from) the port, and <literal>ZZZZ:ZZZZ</literal> is the address
in memory of the instruction that accessed the port. The
direction of the arrow indicates whether the data was written
to or read from the port.
|
|
</para>
|
|
<programlisting>
|
|
> data was written to the port
|
|
< data was read from the port
|
|
</programlisting>
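<para>
  A line in this format can be picked apart with
  <function>sscanf</function>. The following is a minimal sketch, not
  part of Wine or of <command>shrink</command>: the structure
  <literal>io_access</literal>, its field names, and the function
  <literal>parse_io_line</literal> are hypothetical, and the two halves
  of <literal>ZZZZ:ZZZZ</literal> are simply stored as two hexadecimal
  numbers.
</para>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one line of the trace. */
struct io_access
{
    unsigned int port;      /* XXXX: port accessed, e.g. 0x378 */
    int          is_write;  /* 1 for '>', 0 for '<' */
    unsigned int value;     /* YY: byte written or read */
    unsigned int addr_hi;   /* first half of ZZZZ:ZZZZ */
    unsigned int addr_lo;   /* second half of ZZZZ:ZZZZ */
};

/* Parse a line such as "0x378 > 55 @ 0297:01ec".
 * Returns 1 on success and fills *out, or 0 if the line does not have
 * the expected form (for example a repeat marker written by shrink). */
static int parse_io_line(const char *line, struct io_access *out)
{
    char dir;

    assert(line != NULL && out != NULL);

    /* %x accepts an optional 0x prefix, so it handles both fields. */
    if (sscanf(line, "%x %c %x @ %x:%x", &out->port, &dir, &out->value,
               &out->addr_hi, &out->addr_lo) != 5)
        return 0;
    if (dir != '>' && dir != '<')
        return 0;

    out->is_write = (dir == '>');
    return 1;
}
```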
|
|
<para>
|
|
My basic tip for interpreting these logs is to pay close
attention to the addresses of the IO instructions. Their
grouping and sometimes proximity should reveal the presence of
subroutines in the driver. By studying the different versions
you should be able to work them out. For example, consider the
following section of trace from my UMAX Astra 600P:
|
|
</para>
|
|
<programlisting>
|
|
0x378 > 55 @ 0297:01ec
|
|
0x37a > 05 @ 0297:01f5
|
|
0x379 < 8f @ 0297:01fa
|
|
0x37a > 04 @ 0297:0211
|
|
0x378 > aa @ 0297:01ec
|
|
0x37a > 05 @ 0297:01f5
|
|
0x379 < 8f @ 0297:01fa
|
|
0x37a > 04 @ 0297:0211
|
|
0x378 > 00 @ 0297:01ec
|
|
0x37a > 05 @ 0297:01f5
|
|
0x379 < 8f @ 0297:01fa
|
|
0x37a > 04 @ 0297:0211
|
|
0x378 > 00 @ 0297:01ec
|
|
0x37a > 05 @ 0297:01f5
|
|
0x379 < 8f @ 0297:01fa
|
|
0x37a > 04 @ 0297:0211
|
|
0x378 > 00 @ 0297:01ec
|
|
0x37a > 05 @ 0297:01f5
|
|
0x379 < 8f @ 0297:01fa
|
|
0x37a > 04 @ 0297:0211
|
|
0x378 > 00 @ 0297:01ec
|
|
0x37a > 05 @ 0297:01f5
|
|
0x379 < 8f @ 0297:01fa
|
|
0x37a > 04 @ 0297:0211
|
|
</programlisting>
|
|
<para>
|
|
As you can see, there is a repeating structure starting at
address <literal>0297:01ec</literal> that consists of four IO
accesses on the parallel port. The first access writes a
changing byte to the data port; the second
always writes the byte <literal>0x05</literal> to the control
port; then a value, which always seems to be
<literal>0x8f</literal>, is read from the status port, at which
point the byte <literal>0x04</literal> is written to the control
port. By studying this and other sections of the trace we can
write a C routine that emulates this, shown below with some
macros to make the parallel port reads and writes easier to
read.
|
|
</para>
|
|
<programlisting>
|
|
#include <sys/io.h>  /* inb()/outb() on Linux/x86 (glibc); call ioperm() or iopl() first */

/* Parallel port registers: data at base, status at base+1, control at base+2 */
#define r_dtr(x) inb(x)
#define r_str(x) inb((x)+1)
#define r_ctr(x) inb((x)+2)
#define w_dtr(x,y) outb(y, x)
#define w_str(x,y) outb(y, (x)+1)
#define w_ctr(x,y) outb(y, (x)+2)

/* Seems to be sending a command byte to the scanner */
int udpp_put(int udpp_base, unsigned char command)
{
    int loop, value;

    w_dtr(udpp_base, command);
    w_ctr(udpp_base, 0x05);

    /* Poll the status port until bit 7 is set, giving up after 10 reads */
    for (loop=0; loop < 10; loop++)
        if ((value = r_str(udpp_base)) & 0x80)
        {
            w_ctr(udpp_base, 0x04);
            return value & 0xf8;
        }

    /* Bit 7 never came up: flag the failure in bit 0 */
    return (value & 0xf8) | 0x01;
}
|
|
</programlisting>
|
|
<para>
|
|
For the UMAX Astra 600P only seven such routines exist (well
|
|
14 really, seven for SPP and seven for EPP). Whether you
|
|
choose to disassemble the driver at this point to verify the
|
|
routines is your own choice. If you do, the address from the
|
|
trace should help in locating them in the disassembly.
|
|
</para>
|
|
<para>
|
|
You will probably then find it useful to write a shell script, Perl
script or C program to analyse the logfile and decode it further, as
this can reveal higher-level groupings of the low-level routines.
For example, the logs from my UMAX Astra 600P, when decoded
further, reveal the following (this is a small snippet):
|
|
</para>
|
|
<programlisting>
|
|
start:
|
|
put: 55 8f
|
|
put: aa 8f
|
|
put: 00 8f
|
|
put: 00 8f
|
|
put: 00 8f
|
|
put: c2 8f
|
|
wait: ff
|
|
get: af,87
|
|
wait: ff
|
|
get: af,87
|
|
end: cc
|
|
start:
|
|
put: 55 8f
|
|
put: aa 8f
|
|
put: 00 8f
|
|
put: 03 8f
|
|
put: 05 8f
|
|
put: 84 8f
|
|
wait: ff
|
|
</programlisting>
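<para>
  One step of such a decoder, recognising the four-access
  <literal>put</literal> pattern seen in the raw trace, could be
  sketched as follows. This is an illustration only: the structure
  <literal>io_event</literal> and the function
  <literal>decode_put</literal> are hypothetical names, and the pattern
  is taken solely from the trace excerpt shown earlier (data write,
  <literal>0x05</literal> to the control port, status read,
  <literal>0x04</literal> to the control port).
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one port access from the trace. */
struct io_event
{
    unsigned int port;      /* port accessed */
    int          is_write;  /* 1 if written, 0 if read */
    unsigned int value;     /* byte written or read */
};

/* If the first four events match the "put" pattern on the parallel
 * port at base, store the command byte and the status byte and return
 * 4 (the number of events consumed).  Otherwise return 0. */
static size_t decode_put(const struct io_event *ev, size_t n,
                         unsigned int base,
                         unsigned int *command, unsigned int *status)
{
    assert(ev != NULL && command != NULL && status != NULL);

    if (n < 4)
        return 0;
    if (!ev[0].is_write || ev[0].port != base)
        return 0;                                   /* data port write */
    if (!ev[1].is_write || ev[1].port != base + 2 || ev[1].value != 0x05)
        return 0;                                   /* control: 0x05 */
    if (ev[2].is_write || ev[2].port != base + 1)
        return 0;                                   /* status port read */
    if (!ev[3].is_write || ev[3].port != base + 2 || ev[3].value != 0x04)
        return 0;                                   /* control: 0x04 */

    *command = ev[0].value;
    *status  = ev[2].value;
    return 4;
}
```

<para>
  Applied to the first four lines of the raw trace with a base of
  <literal>0x378</literal>, this yields the command byte
  <literal>0x55</literal> and the status byte <literal>0x8f</literal>,
  which corresponds to the line <literal>put: 55 8f</literal> in the
  decoded output.
</para>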
|
|
<para>
|
|
From this it is easy to see that the <varname>put</varname>
routine is often grouped together in five successive calls
sending information to the scanner. Once these are understood
it should be possible to process the logs further to show the
higher-level routines in an easy-to-read format. Once the
|
|
highest level format that you can derive from this process is
|
|
understood, you then need to produce a series of scans varying
|
|
only one parameter between them, so you can discover how to
|
|
set the various parameters for the scanner.
|
|
</para>
|
|
|
|
<para>
|
|
The following is the <filename>shrink.c</filename> program:
|
|
<programlisting>
|
|
/* Copyright David Campbell <campbell@torque.net> */
#include <stdio.h>
#include <string.h>

int main (void)
{
    char buff[256], lastline[256] = "";
    int count = 0;

    /* Loop on fgets() itself; testing feof() first would process the
       last line a second time */
    while (fgets (buff, sizeof (buff), stdin))
    {
        if (strcmp (buff, lastline))
        {
            if (count > 1)
                printf ("# Last line repeated %i times #\n", count);
            printf ("%s", buff);
            strcpy (lastline, buff);
            count = 1;
        }
        else count++;
    }
    /* Report a run of repeats that ends the input */
    if (count > 1)
        printf ("# Last line repeated %i times #\n", count);
    return 0;
}
|
|
</programlisting>
|
|
</para>
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-parent-document:("wine-devel.sgml" "set" "book" "part" "chapter" "")
|
|
End:
|
|
-->
|