
  <chapter id="implementation">
    <title>Low-level Implementation</title>
    <para>Details of Wine's Low-level Implementation...</para>

<sect1 id="config-keyboard">
<title>Keyboard</title>

<para>
Wine needs to know about your keyboard layout. Many
applications read scancodes directly instead of just taking
the characters returned by the X server, so they need the
correct scancodes to be available. Wine therefore needs a
mapping from X keys to the scancodes these programs expect.
</para>
<para>
On startup, Wine tries to recognize the active X layout by
checking whether it matches any of the defined tables. If it
does, everything is all right. If not, you need to define it.
</para>
<para>
To do this, open the file
<filename>dlls/x11drv/keyboard.c</filename> and take a look
at the existing tables. Make a backup copy of it, especially
if you don't use CVS.
</para>
<para>
What you really need to do is find out which scancode
each key needs to generate.  Find it in the
<function>main_key_scan</function> table, which looks like
this:
</para>
<programlisting>
static const int main_key_scan[MAIN_LEN] =
{
/* this is my (102-key) keyboard layout, sorry if it doesn't quite match yours */
0x29,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1A,0x1B,
0x1E,0x1F,0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x2B,
0x2C,0x2D,0x2E,0x2F,0x30,0x31,0x32,0x33,0x34,0x35,
0x56 /* the 102nd key (actually to the right of l-shift) */
};
</programlisting>
<para>
Next, assign each scancode the characters imprinted on the
keycaps. This was done (sort of) for the US 101-key keyboard,
which you can find near the top in
<filename>keyboard.c</filename>. It also shows that if there
is no 102nd key, you can skip that.
</para>
<para>
However, for most international 102-key keyboards, we have
made it easy for you. The scancode layout for these already
pretty much matches the physical layout in the
<function>main_key_scan</function>, so all you need to do is
to go through all the keys that generate characters on your
main keyboard (except spacebar), and stuff those into an
appropriate table. The only exception is that the 102nd key,
which is usually to the left of the first key of the last line
(usually <keycap>Z</keycap>), must be placed on a separate
line after the last line.
</para>
<para>
For example, my Norwegian keyboard looks like this:
</para>
<screen>
§  !  "  #  ¤  %  &  /  (  )  =  ?  `  Back-
|  1  2@ 3£ 4$ 5  6  7{ 8[ 9] 0} +  \´ space

Tab Q  W  E  R  T  Y  U  I  O  P  Å  ^
                                     ¨~
                                        Enter
Caps A  S  D  F  G  H  J  K  L  Ø  Æ  *
Lock                                  '

Sh- > Z  X  C  V  B  N  M  ;  :  _  Shift
ift &lt;                      ,  .  -

Ctrl  Alt       Spacebar       AltGr  Ctrl
</screen>
<para>
Note the 102nd key, which is the <keycap>&lt;></keycap> key, to
the left of <keycap>Z</keycap>. The character to the right of
the main character is the character generated by
<keycap>AltGr</keycap>.
</para>
<para>
This keyboard is defined as follows:
</para>
<programlisting>
static const char main_key_NO[MAIN_LEN][4] =
{
"|§","1!","2\"@","3#£","4¤$","5%","6&","7/{","8([","9)]","0=}","+?","\\´",
"qQ","wW","eE","rR","tT","yY","uU","iI","oO","pP","åÅ","¨^~",
"aA","sS","dD","fF","gG","hH","jJ","kK","lL","øØ","æÆ","'*",
"zZ","xX","cC","vV","bB","nN","mM",",;",".:","-_",
"&lt;>"
};
</programlisting>
<para>
Apart from the fact that " and \ need to be escaped with a
backslash, and that the 102nd key is on a separate line, it's
pretty straightforward.
</para>
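<para>
As a rough illustration of how a layout table lines up with
<function>main_key_scan</function>, here is a small sketch using
only three keys. The names <literal>DEMO_LEN</literal>,
<literal>demo_scan</literal>, <literal>demo_keys</literal> and
<literal>demo_char</literal> are made up for this example and do
not exist in <filename>keyboard.c</filename>. It assumes, as the
Norwegian table above suggests, that each entry lists the plain
character first, then the shifted one, then the
<keycap>AltGr</keycap> one.
</para>
<programlisting>
#include &lt;stdio.h&gt;

#define DEMO_LEN 3   /* hypothetical: three keys only, not the real MAIN_LEN */

/* Scancodes for the keys 1, 2 and 7, picked out of main_key_scan. */
static const int demo_scan[DEMO_LEN] = { 0x02, 0x03, 0x08 };

/* The same three keys as they appear in the Norwegian table. */
static const char demo_keys[DEMO_LEN][4] = { "1!", "2\"@", "7/{" };

/* Return the character for a key at a shift level
   (0 = plain, 1 = Shift, 2 = AltGr), or 0 if there is none. */
static char demo_char(int key, int level)
{
    if (key &lt; 0 || key &gt;= DEMO_LEN || level &lt; 0 || level &gt; 2)
        return 0;
    return demo_keys[key][level];
}

int main(void)
{
    int i;

    for (i = 0; i &lt; DEMO_LEN; i++)
        printf("scancode 0x%02x: plain '%c', shifted '%c'\n",
               demo_scan[i], demo_char(i, 0), demo_char(i, 1));
    return 0;
}
</programlisting>
<para>
The point is only that entry <literal>i</literal> of the layout
table describes the key whose scancode is entry
<literal>i</literal> of the scancode table, which is why the order
of the keys matters.
</para>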
<para>
After you have written such a table, you need to add it to the
<function>main_key_tab[]</function> layout index table, which
looks like this:
</para>
<programlisting>
static struct {
WORD lang, ansi_codepage, oem_codepage;
const char (*key)[MAIN_LEN][4];
} main_key_tab[]={
...
...
{MAKELANGID(LANG_NORWEGIAN,SUBLANG_DEFAULT),  1252, 865, &amp;main_key_NO},
...
</programlisting>
<para>
After you have added your table, recompile Wine and test that
it works. If it fails to detect your table, try running
</para>
<screen>
wine --debugmsg +key,+keyboard >& key.log
      </screen>
      <para>
        and look in the resulting <filename>key.log</filename> file to
        find the error messages it gives for your layout.
      </para>
      <para>
        Note that the <constant>LANG_*</constant> and
        <constant>SUBLANG_*</constant> definitions are in
        <filename>include/winnls.h</filename>. You may need them to
        find out which number your language is assigned, so that you
        can look for it in the debugmsg output. The number is
        <literal>(SUBLANG * 0x400 + LANG)</literal>, so, for example,
        the combination <literal>LANG_NORWEGIAN (0x14)</literal> and
        <literal>SUBLANG_DEFAULT (0x1)</literal> gives (in hex)
        <literal>14 + 1*400 = 414</literal>. Since I'm Norwegian, I
        could look for <literal>0414</literal> in the debugmsg output
        to find out why my keyboard isn't detected.
      </para>
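      <para>
        The arithmetic above can be checked with a few lines of C.
        This is only a sketch: <literal>DEMO_MAKELANGID</literal> is a
        made-up stand-in that follows the formula quoted above, not
        the real macro from the Wine headers, and the two constants
        are typed in by hand from the values given in the text.
      </para>
      <programlisting>
#include &lt;stdio.h&gt;

/* Hypothetical stand-in for MAKELANGID, following the formula
   from the text: SUBLANG * 0x400 + LANG. */
#define DEMO_MAKELANGID(lang, sublang) ((sublang) * 0x400 + (lang))

int main(void)
{
    /* LANG_NORWEGIAN is 0x14 and SUBLANG_DEFAULT is 0x1 (see above). */
    printf("%04x\n", DEMO_MAKELANGID(0x14, 0x1));   /* prints 0414 */
    return 0;
}
      </programlisting>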
      <para>
        Once it works, submit it to the Wine project. If you use CVS,
        you will just have to do
      </para>
      <screen>
cvs -z3 diff -u dlls/x11drv/keyboard.c > layout.diff
      </screen>
      <para>
        from your main Wine directory, then submit
        <filename>layout.diff</filename> to
        <email>wine-patches@winehq.org</email> along with a brief note
        of what it is.
      </para>
      <para>
        If you don't use CVS, you need to do
      </para>
      <screen>
diff -u the_backup_file_you_made dlls/x11drv/keyboard.c > layout.diff
      </screen>
      <para>
        and submit it as explained above.
      </para>
      <para>
        If you did it right, it will be included in the next Wine
        release, and all the troublesome programs (especially
        remote-control programs) and games that use scancodes will
        be happily using your keyboard layout, and you won't get those
        annoying fixme messages either.
      </para>
    </sect1>


    <sect1 id="undoc-func">
      <title>Undocumented APIs</title>

	<para>
	  Some background:  On the i386 class of machines, stack entries are
	  usually dword (4 bytes) in size, little-endian.  The stack grows
	  downward in memory.  The stack pointer, maintained in the
	  <literal>esp</literal> register, points to the last valid entry;
	  thus, the operation of pushing a value onto the stack involves
	  decrementing <literal>esp</literal> and then moving the value into
	  the memory pointed to by <literal>esp</literal>
	  (i.e., <literal>push p</literal> in assembly resembles
	  <literal>*(--esp) = p;</literal> in C).  Removing (popping)
	  values off the stack is the reverse (i.e., <literal>pop p</literal>
	  corresponds to <literal>p = *(esp++);</literal> in C).
	</para>

	<para>
	  In the <literal>stdcall</literal> calling convention, arguments are
	  pushed onto the stack right-to-left.  For example, the C call
	  <function>myfunction(40, 20, 70, 30);</function> is expressed in
	  Intel assembly as:
	  <screen>
    push 30
    push 70
    push 20
    push 40
    call myfunction
	  </screen>
	  The called function is responsible for removing the arguments
	  off the stack.  Thus, before the call to myfunction, the
	  stack would look like:
	  <screen>
             [local variable or temporary]
             [local variable or temporary]
              30
              70
              20
    esp ->    40
	  </screen>
	  After the call returns, it should look like:
	  <screen>
             [local variable or temporary]
    esp ->   [local variable or temporary]
	  </screen>
	</para>

	<para>
	  To restore the stack to this state, the called function must know how
	  many arguments to remove (which is the number of arguments it takes).
	  This is a problem if the function is undocumented.
	</para>
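	<para>
	  The stack behaviour described above can be modelled in plain C.
	  The sketch below is an illustration only: the names
	  <literal>demo_stack</literal>, <literal>demo_push</literal> and
	  <literal>demo_callee_cleanup</literal> are invented here, and,
	  like the diagrams above, it ignores the return address that
	  <literal>call</literal> also places on the stack.
	</para>
	<programlisting>
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

static uint32_t demo_stack[16];
/* An empty stack: esp points just past the highest entry. */
static uint32_t *esp = demo_stack + 16;

/* push p  corresponds to  *(--esp) = p; */
static void demo_push(uint32_t p) { *(--esp) = p; }

/* What the callee's return does to its arguments: drop nargs dwords. */
static void demo_callee_cleanup(unsigned nargs) { esp += nargs; }

int main(void)
{
    uint32_t *before = esp;

    /* myfunction(40, 20, 70, 30): arguments are pushed right-to-left. */
    demo_push(30);
    demo_push(70);
    demo_push(20);
    demo_push(40);
    printf("esp points at %u\n", (unsigned)*esp);

    demo_callee_cleanup(4);
    printf("stack restored: %s\n", esp == before ? "yes" : "no");
    return 0;
}
	</programlisting>
	<para>
	  If <function>demo_callee_cleanup</function> were called with the
	  wrong count, <literal>esp</literal> would not end up where it
	  started, which is exactly why the argument count of an
	  undocumented function matters.
	</para>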

	<para>
	  One way to attempt to document the number of arguments each function
	  takes is to create a wrapper around that function that detects the
	  stack offset.  Essentially, each wrapper assumes that the function will
	  take a large number of arguments.  The wrapper copies each of these
	  arguments into its stack, calls the actual function, and then calculates
	  the number of arguments by checking esp before and after the call.
	</para>

	<para>
	  The main problem with this scheme is that the function must actually
	  be called from another program.  Many of these functions are seldom
	  used.  An attempt was made to aggressively query each function in a
	  given library (<filename>ntdll.dll</filename>) by passing 64 arguments,
	  all 0, to each function.  Unfortunately, Windows NT quickly goes to a
	  blue screen of death, even if the program is run from a
	  non-administrator account.
	</para>

	<para>
	  Another method that has been much more successful is to attempt to
	  figure out how many arguments each function is removing from the
	  stack by locating its return instruction.  This instruction,
	  <literal>ret hhll</literal> (where
	  <symbol>hhll</symbol> is the number of bytes to remove, i.e. the
	  number of arguments times 4), contains the bytes
	  <literal>0xc2 ll hh</literal> in memory.  It is a reasonable
	  assumption that few, if any, functions take more than 16 arguments;
	  therefore, simply searching for
	  <literal>hh == 0 && ll &lt; 0x40</literal>  starting from the
	  address of a function yields the correct number of arguments most
	  of the time.
	</para>
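	<para>
	  A minimal sketch of such a search might look like the following.
	  It is not the utility that was actually used; the function name
	  <literal>demo_guess_args</literal> and the sample bytes are made
	  up for illustration, and the scan simply applies the
	  <literal>hh == 0</literal>, <literal>ll &lt; 0x40</literal> test
	  to every byte offset in a buffer.
	</para>
	<programlisting>
#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;

/* Scan code[0..len) for the first "0xc2 ll 0x00" with ll &lt; 0x40.
   Returns the guessed argument count (ll / 4), or -1 if none is found. */
static int demo_guess_args(const unsigned char *code, size_t len)
{
    size_t i;

    for (i = 0; i + 2 &lt; len; i++)
        if (code[i] == 0xc2 &amp;&amp; code[i + 2] == 0x00 &amp;&amp; code[i + 1] &lt; 0x40)
            return code[i + 1] / 4;
    return -1;
}

int main(void)
{
    /* push ebp; mov ebp,esp; pop ebp; ret 8 */
    static const unsigned char sample[] =
        { 0x55, 0x8b, 0xec, 0x5d, 0xc2, 0x08, 0x00 };

    printf("guessed arguments: %d\n",
           demo_guess_args(sample, sizeof(sample)));
    return 0;
}
	</programlisting>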

	<para>
	  Of course, this is not without errors. <literal>ret 00ll</literal>
	  is not the only instruction that can have the byte sequence
	  <literal>0xc2 ll 0x0</literal>; for example,
	  <literal>push 0x000040c2</literal> has the byte sequence
	  <literal>0x68 0xc2 0x40 0x0 0x0</literal>, which matches
	  the above.  Properly, the utility should look for this sequence
	  only on an instruction boundary; unfortunately, finding
	  instruction boundaries on an i386 requires implementing a full
	  disassembler -- quite a daunting task.  Besides, the probability
	  of having such a byte sequence that is not the actual return
	  instruction is fairly low.
	</para>

	<para>
	  Much more troublesome is the non-linear flow of a function.  For
	  example, consider the following two functions:
	  <screen>
    somefunction1:
        jmp  somefunction1_impl

    somefunction2:
        ret  0004

    somefunction1_impl:
        ret  0008
	  </screen>
	  In this case, we would detect both
	  <function>somefunction1</function> and
	  <function>somefunction2</function> as taking only a single
	  argument, which is wrong for
	  <function>somefunction1</function>: it really takes two
	  arguments.
	</para>

	<para>
	  With these limitations in mind, it is possible to implement more stubs
	  in Wine and, eventually, the functions themselves.
	</para>
    </sect1>

    <sect1 id="accel-impl">
      <title>Accelerators</title>

      <para>
        There are <emphasis>three</emphasis> differently sized
        accelerator structures exposed to the user:
      </para>
      <orderedlist>
        <listitem>
          <para>
            Accelerators in NE resources.  This is also the internal
            layout of the global handle <type>HACCEL</type> (16 and
            32) in Windows 95 and Wine. Exposed to the user as Win16
            global handles <type>HACCEL16</type> and
            <type>HACCEL32</type> by the Win16/Win32 API.
	    These are 5 bytes long, with no padding:
            <programlisting>
BYTE   fVirt;
WORD   key;
WORD   cmd;
            </programlisting>
          </para>
        </listitem>
        <listitem>
          <para>
            Accelerators in PE resources. They are exposed to the user
	    only by directly accessing PE resources.
	    These have a size of 8 bytes:
          </para>
          <programlisting>
BYTE   fVirt;
BYTE   pad0;
WORD   key;
WORD   cmd;
WORD   pad1;
          </programlisting>
        </listitem>
        <listitem>
          <para>
            Accelerators in the Win32 API.  These are exposed to the
            user by the  <function>CopyAcceleratorTable</function>
	    and  <function>CreateAcceleratorTable</function> functions
	    in the Win32 API.
	    These have a size of 6 bytes:
          </para>
          <programlisting>
BYTE   fVirt;
BYTE   pad0;
WORD   key;
WORD   cmd;
          </programlisting>
        </listitem>
      </orderedlist>
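      <para>
        The three sizes can be double-checked with a compiler. The
        sketch below is only an illustration: the struct names are
        invented, <type>uint8_t</type> and <type>uint16_t</type> stand
        in for <type>BYTE</type> and <type>WORD</type>, and it assumes
        a compiler that understands <literal>#pragma pack</literal>
        (GCC, Clang and MSVC do).
      </para>
      <programlisting>
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

/* NE resource layout: no padding at all, so pack to 1 byte. */
#pragma pack(push, 1)
struct demo_accel_ne  { uint8_t fVirt; uint16_t key; uint16_t cmd; };
#pragma pack(pop)

/* PE resource layout: explicit padding fields, 8 bytes. */
struct demo_accel_pe  { uint8_t fVirt; uint8_t pad0;
                        uint16_t key; uint16_t cmd; uint16_t pad1; };

/* Win32 API layout: one explicit padding byte, 6 bytes. */
struct demo_accel_api { uint8_t fVirt; uint8_t pad0;
                        uint16_t key; uint16_t cmd; };

int main(void)
{
    printf("NE: %u, PE: %u, API: %u\n",
           (unsigned)sizeof(struct demo_accel_ne),
           (unsigned)sizeof(struct demo_accel_pe),
           (unsigned)sizeof(struct demo_accel_api));
    return 0;
}
      </programlisting>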

      <para>
        Why two types of accelerators in the Win32 API? We can only
        guess, but my best bet is that the Win32 resource compiler
        cannot or does not handle struct packing. Win32 <type>ACCEL</type>
        is defined using <function>#pragma pack(2)</function> for the
        compiler but without any packing for RC, so it will assume
        <function>#pragma pack(4)</function>.
      </para>

    </sect1>

    <sect1 id="hardware-trace">
      <title>Doing A Hardware Trace</title>

      <para>
        The primary reason to do this is to reverse engineer a
        hardware device for which you don't have documentation, but
        can get to work under Wine.
      </para>
      <para>
        This section is aimed at parallel port devices, and in particular
        parallel port scanners, which are now so cheap they are
        virtually being given away. The problem is that few
        manufacturers will release any programming information, which
        prevents drivers being written for SANE, and the traditional
        technique of using DOSemu to produce the traces does not work
        as the scanners invariably only have drivers for Windows.
      </para>
      <para>
        Presuming that you have compiled and installed Wine, the first
        thing to do is to enable direct hardware access to your
        parallel port. To do this, edit <filename>config</filename>
        (usually in <filename>~/.wine/</filename>) and in the
        ports section add the following two lines:
      </para>
      <programlisting>
read=0x378,0x379,0x37a,0x37c,0x77a
write=0x378,0x379,0x37a,0x37c,0x77a
      </programlisting>
      <para>
        This adds the necessary access required for SPP/PS2/EPP/ECP
        parallel port on LPT1. You will need to adjust these numbers
        accordingly if your parallel port is on LPT2 or LPT0.
      </para>
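      <para>
        If you need the equivalent lines for another port, the list
        can be derived from the base address. The sketch below assumes
        that the same five offsets seen in the LPT1 list apply to your
        port, and uses <literal>0x278</literal> purely as an example
        base; check which address your port really uses. The names
        <literal>demo_offsets</literal> and
        <literal>demo_port</literal> are made up for this example.
      </para>
      <programlisting>
#include &lt;stdio.h&gt;

/* Offsets relative to the base address, derived from the LPT1 list
   above (0x378, 0x379, 0x37a, 0x37c, 0x77a). */
static const unsigned demo_offsets[5] = { 0x0, 0x1, 0x2, 0x4, 0x402 };

/* Address of the i-th port (0..4) for a given base address. */
static unsigned demo_port(unsigned base, int i)
{
    return base + demo_offsets[i];
}

int main(void)
{
    const unsigned base = 0x278;   /* assumed base, adjust to your port */
    const char *keys[2] = { "read", "write" };
    int k, i;

    for (k = 0; k &lt; 2; k++)
    {
        printf("%s=", keys[k]);
        for (i = 0; i &lt; 5; i++)
            printf("%s0x%x", i ? "," : "", demo_port(base, i));
        printf("\n");
    }
    return 0;
}
      </programlisting>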
      <para>
        When starting wine use the following command line, where
        <literal>XXXX</literal> is the program you need to run in
        order to access your scanner, and <literal>YYYY</literal> is
        the file your trace will be stored in:
      </para>
      <programlisting>
wine -debugmsg +io XXXX 2&gt; &gt;(sed 's/^[^:]*:io:[^ ]* //' &gt; YYYY)
      </programlisting>
      <para>
        You will need large amounts of hard disk space (read hundreds
        of megabytes if you do a full page scan), and for reasonable
        performance a really fast processor and lots of RAM.
      </para>
      <para>
	You will need to postprocess the output into a more manageable
	format, using the <command>shrink</command> program. First
        you need to compile the source (which is located at the end of
	this section):
      <programlisting>
cc shrink.c -o shrink
      </programlisting>
      </para>
      <para>
        Use the <command>shrink</command> program to reduce the
        physical size of the raw log as follows:
      </para>
      <programlisting>
./shrink &lt; log &gt; log2
      </programlisting>
      <para>
        The trace has the basic form of
      </para>
      <programlisting>
XXXX &gt; YY @ ZZZZ:ZZZZ
      </programlisting>
      <para>
        where <literal>XXXX</literal> is the port in hexadecimal being
        accessed, <literal>YY</literal> is the data written to (or read
        from) the port, and <literal>ZZZZ:ZZZZ</literal> is the address
        in memory of the instruction that accessed the port. The
        direction of the arrow indicates whether the data was written
        to or read from the port.
      </para>
      <programlisting>
&gt; data was written to the port
&lt; data was read from the port
      </programlisting>
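      <para>
        If you want to process the trace in a program of your own, a
        line of this form can be picked apart with
        <function>sscanf</function>. This is just a sketch of one way
        to do it; the struct and function names are invented for the
        example, and it assumes the lines look exactly like the form
        shown above.
      </para>
      <programlisting>
#include &lt;stdio.h&gt;

struct demo_io_access {
    unsigned port;       /* I/O port, e.g. 0x378 */
    int is_write;        /* 1 if data was written, 0 if it was read */
    unsigned value;      /* byte written or read */
    unsigned seg, off;   /* address of the accessing instruction */
};

/* Parse one trace line; returns 1 on success, 0 if it doesn't match. */
static int demo_parse_line(const char *line, struct demo_io_access *a)
{
    char dir;

    if (sscanf(line, "%x %c %x @ %x:%x",
               &amp;a-&gt;port, &amp;dir, &amp;a-&gt;value, &amp;a-&gt;seg, &amp;a-&gt;off) != 5)
        return 0;
    if (dir != '&gt;' &amp;&amp; dir != '&lt;')
        return 0;
    a-&gt;is_write = (dir == '&gt;');
    return 1;
}

int main(void)
{
    struct demo_io_access a;

    if (demo_parse_line("0x378 &gt; 55 @ 0297:01ec", &amp;a))
        printf("%s port 0x%x, value 0x%02x, at %04x:%04x\n",
               a.is_write ? "write to" : "read from",
               a.port, a.value, a.seg, a.off);
    return 0;
}
      </programlisting>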
      <para>
        My basic tip for interpreting these logs is to pay close
        attention to the addresses of the IO instructions. Their
        grouping and sometimes proximity should reveal the presence of
        subroutines in the driver. By studying the different versions
        you should be able to work them out. For example, consider the
        following section of trace from my UMAX Astra 600P:
      </para>
      <programlisting>
0x378 &gt; 55 @ 0297:01ec
0x37a &gt; 05 @ 0297:01f5
0x379 &lt; 8f @ 0297:01fa
0x37a &gt; 04 @ 0297:0211
0x378 &gt; aa @ 0297:01ec
0x37a &gt; 05 @ 0297:01f5
0x379 &lt; 8f @ 0297:01fa
0x37a &gt; 04 @ 0297:0211
0x378 &gt; 00 @ 0297:01ec
0x37a &gt; 05 @ 0297:01f5
0x379 &lt; 8f @ 0297:01fa
0x37a &gt; 04 @ 0297:0211
0x378 &gt; 00 @ 0297:01ec
0x37a &gt; 05 @ 0297:01f5
0x379 &lt; 8f @ 0297:01fa
0x37a &gt; 04 @ 0297:0211
0x378 &gt; 00 @ 0297:01ec
0x37a &gt; 05 @ 0297:01f5
0x379 &lt; 8f @ 0297:01fa
0x37a &gt; 04 @ 0297:0211
0x378 &gt; 00 @ 0297:01ec
0x37a &gt; 05 @ 0297:01f5
0x379 &lt; 8f @ 0297:01fa
0x37a &gt; 04 @ 0297:0211
      </programlisting>
      <para>
        As you can see, there is a repeating structure starting at
        address <literal>0297:01ec</literal> that consists of four I/O
        accesses on the parallel port. The first access writes a
        changing byte to the data port. The second always writes the
        byte <literal>0x05</literal> to the control port. Then a value,
        which always seems to be <literal>0x8f</literal>, is read from
        the status port, at which point the byte
        <literal>0x04</literal> is written to the control port. By
        studying this and other sections of the trace we can write a C
        routine that emulates this, shown below with some macros to
        make accesses to the parallel port easier to read.
      </para>
      <programlisting>
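#include &lt;sys/io.h&gt;      /* inb() and outb() on Linux/x86 with glibc */

/* Data, status and control registers sit at base, base+1 and base+2. */
#define r_dtr(x)        inb(x)
#define r_str(x)        inb((x)+1)
#define r_ctr(x)        inb((x)+2)
#define w_dtr(x,y)      outb((y), (x))
#define w_str(x,y)      outb((y), (x)+1)
#define w_ctr(x,y)      outb((y), (x)+2)

/* Seems to be sending a command byte to the scanner */
int udpp_put(int udpp_base, unsigned char command)
{
    int loop, value;

    w_dtr(udpp_base, command);
    w_ctr(udpp_base, 0x05);

    /* Poll the status port until bit 7 is set, giving up after 10 tries. */
    for (loop=0; loop &lt; 10; loop++)
        if ((value = r_str(udpp_base)) & 0x80)
        {
            w_ctr(udpp_base, 0x04);
            return value & 0xf8;
        }

    /* Bit 7 never came up: return the status with the low bit set. */
    return (value & 0xf8) | 0x01;
}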
      </programlisting>
      <para>
        For the UMAX Astra 600P only seven such routines exist (well
        14 really, seven for SPP and seven for EPP). Whether you
        choose to disassemble the driver at this point to verify the
        routines is your own choice. If you do, the address from the
        trace should help in locating them in the disassembly.
      </para>
      <para>
        You will probably then find it useful to write a script or a
        Perl or C program to analyse the logfile and decode it further,
        as this can reveal higher-level grouping of the low-level
        routines. For example, the logs from my UMAX Astra 600P, when
        decoded further, reveal the following (this is a small snippet):
      </para>
      <programlisting>
start:
put: 55 8f
put: aa 8f
put: 00 8f
put: 00 8f
put: 00 8f
put: c2 8f
wait: ff
get: af,87
wait: ff
get: af,87
end: cc
start:
put: 55 8f
put: aa 8f
put: 00 8f
put: 03 8f
put: 05 8f
put: 84 8f
wait: ff
      </programlisting>
      <para>
        From this it is easy to see that the <varname>put</varname>
        routine is often grouped together in five successive calls
        sending information to the scanner. Once these are understood
        it should be possible to process the logs further to show the
        higher level routines in an easy to see format. Once the
        highest level format that you can derive from this process is
        understood, you then need to produce a series of scans varying
        only one parameter between them, so you can discover how to
        set the various parameters for the scanner.
      </para>

      <para>
        The following is the <filename>shrink.c</filename> program:
      <programlisting>
/* Copyright David Campbell &lt;campbell@torque.net&gt; */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

int main (void)
{
    char buff[256], lastline[256] = "";
    int count = 0;

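    /* Loop on fgets() itself so that EOF is not processed as an extra line. */
    while (fgets (buff, sizeof (buff), stdin) != NULL)
    {
        if (strcmp (buff, lastline))
        {
            /* A new line: report how often the previous one repeated. */
            if (count &gt; 1)
                printf ("# Last line repeated %i times #\n", count);
            printf ("%s", buff);
            strcpy (lastline, buff);
            count = 1;
        }
        else count++;
    }
    /* Report a run of repeats that ends at the end of the input. */
    if (count &gt; 1)
        printf ("# Last line repeated %i times #\n", count);
    return 0;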
}
      </programlisting>
      </para>
    </sect1>

  </chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-parent-document:("wine-devel.sgml" "set" "book" "part" "chapter" "")
End:
-->