VX Heaven

Heaven's Gate: 64-bit code in 32-bit file

roy g biv
Valhalla #1
June 2011

I found this technique in 2009, and updated it in 2011.

What is it?

On a 64-bit platform, there is only one ntoskrnl.exe, and it is 64-bit code. It also uses a different calling convention (parameters in registers, the so-called "fastcall") compared to 32-bit code (parameters on the stack, the so-called "stdcall", which replaced the old "pascal" convention). So how can 32-bit code run on a 64-bit platform? There is a "thunking" layer in wow64cpu.dll, which saves the 32-bit state, converts the parameters to 64-bit form, then runs "Wow64SystemServiceEx" in wow64.dll. But 64-bit registers are visible only in 64-bit mode, so how does wow64cpu.dll work? Here is what I call Heaven's Gate, but first we must go back to ntdll.dll.

Thunking Layer

When an important function is called from a DLL like kernel32.dll, it calls into the native interface in ntdll.dll. The native interface is a powerful but mostly undocumented layer between user mode and kernel mode. For some details, see my Chthon code in 29A#6. It used to be that to call into kernel mode, the code would do this:

        mov     eax, service                    ;service number
        lea     edx, dword ptr [esp + 4]        ;pointer to the parameters on the stack
        int     2eh                             ;enter kernel mode

In Windows XP, it became possible to use sysenter instead of int 2eh, for better performance. In 64-bit Windows, an "xor ecx, ecx" was added because of the 64-bit pointer size, and the int 2eh was replaced by:

        call    dword ptr fs:[0c0h]

and now we are one step closer to Heaven's Gate. The field at fs:[0c0h] is called WOW32Reserved, and holds an address in wow64cpu.dll. If we follow the call, we reach a jump. A far jump. A special far jump. Heaven's Gate.

Heaven's Gate

The jump in wow64cpu.dll is a 64-bit gate. We can jump through it into the world of 64-bit code: 64-bit address space, 64-bit registers, 64-bit calls. We might think that jumping into wow64cpu.dll is useless because we cannot control where it goes after that, but of course we can change the address ourselves to anywhere we like. We can alter the address inside wow64cpu.dll, we can alter the address at fs:[0c0h], or we can just call through the gate on our own. The gate maps the entire 4GB of memory, and the selector value is always 33h. We can switch between the modes easily, too. All we need is the return address on the stack. We can switch modes in this long way:

        call    to64
        ;32-bit code continues here

to64:
        db      0eah    ;jmp 33:in64
        dd      offset in64
        dw      33h

in64:
        ;64-bit code goes here

Switching back to 32-bit code can be done this way:

        jmp     fword ptr [offset to32 - offset fr64]
fr64:                   ;rip points here when the operand address is computed

to32:
        dd      offset in32
        dw      23h


The 0eah-style jmp is not supported in 64-bit mode, and the plain 32-bit displacement form of memory addressing is no longer absolute there. It is rip-relative instead, which is why the jmp operand is written relative to the fr64 label.
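
The rip-relative rule can be checked with a little arithmetic. The following Python sketch is only an illustration: the function names and the example addresses are made up, and it assumes that the indirect far jmp uses the 6-byte FF 2D disp32 encoding.

```python
import struct

JMP_FAR_RIP_LEN = 6  # assumed encoding FF 2D disp32: opcode, ModRM, displacement


def rip_relative_disp32(insn_addr, insn_len, target):
    """Displacement stored in an instruction that reaches `target` rip-relatively.

    rip already points at the next instruction when the operand address is
    computed, so the displacement is measured from insn_addr + insn_len.
    """
    next_insn = insn_addr + insn_len  # the fr64 label in the text
    disp = target - next_insn
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of disp32 range")
    return disp


def encode_disp32(disp):
    """Pack the displacement as the 4 little-endian bytes in the instruction."""
    return struct.pack("<i", disp)


# Hypothetical addresses, chosen only for the example.
jmp_addr = 0x401000
to32_addr = 0x401010
disp = rip_relative_disp32(jmp_addr, JMP_FAR_RIP_LEN, to32_addr)
print(hex(disp), encode_disp32(disp).hex())
```

If the far pointer is stored directly after the jmp, the displacement is simply zero.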

Of course there's a simpler way, which looks like this:

        db      9ah     ;call 33:in64
        dd      offset in64
        dw      33h
        ;32-bit code continues here

in64:
        ;64-bit code goes here

To switch back to 32-bit code, just use a 32-bit retf. That's much easier.
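
Both hand-assembled forms have the same 7-byte layout: one opcode byte (0eah or 9ah), a 32-bit offset, then a 16-bit selector. As a rough sketch of how an analysis tool might read them back, here is a small Python decoder. The function name and the sample address are hypothetical and exist only for this example.

```python
import struct

# Direct far transfers with a ptr16:32 operand, as used in the listings above.
FAR_OPCODES = {0xEA: "jmp", 0x9A: "call"}


def decode_far_transfer(code):
    """Decode opcode, 32-bit offset and 16-bit selector from 7 bytes of code."""
    if len(code) < 7:
        raise ValueError("need at least 7 bytes")
    opcode, offset, selector = struct.unpack_from("<BIH", code)
    if opcode not in FAR_OPCODES:
        raise ValueError("not a direct far jmp/call")
    return FAR_OPCODES[opcode], selector, offset


# Build the bytes for "call 33:00401234" (made-up offset) and decode them again.
sample = bytes([0x9A]) + struct.pack("<IH", 0x00401234, 0x33)
mnemonic, selector, offset = decode_far_transfer(sample)
print(mnemonic, hex(selector), hex(offset))
```

A decoded selector of 33h inside a 32-bit file is the value that the text above associates with the gate.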

Finding ntdll.dll

Once in 64-bit mode, we can only use the native interface in ntdll.dll because the kernel32.dll in our process memory is 32-bit, and won't run in 64-bit mode. We can get the base address of ntdll.dll this way:

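        push    60h
        pop     rsi
        gs:lodsq                                ;gs not fs, rax = PEB from gs:[60h]
        mov     rax, qword ptr [rax+18h]        ;PEB.Ldr
        mov     rax, qword ptr [rax+30h]        ;first entry in initialization order
        mov     rax, qword ptr [rax+10h]        ;DllBase of that entry (ntdll.dll)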
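
The three constants in that walk are structure offsets. The Python sketch below rebuilds them from the start of the 64-bit PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY layouts. The field names are not from this article. They are taken from the commonly published layouts and should be treated as an assumption, and the layout() helper is invented for the example.

```python
def layout(fields):
    """Return {name: offset} for (name, size) pairs packed back to back."""
    offsets, pos = {}, 0
    for name, size in fields:
        offsets[name] = pos
        pos += size
    return offsets


PTR = 8                  # pointer size in 64-bit mode
LIST_ENTRY = 2 * PTR     # Flink + Blink

# Only the leading fields are listed; padding is written out explicitly.
PEB64 = layout([
    ("InheritedAddressSpace", 1),
    ("ReadImageFileExecOptions", 1),
    ("BeingDebugged", 1),
    ("BitField", 1),
    ("Padding0", 4),
    ("Mutant", PTR),
    ("ImageBaseAddress", PTR),
    ("Ldr", PTR),
])

PEB_LDR_DATA64 = layout([
    ("Length", 4),
    ("Initialized", 1),
    ("Padding", 3),
    ("SsHandle", PTR),
    ("InLoadOrderModuleList", LIST_ENTRY),
    ("InMemoryOrderModuleList", LIST_ENTRY),
    ("InInitializationOrderModuleList", LIST_ENTRY),
])

LDR_DATA_TABLE_ENTRY64 = layout([
    ("InLoadOrderLinks", LIST_ENTRY),
    ("InMemoryOrderLinks", LIST_ENTRY),
    ("InInitializationOrderLinks", LIST_ENTRY),
    ("DllBase", PTR),
])

ldr_offset = PEB64["Ldr"]
init_list_offset = PEB_LDR_DATA64["InInitializationOrderModuleList"]
# The list pointer lands on InInitializationOrderLinks, not on the entry start.
dllbase_delta = (LDR_DATA_TABLE_ENTRY64["DllBase"]
                 - LDR_DATA_TABLE_ENTRY64["InInitializationOrderLinks"])
print(hex(ldr_offset), hex(init_list_offset), hex(dllbase_delta))
```

The last step is 10h rather than 30h because the list links point into the middle of each entry.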

Mixing 32-bit and 64-bit

Best of all, Yasm now allows mixing 32-bit and 64-bit code in the same file. When I was writing Shrug48 (so named because it is half-way between 32-bit and 64-bit), this was not possible, so I had two source files that had to be built separately and then concatenated afterwards. Now with Yasm, we can use "bits 32" before the 32-bit code, and "bits 64" before the 64-bit code, anywhere in the file, and we can swap between them as much as we want, like this:

bits 32
        db      9ah     ;call 33:in64
        dd      in64
        dw      33h
        ;32-bit code continues here

bits 64
in64:
        push    60h
        pop     rsi
        gs:lodsq        ;gs not fs
        mov     rax, qword [rax+18h]
        mov     rax, qword [rax+30h]
        mov     rax, qword [rax+10h]

Another, position-independent way to switch is this:

        push    cs
        call    to64
        ;32-bit code continues here

to64:
        push    0cb0033h        ;combined selector 33h and retf
        call    to64 + 3        ;run the retf inside the push
        ;now in 64-bit mode
        ;64-bit code goes here
        retf                    ;return to 32-bit mode
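
The trick depends on the bytes of the push instruction doing double duty. This small Python sketch is an illustration only, with variable names chosen for the example. It shows that the immediate supplies selector 33h on the stack and also places a retf opcode three bytes into the instruction.

```python
import struct

PUSH_IMM32 = 0x68   # opcode of "push imm32"
RETF = 0xCB         # opcode of "retf"

imm = 0x00CB0033
insn = bytes([PUSH_IMM32]) + struct.pack("<I", imm)   # little-endian immediate

retf_offset = insn.index(RETF)   # the byte that "call to64 + 3" jumps to
selector = imm & 0xFFFF          # low word that retf loads into cs

print(insn.hex(" "), retf_offset, hex(selector))
# prints: 68 33 00 cb 00 3 0x33
```

The call pushes the address of the 64-bit code on top of the pushed value, so the embedded retf pops that address together with selector 33h.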

Current Directory

There is a separate current directory for 32-bit and 64-bit mode. Normally, the 64-bit current directory is never used, because all 32-bit APIs that work with the current directory do not switch to 64-bit first. We can make the directories the same by overwriting the 64-bit pointers with the 32-bit ones. Of course, we have to find the location for the 64-bit pointers, first. ;)

Even in 32-bit mode, there is a 64-bit Thread Information Block. It is 0x1000 after the 32-bit Thread Information Block. Inside the 64-bit TIB is a pointer to the 64-bit RTL_USER_PROCESS_PARAMETERS. At 0x28 bytes before the structure is the pointer to the current directory that is used by the ntdll function RtlDosPathNameToRelativeNtPathName_U. There are other pointers to the current directory, but this is the one that we need.


We can use exceptions in 64-bit mode as usual, but the stack-based SEH chain does not exist there. We must use Vectored Exception Handlers instead. There is also a small thing that surprised me. The 64-bit TIB has a context structure for saving the 32-bit state during mode switching. During the switch, the esp slot is zeroed, and it is restored again afterwards. This prevents recursive switching from overwriting the context. The same happens when an exception occurs: no matter which mode we are in, the context is saved and the esp slot is zeroed. The problem is that when the exception handler returns, the esp slot is not restored. If an exception occurs in 32-bit mode after that, then the application will crash. So save the esp slot from the TIB (it is at gs:0x1480) if you will use exceptions in 64-bit mode, and put it back after the handler returns.


Using the gate is another way to check for 64-bit support, without using the obvious IsWow64Process API call. Just place an SEH handler around the call, and if an exception occurs, then you are on a 32-bit platform. You can also check whether the gs selector is non-zero. This is true only on a 64-bit platform.
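
For comparison, the obvious check looks roughly like the following when called from Python through ctypes. This is a minimal sketch, not part of the original technique. The wrapper name is made up, and it returns None on anything that is not Windows so that it can be run anywhere.

```python
import ctypes
import sys


def is_wow64_process():
    """Return True or False from IsWow64Process, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.IsWow64Process.argtypes = [wintypes.HANDLE,
                                        ctypes.POINTER(wintypes.BOOL)]
    kernel32.IsWow64Process.restype = wintypes.BOOL

    flag = wintypes.BOOL(0)
    if not kernel32.IsWow64Process(kernel32.GetCurrentProcess(),
                                   ctypes.byref(flag)):
        raise ctypes.WinError(ctypes.get_last_error())
    return bool(flag.value)


print(is_wow64_process())
```

The result is True only for a 32-bit process running on 64-bit Windows, which is the case this article is about.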

64-bit code in 32-bit files. The ultimate emulator killer. ;)

Greets to friendly people (A-Z): Active - Benny - herm1t - hh86 - izee - jqwerty - Malum - Obleak - Prototype - Ratter - Ronin - RT Fishel - sars - SPTH - The Gingerbread Man - Ultras - uNdErX - Vallez - Vecna - Whitehead

rgb/defjam jun 2009/apr 2011