StuBS
x86-ABI: Register and stack layout

Reminder: We assume here that the NASM assembler is used. It expects instructions in the format: OP destination, source.

For thread switching in software, the state of the processor must be saved in order to restore it later and continue the execution of the program at this point. What belongs to the state of the processor is highly dependent on the architecture and may need to be researched in detail in the ABI documentation. In this course we use the x86 architecture and only a small part of it, i.e. we will only concentrate on the essential registers.

An Application Binary Interface (ABI) is, in general, a set of conventions (most notably the calling convention) that is defined by the architecture as well as by the compiler and programming language. It regulates how parameters are passed to functions and how structures are stored in memory. The convention serves as an interface both between separately compiled code (possibly produced by different compilers) and between manually written assembler functions and generated code. An ABI is particularly important when using shared libraries, as these are pre-compiled and can therefore only be used with a specific ABI. The object files used in our case also require a common ABI, because the linker combines separately compiled objects at build time (whereas shared libraries are only mapped into the address space at load time). The C ABI has become widely accepted (mainly for historical reasons) and can be accessed from many other compiled languages. It has thus become the standard interface between different programming languages. We also want to use it in the following to mediate between assembler and C++ code.
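As a rough illustration of this mediation, the following C++ sketch declares a function with C linkage, which is how C++ code would typically refer to a routine written in assembler. The name asm_add is hypothetical and not part of the course code, and the C++ definition merely stands in for an assembler implementation so that the example is self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Declaration as C++ callers would see it. extern "C" disables C++ name
// mangling, so the linker looks for the plain symbol name.
// (asm_add is a hypothetical name used only for this illustration.)
extern "C" std::int32_t asm_add(std::int32_t a, std::int32_t b);

// Stand-in definition. Assumption for this sketch: in a real project the
// body would live in a separate assembler file exporting the same symbol.
extern "C" std::int32_t asm_add(std::int32_t a, std::int32_t b) {
    return a + b;
}

// Usage: the call site looks like any other function call.
void check_asm_add() {
    assert(asm_add(2, 3) == 5);
}
```

The point of the sketch is only the declaration: caller and callee agree on symbol name, parameter passing and return value through the C ABI, regardless of which tool produced the callee.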

Registers

x86 has 8 general purpose registers, a flags register (eflags) and the instruction pointer (eip). The general purpose registers are described below. They originally had a fixed purpose and therefore have corresponding names. However, most registers can now be used for virtually all instructions:

  • eax (accumulator for some arithmetic instructions)
  • ebx (formerly base)
  • ecx (counter for loops)
  • edx (data register)
  • esi (source index for string operations)
  • edi (destination index for string operations)
  • ebp (base pointer)
  • esp (stack pointer)

x86 began as a 16-bit microprocessor family and initially had the registers ax, bx, cx, dx, si, di, bp and sp, each of which was 16 bits wide. Furthermore, ax, bx, cx and dx could be divided into an upper and a lower byte, e.g. ax into ah and al, if 8-bit precision was required. Many 8086 instructions could only be executed with certain registers.

In 32-bit (protected) mode, the registers were extended by 16 bits, so that ax became the lower half of eax. Many instructions were adapted so that they are no longer tied to specific registers. The sub-registers can still be used today, i.e. the lower 16 bits of eax are accessible as ax, and this in turn can be split into ah and al if necessary.
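The relationship between eax, ax, ah and al can be sketched with plain bit operations in C++. This is only a model of the register layout; the helper names low16, high8 and low8 are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Lower 16 bits of a 32-bit register value (what ax is to eax).
constexpr std::uint16_t low16(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}

// Upper byte of the 16-bit part (what ah is to ax).
constexpr std::uint8_t high8(std::uint16_t ax) {
    return static_cast<std::uint8_t>(ax >> 8);
}

// Lower byte of the 16-bit part (what al is to ax).
constexpr std::uint8_t low8(std::uint16_t ax) {
    return static_cast<std::uint8_t>(ax & 0xFFu);
}

// Usage: split the value 0x12345678 the way the hardware names its parts.
void check_register_split() {
    const std::uint16_t ax = low16(0x12345678u);
    assert(ax == 0x5678);
    assert(high8(ax) == 0x56);
    assert(low8(ax) == 0x78);
}
```

Note that the upper 16 bits of eax have no register name of their own; they can only be reached through eax itself.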

esi and edi have a special meaning for string operations. esp is important for the stack because the stack pointer points to the most recently pushed entry (the top of the stack).

Volatile vs. non-volatile registers

In programs, calls to sub-functions are a common mechanism. In a call, a distinction is made between the calling function (caller) and the called function (callee). Both caller and callee need registers to do their work. The caller, however, often expects the same value in its registers before and after the call. This means that the call should be transparent for the caller with regard to the registers (an exception is the eax register, in which the return value is passed). Ideally, we want two completely separate sets of registers for each function call.

Unfortunately, registers are limited on the hardware side, so the usual way is to save registers to the stack before the call and restore them after the call. This saving can happen on the side of the caller, i.e. before the call instruction (caller-saved, volatile in relation to the caller) or on the side of the callee (callee-saved, non-volatile in relation to the caller). The callee would first save the register values to the stack and restore the registers at the end of the function before the ret instruction.

The x86 ABI divides the registers into two categories. The volatile registers are eax, ecx, edx and eflags; all others are non-volatile. That is, if the caller wants to continue computing with the values in eax, ecx, edx and eflags, it must save them itself; for all other registers, the called function takes care of this if it needs the registers.
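The classification can be written down as a small lookup, which may be handy as a mental checklist when writing assembler routines. The enum Reg and the function is_volatile are hypothetical names for this sketch; the register sets are the ones listed above.

```cpp
#include <cassert>

// The registers discussed in this section (names chosen for this sketch).
enum class Reg { eax, ebx, ecx, edx, esi, edi, ebp, esp, eflags };

// True if the register is volatile (caller-saved): the caller must save it
// itself if it still needs the value after the call.
constexpr bool is_volatile(Reg r) {
    return r == Reg::eax || r == Reg::ecx || r == Reg::edx || r == Reg::eflags;
}

// Usage: ebx must be preserved by the callee, ecx need not be.
void check_register_classes() {
    assert(is_volatile(Reg::ecx));
    assert(!is_volatile(Reg::ebx));
}
```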

Passing parameters via the stack

On x86, the function parameters are passed via the stack. Let's take the following function as an example, which expects two parameters:

int foobar(int a, int b) {
    // placeholder
    return a + b;
}

A calling function places the parameters b and a (i.e. in reverse order) on the stack. The function is then called using the call instruction. This instruction places the return address on the stack and jumps to the address at which the function begins.

The calling function could therefore implement the function call to foobar as follows:

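caller:
    ; ...
    ; Assumption: a is stored in eax,
    ;             b is stored in ebx
    push ebx        ; second parameter (b) is pushed first
    push eax        ; first parameter (a) is pushed last
    call foobar     ; pushes the return address and jumps to foobar
    ; ...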

With the first instruction of foobar the stack looks like this (the stack grows downwards for x86, i.e. from high addresses to low addresses):

0xff
...
b
a
return address  <- esp
...
0x00

If the called function wants to access the parameters, the values must be retrieved from the stack, e.g. indirectly via the stack pointer. In x86 assembler, the mov instruction accepts a memory operand consisting of a register plus an offset and loads the value stored at the resulting address. This can be done with esp, e.g. like this:

foobar:
    ; esp points to the lowest byte of the return address
    ; addresses are 4 bytes long on x86
    mov eax, [esp+4]    ; eax now contains a
    mov ebx, [esp+8]    ; ebx now contains b (ebx is non-volatile and
                        ; would have to be saved first in real code)
    ; ...

Addresses on x86 are 4 bytes in size and esp points to the return address. So [esp] holds the return address, [esp+4] the first parameter and [esp+8] the second parameter, provided both parameters are 32 bits in size.
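The layout at function entry can be replayed with a toy model of a downward-growing stack. This is a simulation in C++, not real stack access; ToyStack and stack_at_entry are hypothetical helpers, and all values are assumed to be 32 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit stack that grows towards lower addresses.
struct ToyStack {
    std::vector<std::uint32_t> mem; // one element per 4-byte stack slot
    std::uint32_t esp;              // byte offset of the current top of stack

    explicit ToyStack(std::size_t slots)
        : mem(slots, 0), esp(static_cast<std::uint32_t>(slots * 4)) {}

    // push: decrement esp by 4, then store the value at [esp].
    void push(std::uint32_t value) {
        esp -= 4;
        mem[esp / 4] = value;
    }

    // Read the 32-bit value at [esp + offset].
    std::uint32_t at(std::uint32_t offset) const {
        return mem[(esp + offset) / 4];
    }
};

// Builds the stack as seen by the first instruction of foobar(a, b).
ToyStack stack_at_entry(std::uint32_t a, std::uint32_t b,
                        std::uint32_t return_address) {
    ToyStack s(16);
    s.push(b);              // push ebx
    s.push(a);              // push eax
    s.push(return_address); // done implicitly by the call instruction
    return s;
}

// Usage: the parameters sit directly above the return address.
void check_entry_layout() {
    const ToyStack s = stack_at_entry(7, 9, 0x1000);
    assert(s.at(0) == 0x1000); // [esp]   = return address
    assert(s.at(4) == 7);      // [esp+4] = a
    assert(s.at(8) == 9);      // [esp+8] = b
}
```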

If non-volatile registers are to be used, the function must first save them to the stack. However, each push moves esp down by 4 bytes, so the offsets of the parameters relative to esp grow accordingly! The volatile registers have already been saved (if necessary) by the calling function.
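The shifted offsets follow from simple arithmetic, sketched here in C++. The function param_offset is a hypothetical helper and assumes that every parameter and every saved register occupies exactly 4 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Offset (in bytes, relative to esp) of the parameter with the given
// zero-based index, after the callee has pushed saved_regs registers.
constexpr std::uint32_t param_offset(std::uint32_t param_index,
                                     std::uint32_t saved_regs) {
    return 4u * saved_regs    // registers saved by the callee
         + 4u                 // return address
         + 4u * param_index;  // parameters preceding this one
}

// Usage: after one "push ebx" inside foobar, a moves from [esp+4] to [esp+8].
void check_param_offsets() {
    assert(param_offset(0, 0) == 4);
    assert(param_offset(0, 1) == 8);
}
```

One common way to avoid recomputing such offsets is to address parameters relative to ebp instead of esp, although that convention is not covered in this section.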

Return values

With x86, return values are passed via the eax register. This means that at the end of the function, at the time the ret instruction is executed, the return value must be in the eax register. If the calling function expects a return value, it will read this register. If the function never set a return value, the caller simply reads whatever value was last written to eax.

For example, the return inside the foobar function could look like this:

foobar:
    ; ...
    add eax, ebx    ; eax = eax + ebx
    ret

After foobar returns, the calling function cleans up the stack by removing the two parameters it pushed:

caller:
    ; ...
    call foobar
    add esp, 8      ; remove the two 4-byte parameters from the stack

The instructions call and ret have an influence on the control flow. In addition to the actual jump to the function, call places the return address on the stack. ret expects the return address as the top value on the stack, removes it from the stack and jumps to the specified position. The instruction following the call is executed next.
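The interplay of push, call, ret and the caller's cleanup can be replayed in a small C++ simulation. ToyCpu, replay_call and the two address constants are hypothetical and chosen only for this sketch; the model ignores everything except esp, eip and the stack contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical addresses, chosen only for this illustration.
constexpr std::uint32_t kFoobarAddress = 0x2000; // first instruction of foobar
constexpr std::uint32_t kReturnAddress = 0x1004; // instruction after the call

struct ToyCpu {
    std::vector<std::uint32_t> stack; // one element per 4-byte stack slot
    std::uint32_t esp;                // byte offset of the top of the stack
    std::uint32_t eip = 0;            // address of the next instruction

    explicit ToyCpu(std::size_t slots)
        : stack(slots, 0), esp(static_cast<std::uint32_t>(slots * 4)) {}

    void push(std::uint32_t value) { esp -= 4; stack[esp / 4] = value; }
    std::uint32_t pop() {
        const std::uint32_t value = stack[esp / 4];
        esp += 4;
        return value;
    }

    // call: push the return address, then jump to the target.
    void call(std::uint32_t target, std::uint32_t return_address) {
        push(return_address);
        eip = target;
    }

    // ret: pop the return address and jump to it.
    void ret() { eip = pop(); }
};

// Replays the caller's sequence: push b, push a, call, (ret), add esp, 8.
ToyCpu replay_call(std::uint32_t a, std::uint32_t b) {
    ToyCpu cpu(16);
    cpu.push(b);
    cpu.push(a);
    cpu.call(kFoobarAddress, kReturnAddress);
    cpu.ret();    // executed at the end of foobar
    cpu.esp += 8; // add esp, 8 in the caller
    return cpu;
}

// Usage: afterwards execution continues behind the call and esp is restored.
void check_call_sequence() {
    const ToyCpu cpu = replay_call(7, 9);
    assert(cpu.eip == kReturnAddress);
    assert(cpu.esp == 64); // same value as before the first push
}
```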