Most of the exploits based on buffer overflows aim at forcing the execution of malicious code, mainly in order to provide a root shell to the user. The principle is quite simple: malicious instructions are stored in a buffer, which is overflowed to allow an unexpected use of the process, by altering various memory sections.
This document therefore first describes how a process is mapped in memory and what a buffer is; it then focuses on two kinds of exploits based on buffer overflows: stack overflows and heap overflows.
1.1 Process memory
1.1.1 Global organization
When a program is executed, its various elements (instructions, variables...) are mapped in memory, in a structured manner.
The highest zones contain the process environment as well as its arguments: env strings, arg strings, env pointers (figure 1.1).
The next part of the memory consists of two sections, the stack and the heap, which are allocated at run time.
The stack is used to store function arguments, local variables, and the information needed to restore the stack state that existed before a function call. It follows a LIFO (Last In, First Out) access scheme and grows toward the low memory addresses.
Dynamically allocated variables are found in the heap; typically, a pointer returned by a call to the malloc function refers to a heap address.
The .bss and .data sections are dedicated to global variables, and are allocated at compilation time. The .data section contains static initialized data, whereas uninitialized data may be found in the .bss section.
The last memory section, .text, contains instructions (i.e. the program code) and may include read-only data.
Short examples may be really helpful for a better understanding; let us see where each kind of variable is stored:
high addresses
    env strings
    argv strings
    env pointers
    argv pointers
    argc
    stack (grows toward low addresses)
    heap
    .bss
    .data
    .text
low addresses
Figure 1.1: Process memory organization
heap
#include <stdlib.h>
int main(){
    char *tata = malloc(3);
}
tata points to an address which is in the heap.
.bss
char global;
int main(){
}

int main(){
    static int bss_var;
}
global and bss_var will be in .bss
.data
char global = 'a';
int main(){
}

int main(){
    static char data_var = 'a';
}
global and data_var will be in .data.
1.1.2 Function calls
We will now consider how function calls are represented in memory (in the stack to be more accurate), and try to understand the involved mechanisms.
On a Unix system, a function call may be broken up in three steps:
1. prologue: the current frame pointer is saved. A frame can be viewed as a logical unit of the stack, and contains all the elements related to a function. The amount of memory needed by the function is then reserved.
2. call: the function parameters are stored in the stack and the instruction pointer is saved, so that execution can resume at the right instruction when the function returns.
3. return (or epilogue): the old stack state is restored.
A simple illustration helps to see how all this works, and will allow us a better understanding of the most commonly used techniques involved in buffer overflow exploits. Let us consider this code:
int toto(int a, int b, int c){
    int i=4;
    return (a+i);
}
int main(int argc, char **argv){
    toto(0, 1, 2);
    return 0;
}
We now disassemble the binary using gdb, in order to get more details about these three steps. Two registers are mentioned here: EBP points to the current frame (frame pointer), and ESP to the top of the stack.
First, the main function:
(gdb) disassemble main
Dump of assembler code for function main:
0x80483e4 <main>: push %ebp
0x80483e5 <main+1>: mov %esp,%ebp
0x80483e7 <main+3>: sub $0x8,%esp
That is the main function prologue. For more details about a function prologue, see further on (the toto() case).
0x80483ea <main+6>: add $0xfffffffc,%esp
0x80483ed <main+9>: push $0x2
0x80483ef <main+11>: push $0x1
0x80483f1 <main+13>: push $0x0
0x80483f3 <main+15>: call 0x80483c0 <toto>
The toto() function call is done by these four instructions: its parameters are pushed (in reverse order) and the function is invoked.
0x80483f8 <main+20>: add $0x10,%esp
This instruction represents the toto() function return in the main() function: the stack pointer points to the return address, so it must be incremented to point before the function parameters (the stack grows toward the low addresses!). Thus, we get back to the initial environment, as it was before toto() was called.
0x80483fb <main+23>: xor %eax,%eax
0x80483fd <main+25>: jmp 0x8048400 <main+28>
0x80483ff <main+27>: nop
0x8048400 <main+28>: leave
0x8048401 <main+29>: ret
End of assembler dump.
The last two instructions are the main() function return step.
Now let us have a look at our toto() function:
(gdb) disassemble toto
Dump of assembler code for function toto:
0x80483c0 <toto>: push %ebp
0x80483c1 <toto+1>: mov %esp,%ebp
0x80483c3 <toto+3>: sub $0x18,%esp
This is our function prologue: %ebp initially points to the caller's environment; it is pushed (to save this current environment), and the second instruction makes %ebp point to the top of the stack, which now contains the initial environment address. The third instruction reserves enough memory for the function (local variables).
0x80483c6 <toto+6>: movl $0x4,0xfffffffc(%ebp)
0x80483cd <toto+13>: mov 0x8(%ebp),%eax
0x80483d0 <toto+16>: mov 0xfffffffc(%ebp),%ecx
0x80483d3 <toto+19>: lea (%ecx,%eax,1),%edx
0x80483d6 <toto+22>: mov %edx,%eax
0x80483d8 <toto+24>: jmp 0x80483e0 <toto+32>
0x80483da <toto+26>: lea 0x0(%esi),%esi
These are the function instructions...
0x80483e0 <toto+32>: leave
0x80483e1 <toto+33>: ret
End of assembler dump.
(gdb)
The return step (at least its internal phase) is done with these two instructions. The first one restores %ebp and %esp to the values they had before the prologue (but not before the function call: the stack pointer still points to an address lower than the memory zone holding the toto() parameters, and we have just seen that it recovers its initial value in the main() function). The second instruction deals with the instruction pointer: it restores the saved address, so that execution resumes at the right instruction in the calling function.
This short example shows the stack organization when functions are called. Further in this document, we will focus on the memory reservation. If this memory section is not carefully managed, it may provide opportunities to an attacker to disturb this stack organization, and to execute unexpected code.
That is possible because, when a function returns, the next instruction address is copied from the stack to the EIP register (it was pushed implicitly by the call instruction). As this address is stored in the stack, if it is possible to corrupt the stack to access this zone and write a new value there, it is possible to specify a new instruction address, corresponding to a memory zone containing malevolent code.
We will now deal with buffers, which are commonly used for such stack attacks.
1.2 Buffers, and how vulnerable they may be
In the C language, strings, or buffers, are represented by a pointer to the address of their first byte, and the end of the buffer is reached at the first null byte. This means that the amount of memory reserved for a buffer cannot be set precisely: it all depends on the number of characters.
Now let us have a closer look at the way buffers are organized in memory.
First, the size problem makes restricting the memory allocated to a buffer, to prevent any overflow, quite difficult. That is why some trouble may be observed, for instance when strcpy is used without care, which allows a user to copy a buffer into another smaller one!
Here is an illustration of this memory organization: the first example is the storage of the wxy buffer, the second one is the storage of two consecutive buffers, wxy and then abcde.
Buffer "wxy" in memory:
| \0 | y | x | w |

Buffers "abcde" and "wxy" in memory (-- marks an unused byte):
| \0 | y | x | w |
| -- | -- | \0 | e |
| d | c | b | a |
Figure 1.2: Buffers in memory
Note that in the second case, we have two unused bytes because words (four-byte sections) are used to store data. Thus, a six-byte buffer requires two words, or eight bytes, in memory.
Buffer vulnerability is shown in this program:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv){
    char jayce[4]="Oum";
    char herc[8]="Gillian";
    /* "BrookFlora" needs 11 bytes with its terminator: herc overflows */
    strcpy(herc, "BrookFlora");
    printf("%s\n", jayce);
    return 0;
}
Two buffers are stored in the stack just as shown in figure 1.3. When ten characters (plus the terminating null byte) are copied into a buffer which is supposed to be only eight bytes long, the first buffer is modified.
This copy causes a buffer overflow, and here is the memory organization before and after the call to strcpy:
Initial stack organization:
| \0 | m | u | O |   jayce
| \0 | n | a | i |   herc
| l | l | i | G |

After the overflow:
| \0 | \0 | a | r |   jayce
| o | l | F | k |   herc
| o | o | r | B |
Figure 1.3: Overflow consequences
Here is what we see when we run our program, as expected:
alfred@atlantis:~$ gcc jayce.c
alfred@atlantis:~$ ./a.out
ra
alfred@atlantis:~$
That is the kind of vulnerability used in buffer overflow exploits.