Writing a Disassembler in Assembly...
After learning the basics and even opening a window (WOAH!), I felt like I needed to do something actually cool and useful. That is why I decided to write a disassembler for the SUPER-CHIP platform. In previous years I made a SUPER-CHIP emulator, so I'm not coming into this project completely empty-handed, but it will definitely be a journey.
I began by figuring out how to read a file into a buffer, an essential step for the disassembler, since I have to read bytes from a source file in order to output the disassembly. In C, reading a full file into a buffer is usually done by opening a file stream, querying the length of the file in bytes, dynamically allocating a buffer on the heap, and reading the file's bytes into that buffer.
fopen -> fseek -> ftell -> rewind -> malloc -> fread -> fclose
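For comparison, the call chain above could look roughly like this in C. This is a minimal sketch rather than code from the project: the helper name read_whole_file, the "rb" open mode, and the error handling are all assumptions added for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a heap buffer.
   Returns NULL on failure; on success stores the byte count in *out_size.
   The caller is responsible for calling free() on the result. */
unsigned char *read_whole_file(const char *path, long *out_size)
{
    FILE *fp = fopen(path, "rb");            /* assumed mode: raw bytes */
    if (fp == NULL)
        return NULL;

    /* Seek to the end to learn the size, then go back to the start. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* One extra byte for the trailing zero, as in the assembly version. */
    unsigned char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return NULL; }

    buf[size] = 0;
    *out_size = size;
    return buf;
}
```

The assembly version that follows performs the same sequence of calls, only without the error checks.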
Doing this is fairly simple in assembly. I began by reserving space in the bss section for three double words: the FILE pointer, the buffer (char*), and the file size. Keeping these in memory avoids juggling values between registers, and the values persist for the rest of the program.
; readSourceFile(char* SOURCE_URL)
readSourceFile:
push ebp
mov ebp, esp
mov eax, [ebp + 8] ; move SOURCE_URL to EAX
; open stream
push readMode
push eax
call _fopen
add esp, 0x8
mov [filePointer], eax
; get file size
push 0x2 ; whence = SEEK_END (2 in common C libraries)
push 0x0 ; offset = 0
mov eax, [filePointer]
push eax
call _fseek
add esp, 0xC
mov eax, [filePointer]
push eax
call _ftell
add esp, 0x4
mov [fileSize], eax
mov eax, [filePointer]
push eax
call _rewind
add esp, 0x4
; allocate space on the heap for buffer
mov eax, [fileSize]
inc eax ; one extra byte for the null terminator
push eax
call _malloc
add esp, 0x4
mov [fileBuffer], eax
; read source into buffer
mov eax, [filePointer]
push eax
mov eax, [fileSize]
push eax
push 1
mov eax, [fileBuffer]
push eax
call _fread
add esp, 0x10
; null terminator
mov eax, [fileSize]
add eax, [fileBuffer]
mov [eax], byte 0x00
; close file
mov eax, [filePointer]
push eax
call _fclose
add esp, 0x4
mov esp, ebp
pop ebp
ret
While writing the above function, I ran into something that messed with my brain for a while. fopen returns a FILE pointer in EAX. Logically, I move this pointer into the location reserved in the bss section with mov [filePointer], eax. The thing is, when I want to use the pointer stored in filePointer, I have to write [filePointer] instead of just filePointer: the bare label is the address of the reserved double word, and the brackets read the value stored there. This may seem obvious (and it is), but the fact that there is a kind of double pointer at work messed with my logic for a while before I realized what the issue was.
Now that we have the source file read into a buffer in memory, we have to set up the loop for iterating over the buffer and the local variables that are needed to parse the two-byte opcodes.
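The same label-versus-contents distinction can be mirrored in C. This is only an analogy under assumed names (file_pointer_slot and the three helpers are hypothetical): the static variable plays the role of the double word reserved in the bss section.

```c
#include <stdio.h>

/* Plays the role of the 4 bytes reserved under the label filePointer. */
static FILE *file_pointer_slot;

/* NASM `filePointer`: the address of the reserved slot itself. */
FILE **slot_address(void)
{
    return &file_pointer_slot;
}

/* NASM `[filePointer]`: the FILE pointer stored inside the slot. */
FILE *slot_contents(void)
{
    return file_pointer_slot;
}

/* NASM `mov [filePointer], eax`: store a value into the slot. */
void store_in_slot(FILE *fp)
{
    file_pointer_slot = fp;
}
```

Passing slot_address() where slot_contents() is expected corresponds to pushing filePointer instead of [filePointer]: the callee would receive the address of the variable, not the FILE pointer kept in it.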
sub esp, 24 ; space for 6 local variables
%define opcode ebp - 4
%define x ebp - 8
%define y ebp - 12
%define nnn ebp - 16
%define nn ebp - 20
%define n ebp - 24
First, I subtract the number of bytes needed from the current stack pointer (remember, arguments are stored at higher addresses, as the stack grows downwards, so local variables go at lower addresses). The SUPER-CHIP opcodes consist of a few variable fields and some fixed bits. The data is encoded in opcodes as follows:
NNN: a 12-bit address
NN: an 8-bit constant
N: a 4-bit constant
X and Y: 4-bit register identifiers
This standard way of splitting up the opcodes is crucial for parsing a source file and finally disassembling it. Now begins the fun part: reading the opcodes two bytes at a time from the buffer and using bitwise arithmetic to extract the correct variables from them.
; get the array pointer
mov eax, [fileBuffer]
; read two bytes from the array and combine them
movzx ebx, byte [eax+esi] ; h
shl ebx, 0x8
movzx ecx, byte [eax+esi+1] ; l
or ebx, ecx
; store the opcode
mov [opcode], ebx
First, we fetch the pointer to the array. Then we read two consecutive bytes from the buffer using the base pointer EAX and the offset ESI, which works as a program counter in the loop (until it reaches the file length). But why movzx? In 32-bit x86, the general-purpose registers are 32 bits wide, but here we are only reading a single byte from memory. This means that we have to tell the CPU what to do with the other three bytes of the register. movzx zero-extends the value, i.e. it sets the remaining high bits to zero. This way there is no garbage left in the register, and the shift and OR produce the correct result. Finally, we OR the registers together and get the final opcode. NOTE! Chip-8 is big-endian, so the most significant byte is stored at the lower address, as you can see from the code.
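The fetch step can be sketched in C as follows. The function name fetch_opcode and its parameters are assumptions for illustration. Widening an unsigned 8-bit value to 32 bits zero-extends it, which is the C counterpart of movzx.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combines the two bytes at buf[pc] and buf[pc + 1]
   into one big-endian 16-bit opcode. The caller must make sure that
   pc + 1 is still inside the buffer. */
uint16_t fetch_opcode(const uint8_t *buf, size_t pc)
{
    uint32_t hi = buf[pc];      /* high byte, zero-extended like movzx */
    uint32_t lo = buf[pc + 1];  /* low byte, zero-extended like movzx  */
    return (uint16_t)((hi << 8) | lo);
}
```

For example, the bytes 0x12 0x34 at offset 0 yield the opcode 0x1234, since the byte at the lower address ends up in the upper half.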
; X
mov eax, [opcode]
and eax, 0x0F00
shr eax, 8
mov [x], eax
; Y
mov eax, [opcode]
and eax, 0x00F0
shr eax, 4
mov [y], eax
; NNN
mov eax, [opcode]
and eax, 0x0FFF
mov [nnn], eax
; NN
mov eax, [opcode]
and eax, 0x00FF
mov [nn], eax
; N
mov eax, [opcode]
and eax, 0x000F
mov [n], eax
We do more bitwise arithmetic and store the remaining fields in their respective local variables for the disassembly. Basically, we zero out everything except the field itself and shift it down so that it sits in the least significant bits.
The hard part is now over. Next comes the tedious part, where I have to program a huge switch statement with a branch for each opcode. In addition, I will have to figure out a naming scheme for the opcodes and a way to print them to the console. I structured the control flow as follows: first, I check the first 4 bits of the opcode, since the scheme fixes them to a value from 0 to F. In some cases, such as 1, 2, 3, and 4, there are no subcases. But for values such as 8 and 0xF, there are multiple subcases, which can be handled by comparing the later bits, such as the last four (n) or eight (nn) bits.
; compare first nibble
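The masking and the nibble-based dispatch described above could be sketched in C like this. The names Fields, decode_fields, and classify are hypothetical, and the switch covers only a handful of opcodes to show the nesting; it is not the full control flow of the disassembler.

```c
#include <stdint.h>

/* The operand fields that can be extracted from a 16-bit opcode. */
typedef struct {
    unsigned x, y;   /* 4-bit register identifiers */
    unsigned nnn;    /* 12-bit address             */
    unsigned nn;     /* 8-bit constant             */
    unsigned n;      /* 4-bit constant             */
} Fields;

/* Mask out each field and shift it down to the least significant bits. */
Fields decode_fields(uint16_t opcode)
{
    Fields f;
    f.x   = (opcode & 0x0F00u) >> 8;
    f.y   = (opcode & 0x00F0u) >> 4;
    f.nnn =  opcode & 0x0FFFu;
    f.nn  =  opcode & 0x00FFu;
    f.n   =  opcode & 0x000Fu;
    return f;
}

/* Returns a mnemonic name for a few opcodes; "raw" is the default case
   at every nesting level. */
const char *classify(uint16_t opcode)
{
    Fields f = decode_fields(opcode);
    switch (opcode >> 12) {          /* first nibble */
    case 0x1: return "jmp";          /* 1NNN: no subcases */
    case 0x2: return "call";         /* 2NNN: no subcases */
    case 0x8:
        switch (f.n) {               /* subcase chosen by the last nibble */
        case 0x0: return "mov";      /* 8XY0 */
        case 0xE: return "shl";      /* 8XYE */
        default:  return "raw";
        }
    case 0xE:
        switch (f.nn) {              /* subcase chosen by the last byte */
        case 0x9E: return "skp";     /* EX9E */
        case 0xA1: return "sknp";    /* EXA1 */
        default:   return "raw";
        }
    default:
        return "raw";
    }
}
```

With the opcode 0xD123, decode_fields gives x = 1, y = 2, nnn = 0x123, nn = 0x23, and n = 3. The assembly snippets that follow implement the same comparisons with cmp and jnz.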
mov eax, [opcode]
and eax, 0xF000
cmp eax, 0x0000
jnz jump_one
; compare last nibble
mov ebx, [opcode]
and ebx, 0x000F
cmp ebx, 0x0000
jnz jump_one_one
; compare nn
mov ebx, [nn]
cmp ebx, 0x009E
jnz jump_e_one
I will not cover the whole control flow and will leave those three code snippets as examples. There is also a default case for each nesting level of the switch statement. Now that I have the control flow figured out, I've run into a new problem: there are no mnemonics for SUPER-CHIP opcodes. My next task will be to figure out what to call each instruction and how to efficiently store the names in my program. After comparing the opcodes to x86 instructions, I came up with a decent mnemonic system that maps one-to-one to each opcode in SUPER-CHIP. I could have gotten away with fewer, but that is for another time.
clearStr: db "cls", 0xA, 0
retStr: db "ret", 0xA, 0
gotoStr: db "jmp %#04x", 0xA, 0
callStr: db "call %#04x", 0xA, 0
seStr: db "se V%d, %#04x", 0xA, 0
sneStr: db "sne V%d, %#04x", 0xA, 0
sevStr: db "sev V%d, V%d", 0xA, 0
ldiStr: db "ldi V%d, %#04x", 0xA, 0
addStr: db "add V%d, %#04x", 0xA, 0
movStr: db "mov V%d, V%d", 0xA, 0
orStr: db "or V%d, V%d", 0xA, 0
andStr: db "and V%d, V%d", 0xA, 0
xorStr: db "xor V%d, V%d", 0xA, 0
addvStr: db "addv V%d, V%d", 0xA, 0
subvStr: db "subv V%d, V%d", 0xA, 0
shrStr: db "shr V%d, 1", 0xA, 0
shlStr: db "shl V%d, 1", 0xA, 0
subnStr: db "subn V%d, V%d", 0xA, 0
snevStr: db "snev V%d, V%d", 0xA, 0
ldi16Str: db "ldi16 %#04x", 0xA, 0
jpv0Str: db "jpv0 %#04x", 0xA, 0
rndStr: db "rnd V%d", 0xA, 0
drwStr: db "drw V%d V%d %#04x", 0xA, 0
skpStr: db "skp V%d", 0xA, 0
sknpStr: db "sknp V%d", 0xA, 0
lddtStr: db "lddt V%d", 0xA, 0
ldvtStr: db "ldv V%d", 0xA, 0
ldkStr: db "ldk V%d", 0xA, 0
ldstStr: db "ldst V%d", 0xA, 0
addiStr: db "addi V%d", 0xA, 0
ldfStr: db "ldf V%d", 0xA, 0
bcdStr: db "bcd V%d", 0xA, 0
storStr: db "stor", 0xA, 0
loadStr: db "load", 0xA, 0
rawStr: db "raw: %#04x", 0xA, 0
Above are all the 35 opcodes plus a raw byte mnemonic for my disassembler. They are in the same order in which the opcodes are listed on the Chip-8 Wikipedia page. The last part of writing this disassembler is printing out each instruction correctly, which thankfully is the easiest part:
mov ecx, [x]
push ecx
push shlStr
call _printf
add esp, 0x8
jmp jump_switch_end
Here is the printing code for the shl instruction, 8XYE. First we move the precomputed x register identifier into ECX and push it onto the stack. Then we push the corresponding format string, call printf, clean up the stack, and move on to the next iteration of the loop.
That's it! My SUPER-CHIP disassembler is now complete and my x86 skills have improved yet again. For my next project, I will briefly step away from the assembly language grind and write my own HTTP server simulator for reverse engineering malware that connects to an external server.
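To see what the format strings produce, here is a small C sketch that formats into a buffer with snprintf instead of printing directly. The helper names format_shl and format_jmp are hypothetical; the format strings are the ones from the table above.

```c
#include <stdio.h>

/* Hypothetical helper: formats the shl line for register x into `out`.
   Returns what snprintf returns (the length the full line would have). */
int format_shl(char *out, size_t cap, unsigned x)
{
    return snprintf(out, cap, "shl V%d, 1\n", (int)x);
}

/* Hypothetical helper: formats the jmp line for the 12-bit address nnn. */
int format_jmp(char *out, size_t cap, unsigned nnn)
{
    return snprintf(out, cap, "jmp %#04x\n", nnn);
}
```

In %#04x the # flag adds the 0x prefix, and the minimum width of 4 includes that prefix. So 0x5 is printed as 0x05, while a full 12-bit address such as 0x200 is printed as 0x200.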