Over the past several months, I've been intensively researching BIOS-related binaries. The primary tool I use is still IDA Pro. However, I noticed that the speed at which my understanding of a binary evolves is still nowhere near what it should be. Therefore, I have been evaluating my method for a while and have come up with these steps:
- Make sure you know the context in which a given part of the BIOS/UEFI binary executes. This way, you can make the most of the documentation or specifications related to that particular part of the binary.
- Mark "alleged" data structures (arrays, structures, etc.) with readable names. Do not worry about changing the names later as long as you keep track of where they are located if they are global variables.
- Create IDA Pro structures that improve the readability of your disassembly. Again, do not worry too much about correctness at this point, because you will find out further down the line. The bottom line is that reverse code engineering consists of repeated refinement steps, each of which improves your understanding of the code. The point is that you need to create clues for further refinement steps.
- Zoom out of the function graph (not the function call graph, but the graph of a single function) to get the "big picture" of the function's elements, especially to recognize loops and branches. This way, it's faster to understand the function. Refine the "alleged" data structures as needed as you go through the function analysis. Also, don't forget to give the function a meaningful name.
- (Optional) If you think writing C/C++ code as a representation of the function could help, then do so. In my case, it helps tremendously, especially because the binary I'm working with was compiled with a C compiler (evident from the stack and parameter usage).
- Putting standardized information in the anterior lines of a function's first instruction helps tremendously. I think at least this information should be there:
  - Function name, preferably along with its starting address. The address is important because the name could change as our understanding evolves during the reverse engineering activity. It's particularly important because the documentation generated from the reverse engineering will surely refer to several key functions.
  - Function description, i.e. what it does in a general sense.
  - Function parameters (input and output).
  - Data structures and global variables modified by the function.
- Kris Kaspersky-style commenting is rather useful for me. He places a lengthy (but enlightening) comment below specific lines. For example:
    main proc near              ; CODE XREF: start+AF?p
        push esi
        push 8
        call ??2@YAPAXI@Z       ; operator new(uint)
        ; Using the new operator, we allocate 8 bytes for the instance of
        ; some object. Generally, it's not very certain at all that memory
        ; is allocated for an object (there might be something like
        ; char *x = new char[8]), so let's not consider this assertion
        ; as dogma, and accept it as a working hypothesis.
        ; Further analysis will show how matters actually stand.
        mov esi, eax
        add esp, 4
I found that this commenting style helps, especially when we are just starting a reverse engineering task.
- Using a virtual machine can help because it lets us debug certain BIOS/UEFI parts, for example, PCI option ROM debugging with SeaBIOS.
- The one thing I'm still not used to is how to use function call graphs effectively.
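
To make the structure, C-representation, and header-comment steps above a bit more concrete, here is a minimal sketch in C. It assumes the part under analysis is a legacy x86 PCI option ROM image, which starts with the bytes 55h AAh, stores its size in 512-byte units at offset 2, and must sum to zero modulo 256. All names here (oprom_header_t, read_le16, oprom_checksum_ok, unknown_06) are hypothetical and not taken from any real binary. The comment above the function only mimics the standardized header described in the list; in a real IDA database the starting address would be filled in.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>   /* fixed-width integer types */

/*
 * "Alleged" layout of the start of a legacy x86 PCI option ROM image.
 * The field names are working hypotheses; rename them as the analysis
 * is refined. Only the offsets need to stay accurate.
 */
typedef struct {
    uint16_t signature;        /* +0x00: 0xAA55 (bytes 55h, AAh)          */
    uint8_t  size_512;         /* +0x02: image size in 512-byte units     */
    uint8_t  entry_jmp[3];     /* +0x03: jump to the initialization code  */
    uint8_t  unknown_06[0x12]; /* +0x06: not yet understood, name later   */
    uint16_t pcir_offset;      /* +0x18: offset of the "PCIR" structure   */
} oprom_header_t;

/* Check that the C layout matches the offsets seen in the disassembly. */
static_assert(offsetof(oprom_header_t, pcir_offset) == 0x18, "bad layout");
static_assert(sizeof(oprom_header_t) == 0x1A, "bad size");

/* Read a 16-bit little-endian value regardless of host byte order. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Name        : oprom_checksum_ok (hypothetical)
 * Address     : fill in the function's starting address from the database
 * Description : validates the signature and the 8-bit checksum of a
 *               legacy option ROM image
 * Parameters  : rom (in) - pointer to the first byte of the image
 *               len (in) - number of readable bytes at rom
 *               returns  - 1 if the image looks valid, 0 otherwise
 * Modifies    : no data structures or global variables
 */
int oprom_checksum_ok(const uint8_t *rom, size_t len)
{
    size_t image_len;
    size_t i;
    uint8_t sum = 0;

    if (len < sizeof(oprom_header_t))
        return 0;
    if (read_le16(rom + offsetof(oprom_header_t, signature)) != 0xAA55)
        return 0;

    image_len = (size_t)rom[offsetof(oprom_header_t, size_512)] * 512u;
    if (image_len == 0 || image_len > len)
        return 0;

    /* This is the loop one would spot when zooming out of the graph. */
    for (i = 0; i < image_len; i++)
        sum = (uint8_t)(sum + rom[i]);

    return sum == 0;
}
```

The struct is used only as a map of offsets (through offsetof) rather than being cast onto the buffer, so a wrong guess about one field does not silently break the others. Calling oprom_checksum_ok(buffer, buffer_size) on a dumped image is a quick way to test the hypothesis before committing the names to the disassembly.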