Monday, December 10, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #5 - KeInitializeDpc and KeInitializeThreadedDpc

Question: Decompile the following kernel routines in Windows:
  • KeInitializeDpc

KeInitializeDpc
So for starters, let's see if there is any MSDN documentation on this function. Looking it up, we find this page: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-keinitializedpc
So from this we can tell the function header should be the following:
NTKERNELAPI VOID KeInitializeDpc(
    PRKDPC Dpc,
    PKDEFERRED_ROUTINE DeferredRoutine,
    PVOID DeferredContext
);
The corresponding disassembly of this function can be seen below:
; void __stdcall KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
public KeInitializeDpc
KeInitializeDpc proc near
xor     eax, eax
mov     dword ptr [rcx], 113h
mov     [rcx+38h], rax
mov     [rcx+10h], rax
mov     [rcx+18h], rdx
mov     [rcx+20h], r8
retn
KeInitializeDpc endp
Looking at the disassembly of this function, we immediately notice that RCX appears to point to some sort of structure that is getting filled out. According to the Microsoft x64 calling convention, the first four integer or pointer arguments are passed in RCX, RDX, R8 and R9, in that order. This means that RCX corresponds to PRKDPC Dpc, RDX to DeferredRoutine and R8 to DeferredContext.

But what is PRKDPC? If we refer back to https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-keinitializedpc, we can see this is a pointer to a KDPC object. Using the magic of WinDbg and its dt command, we can dump the layout of this structure, even though it's supposed to be an opaque structure (aka "we may change this structure, so don't rely on its details remaining the same"). Specifically, dt nt!_KDPC will give us the details we need:
0:010> dt nt!_KDPC
ntdll!_KDPC
   +0x000 TargetInfoAsUlong : Uint4B
   +0x000 Type             : UChar
   +0x001 Importance       : UChar
   +0x002 Number           : Uint2B
   +0x008 DpcListEntry     : _SINGLE_LIST_ENTRY
   +0x010 ProcessorHistory : Uint8B
   +0x018 DeferredRoutine  : Ptr64     void 
   +0x020 DeferredContext  : Ptr64 Void
   +0x028 SystemArgument1  : Ptr64 Void
   +0x030 SystemArgument2  : Ptr64 Void
   +0x038 DpcData          : Ptr64 Void
Ok so now that we know what offsets correspond to different parts of the KDPC object, we can translate the x64 assembly code into the following C code:
NTKERNELAPI VOID KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext){
  Dpc->TargetInfoAsUlong = 0x113;      // mov dword ptr [rcx], 113h
  Dpc->DpcData = NULL;                 // mov [rcx+38h], rax (RAX = 0)
  Dpc->ProcessorHistory = 0;           // mov [rcx+10h], rax
  Dpc->DeferredRoutine = DeferredRoutine; // mov [rcx+18h], rdx
  Dpc->DeferredContext = DeferredContext; // mov [rcx+20h], r8
}
If we check this against the Hex-Rays decompiler output, we can see it is pretty much the same:
void __stdcall KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
{
  Dpc->TargetInfoAsUlong = 275;
  Dpc->DpcData = 0i64;
  Dpc->ProcessorHistory = 0i64;
  Dpc->DeferredRoutine = (void (__fastcall *)(_KDPC *, void *, void *, void *))DeferredRoutine;
  Dpc->DeferredContext = DeferredContext;
}
So now that that is out of the way, the question remains why TargetInfoAsUlong is specifically set to 0x113. Referring back to the WinDbg dump, we can see that TargetInfoAsUlong sits at offset 0x000 alongside (in a union with) a structure containing a one-byte Type field, a one-byte Importance field, and a two-byte Number field.

Therefore, since x64 is little-endian, setting TargetInfoAsUlong to 0x113 actually sets Type to 0x13, Importance to 0x1 and Number to 0x0. To get a better idea of the KDPC structure I tried referring to ReactOS, specifically https://doxygen.reactos.org/d9/d82/struct__KDPC.html, however there is no documentation there on any of the fields, just the KDPC structure itself.
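To double-check that byte split, here is a small user-mode C sketch. KDPC_TARGET_INFO_SKETCH and split_target_info are hypothetical stand-ins covering only the first four bytes of _KDPC (field names taken from the dt output above), and the result assumes a little-endian host such as x86/x64.

```c
#include <stdint.h>

/* Hypothetical stand-in for the first four bytes of _KDPC, using the field
 * names from the dt output above. This is NOT the real WDK type. */
typedef union {
    uint32_t TargetInfoAsUlong;   /* +0x000, the whole 4-byte value */
    struct {
        uint8_t  Type;            /* +0x000 */
        uint8_t  Importance;      /* +0x001 */
        uint16_t Number;          /* +0x002 */
    } Fields;
} KDPC_TARGET_INFO_SKETCH;

/* Store a 32-bit value the way "mov dword ptr [rcx], 113h" does, so the
 * individual fields can be read back. Assumes a little-endian host. */
static KDPC_TARGET_INFO_SKETCH split_target_info(uint32_t value) {
    KDPC_TARGET_INFO_SKETCH info;
    info.TargetInfoAsUlong = value;
    return info;
}
```

Calling split_target_info(0x113) gives Fields.Type == 0x13, Fields.Importance == 0x01 and Fields.Number == 0, because the least significant byte of the dword lands at the lowest address.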

So with ReactOS out of the way, I ended up turning to Geoff Chappell's website, which has a TON of really cool articles on research he has done into the Windows kernel. Specifically, he has a brilliant article on KDPC available at https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ke/dpcobj/kdpc.htm. From here we can learn that an Importance value of 1 corresponds to MediumImportance, which is the default value for a KDPC object unless it is changed via KeSetImportanceDpc.

This is kind of ironic as if we look at the code for KeSetImportanceDpc we will see what has to be the world's simplest kernel function:
public KeSetImportanceDpc
KeSetImportanceDpc proc near
mov     [rcx+1], dl
retn
KeSetImportanceDpc endp
Which makes sense, as this is essentially just taking in a KDPC object (RCX) along with a byte value (DL, the low byte of the second argument in RDX), and then setting the Importance field of the KDPC object to the byte value specified.
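Written back out as C against a minimal mock of the start of the structure, the routine looks roughly like this. MOCK_KDPC_HEADER and MockKeSetImportanceDpc are hypothetical names for illustration, not the real WDK declarations; only the offsets come from the dt output.

```c
#include <stdint.h>

/* Hypothetical mock of the first four bytes of _KDPC (offsets from the dt
 * output above). Not the real WDK definition. */
typedef struct {
    uint8_t  Type;        /* +0x000 */
    uint8_t  Importance;  /* +0x001, the byte written by "mov [rcx+1], dl" */
    uint16_t Number;      /* +0x002 */
} MOCK_KDPC_HEADER;

/* Mirrors the two-instruction routine: store the low byte of the second
 * argument into the Importance field and leave everything else alone. */
static void MockKeSetImportanceDpc(MOCK_KDPC_HEADER *Dpc, uint8_t Importance) {
    Dpc->Importance = Importance;
}
```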

If we now turn our attention to the Type field, we can see that Geoff Chappell notes that this field is DpcObject for a normal DPC or ThreadedDpcObject for a threaded DPC. Looking up DpcObject brings us to https://www.geoffchappell.com/studies/windows/km/ntoskrnl/structs/kobjects.htm, which shows that on Windows kernel versions later than 3.51 (aka any modern OS), the value for DpcObject is 0x13.

For those interested, according to Geoff Chappell, ThreadedDpcObject is 0x1A on NT 6.3 and later (aka Windows 8.1 and later), whereas from NT 5.2 to 6.2 (Windows Server 2003 and 64-bit Windows XP through to Windows 8) its value was 0x18. This can be confirmed by disassembling KeInitializeThreadedDpc on a Windows 10 build:
KeInitializeThreadedDpc proc near
xor     eax, eax
mov     dword ptr [rcx], 11Ah
mov     [rcx+38h], rax
mov     [rcx+10h], rax
mov     [rcx+18h], rdx
mov     [rcx+20h], r8
retn
KeInitializeThreadedDpc endp
Again this is pretty much the same code as before; the only difference is that this time the dword written is 0x11A, so Type is set to 0x1A, or ThreadedDpcObject. Otherwise the code is exactly the same as KeInitializeDpc.

Finally, Geoff Chappell notes that the Number field denotes the importance or the processor on which this DPC will be run. Given that it is set to 0, I assume the idea is that it is either being set to its initial priority or being set to run on processor 0. With this knowledge we can create a better decompiled version of both KeInitializeDpc and KeInitializeThreadedDpc:
void __stdcall KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
{
  Dpc->Type = DpcObject;               // 0x13
  Dpc->Importance = MediumImportance;  // 1
  Dpc->Number = 0;
  Dpc->DpcData = NULL;
  Dpc->ProcessorHistory = 0;
  Dpc->DeferredRoutine = DeferredRoutine;
  Dpc->DeferredContext = DeferredContext;
}

void __stdcall KeInitializeThreadedDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
{
  Dpc->Type = ThreadedDpcObject;       // 0x1A on NT 6.3 and later
  Dpc->Importance = MediumImportance;  // 1
  Dpc->Number = 0;
  Dpc->DpcData = NULL;
  Dpc->ProcessorHistory = 0;
  Dpc->DeferredRoutine = DeferredRoutine;
  Dpc->DeferredContext = DeferredContext;
}
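To check that the two reconstructions hold together, here is a user-mode mock that compiles and runs anywhere. MOCK_KDPC, the Mock* routines and the enum constants are hypothetical stand-ins built from the dt output and the values quoted from Geoff Chappell above, not the real WDK definitions. The listed offsets only line up on a 64-bit host, and checking the combined TargetInfoAsUlong value assumes little-endian byte order. As in the disassembly, DpcListEntry and the SystemArgument fields are never written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-mode mock of _KDPC; field order follows the dt output. */
typedef void (*MOCK_DEFERRED_ROUTINE)(void *Dpc, void *DeferredContext,
                                      void *SystemArgument1, void *SystemArgument2);

typedef struct {
    union {
        uint32_t TargetInfoAsUlong;            /* +0x000 */
        struct {
            uint8_t  Type;                     /* +0x000 */
            uint8_t  Importance;               /* +0x001 */
            uint16_t Number;                   /* +0x002 */
        } Fields;
    } Target;
    void *DpcListEntry;                        /* +0x008, not written */
    uint64_t ProcessorHistory;                 /* +0x010 */
    MOCK_DEFERRED_ROUTINE DeferredRoutine;     /* +0x018 */
    void *DeferredContext;                     /* +0x020 */
    void *SystemArgument1;                     /* +0x028, not written */
    void *SystemArgument2;                     /* +0x030, not written */
    void *DpcData;                             /* +0x038 */
} MOCK_KDPC;

/* Values as quoted from Geoff Chappell's tables above. */
enum { MockDpcObject = 0x13, MockThreadedDpcObject = 0x1A, MockMediumImportance = 1 };

/* Shared body for illustration only; the kernel has two separate copies. */
static void MockInitializeDpcCommon(MOCK_KDPC *Dpc, uint8_t Type,
                                    MOCK_DEFERRED_ROUTINE DeferredRoutine,
                                    void *DeferredContext) {
    Dpc->Target.Fields.Type = Type;
    Dpc->Target.Fields.Importance = MockMediumImportance;
    Dpc->Target.Fields.Number = 0;
    Dpc->DpcData = NULL;
    Dpc->ProcessorHistory = 0;
    Dpc->DeferredRoutine = DeferredRoutine;
    Dpc->DeferredContext = DeferredContext;
}

static void MockKeInitializeDpc(MOCK_KDPC *Dpc, MOCK_DEFERRED_ROUTINE DeferredRoutine,
                                void *DeferredContext) {
    MockInitializeDpcCommon(Dpc, MockDpcObject, DeferredRoutine, DeferredContext);
}

static void MockKeInitializeThreadedDpc(MOCK_KDPC *Dpc, MOCK_DEFERRED_ROUTINE DeferredRoutine,
                                        void *DeferredContext) {
    MockInitializeDpcCommon(Dpc, MockThreadedDpcObject, DeferredRoutine, DeferredContext);
}
```

After MockKeInitializeDpc runs, Target.TargetInfoAsUlong reads back as 0x113, and after MockKeInitializeThreadedDpc it reads back as 0x11A, matching the immediates in the two disassembly listings.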
Hope you enjoyed this walkthrough. If you have any questions, comments, or otherwise, feel free to drop them down below. I'm still very new to Windows kernel reversing, so it's quite possible I've stuffed something up, though I'm open to suggestions and improvements, so please do send them my way.

More kernel function reversing entries will be coming soon, although they may be separate entries as some of these are a bit longer than I originally expected with the extra background research I'm doing for them.

Thursday, December 6, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #4

Question: Implement the following functions in x86 assembly: strlen, strchr, memcpy, memset, strcmp, strset.

strlen
push ebp
mov ebp, esp
push edi; Save nonvolatile register.
push esi; Save nonvolatile register.
mov edi, [ebp+0x8]; Set EDI to the first argument passed in 
                  ; aka the pointer to the string.
mov esi, edi; Save original starting point of the string into ESI.
xor al, al; Set AL, or the byte being searched for, 
          ; to 0, aka a NULL byte.
mov ecx, 0xFFFFFFFF; Set ECX to the maximum count so REPNE only
                   ; stops once the NULL byte has been found.
cld; Clear the direction flag so EDI is incremented.
repne scasb; Compare the byte at EDI with AL, increment EDI and
           ; decrement ECX. Stop the loop once a NULL byte is hit.
sub edi, esi; EDI now points one byte past the NULL byte. Subtract
            ; the address of the start of the string in ESI to find
            ; the number of bytes read and save this number in EDI.
dec edi; Don't count the NULL byte itself, leaving the string length.
mov eax, edi; Return EDI or the string length.
pop esi; Restore original ESI value.
pop edi; Restore original EDI value.
mov esp, ebp; Restore ESP value
pop ebp; Restore frame pointer value
ret; Return
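For anyone wanting to sanity-check the REPNE SCASB arithmetic, here is a small C model of it. It assumes ECX is loaded with 0xFFFFFFFF and the direction flag is cleared before the scan; scasb_strlen is a made-up name. Note that EDI ends up one byte past the NULL terminator, which is why one is subtracted at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of a REPNE SCASB strlen, assuming ECX = 0xFFFFFFFF, AL = 0 and
 * DF = 0 beforehand. Each iteration compares AL with [EDI], advances EDI,
 * decrements ECX, and stops once the compared byte was the NULL byte. */
static size_t scasb_strlen(const char *str) {
    const char *esi = str;        /* saved start of the string */
    const char *edi = str;        /* scan pointer */
    uint32_t ecx = 0xFFFFFFFFu;   /* maximum repeat count */
    while (ecx != 0) {            /* REPNE also stops if ECX reaches 0 */
        int hit_null = (*edi == '\0');  /* SCASB: compare AL (0) with [EDI] */
        edi++;                    /* ...then EDI is incremented */
        ecx--;                    /* ...and ECX is decremented */
        if (hit_null)             /* REPNE stops once ZF is set */
            break;
    }
    return (size_t)(edi - esi) - 1;  /* "sub edi, esi" then drop the NULL byte */
}
```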

strchr
push ebp
mov ebp, esp; Save ESP value for later restoration. Part of function prologue.
push ebx ; EBX is not a volatile register, so save it onto 
         ; the stack for later restoration. EAX, ECX, and 
         ; EDX are volatile (https://en.wikibooks.org/wiki/X86_Assembly/High-Level_Languages#STDCALL)
mov edx, [ebp+0xC] ; Set EDX to the value of the second 
                   ; argument aka int character
mov ebx, [ebp+0x8] ; Set EBX to the value of the first
                   ; argument aka char * str
check_equal:
    movzx ecx, byte ptr [ebx]; Perform a memory read to get the value
                             ; of the current byte in the string,
                             ; zero-extended into ECX. Doing this at the
                             ; top of the loop saves us extra memory reads.
    cmp cl, dl ; Check if the current character in the string is
               ; the one being searched for. Only the low byte is
               ; compared, since strchr converts character to a char.
    mov eax, ebx ; Set EAX to the location where the character was
                 ; found to set the return value correctly.
    jz finish ; Jump to finish if the current byte in the string is
              ; the one being searched for. Checking this before the
              ; NULL check means searching for 0 returns a pointer
              ; to the terminator, as the C standard requires.
check_null:
    test cl, cl ; Check whether we have reached the end of the 
                ; string aka the NULL byte.
    mov eax, 0 ; Prepare to return 0, aka a NULL pointer, as the
               ; result. MOV does not change the flags set by TEST.
    jz finish ; Jump to finish if the current byte in the string 
              ; is NULL, aka we hit the end of the string.
loop_jump:
    inc ebx ; Increment EBX to point to the 
            ; next character in the string
    jmp check_equal ; Jump back to the top of the loop to
                    ; continue searching through the string.
finish:
    pop ebx ; Restore EBX back to its original value.
    mov esp, ebp ; Restore ESP back to its original value.
    pop ebp ; Set EBP or the frame pointer, back to its previous value.
    ret ; Return
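A C model of the intended behaviour is handy for checking a strchr listing against. Per the C standard, strchr converts character to char and treats the terminating NULL as part of the string, so searching for 0 should return a pointer to the terminator rather than NULL; the model therefore tests for a match before testing for the end of the string. model_strchr is a made-up name.

```c
#include <stddef.h>

/* C model of strchr: compare first, then check for the end of the string,
 * so that searching for '\0' finds the terminator. */
static char *model_strchr(const char *str, int character) {
    char target = (char)character;   /* only the low byte takes part */
    for (;;) {
        if (*str == target)          /* match found (including '\0') */
            return (char *)str;
        if (*str == '\0')            /* end of string without a match */
            return NULL;
        str++;                       /* move on to the next character */
    }
}
```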

memcpy
push ebp
mov ebp, esp ; Function prologue
push esi ; Save original ESI value, nonvolatile register.
push edi ; Save original EDI value, nonvolatile register.
mov ecx, [ebp+0x10] ; Set ECX to size or the number of bytes to copy over.
mov esi, [ebp+0xC] ; Set ESI to the value of the source argument.
mov edi, [ebp+0x8] ; Set EDI to the value of the destination argument.
cld ; Thanks to zerosum0x0 for pointing out in his articles that 
    ; this needs to be done. Otherwise we can't be sure that EDI 
    ; and ESI will be incremented when doing movsb operation.
rep movsb ; Move ECX number of bytes from ESI, or source to 
          ; EDI, or destination, incrementing EDI and ESI
          ; with each byte read.
mov eax, [ebp+0x8] ; Set EAX, or the return value, to destination.
pop edi ; Restore original EDI value.
pop esi ; Restore original ESI value.
mov esp, ebp
pop ebp ; Function epilogue
ret

memset
push ebp
mov ebp, esp ; Function prologue
push edi ; Save value of EDI register, nonvolatile register.
mov ecx, [ebp+0x10] ; Set ECX to the argument num 
                    ; or number of bytes to set.
mov eax, [ebp+0xC] ; Set EAX to the argument value or 
                   ; the value to set the bytes to.
mov edi, [ebp+0x8] ; Set EDI to the argument ptr or the 
                   ; address of the block of memory to fill.
cld ; Set the direction flag to 0 to ensure EDI is incremented
    ; every time the stosb instruction is executed.
rep stosb ; Fill ECX bytes of memory with the value in AL (the low
          ; byte of EAX), starting at the address in EDI. EDI will be 
          ; incremented each time the stosb instruction is executed.
mov eax, [ebp+0x8] ; Set the return value to the start of the
                   ; block of memory that was filled out.
pop edi ; Set EDI back to its original value.
mov esp, ebp
pop ebp ; Function epilogue
ret;
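Since STOSB stores AL, only the low byte of value is ever written, which lines up with the C standard's rule that memset converts value to unsigned char. A small C model of a REP STOSB memset (model_memset is a made-up name, and DF is assumed clear) makes this visible:

```c
#include <stddef.h>

/* C model of a REP STOSB memset with the direction flag clear. */
static void *model_memset(void *ptr, int value, size_t num) {
    unsigned char *edi = ptr;
    unsigned char al = (unsigned char)value;  /* STOSB only uses AL */
    size_t ecx = num;
    while (ecx != 0) {       /* REP stops when ECX reaches 0 */
        *edi++ = al;         /* STOSB: store AL at [EDI], then EDI++ */
        ecx--;
    }
    return ptr;              /* return value is the original pointer */
}
```

For example, passing 0x141 as value fills the buffer with 0x41 bytes, because the 0x100 bit lives outside AL.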

strset
push ebp ; Save previous frame pointer onto the stack.
mov ebp, esp ; Set EBP to the current stack pointer to set up the new frame.
mov eax, [ebp+0xC] ; Set EAX to the argument int c
mov ecx, [ebp+0x8]; Set ECX to the argument char * str
loop_start:
   cmp byte ptr [ecx], 0 ; Check if the current byte of the 
                         ; string being processed is NULL.
   jz finish ; Jump to the finish if it is.
   mov byte ptr [ecx], al ; Set the current byte in the string
                          ; to the value of the argument c.
   inc ecx ; Set ECX to point to the next byte in the string.
   jmp loop_start ; Jump back to the start of the loop.
finish:
  mov eax, [ebp+8] ; Set the return value to the address of 
                   ; the first byte of the string that was set.
  mov esp, ebp ; Restore previous ESP value
  pop ebp ; Restore previous stack frame value.
  ret

strcmp
push ebp
mov ebp, esp ; Function prologue code
push edi ; Save EDI value, nonvolatile register.
push esi ; Save ESI value, nonvolatile register.
mov edx, [ebp+0xC] ; Set EDX to str2
mov ecx, [ebp+0x8] ; Set ECX to str1
dec edx ; Subtract EDX by 1 since the later loop_start code will be 
        ; incrementing it by 1 for each iteration of the loop.
dec ecx ; Subtract ECX by 1 since the later loop_start code will be 
        ; incrementing it by 1 for each iteration of the loop.
loop_start:
    inc edx ; Increment EDX to point to the next byte in the str2 string.
    inc ecx ; Increment ECX to point to the next byte in the str1 string.
    movzx edi, byte ptr [edx] ; Zero-extend the current byte in str2 into EDI.
    movzx esi, byte ptr [ecx] ; Zero-extend the current byte in str1 into ESI.
    cmp esi, edi ; Compare current byte in str1 to current byte in str2.
    jl lower_str1 ; If the value of the current byte being processed 
                  ; in str1 is lower than the value of the current 
                  ; byte being processed in str2, then jump to lower_str1.
    jg greater_str1 ; If the value of the current byte being processed 
                  ; in str1 is greater than the value of the current 
                  ; byte being processed in str2, then jump to greater_str1.
    jmp compare_null ; If the current byte being processed in str1 is 
                     ; equal to the current byte being processed in str2,
                     ; then jump to compare_null to see if the byte 
                     ; that was compared is a NULL byte aka we have 
                     ; reached the end of both strings.
lower_str1:
    mov eax, -1 ; If the first character that does not match has a lower
                ; value in str1 than str2, return a value < 0. The value
                ; -1 will work for our purposes.
    jmp finish ; Jump to cleanup code.
greater_str1:
    mov eax, 1 ; If the first character that does not match has a 
               ; greater value in str1 than str2, return a value
               ; > 0. The value 1 will work for our purposes.
    jmp finish ; Jump to cleanup code.
compare_null:
    test edi, edi ; Check that the bytes currently being processed 
                  ; in str1 and str2, which have been determined to be 
                  ; the same as one another, are not NULL bytes.
    jnz loop_start ; If the bytes that were matched are not NULL 
                   ; bytes, continue the processing loop.
    mov eax, 0 ; Otherwise if they are both NULL bytes, set the
               ; return value to 0 to indicate the two strings 
               ; are the same as one another.
finish:
    pop esi ; Restore the original value of ESI.
    pop edi ; Restore the original value of EDI.
    mov esp, ebp
    pop ebp ; Function epilogue code
    ret
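Here is a C model of what a strcmp listing like this is meant to compute, useful for checking it against test strings. It compares bytes as unsigned char, as the C standard specifies for strcmp; loading the bytes with MOVZX (zero-extension) is what makes signed JL/JG comparisons on the 32-bit registers behave the same way, since no byte ever looks negative. model_strcmp is a made-up name.

```c
/* C model of a byte-by-byte strcmp returning -1, 0 or 1. */
static int model_strcmp(const char *str1, const char *str2) {
    const unsigned char *p1 = (const unsigned char *)str1;
    const unsigned char *p2 = (const unsigned char *)str2;
    for (;;) {
        if (*p1 < *p2) return -1;   /* lower_str1 */
        if (*p1 > *p2) return 1;    /* greater_str1 */
        if (*p1 == '\0') return 0;  /* compare_null: both bytes are NULL */
        p1++;                       /* move on to the next pair of bytes */
        p2++;
    }
}
```

The byte 0x80 compares greater than 'a' (0x61) here; with sign-extended loads it would have been treated as -128 and compared lower.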

Wednesday, November 21, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #3

Question: In some of the assembly listings, the function name has a @ prefix followed by a number. Explain when and why this decoration exists.


So first off I should clarify that technically the @ symbol isn't a prefix as described in the book but is rather a suffix which is part of the function's name. With that out of the way, I did some digging into the way this is used. An example of a function within the assembly listings referenced by this question was _DllMain@12. According to https://en.wikipedia.org/wiki/Name_mangling, this is a common way to perform name mangling on functions (all the following information is taken from this page so please refer to it if you wish to gain a more complete understanding. I'm just simplifying what they say and adding some examples).

So what exactly is name mangling? Simply put, when building a program the linker needs to know what functions are being used or declared by a program, their calling convention, and the size and types of their respective parameters. Additionally, mangling needs to occur if any functions are overloaded (aka multiple declarations of the same function name with different signatures, e.g. int aFunc(b) and int aFunc(b,c): both have the same function name but different signatures because the expected parameters differ), or if two functions have the same name but different namespaces (internalFunc::theFunc(a,b) and externalFunc::theFunc(a,b) would be an example).

In C the convention is as follows:
  • If the calling convention is CDECL, prepend _ to the function name.
  • If the calling convention is STDCALL, prepend _ to the function name. Then append @ after the function name, followed by the total number of bytes that all the parameters take up.
  • If the calling convention is FASTCALL, prepend @ to the function name. Then append @ after the function name, followed by the total number of bytes that all the parameters take up.
If you think about it, this makes sense. In CDECL, the caller is responsible for cleaning up the stack, so the callee's name doesn't need to record how many bytes of arguments it takes. In the case of STDCALL and FASTCALL though, the callee cleans up its own arguments (with a RET n instruction), so the caller and callee must agree on exactly how many bytes were passed. Encoding that count in the name means a mismatched declaration shows up as an unresolved symbol at link time rather than as a corrupted stack at run time.

So from this we can break down the definition of _DllMain@12. Since it starts with _ and ends with @ followed by a number, it is a STDCALL function. Additionally, based on the fact that it follows the conventions mentioned above, we can also conclude it is a C function and not a C++ function (which we will discuss shortly). The DllMain part tells the linker that the function name is DllMain, whilst the @12 part records that the function's arguments take up 12 bytes on the stack. This makes sense since DllMain has the definition DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved), and on 32-bit Windows each of these arguments takes 4 bytes: 3 arguments * 4 bytes gives us the 12 bytes in the mangled function name.
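To make the rules concrete, here is a small C helper that builds a decorated name from a function name, a calling convention and a list of argument sizes. The decorate function and CallConv enum are hypothetical, written only for illustration. Rounding each argument up to a 4-byte stack slot reflects how 32-bit x86 passes arguments, which is how I understand the byte count is derived.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } CallConv;

/* Hypothetical helper applying the C decoration rules listed above for
 * 32-bit x86. Each argument occupies whole 4-byte stack slots. */
static void decorate(char *out, size_t out_size, const char *name, CallConv conv,
                     const size_t *arg_sizes, size_t arg_count) {
    size_t total = 0;
    for (size_t i = 0; i < arg_count; i++)
        total += (arg_sizes[i] + 3) / 4 * 4;   /* round up to a 4-byte slot */
    switch (conv) {
    case CONV_CDECL:    snprintf(out, out_size, "_%s", name); break;
    case CONV_STDCALL:  snprintf(out, out_size, "_%s@%zu", name, total); break;
    case CONV_FASTCALL: snprintf(out, out_size, "@%s@%zu", name, total); break;
    }
}
```

Feeding it "DllMain", CONV_STDCALL and three 4-byte arguments produces _DllMain@12, matching the name from the assembly listings.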

Interestingly according to the Wikipedia page mentioned earlier, unlike C there appears to be no clear mangling method for C++ functions. For example the Borland compiler has one way of mangling function names whilst the Microsoft MSVC compiler has a very different way of mangling names (refer to https://en.wikipedia.org/wiki/Name_mangling for some examples).

Perhaps the best tutorial I've seen on actually demangling C++ names by hand was this tutorial though: http://www.int0x80.gr/papers/name_mangling.pdf. Worth a read if you have the time.

Another interesting note from the http://www.int0x80.gr/papers/name_mangling.pdf paper: if you're using IDA, you can change the way IDA displays mangled symbols by choosing Options->Demangled Names... and then clicking on the option buttons to either show the demangled name as a comment, not demangle names at all, or change the function name to the demangled version. This should help if you want to clean things up a bit in your view.

Tuesday, November 20, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #2

Question: In the example walkthrough, we did a nearly one-to-one translation of the assembly code to C. As an exercise, re-decompile this whole function so that it looks more natural. What can you say about the developer's skill level/experience? Explain your reasons. Can you do a better job?

My version of the original code after decompilation by hand into Visual Studio:
#include "stdafx.h"
#include <windows.h>
#include <intrin.h>
#include <string.h>
#include <tlhelp32.h>

// SIDT stores a 6-byte value: the 16-bit limit first, then the 32-bit base,
// so the structure must be packed and declare the limit before the base.
#pragma pack(push, 1)
typedef struct _IDTR {
    WORD limit;
    DWORD base;
} IDTR, *PIDTR;
#pragma pack(pop)

BOOL __stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
    IDTR idtr;
    tagPROCESSENTRY32 processEntry;
    HANDLE snapshot;
    __sidt(&idtr);
    if ((idtr.base > 0x8003F400) && (idtr.base < 0x80047400)) {
        return FALSE;
    }
    memset(&processEntry, 0, 0x128);
    snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snapshot == INVALID_HANDLE_VALUE) {
        return FALSE;
    }
    processEntry.dwSize = 0x128;
    if (Process32First(snapshot, &processEntry) == 0) {
        return FALSE;
    }
    // Keep walking the process list until explorer.exe is found.
    while (strcmp(processEntry.szExeFile, "explorer.exe") != 0) {
        if (Process32Next(snapshot, &processEntry) == 0) {
            return FALSE; // Assumption: give up if explorer.exe is never found.
        }
    }
    if (processEntry.th32ParentProcessID == processEntry.th32ProcessID) {
        return FALSE;
    }
    if (fdwReason == DLL_PROCESS_ATTACH) {
        CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)0x100032D0, NULL, 0, NULL);
    }
    return TRUE;
}
Based on the number of checks occurring here, I would say the developer seems to be aware of the limitations of some Windows functions and what needs to be checked before continuing. Therefore I would say that they have a good level of experience with the Win32 API. My guess, based on https://www.slideshare.net/MattiaSalvi2/virtual-machines-security-internals-detection-and-exploitation, is that the attackers were trying to identify whether the machine is potentially being virtualized so that they could prevent exploitation if such scenarios were encountered. However, as described in the book, this is a really bad way of doing things, as it assumes that the code is running on core 0.

Since each core/processor has its own individual IDT table, if this code was run on a secondary core or processor (aka not processor 0/core 0) then the IDT register value would not be the same as the values the attacker is expecting in this code. Similarly later versions of Windows also change the IDT base address across boots so this adds even less reliability to the technique utilized by the malware. These two points are noted on page 31 where the authors state "Clearly 0x8003F400 is only valid for core 0 on Windows XP. If the instruction were to be scheduled to run on another core, the IDTR would be 0xBAB3C590. On later versions of Windows, the IDT base address changes between reboots; hence the practice of hardcoding base addresses will not work".
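Plugging the IDT base values quoted above into the check from the DllMain reconstruction makes its behaviour concrete. idt_check_bails_out is a hypothetical helper, not part of the sample; note that both comparisons are strict, so even 0x8003F400 itself sits just outside the range.

```c
#include <stdint.h>

/* Hypothetical helper reproducing the IDT test from the DllMain
 * reconstruction: returns 1 if the sample would return FALSE early. */
static int idt_check_bails_out(uint32_t idt_base) {
    /* Both bounds are strict, mirroring "> 0x8003F400" and "< 0x80047400". */
    return idt_base > 0x8003F400u && idt_base < 0x80047400u;
}
```

The core 0 value 0x8003F400 and the other-core value 0xBAB3C590 both fall outside the range, while something like 0x80040000 falls inside it, so the outcome depends entirely on which core executes SIDT and on the hardcoded constants.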

As for whether one could improve on this, I did some research and found this paper: https://www.slideshare.net/MattiaSalvi2/virtual-machines-security-internals-detection-and-exploitation by Mattia Salvi, which mentions something quite similar to what the code in this example was trying to achieve (ironically though, the VM detection code there actually performs the opposite check to the one in the malware. My guess is that some old VM software versions may have been acting as virtual Windows XP machines back in the day, hence why malware may try to check the IDT base address), as well as a project called Scoopy.

Looking this up leads to ScoopyNG, located at http://www.trapkit.de/tools/scoopyng/, which is a project containing a number of anti-virtualization tricks that can be used to detect virtualization. Testing these out, most of the techniques didn't work anymore (VMware and VirtualBox have both gotten smarter about detecting anti-VM techniques, or at least the well-known ones); however, there were a few VMware-specific techniques that still worked on a recent version of VMware Workstation when I tested it. Keep in mind the ScoopyNG project is from 2008, though, so I'm sure there are a lot better techniques and mitigations out there today (10 years does change a lot of things technology-wise).

Based on the results of testing ScoopyNG on recent versions of VMware and VirtualBox, I'd imagine this code could be improved to not only use more specific techniques to determine if it is running in a virtual machine (and hence stop exploitation if so), but also to use something like GetVersionEx to help determine which version of the operating system it is running on, if the attackers wanted to target a specific range of operating systems (provided they were targeting OS versions prior to Windows 8.1, since from 8.1 onwards GetVersionEx reports version 6.2 unless the application is manifested for the newer version). This would also get around the limitation of the IDT technique, as regardless of which core GetVersionEx is run on, valid OS version information should still be returned.

On a side note, I'd be interested to know if anyone has a better method for detecting VMs that still works these days. The RedPill method that Joanna Rutkowska (http://www.securiteam.com/securityreviews/6Z00H20BQS.html) published has long since been mitigated, and most of the other techniques seem to be specific to the VM software being used. Therefore I'm curious whether VM detection strategies nowadays have migrated to platform-specific checks (like the ones ScoopyNG used to successfully detect it was running in VMware Workstation during my tests) or if there are still ways to do things generically (all the generic tests in VirtualBox failed, and ScoopyNG didn't have any VirtualBox-specific tests, so it thought it was running on a native machine).

Monday, November 12, 2018

Practical Reverse Engineering - Chapter 1 pg 25 - Author's Challenge Solution

Question: Take the example shown on page 24 and decompile it further to make it look more "natural"


Solution:

char * sub_1000AE3B(char * aStr){
    signed int length = lstrlenA(aStr);
    signed int counter1 = 0;
    signed int counter2 = 0;
    if(length != 0){
        while(counter2 < length){
            aStr[counter1] = aStr[counter2];
            counter2 += 3;
            counter1 += 1;
        }
    }
    aStr[counter1] = '\x00';
    return aStr;
}
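To try the solution outside Windows, here is a portable variant with lstrlenA swapped for the standard strlen. That swap is an assumption that holds for ordinary NULL-terminated ANSI strings, although unlike lstrlenA, strlen does not tolerate a NULL pointer. keep_every_third_char is a made-up name; the logic is the same loop with the redundant length check folded into the loop condition.

```c
#include <string.h>

/* Portable variant of sub_1000AE3B: keeps every third character of the
 * string in place (indices 0, 3, 6, ...) and re-terminates it. */
static char *keep_every_third_char(char *aStr) {
    int length = (int)strlen(aStr);
    int counter1 = 0;
    for (int counter2 = 0; counter2 < length; counter2 += 3) {
        aStr[counter1++] = aStr[counter2];
    }
    aStr[counter1] = '\0';
    return aStr;
}
```

For example, "aXXbYYc" becomes "abc".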

Additional Notes

The C translation on page 25 of the edition I am reading is incorrect. The authors provide an example whose assembly is as follows:
mov ecx, edx
loc_CFB8F:
    lodsd
    not eax
    stosd
    loop loc_CFB8F
They then state that this is the corresponding C code:
while (ecx != 0) {
     eax = *edi
     edi++;
     *esi = ~eax;
      esi++;
      ecx--;
}
If one refers back to their earlier remarks about how lodsd works, one will see that this instruction actually loads into EAX the DWORD at the location pointed to by ESI (refer to http://faydoc.tripod.com/cpu/lodsd.htm or the Intel x86 instruction manual if you need further confirmation). Similarly, stosd stores the value in EAX at the location pointed to by EDI, not ESI as shown above (see http://faydoc.tripod.com/cpu/stosd.htm).

Therefore the rough C code should actually be:
while (ecx != 0) {
    eax = *esi;
    esi++;
    *edi = ~eax;
    edi++;
    ecx--;
}
In other words, it's a simple swap of the ESI and EDI registers in the book's example C code.
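Here is a runnable C model of the lodsd/not/stosd/loop sequence, assuming the direction flag is clear; not_copy_dwords is a made-up name. One subtlety: LOOP decrements ECX after the body has run, so the assembly behaves like a do-while, and the while form above is equivalent as long as ECX starts out non-zero (with ECX = 0 the real LOOP would wrap around to 0xFFFFFFFF).

```c
#include <stdint.h>

/* C model of "mov ecx, edx / lodsd / not eax / stosd / loop" with DF = 0.
 * The body always runs at least once, so callers must pass ecx >= 1. */
static void not_copy_dwords(uint32_t *edi, const uint32_t *esi, uint32_t ecx) {
    do {
        uint32_t eax = *esi++;  /* LODSD: EAX = [ESI], ESI += 4 */
        eax = ~eax;             /* NOT EAX */
        *edi++ = eax;           /* STOSD: [EDI] = EAX, EDI += 4 */
    } while (--ecx != 0);       /* LOOP: ECX -= 1, repeat while ECX != 0 */
}
```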

Thursday, November 9, 2017

Practical Reverse Engineering - Chapter 1 pg 17 Exercise

Question 1


Question: Given what you learned about CALL and RET, explain how you would read the value of EIP? Why can't you just do MOV EAX, EIP?

Answer: The easiest way I could think of to read the current value of EIP was to do the following:

CALL *instruction after this call instruction* <- 5 Bytes
POP EAX <- 1 Byte (the instruction that we will point to)
ADD EAX, 4 <- 3 bytes

Essentially we will do a CALL to the POP EAX instruction, at which point ESP will point to the address where "POP EAX" is located in memory (aka the return address for the CALL instruction). POP EAX will ensure EAX gets set to this value. Adding 4 to EAX accounts for the 1-byte "POP EAX" instruction plus the 3 bytes of the "ADD EAX, 4" instruction, so EAX ends up holding the address of the instruction immediately after the ADD, which is the value of EIP at that point.

And there we go, we have EIP :)

To answer the other question we can refer to page 82 of https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf or section 3.5 of volume 1 of the Intel Software Developer's Manual. On this page they mention that EIP can only be controlled by control transfer instructions like CALL, JMP, etc. It cannot be directly accessed by software in x86.

Question 2


Question: Come up with at least two code sequences to set EIP to 0xAABBCCDD.

Answer:

Option 1
PUSH 0xAABBCCDD ; PUSH value onto the stack
RETN ; POP the value at the top of the stack into EIP and resume execution from this location.

Option 2
JMP 0xAABBCCDD ; Just straight up redirect execution. Why not make it simple after all?

Option 3
CALL 0xAABBCCDD ; Another option :)

Option 4 - Aka the fancier conditional option
XOR EAX, EAX ; Set EAX to zero
TEST EAX, EAX ; Check if EAX is 0
JZ 0xAABBCCDD ; Jump to 0xAABBCCDD if so.

Question 3


Question: In the example function, addme, what would happen if the stack pointer were not properly restored before executing RET?

Answer: Well, if the stack pointer were not restored, it would still point at the saved EBP value pushed by the PUSH EBP instruction, so RET would pop that value into EIP. In other words, we would return execution to the base pointer of the function that called the addme() function. Since this base pointer is an address on the stack, the program would try to execute data off of the stack itself as though it were code. What happens from there depends on the data on the stack, but most likely some invalid instructions would be executed and the program would crash. That assumes the stack is executable; on modern systems the more likely scenario is that DEP would kick in and you would get an ACCESS VIOLATION, as the stack would be marked as non-executable.

Question 4


Question: In all of the calling conventions explained the return value is stored in a 32-bit register (EAX). What happens when the return value does not fit in a 32-bit register? Write a program to experiment and evaluate your answer. Does the mechanism change from compiler to compiler?

Answer: According to my testing, the answer depends on how the program is written. As a test I wrote the following program:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
 long long num = 0x8888888822223344;
 return num;
}
The output of this when compiled with Dev-C++ 5.11 using the TDM-GCC 4.9.2 32-bit Release option and then disassembled with IDA Pro was as follows:
; int __cdecl main(int argc, const char **argv, const char **envp)
public _main
_main proc near

argc= dword ptr 8
argv= dword ptr 0Ch
envp= dword ptr 10h

push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 10h
call ___main
mov dword ptr [esp+8], 22223344h
mov dword ptr [esp+0Ch], 88888888h
mov eax, [esp+8]
leave
retn
_main endp
As can be seen here, GCC stores the 64-bit number on the stack, with the upper 32 bits at [ESP+0xC] and the lower 32 bits at [ESP+0x8]. However, it only returns the lower 32 bits of the number, as EAX is set to the value of [ESP+0x8], which contains just the lower 32 bits of 0x8888888822223344, aka 0x22223344.

Now for the interesting bit. So what if one sets the return value directly then? Well we get the following compilation warning:
[Warning] overflow in implicit constant conversion [-Woverflow]
Here is the program:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    return 0x8888888822223344;
}
And here is the corresponding assembly:
; int __cdecl main(int argc, const char **argv, const char **envp)
public _main
_main proc near

argc= dword ptr  8
argv= dword ptr  0Ch
envp= dword ptr  10h

push    ebp
mov     ebp, esp
and     esp, 0FFFFFFF0h
call    ___main
mov     eax, 22223344h
leave
retn
_main endp
Looks like our number got truncated again, but unlike last time, no attempt was made to store the upper 32 bits of the return value anywhere. Instead, the compiler performed the conversion at compile time and dropped the upper 32 bits entirely, so the only value that ever appears in the code is the lower 32 bits, aka 0x22223344.
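The reason no stack slots show up this time is that converting a constant is itself a constant expression, so the compiler can work out the result before any code is generated. The sketch below (names are hypothetical) makes that visible with a C11 _Static_assert, which only compiles if the truncated value is known at compile time. It converts to uint32_t rather than int because the unsigned conversion is defined by the C standard as reduction modulo 2^32; converting an out-of-range value to a signed int, as in the original program, is implementation-defined, and GCC documents that it also reduces modulo 2^32.

```c
#include <stdint.h>

/* Truncating a constant is an integer constant expression, so the
   compiler can fold it, much like GCC's single MOV EAX, 22223344h. */
#define TRUNCATED_RETURN ((uint32_t)0x8888888822223344ULL)

/* Fails to compile unless only the low 32 bits survive the conversion. */
_Static_assert(TRUNCATED_RETURN == 0x22223344u,
               "only the low 32 bits survive the conversion");

/* Returns the already-folded constant; no 64-bit value exists at run time. */
uint32_t folded_return(void) {
    return TRUNCATED_RETURN;
}
```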

Finally, let's try Visual Studio 2017 Community. Compiling the first example produces the following warning:
c:\users\*redacted*\consoleapplication2\consoleapplication2.cpp(10): warning C4244: 'return': conversion from '__int64' to 'int', possible loss of data
That looks like a fairly accurate description of what might happen here :) But let's check the disassembly to be sure:
; int __cdecl main_0(int argc, const char **argv, const char **envp)
_main_0 proc near

var_D0= byte ptr -0D0h
var_C= dword ptr -0Ch
var_8= dword ptr -8
argc= dword ptr  8
argv= dword ptr  0Ch
envp= dword ptr  10h

push    ebp
mov     ebp, esp
sub     esp, 0D0h
push    ebx
push    esi
push    edi
lea     edi, [ebp+var_D0]
mov     ecx, 34h
mov     eax, 0CCCCCCCCh
rep stosd
mov     [ebp+var_C], 22223344h
mov     [ebp+var_8], 88888888h
mov     eax, [ebp+var_C]
pop     edi
pop     esi
pop     ebx
mov     esp, ebp
pop     ebp
retn
_main_0 endp
Yep, looks like the warning was correct: the return value is truncated to just its lower 32 bits before being returned to the caller.
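All of the experiments above return from main(), whose return type is int, so the truncation comes from the conversion to int rather than from a hard limit of EAX. A variation worth trying is a function whose declared return type is itself 64 bits wide, as in the sketch below (the function names are hypothetical). On 32-bit x86, both GCC and MSVC return 64-bit integers in the EDX:EAX register pair, with the upper 32 bits in EDX and the lower 32 bits in EAX. Compiling this as a 32-bit binary and opening it in IDA should therefore show EDX being loaded with the upper half alongside EAX instead of the upper half being dropped; treat that as the expected pattern rather than a captured listing.

```c
/* Hypothetical test function. Unlike main(), its declared return type is
   64 bits wide, so the compiler must hand all 64 bits back to the caller.
   On 32-bit x86, GCC and MSVC return such values in EDX:EAX
   (EDX = upper 32 bits, EAX = lower 32 bits). */
unsigned long long return_big_value(void) {
    return 0x8888888822223344ULL;
}

/* Caller side: after the CALL, the compiler combines EDX and EAX back
   into one 64-bit value. Returns 1 if all 64 bits arrived intact. */
int big_value_survives(void) {
    unsigned long long v = return_big_value();
    return v == 0x8888888822223344ULL;
}
```

Building without optimization keeps the CALL in place; with optimization on, the compiler may inline return_big_value and the register hand-off disappears from the listing.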

Practical Reverse Engineering - Chapter 1 pg 11 Exercise 1

Problem

"This function uses a combination of SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8 respectively. Next explain what this snippet does."

Disassembly:

1   mov edi, [ebp+8]
2   mov edx, edi
3   xor eax, eax
4   or ecx, 0FFFFFFFFh
5   repne scasb
6   add ecx, 2
7   neg ecx
8   mov al, [ebp+0xC]
9   mov edi, edx
10  rep stosb
11  mov eax, edx

Explanation:

Line 1: We set EDI to be the value at EBP+8

Line 2: Save a copy of EDI, aka the value read from EBP+8, in EDX.

Line 3: Clear out EAX, aka set EAX to 0.

Line 4: Set ECX to 0xFFFFFFFF, or -1, by OR'ing all of 
        its bits with 1. Since 0 OR 1 = 1 and 
        1 OR 1 = 1, every bit ends up set, 
        making ECX -1.

Line 5: Compare the byte pointed to by EDI (initially the
        start of the string at EBP+8) with AL, which is 0,
        then advance EDI by one. With the REPNE prefix this
        repeats, decrementing ECX by 1 each time, until the
        byte matches AL, i.e. until we hit the NULL byte
        terminator. This is essentially a strlen operation,
        except that the NULL byte itself is also counted.

Line 6: Add 2 to ECX. For a 7-character string, SCASB 
        executes 8 times (7 characters plus the NULL byte), 
        so ECX goes from -1 to -9. Adding 2 makes it -7, 
        aka the negative equivalent of the number of 
        characters in the string before the null byte.

Line 7: Use the NEG ECX instruction to take the two's
        complement of ECX, negating it and transforming it
        from a negative number into a positive one. In our
        example, ECX would go from -7 to 7.

Line 8: Move byte held at EBP+0xC into AL.

Line 9: Set EDI back to the start of the string (the pointer 
        read from EBP+8), using the copy we saved in EDX.

Line 10: ECX times, aka the number of characters in the 
         string as determined earlier, write the byte in 
         AL (the value from EBP+0xC) into the string 
         pointed to by EDI, advancing EDI each time.

Line 11: Move EDX, aka the start of the string buffer
         or EBP+8, into EAX so we can return it to 
         the calling function.
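To double-check the walkthrough, here is a small C model of the snippet that follows it instruction by instruction, with each register as an ordinary variable. The function name scas_stos_snippet is made up, and the model assumes the direction flag is clear (as it normally is), so EDI moves forward through the string.

```c
#include <stdint.h>

/* Instruction-by-instruction C model of the SCASB/STOSB snippet.
   ebp_8 stands for the pointer at [EBP+8], ebp_c for the byte at [EBP+0xC]. */
char *scas_stos_snippet(char *ebp_8, char ebp_c) {
    unsigned char *edi = (unsigned char *)ebp_8;  /* 1: mov edi, [ebp+8] */
    unsigned char *edx = edi;                     /* 2: mov edx, edi */
    uint32_t eax = 0;                             /* 3: xor eax, eax */
    uint32_t ecx = 0xFFFFFFFFu;                   /* 4: or ecx, -1 */

    /* 5: repne scasb. Compare AL with [EDI], advance EDI, decrement ECX,
          and stop once the bytes match (ZF=1) or ECX reaches 0. */
    while (ecx != 0) {
        unsigned char byte = *edi++;
        ecx--;
        if (byte == (unsigned char)eax)
            break;
    }

    ecx += 2;                                     /* 6: add ecx, 2 */
    ecx = 0u - ecx;                               /* 7: neg ecx */
    eax = (eax & ~0xFFu) | (unsigned char)ebp_c;  /* 8: mov al, [ebp+0Ch] */
    edi = edx;                                    /* 9: mov edi, edx */

    /* 10: rep stosb. Store AL at [EDI] and advance EDI, ECX times. */
    while (ecx != 0) {
        *edi++ = (unsigned char)eax;
        ecx--;
    }

    return (char *)edx;                           /* 11: mov eax, edx */
}
```

Calling it on a writable buffer holding "hello" with 'A' scans 6 bytes (5 characters plus the NULL), leaves ECX at 5 after the ADD and NEG, and turns the buffer into "AAAAA" with the terminator untouched. The returned pointer is the buffer itself.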
So in essence this could be simplified down to the following in C:
/* str is the pointer stored at [EBP+8], c is the byte at [EBP+0xC] */
size_t len = strlen(str);
memset(str, c, len);
return str;
We can also answer the other question now: [EBP+8] is a pointer to a NULL-terminated string (a char *), and [EBP+0xC] is the byte value itself (a char, not a pointer) that we want to fill the string with, which is why it is loaded straight into AL. The snippet then returns the original string pointer in EAX.

Hope that helps!

-tekwizz123