Wednesday, November 21, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #3

Question: In some of the assembly listings, the function name has a @ prefix followed by a number. Explain when and why this decoration exists.


First off, I should clarify that in a name like _DllMain@12 the @ symbol isn't a prefix, as the book describes it, but a suffix that is part of the function's decorated name. With that out of the way, I did some digging into how this is used. An example from the assembly listings referenced by this question is _DllMain@12. According to https://en.wikipedia.org/wiki/Name_mangling, this is a common way to perform name mangling on functions. (All of the following information is taken from that page, so please refer to it if you want a more complete understanding. I'm just simplifying what it says and adding some examples.)

So what exactly is name mangling? Simply put, when a program is built, the linker needs to know which functions the program uses or declares, their calling convention, and the size and types of their parameters. Mangling is also needed when functions are overloaded, meaning there are multiple declarations with the same name but different signatures (for example, int aFunc(b) and int aFunc(b,c) share a name but expect different parameters), or when two functions share a name but live in different namespaces (for example, internalFunc::theFunc(a,b) and externalFunc::theFunc(a,b)).

In C (on 32-bit x86 Windows compilers) the convention is as follows:
  • If the calling convention is CDECL, prepend _ to the function name.
  • If the calling convention is STDCALL, prepend _ to the function name. Then append @ after the function name, followed by the total number of bytes that all the parameters take up.
  • If the calling convention is FASTCALL, prepend @ to the function name. Then append @ after the function name, followed by the total number of bytes that all the parameters take up.
If you think about it, this makes sense. In CDECL, the caller is responsible for cleaning up the stack, so the callee's symbol does not need to record how many bytes of arguments it takes. In STDCALL and FASTCALL, the callee cleans up the stack itself, so the caller and the callee have to agree on exactly how many bytes of arguments are passed. Encoding that byte count in the symbol name means a mismatch between the two shows up as an unresolved symbol at link time rather than as a corrupted stack at run time.
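To make the three rules above concrete, here is a minimal sketch in C that builds the decorated name from an undecorated one. The CallConv enum and the decorate_c_name function are names made up for this post, not part of any SDK, and the sketch only covers the 32-bit x86 conventions listed above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical enum for this sketch; not a Windows SDK type. */
typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } CallConv;

/* Writes the decorated form of `name` into `out` (capacity `cap`).
 * `arg_bytes` is the total size of all parameters; it is ignored for cdecl.
 * Returns what snprintf returns: the length the full decorated name needs,
 * or a negative value on error. */
int decorate_c_name(const char *name, CallConv conv, unsigned arg_bytes,
                    char *out, size_t cap)
{
    switch (conv) {
    case CONV_CDECL:    /* _name */
        return snprintf(out, cap, "_%s", name);
    case CONV_STDCALL:  /* _name@bytes */
        return snprintf(out, cap, "_%s@%u", name, arg_bytes);
    case CONV_FASTCALL: /* @name@bytes */
        return snprintf(out, cap, "@%s@%u", name, arg_bytes);
    }
    return -1; /* unknown convention */
}
```

Calling decorate_c_name("DllMain", CONV_STDCALL, 12, buf, sizeof buf) fills buf with _DllMain@12, which is the symbol from the book's listing.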

From this we can break down _DllMain@12. Since it starts with _ and ends with @ followed by a number, it is a STDCALL function. Because it follows the conventions mentioned above, we can also conclude that it is a C function and not a C++ function (C++ names are mangled differently, as noted below). The DllMain part is the function's name, whilst the @12 part says that its arguments occupy 12 bytes on the stack. This makes sense since DllMain has the definition DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved), and on 32-bit x86 each of those arguments takes 4 bytes. 4 bytes * 3 arguments gives us the 12 bytes in the mangled name.
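Going the other way, the breakdown above can be sketched as a small classifier in C. The Decoration enum and classify_symbol are hypothetical names for this post, and the function only recognises the three C patterns described earlier; anything else (including C++ mangled names) is reported as unknown.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum { DECO_UNKNOWN, DECO_CDECL, DECO_STDCALL, DECO_FASTCALL } Decoration;

/* Classifies a decorated C symbol such as "_DllMain@12".
 * For stdcall/fastcall symbols, *arg_bytes receives the number after the
 * trailing '@'; otherwise it is set to 0. */
Decoration classify_symbol(const char *sym, unsigned long *arg_bytes)
{
    const char *at;
    char *end;
    unsigned long n;

    *arg_bytes = 0;
    if (sym[0] != '_' && sym[0] != '@')
        return DECO_UNKNOWN;          /* e.g. MSVC C++ names start with '?' */

    at = strrchr(sym + 1, '@');       /* last '@' after the leading character */
    if (at == NULL)
        return sym[0] == '_' ? DECO_CDECL : DECO_UNKNOWN;

    /* The name must be non-empty and the suffix must be all decimal digits. */
    if (at == sym + 1 || at[1] < '0' || at[1] > '9')
        return DECO_UNKNOWN;
    n = strtoul(at + 1, &end, 10);
    if (*end != '\0')
        return DECO_UNKNOWN;

    *arg_bytes = n;
    return sym[0] == '_' ? DECO_STDCALL : DECO_FASTCALL;
}
```

For _DllMain@12 this reports STDCALL with 12 argument bytes, and 12 / 4 gives the three 4-byte arguments of DllMain.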

Interestingly, according to the Wikipedia page mentioned earlier, there is no single standard mangling scheme for C++ functions, unlike the C conventions above. For example, the Borland compiler has one way of mangling function names whilst the Microsoft MSVC compiler has a very different one (refer to https://en.wikipedia.org/wiki/Name_mangling for some examples).

Perhaps the best tutorial I've seen on demangling C++ names by hand is http://www.int0x80.gr/papers/name_mangling.pdf. Worth a read if you have the time.

Another interesting note from the http://www.int0x80.gr/papers/name_mangling.pdf paper: if you're using IDA, you can change the way it displays mangled symbols by choosing Options->Demangled Names... and then selecting whether to show the demangled name as a comment, not demangle names at all, or replace the function name with the demangled version. This should help if you want to clean up your view a bit.

Tuesday, November 20, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #2

Question: In the example walkthrough, we did a nearly one-to-one translation of the assembly code to C. As an exercise, re-decompile this whole function so that it looks more natural. What can you say about the developer's skill level/experience? Explain your reasons. Can you do a better job?

My version of the original code after decompilation by hand into Visual Studio:
#include "stdafx.h"
#include <intrin.h>
#include <tlhelp32.h>

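// SIDT stores 6 bytes on 32-bit x86: a 16-bit limit followed by a 32-bit base,
// so the struct must be packed and must list the limit first.
#pragma pack(push, 1)
typedef struct _IDTR {
    USHORT limit;
    DWORD base;
} IDTR, *PIDTR;
#pragma pack(pop)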

BOOL _stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
    IDTR idtr;
    tagPROCESSENTRY32 processEntry;
    HANDLE snapshot;
    __sidt(&idtr);
    if ((idtr.base > 0x8003F400) && (idtr.base < 0x80047400)) {
        return FALSE;
    }
    memset(&processEntry, 0, 0x128);
    snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snapshot == INVALID_HANDLE_VALUE) {
        return FALSE;
    }
    processEntry.dwSize = 0x128;
    if (Process32First(snapshot, &processEntry) == 0) {
        return FALSE;
    }
    // Walk the process list until explorer.exe is found or the list ends.
    while (strcmp(processEntry.szExeFile, "explorer.exe") != 0) {
        if (!Process32Next(snapshot, &processEntry)) {
            break;
        }
    }
    if (processEntry.th32ParentProcessID == processEntry.th32ProcessID) {
        return FALSE;
    }
    if (fdwReason == DLL_PROCESS_ATTACH) {
        CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)0x100032D0, NULL, 0, NULL);
    }
    return TRUE;
}
Based on the number of checks occurring here, I would say the developer is aware of the limitations of some Windows functions and of what needs to be checked before continuing, so they appear to have a good level of experience with the Win32 API. My guess, based on https://www.slideshare.net/MattiaSalvi2/virtual-machines-security-internals-detection-and-exploitation, is that the attackers were trying to identify whether the machine was being virtualized so that they could abort if such a scenario was encountered. However, as described in the book, this is a really bad way of doing things, as it assumes that the code is running on core 0.

Since each core/processor has its own IDT, if this code were run on a secondary core or processor (that is, anything other than processor 0/core 0), the IDT register value would not match the values the attacker expects in this code. Later versions of Windows also change the IDT base address across boots, which makes the technique used by the malware even less reliable. These two points are noted on page 31, where the authors state "Clearly 0x8003F400 is only valid for core 0 on Windows XP. If the instruction were to be scheduled to run on another core, the IDTR would be 0xBAB3C590. On later versions of Windows, the IDT base address changes between reboots; hence the practice of hardcoding base addresses will not work".
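To see how brittle the hardcoded comparison is, the check can be pulled out of DllMain and run against the two base addresses quoted from the book. This is only a sketch of the arithmetic: idt_base_triggers_bailout is a made-up name, the constants are the ones from my decompilation above, and it says nothing about what base a real VM would report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range bounds taken from the decompiled DllMain above. */
#define IDT_RANGE_LOW  0x8003F400u
#define IDT_RANGE_HIGH 0x80047400u

/* Returns true when the decompiled code would take its early "return FALSE".
 * Both comparisons are strict, exactly as in the decompilation. */
bool idt_base_triggers_bailout(uint32_t idt_base)
{
    return idt_base > IDT_RANGE_LOW && idt_base < IDT_RANGE_HIGH;
}
```

With the values from the book's quote, neither 0x8003F400 (core 0 on Windows XP) nor 0xBAB3C590 (another core) falls strictly inside the range, so the early return is not taken for either of them. The outcome depends entirely on the base address of whichever core happens to execute the instruction.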

As for whether one could improve this, I did some research and found this paper: https://www.slideshare.net/MattiaSalvi2/virtual-machines-security-internals-detection-and-exploitation by Mattia Salvi, which mentions something quite similar to what the code in this example was trying to achieve, as well as a project called Scoopy. (Ironically, though, the VM detection code there performs the opposite of what the code in the malware is doing. My guess is that some old VM software versions may have been acting as virtual Windows XP machines back in the day, which is why malware might try to check the IDT base address.)

Looking this up leads to ScoopyNG, located at http://www.trapkit.de/tools/scoopyng/, a project containing a number of anti-virtualization tricks that can be used to detect virtualization. When I tested these, most of the techniques no longer worked (VMware and VirtualBox have both gotten smarter about countering anti-VM techniques, or at least the well-known ones). However, a few VMware-specific techniques still worked on a recent version of VMware Workstation. Keep in mind that the ScoopyNG project is from 2008, though, so I'm sure there are much better techniques and mitigations out there today (10 years changes a lot technology-wise).

Based on the results of testing ScoopyNG on recent versions of VMware and VirtualBox, I'd imagine this code could be improved not only by using more specific techniques to determine whether it is running in a virtual machine (and stopping if so), but also by using something like GetVersionEx to determine which version of the operating system it is running on, if the authors wanted to target a specific range of operating systems (provided they were targeting OS versions prior to Windows 8.1). This would also get around the limitation of the IDT technique, since GetVersionEx should return valid OS version information regardless of which core it runs on.
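As a rough idea of what the GetVersionEx approach might look like, here is a minimal sketch in C. The helper names (cmp_version, version_in_range, running_on_targeted_windows) and the XP-to-Windows-7 target range are assumptions made for illustration; only GetVersionExA and the OSVERSIONINFOA fields are real Win32 API. The Windows-specific part is guarded so the comparison logic can be compiled anywhere.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Orders two major.minor pairs: negative, zero or positive like strcmp. */
static int cmp_version(unsigned a_major, unsigned a_minor,
                       unsigned b_major, unsigned b_minor)
{
    if (a_major != b_major) return a_major < b_major ? -1 : 1;
    if (a_minor != b_minor) return a_minor < b_minor ? -1 : 1;
    return 0;
}

/* True when lo <= (major.minor) <= hi, bounds inclusive. */
bool version_in_range(unsigned major, unsigned minor,
                      unsigned lo_major, unsigned lo_minor,
                      unsigned hi_major, unsigned hi_minor)
{
    return cmp_version(major, minor, lo_major, lo_minor) >= 0 &&
           cmp_version(major, minor, hi_major, hi_minor) <= 0;
}

#ifdef _WIN32
/* Assumed target range for this sketch: Windows XP (5.1) to Windows 7 (6.1).
 * GetVersionExA is deprecated, so newer MSVC toolchains warn about it, and
 * from Windows 8.1 onwards its result depends on the application manifest. */
bool running_on_targeted_windows(void)
{
    OSVERSIONINFOA info;
    ZeroMemory(&info, sizeof info);
    info.dwOSVersionInfoSize = sizeof info;
    if (!GetVersionExA(&info))
        return false;
    return version_in_range(info.dwMajorVersion, info.dwMinorVersion,
                            5, 1, 6, 1);
}
#endif
```

Unlike the IDT check, nothing here depends on which core the thread is scheduled on, which is the point made above.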

On a side note, I'd be interested to know if anyone has a better method for detecting VMs that still works these days. The RedPill method that Joanna Rutkowska published (http://www.securiteam.com/securityreviews/6Z00H20BQS.html) has long since been mitigated, and most of the other techniques seem to be specific to the VM software being used. I'm therefore curious whether VM detection strategies nowadays have migrated to platform-specific checks (like the ones ScoopyNG used to successfully detect that it was running in VMware Workstation during my tests) or whether there are still ways to do things generically (all the generic tests failed in VirtualBox, and ScoopyNG didn't have any VirtualBox-specific tests, so it thought it was running on a native machine).