Monday, December 10, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #5 - KeInitializeDPC and KeInitializeThreadedDpc

Question: Decompile the following kernel routines in Windows:
  • KeInitializeDpc

KeInitializeDpc
So for starters lets see if there is any MSDN documentation on this function. Looking things up we see this link: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-keinitializedpc
So from this we can tell the function header should be the following:
NTKERNELAPI VOID KeInitializeDpc(
    PRKDPC Dpc,
    PKDEFERRED_ROUTINE DeferredRoutine,
    PVOID DeferredContext
);
The corresponding disassembly of this function can be seen below:
; void __stdcall KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
public KeInitializeDpc
KeInitializeDpc proc near
xor     eax, eax
mov     dword ptr [rcx], 113h
mov     [rcx+38h], rax
mov     [rcx+10h], rax
mov     [rcx+18h], rdx
mov     [rcx+20h], r8
retn
KeInitializeDpc endp
Looking at the definition of this function, we immediately notice that RCX appears to be some sort of structure that is getting filled out. According to x64 documentation, the first argument is passed in RCX, the second in RDX, the third in R8 and the fourth in R9. This would mean that RCX corresponds to PRKDPC Dpc.

But what is PRKDPC? If we refer back again to https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-keinitializedpc, we can see this is a pointer to a KDPC object. Using the magic of WinDBG, we can dump the outline of this structure, even though its supposed to be an opaque structure (aka "we may change this structure so don't rely on the details of it remaining the same"), using the dt command. Specifically, dt nt!_KDPC will give us the details we need:
0:010> dt nt!_KDPC
ntdll!_KDPC
   +0x000 TargetInfoAsUlong : Uint4B
   +0x000 Type             : UChar
   +0x001 Importance       : UChar
   +0x002 Number           : Uint2B
   +0x008 DpcListEntry     : _SINGLE_LIST_ENTRY
   +0x010 ProcessorHistory : Uint8B
   +0x018 DeferredRoutine  : Ptr64     void 
   +0x020 DeferredContext  : Ptr64 Void
   +0x028 SystemArgument1  : Ptr64 Void
   +0x030 SystemArgument2  : Ptr64 Void
   +0x038 DpcData          : Ptr64 Void
Ok so now that we know what offsets correspond to different parts of the KDPC object, we can translate the x64 assembly code into the following C code:
NTKERNELAPI VOID KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext){
  PRKDPC->TargetInfoAsUlong = 0x113;
  PRKDPC->DpcData = NULL;
  PRKDPC->ProcessorHistory = 0;
  PRKDPC->DeferredRoutine = DeferredRoutine;
  PRKDPC->DeferredContext = DeferredContext;
}
If we check this against the HexRays disassembly we can see this is pretty much the same output:
void __stdcall KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
{
  Dpc->TargetInfoAsUlong = 275;
  Dpc->DpcData = 0i64;
  Dpc->ProcessorHistory = 0i64;
  Dpc->DeferredRoutine = (void (__fastcall *)(_KDPC *, void *, void *, void *))DeferredRoutine;
  Dpc->DeferredContext = DeferredContext;
}
So now that is out of the way, the question remains why TargetInfoAsUlong is specifically set to a value of 0x113. Referring back to the WinDBG dump, we can see that TargetInfoAsUlong is actually a structure that contains a one byte Type field, a one byte Importance field, and a two byte Number field.

Therefore setting the value of TargetInfoAsUlong to 0x113 actually sets Type to 0x13, Importance to 0x1 and Number to 0x0. To get a better idea of the KDPC structure I tried referring to ReactOS, specifically https://doxygen.reactos.org/d9/d82/struct__KDPC.html, however there is no documentation there on any of the fields, just the KDPC structure itself.

So with ReactOS out of the way I ended up turning to Geoff Chappel's website, which has a TON of really cool blog posts on research he has done into the Windows kernel. Specifically he has a brilliant article on KDPC available at https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/ke/dpcobj/kdpc.htm. From here we can learn that the value of the Importance field is actually MediumImportance, which is the default value for a KDPC object unless it is changed via KeSetImportanceDpc.

This is kind of ironic as if we look at the code for KeSetImportanceDpc we will see what has to be the world's simplest kernel function:
public KeSetImportanceDpc
KeSetImportanceDpc proc near
mov     [rcx+1], dl
retn
KeSetImportanceDpc endp
Which makes sense as this is essentially just taking in a a KDPC object along with a byte value as arguments, and is then setting the Importance flag of the KDPC object to the byte value specified.

If we now turn our attention to the Type field we can see that Geoff Chappel notes that this field is DpcObject for a normal DPC or ThreadedDpcObject for a threaded DPC. Looking up DpcObject brings us to https://www.geoffchappell.com/studies/windows/km/ntoskrnl/structs/kobjects.htm which shows that on Windows kernel versions later than 3.51 (aka any modern OS), the value for DpcObject is 0x13.

For those interested, according to Geoff Chappel ThreadedDpcObject is apparently 0x1A on NT 6.3 and later (aka Windows 8.1 and later), however its value from NT 5.2 to 6.2 (Windows XP to Windows 8) was 0x18). This can be confirmed if I decompile the function KeInitializeThreadedDpc on a Windows 10 build:
KeInitializeThreadedDpc proc near
xor     eax, eax
mov     dword ptr [rcx], 11Ah
mov     [rcx+38h], rax
mov     [rcx+10h], rax
mov     [rcx+18h], rdx
mov     [rcx+20h], r8
retn
KeInitializeThreadedDpc endp
Again this is pretty much the same code as before, the only difference is that this time the Type value is being set 0x1A, or ThreadedDpcObject. Otherwise the code is exactly the same as KeInitializeDpc.

Finally Geoff Chappel notes that the Number field denotes the importance or the processor on which this DPC will be run. I assume given that this is set to 0 that the idea is its either being either set to its initial priority or its being set to run on processor 0. With this knowledge we can create a better decompiled version of both KeInitializeDpc and KeInitializeThreadedDpc:
void __stdcall KeInitializeDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
{
  PRKDPC->Type = DpcObject;
  PRKDPC->Importance = MediumImportance;
  PRKDPC->Number = 0;
  PRKDPC->DpcData = NULL;
  PRKDPC->ProcessorHistory = 0;
  PRKDPC->DeferredRoutine = DeferredRoutine;
  PRKDPC->DeferredContext = DeferredContext;
}

void __stdcall KeInitializeThreadedDpc(PRKDPC Dpc, PKDEFERRED_ROUTINE DeferredRoutine, PVOID DeferredContext)
{
  PRKDPC->Type = ThreadedDpcObject;
  PRKDPC->Importance = MediumImportance;
  PRKDPC->Number = 0;
  PRKDPC->DpcData = NULL;
  PRKDPC->ProcessorHistory = 0;
  PRKDPC->DeferredRoutine = DeferredRoutine;
  PRKDPC->DeferredContext = DeferredContext;
}
Hope you enjoyed this walkthrough. If you have any questions, comments, or otherwise feel free to drop them down below. I'm still very new to Windows kernel reversing so its quite possible I've stuffed something up, though I'm open to suggestions and improvements so please do send them my way.

More kernel function reversing entries will be coming soon, although they may be separate entries as some of these are a bit longer than I originally expected with the extra background research I'm doing for them.

Thursday, December 6, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #4

Question: Implement the following functions in x86 assembly: strlen, strchr, memcpy, memset, strcmp, strset.

strlen
push ebp
mov ebp, esp
push edi; Save nonvolatile register.
push esi; Save nonvolatile register.
mov edi, [ebp+0x8]; Set ESI to the first argument passed in 
                  ; aka the pointer to the string.
mov esi, edi; Save original starting point of the string into ESI.
xor al, al; Set AL or the byte being searched for, 
          ; to 0, or a NULL byte.
repne scasb; Keep incrementing EDI and comparing it to see 
           ; if it is a NULL byte. Stop loop when a NULL 
           ; byte is hit.
sub edi, esi; EDI will contain the address of a NULL byte. Subtract
            ; this address by ESI or the address of the start of the
            ; string to find the number of bytes read aka the string 
            ; length and save this number in EDI.
mov eax, edi; Return EDI or the string length.
pop esi; Restore original ESI value.
pop edi; Restore original EDI value.
mov esp, ebp; Restore ESP value
pop ebp; Restore frame pointer value
ret; Return

strchr
push ebp
mov ebp, esp; Save ESP value for later restoration. Part of function prologue.
push ebx ; EBX is not a volatile register, so save it onto 
         ; the stack for later restoration. EAX, ECX, and 
         ; EDX are volatile (https://en.wikibooks.org/wiki/X86_Assembly/High-Level_Languages#STDCALL)
mov edx, [ebp+0xC] ; Set EDX to the value of the second 
                   ; argument aka int character
mov ebx, [ebp+0x8] ; Set EBX to the value of the first
                   ; argument aka char * str
check_null:
    mov ecx, byte ptr [ebx]; Perform a memory read to get the value
                           ; of the current byte in the string.
                           ; Doing this at the top of the loop saves
                           ; us having to do extra memory reads.
    cmp ecx, 0 ; Check that we haven't reached the end of the 
               ; string aka the NULL byte.
    mov  eax, 0 ; Prepare to return 0, aka a 
                ; NULL pointer, as the result.
    jz finish ; Jump to finish if the current byte in the string 
              ; is NULL, aka we hit the end of the string.
check_equal:
    cmp ecx, edx ; Check if current character in string
                 ; is the one being searched for.
    mov eax, ebx ; Set EAX to the location where the character was
                 ; found to set return value correctly.
    jz finish ; Jump to finish if the current byte in the
              ; string is the one being searched for.
loop_jump:
    inc ebx ; Increment EBX to point to the 
            ; next character in the string
    jmp check_null ; Jump back to the top of the loop to
                   ; continue searching through the string.
finish:
    pop ebx ; Restore EBX back to its original value.
    mov esp, ebp ; Restore ESP back to its original value.
    pop ebp ; Set EBP or the frame pointer, back to its previous value.
    ret ; Return

memcpy
push ebp
mov ebp, esp ; Function prologue
push esi ; Save original ESI value, nonvolatile register.
push edi ; Save original EDI value, nonvolatile register.
mov ecx, [ebp+0x10] ; Set ECX to size or the number of bytes to copy over.
mov esi, [ebp+0xC] ; Set ESI to the value of the source argument.
mov edi, [ebp+0x8] ; Set EDI to the value of the destination argument.
cld ; Thanks to zerosum0x0 for pointing out in his articles that 
    ; this needs to be done. Otherwise we can't be sure that EDI 
    ; and ESI will be incremented when doing movsb operation.
rep movsb ; Move ECX number of bytes from ESI, or source to 
          ; EDI, or destination, incrementing EDI and ESI
          ; with each byte read.
mov eax, [ebp+0x8] ; Set EAX, or the return value, to destination.
pop edi ; Restore original EDI value.
pop esi ; Restore original ESI value.
mov esp, ebp
pop ebp ; Function epilogue
ret

memset
push ebp
mov ebp, esp ; Function prologue
push edi ; Save value of EDI register, nonvolatile register.
mov ecx, [ebp+0x10] ; Set ECX to the argument num 
                    ; or number of bytes to set.
mov eax, [ebp+0xC] ; Set EAX to the argument value or 
                   ; the value to set the bytes to.
mov edi, [ebp+0x8] ; Set EDI to the argument ptr or the 
                   ; address of the block of memory to fill.
cld ; Set the direction flag to 0 to ensure EDI is incremented
    ; every time the stosb instruction is executed.
rep stosb ; Fill ECX bytes of memory with the value in EAX,
          ; starting at the address in EDI. EDI will be 
          ; incremented each time the stosb instruction is executed.
mov eax, [ebp+0x8] ; Set the return value to the start of the
                   ; block of memory that was filled out.
pop edi ; Set EDI back to its original value.
mov esp ebp
pop ebp ; Function epilogue
ret;

strset
push ebp ; Save previous frame pointer onto the stack.
mov ebp, esp ; Set EBP to point to the current frame pointer.
mov eax, [ebp+0xC] ; Set EAX to the argument int c
mov ecx, [ebp+0x8]; Set ECX to the argument char * str
loop_start:
   cmp byte ptr [ecx], 0 ; Check if the current byte of the 
                         ; string being processed is NULL.
   jz finish ; Jump to the finish if it is.
   mov byte ptr [ecx], al ; Set the current byte in the string
                          ; to the value of the argument c.
   inc ecx ; Set ECX to point to the next byte in the string.
   jmp loop_start ; Jump back to the start of the loop.
finish:
  mov eax, [ebp+8] ; Set the return value to the address of 
                   ; the first byte of the string that was set.
  mov esp, ebp ; Restore previous ESP value
  pop ebp ; Restore previous stack frame value.
  ret

strcmp
push ebp
mov ebp, esp ; Function prologue code
push edi ; Save EDI value, nonvolatile register.
push esi ; Save ESI value, nonvolatile register.
mov edx, [ebp+0xC] ; Set EDX to str2
mov ecx, [ebp+0x8] ; Set ECX to str1
dec edx ; Subtract EDX by 1 since the later loop_start code will be 
        ; incrementing it by 1 for each iteration of the loop.
dec ecx ; Subtract ECX by 1 since the later loop_start code will be 
        ; incrementing it by 1 for each iteration of the loop.
loop_start:
    inc edx ; Increment EDX to point to the next byte in the str2 string.
    inc ecx ; Increment ECX to point to the next byte in the str1 string.
    mov edi, byte ptr [edx] ; Set EDI to current byte in str2 being processed.
    mov esi, byte ptr [ecx] ; Set ECX to current byte in str1 being processed.
    cmp esi, edi ; Compare current byte in str1 to current byte in str2.
    jl lower_str1 ; If the value of the current byte being processed 
                  ; in str1 is lower than the value of the current 
                  ; byte being processed in str2, then jump to lower_str1.
    jg greater_str1 ; If the value of the current byte being processed 
                  ; in str1 is greater than the value of the current 
                  ; byte being processed in str2, then jump to greater_str1.
    jmp compare_null ; If the current byte being processed in str1 is 
                     ; equal to the current byte being processed in str2,
                     ; then jump to compare_null to see if we the byte 
                     ; that was compared is a NULL byte aka we have 
                     ; reached the end of both strings.
lower_str1:
    mov eax, -1 ; If the first character that does not match has a lower
                ; value in str1 than str2, return a value < 0. The value
                ; -1 will work for our purposes.
    jmp finish ; Jump to cleanup code.
greater_str1:
    mov eax, 1 ; If the first character that does not match has a 
               ; greater value in str1 than str2, return a value
               ; > 0. The value 1 will work for our purposes.
    jmp finish ; Jump to cleanup code.
compare_null:
    test edi, edi ; Check that the bytes being currently being processed 
                  ; in str1 and str2, which have been determined to be 
                  ; the same as one another, are not NULL bytes.
    jnz loop_start ; If the bytes that were matched are not NULL 
                   ; bytes, continue the processing loop.
    mov eax, 0 ; Otherwise if they are both NULL bytes, set the
               ; return value to 0 to indicate the two strings 
               ; are the same as one another.
finish:
    pop esi ; Restore the original value of ESI.
    pop edi ; Restore the original value of EDI.
    mov esp, ebp
    pop ebp ; Function epilogue code
    ret

Wednesday, November 21, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #3

Question: In some of the assembly listings, the function name has a @ prefix followed by a number. Explain when and why this decoration exists.


So first off I should clarify that technically the @ symbol isn't a prefix as described in the book but is rather a suffix which is part of the function's name. With that out of the way, I did some digging into the way this is used. An example of a function within the assembly listings referenced by this question was _DllMain@12. According to https://en.wikipedia.org/wiki/Name_mangling, this is a common way to perform name mangling on functions (all the following information is taken from this page so please refer to it if you wish to gain a more complete understanding. I'm just simplifying what they say and adding some examples).

So what exactly is name mangling? Simply put, when compiling a program the linker needs to know what functions are being used or declared by a program, their calling convention, and the size and types of the respective parameters. Additionally if any functions are overloaded (aka multiple declarations of the same function but with different signatures (ex making two declarations such as int aFunc(b) and int aFunc(b,c). Both have the same function name but different signatures as the expected parameters are different), or two functions with the same name but different namespaces (internalFunc::theFunc(a,b) and externalFunc::theFunc(a,b) would be an example), then mangling needs to occur.

In C the convention is as follows:
  • If the calling convention is CDECL, append _ before the function name.
  • If the calling convention is STDCALL, append _ before the function name. Then append @ after the function name, followed by the total number of bytes that all the parameters take up.
  • If the calling convention is FASTCALL, then add @ before the function name. Then append @ after the function name, followed by the total number of bytes that all the parameters take up.
If you think about it this makes sense though. In CDECL, the caller is responsible for cleaning up the stack, so there is no need for the linker to really care about how many bytes are used in the callee, this is handled by the caller. In the case of STDCALL and FASTCALL though, the linker needs to be aware of how many bytes these functions use for their arguments so that it can generate code that appropriately cleans up the stack for these functions.

So from this we can break down the definition of _DllMain@12. Since it starts with _ and ends with @ followed by a number, it is a STDCALL function. Additionally based on the fact that it follows the conventions mentioned above we can also conclude it is a C function and not a C++ function (which will we show shortly). The DllMain part tells the linker that the function name is DllMain whist the @12 part tells the linker to reserve 12 bytes on the stack for the arguments. This makes sense since DllMain has the definition DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved), which means each argument takes 4 bytes each. 4 * 3 arguments gives us the 12 bytes mentioned in the mangled function definition.

Interestingly according to the Wikipedia page mentioned earlier, unlike C there appears to be no clear mangling method for C++ functions. For example the Borland compiler has one way of mangling function names whilst the Microsoft MSVC compiler has a very different way of mangling names (refer to https://en.wikipedia.org/wiki/Name_mangling for some examples).

Perhaps the best tutorial I've seen on actually demangling C++ names by hand was this tutorial though: http://www.int0x80.gr/papers/name_mangling.pdf. Worth a read if you have the time.

Also interesting note that was mentioned in the http://www.int0x80.gr/papers/name_mangling.pdf paper but if your using IDA you can change the way IDA displays mangled symbols by choosing Options->Demangled Names.... and then clicking on the option buttons to either set the demangled name as a comment, not demangle names at all, or change the function name to the demangled version. Should help if you want to clean things up a bit in your view.

Tuesday, November 20, 2018

Practical Reverse Engineering - Chapter 1 pg 35 - Exercise #2

Question: In the example walkthrough, we did a nearly one-to-one translation of the assembly code to C. As an exercise, re-decompile this whole function so that it looks more natural. What can you say about the developer's skill level/experience? Explain your reasons. Can you do a better job?

My version of the original code after decompilation by hand into Visual Studio:
#include "stdafx.h"
#include <intrin .h="">
#include <tlhelp32 .h="">

typedef struct _IDTR {
    DWORD base;
    SHORT limit;
} IDTR, *PIDTR;

BOOL _stdcall DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved) {
    IDTR idtr;
    tagPROCESSENTRY32 processEntry;
    HANDLE snapshot;
    __sidt(&idtr);
    if ((idtr.base > 0x8003F400) && (idtr.base < 0x80047400)) {
        return FALSE;
    }
    memset(&processEntry, 0, 0x128);
    snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snapshot == INVALID_HANDLE_VALUE) {
        return FALSE;
    }
    processEntry.dwSize = 0x128;
    if (Process32First(snapshot, &processEntry) == 0) {
        return FALSE;
    }
    while (strcmp(processEntry.szExeFile, "explorer.exe") == 0) {
        Process32Next(snapshot, &processEntry);
    }
    if (processEntry.th32ParentProcessID == processEntry.th32ProcessID) {
        return FALSE;
    }
    if (fdwReason == DLL_PROCESS_ATTACH) {
        CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)0x100032D0, NULL, 0, NULL);
    }
    return TRUE;
}
Based off of the number of checks that are occurring here I would say the developer seems to be aware of the limitations of some Windows functions and what needs to be checked before continuing. Therefore I would say that they have a good level of experience with the Win32 API. My guess based on https://www.slideshare.net/MattiaSalvi2/virtual-machines-security-internals-detection-and-exploitation is that the attackers were trying to identify if the machine is potentially being virtualized so that they could prevent exploitation if such scenarios where encountered, however as described in the book this is a really bad way of doing things as it assumes that the code is running on core 0.

Since each core/processor has its own individual IDT table, if this code was run on a secondary core or processor (aka not processor 0/core 0) then the IDT register value would not be the same as the values the attacker is expecting in this code. Similarly later versions of Windows also change the IDT base address across boots so this adds even less reliability to the technique utilized by the malware. These two points are noted on page 31 where the authors state "Clearly 0x8003F400 is only valid for core 0 on Windows XP. If the instruction were to be scheduled to run on another core, the IDTR would be 0xBAB3C590. On later versions of Windows, the IDT base address changes between reboots; hence the practice of hardcoding base addresses will not work".

As for if one could improve this, I did some research into things and found this paper: https://www.slideshare.net/MattiaSalvi2/virtual-machines-security-internals-detection-and-exploitation by Matta Salvi which mentions something quite similar to what the code in this example was trying to achieve (ironically though VM detection code actually performs the opposite of what the code in the malware is doing. My guess is some old VM software versions may have been acting as virtual Windows XP machines back in the day, hence why malware may try check the IDT base address) as well as a project called Scoopy.

Looking this up links to ScoopyNG located at http://www.trapkit.de/tools/scoopyng/ which is a project containing a number of Anti-Virtualization tricks that can be used to detect virtualization. Testing these out most of the techniques didn't work anymore (VMWare and VirtualBox have both gotten smarter about detecting Anti-VM techniques, or at least the well known ones), however there were a few VMWare specific techniques that still worked on a recent version of VMWare Workstation when I tested it out. Keep in mind ScoopyNG project is from 2008 though so I'm sure there are a lot better techniques and mitigations out there today (10 years does change a lot of things technology wise).

On the results of testing ScoopyNG on recent versions of VMWare and VirtualBox, I'd imagine this code could be improved to not only use more specific techniques to determine if they are in a virtual machine (and hence stop exploitation if so), but also something like GetVersionEx to help determine what version of the operating system they are running on if they wanted to target a specific range of operating systems (provided they were targeting OS versions prior to Windows 8.1). This would also get around the limitation of the IDT technique used as regardless of which core the GetVersionEx function is run on, valid OS version information should still be returned.

On a side note though I'd be interested to know if anyone has a better method for detecting VMs that still works these days. The RedPill method that Joanna Rutkowska (http://www.securiteam.com/securityreviews/6Z00H20BQS.html) published has long since been mitigated and most of the other techniques seem to be specific to the VM software being utilized. Therefore I'm curious if VM detection strategies nowadays have migrated to performing platform specific attacks (like the checks that SnoopyNG did to successfully detect it was running in VMWare Workstation during my tests) or if there are still ways to do things generically (all the generic test in VirtualBox failed and SnoopyNG didn't have any VirtualBox specific tests so it thought it was running in a native machine).

Monday, November 12, 2018

Practical Reverse Engineering - Chapter 1 pg 25 - Author's Challenge Solution

Question: Take the example shown on page 24 and decompile it further to make it look more "natural"


Solution:

char * sub_1000AE3B(char * aStr){
    signed int length = lstrlenA(aStr);
    signed int counter1 = 0;
    signed int counter2 = 0;
    if(length != 0){
        while(counter2 < length){
            aStr[counter1] = aStr[counter2];
            counter2 += 3;
            counter1 += 1;
        }
    }
    aStr[counter1] = '\x00';
    return aStr;
}

Additional Notes

The disassembly on page 25 of the edition I am reading is incorrect. In the example given the authors provide an example whose assembly is as follows:
01 mov ecx, edx
02 loc_CFB8F:
03     lodsd
04     not eax
05     stosd
06     loop loc_CFB8F
They then state that this is the corresponding disassembly to C:
while (ecx != 0) {
     eax = *edi
     edi++;
     *esi = ~eax;
      esi++;
      ecx--;
}
If one refers back to their earlier remarks about how lodsd works, one will see that this command actually loads into EAX the DWORD at the location pointed to by ESI (refer to http://faydoc.tripod.com/cpu/lodsd.htm or the x86 instruction manual by Intel if you need further confirmation). Similarly, stosd actually stores the value at EAX into EDI, not ESI as shown above (see http://faydoc.tripod.com/cpu/stosd.htm).

Therefore the rough C code should actually be:
while (ecx != 0) {
     eax = *esi
     esi++;
     *edi = ~eax;
      edi++;
      ecx--;
}
Aka a simple switch of the ESI and EDI registers in the example C code provided.