Decompiling specific routines within an executable is far easier than decompiling the entire program, especially when the work is being done by hand. For a single routine, there is a basic process that can be repeated as necessary to produce high-level code from a disassembled listing. Entire applications may bring up issues of compression or encryption, OOP-style calls-by-reference, program segments, application resources, and so on ... these tend to complicate the decompilation process, if not obstruct it entirely. Whenever possible, it is better to narrow the scope of one's work to a handful of vital routines, rather than on the executable as a whole.
The decompilation process in itself is simple:
Consider the following assembly routine:
0800A0C1 89 E5 mov ebp, esp
0800A0C3 57 push edi
0800A0C4 56 push esi
0800A0C5 53 push ebx
0800A0C6 8B 5D 08 mov ebx, [ebp+08h]
0800A0C9 8B 75 0C mov esi, [ebp+0Ch]
0800A0CC 03 75 14 add esi, [ebp+14h]
0800A0CF 03 75 10 add esi, [ebp+10h]
0800A0D2 01 DE add esi, ebx
0800A0D4 89 F0 mov eax, esi
0800A0D6 BF E8 03 00 00 mov edi, 3E8h
0800A0DB 99 cdq
0800A0DC F7 FF idiv edi
0800A0DE 8D 72 07 lea esi, [edx+7]
0800A0E1 81 FE 10 27 00 00 cmp esi, 2710h
0800A0E7 76 0F jbe short 800A0F8
0800A0E9 68 D5 5B 06 08 push offset str_WrongLicenseDate
0800A0EE 6A 00 push 0
0800A0F0 E8 5B 25 00 00 call fatalError
0800A0F5 83 C4 08 add esp, 8
0800A0F8 BA 07 00 00 00 mov edx, 7
0800A0FD 8B 4D 18 mov ecx, [ebp+18h]
0800A100 80 39 00 cmp byte ptr [ecx], 0
0800A103 74 11 jz short 800A116
0800A105 90 nop
0800A106 90 nop
0800A107 90 nop
0800A108 8D 14 52 lea edx, [edx+edx*2]
0800A10B 0F BE 01 movsx eax, byte ptr [ecx]
0800A10E 01 C2 add edx, eax
0800A110 41 inc ecx
0800A111 80 39 00 cmp byte ptr [ecx], 0
0800A114 75 F2 jnz short 800A108
0800A116 31 C9 xor ecx, ecx
0800A118 39 F1 cmp ecx, esi
0800A11A 7D 1C jge short 800A138
0800A11C 8D 04 52 lea eax, [edx+edx*2]
0800A11F 8B 55 0C mov edx, [ebp+0Ch]
0800A122 01 C2 add edx, eax
0800A124 8D 04 5B lea eax, [ebx+ebx*2]
0800A127 8D 04 43 lea eax, [ebx+eax*2]
0800A12A 34 F0 xor al, 0F0h
0800A12C 8D 1C 45 03 00 00 00 lea ebx, ds:3[eax*2]
0800A133 41 inc ecx
0800A134 39 F1 cmp ecx, esi
0800A136 7C E4 jl short 800A11C
0800A138 53 push ebx
0800A139 52 push edx
0800A13A 68 E8 5B 06 08 push offset str_%x%x
0800A13F 68 98 77 08 08 push offset cBuffer
0800A144 E8 47 B4 03 00 call sprintf
0800A149 B8 98 77 08 08 mov eax, offset cBuffer
0800A14E 8D 65 F4 lea esp, [ebp-0Ch]
0800A151 5B pop ebx
0800A152 5E pop esi
0800A153 5F pop edi
0800A154 89 EC mov esp, ebp
0800A156 5D pop ebp
0800A157 C3 retn
Function Prototype
The first order of business is to determine the calling convention of the routine. The invoking code looks as follows:
0800A207 push esi
0800A208 mov eax, [year]
0800A20E push eax
0800A20F mov eax, [day]
0800A215 push eax
0800A216 mov eax, [month]
0800A21C push eax
0800A21D mov eax, [num]
0800A223 push eax
0800A224 call 0800A0C1
0800A229 push eax
0800A22A push ebx
0800A22B call strcmp
0800A230 add esp, 1Ch
Notice that 0800A230 adjusts the stack for 7 DWORDs, representing the
parameters for the strcmp call and the call of the target routine inclusive.
Since the calling function is responsible for cleaning up the stack, we
can assume that the routine uses the C calling convention.
The next step is to provide a basic C function declaration in place of the stack frame. In the disassembly, the stack frame appears as follows:
0800A0C1 89 E5 mov ebp, esp ; enter subroutine
0800A0C3 57 push edi ; preserve registers
0800A0C4 56 push esi
0800A0C5 53 push ebx
...
0800A149 B8 98 77 08 08 mov eax, offset cBuffer ;load return value
0800A14E 8D 65 F4 lea esp, [ebp-0Ch] ;cleanup stack
0800A151 5B pop ebx ;restore registers
0800A152 5E pop esi
0800A153 5F pop edi
0800A154 89 EC mov esp, ebp ;leave subroutine
0800A156 5D pop ebp
0800A157 C3 retn ;return from call
This can be replaced with
*char stringMunch( num, month, day, year, reg_ESI ){
...
return(Buffer);
}
Reformatting Calls
The code can be tightened up a bit more by rewriting the calls in the routine using proper C syntax; this will clear up a few PUSHes, as well as the ADD ESP cleanup after each one. The two calls
0800A0E9 68 D5 5B 06 08 push offset str_WrongLicenseDate
0800A0EE 6A 00 push 0
0800A0F0 E8 5B 25 00 00 call fatalError
0800A0F5 83 C4 08 add esp, 8
0800A138 53 push ebx
0800A139 52 push edx
0800A13A 68 E8 5B 06 08 push offset str_%x%x
0800A13F 68 98 77 08 08 push offset cBuffer
0800A144 E8 47 B4 03 00 call sprintf
can be rewritten as
fatalError( NULL, *str_WrongLicenseDate );
sprintf( *Buffer, "%x%x", reg_EDX, reg_EBX);
... the register values will be taken care of later.
Naming Parameters
Once the calling convention is known, and preliminary names chosen for the passed parameters, the [ebp+##] values can be replaced with the chosen parameter names. In this case,
Register Elimination
The constant shuffling of values to and from registers is one of the most confusing aspects of assembly language; it is important to constantly clean up unused or 'dead' registers while decompiling, in order to simplify the job of translating assembly language to C.
Quite often this invloves creating higher-level expressions out of assembly language; the code fragment
0800A0C6 8B 5D 08 mov ebx, num
0800A0C9 8B 75 0C mov esi, month
0800A0CC 03 75 14 add esi, day
0800A0CF 03 75 10 add esi, year
0800A0D2 01 DE add esi, ebx
0800A0D4 89 F0 mov eax, esi
can be converted to
reg_EAX = ( month + day + year + num );
Thus eliminating [at the moment] the need for the ESI and EBX registers.
Creating High-level Expressions
Translating asm instructions into higher-level expressions is what the majority of decompiling work consists of; the rest of the program consists of execution flow control and data shuffling. There are three types of expressions recognized by higher-level languages: assignment, logical, and mathematical expressions.
Assignment expressions are represented with '=' in C; in assembly, they generally consist of MOV and LEA instructions.
Logical expressions consist of AND, OR, NEG, and XOR; in C these are represented by '&', '|', '~', and '^'. The left-shift and right-shift instructions are represented by '<<' and '>>' in C, though they are usually indicative of multiplication.
Mathematical expressions are more varied in assembler; however in C they all resolved to one of '+', '-', '*', '/', and '%'.
0800A0D4 reg_EAX = ( month + day + year + num );
0800A0D6 BF E8 03 00 00 mov edi, 3E8h
0800A0DB 99 cdq
0800A0DC F7 FF idiv edi
0800A0DE 8D 72 07 lea esi, [edx+7]
The CDQ serves only to facilitate the QWORD division handled by
the IDIV instruction; after this code has executed, EAX will contain
the quotient and edx the remainder. Since only EDX is used, the IDIV
can be recoded in C as a modulus expression:
reg_ESI = ((day + opt + month + year) % 1000 )+ 7
Notice that the redundant EAX assignment has been removed from the expression.
Also, the LEA instruction is used as a mathematical instruction rather than
as an assignment; this is a compiler optimization that will recur frequently
in the target routine.After converting much of the assembler to assignment and mathematical expressions, we have the following tightened-up code:
*char stringMunch( num, month, day, year, reg_ESI ){
reg_EBX = year
reg_ESI = ((day + opt + month + year) % 1000 )+ 7
cmp reg_ESI, 2710h
jbe short 800A0F8
fatalError( NULL, *str_WrongLicenseDate );
800A0F8:
reg_EDX = 7
reg_ECX = param_reg_ESI
cmp reg_ECX, 0
jz short 800A116
800A108:
reg_EDX *= 3
reg_EDX += reg_ECX
reg_ECX++
cmp reg_ECX, 0
jnz short 800A108
800A116
reg_ECX = 0
cmp reg_ECX, REG_ESI
jge short 800A138
800A11C:
reg_EAX = reg_EDX*3
reg_EDX = month
reg_EDX += reg_EAX
reg_EAX = reg_EBX * 7
xor al, 0F0h
reg_EBX = (reg_EAX *2) + 3
reg_ECX++
cmp ecx, esi
jl short 800A11C
800A138:
sprintf( *Buffer, "%x%x", reg_EDX, reg_EBX);
return(Buffer);
}
For clarity, the registers can be cleaned up a bit to yield
800A0F8:
reg_EDX = 7;
cmp param_reg_ESI, 0
jz short 800A116
800A108:
reg_EDX = (reg_EDX * 3) + param_reg_ESI;
param_reg_ESI++;
cmp param_reg_ESI, 0
jnz short 800A108
800A116
param_reg_ES = 0;
cmp param_reg_ESI, REG_ESI
jge short 800A138
800A11C:
reg_EDX = (reg_EDX * 3) + month
year = (((year * 7)^ 0F0h) *2) + 3
param_reg_ESI++
cmp param_reg_ESI, REG_ESI
jl short 800A11C
Reconstruct Control Structures
By now the code is abstracted enough that the underlying logic is apparent. The first construct is obviously an IF statement:
reg_ESI = ((day + opt + month + year) % 1000 )+ 7;
if ( reg_ESI > 10,000){
fatalError( NULL, *str_WrongLicenseDate );
}
Immediately following this is a WHILE loop, evidenced by the
"is param_reg_ESI = 0" check. ESI is typically used as an index
into an array [and therefore a string]; since in C strings are
null terminated, this code translates to a "while NOT end-of-string"
loop. For convenience, we "change param_reg_ESI" to "ptrString":
reg_EDX = 7;
while ( *ptrString > 0 ) {
reg_EDX = (reg_EDX * 3) + *ptrString;
ptrString++;
}
After this comes one more while loop, using the current value of ESI:
ptrString = 0;
while (ptrString < reg_ESI )
{
reg_EDX = (reg_EDX * 3) + month;
year = (((year * 7)^ 0F0h) *2) + 3;
ptrString++;
}
The ESI register is put to use here as a 'magic number', part of the
mathematical black box of this number crunching routine. The EDX
register is used to house the key that will later be written, along
with a manipulation of the year, to a global string using sprintf().
Wrapping it up
At this point the routine is completely rewritten in C; it can be incorporated into a source code recovery effort for the entire program, or used in another program entirely in order to duplicate the registration process of the target.
/* global data */
char[] Buffer;
/* RegInfo Number Cruncher Routine */
char* stringMunch( num, month, day, year, reg_ESI ){
long magic = ((day + opt + month + year) % 1000 )+ 7;
if ( magic > 10,000){
fatalError( NULL, *str_WrongLicenseDate );
}
key = 7;
while ( *ptrString > 0 ) {
key = (key * 3) + *ptrString;
ptrString++;
}
ptrString = 0;
while (ptrString < magic ){
key = (key * 3) + month;
year = (((year * 7)^ 0F0h) *2) + 3;
ptrString++
}
sprintf( *Buffer, "%x%x", key, year );
return( Buffer );
}