?

Log in

tachikoma valentines

Assembly review, part 1 of 3

None of this will be new to most of my readers… because I’m pretty sure I have 3 at most, and 2 of them should know this stuff already.  But!  This is for the sake of the third.

Say you’re given a disassembly.  For example, here’s some that might be outputted by objdump -d fun.o:
fun.o:     file format elf32-i386

Disassembly of section .text:

00000000 <fun>:
   0:        55                           push   %ebp
   1:        89 e5                        mov    %esp,%ebp
   3:        53                           push   %ebx
   4:        83 ec 08                     sub    $0x8,%esp
   7:        83 7d 08 01                  cmpl   $0x1,0x8(%ebp)
   b:        77 08                        ja     15 <fun+0x15>
   d:        8b 45 08                     mov    0x8(%ebp),%eax
  10:        89 45 f8                     mov    %eax,-0x8(%ebp)
  13:        eb 23                        jmp    38 <fun+0x38>
  15:        8b 45 08                     mov    0x8(%ebp),%eax
  18:        83 e8 02                     sub    $0x2,%eax
  1b:        89 04 24                     mov    %eax,(%esp)
  1e:        e8 fc ff ff ff               call   1f <fun+0x1f>
  23:        89 c3                        mov    %eax,%ebx
  25:        8b 45 08                     mov    0x8(%ebp),%eax
  28:        83 e8 01                     sub    $0x1,%eax
  2b:        89 04 24                     mov    %eax,(%esp)
  2e:        e8 fc ff ff ff               call   2f <fun+0x2f>
  33:        01 c3                        add    %eax,%ebx
  35:        89 5d f8                     mov    %ebx,-0x8(%ebp)
  38:        8b 45 f8                     mov    -0x8(%ebp),%eax
  3b:        83 c4 08                     add    $0x8,%esp
  3e:        5b                           pop    %ebx
  3f:        5d                           pop    %ebp
  40:        c3                           ret   
Let’s work step-by-step to figure out what this is.

Assumption: cdecl calling convention.

Assumption: fun is a function.  Stuff other than functions can and do end up in the .text section, but this seems to disassemble just fine, so it’s probably executable code.  So let’s start with a skeleton:
… fun(…);

Bytes 0-6 are the function prelude: saving the old %ebp, setting %ebp to the top of our frame, and growing our stack by 0x8.

Quick, what does the stack look like?
....
....^-- arguments are above the return address
....<-- return address
    v-- top of our frame -- %ebp now points here
....<-- the eight bytes of our frame
....<-- that we just reserved...
    ^-- bottom of our frame -- %esp now points here

Bytes 7-c compare the literal value 1 to the 4-byte value at an address 8 bytes above %ebp — in other words, the first argument — then if (unsigned)arg1 > 1, we jump to byte 15 (ja is the unsigned version of jg, jump-if-greater.)  Now we know that fun takes at least one argument, the first of which is an unsigned 4-byte value, so:
… fun(unsigned arg1, …);

Suppose we didn’t take the jump, meaning that (unsigned)arg1 ≤ 1.  Bytes d-14 copy arg1 to %eax, then %eax to the bottom of our frame — that’s a local variable.
… fun(unsigned arg1, …) {
    unsigned local1;
    if (arg1 <= 1)
        local1 = arg1;
    …
}


Alright, so suppose that bytes b-c do jump us to 15.  Bytes 15-24 copy arg1 to %eax, subtract 2, push it on the stack, recursively call fun, then save the return value in %ebx.  Bytes 25-33 copy arg1 to %eax, subtract 1, push it on the stack, recursively call fun, then add the return value to %ebx.  Bytes 35-37 then take %ebx, the sum, and saves it to what we decided to call local1.  From this, we can gather that fun takes a single argument, since that’s all we push onto the stack before calling it, and returns a 4-byte value, since we treat %eax as one:
unsigned fun(unsigned arg1) {
    unsigned local1;
    if (arg1 <= 1)
        local1 = arg1;
    else
        local1 = fun(arg1 - 2) + fun(arg1 - 1);
    …
}


Bytes 38-40 copy local1 to %eax — the return register — then go through the standard teardown of our frame and return.  We’ve completed analyzing this function:
unsigned fun(unsigned arg1)
    unsigned local1;
    if (arg1 <= 1)
        local1 = arg1;
    else
        local1 = fun(arg1 - 2) + fun(arg1 - 1);
    return local1;
}


That looks very familiar… almost like a naïve Fibonacci number generator!  In fact, if we write this:
unsigned fib(unsigned n) {
    if (n <= 1)
        return n;
    return fib(n - 2) + fib(n - 1);
}
and pass it through gcc -S, it outputs this:
        .file  "fib.c"
        .text
.globl fib
        .type  fib, @function
fib:
        pushl  %ebp
        movl   %esp, %ebp
        pushl  %ebx
        subl   $8, %esp
        cmpl   $1, 8(%ebp)
        ja     .L2
        movl   8(%ebp), %eax
        movl   %eax, -8(%ebp)
        jmp    .L4
.L2:
        movl   8(%ebp), %eax
        subl   $2, %eax
        movl   %eax, (%esp)
        call   fib
        movl   %eax, %ebx
        movl   8(%ebp), %eax
        subl   $1, %eax
        movl   %eax, (%esp)
        call   fib
        addl   %eax, %ebx
        movl   %ebx, -8(%ebp)
.L4:
        movl   -8(%ebp), %eax
        addl   $8, %esp
        popl   %ebx
        popl   %ebp
        ret
        .size  fib, .-fib
        .ident "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
        .section       .note.GNU-stack,"",@progbits
So we pretty much nailed it on the head.  (As to be expected from a simple example like this.)

Comments