Iret #GP on pre-commit handling failure: the NetBSD case (CVE-2009-2793)
------------------------------------------------------------------------

On the Intel architecture, once an operating system kernel has completed
servicing an interrupt or exception, it will generally return to user
mode using iret. The iret instruction will restore the context required
to continue execution, such as code segment, instruction pointer, flags
and so on.

iret is a complex instruction whose pseudocode alone spans several pages
of the Intel Software Developer's Manual. Interestingly, in protected mode
it executes in two distinct stages: a pre-commit stage (before the
privilege level changes) and a post-commit stage (after the privilege
level has changed). The commit point is visible in the pseudocode below
(taken from the Intel manual; the comment is ours):

IF new mode != 64-Bit Mode
  THEN
    IF tempEIP is not within code segment limits
      THEN #GP(0); FI;
    EIP <- tempEIP;
  ELSE (* new mode = 64-bit mode *)
    IF tempRIP is non-canonical
      THEN #GP(0); FI;
    RIP <- tempRIP;
FI;
CS <- tempCS;  // This is the commit point (privilege switch)
EFLAGS (CF, PF, AF, ZF, SF, TF, DF, OF, NT) <- tempEFLAGS;

When the processor handles an exception, two cases can arise:
- the handler procedure is executed at the same level of privilege
  as the interrupted procedure, no stack switch occurs
- the handler procedure is executed at a different privilege level,
  therefore a stack switch occurs

The generated stack frame differs when a stack switch occurs, because
the processor must also save the interrupted procedure's stack pointer
(SS:ESP) on the new stack.

When iret returns to a different privilege level, its behaviour on
failure depends on which stage of the operation it is executing. A
pre-commit failure is raised while the processor is still at the kernel's
privilege level, so no stack switch occurs. A post-commit failure is
raised at the new privilege level, so a stack switch occurs and the
resulting trap frame has a different size.

--------------------
Affected Software
------------------------

It's easy to overlook this distinction and we have found multiple cases
where this has had direct security consequences or made other issues
exploitable.

For instance, the NetBSD kernel on x86 does not handle pre-commit failures
properly.

We can easily make iret fail pre-commit by having tempEIP outside the
code segment limits.

- The canonical way to do this is to set up an LDT entry with a code
  segment limit of 0x1FFF, mmap memory at 0x1000, and then put some
  shellcode with an int 0x80 at the very end of this page, so that when
  the kernel executes iret, tempEIP is past the code segment limit.

- Interestingly, because of the lazy handling of non-executable stack
  emulation on x86, this bug could also be triggered by a non-malicious
  program, such as either of the following:

/* ... */
int main(int argc, char **argv)
{
  jmp_buf env;

  void handlesig(int n) {
        longjmp(env, 1);

  }
  signal(SIGSEGV, handlesig);

  if (setjmp(env) == 0) {
        ( (void(*)(void)) NULL) ();
  }

  return 0;
}

/* ... */
int main(int argc, char **argv)
{
       char baguette;
       signal(SIGABRT, (void (*)(int))&baguette);
       abort();
}

--------------------
Consequences
-----------------------

In the NetBSD case, the kernel stack will get desynchronized. This might
allow an attacker to elevate privileges.

-------------------
Solution
-----------------------

We reported this to NetBSD developers in May. The fix is non-trivial,
and after much discussion, we agreed to release this information to
open the issue to the wider NetBSD development community.

-------------------
Credit
-----------------------

This bug was discovered by Tavis Ormandy and Julien Tinnes of the Google
Security Team.