By Kenton Varda - 08 Apr 2015
A few months ago I discovered a security bug in the Darwin kernel used by most Apple products. The bug could allow an attacker to trivially remotely DoS a variety of network services and apps, from Node.js to Chrome. Today, Apple released a patch (look for CVE-2015-1105), so now I can tell you about it.
Now, just to be clear, I’m no Adam Langley. This bug is “just” a DoS, nothing like a Heartbleed or a Shellshock. The worst it can do is allow an attacker to cause a temporary service disruption. But I think all security bugs deserve a writeup so that we can learn from them, and Apple’s terse description of the problem doesn’t accomplish this. Also, it’s a fun story.
I discovered the problem while doing research on the different interfaces that various operating systems provide for doing event-driven I/O – that is, how you tell the platform: “Here are all my open connections; wake me up when one of them receives a message.” It turns out that every OS does this differently. Linux has epoll. BSD has kqueue. Windows has… well, about five different mechanisms that cover differing subsets of use cases and you can only choose one. In any case, I was trying to build an abstraction layer over these for Cap’n Proto, so I wanted to make sure I understood them all.
I noticed a curious thing: some man pages discussed an event called “out-of-band data” while others didn’t. “Out-of-band data” (OOB), also known as “urgent data”, is a little-used feature of TCP connections that essentially allows you to send a byte that “jumps the queue” so that the receiving app can receive it before receiving other data sent before it. You probably didn’t know about this, because basically no one uses this feature – except for Telnet, which needs a way to signal that you pressed “ctrl+C” when the destination app is not otherwise processing input.
With almost all event notification APIs, regular data and OOB data raise different kinds of events. For example, poll() (and its successor on Linux, epoll) has POLLIN for regular data and POLLPRI for OOB. This way, if your app does not expect to receive OOB data, it simply doesn’t ask to be notified about that type of event, and the kernel happily discards it for you (or maybe inserts it into the regular stream, which is fine).
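To make the distinction concrete, here is a minimal Python sketch of how a poll()-based app opts in or out of OOB notifications. The helper names (describe, poll_once, demo) are made up for illustration, and the exact events reported for a lone urgent byte may vary by platform, so treat the demo as something to experiment with rather than a statement of guaranteed behavior.

```python
import select
import socket


def describe(revents):
    """Name the kinds of data that a poll() result bitmask reports."""
    kinds = []
    if revents & select.POLLIN:
        kinds.append("regular")
    if revents & select.POLLPRI:
        kinds.append("oob")
    return kinds


def poll_once(sock, want_oob, timeout_ms=200):
    """Poll one socket, subscribing to POLLPRI only if want_oob is set."""
    mask = select.POLLIN | (select.POLLPRI if want_oob else 0)
    poller = select.poll()
    poller.register(sock, mask)
    return [describe(revents) for _fd, revents in poller.poll(timeout_ms)]


def demo():
    """Send one urgent byte over a loopback TCP connection and poll for it."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server_side, _addr = listener.accept()
    try:
        client.send(b"!", socket.MSG_OOB)  # mark this byte as urgent
        print("POLLIN only:     ", poll_once(server_side, want_oob=False))
        print("POLLIN | POLLPRI:", poll_once(server_side, want_oob=True))
    finally:
        for s in (client, server_side, listener):
            s.close()
```

Calling demo() on a Unix-like system prints what each subscription reports. The point is that the app chooses its event mask up front, so OOB data only shows up as an event if the app asked for POLLPRI.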
Curiously, the BSD kqueue docs are unclear on how OOB data is handled. FreeBSD’s kqueue makes no mention of it, and as far as I’ve been able to determine it simply doesn’t support notification of OOB events at all. DragonflyBSD’s kqueue defines an EVFILT_EXCEPT event type.
Darwin’s (OSX/iOS) kqueue also doesn’t mention OOB data, but some Googling revealed an undocumented “feature”: on OOB data, Darwin will raise a regular EVFILT_READ event (which normally indicates that regular in-band data was received) but set the special flag EV_OOBAND on the event structure.
Of course, if you aren’t expecting OOB data, you’re not going to check for that flag. So when you receive EVFILT_READ, you’re going to believe you’ve received regular data. And you’re going to do a recv() call to read that data, and there isn’t going to be any. And then you’re going to say “oh well” and return to the event loop. But if you are using kqueue() in level-triggered mode (as most people do, because it’s easier), then the operating system is going to see that the OOB data is still there, and is going to give you the exact same event again.
So you go into an infinite loop.
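The spin can be illustrated with a toy model in Python. This is not real kqueue code: ToySocket, OOB_FLAG, and both handlers are hypothetical stand-ins, with OOB_FLAG playing the role of EV_OOBAND, and the loop is capped so the example terminates.

```python
from dataclasses import dataclass

OOB_FLAG = 0x1  # hypothetical stand-in for the EV_OOBAND flag bit


@dataclass
class ToySocket:
    """Toy model of a socket's receive state, not a real kernel object."""
    inband: bytes = b""        # regular data waiting to be read
    oob_pending: bool = False  # an urgent byte is waiting

    def recv(self):
        """Like a plain recv(): returns in-band data only."""
        data, self.inband = self.inband, b""
        return data

    def recv_oob(self):
        """Like recv() with MSG_OOB: consumes the urgent byte."""
        self.oob_pending = False


def level_triggered_event(sock):
    """Return read-event flags while anything is pending, else None."""
    if sock.oob_pending:
        return OOB_FLAG
    if sock.inband:
        return 0
    return None


def run_loop(sock, handler, max_wakeups=1000):
    """Run the toy event loop; return how many times the handler woke up."""
    wakeups = 0
    while wakeups < max_wakeups:
        flags = level_triggered_event(sock)
        if flags is None:
            break  # nothing pending: a real loop would block here
        handler(sock, flags)
        wakeups += 1
    return wakeups


def naive_handler(sock, flags):
    """Ignores the flags, so the urgent byte is never consumed."""
    sock.recv()


def careful_handler(sock, flags):
    """Checks the OOB flag and drains the urgent byte when it is set."""
    if flags & OOB_FLAG:
        sock.recv_oob()
    else:
        sock.recv()
```

In this model, run_loop(ToySocket(oob_pending=True), naive_handler) only stops because of the max_wakeups cap, while the careful handler returns after a single wakeup. Draining the urgent byte is just one possible mitigation in the toy; it is not a claim about how any particular app was fixed.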
Wait, doesn’t that mean almost all event-driven OSX network apps will go into an infinite loop if they receive a single TCP packet with the urgent bit set?
I didn’t think that could possibly be true at first, so I fired up a Mac machine to try it. Sure enough:
When Google Chrome visited an HTTP server that sent back an OOB byte, the whole app (not just the tab, but everything) locked up and had to be force-quit. It turns out Chrome does all networking from the main process, so the per-tab process separation did not help. (Chromium issue 437642 – currently still locked down as a security issue)
When a Node.js server received an OOB byte from a client, the server would go into an infinite loop and stop handling other connections. (fixed in this commit)
On the other hand, my third test case – nginx – was not affected, because it uses kqueue in edge-triggered mode, and therefore it only receives the unexpected event when new data arrives rather than any time data is available – i.e. once rather than infinity times. But two of three is a pretty worrisome hit rate, especially when these are some very big names.
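The difference between the two modes can be sketched with a small simulation in Python. The deliveries function and its tick-based model are assumptions made for illustration, not kqueue itself; with real kqueue, edge-triggered behavior is typically requested with the EV_CLEAR flag.

```python
def deliveries(arrival_ticks, mode, ticks):
    """Count events delivered to a handler that never drains the socket.

    arrival_ticks: set of loop iterations at which new data arrives.
    mode: "level" fires whenever data is pending; "edge" fires only on arrival.
    """
    if mode not in ("level", "edge"):
        raise ValueError("mode must be 'level' or 'edge'")
    pending = False
    count = 0
    for tick in range(ticks):
        arrived = tick in arrival_ticks
        pending = pending or arrived  # nothing ever consumes the data
        fire = arrived if mode == "edge" else pending
        if fire:
            count += 1
    return count
```

With a single arrival at tick 0 and ten loop iterations, the level-triggered count grows with the number of iterations, while the edge-triggered count stays at one no matter how long the loop runs.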
Arguably the worst / most interesting part of this problem is that it was a problem inherent in the API. Technically it was not that the kernel was buggy, but that the interface was confusing (and underdocumented) in a way that caused the same bug to manifest in several different apps. Fixing the problem required either fixing every app (and being ever-vigilant in the future), or changing the API and breaking any existing app that depended on the behavior (of which there appear to be a few).
To Apple’s credit, they did what I think is the right thing: they changed the interface so that it no longer reports EVFILT_READ events on TCP OOB data. I do not quite understand their description of this problem as a “state inconsistency issue”, but my tests confirm that OOB data is now ignored.
The moral of the story? Confusing APIs are a security problem. If many users of your API get it wrong in a way that introduces a security bug, that’s a bug in your API, not their code.
By Asheesh Laroia - 06 Apr 2015
I’m hoping to see you in Montreal at PyCon 2015! I’m co-speaking at a talk and a tutorial. I’ll also be eager to talk to people about Sandstorm and self-hosting servers.
Last year, I co-gave a talk about turning your computer into a server with Karen Rustad. Here’s what it looked like when one of my friends surprised me by changing the data we were showing during a live demo:
That Lol is supposed to say Django. Thanks, Luke. Check out my expression of amusement masking horror.
I got compromised because the Django sample app I demo’d has a default admin account bundled with it, as part of the demo. The good news is if you’re using Sandstorm, the platform handles authentication & authorization for apps, so this sort of thing won’t happen to you. Plus, as you would expect, Sandstorm ships with no default passwords.
This year I’m co-leading a tutorial called Getting Comfortable With Web Security where we discuss all sorts of common security issues with web applications; Jacky Chang and Nicole Zuckerman are my co-presenters. I’m also sharing a stage with Philip James to answer the question, Type python, press enter. What happens?
I hope to see you there! Doubly so if you’re interested in a Sandstorm & server self-hosting Birds of a Feather session. Send an email to [email protected]!
By David Renshaw - 30 Mar 2015
One positive side effect of my work on the GitLab app is that we now have a porting guide tailored specifically for Ruby on Rails. If you have a Rails app that you’d like to package for Sandstorm, this is the place to start. The guide should quickly get you up and running, and it explains how to deal with a few quirks that are particular to how Rails and Sandstorm interact.
I invite you to read it, to try porting an app, and to tell me what you think! I am by no means an expert on anything Ruby-related, so don’t hesitate to contribute corrections, either by directly editing the wiki, by starting a discussion on the sandstorm-dev mailing list, or by emailing me.
By Asheesh Laroia - 27 Mar 2015
After visiting Boston (well, Cambridge) for LibrePlanet, I met up with a few Sandstorm fans and prospective users, and I snapped some photos.
Three of us met up at Clover Kendall to discuss NightScout, a free software web app for helping people share information about their blood sugar levels with family and friends.
Monday evenings are the regular meeting time for SIPB, MIT’s computing club, and some of us looked into whether Sandstorm could be of use to the MIT community.
I hope to see you in my future travels!
By Asheesh Laroia - 19 Mar 2015
I’m a Free Software Foundation member, and I’m happy to say that I’ll be visiting Boston this weekend for the FSF’s yearly LibrePlanet conference.
You can read more about the conference from Zak Rogoff’s summary of 2013 or Shauna Gordon-McKeon’s invitation to new speakers in 2014.
If you’re in the Boston area between now and Tuesday March 24, send me an email and let’s meet up! I’m [email protected].