By Kenton Varda - 17 Nov 2016
Sandstorm is about making it easy to run a personal server. But we also offer Sandstorm Oasis, a service which runs your Sandstorm server for you.
Contradiction?
Actually, no: Even if you run your own server, Oasis benefits you. Oasis is important because it makes it possible for anyone to use Sandstorm’s library of open source apps, even if they really don’t want to run their own server. A larger audience means that more and better apps will become available. Indeed, after we launched Oasis last year, the rate of new apps becoming available on Sandstorm spiked.
That benefits self-hosters, because those same apps can be used on your private server, too.
In fact, we at Sandstorm don’t necessarily think “the cloud” is a bad idea. What we believe is that you should have the freedom to choose what makes sense for you. But that choice is moot if the particular app you need to use is only available in the cloud – we need the same apps to be available everywhere.
Oasis has now been running reliably for over a year. The Sandstorm team uses Oasis every day to get our own work done. I am composing this blog post in Etherpad, while organizing my task list in Wekan, chatting with teammates in Rocket.Chat, and syncing files with Davros.
Here are just some of the things we’ve changed since Oasis was launched:
We’ve so far kept Oasis labeled “beta”, mostly because, as engineers, we always feel like there’s so much more to do. But that will always be true – no good software project is ever “done”. With Oasis being used for so much real work, the time has come to remove the “beta” label.
Oasis will officially emerge from beta on November 27. We wanted to give advance notice of this change because it affects our paying users: we will no longer be waiving your subscription fee as we have during the beta period. For backers of our Indiegogo campaign who opted for free hosting as a perk, the timer on your service will start now (hey, you got an extra free year!). For the rest, your next monthly invoice will be charged to your credit card. Your subscription payments help support further development of Sandstorm and packaging more apps. Thank you for your support!
By Kenton Varda - 10 Nov 2016
As of a couple weeks ago – October 23, 2016 – Sandstorm can now be installed on systems with:
- Linux kernels as old as version 3.10, and
- kernels compiled without user namespace support (CONFIG_USER_NS=n).
This means that Sandstorm can now be installed on Red Hat Enterprise Linux (RHEL) 7, as well as its cousin CentOS 7, both of which use kernel version 3.10.
It also means that Sandstorm can now be installed on Arch Linux, which has historically shipped kernels compiled with CONFIG_USER_NS=n.
So if you previously couldn’t install Sandstorm because you were using one of these distros, now you can!
For the technically curious…
Sandstorm uses the Linux kernel’s “namespaces” feature as part of setting up the secure sandboxes in which apps run. Normally, creating namespaces requires root privileges, because these features could be used to escalate privileges. However, using “user namespaces”, a process that does not have root privileges can create a special kind of namespace in which other namespaces are (ostensibly) safe to use. Hence, it allows unprivileged processes to create sandboxes.
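To make this concrete, here is a minimal sketch (illustrative only, not Sandstorm’s actual supervisor code) of how an unprivileged process can use a user namespace to gain the ability to create the other namespaces it needs for a sandbox:

```cpp
// Minimal sketch: an unprivileged process creates a user namespace, then the
// other namespace types used for sandboxing. Compile with g++, which defines
// _GNU_SOURCE (needed for unshare()) by default.
#include <sched.h>
#include <cstdio>

int main() {
  // Creating a user namespace requires no privileges when the kernel is built
  // with CONFIG_USER_NS and allows unprivileged use. Inside the new namespace,
  // the process holds a full set of capabilities.
  if (unshare(CLONE_NEWUSER) != 0) {
    std::perror("unshare(CLONE_NEWUSER)");  // fails on kernels without user namespaces
    return 1;
  }

  // These would fail with EPERM for an unprivileged process outside a user
  // namespace, but are now allowed.
  if (unshare(CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWIPC) != 0) {
    std::perror("unshare(sandbox namespaces)");
    return 1;
  }

  std::printf("sandbox namespaces created without root\n");
  return 0;
}
```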
For security reasons, most of Sandstorm does not run with root privileges. Because of this, it has long relied on user namespaces to allow it to set up sandboxes. At the time the Sandstorm project started, it looked like user namespaces would soon be broadly available across Linux distros, so this seemed like a reasonable strategy.
Unfortunately, this has not been entirely true in practice. The enterprise-oriented RHEL and CentOS distros have long release cycles. Today, they still use kernel version 3.10, which is nearly three years old. Because user namespaces still had many problems in this kernel version, they were disabled by default and are today available only with a boot flag. Meanwhile, some faster-moving distros like Arch have chosen to keep user namespaces disabled even with newer kernel versions due to security concerns: the user namespaces feature has been the source of many local privilege escalation exploits in Linux. Although these vulnerabilities can’t be exploited by Sandstorm apps, such frequent vulnerabilities are problematic for servers which rely on user account separation for security outside of Sandstorm.
Even as it became apparent that Sandstorm’s use of user namespaces was preventing it from being used on some distros, we were hesitant to try other approaches. It seemed like the only way to solve the problem would be to employ a setuid-root binary to set up sandboxes when user namespaces were not available. A setuid-root binary is inherently risky – if not written exactly correctly, it could open its own privilege escalation vulnerability. Also, it would require a major refactoring of Sandstorm internals to move the supervisor into its own binary.
But a couple weeks ago, I suddenly realized that a different idea would work. The Sandstorm server normally starts up as root, but then runs several child processes under a regular user account. Most of Sandstorm’s business logic is in a Node.js web server. That process talks via Cap’n Proto RPC to a “back-end” daemon written in C++, which in turn launches app sandboxes. This back-end daemon is hand-coded in C++, with the core logic all living in a single file.
Because of this design, it turned out to be relatively easy to pass superuser privileges down through the back-end, while still keeping them away from the web server. Specifically, the back-end can execute with its effective UID set to a normal user account, but its real UID being root. Then, when it comes time to start a sandbox, it can promote itself back to root to do the work.
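Here is a minimal sketch of the idea (illustrative only, not the actual back-end code; the UID is a hypothetical placeholder): run with an unprivileged effective UID while keeping the real UID as root, then temporarily restore effective root when it is time to set up a sandbox:

```cpp
// Minimal sketch: keep the real UID as root, run day-to-day with an
// unprivileged effective UID, and reacquire effective root only for
// privileged sandbox setup.
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

static const uid_t kServerUid = 1000;  // hypothetical unprivileged account

static void CheckOk(int result, const char* what) {
  if (result != 0) { std::perror(what); std::exit(1); }
}

int main() {
  // Started as root. Drop only the *effective* UID; the real and saved UIDs
  // remain 0, so the change is reversible.
  CheckOk(seteuid(kServerUid), "seteuid(server)");

  // ... ordinary work runs here without effective root ...

  // Time to set up a sandbox: promote back to effective root. This is allowed
  // because the real/saved UID is still 0.
  CheckOk(seteuid(0), "seteuid(0)");

  // ... privileged sandbox setup (mounts, chroot, etc.) goes here ...

  // Drop effective root again afterward.
  CheckOk(seteuid(kServerUid), "seteuid(server)");
  return 0;
}
```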
This turned out to take only a couple hours to implement. In retrospect, the design seems obvious, and I wish I’d thought of it sooner!
There is a minor downside: If a vulnerability allows an attacker to cause the back-end to execute arbitrary code, that code could claim the superuser privileges, whereas before it would be limited to the Sandstorm server UID. This risk is probably small because the back-end is a relatively simple program that only speaks directly to other trusted programs (although it speaks indirectly to potentially-malicious actors). Nevertheless, if user namespaces are available, then Sandstorm will avoid handing root privileges to the back-end at all, continuing to operate as it did historically.
Existing Sandstorm users need not take any action. Your servers will continue to operate exactly as they always have.
But if you’ve been held back from installing Sandstorm before because it wouldn’t work on your distro, you should try again now!
By Kenton Varda - 25 Oct 2016
Last week, a Linux kernel bug, CVE-2016-5195, was described as “the most serious Linux local privilege escalation ever”. The bug – which potentially allowed any code running on a Linux machine to escalate its privileges to root – was already being actively exploited in the wild before it was fixed, and had existed in the kernel for many years.
Since Sandstorm allows any user of a server to upload their own apps, you might wonder if this bug could allow a Sandstorm user to compromise the server.
We’re happy to report that the answer appears to be “no”. As is often the case with Linux kernel bugs, our sandbox blocked the exploit.
Of course, we still recommend updating your kernel in case the bug can be exploited in ways that have not been discovered yet.
The bug in question was a race condition in the handling of memory pages mapped copy-on-write. A process can ask that a read-only file be mapped into its memory space in such a way that it is allowed to modify the mapped memory. When the process writes to the memory, the kernel makes a private copy of the affected page, so that the process only modifies its copy, not the original. A process can also request, later on, that the modifications it made be discarded, returning the page to its original state. In certain circumstances, by both writing to a page and requesting this discard at the same time (in separate threads), the process could end up writing to the original pages that are shared with other processes on the system, instead of its own private copy. Hence, the process could modify any file on the system. By modifying, say, the sudo utility, it could give itself a backdoor which allows it to gain root privileges trivially.
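To make the copy-on-write and “discard” operations concrete, here is a small sketch (illustrative only; it demonstrates the benign behavior, not the exploit) showing that a write to a private mapping of a read-only file lands in a private copy, and that madvise(MADV_DONTNEED) throws that copy away:

```cpp
// Copy-on-write mapping of a read-only file, plus the "discard" operation.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
  int fd = open("/etc/passwd", O_RDONLY);  // a file we cannot write directly
  if (fd < 0) { std::perror("open"); return 1; }

  // MAP_PRIVATE + PROT_WRITE on a read-only file is allowed: writes are
  // supposed to go to a private copy of the page, never to the file itself.
  void* mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
  char* p = static_cast<char*>(mem);

  char original = p[0];
  p[0] = 'X';  // triggers copy-on-write: the kernel gives us a private page
  std::printf("after write:   %c (file itself unchanged)\n", p[0]);

  // "Discard my modifications": drop the private copy so the page reverts to
  // the original file contents. Racing this against the write, in separate
  // threads, is what CVE-2016-5195 exploited.
  madvise(p, 4096, MADV_DONTNEED);
  std::printf("after discard: %c (matches original: %d)\n", p[0], p[0] == original);

  munmap(mem, 4096);
  close(fd);
  return 0;
}
```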
However, not just any old write worked here. In order to trigger the race condition, the process had to write in a way that calls the kernel’s get_user_pages() function with the force parameter set to 1. The force parameter says: “If this page is mapped copy-on-write, then let me write to it (making a private copy) even if the page’s protection mode is read-only.” As it turns out, it is possible for a memory mapping to be both read-only and copy-on-write, and in fact this is the mode that is usually used when mapping in a program’s main binary and shared libraries. Normally, no copy is ever performed, because the writes that would trigger a copy are not allowed. However, there is a special case where this combination of flags matters: if you are running a program in a debugger, and you ask the debugger to insert a breakpoint, it does so by overwriting the instruction at the given address with a break instruction. That is, it modifies the mapped executable. The force flag exists for exactly this purpose: so that the debugger can inject breakpoints into the program being executed by the process being debugged (without affecting any other processes that happen to be running the same program).
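For reference, here is a rough sketch of that legitimate use (illustrative only; it assumes the target process is already attached and stopped under ptrace()): a debugger patching an int3 breakpoint into its tracee’s copy-on-write code pages:

```cpp
// A debugger-style breakpoint write: the kind of operation the force path
// exists to support. Error handling omitted for brevity.
#include <sys/ptrace.h>
#include <sys/types.h>
#include <cstdint>

void InsertBreakpoint(pid_t tracee, uintptr_t addr) {
  // Read the existing word at addr, patch its low byte to 0xCC (x86 int3),
  // and write it back.
  long word = ptrace(PTRACE_PEEKDATA, tracee, (void*)addr, nullptr);
  long patched = (word & ~0xffL) | 0xCC;

  // This write goes through the force=1 get_user_pages() path described
  // above: the tracee gets its own private copy of the code page, so other
  // processes running the same binary are unaffected.
  ptrace(PTRACE_POKEDATA, tracee, (void*)addr, (void*)patched);
}
```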
Because the force flag is only useful in very specific circumstances, only certain code paths can trigger the vulnerability. Kernel security engineer and Sandstorm contributor Andrew Lutomirski tells us the only entry points appear to be:

- The ptrace() system call’s PTRACE_POKEDATA operation, which is explicitly meant to be used by debuggers, often for the purpose of setting breakpoints.
- Writes to /proc/<pid>/mem. It’s unclear why this code uses force – possibly it was a mistake.

As it turns out, none of these code paths can be exploited by Sandstorm apps:
- Apps cannot call ptrace(); Sandstorm’s seccomp filter blocks it.
- Sandstorm does not mount /proc inside app sandboxes.
- The /dev visible to apps contains only null, zero, and urandom.

So, as far as we can tell, Sandstorm has never been vulnerable to this bug.
Even if Sandstorm were vulnerable, the exploit would have had far less impact inside Sandstorm than in a typical Linux environment, in part because Sandstorm sets the NO_NEW_PRIVS prctl() flag inside the sandbox, so a sandboxed process cannot gain privileges by executing setuid binaries.

When running on Sandstorm, a user’s data in an app like Etherpad is containerized separately from another user’s data. In fact, we go one step further and containerize each document separately. Had Sandstorm not mitigated the bug outright, it appears the impact would have been that an app could break Sandstorm’s per-document isolation and read/write documents from any number of users, so long as those users all use the same version of the same app on the same server. The app still would not have been able to interfere with other apps. This is already the status quo in a typical Linux environment: in most non-Sandstorm environments, an app keeps all users’ data in a single database without per-user isolation. Overall, this is much less significant than a privilege escalation to root. Thankfully, our seccomp mitigation prevented this.
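For illustration, here is a minimal sketch of that style of mitigation (a toy filter, not Sandstorm’s actual one, which covers far more than this): install NO_NEW_PRIVS and a seccomp-bpf filter that denies ptrace():

```cpp
// Toy seccomp-bpf filter: deny ptrace() with EPERM, allow everything else.
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <cerrno>
#include <cstddef>
#include <cstdio>

int main() {
  struct sock_filter filter[] = {
    // Refuse to run under a syscall ABI other than the one we compiled for.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    // Deny ptrace(); allow all other syscalls.
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
      static_cast<unsigned short>(sizeof(filter) / sizeof(filter[0])), filter};

  // NO_NEW_PRIVS: required to install a filter without CAP_SYS_ADMIN, and it
  // also stops the process from gaining privileges via setuid binaries.
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
    std::perror("PR_SET_NO_NEW_PRIVS");
    return 1;
  }
  if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
    std::perror("SECCOMP_MODE_FILTER");
    return 1;
  }

  std::printf("ptrace() is now blocked in this process\n");
  return 0;
}
```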
This is not the first Linux security bug mitigated by Sandstorm. In fact, we’ve kept a long list. Moreover, in addition to mitigating Linux kernel problems, Sandstorm mitigates most vulnerabilities in the apps that run on top of it. Check out the whole list of mitigated vulnerabilities that we’ve compiled: Sandstorm Security Non-Events
Want to try out Sandstorm as a user? Try the online demo »
By Jade Q. Wang - 13 Oct 2016
I’m sharing a pro-tip today because I like making sure that everyone gets the most productivity they can out of Sandstorm.
Let’s say I want to share a grain (e.g., a document, spreadsheet, git repository, or a Collection) with a group of colleagues who are already in the same Rocket.Chat chatroom. To do so, I first click the + icon in Rocket.Chat.
This opens a Powerbox request with a type-ahead search box. Before I’ve typed anything, I can see the grains that have most recently been opened by me.
For instance, if I’m looking for feedback for a blog post I drafted in Etherpad, I can type “Etherpad” and it will list all Etherpads that I have access to on this server.
Here are all the Etherpads I have access to. But today, I’m actually searching for something else: a Collection titled “Sales / Revenue docs”. I can also search for grains by title, so I search for “revenue”:
Once I’ve selected the grain, I can choose whether everyone in this chatroom should be able to edit this Collection or only view it, before I connect the grain to this Rocket.Chat room.
When I share a grain in a chatroom, it automatically renders a snippet which includes the icon for the app that opens the grain. It looks like this:
Now, everyone who is in this chatroom has access to this Collection (as well as every grain inside it). No need to go share it with them separately.
To try it out for yourself, go install Rocket.Chat now!
Do you use Sandstorm to collaborate at work? Sandstorm for Work (60-day free trial) comes with priority support, organization management features, and integration with enterprise infrastructure.
By the way, if you found this useful and would like to see more bite-sized pro-tip style blog posts in the future, please reshare this and let me know (I’m @qiqing on Twitter)!
By Kenton Varda - 30 Sep 2016
A month or two ago, we started seeing a mysterious problem in production: every now and then, one of our Node.js web server processes supporting Sandstorm Oasis would suddenly jump to 100% CPU usage (of one core) and stay there until it was killed. The problem wasn’t an infinite loop, though: the process continued to respond to requests, just slowly. Because it was still responding, it continued to pass health checks and was never restarted automatically. But for users assigned to that shard, the service was essentially unusable, as every action would take seconds to complete. The problem left nothing at all suspicious in the logs – other than a gap in which far fewer requests than normal were being handled. At first, the problem only struck about once a week, seemingly at random.
This kind of bug is a web developer’s worst nightmare. How do you debug something which you can only reproduce once a week, at random, with real users on the line? What could even cause a process to slow down but not stop in this way?
Obviously, we needed to take a CPU profile while the bug was in progress. Of course, the bug only reproduced in production, so we’d have to take our profile in production. This ruled out any profiling technology that would harm performance at other times – so, no instrumented binaries. We’d need a sampling profiler that could run on an existing process on-demand. And it would have to understand both C++ and V8 JavaScript. (This last requirement ruled out my personal favorite profiler, pprof from google-perftools.)
Luckily, it turns out there is a correct modern answer: Linux’s “perf” tool. This is a sampling profiler that relies on Linux kernel APIs, thus not requiring loading any code into the target binary at all, at least for C/C++. And for JavaScript, it turns out V8 has built-in support for generating a “perf map”, which tells the tool how to map JITed code locations back to JavaScript source: just pass the --perf_basic_prof_only_functions flag on the Node command-line. This flag is safe in production – it writes some data to disk over time, but we rebuild all our VMs weekly, so the files never get large enough to be a problem.
Armed with this new knowledge, we waited. Finally, after a few days, my pager went off. I shelled into the broken server, recorded a ten-second profile, restarted Node, and then downloaded the data for analysis. Upon running perf, I was presented with this:
Well, this looks promising! Almost all the time is being spent in two C++ functions! The perf viewer makes it easy to jump directly into the disassembly:
Wow! Almost all of our CPU time is being spent on a handful of instructions. In fact, what we’re looking at here is two different inlined copies of the same C++ code: a loop that traverses a linked list, trying to find the element with a particular ID. We were spending the majority of our CPU time scanning one linked list.
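To give a sense of what that loop looks like, here is a rough sketch (illustrative only, not the verbatim V8 source) of a per-thread data table kept as a singly linked list and searched linearly by thread ID:

```cpp
#include <cstdint>

// Illustrative sketch of a linked-list-based thread data table.
struct PerThreadData {
  uint32_t thread_id;
  PerThreadData* next;
  // ... thread-local state lives here ...
};

PerThreadData* Lookup(PerThreadData* head, uint32_t thread_id) {
  // O(n) scan: every fiber switch pays for the full length of this list.
  for (PerThreadData* p = head; p != nullptr; p = p->next) {
    if (p->thread_id == thread_id) return p;
  }
  return nullptr;
}
```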
So, what is this code for?
You might be surprised to see the word “thread” in V8, which implements Javascript, a language known for being almost militantly opposed to threads. It turns out, though, that V8 supports “green threads” – simulated threads implemented entirely in userspace, with cooperative switching. Node users can take advantage of this via the node-fibers npm package. This package allows you to avoid Node’s “callback hell” by instead instantiating arbitrarily many call stacks and jumping between them whenever you need to wait for an asynchronous operation. Our code was, in fact, using node-fibers, mostly because we built on Meteor, which uses fibers by default.
The linked list in question implements a map from thread IDs to per-thread data, such as thread-local variables. Among other things, every time the process switches between fibers, the current thread is looked up in this table.
As any fresh CS grad knows, a linked list is not the ideal data structure for a lookup table – you probably want a hashtable, red-black tree, or the like. But as many more experienced engineers know, a linked list can be more efficient than those other structures in cases where the number of elements stays small. V8’s developers, as it turns out, had designed around the assumption of a fixed thread pool never containing more than a handful of threads. But node-fibers – especially as used by Meteor – doesn’t work this way. In Meteor, every concurrent operation gets its own fiber. Once a fiber completes, it is placed in a pool for reuse, but if many fibers are needed simultaneously, the pool can grow to any size. As the pool gets bigger, the linked list gets bigger, which makes fiber-switching slower, which makes the whole process permanently slower.
But our processes weren’t getting slower over time. They were getting suddenly slower all at once. One moment the process is fine, the next it is hosed. Under normal load, our servers were sitting steady at around 100 fibers – nowhere near enough to be a problem. So now we had a new mystery: What was causing these sudden spikes in fiber creation? It was around this time we started referring to the incidents as “fiber bombs”. Alas, our profiles only showed us the after-effects of a bomb having gone off; they told us nothing about how the fibers were created in the first place. So we were back to square one.
Early on the morning of September 1st, the problem became suddenly more urgent: Instead of once a week, the problem started happening approximately once an hour. Like any good production problem, this began just after midnight. After three or so iterations of “get paged, wake up, restart the process, go back to sleep”, I grudgingly accepted that this could not wait until the morning. By about 5AM I had hot-patched our servers to monitor their own fiber counts and kill themselves whenever the number went over 1000 or so. In the process, I observed that a typical “fiber bomb” created anywhere from 5,000 to 20,000 fibers – all at once.
Still, the root cause was a mystery. With the servers now managing their own restarts and the pager quieting down, I crawled back into bed.
The spikes continued to happen approximately once an hour from then on. This was actually wonderful: it meant I could now iterate on the problem 150x faster than I could before! I began manually instrumenting the codebase with a sort of poor-man’s sampling profiler that specifically sampled fiber creation, and specifically did so at times when fiber counts seemed to be spiking. This turned out not to be as easy as it sounds, because many places created fibers as a result of some task having been queued previously. At the time of fiber creation, the queue insertion was no longer on the stack. So, I had to instrument the queue inserts too, and so on.
Soon, I made a startling discovery: It turned out that Meteor had monkey-patched the global Promise implementation. Specifically, they had apparently decided that they wanted .then() callbacks always to run in fibers, for convenience, since most Meteor code requires that it be run in a fiber. Thus, they wrote code to intercept calls to .then() and wrap the callback in another callback that creates a new fiber and runs the original callback inside it.
This might sound basically reasonable at first (it should be “compatible” with standard Promise semantics), but there is a problem: In code that makes heavy, idiomatic use of Promises, it is common to string together a long chain of short .then() callbacks. As it so happens, Sandstorm itself contains a lot of Promise-based code, especially around communicating with its back-end, which it does using Cap’n Proto. Cap’n Proto’s API makes very heavy use of Promises, and does not expect to run in fibers. Thus, this code which seemingly had nothing to do with fibers was in fact the main creator of fibers in our system, creating massive quantities of totally unnecessary fibers, wasting memory and CPU time.
But even that didn’t actually explain the bombs. The way fibers work, if you start a new fiber that immediately completes, the fiber immediately goes back to the fiber pool. All of our Promise-heavy code operated in asynchronous style, therefore the callbacks would always complete immediately. So while the Promise code was needlessly starting lots of fibers, it should actually have been reusing the same Fiber object over and over again.
But there was one more wrinkle: It turns out that the V8 Promise implementation itself sometimes calls .then() recursively, passing along one callback from one promise to another. In fact, it has to do this to correctly implement the spec. But since .then() had been monkey-patched, each time the same callback passed through another .then() call, it received another wrapper layer spawning another fiber. In the end, one callback, when finally called, would start a fiber, which would start another fiber, which would start another fiber, and so on. Since each fiber in this chain was itself responsible for spawning the next, all the fibers would be started before any completed. If one callback managed to be wrapped 20,000 times, then you get 20,000 fibers, all at once.
I patched the Promise monkey-patch such that, after wrapping a callback, it would mark the wrapped callback object with a field like alreadyWrapped = true. If the same callback came back to be wrapped again, the code would see this marking and avoid double-wrapping.
And just like that, the problem stopped.
Meanwhile, we’ve also filed an issue against V8, requesting that they replace their linked list with a hashtable. This wouldn’t have completely mitigated the fiber bombs, but it would have at least prevented them from permanently crippling the process.
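For comparison, here is a rough sketch (again illustrative, not an actual V8 patch) of the hash-table-based table that the issue asks for; lookups stay constant-time no matter how large the fiber pool – and therefore the table – grows:

```cpp
#include <cstdint>
#include <unordered_map>

struct ThreadData { /* thread-local state */ };

class ThreadDataTable {
 public:
  // O(1) average-case lookup, regardless of how many threads/fibers exist.
  ThreadData* Lookup(uint32_t thread_id) {
    auto it = table_.find(thread_id);
    return it == table_.end() ? nullptr : &it->second;
  }
  ThreadData* Insert(uint32_t thread_id) { return &table_[thread_id]; }

 private:
  std::unordered_map<uint32_t, ThreadData> table_;
};
```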