Private Keys, Dragons, and C++: Part 1
LLVM Wyvern, used with permission from llvm.org
Managing the lifetime of private keys in C++ is a difficult task. From memory
management to compiler optimizations and undefined behavior, everywhere you
turn, there are dragons waiting to break the security of even the most
well-written standards-conforming code, and steal the keys to your castle. In
this series, I describe common pitfalls when securing data while resident in
memory and the industry-standard strategies for mitigating them.
In the first article of this series, we’ll examine an example from Monero. In Monero, one of the ways private keys are represented is with the following class hierarchy:
These POD or “Plain Old Data” structs are used all over the codebase as stack variables. A good representative example is where Monero’s wallet derives the private key from the user’s Electrum recovery sentence:
Here the code does some error checking, converts the recovery sentence to
binary form stored in our crypto::secret_key
, and attempts to generate
the wallet from that. If generation fails and an exception is thrown, it is
handled and the error is logged.
From a key security standpoint, there are a few different areas where this code leaks the data of the private keys, in a way that other vulnerabilities might be able to take advantage of it. Mind you, I’m only describing best practices here, not an actual exploited vulnerability, so don’t get all doom and gloom on me.
The main leakages come from stale data on the stack. This happens as a result of C++’s mantra that one only pays in performance for the features they use. In this case, that relates to the fact that most C++ programs are not so hypersensitive to security issues, as those that handle financial transactions, like cryptocurrency wallets. Optimizing compilers, on the other hand, are very sensitive to performance concerns, with their goal being to make standards-comforming code as fast as possible. The difference in goals here means that the programmer has to take special care to manage the lifetime of their data.
The upside however, is that these are solvable problems. Let’s talk about the stack. Distilling the above example down, we have this roughly equivalent snippet:
Consider what the stack might look like just after the words_to_bytes()
call:
If generate()
fails, and we return to recover()
s caller, the secret data in
recovery_key
will be left as garbage on the stack. This is undesriable
because it leaves the data exposed to exploits that inspect the stack. Ideally,
the sensitive data in the keys should be resident in memory for precisely as
long as it is needed, and no longer.
A similar issue arises in the original example in the thrown path. There, the unwinder walks down the stack, perforing any cleanups along the way (i.e. destructors), leaving everything else untouched for performance reasons. Since, as platform maintainers, we want our user’s programs to be quick and responsive, this is usually a good tradeoff.
Luckily, we’re writing c++, and we can write a destructor that scrubs the key’s memory on scope exit, or during one of those unwinder cleanups. Doing so in C is still possible, but takes a bit more manual management of the data. Our first attempt might look something like:
Unfortunately, the compiler decides to be “helpful” and
optimize out the memset. Careful readers will
have noticed that I had to defeat the optimizer once already, in order to keep it
from optimizing doit()
all the way to a nop. The trick there was to call
through a volatile function pointer. Since volatile acts as a sort of
optimization barrier as far as the compiler is concerned, it keeps the dead
code elimination pass from touching our call. In other words, volatile scares
off dragons. If we use the same trick again, it forces the compiler to
keep the memset call.
This pattern is needed often enough that the standards body for C11 added it
under the name memset_s
. On BSD platforms, there is explicit_bzero
, and
likewise Windows has SecureZeroMemory
. They’re all written with slightly
different tricks, but all with the purpose of forcing the optimizer to leave
seemingly dead code alone.
As with all such dragons’ dens, tread carefully and ask the advice of your local compiler expert before writing such code. It’s really tricky to get right, and there are many traps you can fall into. In the second article in this series, I’ll tell you all about page tables and memory being written to disk behind your back.
Stay tuned ‘till next time.