Private Keys, Dragons, and C++: Part 1

LLVM Wyvern, used with permission from llvm.org


Managing the lifetime of private keys in C++ is a difficult task. From memory management to compiler optimizations and undefined behavior, everywhere you turn, there are dragons waiting to break the security of even the most well-written standards-conforming code, and steal the keys to your castle. In this series, I describe common pitfalls when securing data while resident in memory and the industry-standard strategies for mitigating them.

In the first article of this series, we’ll examine an example from Monero. In Monero, one of the ways private keys are represented is with the following class hierarchy:

namespace crypto {
  struct ec_scalar {
    char data[32];
  };

  struct secret_key : ec_scalar {};

  struct public_key : ec_scalar {};
}

These POD or “Plain Old Data” structs are used all over the codebase as stack variables. A good representative example is where Monero’s wallet derives the private key from the user’s Electrum recovery sentence:

bool WalletImpl::recover(const std::string &path, const std::string &seed)
{
    clearStatus();
    m_errorString.clear();
    if (seed.empty()) {
        m_errorString = "Electrum seed is empty";
        LOG_ERROR(m_errorString);
        m_status = Status_Error;
        return false;
    }

    m_recoveringFromSeed = true;
    crypto::secret_key recovery_key;
    std::string old_language;
    if (!crypto::ElectrumWords::words_to_bytes(seed, recovery_key, old_language)) {
        m_errorString = "Electrum-style word list failed verification";
        m_status = Status_Error;
        return false;
    }

    try {
        m_wallet->set_seed_language(old_language);
        m_wallet->generate(path, "", recovery_key, true, false);

    } catch (const std::exception &e) {
        m_status = Status_Critical;
        m_errorString = e.what();
    }
    return m_status == Status_Ok;
}

Here the code does some error checking, converts the recovery sentence to binary form stored in our crypto::secret_key, and attempts to generate the wallet from that. If generation fails and an exception is thrown, it is handled and the error is logged.

From a key security standpoint, there are a few different areas where this code leaks the data of the private keys, in a way that other vulnerabilities might be able to take advantage of it. Mind you, I’m only describing best practices here, not an actual exploited vulnerability, so don’t get all doom and gloom on me.

The main leakages come from stale data on the stack. This happens as a result of C++’s mantra that one only pays in performance for the features they use. In this case, that relates to the fact that most C++ programs are not so hypersensitive to security issues, as those that handle financial transactions, like cryptocurrency wallets. Optimizing compilers, on the other hand, are very sensitive to performance concerns, with their goal being to make standards-comforming code as fast as possible. The difference in goals here means that the programmer has to take special care to manage the lifetime of their data.

The upside however, is that these are solvable problems. Let’s talk about the stack. Distilling the above example down, we have this roughly equivalent snippet:

bool words_to_bytes(const std::string &seed,
                    crypto::secret_key *key);

bool generate(crypto::secret_key *key);

bool recover(const std::string &seed) {
  crypto::secret_key recovery_key;

  if (words_to_bytes(seed, &recovery_key))
    return FAIL;

  if (generate(&recovery_key))
    return FAIL;

  return SUCCESS;
}

Consider what the stack might look like just after the words_to_bytes() call:

      +------------------------------+
sp -> |   return val  = false        |           words_to_bytes's frame
      +------------------------------+
      |   recovery_key               |           recover's frame
      |     data[0]  = 's'           |
      |     data[1]  = 'e'           |
      |     data[2]  = 'k'           |
      |     data[3]  = 'r'           |
      |     data[4]  = 'i'           |
      |     data[5]  = 't'           |
      |     data[6]  = ' '           |
      |     data[7]  = 'i'           |
      |     data[8]  = 's'           |
      |     ...                      |
      |     data[30] = '4'           |
      |     data[31] = '2'           |      heap
fp -> |   return addr = &caller + 24 |       ^
      |   seed = -------------------------\  |
      +------------------------------+    |  |
      |   seed                       |<---/  |   caller's frame
      |     ptr = ---------------------------/
      |     len = 37                 |
      |   return addr = ...          |
      |   ...                        |

If generate() fails, and we return to recover()s caller, the secret data in recovery_key will be left as garbage on the stack. This is undesriable because it leaves the data exposed to exploits that inspect the stack. Ideally, the sensitive data in the keys should be resident in memory for precisely as long as it is needed, and no longer.

A similar issue arises in the original example in the thrown path. There, the unwinder walks down the stack, perforing any cleanups along the way (i.e. destructors), leaving everything else untouched for performance reasons. Since, as platform maintainers, we want our user’s programs to be quick and responsive, this is usually a good tradeoff.

Luckily, we’re writing c++, and we can write a destructor that scrubs the key’s memory on scope exit, or during one of those unwinder cleanups. Doing so in C is still possible, but takes a bit more manual management of the data. Our first attempt might look something like:

namespace crypto {
  struct ec_scalar {
    char data[32];
  };

  struct secret_key : ec_scalar {
    ~secret_key() {
      memset(data, sizeof(data));
    }
  };
}

Unfortunately, the compiler decides to be “helpful” and optimize out the memset. Careful readers will have noticed that I had to defeat the optimizer once already, in order to keep it from optimizing doit() all the way to a nop. The trick there was to call through a volatile function pointer. Since volatile acts as a sort of optimization barrier as far as the compiler is concerned, it keeps the dead code elimination pass from touching our call. In other words, volatile scares off dragons. If we use the same trick again, it forces the compiler to keep the memset call.

This pattern is needed often enough that the standards body for C11 added it under the name memset_s. On BSD platforms, there is explicit_bzero, and likewise Windows has SecureZeroMemory. They’re all written with slightly different tricks, but all with the purpose of forcing the optimizer to leave seemingly dead code alone.

As with all such dragons’ dens, tread carefully and ask the advice of your local compiler expert before writing such code. It’s really tricky to get right, and there are many traps you can fall into. In the second article in this series, I’ll tell you all about page tables and memory being written to disk behind your back.

Stay tuned ‘till next time.