Rust 2020: Lessons learned by transpiling C to Rust

The C2Rust project is all about translating C code into an equivalent, drop-in ABI-compatible Rust implementation. (Read our C2Rust introductory blog post here.) Along the way we’ve uncovered some dark corners of C as it’s written in practice, and found places where Rust can’t quite replicate the same code with the same ABI. This is the story of those dark corners and the areas we think Rust needs to improve to be fully FFI compatible with C.

Background

Rust was designed as a systems programming language that enforces temporal memory safety at compile time, via a borrow checker that enforces strict ownership rules and limits aliasing for memory allocations and pointers. Additionally, Rust automates memory management using the RAII pattern, which is also enforced at compile time by calling local objects' destructors at the end of each scope. Many other programming languages solve these problems using garbage collection or reference counting instead. However, the garbage collector is usually a significant part of the language runtime, and its existence influences the design of the rest of the language, including the language’s FFI. Passing GC’ed objects through the FFI to another language, e.g., C, poses significant challenges since the GC must track pointers across languages or risk freeing them prematurely; alternatively, the GC’ed language may disallow passing GC’ed pointers through the FFI completely, like in Go, or shift the burden of memory allocation back to the developer.

Rust’s static enforcement of memory safety (as opposed to dynamic management via garbage collection or reference counting) positions it as a valuable alternative to C or C++ in a lot of use cases. These include situations where:

multiple languages must coexist in a process and communicate over a FFI,
adding a heavyweight language runtime is infeasible, and
the pause times or memory requirements of garbage collection are undesirable.

Rust is one of the few languages that can be used to implement middleware and OS components, with the rewrites optionally being drop-in replacements for the originals. In fact, one of the most frequent questions we get when discussing C2Rust with others is “Have you tried transpiling OpenSSL?” (fortunately, there already exists a binary-compatible rewrite written in Rust). Another good Rust drop-in replacement example is the relibc library, which is a C standard library written in Rust. While Rust can compete with languages like Java, Go, Swift, and Python at writing applications, it is particularly suited for writing libraries and low-level system components, e.g., kernel modules and firmware.

In cases like the above, binaries compiled from Rust are not used in a standalone environment, but linked (either statically or dynamically) as a component of a larger system, which may be written in one or several other languages, e.g., C. This means that the Rust binaries must be ABI-compatible with the C binaries. On Linux, such binaries are compiled with gcc or clang, and linked using the GNU linker, gold or lld. To achieve true ABI parity with gcc, rustc should ideally support every gcc extension to C. In his presentation on Rust and systems programming earlier this year, Josh Triplett named parity with C as a significant factor in systems language adoption, and concluded with a list of current problems and future improvements on this front. In this blog post, we give some real-world examples where Rust is not yet fully compatible with C, some of which Josh already mentioned in his talk.

Opportunities to improve Rust’s compatibility with C

While converting C code to Rust using C2Rust, we encountered some edge cases where Rust cannot quite replicate C features (or at least gcc’s variant of C). If we want Rust to be an ABI-compatible replacement for C, these are a few of the issues that we need to resolve.

`long double` types

The C standard only requires long double to be at least as wide as double. On x86 it is generally implemented as an 80-bit floating-point value, but the representation is platform-dependent. To be fully ABI-compatible with C code, Rust needs to support extended floating-point types that match the implementations used on supported platforms, i.e., f80 and f128. A Pre-RFC thread was started to discuss alternative floating-point types. Adding these types under std::arch was suggested, which seems like a good path forward.

We ran into long double types in the wild when trying to translate the newlib C library, which can optionally be built with support for the long double type, including in its printf implementation. The best available alternative for extended floating-point types is the f128 crate, which internally represents a value as an array of bytes and implements all operations in C. However, since long double in most x86 C compilers is an 80-bit floating-point value stored in 128 bits, it is not ABI-compatible with the __float128 type implemented by the f128 crate. This poses problems not only for long double use in variadics (which is how we initially encountered this problem), but also when passing long double values between C and Rust.
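To make the "80 bits stored in 128 bits" layout concrete, here is a minimal sketch of a byte-level stand-in for the x87 extended format. The type name X87LongDouble and its methods are hypothetical and not part of C2Rust or the f128 crate. The sketch only models the in-memory representation (64-bit significand with an explicit integer bit, then a 15-bit exponent and sign bit, then padding). As far as we understand the x86-64 System V calling convention, passing such a struct by value would still not match how C passes long double, so at best it could be used behind a pointer.

```rust
/// Hypothetical stand-in for an x87 80-bit extended value stored in a
/// 16-byte, 16-byte-aligned slot (little-endian, as on x86).
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct X87LongDouble {
    bytes: [u8; 16],
}

impl X87LongDouble {
    /// Widen a zero or normal `f64`. Subnormals, infinities and NaNs
    /// return `None` to keep the sketch short.
    pub fn from_f64(v: f64) -> Option<Self> {
        let bits = v.to_bits();
        let sign = (bits >> 63) as u16;
        let exp = ((bits >> 52) & 0x7ff) as u16;
        let frac = bits & ((1u64 << 52) - 1);

        let (exp80, mant80) = match exp {
            0 if frac == 0 => (0u16, 0u64),
            0 | 0x7ff => return None,
            // Rebias the exponent from 1023 to 16383 and set the explicit
            // integer bit that the 80-bit format stores in bit 63.
            e => (e + (16383 - 1023), (1u64 << 63) | (frac << 11)),
        };

        let mut bytes = [0u8; 16];
        bytes[..8].copy_from_slice(&mant80.to_le_bytes());
        bytes[8..10].copy_from_slice(&((sign << 15) | exp80).to_le_bytes());
        // bytes[10..16] stay zero: padding up to 128 bits.
        Some(Self { bytes })
    }

    /// Raw bytes as they would sit in memory.
    pub fn as_bytes(&self) -> &[u8; 16] {
        &self.bytes
    }
}
```

Only the first 10 bytes carry the value, which is why the same 16-byte slot cannot simply be reinterpreted as a __float128.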

GCC extensions

Many gcc extensions to C do not have Rust equivalents. For example, we ran into issues with the following extensions:

Symbol aliases (link to open issue), i.e., __attribute__((alias("foo"))), used by libxml2. This attribute exports the same function or global variable as multiple symbols (which may even have different visibilities). Rust provides #[no_mangle] and #[link_name="..."] which let us rename a global, but no equivalent attribute to export it under a second name.
Packed structures that also have alignment requirements, e.g., xregs_state from the kernel. We opened an issue on this on GitHub, and the discussion is still ongoing.
Aligned globals, e.g., ssemask from ioq3. We could handle these cases by replacing them with aligned structures, but that would not be perfectly equivalent with the original C code. For example, for this C code:

struct Foo {
    char x[5];
};

struct Foo16 {
    char x[5];
} __attribute__((aligned(16)));

struct Foo foo __attribute__((aligned(16))) = { .x = "foo" };
struct Foo16 foo16;

the alignments of variables foo and foo16 are both 16, but their sizes are 5 and 16 respectively.
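The mismatch is easy to reproduce on the Rust side. The sketch below mirrors the two C structs and adds a hypothetical AlignedFoo wrapper (our name, for illustration only) standing in for the "replace with an aligned structure" workaround. Because Rust attaches alignment to types rather than to individual globals, the wrapper's size is rounded up to 16, so the global no longer occupies 5 bytes as it does in C.

```rust
use std::mem::{align_of, size_of};

/// Mirrors `struct Foo`: 5 bytes, alignment 1.
#[repr(C)]
pub struct Foo {
    pub x: [u8; 5],
}

/// Mirrors `struct Foo16`: the type itself is aligned, so its size is
/// rounded up to 16.
#[repr(C, align(16))]
pub struct Foo16 {
    pub x: [u8; 5],
}

/// Hypothetical workaround for the aligned *variable* `foo`: wrap `Foo`
/// in an aligned newtype. This changes the size from 5 to 16.
#[repr(C, align(16))]
pub struct AlignedFoo(pub Foo);

/// Counterpart of `struct Foo foo __attribute__((aligned(16)))`.
pub static FOO: AlignedFoo = AlignedFoo(Foo { x: *b"foo\0\0" });

/// (size, alignment) of `Foo`, `Foo16` and `AlignedFoo`, in that order.
pub fn layouts() -> [(usize, usize); 3] {
    [
        (size_of::<Foo>(), align_of::<Foo>()),
        (size_of::<Foo16>(), align_of::<Foo16>()),
        (size_of::<AlignedFoo>(), align_of::<AlignedFoo>()),
    ]
}
```

Calling layouts() shows that the wrapped global has the right alignment but the size of Foo16 rather than Foo, which is the difference described above.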

Static library sizes

Most of the time we want to build a C shared library (cdylib) when building drop-in replacement Rust code. However, when we want to build a static library instead, the resulting library is huge. For example, the minimal no_std Rust module below, which doesn’t depend on any of the Rust standard library, compiles to a 1.6M static library:

#![no_std]
extern "C" {
    fn printk(fmt: *const u8, ...);
}

#[no_mangle]
pub unsafe extern "C" fn rb_xcheck(tag: u8, val: u64) {
    printk(b"XCHECK(%u):%lu/%#lx\n\x00" as *const u8, tag as u32, val, val);
}

Building this module produces the following file sizes for each crate type:

Crate type	Size
`staticlib`	1.6M
`rlib`	5K
`cdylib`	15K

Because we needed to embed this code inside a kernel module, we couldn’t use cdylib: kernel modules are .ko files, which are object files, and the kernel does not support loading shared libraries. rlib is built with the assumption that it will be linked into another Rust target, so we decided to avoid it. The staticlib output was what we really wanted, but it was far larger than either alternative.

Incidentally, cargo produces an archive of ELF object files when specifying crate-type = "staticlib", but certain build systems like the kernel’s only accept object files. We were able to work around this by linking the archive into a single relocatable object file using ld -r (we have another blog post on kernel modules coming soon).

Moving forward

Summarizing our findings and our experiences with low-level Rust, here are a few ideas for the directions Rust could go in 2020:

Compatibility with GCC: A few of the current edge cases were covered above. Potential improvements include support for more types like long double (maybe by exposing all LLVM types to Rust somehow). Additionally, there are many GCC attributes that Rust could support.
Linking: We have a few ideas for small improvements to Rust linking:
- Embedding Rust code into certain build systems would be easier if Cargo produced object files instead of libraries, e.g., by adding a new crate-type = "object" crate type.
- As noted above, the output archive for staticlib can be large even for small inputs, and reducing its size would make a difference in contexts where the final file is not a linked binary.
Inline assembly: Currently Rust inline assembly maps very closely to LLVM’s, which is a different format from gcc’s, so we have had to work around mismatches. We are looking forward to Rust getting stable support for inline assembly someday. Care needs to be taken in this process so that existing inline assembly can be rewritten into semantically equivalent assembly in whatever syntax is settled upon.

Conclusion

Right now, Rust covers almost everything one might want to do in C. While the issues we discuss here are fairly minor, they are holding back full feature parity and replacement of real-world C software. We’ve been able to transpile most C code to FFI-compatible, equivalent Rust: lua, nginx and zstd currently transpile without any changes, and ioq3 only requires a single small change in the Rust output to run (the ssemask issue illustrated above). We hope that as Rust matures, we can resolve these edge cases standing in the way of full compatibility between C and Rust.