C2Rust is Back
By Frances Wingerter
tl;dr: c2rust (a transpiler from C to unsafe Rust) is once more being actively developed, now works with recent nightlies, has some new features and bugfixes, and has dropped the c2rust-refactor tool in preparation for an exciting new approach to generating safe Rust. As always, we welcome new users and are eager for any feedback!
After a long while without much time or funding to dedicate to the project, c2rust is once more being updated and actively maintained by Immunant in collaboration with our friends at Galois.
The code had bitrotted a fair bit since 2019, so the first order of business was modernizing dependencies and compatibility.
c2rust used rustc’s internal representation of the Rust language AST (libsyntax) to build and process the Rust code it generated, which meant that we depended on the compiler’s internal libraries and could not simply get and version our AST representation via Cargo like our other dependencies.
Every time we wanted to support building with a newer rustc nightly, we had to do whatever forward-porting work was needed throughout the codebase.
This meant we couldn’t get unrelated rustc improvements (shortened compile times, bug fixes, etc.) without committing to a bunch of unrelated code churn.
To break this frustrating linkage, we’ve ported the c2rust transpiler from libsyntax to syn so that we no longer need to depend on rustc internals just to emit Rust code.
Unlike the transpiler, the c2rust refactoring tool does more than just emitting Rust code, and relies quite deeply on a bunch of rustc internals from AST to HIR and MIR.
While looking at forward-porting the refactoring tool, we concluded that its approach isn’t quite the right one.
It worked for some quite useful but relatively shallow cleanups, but for more interesting rewrites from unsafe to safe code, it’s necessary to do more analysis of the programs being rewritten.
We’ve started work on a successor to the rewriting tool that does this analysis and will hopefully enable these more involved transformations.
In the meantime, we’ve disabled the old refactoring tool.
It relied deeply on lots of rustc internals, and it wasn’t clear that the community was making wide use of it relative to the transpiler.
This frees us of some maintenance burden and lets the rest of c2rust evolve more quickly while we work on the successor to the rewriting tool.
What does this get us?
-
It’s much easier to install c2rust…
c2rust now builds with rustc 1.58 stable, nightly-2021-11-22 [1], or any later nightly! This makes it a lot easier to install c2rust: just run cargo install c2rust. (Compiling transpiled code still requires a nightly compiler because the transpilation output relies on some nightly features, so you’ll have to install a nightly and cargo +nightly build your transpiled codebase.)
-
…including on Macs with Apple Silicon:
This means we can now support folks with M1 Macs, as the previous nightly that c2rust required was so old that aarch64-apple-darwin was not yet a Tier-2 platform!
-
Support for asm! and non-x86 asm translation
c2rust previously translated inline assembly to the old (pre-2020) asm! macro, which was later renamed to llvm_asm!, and then removed entirely. As part of the forward-porting effort, we’ve updated our assembly translation to generate the new native, stable (as of early 2022) asm! macro.
Along the way, we did a fair bit of work to improve support for translation of inline assembly for non-x86 architectures. Rust fully supports inline asm on four major architectures: aarch64, arm, riscv, and x86[_64]. All of these now have operand modifiers and machine constraints translated into their Rust-native equivalents. That said, this support is very fresh and would benefit from being exercised by more testers. Anyone with a low-level C codebase targeting one of these architectures is encouraged to throw it at c2rust and let us know how it goes. Our existing test suite is mostly older open-source C code whose inline asm targets x86, Sparc, MIPS, etc. Intersecting those architectures with the more recent set that Rust fully supports for inline asm leaves us with only a few test cases, so any additional validation would be greatly appreciated.
-
Loads of miscellaneous fixes
This release also contains all the minor fixes and new features we’ve added since the last release. Since then, we’ve notably gained:
- Correct translation of structs whose canonical declaration did not mark them as packed but another declaration did. This fixes many users of epoll.
- Newer LLVM versions are supported, including 10, 11, 12, and 13.
- Packed unions are now translated correctly.
- va_list values are supported even as struct fields, which is a pattern that shows up in Apache HTTPd.
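
To see why a missed packed attribute matters, here is a minimal sketch in plain Rust. The two struct names are hypothetical and only loosely modeled on the shape of an epoll event (a 32-bit field followed by a 64-bit one); this is not actual c2rust output.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct with the natural C layout: the compiler inserts
/// padding after `events` so that `data` is suitably aligned.
/// On x86-64 this comes to 16 bytes.
#[repr(C)]
struct EventPadded {
    events: u32,
    data: u64,
}

/// The same fields with the packed attribute: no padding at all,
/// so `data` starts at byte offset 4 and the struct is 12 bytes.
#[repr(C, packed)]
struct EventPacked {
    events: u32,
    data: u64,
}

/// Size and alignment of the padded variant (target-dependent).
fn padded_layout() -> (usize, usize) {
    (size_of::<EventPadded>(), align_of::<EventPadded>())
}

/// Size and alignment of the packed variant.
fn packed_layout() -> (usize, usize) {
    (size_of::<EventPacked>(), align_of::<EventPacked>())
}

/// Reads the possibly unaligned field by value. Taking a reference to a
/// packed field is rejected by the compiler, so the value is copied out.
fn read_data(ev: &EventPacked) -> u64 {
    ev.data
}
```

If a translation used the padded layout where the C library expects the packed one, every field after the first would be read from the wrong offset, which is the kind of breakage the epoll fix above avoids.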
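
For readers who have not yet met the new asm! macro mentioned in the list above, the following is a small hand-written sketch of its operand syntax. The function name `add_via_asm` is hypothetical, the C line in the comment is only an illustrative counterpart, and none of this is claimed to match what c2rust emits.

```rust
/// Adds two integers with a single inline `add` instruction where the
/// target is covered by this sketch, falling back to plain Rust elsewhere.
///
/// A GCC-style C counterpart might look like:
///     __asm__("add %1, %0" : "+r"(sum) : "r"(b));
/// where "+r" roughly corresponds to `inout(reg)` and "r" to `in(reg)`.
fn add_via_asm(a: u64, b: u64) -> u64 {
    let mut sum = a;

    #[cfg(target_arch = "x86_64")]
    unsafe {
        // Rust's asm! uses Intel syntax on x86 by default: destination first.
        std::arch::asm!(
            "add {0}, {1}",
            inout(reg) sum,
            in(reg) b,
            options(pure, nomem, nostack),
        );
    }

    #[cfg(target_arch = "aarch64")]
    unsafe {
        // AArch64 uses a three-operand form: destination, source, source.
        std::arch::asm!(
            "add {0}, {0}, {1}",
            inout(reg) sum,
            in(reg) b,
            options(pure, nomem, nostack),
        );
    }

    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        // Portable fallback so the sketch still builds on other targets.
        sum = a.wrapping_add(b);
    }

    sum
}
```

Calling `add_via_asm(2, 3)` yields 5, and the addition wraps on overflow in all three branches.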
A few gotchas
Unfortunately not everything is good news. c2rust has lost some functionality in the course of the forward port, and, while there’s no reason things can’t be reimplemented, in the meantime the following features have been removed:
-
Support for MMX
SIMD (single-instruction, multiple data) instruction support is critical for many performance-oriented codebases that C and Rust get used for, but one particular instruction set extension has fallen out of popularity: MMX. MMX extended the x86 architecture in the late ’90s, helping to accelerate multimedia/math workloads of the era, but it has since been superseded multiple times and is now all but obsolete; upstream Rust removed support for it in late 2019. As a result, forward-porting c2rust meant we also dropped our corresponding support. All other SIMD extensions that used to work still do, of course. If you have a specific need for MMX support, please let us know.
-
Translation of comments in C (temporarily)
The way we handle construction and printing of the final Rust source code AST is substantially different with syn+prettyplease compared to libsyntax, and it’s more difficult to reinsert comments with the new representation. There are fewer locations at which comments can be placed, and identifying the right place to reinsert a given comment requires work that hasn’t been finished yet. We intend to restore comment translation, but for now comments in C source code are dropped before Rust is generated. This regression is tracked here.
-
The refactoring tool and cross-checks
As noted above, the refactoring tool has been dropped for now, and we’re working on an exciting new tool for safening the output of c2rust. Expect news in this space at some point later this year!
Cross-checks have also been dropped. It’s unlikely you’ve heard of these, as they were a mostly-internal tool for validating translations of C programs by instrumenting the original program and inserting analogous instrumentation into translated programs. This allowed intermediate computations to be compared as programs ran, giving more detailed feedback than “the original works fine, but the transpiled program crashes”. Cross-checks were very useful while prototyping the transpiler, but they worked by inserting checking code into rustc’s internal representation of the program, which meant that they relied heavily on rustc internals and would impose a significant maintenance burden to forward-port. Cross-checks were also a very finicky tool, as they observed program execution at a very fine-grained level, meaning that many of the observed values were not reproducible even across two runs of the same program, much less between separate runs of a C program and its transpiled Rust version.
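
The idea behind cross-checks can be illustrated with a toy sketch. Everything here (`Trace`, `Check`, the two `sum_of_squares_*` functions, and `first_divergence`) is hypothetical and written by hand; the real tool instrumented programs automatically and is not reproduced here. Two versions of a computation record their intermediate values, and the traces are compared to find the first point where they disagree.

```rust
/// One recorded observation: a label for the check site and the value seen.
#[derive(Debug, PartialEq)]
struct Check {
    site: &'static str,
    value: u64,
}

/// Hypothetical recorder that collects observations in execution order.
#[derive(Default)]
struct Trace {
    checks: Vec<Check>,
}

impl Trace {
    fn record(&mut self, site: &'static str, value: u64) {
        self.checks.push(Check { site, value });
    }
}

/// Stands in for the original program.
fn sum_of_squares_reference(xs: &[u64], trace: &mut Trace) -> u64 {
    let mut acc = 0u64;
    for &x in xs {
        acc = acc.wrapping_add(x.wrapping_mul(x));
        trace.record("loop_acc", acc);
    }
    acc
}

/// Stands in for a translation with a deliberate bug: it doubles `x`
/// instead of squaring it, which only differs for some inputs.
fn sum_of_squares_translated(xs: &[u64], trace: &mut Trace) -> u64 {
    let mut acc = 0u64;
    for &x in xs {
        acc = acc.wrapping_add(x.wrapping_add(x));
        trace.record("loop_acc", acc);
    }
    acc
}

/// Index of the first observation where the two traces disagree,
/// or `None` if they are identical.
fn first_divergence(a: &Trace, b: &Trace) -> Option<usize> {
    let mismatch = a.checks.iter().zip(&b.checks).position(|(x, y)| x != y);
    mismatch.or_else(|| {
        // One trace may simply be a prefix of the other.
        let (la, lb) = (a.checks.len(), b.checks.len());
        if la != lb { Some(la.min(lb)) } else { None }
    })
}
```

With the input `[2, 2, 3]`, both versions agree for the first two iterations (doubling and squaring 2 give the same result) and diverge on the third, so the comparison points at the exact step rather than just a wrong final answer. The sketch records plain integers only; any observed value that varies from run to run would produce spurious divergences, which is the finickiness described above.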
Finally
If you’re interested in working on projects like this, building Rust tools for making the computer world safer, consider applying for a job at Immunant.
I’d like to give special thanks to @chrysn, who has contributed a large number of bug fixes and improvements in the course of their use of c2rust, many of which are present in this release, and to everyone who has contributed code or issues to the project.
Distribution Statement “A” Approved for Public Release, Distribution Unlimited.
[1] The effort to allow c2rust to build on a newer toolchain has been underway since late 2021; this post was cleared for publication by DARPA on Jun 8th, 2022.