The Rust-loving team at Immunant has been hard at work on C2Rust, a migration framework that takes the drudgery out of migrating to Rust. Our goal is to make safety improvements to the translated Rust automatically where we can, and help the programmer do the same where we cannot. First, however, we have to build a rock-solid translator that gets people up and running in Rust. Testing on small CLI programs gets old eventually, so we decided to try translating Quake 3 into Rust. After a couple of days, we were likely the first people to ever play Quake 3 in Rust!

Setting the stage: Quake 3 sources

After looking at the original Quake 3 source code and various forks, we settled on ioquake3. It is a community fork of Quake 3 that is still maintained and builds on modern platforms.

As a starting point, we made sure we could build the project as is:

$ make release

The ioquake3 build produces a few different libraries and executables:

$ tree --prune -I missionpack -P "*.so|*x86_64"
.
└── build
    └── debug-linux-x86_64
        ├── baseq3
        │   ├── cgamex86_64.so          # client
        │   ├── qagamex86_64.so         # game server
        │   └── uix86_64.so             # ui
        ├── ioq3ded.x86_64              # dedicated server binary
        ├── ioquake3.x86_64             # main binary
        ├── renderer_opengl1_x86_64.so  # opengl1 renderer
        └── renderer_opengl2_x86_64.so  # opengl2 renderer

Of these libraries, the UI, client, and server libraries can be built as either Quake VM assembly or native x86 shared libraries. We opted to use the native versions of these libraries for our project. Translating just the VM into Rust and using the QVM versions would have been significantly simpler, but we wanted to thoroughly test out C2Rust.

We focused on the UI, game, client, OpenGL1 renderer and main binary for our translation. It would be possible to translate the OpenGL2 renderer as well, but we chose to skip it as it makes significant use of .glsl shader files which the build system embeds as literal strings in C source code. While we could add custom build script support for embedding the GLSL code into Rust strings after we transpile, there’s not a good automatic way to transpile these autogenerated, temporary files¹. We instead just translated the OpenGL1 renderer library and forced the game to use it instead of the default renderer. Finally, we decided to skip the dedicated server and mission pack files, as they wouldn’t be hard to translate but were also not necessary for our demonstration.

Transpiling Quake 3

To preserve the directory structure used by Quake 3 and avoid changing its source code, we needed to produce exactly the same binaries as the native build, meaning four shared libraries and one executable. Since C2Rust produces Cargo build files, each binary needs its own Rust crate with a corresponding Cargo.toml file. For C2Rust to produce one crate per output binary, it would need a list of the binaries along with their corresponding object or source files, and the linker invocation used to produce each binary (used to determine other details like library dependencies).

However, we quickly ran into a limitation in the way C2Rust intercepts the native build process. C2Rust takes a compilation database file as input, which lists the compilation commands executed during the build, but this database does not contain any linker invocations. Most tools that produce this database share this intentional limitation, e.g., cmake with CMAKE_EXPORT_COMPILE_COMMANDS, bear and compiledb. To our knowledge, the only tool that does include linking commands is build-logger from CodeChecker, which we didn’t use because we only learned about it after writing our own wrappers (described below). This meant that we couldn’t use a compile_commands.json file produced by any of the common tools to transpile a multi-binary C program.

Instead, we wrote our own compiler and linker wrapper scripts that dump out all compiler and linker invocations to a database, and then convert that into an extended compile_commands.json. Instead of the normal build using a command like:

$ make release

we add wrappers to intercept the build using:

$ make release CC=/path/to/C2Rust/scripts/cc-wrappers/cc

The wrappers produce a directory full of JSON files, one per invocation. A second script aggregates all of them into a new compile_commands.json file that contains both compilation and linking commands. We then extended C2Rust to read the linking commands from the database, and produce a separate crate per linked binary. Additionally, C2Rust now also reads the library dependencies of each binary and automatically adds them to that crate’s build.rs file.
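To make the grouping step concrete, here is a minimal sketch of how link entries in an extended database could be mapped back to the sources behind each binary. The `Entry` type, its fields, and `sources_per_binary` are hypothetical names chosen for illustration; they are not C2Rust's actual data structures, nor the JSON schema the wrappers emit.

```rust
use std::collections::BTreeMap;

/// One record of a hypothetical extended compilation database.
#[derive(Debug)]
enum Entry {
    /// A compiler invocation: one source file produces one object file.
    Compile { source: String, object: String },
    /// A linker invocation: several object files produce one binary.
    Link { objects: Vec<String>, output: String },
}

/// Map every linked binary to the sources that were compiled into it.
/// Each key would correspond to one crate in the generated workspace.
fn sources_per_binary(entries: &[Entry]) -> BTreeMap<String, Vec<String>> {
    // First pass: remember which source produced each object file.
    let mut source_of: BTreeMap<&str, &str> = BTreeMap::new();
    for entry in entries {
        if let Entry::Compile { source, object } = entry {
            source_of.insert(object.as_str(), source.as_str());
        }
    }

    // Second pass: resolve the objects of each link command to sources.
    let mut binaries = BTreeMap::new();
    for entry in entries {
        if let Entry::Link { objects, output } = entry {
            let sources: Vec<String> = objects
                .iter()
                .filter_map(|obj| source_of.get(obj.as_str()))
                .map(|src| src.to_string())
                .collect();
            binaries.insert(output.clone(), sources);
        }
    }
    binaries
}
```

Objects that no compile entry accounts for (for example, prebuilt libraries) are simply skipped in this sketch; a real tool would more likely record them as library dependencies.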

As a quality-of-life improvement, all of the binaries can be built at once by placing them in a workspace. C2Rust produces a top-level workspace Cargo.toml file, so we can build the project with a single cargo build command in the quake3-rs directory:

$ tree -L 1
.
├── Cargo.lock
├── Cargo.toml
├── cgamex86_64
├── ioquake3
├── qagamex86_64
├── renderer_opengl1_x86_64
├── rust-toolchain
└── uix86_64

$ cargo build --release
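For readers unfamiliar with Cargo workspaces, the sketch below renders the kind of top-level manifest that ties several crates together. The `workspace_manifest` helper is hypothetical, and the manifest C2Rust actually generates may contain additional sections; only the `[workspace]` table with a `members` list is standard Cargo.

```rust
/// Render a minimal workspace manifest listing one member crate per binary.
/// Hypothetical helper for illustration; not part of C2Rust.
fn workspace_manifest(members: &[&str]) -> String {
    let mut out = String::from("[workspace]\nmembers = [\n");
    for member in members {
        // Each member is a path relative to the workspace root.
        out.push_str(&format!("    \"{}\",\n", member));
    }
    out.push_str("]\n");
    out
}
```

Calling it with the five crate directories shown in the listing above would yield a manifest with five `members` entries, which is enough for a single cargo build to compile every binary.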

Fixing a few Papercuts

When we first tried to build the translated code, we ran into a couple of issues with the Quake 3 sources: corner cases that C2Rust either translated incorrectly or could not handle at all.

Pointers to Arrays

In a few places, the original source code contains expressions that point one past the last element of an array. Here is a simplified example of the C code:

int array[1024];
int *p;

// ...

if (p >= &array[1024]) {
   // error...
}

The C standard (see e.g. C11, Section 6.5.6) allows pointers to an element one past the end of the array. However, Rust's array indexing is bounds-checked, so the indexing expression itself is rejected even if we are only taking the address of the element. We found examples of this pattern in the AAS_TraceClientBBox function.

The Rust compiler also flagged a similar but actually buggy example in G_TryPushingEntity where the conditional is >, not >=. The out of bounds pointer was then dereferenced after the conditional, which is an actual memory safety bug.

To avoid this issue in the future, we fixed the C2Rust transpiler to use pointer arithmetic to calculate the address of an array element instead of using an array indexing operation. With this fix, code that uses this “address of element past the array end” pattern will now correctly translate and run with no modifications necessary.
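The difference between indexing and pointer arithmetic can be sketched in a few lines of Rust. This is an illustration of the idea rather than the exact code C2Rust emits: the function name `is_past_end` is hypothetical, and the sketch uses the safe `wrapping_add` method where transpiled code would typically use unsafe pointer offsets.

```rust
/// Returns true when `p` points at or beyond the one-past-the-end address
/// of `array`, mirroring the C check `p >= &array[1024]`.
fn is_past_end(array: &[i32; 1024], p: *const i32) -> bool {
    // `&array[1024]` would fail Rust's bounds check, so compute the
    // one-past-the-end address with pointer arithmetic instead.
    // `wrapping_add` is safe to call and never dereferences the pointer.
    let end = array.as_ptr().wrapping_add(array.len());
    p >= end
}
```

The resulting pointer is only compared, never dereferenced, which is exactly the use the C standard permits for one-past-the-end pointers.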

Flexible Array Members

We started up a game to test things out and immediately got a panic from Rust:

thread 'main' panicked at 'index out of bounds: the len is 4 but the index is 4', quake3-client/src/cm_polylib.rs:973:17

Taking a look at cm_polylib.c, we noticed that it was dereferencing the p field in the following struct:

typedef struct
{
	int		numpoints;
	vec3_t	p[4];		// variable sized
} winding_t;

The p field in this struct is a pre-C99 non-compliant version of a “flexible array member” which is still accepted by gcc. C2Rust recognizes flexible array members with the C99 syntax (vec3_t p[]) and implements simple heuristics to also detect some pre-C99 versions of this pattern (0- and 1-sized arrays at the end of structures; we also found a few of those in the ioquake3 source code).

Changing the above struct to C99 syntax fixed the panic:

typedef struct
{
	int		numpoints;
	vec3_t	p[];		// variable sized
} winding_t;

Trying to automatically fix this pattern in the general case (arrays of sizes other than 0 or 1) would be extremely difficult, since we would have to distinguish between regular arrays and flexible array members of arbitrary sizes. Instead, we recommend that the original C code is fixed manually, just as we did for ioquake3.
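The panic follows directly from how a literal translation of the pre-C99 declaration looks to Rust. The sketch below is a simplified stand-in, not the transpiled ioquake3 code: `Vec3`, `WindingFixed`, and both helper functions are names assumed for illustration.

```rust
/// Three-component vector, standing in for ioquake3's `vec3_t`.
type Vec3 = [f32; 3];

/// Literal translation of the pre-C99 declaration: as far as Rust is
/// concerned, `p` holds exactly four points.
#[repr(C)]
struct WindingFixed {
    numpoints: i32,
    p: [Vec3; 4],
}

/// Bounds-checked lookup; `w.p[index]` would panic where this returns `None`.
fn point(w: &WindingFixed, index: usize) -> Option<&Vec3> {
    w.p.get(index)
}

/// True when the winding claims more points than the declared array holds,
/// which is the situation that triggered the panic.
fn overflows_declared_array(w: &WindingFixed) -> bool {
    w.numpoints > w.p.len() as i32
}
```

In C, the extra points live in memory allocated past the end of the struct; Rust only sees the declared length of 4, so index 4 is out of bounds regardless of how much memory was actually allocated.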

Tied Operands in Inline Assembly

Another source of crashes was this C inline assembly code from the /usr/include/bits/select.h system header:

# define __FD_ZERO(fdsp)                                            \
  do {                                                              \
    int __d0, __d1;                                                 \
    __asm__ __volatile__ ("cld; rep; " __FD_ZERO_STOS               \
                          : "=c" (__d0), "=D" (__d1)                \
                          : "a" (0), "0" (sizeof (fd_set)           \
                                          / sizeof (__fd_mask)),    \
                            "1" (&__FDS_BITS (fdsp)[0])             \
                          : "memory");                              \
  } while (0)

which defines the internal version of the __FD_ZERO macro. This definition hits a rare corner case of gcc inline assembly: tied input/output operands with different sizes. The "=D" (__d1) output operand binds the edi register to the __d1 variable as a 32-bit value, while "1" (&__FDS_BITS (fdsp)[0]) binds the same register to the address of fdsp->fds_bits as a 64-bit pointer. gcc and clang fix this mismatch by using the 64-bit register rdi instead and then truncating its value before the assignment to __d1, while Rust defaults to LLVM’s semantics which leave this case undefined. What we saw happening for debug builds (but not release builds, which behaved correctly) was that both operands would be assigned to the edi register, causing the pointer to be truncated to 32 bits before the inline assembly, which would cause crashes.

Since rustc passes Rust inline assembly to LLVM with very few changes, we decided to fix this particular case in C2Rust. We implemented a new c2rust-asm-casts crate that fixes the issue above via the Rust type system using a trait and some helper functions that automatically extend and truncate the values of tied operands to an internal size that is large enough to hold both operands. The code above correctly transpiles to the following:

let mut __d0: c_int = 0;
let mut __d1: c_int = 0;
// Reference to the output value of the first operand
let fresh5 = &mut __d0;
// The internal storage for the first tied operand
let fresh6;
// Reference to the output value of the second operand
let fresh7 = &mut __d1;
// The internal storage for the second tied operand
let fresh8;
// Input value of the first operand
let fresh9 = (::std::mem::size_of::<fd_set>() as c_ulong).wrapping_div(::std::mem::size_of::<__fd_mask>() as c_ulong);
// Input value of the second operand
let fresh10 = &mut *fdset.__fds_bits.as_mut_ptr().offset(0) as *mut __fd_mask;
asm!("cld; rep; stosq"
     : "={cx}" (fresh6), "={di}" (fresh8)
     : "{ax}" (0),
       // Cast the input operands into the internal storage type
       // with optional zero- or sign-extension
       "0" (AsmCast::cast_in(fresh5, fresh9)),
       "1" (AsmCast::cast_in(fresh7, fresh10))
     : "memory"
     : "volatile");
// Cast the operands out (types are inferred) with truncation
AsmCast::cast_out(fresh5, fresh9, fresh6);
AsmCast::cast_out(fresh7, fresh10, fresh8);

Note that the code above does not require the types for any input or output values in the assembly statement, relying instead on Rust’s type inference to resolve those types (mainly the types of fresh6 and fresh8 above).
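The extend-then-truncate idea can be modelled without any inline assembly. The following is a deliberately simplified sketch and not the real c2rust-asm-casts API: the `TiedOperand` trait, its methods, and `run_tied` are hypothetical, and a closure stands in for the assembly statement.

```rust
/// Hypothetical simplification: a value that can pass through a 64-bit register.
trait TiedOperand: Copy {
    /// Extend the value to the full register width.
    fn widen(self) -> u64;
    /// Truncate a full-width register value back to this type.
    fn narrow(wide: u64) -> Self;
}

impl TiedOperand for i32 {
    fn widen(self) -> u64 {
        self as i64 as u64 // sign-extend to 64 bits
    }
    fn narrow(wide: u64) -> Self {
        wide as i32 // keep only the low 32 bits
    }
}

impl TiedOperand for u64 {
    fn widen(self) -> u64 {
        self
    }
    fn narrow(wide: u64) -> Self {
        wide
    }
}

/// Simulate a tied operand pair: the input is widened before the "assembly"
/// runs on the full register, and the result is narrowed to the output type.
fn run_tied<I: TiedOperand, O: TiedOperand>(input: I, asm: impl Fn(u64) -> u64) -> O {
    O::narrow(asm(input.widen()))
}
```

With a 64-bit address as input and an `i32` as output, the closure observes the untruncated address, and only the value written back to the 32-bit variable is truncated. That matches the gcc and clang behavior described above.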

Aligned Global Variables

The final source of crashes we encountered was the following global variable that stores an SSE constant:

static unsigned char ssemask[16] __attribute__((aligned(16))) =
{
	"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x00"
};

Rust currently supports the alignment attribute on structure types, but not on global variables, i.e., static items. We are looking into ways to solve this in the general case in either Rust or C2Rust, but have decided to fix this issue manually for ioquake3 with a short patch file for now. This patch file replaces the Rust equivalent of ssemask with:

#[repr(C, align(16))]
struct SseMask([u8; 16]);
static mut ssemask: SseMask = SseMask([
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0,
]);
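A quick way to convince yourself that the wrapper type carries the alignment is to inspect the type and the address of the static. This sketch assumes an immutable static named `SSEMASK` so that no unsafe code is needed, which differs from the `static mut ssemask` in the patch; the helper functions are hypothetical.

```rust
use std::mem::align_of;

/// Wrapper struct whose only purpose is to carry the alignment requirement.
#[repr(C, align(16))]
struct SseMask([u8; 16]);

/// Immutable variant of the patched global (an assumption of this sketch).
static SSEMASK: SseMask = SseMask([
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0,
]);

/// Borrow the raw mask bytes.
fn ssemask_bytes() -> &'static [u8; 16] {
    &SSEMASK.0
}

/// True when the static's address is a multiple of 16, as aligned SSE
/// loads require.
fn ssemask_is_aligned() -> bool {
    let addr = &SSEMASK as *const SseMask as usize;
    align_of::<SseMask>() == 16 && addr % 16 == 0
}
```

Because alignment is a property of the type, every value of `SseMask`, including the static, is placed at a 16-byte boundary.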

Running quake3-rs

Running cargo build --release emits the binaries, but they are all emitted under target/release using a directory structure that the ioquake3 binary does not recognize. We wrote a script that creates symbolic links in the current directory to replicate the correct directory structure (including links to the .pk3 files containing the game assets):

$ /path/to/make_quake3_rs_links.sh /path/to/quake3-rs/target/release /path/to/paks

The /path/to/paks path should point to a directory containing the .pk3 files.

Now let’s run the game! We need to pass +set vm_game 0, etc., so that we load these modules as Rust shared libraries instead of QVM assembly, and cl_renderer to use the OpenGL1 renderer.

$ ./ioquake3 +set sv_pure 0 +set vm_game 0 +set vm_cgame 0 +set vm_ui 0 +set cl_renderer "opengl1"

And…

Image of Quake3 console startup running in Rust

We have Quake3 running in Rust!

Here is a video of us transpiling Quake 3, loading the game and playing for a bit:

You may browse the transpiled sources in the transpiled branch of our repository. We also provide the refactored branch containing the same sources with some refactoring commands pre-applied.

Transpiling Instructions

If you want to try translating Quake 3 and run it yourself, please be aware that you will need to own the original Quake 3 game assets or download the demo assets from the web. You’ll also need to install C2Rust (the required Rust nightly version at the time of writing is nightly-2019-12-05, but we recommend you check the C2Rust repository or crates.io for the latest one):

$ cargo +nightly-2019-12-05 install c2rust

and copies of our C2Rust and ioquake3 repositories:

$ git clone git@github.com:immunant/c2rust.git
$ git clone git@github.com:immunant/ioq3.git

As an alternative to installing c2rust with the command above, you may build C2Rust manually using cargo build --release. In either case, the C2Rust repository is still required as it contains the compiler wrapper scripts that are required to transpile ioquake3.

We provide a script that automatically transpiles the C code and applies the ssemask patch. To use it, run the following command from the top level of the ioq3 repository:

$ ./transpile.sh </path/to/C2Rust repository> </path/to/c2rust binary>

This command should produce a quake3-rs subdirectory containing the Rust code, where you can subsequently run cargo build --release and the rest of the steps described earlier.

What’s Next?

As we continue to develop C2Rust, we’d love to hear what you want to see translated next. Drop us a line at [email protected] and let us know! If you have legacy C code you need modernized and translated, the team here at Immunant is here to help. We are available for consulting and contracting engagements ranging from one-time support to full-service code modernization.

¹ Thanks to David Dubois for correcting our understanding of the GLSL build process.

Translating Quake 3 into Rust