Improved C Variadics in Rust and C2Rust

Introduction to C Variadics

The C language provides a special class of functions called variadic functions that can be called with a variable number of arguments. The declaration of a variadic function ends with an ellipsis, e.g.:

void variadic_function(int x, ...);

Variadic functions can be called with any number of arguments in place of the ellipsis (including none at all).

The C standard library provides a set of helper macros that developers use to retrieve the values of the variadic arguments. All the macros, along with the va_list type, are defined in the stdarg.h header and are most commonly implemented as compiler builtins:

va_start(ap, last) initializes the variable ap of type va_list to the next argument of the current function following last, where last is the name of one of the non-variadic arguments of the function. last most commonly refers to the argument immediately before the ellipsis, so ap is effectively initialized with the address of the first variadic argument.
va_arg(ap, type) returns the value of the next argument from ap, under the assumption that the type of the argument at the variadic function’s call site is type.
va_copy(aq, ap) copies the current state of ap into aq. Both arguments have type va_list.
va_end(ap) frees all resources used by ap. Each va_start and va_copy call must be matched by exactly one va_end call in the same function.
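As a minimal sketch of how the four macros fit together, the function below walks its arguments twice: once through ap to compute a sum, and once through a va_copy of it to count the values above the average. The name count_above_average and the convention of passing the argument count first are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: counts how many of the `count` int arguments
 * are strictly greater than their arithmetic mean. */
int count_above_average(int count, ...) {
    va_list ap, aq;
    long sum = 0;
    int above = 0;

    assert(count >= 0);

    va_start(ap, count); /* ap now refers to the first variadic argument */
    va_copy(aq, ap);     /* aq is an independent cursor at the same position */

    /* First pass: consume every argument through ap. */
    for (int i = 0; i < count; i++)
        sum += va_arg(ap, int);
    va_end(ap);

    /* Second pass: aq still starts at the first variadic argument.
     * Comparing value * count with sum avoids a division. */
    for (int i = 0; i < count; i++)
        if ((long)va_arg(aq, int) * count > sum)
            above++;
    va_end(aq);

    return above;
}
```

Note that the callee has no way to discover how many arguments were passed or what their types are; it relies entirely on the caller honoring the convention, which is the same trust that va_arg places in its type argument.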

For example, this is a very common implementation of the printf function:

int printf(const char *fmt, ...) {
    int res;
    va_list ap;
    va_start(ap, fmt);
    res = vprintf(fmt, ap);
    va_end(ap);
    return res;
}

The signature of the vprintf function is

int vprintf(const char *fmt, va_list ap);

and it internally calls va_arg to retrieve the values of the arguments passed to printf.
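The same split between a variadic wrapper and a va_list worker can be sketched with a simpler pair of functions. The names sum_ints and vsum_ints are hypothetical and stand in for printf and vprintf; the worker is the only place where va_arg is called.

```c
#include <stdarg.h>

/* Hypothetical v-style worker: consumes `count` ints from the
 * caller's va_list, the way vprintf consumes printf's arguments. */
int vsum_ints(int count, va_list ap) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    return total;
}

/* Variadic front end: sets up the va_list and delegates to the worker. */
int sum_ints(int count, ...) {
    va_list ap;
    int total;

    va_start(ap, count);
    total = vsum_ints(count, ap);
    va_end(ap);
    return total;
}
```

Keeping the logic in the va_list worker lets other variadic functions reuse it, which is why the C library offers a v-prefixed variant for each printf-family function.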

C Variadics in Rust

Our goal in the C2Rust project is to translate any valid C99 program into equivalent Rust code. Naturally, this means we need to properly support translating C variadic functions into Rust.

For a long time, the Rust-C FFI only allowed one-way calls to C variadic functions: Rust code could call C variadic functions, but not the other way around. For example, the printf function could be declared and called from Rust as:

extern "C" {
    fn printf(fmt: *const c_char, ...) -> c_int;
}

but such a function could not be implemented in Rust.

Rust RFC 2137 proposed an interface for Rust code to provide C-compatible variadic functions. It was later implemented as a series of patches by Dan Robertson, which were merged into nightly Rust between November 2018 and February 2019.

The new interface provides a new VaList type that is compatible with C’s va_list and implements the following interface (simplified for brevity):

impl VaList {
    /// Rust equivalent of `va_arg`, extracts the next argument of type `T`.
    pub unsafe fn arg<T>(&mut self) -> T;

    /// Calls function `f` with a copy of `self`, constructed with `va_copy`
    /// and safely destroyed using `va_end`.
    pub unsafe fn with_copy<F, R>(&self, f: F) -> R
        where F: FnOnce(VaList) -> R;
}

These methods provide a safer alternative to their C counterparts: the interface guarantees that every call to va_start and va_copy has a matching va_end. They are still marked unsafe, since arg still performs a form of type punning.

C variadic functions defined in Rust use a special syntax: the variadic arguments are declared with a special ellipsis type. The compiler internally transforms this parameter into a VaList and automatically inserts the va_start and va_end calls for it, e.g.:

pub unsafe extern "C" fn variadic_function(mut ap: ...) {
    // rustc calls `va_start(ap)` internally here

    // Print the first argument as a `u32`
    println!("{}", ap.arg::<u32>());

    // rustc now calls `va_end(ap)` automatically
}

Implementing `Clone` for `VaList`

While the VaList API above does not directly match C’s macros, it provides a sufficient interface for the C2Rust transpiler to support conversion of C va_start, va_arg and va_end calls. However, some uses of va_copy cannot be translated to with_copy, such as

va_list ap1, ap2;
va_copy(ap1, ap);
if (condition)
    va_copy(ap2, ap);
// ...other code that uses ap1 and ap2...
va_end(ap1);
// ...more code that uses ap2...
if (same_condition)
    va_end(ap2);

and

va_list aq, ap1, ap2;
// ...initialize ap1 and ap2...
if (condition) {
    va_copy(aq, ap1);
} else {
    va_copy(aq, ap2);
}
// ...code that uses aq...
va_end(aq);

In the first example, the problem is that the lifetimes of the ap1 and ap2 variables overlap, but neither includes the other. Therefore, we cannot replace each va_copy call with with_copy while also maintaining the order of the variadic operations (we could move va_end(ap1); after the last if statement, but that might change the behavior of the entire program).

In the second example, the aq copy is initialized in one of the branches of the if statement, but needs to escape the statement and live until the corresponding va_end call, which is outside the statement. If we were to place the with_copy call inside the if, the latest its internal scope could end (and implicitly call va_end) would be at the end of each branch.
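For reference, the overlapping-lifetime pattern can be made concrete with a small compilable function. This is only a sketch: the name overlapping_copies, the use_second flag, and the arithmetic are invented for illustration, but the order of va_copy and va_end calls follows the shape described above.

```c
#include <stdarg.h>

/* Hypothetical example: ap2 is created while ap1 is live and is
 * destroyed after ap1, so neither lifetime contains the other. */
int overlapping_copies(int use_second, int count, ...) {
    va_list ap, ap1, ap2;
    int total = 0;

    va_start(ap, count);
    va_copy(ap1, ap);          /* lifetime of ap1 begins */
    if (use_second)
        va_copy(ap2, ap);      /* lifetime of ap2 begins inside ap1's */

    for (int i = 0; i < count; i++)
        total += va_arg(ap1, int);
    va_end(ap1);               /* ap1 ends while ap2 may still be live */

    if (use_second) {
        for (int i = 0; i < count; i++)
            total += 10 * va_arg(ap2, int);
        va_end(ap2);           /* ap2 ends after ap1 */
    }

    va_end(ap);
    return total;
}
```

Two nested closures cannot express this ordering, because whichever closure is the inner one must finish before the outer one does.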

For a real-world example of C code that cannot be trivially transformed to a with_copy call, see the vasprintf function in the Julia language implementation.
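The sketch below shows the general shape such formatting helpers often take; it is not Julia's code, and the names format_alloc, format_alloc_v and INITIAL_SIZE are hypothetical. The copy aq is ended and re-created only on the branch where the first buffer turns out to be too small, so its uses do not fit into a single closure scope.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

enum { INITIAL_SIZE = 16 }; /* hypothetical first-attempt buffer size */

/* Hypothetical helper: returns a malloc'ed formatted string, or NULL
 * on failure. The caller owns the result and must free it. */
char *format_alloc_v(const char *fmt, va_list ap) {
    va_list aq;
    char *buf;
    int needed;

    assert(fmt != NULL);
    buf = malloc(INITIAL_SIZE);
    if (buf == NULL)
        return NULL;

    va_copy(aq, ap);
    needed = vsnprintf(buf, INITIAL_SIZE, fmt, aq);
    if (needed >= INITIAL_SIZE) {
        /* Output was truncated: grow the buffer and format again. */
        char *bigger = realloc(buf, (size_t)needed + 1);
        if (bigger == NULL) {
            needed = -1;
        } else {
            buf = bigger;
            va_end(aq);      /* the first copy has been consumed */
            va_copy(aq, ap); /* fresh copy, created only on this branch */
            needed = vsnprintf(buf, (size_t)needed + 1, fmt, aq);
        }
    }
    va_end(aq); /* ends whichever copy is currently live */

    if (needed < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Variadic front end for the helper above. */
char *format_alloc(const char *fmt, ...) {
    va_list ap;
    char *result;

    va_start(ap, fmt);
    result = format_alloc_v(fmt, ap);
    va_end(ap);
    return result;
}
```

The final va_end(aq) closes either the first or the second copy depending on the path taken, which is exactly the kind of control flow that with_copy cannot represent directly.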

The underlying issue is that with_copy creates a new scope in which the copy lives and at the end of which it is destroyed, and it assumes that all other uses of the copy can be cleanly moved into this scope. Our examples show that this is not always simple, or even possible.

To solve this problem, we submitted a Rust language pull request with an extension to this interface that would expose a Rust version of va_copy. After several redesigns based on discussions and suggestions from the Rust language team, we settled on a final version of the interface with the following changes (based on a design proposed by Rust compiler team member eddyb):

Split the previous VaList into two structures: an internal VaListImpl used by the compiler as the backing structure for the ellipsis argument, and VaList used as the public C-compatible interface for the former.
Implement the Clone trait for VaListImpl; clone copies a given VaListImpl and returns the copy.

The VaList split brings Rust’s data structures closer to their C equivalents, since on some architectures va_list is defined, e.g., by clang, as

typedef struct __va_list_tag va_list[1];

Due to C’s implicit array-to-pointer decay, va_list decays to the struct __va_list_tag* pointer type when used in function signatures, but remains a single-element array (with the same size as the structure itself) when used to declare a local variable. To preserve this distinction in the Rust interface, we refactored VaList to match the pointer version of va_list, and added VaListImpl as an equivalent for the __va_list_tag structure.
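The decay can be observed with an ordinary struct in place of the compiler's builtin type. In the sketch below, fake_tag and fake_list are hypothetical stand-ins for __va_list_tag and va_list; nothing here touches the real va_list implementation.

```c
/* Hypothetical stand-in for the compiler's __va_list_tag. */
struct fake_tag {
    int next_index;
};

/* Same shape as the single-element array typedef used for va_list. */
typedef struct fake_tag fake_list[1];

/* As a parameter, `list` has type `struct fake_tag *`, not an array. */
void fake_advance(fake_list list) {
    struct fake_tag **proof = &list; /* type-checks only for a pointer */
    (*proof)->next_index += 1;       /* mutates the caller's storage */
}

/* As a local, `local` is a real array holding one full structure. */
int fake_roundtrip(void) {
    fake_list local = {{0}};

    fake_advance(local); /* array decays to a pointer to local[0] */
    fake_advance(local);
    return local[0].next_index;
}
```

Because the callee receives a pointer, its updates are visible to the caller, which mirrors how VaList refers to a backing VaListImpl rather than owning the state itself.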

The new interface for the two structures is (simplified once again):

impl VaListImpl {
    /// Explicitly convert `VaListImpl` -> `VaList` for callees.
    pub fn as_va_list(&mut self) -> VaList;

    pub unsafe fn arg<T>(&mut self) -> T;

    /// Calls function `f` with a copy of `self`, constructed with `va_copy`
    /// and safely destroyed using `va_end`.
    pub unsafe fn with_copy<F, R>(&self, f: F) -> R
      where F: FnOnce(VaList) -> R;
}

impl Deref for VaList {
    /// Deref-coercion for `VaList`, so it can be used in lieu of a `VaListImpl`.
    fn deref(&self) -> &VaListImpl;
}

/// `DerefMut` implementation for `arg`
impl DerefMut for VaList { ... }

impl Clone for VaListImpl {
    /// Copy `self` into a new `VaListImpl` and return it.
    fn clone(&self) -> Self;
}

We omitted one interesting and significant detail from the interface for brevity: the actual types are not VaListImpl and VaList, but VaListImpl<'f> and VaList<'a, 'f>. The 'a lifetime has a simple purpose: since VaList is internally just a &'a mut VaListImpl reference to its backing VaListImpl, 'a is the lifetime of that reference. The 'f lifetime has a more interesting motivation: by making both structures invariant over this lifetime, we tie each VaListImpl structure to the function it was created in, and tie each VaList to the lifetime of the VaListImpl it was created from. (For the VaListImpl<'f> created implicitly by the compiler for an ellipsis argument, the lifetime argument is always the entire body of that variadic function.) This has some interesting safety consequences:

it prevents users from accidentally assigning incompatible VaList or VaListImpl values, i.e., VaList values from two different variadic functions, and
it prevents VaListImpl<'f> values from escaping their variadic function.

For example, this code will fail to compile:

pub unsafe extern fn foo<'a>(mut ap: ...) -> VaListImpl<'a> {
    // `VaListImpl` would escape
    ap
}

fn bar<'a, 'f, 'g: 'f>(ap: &mut VaList<'a, 'f>, aq: VaList<'a, 'g>) {
    // Incompatible types
    *ap = aq;
}

Translating C Variadics to Rust

With the extended interface, we have all the tools we need to convert C variadic macros and types into their Rust equivalents using the following rules:

C code	Rust equivalent	Rule
`void foo(int x, ...)`	`fn foo(x: c_int, mut args: ...)`	The ellipsis argument is named `args`
`void bar(int, va_list)`	`fn bar(_: c_int, _: VaList)`	`va_list` function arguments get the `VaList` type
`va_list ap;`	`let mut ap: VaListImpl;`	`va_list` function locals get the `VaListImpl` type
`va_start(ap, x);`	`ap = args.clone();`	`va_start` becomes `args.clone()`
`va_arg(ap, int)`	`ap.arg::<c_int>()`	`va_arg` becomes `VaListImpl::arg`
`va_copy(aq, ap);`	`aq = ap.clone();`	`va_copy` becomes `VaListImpl::clone`
`va_end(ap);`	`/* Ignore va_end */`	`va_end` calls are ignored

The rules above are generally straightforward, with a few notable exceptions. First, since VaListImpl does not provide a constructor or a start function, we actually implement va_start using the clone function. We use the function’s ellipsis argument as the source for every such clone call. Since this requires that the argument is named, we assign it a unique name of args plus an optional suffix. Second, since VaListImpl values are automatically dropped, we simply ignore va_end calls.
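One reason cloning from the ellipsis argument is a convenient model for va_start is that C allows a function to call va_start more than once, as long as each call is paired with va_end. The following sketch, with the hypothetical name sum_two_passes, restarts the traversal from the first variadic argument on each pass.

```c
#include <stdarg.h>

/* Hypothetical example: traverses the variadic arguments twice by
 * calling va_start again after va_end, and returns the combined sum. */
int sum_two_passes(int count, ...) {
    va_list ap;
    int total = 0;

    for (int pass = 0; pass < 2; pass++) {
        va_start(ap, count); /* each pass restarts at the first argument */
        for (int i = 0; i < count; i++)
            total += va_arg(ap, int);
        va_end(ap);          /* required before the next va_start */
    }
    return total;
}
```

Under the rules in the table, each va_start in such a function would become another clone of the untouched ellipsis argument, so every pass begins from the same starting position.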

For example,

void bar(int, va_list);

void foo(int x, ...) {
    va_list ap, aq;
    va_start(ap, x);
    va_copy(aq, ap);
    bar(x, aq);
    va_end(ap);
    va_end(aq);
}

becomes

extern "C" {
    #[no_mangle]
    fn bar(_: c_int, _: VaList);
}

pub unsafe extern fn foo(x: c_int, mut args: ...) {
    let mut ap: VaListImpl;
    let mut aq: VaListImpl;
    ap = args.clone();
    aq = ap.clone();
    bar(x, aq.as_va_list());
}

Acknowledgments

We would like to thank Dan Robertson, Josh Triplett, and Eduard-Mihai Burtescu (eddyb) for their comments and suggestions on the design and implementation of Clone for VaList and the VaList/VaListImpl split.