Improved C Variadics in Rust and C2Rust
By Andrei Homescu
Introduction to C Variadics
The C language provides a special class of functions called variadic functions that can be called with a variable number of arguments. The declaration of a variadic function ends with an ellipsis, e.g.:
void variadic_function(int x, ...);
Variadic functions can be called with any number of arguments in place of the ellipsis (including none at all).
The C runtime provides a set of helper macros that developers use to retrieve
the values of the variadic arguments.
All the macros, along with the va_list type, are defined in the
stdarg.h header and most commonly implemented as compiler builtins:
- va_start(ap, last) initializes the variable ap of type va_list to the next argument of the current function following last, where last is the name of the last non-variadic argument of the function, i.e., the argument immediately before the ellipsis. As a result, ap is effectively initialized with the address of the first variadic argument.
- va_arg(ap, type) returns the value of the next argument from ap, under the assumption that the type of the argument at the variadic function’s call site is type.
- va_copy(aq, ap) copies the current state of ap into aq. Both arguments have type va_list.
- va_end(ap) frees all resources used by ap. Each va_start and va_copy call must be matched by exactly one va_end call in the same function.
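As a minimal sketch of how the four macros fit together, the two helpers below walk a list of int arguments. The names sum_ints and count_above_mean are hypothetical and chosen only for illustration. The second helper uses va_copy to traverse the same arguments twice.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` variadic int arguments. */
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list ap;
    int total = 0;
    va_start(ap, count);              /* `count` is the last named parameter */
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);     /* the caller must really pass ints */
    }
    va_end(ap);
    return total;
}

/* Hypothetical helper: counts how many of the `count` int arguments are
 * strictly greater than their (truncated) integer mean. */
int count_above_mean(int count, ...) {
    assert(count > 0);
    va_list ap, aq;
    va_start(ap, count);
    va_copy(aq, ap);                  /* remember the start for a second pass */

    long total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);     /* first pass: compute the sum */
    }
    va_end(ap);

    long mean = total / count;
    int above = 0;
    for (int i = 0; i < count; i++) {
        if (va_arg(aq, int) > mean) { /* second pass: compare to the mean */
            above++;
        }
    }
    va_end(aq);                       /* the copy needs its own va_end */
    return above;
}
```

Neither helper can check the number or types of the arguments it receives, so a call such as sum_ints(3, 1, 2, 3) is only correct because the caller keeps count in sync with the arguments it passes.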
For example, this is a very common implementation of the printf function:
int printf(const char *fmt, ...) {
int res;
va_list ap;
va_start(ap, fmt);
res = vprintf(fmt, ap);
va_end(ap);
return res;
}
The signature of the vprintf function is
int vprintf(const char *fmt, va_list ap);
and it internally calls va_arg to retrieve the values of the
arguments passed to printf.
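The same wrapper pattern works for user-defined functions. The sketch below pairs a variadic function with a va_list-taking counterpart, mirroring the printf/vprintf split. Both format_tagged and vformat_tagged are hypothetical names. Only snprintf and vsnprintf come from the C standard library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical `v`-style function: consumes an already-started va_list.
 * Writes "[tag] " followed by the formatted message into `buf`. */
int vformat_tagged(char *buf, size_t size, const char *tag,
                   const char *fmt, va_list ap) {
    assert(buf != NULL && size > 0);
    int prefix = snprintf(buf, size, "[%s] ", tag);
    if (prefix < 0 || (size_t)prefix >= size) {
        return prefix;                /* error, or the prefix was truncated */
    }
    /* vsnprintf calls va_arg internally to read the forwarded arguments. */
    int body = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, ap);
    return body < 0 ? body : prefix + body;
}

/* Variadic front end: starts the list, forwards it, and ends it. */
int format_tagged(char *buf, size_t size, const char *tag,
                  const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int res = vformat_tagged(buf, size, tag, fmt, ap);
    va_end(ap);
    return res;
}
```

Keeping the va_list version separate lets other variadic functions reuse it without re-implementing the formatting logic.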
C Variadics in Rust
Our goal in the C2Rust project is to translate any valid C99 program into equivalent Rust code. Naturally, this means we need to properly support translating C variadic functions into Rust.
For a long time, the Rust-C FFI only allowed one-way calls to C variadic
functions: Rust code could call C variadic functions, but not the other way
around. For example, the printf function could be declared and called from
Rust as:
extern "C" {
fn printf(fmt: *const c_char, ...) -> c_int;
}
but such a function could not be implemented in Rust.
Rust RFC 2137 proposed an interface for Rust code to provide C-compatible variadic functions. It was later implemented as a series of patches by Dan Robertson that were merged into nightly Rust between November 2018 and February 2019.
The new interface provides a new VaList type that is compatible with C’s
va_list and implements the following interface (simplified for brevity):
impl VaList {
/// Rust equivalent of `va_arg`, extracts the next argument of type `T`.
pub unsafe fn arg<T>(&mut self) -> T;
/// Calls function `f` with a copy of `self`, constructed with `va_copy`
/// and safely destroyed using `va_end`.
pub unsafe fn with_copy<F, R>(&self, f: F) -> R
where F: FnOnce(VaList) -> R;
}
These methods provide a safer alternative to their C counterparts by
guaranteeing that every call to va_start and va_copy has a matching va_end.
They are still marked unsafe, since arg still performs a form of type punning.
C variadic functions defined in Rust use a special syntax: the variadic
arguments are declared with a special ellipsis type, which the compiler
internally transforms into a VaList type. The compiler also automatically
calls va_start and va_end for that parameter, e.g.:
pub unsafe extern "C" fn variadic_function(mut ap: ...) {
// rustc calls `va_start(ap)` internally here
// Print the first argument as a `u32`
println!("{}", ap.arg::<u32>());
// rustc now calls `va_end(ap)` automatically
}
Implementing Clone for VaList
While the VaList API above does not directly match C’s macros,
it provides a sufficient interface for the C2Rust transpiler to support
conversion of C va_start, va_arg and va_end calls. However, some uses of va_copy
cannot be translated to with_copy, such as
va_list ap1, ap2;
va_copy(ap1, ap);
if (condition)
va_copy(ap2, ap);
// ...other code that uses ap1 and ap2...
va_end(ap1);
// ...more code that uses ap2...
if (same_condition)
va_end(ap2);
and
va_list aq, ap1, ap2;
// ...initialize ap1 and ap2...
if (condition) {
va_copy(aq, ap1);
} else {
va_copy(aq, ap2);
}
// ...code that uses aq...
va_end(aq);
In the first example, the problem is that the lifetimes of the ap1 and ap2 variables overlap, but neither includes the other. Therefore, we cannot replace each va_copy call with with_copy while also maintaining the order of variadic operations (we could move va_end(ap1); after the last if statement, but that might change the behavior of the entire program).
In the second example, the aq copy is initialized in one of the branches of the if statement, but needs to escape the statement and live until the corresponding va_end call, which is outside the statement. If we were to place the with_copy call inside the if, the latest its internal scope could end (and implicitly call va_end) would be at the end of each branch.
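To make the second pattern concrete, here is a small runnable sketch in which the copy aq is initialized in one of two branches and used after the if statement. The function pick_arg is hypothetical and assumes the caller passes at least two int arguments after index.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: returns the variadic int at `index` (0 or 1). */
int pick_arg(int index, ...) {
    assert(index == 0 || index == 1);

    va_list ap1, ap2, aq;
    va_start(ap1, index);       /* ap1 points at variadic argument 0 */
    va_copy(ap2, ap1);
    (void)va_arg(ap2, int);     /* ap2 now points at variadic argument 1 */

    /* The copy is created inside a branch... */
    if (index == 0) {
        va_copy(aq, ap1);
    } else {
        va_copy(aq, ap2);
    }
    /* ...but is used and ended outside of it. */
    int value = va_arg(aq, int);
    va_end(aq);

    va_end(ap2);
    va_end(ap1);
    return value;
}
```

A closure-based API would have to duplicate the code after the if statement into both branches to express the same control flow.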
For a real-world example of C code that cannot be trivially transformed to a with_copy call,
see the vasprintf function in the Julia language implementation.
The underlying issue is that with_copy creates a new scope in which the copy lives and at the end of which it is destroyed, and assumes that all other uses of the copy can be cleanly moved into this scope. Our examples show that this is not always simple, or even possible.
To solve this problem, we submitted a Rust language pull request
with an extension to this interface that would expose a Rust version of va_copy.
After several redesigns based on discussions and suggestions from the Rust language
team, we settled on a final version of the interface with the following changes
(based on a design proposed by Rust compiler team member eddyb):
- Split the previous VaList into two structures: an internal VaListImpl used by the compiler as the backing structure for the ellipsis argument, and VaList used as the public C-compatible interface for the former.
- Implement the Clone trait for VaListImpl, which copies a given VaListImpl and returns the copy.
The VaList split brings Rust’s data structures closer to their C equivalents,
since on some architectures va_list is defined, e.g., by
clang, as
typedef struct __va_list_tag va_list[1];
Due to C’s implicit array-to-pointer decay, va_list decays to the
struct __va_list_tag* pointer type when used in function signatures, but
remains a single-element array (with the same size as the structure itself)
when used to declare a local variable.
To preserve this distinction in the Rust interface, we refactored VaList to match
the pointer version of va_list, and added VaListImpl as an equivalent for
the __va_list_tag structure.
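The array-versus-pointer distinction can be observed in plain C without relying on any particular ABI. The sketch below uses a hypothetical stand-in type, cursor, declared the same way as the clang va_list typedef shown earlier. The structure fields are illustrative only and are not taken from a real __va_list_tag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiler's `__va_list_tag`. */
struct cursor_tag {
    unsigned offset;
    void *base;
};
typedef struct cursor_tag cursor[1];

/* In a parameter list, `cursor c` is adjusted to `struct cursor_tag *c`,
 * so this function mutates the caller's object. */
void cursor_advance(cursor c, unsigned bytes) {
    assert(c != NULL);
    c->offset += bytes;
}

/* Returns the offset the caller observes after the callee advanced it. */
unsigned demo_shared_state(void) {
    cursor local = {{0, NULL}};   /* a real one-element array: owns the storage */
    cursor_advance(local, 8);     /* `local` decays to `&local[0]` */
    return local[0].offset;
}
```

In this analogy, the local array plays the role of VaListImpl (the owned structure) and the decayed pointer plays the role of VaList (a reference to it).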
The new interface for the two structures is (simplified once again):
impl VaListImpl {
/// Explicitly convert `VaListImpl` -> `VaList` for callees.
pub fn as_va_list(&mut self) -> VaList;
pub unsafe fn arg<T>(&mut self) -> T;
/// Calls function `f` with a copy of `self`, constructed with `va_copy`
/// and safely destroyed using `va_end`.
pub unsafe fn with_copy<F, R>(&self, f: F) -> R
where F: FnOnce(VaList) -> R;
}
impl Deref for VaList {
/// Deref-coercion for `VaList`, so it can be used in lieu of a `VaListImpl`.
fn deref(&self) -> &VaListImpl;
}
/// `DerefMut` implementation for `arg`
impl DerefMut for VaList { ... }
impl Clone for VaListImpl {
/// Copy `self` into a new `VaListImpl` and return it.
fn clone(&self) -> Self;
}
We omitted one interesting and significant detail from the interface for
brevity: the actual types are not VaListImpl and VaList, but VaListImpl<'f>
and VaList<'a, 'f>. The 'a lifetime has a simple purpose: since VaList
is internally just a &'a mut VaListImpl reference to its backing VaListImpl,
'a is the lifetime of that reference. On the other hand, 'f has a much more
interesting motivation: by making both structures invariant over this lifetime,
we tie each VaListImpl structure to the function it was created in (for the
VaListImpl<'f> created implicitly by the compiler for an ellipsis argument,
its lifetime argument is always the entire body of that variadic function),
and tie each VaList to the lifetime of the VaListImpl it was created from.
This has some interesting safety consequences:
- it prevents users from accidentally assigning incompatible VaList or VaListImpl values, i.e., VaList values from two different variadic functions, and
- it prevents VaListImpl<'f> values from escaping their variadic function.
For example, this code will fail to compile:
pub unsafe extern fn foo<'a>(mut ap: ...) -> VaListImpl<'a> {
// `VaListImpl` would escape
ap
}
fn bar<'a, 'f, 'g: 'f>(ap: &mut VaList<'a, 'f>, aq: VaList<'a, 'g>) {
// Incompatible types
*ap = aq;
}
Translating C Variadics to Rust
With the extended interface, we have all the tools we need to convert C variadic macros and types into their Rust equivalents using the following rules:
| C code | Rust equivalent | Rule |
|---|---|---|
| void foo(int x, ...) | fn foo(x: c_int, mut args: ...) | The ellipsis argument is named args |
| void bar(int, va_list) | fn bar(_: c_int, _: VaList) | va_list function arguments get the VaList type |
| va_list ap; | let mut ap: VaListImpl; | va_list function locals get the VaListImpl type |
| va_start(ap, x); | ap = args.clone(); | va_start becomes args.clone() |
| va_arg(ap, int) | ap.arg::<c_int>() | va_arg becomes VaListImpl::arg |
| va_copy(aq, ap); | aq = ap.clone(); | va_copy becomes VaListImpl::clone |
| va_end(ap); | /* Ignore va_end */ | va_end calls are ignored |
The rules above are generally straightforward, with a few notable exceptions.
First, since VaListImpl does not provide a constructor or a start function,
we actually implement va_start using the clone function.
We use the function’s ellipsis argument as the source for every such clone
call. Since this requires that the argument is named, we assign it a unique
name of args plus an optional suffix.
Second, since VaListImpl values are automatically dropped, we simply ignore
va_end calls.
For example,
void bar(int, va_list);
void foo(int x, ...) {
va_list ap, aq;
va_start(ap, x);
va_copy(aq, ap);
bar(x, aq);
va_end(ap);
va_end(aq);
}
becomes
extern "C" {
#[no_mangle]
fn bar(_: c_int, _: VaList);
}
pub unsafe extern fn foo(x: c_int, mut args: ...) {
let mut ap: VaListImpl;
let mut aq: VaListImpl;
ap = args.clone();
aq = ap.clone();
bar(x, aq.as_va_list());
}
Acknowledgments
We would like to thank Dan Robertson, Josh Triplett, and Eduard-Mihai Burtescu
(eddyb) for their comments and suggestions on the design and implementation of
Clone for VaList and the VaList/VaListImpl split.