Peter Kos

An O(1) Generic Blog Post About Rust

Posted on

Topics include: generics, const generics, monomorphization, macros, and GitHub archaeology

Credit goes to Prof. Matthew Fluet's Programming Skills: Rust class at RIT for the Rust knowledge and inspiration, and to @Ryan_A_Gillie for proofreading.

The Generic Groundwork

Generics are a powerful language tool to use in any language. In Rust, these are especially powerful, and by powerful I mean really efficient.

Let's look at a brief example to get familiar with syntax:

trait HasPrintData { /* ... */ }

// Defines a generic T that must implement the `HasPrintData` trait
fn cool_print<T: HasPrintData>(data: T) { /* ... */ }

struct MyType { data: &'static str }
impl HasPrintData for MyType { /* ... */ }

let asdf_data = MyType{ data: "asdf" };
cool_print::<MyType>(asdf_data);

As a Rustacean will proudly let you know, the efficiency comes from the fact that generics in Rust are zero-cost abstractions. rustc performs a process called monomorphization, where generic items (methods, structs, etc.) are "flattened" at compile time into only the types that are used. (Compare this to a language like C#, where generics are evaluated at runtime.)

An example:

// We use `cool_print` with a parameter of type `MyType`
let asdf_data: MyType{ data: "asdf" };
cool_print::<MyType>(asdf_data)

// So, the function definition is transformed accordingly:
// fn cool_print<T: HasPrintData>(data: T) { /* ... */ } [no longer exists]
   fn cool_print_mytype(data: MyType) { /* ... */ }

// This occurs for as many variations of `T` that exist in calls to `cool_print`.
// If we had `TypeA` and `TypeB`, which were both `HasPrintData`, also call `cool_print`,
// we'd end up with the following additional definitions generated by the compiler:
fn cool_print_typea(data: TypeA) { /* ... */ }
fn cool_print_typeb(data: TypeB) { /* ... */ }

And of course, this leads to any further optimizations the compiler may decide to do, re: inlining.

This is great! We have a mechanism that allows us to mix the convenience of generic programming with the speed of literally not doing it. However, as we are soon to see, there are more features that we would like generics to have.

The Generic Problem

There are a few limitations with plain-old generics. Compile time generics with monomorphization are great and all, but there's a specific pattern in the Rust language that would be nice to implement, well, generically: arrays!

let my_nums = [0; 16]; // An array of 16 zeros

We just want an implementation (with magic syntax we won't get into1) that goes something like this:

fn make_arr<T>() -> /* something */ { /* ... */ }

But what is an array?

The existence of any array is inexorably linked to its capacity. Since Rust has a very developed type system, we can attach this idea (that an array capacity is part of the type) directly to the type of the current instance of the array... right?

Before we answer that, let's see how early Rust implemented arrays (src):

macro_rules! array_impls {
    ($($N:expr)+) => {
        $(
            #[stable(feature = "rust1", since = "1.0.0")]
            impl<T> AsRef<[T]> for [T; $N] { /* ... */ }
        )
        // ...
    }))
}
//...
array_impls! {
     0  1  2  3  4  5  6  7  8  9
    10 11 12 13 14 15 16 17 18 19
    20 21 22 23 24 25 26 27 28 29
    30 31 32
}

And before that, in the pre-alpha days of Rust, arrays were defined with a variadic macro. The /* something */ above was a [T, ..$N], where T is the type, and ..$N defines a range (I believe -- old Rust is weird) up to the number of specified elements.

Ouch.

The standard library generates a new type for each 0..=N for type T (e.g., [T; 0], [T; 1], [T; 2]).

This means that if we want to implement anything on top of array -- Ord, PartialEq, etc. -- that means we need to implement it for all types of the array. (And indeed, in old versions of Rust, array docs were really messy, as they showed each implementation for all N!)

// This * 7 or 8 traits * 32 total variations...
impl<T> Debug for [T; 0]
impl<T> Debug for [T; 1]
impl<T> Debug for [T; 2]
impl<T> Debug for [T; 3]
impl<T> Debug for [T; 4]
// ...

This problem is the perfect candidate for a new type of generic: const generics.

The O(1) Generic Solution

Const generics are presented very eloquently in RFC 2000: Const Generics. I'm going to summarize that RFC later on, with some tangents where appropriate, but let's start with a brief overview of the topic.

On its own, a const generic is generic that is restricted to be a specific constant value, specified (simply) with the const keyword2. I think they're best understood in the context of monomorphization.

Consider the following type:

enum NoteName { A, B, C, D, E, F, G }

struct Note<const P: i8> {
    name: NoteName
}

let middle_c = Note::<3>{ name: NoteName::C };
let high_c   = Note::<4>{ name: NoteName::C };

// Monomorphization will transform the above `Note` type into the following two types:
struct Note_3 {
    name: NoteName
}

struct Note_4 {
    name: NoteName
}

The important thing here is that Note_3 and Note_4 are their own distinct type. Consider the counterexample, if we had just used a new parameter in the field:

enum NoteName { A, B, C, D, E, F, G }

struct Note {
    pitch: i8,
    name: NoteName
}

let middle_c = Note { name: NoteName::C, pitch: 3 };
let middle_d = Note { name: NoteName::D, pitch: 3}

// Monomorphization does not occur here, as there are no generics in use.

This reveals the motivation behind the humble const generic. If we want to have a type that is exclusively distinguished by a constant (some might say "by association" of a constant), then a const generic is a fantastic qualifier. (Arrays are a good example here.) Otherwise, if a type will have many invocations with different values, it may be better to stick to a traditional parameter-in-struct approach.

Now that we've established the basics of const generics, let's dig more into the edge cases we may encounter.

The O(1) Generic Technical Asterisks

One of the more important distinctions in the RFC is between a const and a const parameter. A const is just a value that is guaranteed-known at compile time. A const parameter is a constant value that is then "attached" to an instance of a type or function.

(So, when we say we have a "const generic", the generic is acting as a const parameter -- a value, known at compile time, that we attach to a type via monomorphization.)

We've seen an example above, but in closer detail, they're fairly straightforward:

fn cool_print<const T: usize>() { /* ... */ }

// Generally:
fn foo<const $iden: Type>()
// where $iden is the identifier, and `Type` is the type of the constant to be rendered.

Const generics can also kind-of sort-of be expressions when called:

cool_print::<3>();      // Valid!
cool_print::<{2 + 1}>(); // Also valid

Raw expressions are not possible within a function, however.

fn arr_gen<const N: usize>() -> [i32; N] {
    [3; N + 1]
    //  ^ cannot perform const operation using `N`
}
arr_gen::<4>();

// Note that this applies to expressions of the const generic,
// not the "use" of a generic in its callsite.
// The following is valid:
arr_gen::<{4 + 1}>();
// but the following is not:
arr_gen::<4 + 1>();

So sure, we can't use const generics as expressions. This should be a relatively straightforward issue to fix, alongside some other conveniences for generics... right?

Generic Communication

Despite this seeming like a straightforward feature overall, const generics don't have an entry in the Rust book. They are "stabilized", in the sense that the MVP is stable and safe to use. And it took three years for the RFC to go from accepted to beta.

In response to the hype (and against the technical difficulty), a project group was formed for const generics that's focused on aligning the overall implementation. They've been working on a book for a while to describe and align all the features together. (The A-const-generics tag on the Rust GitHub has 101 open issues at time of writing!)

Generic O(1) Compiler Issues

Before we discuss the bleeding edge of const generics, let's take a quick look at an implementation of a specific const generics feature to better understand the complexity around this issue.

One of the more anticipated issues is the use of const generics as an expression:

fn foo<const T: usize>() -> [i32; T + 1] { /* ... */ }

This is not currently possible due to an ambiguity around well-formed types.

A type is defined to be well-formed iff it respects its declared bounds. Once we have a well-formed type, the compiler can then perform type checking. But there is a problem:

//                                 (1)
fn foo<const N: usize>() -> [i32; N + 1] {
    //   (2)
    [0; N + 1];
}

The expression at (1), according to the compiler, is not seen as the same as the expression at (2)! Therefore, even if N is well-formed, and N + 1 is well-formed (that is, we can verify the type of N + 1), it's meaningless as we cannot currently see that both of those well-formed types are the equivalent, as they are not equatable.

Taking a brief dive into rust/compiler/, the unification is defined to always fail (rustc_trait_selection/src/traits/const_evaluatable.rs), and that failure is corroborated by a type error in a trait resolver (rustc_resolve/src/ident.rs). It looks like the compiler uses an (anonymous) projection, i.e., every time it sees a new const generic parameter, it generates a new value determined by its input (but which may potentially be unknown0). Unification is not possible between two anonymous projections. (Although, some work has been done to tighten up the reverse -- that is, evaluation of anonymous constants which depend on anonymous types: #74595. Additional discussion on this Zulip thread.)

I can't seem to piece it together with the unification restriction, but PR #74113 mentioned a recursive type check issue with const arguments. The problem occurs when the type checker references its TypeckTables of the surrounding body, attempting to look up the const_arg type. However, the const_arg may be defined in the same scope, which will cause an infinite loop. The solution is to instead check the type of the parameter's def_id (a unique(?) index into the HIR) instead. This seems logical, as by definition, a const generic parameter is still const -- the type of which may be an expression itself, but that expression should not change further into its use.

With these discrepancies, we can hopefully see how expressions for const generic parameters are difficult:

  • Unification for separate-use generics is not supported (i.e., declaration, return type, and use) due to anonymous projections,
  • Type checking is wonky with respect to const generic arguments3 (hopefully fixed :pray:),
  • Anonymous (generic) constants can't be evaluated with anonymous types due to weird edge cases

All of this is to point out the technical difficulty in implementing one additional "user-facing" feature for const generics. There's many dependencies that need to be addressed in order for the feature to actually land -- never mind land in a stable context.

Generic Future

What about the future of const generics? What's left for the quirky language feature that axed one more set of macros from the standard library?

Looking through the current list of nightly features, there's a few relevant ones for const generics, in particular:

  • adt_const_params (#95174)
  • generic_arg_infer (#85077)
  • generic_const_exprs (#76560) (supportive issue)

The last RFC, generic_const_exprs proposes a new inline const { ... } syntax that allows for blocks of code to be evaluated into a const value at compile time4. The relevant piece here for const generics is that currently, array length expressions cannot refer to generic parameters:

fn gen_arr<T, const N: usize>() {
    let cool_arr = [4i32; N + 1];
}

(This might be useful in side-stepping the const generic expression limitations, or at the very least in making more of the code readable for contexts where arrays depend on the size of a type, i.e., std::mem::size_of::<T>().)

Aside from these specific issues and proposals, there continues to be ongoing discussion in the way of documenting the immense technicality of const generics. This requires collaboration between many areas of the Rust project (compiler, lang, std library), and it can be difficult to coordinate what should be possible vs. what is possible to implement vs. what people have the time to dedicate to.

The fact that the RFC was feature-flagged with mvp_ should be enough indication that this is a big job!

A Generic Summary

Overall, we've explored many things:

  • Basic use of generics in Rust
  • const generics
  • Compiler optimizations (monomorphization, inlining)
  • Macro workarounds
  • The communication challenges of implementing a widely-scoped technical feature5
  • A brief look into the compiler's implementation (and restriction around) const generic expressions
  • Future proposals for const generics

This goes to show the impressive complexity and power underneath every Rust feature. I personally am looking forward to see new updates from the const generics working group!

1

The RFC 0520 added the magic syntax. It's 2014 Rust, so good luck!

2

What's nice is that the const generics group has stated that being able to "just add const" is a key goal of the project. I think this is a great way to keep down unnecessary complexity, which the problem of "complexity vs. sugar" is a hotly debated topic in many languages, re: if we can reuse existing intuitive syntax, we might as well, instead of creating new fancy syntax just for this feature.

3

See the const generic RFC 2000 for a technical definition of const generic parameter vs. const generic argument

5

Scoped in technical nature, not in initial featureset. The RFC is remarkably concise!

4

Future Peter here: I wrote this section quite late in the night, and this part might be a bit wrong. I can't find the idea I talked about in the GH issue itself, but it feels right. Reach out to me on Twitter @pkos91 if you want to correct me!