Benjamin Sago / ogham / cairnrefinery / etc…

Speed up Rust compilation by skipping ‘#[derive]’

Recently, I have been working on a Rust project involving an annoyingly large number — think several hundred — of wrapper types, all of the form Thingy<T>. These types are all similar but fundamentally different, and we don’t want to mix them up, which is why they’ve all been given their own type.

Having all these types sloshing about in one’s source code is going to make compilation slow no matter what. What surprised me was how much of that compile time went not on compiling the code itself but on compiling the code produced by the automatically derived trait implementations.

Fortunately for me, all these types were created by the same macro, so making that one macro longer was a small price to pay for quicker builds.

Running the measurements

To measure the impact of deriving traits on compile time, we’re going to write and then measure the compilation speed of three programs: one that derives several traits, one that implements those traits manually, and, as a baseline, one that does neither.

Here’s the deriving version, which uses a macro to define 100 structures, each with five common traits derived for them:

macro_rules! create_thing_derived {
    ($name:ident) => {
        #[derive(PartialEq, Eq, Hash, Clone, Debug)]
        pub struct $name<T>(T);
    }
}

create_thing_derived!(Thing0);
create_thing_derived!(Thing1);
create_thing_derived!(Thing2);
// repeat 100 times...
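
As a quick illustration of what those five derives provide, here is a sketch that puts one of the generated types into a HashSet and formats it with Debug. It repeats the macro so that it compiles on its own; the helper functions distinct and describe, the element type u32, and the values are hypothetical choices for illustration.

```rust
use std::collections::HashSet;

macro_rules! create_thing_derived {
    ($name:ident) => {
        #[derive(PartialEq, Eq, Hash, Clone, Debug)]
        pub struct $name<T>(T);
    }
}

create_thing_derived!(Thing0);

/// Hypothetical helper: collects the distinct wrapped values.
/// Relies on the derived Clone (for `cloned`) plus Hash and Eq (for the set).
fn distinct(items: &[Thing0<u32>]) -> HashSet<Thing0<u32>> {
    items.iter().cloned().collect()
}

/// Hypothetical helper: formats a value with the derived Debug impl,
/// which prints the type name around the inner value.
fn describe(thing: &Thing0<u32>) -> String {
    format!("{:?}", thing)
}
```

For example, describe(&Thing0(5)) returns "Thing0(5)", and passing [Thing0(1), Thing0(2), Thing0(1)] to distinct yields a set of two elements.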

And here’s the same program with the implementations written out by hand. None of them is very long or complicated, but together they still add appreciably to the overall length of the code:

macro_rules! create_thing_manual {
    ($name:ident) => {
        pub struct $name<T>(T);

        #[automatically_derived]
        impl<T> ::std::cmp::PartialEq for $name<T>
        where T: ::std::cmp::PartialEq
        {
            fn eq(&self, other: &Self) -> bool {
                self.0 == other.0
            }
        }

        #[automatically_derived]
        impl<T> ::std::cmp::Eq for $name<T>
        where T: ::std::cmp::Eq
        { }

        #[automatically_derived]
        impl<T> ::std::hash::Hash for $name<T>
        where T: ::std::hash::Hash
        {
            fn hash<H: ::std::hash::Hasher>(&self, state: &mut H) {
                ::std::hash::Hash::hash(&self.0, state);
            }
        }

        #[automatically_derived]
        impl<T> ::std::clone::Clone for $name<T>
        where T: ::std::clone::Clone
        {
            fn clone(&self) -> Self {
                $name(::std::clone::Clone::clone(&self.0))
            }
        }

        #[automatically_derived]
        impl<T> ::std::fmt::Debug for $name<T>
        where T: ::std::fmt::Debug
        {
            fn fmt(&self, f: &mut ::std::fmt::Formatter<'_>) -> ::std::fmt::Result {
                write!(f, "{:?}", self.0)
            }
        }
    }
}

create_thing_manual!(Thing0);
create_thing_manual!(Thing1);
create_thing_manual!(Thing2);
// repeat 100 times (again)...

There’s a whole lot of… well, nothing in this code. Each impl simply forwards to the same trait’s method on the wrapped value, self.0, instead of on self, in code that’s rather ugly and symbol-heavy.
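
One behavioural difference is worth knowing about: the hand-written Debug impl above formats only the inner value, so Thing0(5) prints as 5, whereas #[derive(Debug)] prints Thing0(5). If the output needs to match the derived version, the standard library’s Formatter::debug_tuple builder can reproduce it. The following is a sketch of just that one impl; the macro name create_thing_manual_debug is hypothetical, and this variant was not part of the timings below.

```rust
macro_rules! create_thing_manual_debug {
    ($name:ident) => {
        pub struct $name<T>(T);

        #[automatically_derived]
        impl<T> ::std::fmt::Debug for $name<T>
        where T: ::std::fmt::Debug
        {
            // Mirrors the derived output: the type name (via stringify!),
            // then the single field in parentheses.
            fn fmt(&self, f: &mut ::std::fmt::Formatter<'_>) -> ::std::fmt::Result {
                f.debug_tuple(stringify!($name)).field(&self.0).finish()
            }
        }
    }
}

create_thing_manual_debug!(Thing0);
```

With this version, format!("{:?}", Thing0("hi")) produces Thing0("hi"), the same string the derived impl would give.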

But is it quicker? Significantly:

Figure: compilation time (ms)
Baseline: 125.3 ms
Derived: 381.5 ms
Manual: 267.3 ms
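
Subtracting the baseline from each measurement isolates the time spent on the five trait implementations themselves:

```latex
\begin{aligned}
t_{\text{derived}} - t_{\text{baseline}} &= 381.5 - 125.3 = 256.2\ \text{ms} \\
t_{\text{manual}} - t_{\text{baseline}} &= 267.3 - 125.3 = 142.0\ \text{ms} \\
\frac{256.2}{142.0} &\approx 1.80
\end{aligned}
```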

I ran these tests by invoking rustc +stable directly, avoiding any overhead from Cargo and the rustup version-searching mechanism, and took the measurements using hyperfine. Once the baseline is subtracted, the results show that compiling the trait implementations takes (very roughly) twice as long when the compiler derives them as when they’re written out by hand.
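
The test programs differ only in which macro they invoke, and the hundred invocations are purely mechanical. One way to produce them is a small generator like the sketch below. The helper names make_invocations and make_program are hypothetical, and the sketch assumes each test file is just a macro definition followed by numbered invocations. The result could be saved with std::fs::write and timed as a library crate, for example with hyperfine 'rustc +stable --crate-type lib derived.rs', where derived.rs is an illustrative file name.

```rust
use std::fmt::Write;

/// Hypothetical helper: builds `count` numbered invocations of `macro_name`,
/// e.g. `create_thing_derived!(Thing0);` up to `create_thing_derived!(Thing99);`.
fn make_invocations(macro_name: &str, count: usize) -> String {
    let mut out = String::new();
    for i in 0..count {
        // Writing into a String cannot fail, so unwrap never panics here.
        writeln!(out, "{}!(Thing{});", macro_name, i).unwrap();
    }
    out
}

/// Hypothetical helper: a complete test file, i.e. the macro definition
/// followed by a blank line and then the generated invocations.
fn make_program(macro_def: &str, macro_name: &str, count: usize) -> String {
    format!("{}\n\n{}", macro_def, make_invocations(macro_name, count))
}
```

Keeping the definition and the invocation count as parameters means the same generator can emit the baseline, derived, and manual files, so the only thing that varies between the timed programs is the macro body.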

This is a pretty good result, and it reaffirmed my decision to write trait implementations out by hand only where the same code is stamped out many times (as here, where a macro defines a hundred types), and not to shy away from using #[derive] anywhere else. The readability of code I don’t have to write beats having to wait an extra millisecond or two.