Benjamin Sago / ogham / cairnrefinery / etc…

Browse the code of locally-cached crates

Rust’s package manager and build system, Cargo, downloads the entirety of each crate used as a project’s dependency and stores it locally. When I need to read the crate’s documentation, I’m usually both lazy and online, so I just use the superb docs.rs service, which builds and indexes each published crate’s documentation automatically. If I happen to not have internet access, I can still build a copy of the docs locally with cargo doc --open, which very handily opens my Web browser to read it.

Sometimes, though, I’m not interested in the public documentation of each module; I need to view the code directly. And that code is stored locally after cargo fetch has been run: it just needs to be extracted from its archive.

The browse-crate function

To make it easier to browse a crate’s code, I wrote this Fish function:

functions/browse-crate.fish
function browse-crate -d "Browse a locally-cached Rust crate" -a crate_name
    if test -z "$crate_name"
        echo "browse-crate: No crate name given" >&2
        return 3
    end

    mkdir -p ~/tmp/crates
    cd ~/tmp/crates; or return $status

    set -l base_crate_name (string replace -r '\.crate$' '' $crate_name)
    set -l crates_dir ~/.cargo/registry/cache/github.com-*/

    if test -e "$base_crate_name"
        echo "browse-crate: ‘$base_crate_name’ is already extracted" >&2
    else if test -f "$crates_dir/$crate_name"
        echo "Extracting ‘$base_crate_name’..."
        tar -xf $crates_dir/$crate_name
    else
        echo "browse-crate: crate at ‘$crates_dir/$crate_name’ does not exist" >&2
        return 1
    end

    cd $base_crate_name
    pwd
end

This function takes one argument — the name of the crate archive file to extract, such as anyhow-1.0.59.crate — and creates a directory within ~/tmp/crates (or wherever you deem best to put temporary files such as these), and then extracts the crate into it. It then cd-s into that directory, leaving you free to search the code, or open it in your editor, or similar.
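To see why the directory name is the archive name minus its .crate suffix, here is a minimal sketch. It is written in POSIX shell rather than Fish so that it can be tried in any shell. It assumes that a .crate file is a gzip-compressed tarball whose single top-level directory is named after the crate and its version. The archive example-0.1.0.crate and its contents are made up for illustration, and are built in a scratch directory so that nothing in the real Cargo cache is touched.

```shell
# Build a stand-in for a cached crate archive (hypothetical name and contents).
scratch=$(mktemp -d)
mkdir -p "$scratch/pkg/example-0.1.0/src"
echo 'pub fn hello() {}' > "$scratch/pkg/example-0.1.0/src/lib.rs"
tar -czf "$scratch/example-0.1.0.crate" -C "$scratch/pkg" example-0.1.0

# Extract it with tar -xf, as browse-crate does; tar detects the compression.
mkdir -p "$scratch/crates"
tar -xf "$scratch/example-0.1.0.crate" -C "$scratch/crates"

# The extracted directory is the archive name without the ".crate" suffix.
archive=example-0.1.0.crate
extracted_dir="$scratch/crates/${archive%.crate}"
ls "$extracted_dir/src"
```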

The name of the directory we are searching contains both github.com, the host of the default Cargo registry’s index, and a hash suffix, so we have to glob the directory name in order to find it. There should be only one such directory present.

If you’re following the advice in “Unclutter $HOME with wrapper scripts” and are setting the CARGO_HOME environment variable, you’ll have to re-write the glob from searching ~/.cargo/ to $CARGO_HOME/. That’s the price of a clean home directory: constant vigilance.
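One way to avoid editing the glob by hand is to fall back on ~/.cargo only when CARGO_HOME is not set. A minimal sketch in POSIX shell, where cargo_cache_root is a hypothetical helper name that is not part of the function above:

```shell
# Print the root of Cargo's crate cache, using CARGO_HOME when it is set and
# non-empty, and falling back to ~/.cargo otherwise.
# (cargo_cache_root is a hypothetical helper name, used only for illustration.)
cargo_cache_root() {
    printf '%s\n' "${CARGO_HOME:-$HOME/.cargo}/registry/cache"
}

cargo_cache_root
```

The registry directory to glob for would then be the single entry beneath the printed path.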

Reverse-version-ordered completions

So far so good. There’s just one problem: we need to know not just the name of the crate to extract, but its version, which will vary from project to project. Almost all the time, I’m using the most recent version of a dependency, but this is by no means a guarantee, and I’d still like to be able to specify the version two or three updates behind if necessary.

The best way I’ve found to handle all possibilities is to write a custom completions function for the script that sorts in reverse version order: one that will offer the completion for the latest version of the crate first, the second-latest version second, and so on. This means that I get the behaviour I usually want first — all I have to do is type the first few letters of the crate’s name and hit Tab — while still letting me choose from older versions if I happen to be lagging behind.

There are two challenges with this approach. The first challenge is that even though we want the versions of each crate to be sorted in reverse order, we want the names of the crate to be sorted in non-reverse alphabetical order. For example, the first few crates that I happen to have cached locally are these:

addr2line-0.14.1.crate
addr2line-0.16.0.crate
addr2line-0.17.0.crate
aho-corasick-0.7.15.crate
aho-corasick-0.7.18.crate

If just a is on the command-line buffer, then the completions script should, ideally, offer the third line as the first completion, as it’s the latest version of the crate that comes first alphabetically. Next would be the second line, and then the first line — and then the fifth, and lastly the fourth.

The second challenge is that we need to extract the version number from each line: the crate name and version are separated by a hyphen, but the crate name could contain hyphens itself, so it’s everything after the final hyphen.
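For a single name held in a variable, the split at the final hyphen can be sketched with POSIX parameter expansion. This only illustrates the rule itself, not how a whole list of names would be processed, and the variable names are illustrative.

```shell
# Split a crate archive name into crate name and version at the final hyphen.
archive='aho-corasick-0.7.18.crate'
stem=${archive%.crate}   # drop the suffix: aho-corasick-0.7.18
name=${stem%-*}          # remove the shortest trailing match of "-*"
version=${stem##*-}      # remove the longest leading match of "*-"
printf '%s %s\n' "$name" "$version"
```

Like the rule it illustrates, this assumes the version part contains no hyphen of its own.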

Is such a thing even possible without resorting to a proper programming language? Believe it or not, it is, though you have to write a crazy train of shell pipelines, and use a feature of sort only present in GNU coreutils, in order to do it correctly.

This is what I came up with, placed in my Fish completions directory:

completions/browse-crate.fish
function __complete_crates
    command ls -1 ~/.cargo/registry/cache/github.com-*/ \
        | rev | sed 's/-/#/' | rev | gsort -t'#' -k'1,1' -k'2,2rV' | sed 's/#/-/'
end

complete --command browse-crate --no-files
complete --command browse-crate --arguments "(__complete_crates)" --keep-order

Let’s go through this, step-by-step.

  1. Firstly, we print all the file names in the cargo cache directory, one file name per line, without printing the directory the files are in. It’s usually frowned upon to use ls to print the names of arbitrary files, since the files’ names could contain newlines themselves; in this case, the crate archives follow a known format, so this is safe.

  2. Next, we need to transform each file name so that the version part of the name can be more easily plucked out, replacing the final hyphen with a different character that can be scanned for specifically; here, #. This is the rev | sed 's/-/#/' | rev construct: as there is no straightforward way of replacing the final hyphen on the line, we reverse each string, replace the first hyphen, and then reverse it back again. This transformation turns the list of files into the following:

    addr2line#0.14.1.crate
    addr2line#0.16.0.crate
    addr2line#0.17.0.crate
    aho-corasick#0.7.15.crate
    aho-corasick#0.7.18.crate

  3. With this in place, we can sort each line using the sort binary that comes with GNU coreutils. This may be already installed as the sort executable on your system; on macOS, though, we default to BSD sort, so I’ve had to install the GNU coreutils through Homebrew, which calls it gsort to distinguish it from the system default. We need GNU sort because, astoundingly, it comes with a version sort feature!

    The invocation, gsort -t'#' -k'1,1' -k'2,2rV', uses two rarely-used features: the field separator, -t, and sorting by keys, -k. Remember how we need to sort the version numbers in reverse, but sort the names non-reverse? We do this first by passing # as the field separator, then telling sort to first sort by field #1 (-k'1,1') and secondarily sort by field #2 using reverse version sort (-k'2,2rV', the r meaning “reverse”, the V meaning “version”).

  4. Finally, our strings still have # signs in, so we need to turn them back to hyphens.
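The four steps can be tried out on the five sample names without touching the Cargo cache. This is a sketch in POSIX shell that feeds the names in with printf instead of ls, and it assumes GNU sort is installed as sort; where Homebrew's coreutils are in use, substitute gsort. The function name sort_crates is hypothetical.

```shell
# Sort crate archive names by crate name ascending, then by version descending.
# Assumes GNU sort (for the V modifier) and the rev utility are available.
sort_crates() {
    rev | sed 's/-/#/' | rev | sort -t'#' -k'1,1' -k'2,2rV' | sed 's/#/-/'
}

printf '%s\n' \
    addr2line-0.14.1.crate \
    addr2line-0.16.0.crate \
    addr2line-0.17.0.crate \
    aho-corasick-0.7.15.crate \
    aho-corasick-0.7.18.crate \
    | sort_crates
```

With those inputs, the output should begin with addr2line-0.17.0.crate and end with aho-corasick-0.7.15.crate.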

This is shell scripting in a nutshell. It’s ordinary programming turned on its head: what’s usually hard is easy, but what’s usually easy is hard. Sorting strings as version numbers in an ordinary programming language would either require writing a complex comparison function or using an external dependency, but for GNU sort, all it took was adding one extra character. On the other hand, trying to split each filename on the last hyphen rather than the first hyphen would usually be a routine operation, yet in shell script land we not only have to pipe through rev twice, but sed twice, too!
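The difference that one character makes can be seen by sorting a few version strings both ways. A small sketch, again assuming GNU sort is available as sort (gsort under Homebrew); the version numbers are arbitrary examples.

```shell
# Plain sort compares character by character, so 0.7.15 lands before 0.7.2.
printf '%s\n' 0.7.9 0.7.15 0.7.2 | sort

# Version sort (-V) compares the numeric components as numbers,
# so 0.7.15 comes last.
printf '%s\n' 0.7.9 0.7.15 0.7.2 | sort -V
```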