git_cache_http_server.into_rust(): parsing args

08 June 2021.

I decided to start rewriting the git cache server in Rust with the command-line interface.

The Haxe implementation used docopt. Originally developed for Python by Vladimir Keleshev, docopt is a command-line argument parsing library where the parser is automatically constructed from the usage message.

I could just use docopt for Rust and be done with it. But what else is there in Rust? And what are my requirements?

requirements

A simple program should not require a complicated parser.

There should be little or no duplication between the usage message and the parser construction.

The usage message, either manually written or generated by the argument parser, should look nice.

docopt for Rust

Andrew Gallant's docopt for Rust is by all accounts a fully compliant implementation of docopt.

However, a common problem with using docopt in statically-typed languages, like Rust or Haxe, is that its API traditionally returns a dictionary, where the keys are the command, argument and option names, as strings.

In addition to the untyped keys, each value can be a string, a boolean or a list, depending on the specific usage message and what use cases are possible from it. Parsing the values is left to the caller, and it's very easy for the usage message, resulting dictionary, and parsers for each value to get out of sync.
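To make the problem concrete, here is a minimal sketch of what working with such a dictionary looks like. It uses only the standard library: the `Value` enum and the `port_from` helper are hypothetical stand-ins I made up for illustration, not docopt's actual types.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the shapes a docopt-style dictionary value can take.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Str(String),
    List(Vec<String>),
}

// The caller must know both the exact key spelling and the expected value shape,
// and must parse the string itself.
fn port_from(args: &HashMap<String, Value>) -> Result<u16, String> {
    match args.get("--port") {
        Some(Value::Str(s)) => s
            .parse::<u16>()
            .map_err(|e| format!("invalid port {:?}: {}", s, e)),
        Some(other) => Err(format!("--port has an unexpected shape: {:?}", other)),
        None => Err("--port is missing from the dictionary".to_string()),
    }
}

fn main() {
    let mut args = HashMap::new();
    args.insert("--port".to_string(), Value::Str("8080".to_string()));
    args.insert("--help".to_string(), Value::Bool(false));
    println!("{:?}", port_from(&args));

    // If the option is renamed in the usage message, the lookup above still
    // compiles and only fails at run time.
    let mut renamed = HashMap::new();
    renamed.insert("--listen-port".to_string(), Value::Str("8080".to_string()));
    println!("{:?}", port_from(&renamed));
}
```

Nothing ties the `"--port"` string in the lookup to the usage message, which is exactly how the two drift apart.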

That said, docopt for Rust significantly improves this. It includes a second API that uses "type-based" deserialization. This requires declaring the structure to be filled in addition to the usage message.

use docopt::Docopt;
use serde::Deserialize;
use std::path::PathBuf;

const USAGE: &str = "
A caching Git HTTP server.

Serve local mirror repositories over HTTP/HTTPS, updating them as they are requested.

Usage:
  git-cache-http-server [options]

Options:
  -c, --cache-dir <path>   Location of the git cache [default: /var/cache/git]
  -p, --port <port>        Bind to port [default: 8080]
  -h, --help               Print this message
  --version                Print the current version
";

#[derive(Deserialize)]
struct Opt {
    flag_cache_dir: PathBuf,
    flag_port: u16,
}

fn main() {
    let opt: Opt = Docopt::new(USAGE)
        .and_then(|d| d.deserialize())
        .unwrap_or_else(|e| e.exit());
    println!("INFO: cache directory: {}", opt.flag_cache_dir.to_string_lossy());
    println!("INFO: port number: {}", opt.flag_port);
}

The minimal main function calls the parser and replicates the output of the Haxe version.

This approach is very powerful, and more complex value conversion and validation than was shown here is possible. The downside is that the deserializer specification and the usage message can end up out of sync: for example, fields may be missing from either the message or the typed structure.
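One way to catch part of that drift early might be a small check, run from a unit test, that every field of the struct has a matching long option in the usage text. The sketch below is only an illustration under some assumptions: it relies on docopt's convention of mapping `--cache-dir` to a `flag_cache_dir` field, the helpers `long_option` and `missing_from_usage` are names I invented, the field list still has to be kept by hand, and the plain substring match would be fooled by options that share a prefix.

```rust
// Abbreviated copy of the usage message, enough for the check.
const USAGE: &str = "
Usage:
  git-cache-http-server [options]

Options:
  -c, --cache-dir <path>   Location of the git cache [default: /var/cache/git]
  -p, --port <port>        Bind to port [default: 8080]
";

// Map a struct field name such as `flag_cache_dir` to the long option `--cache-dir`.
fn long_option(field: &str) -> Option<String> {
    field
        .strip_prefix("flag_")
        .map(|name| format!("--{}", name.replace('_', "-")))
}

// Return the fields whose long option does not appear anywhere in the usage text.
// Fields without the `flag_` prefix are reported as missing too.
fn missing_from_usage<'a>(usage: &str, fields: &[&'a str]) -> Vec<&'a str> {
    fields
        .iter()
        .copied()
        .filter(|field| match long_option(field) {
            Some(option) => !usage.contains(&option),
            None => true,
        })
        .collect()
}

fn main() {
    // `flag_verbose` is a deliberately stale field with no matching option.
    let fields = ["flag_cache_dir", "flag_port", "flag_verbose"];
    for field in missing_from_usage(USAGE, &fields) {
        println!("WARN: no option in the usage message for field {}", field);
    }
}
```

This only covers one direction (struct fields without an option); options without a field would need the usage text to be parsed, which is what docopt itself already does.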

clap and structopt

So, what is commonly used in Rust? Well, the Rust CLI working group mentions clap and structopt in the Command Line Applications in Rust book.

The latter is a frontend to the former and, like docopt, constructs the argument parser from a higher-level specification. However, instead of starting with the usage message, which has no type information, structopt builds the CLI and the usage message from a single struct type that you expect the program arguments to fill.

use std::path::PathBuf;
use structopt::StructOpt;

/// A caching Git HTTP server.
///
/// Serve local mirror repositories over HTTP/HTTPS, updating them as they are requested.
#[derive(StructOpt)]
struct Opt {
    /// Location of the git cache.
    #[structopt(
        short,
        long,
        parse(from_os_str),
        default_value = "/var/cache/git",
        name = "path"
    )]
    cache_dir: PathBuf,

    /// Bind to port.
    #[structopt(short, long, default_value = "8080")]
    port: u16,
}

fn main() {
    let opt = Opt::from_args();
    println!("INFO: cache directory: {}", opt.cache_dir.to_string_lossy());
    println!("INFO: port number: {}", opt.port);
}

The output from structopt (clap) is very acceptable.

$ git-cache-http-server -h
git-cache-http-server 0.1.0
A caching Git HTTP server.

Serve local mirror repositories over HTTP/HTTPS, updating them as they are requested.

USAGE:
    git-cache-http-server [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --cache-dir <path>    Location of the git cache [default: /var/cache/git]
    -p, --port <port>         Bind to port [default: 8080]
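For comparison, this is roughly what I would have to write by hand, with only the standard library, to get the same two options and their defaults. It is a sketch, not what structopt or clap generate: the `parse` function is my own, and it handles neither `--help`, `--version` nor the `--port=8080` form.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
struct Opt {
    cache_dir: PathBuf,
    port: u16,
}

// Hand-rolled parser for `-c/--cache-dir <path>` and `-p/--port <port>`.
// Expects the arguments without the program name.
fn parse<I: IntoIterator<Item = String>>(args: I) -> Result<Opt, String> {
    // The defaults live here, separately from any usage message.
    let mut opt = Opt {
        cache_dir: PathBuf::from("/var/cache/git"),
        port: 8080,
    };
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "-c" | "--cache-dir" => {
                let value = args.next().ok_or("missing value for --cache-dir")?;
                opt.cache_dir = PathBuf::from(value);
            }
            "-p" | "--port" => {
                let value = args.next().ok_or("missing value for --port")?;
                opt.port = value
                    .parse::<u16>()
                    .map_err(|e| format!("invalid port {:?}: {}", value, e))?;
            }
            other => return Err(format!("unknown argument: {}", other)),
        }
    }
    Ok(opt)
}

fn main() {
    // Skip the first argument, which is the program name.
    match parse(std::env::args().skip(1)) {
        Ok(opt) => {
            println!("INFO: cache directory: {}", opt.cache_dir.to_string_lossy());
            println!("INFO: port number: {}", opt.port);
        }
        Err(e) => {
            eprintln!("error: {}", e);
            std::process::exit(1);
        }
    }
}
```

Even for two options, the option names, the defaults and the value parsing each end up in a different place from the help text, which is the duplication the requirements above try to avoid.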

closing thoughts

In programs with many options, it can be useful to manually group and order the options in a way that is easier for the user to reason about. This doesn't matter here, and is still possible with clap/structopt, but for these scenarios docopt.rs may still be a better fit.

By the way, both usage messages are suitable for automatic generation of completion for zsh.

$ compdef _gnu_generic git-cache-http-server

$ git-cache-http-server --<Tab>
--cache-dir  -- Location of the git cache (default- /var/cache/git)
--help       -- Prints help information
--port       -- Bind to port (default- 8080)
--version    -- Prints version information

And great news, clap 3.0 will incorporate structopt.

To be continued...