Tomáš Zemanovič : Rust coding style

This post is about a high-level Rust coding style (as in, it doesn’t go into specific details), partly inspired by data-oriented design and partly by ML-based functional programming languages, which imho fits quite naturally into Rust and makes for ergonomic, flexible and easily extensible APIs. It’s nothing advanced, but I hope it will be useful for people coming to Rust, perhaps with some background in one of the common OOP languages.

I’ll try to sum up the main points in a handful of guiding principles, highlighted in block quotes for easy skim reading. If you don’t agree with something, I invite you to read the supporting arguments, and if you still don’t agree, I’d love to hear your opinion.

Image: Sponge Decorator Crab by Richard Ling / CC BY-NC-ND 2.0

Data-oriented design originated in game development. It is well suited to performance-sensitive code and opens the door to better optimization. When you pick Rust over a higher-level language in which you, for example, don’t need to think about ownership, you probably care about performance and might want to optimize your code at some point. Even if you don’t, you can reasonably expect good out-of-the-box performance. But of course, there are many other great reasons to use Rust too, not least its great community!

Somewhat analogously to the rule of least power, I think that fns with struct, enum, type aliases and mods are not only sufficient for most things we commonly do, but that sticking to them is advantageous. On most modern computer architectures, data layout and locality are of the utmost importance for performance.

Pay a lot of attention to your data structures, they have a huge impact on what you’ll be able to do.

It’s not just for performance’s sake: data structures are very helpful in getting a good understanding of the problem and implementing a solution in a clear, readable and maintainable way. To quote Mike Acton: “if you don’t understand the data, you don’t understand the problem”.

In functional programming, we take advantage of the fact that sum types (enums in Rust) allow you to express data of arbitrary cardinality (the number of possible values of a type), so you can:

Make invalid states impossible to represent with your data types.

If it’s not clear how to do that, there are great articles and talks you can find on the topic.

A very simple example: say you have a type (or arguments to a function) in which you want to have either A, B, or neither of them. If you simply use two Option types (NotXorOpts below), it’s possible to represent a value with both A and B present. But you can very quickly define a custom enum type that rules that out:

struct A;
struct B;

type NotXorOpts = (Option<A>, Option<B>);

enum XorOpts {
  A(A),
  B(B),
  Neither,
}
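To make the difference concrete, here is a minimal sketch of how the two representations meet. The helpers `from_opts` and `describe` are hypothetical names introduced only for illustration, and the derives are added so the values can be compared. The tuple form has four possible values and the enum has three, so the conversion has to reject exactly one case:

```rust
// Unit structs standing in for real payloads, as in the example above.
#[derive(Debug, PartialEq)]
struct A;
#[derive(Debug, PartialEq)]
struct B;

#[derive(Debug, PartialEq)]
enum XorOpts {
  A(A),
  B(B),
  Neither,
}

/// Hypothetical helper: converts the loose tuple form into the strict enum.
/// The `(Some, Some)` case has no counterpart in `XorOpts`, so it is
/// rejected here, once, at the boundary.
fn from_opts(opts: (Option<A>, Option<B>)) -> Option<XorOpts> {
  match opts {
    (Some(a), None) => Some(XorOpts::A(a)),
    (None, Some(b)) => Some(XorOpts::B(b)),
    (None, None) => Some(XorOpts::Neither),
    (Some(_), Some(_)) => None,
  }
}

/// Hypothetical consumer: it only ever sees three cases, and the compiler
/// checks that all of them are handled.
fn describe(opts: &XorOpts) -> &'static str {
  match opts {
    XorOpts::A(_) => "only A",
    XorOpts::B(_) => "only B",
    XorOpts::Neither => "neither",
  }
}
```

Code that receives an `XorOpts` never needs a branch for "both present", because that value cannot be constructed.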

The essence of it is to think of types as sets. In particular, Rust enums act as closed sets, while open sets can be represented with traits. Often, traits are used in places where closed sets are sufficient.

Prefer to only use traits if and when you need open sets.

That is, only use them when you don’t know or cannot foresee all the possible variants.
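As a rough sketch of the distinction (all names here, `Shape`, `HasArea`, `Square`, `area` and `total_area`, are made up for illustration), the same problem can be modelled both ways:

```rust
/// Closed set: every shape is known here, so a `match` can be exhaustive
/// and the compiler flags any variant that is not handled.
enum Shape {
  Circle { radius: f64 },
  Rect { width: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
  match shape {
    Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
    Shape::Rect { width, height } => width * height,
  }
}

/// Open set: other modules or crates can add implementors that this code
/// cannot foresee.
trait HasArea {
  fn area(&self) -> f64;
}

struct Square(f64);

impl HasArea for Square {
  fn area(&self) -> f64 {
    self.0 * self.0
  }
}

/// Works for any implementor, at the cost of dynamic dispatch and of not
/// knowing the full list of cases.
fn total_area(shapes: &[Box<dyn HasArea>]) -> f64 {
  shapes.iter().map(|s| s.area()).sum()
}
```

If the list of shapes is fixed and owned by you, the enum version is usually enough and keeps the data layout explicit.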

You can often see encapsulation being achieved by keeping fields of data structures private, which looks satisfying because it can prevent the consumer from doing modifications that would break some abstractions. In a struct, fields are private by default so you do get directed into this, but there are other ways to achieve encapsulation. Here, I want to advocate:

Never hide data fields of types that appear in your public API.

When you’re using a crate and its abstraction doesn’t exactly fit what you’re doing, having all data types public may allow you to make it suit your use case without having to wrangle it with some transmutations, fork it or, worse, abandon it altogether. As we know, pretty much all the abstractions in software are leaky. Furthermore, when you do care about the layout of your data and some abstraction is trying to hide it from you, it is only getting in the way.

Instead of putting your invariants on data types, place them on functions, document them, and use debug_assert!s generously.
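A minimal sketch of what this can look like, assuming a hypothetical `SortedKeys` type whose data is fully public. The invariant ("keys are in ascending order") is documented on the functions that depend on it and checked with `debug_assert!`, which is compiled out of release builds by default:

```rust
/// A list of keys with public data. Nothing stops a caller from pushing
/// directly into `keys`; the functions below state what they expect.
pub struct SortedKeys {
  pub keys: Vec<u32>,
}

/// Returns `true` if `keys` is in ascending order.
pub fn is_sorted(keys: &[u32]) -> bool {
  keys.windows(2).all(|w| w[0] <= w[1])
}

/// Looks up `key` by binary search.
///
/// Invariant: `sorted.keys` must be in ascending order (checked in debug
/// builds only).
pub fn contains(sorted: &SortedKeys, key: u32) -> bool {
  debug_assert!(is_sorted(&sorted.keys), "keys must be sorted");
  sorted.keys.binary_search(&key).is_ok()
}

/// Inserts `key` at the position that keeps the order intact.
///
/// Invariant: `sorted.keys` must be in ascending order (checked in debug
/// builds only).
pub fn insert(sorted: &mut SortedKeys, key: u32) {
  debug_assert!(is_sorted(&sorted.keys), "keys must be sorted");
  let pos = sorted.keys.partition_point(|&k| k < key);
  sorted.keys.insert(pos, key);
}
```

A caller who bypasses `insert` and breaks the order gets a panic with a clear message in debug builds, rather than silently wrong lookups.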

Another pain point that might ruin your day: when you follow the data-hiding encapsulation technique, you’ll be limited in what you can do while using a method that requires you to mutably borrow the data you’re working with (&mut self). Until the mutable borrow ends (to avoid any confusion, I do not mean the keyword return, but rather the moment when the mutable reference is released), you will not be able to call other methods on the value that require immutable borrowing (&self), even if you can reason that the fields being mutated are different from the fields you want to access immutably. While there are some proposals on how to allow such code to be expressed, when the fields are public, you can simply borrow them individually. Simpler and easier.
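Here is a small sketch of the difference, using a hypothetical `Inventory` type with two public fields. The accessor-based variant is left as a comment because it does not compile:

```rust
pub struct Inventory {
  pub items: Vec<String>,
  pub log: Vec<String>,
}

impl Inventory {
  /// Accessor style: the returned reference keeps all of `self` mutably
  /// borrowed for as long as it is in use.
  pub fn items_mut(&mut self) -> &mut Vec<String> {
    &mut self.items
  }

  pub fn log_len(&self) -> usize {
    self.log.len()
  }
}

/// Appends the current log length to every item name.
pub fn tag_items(inv: &mut Inventory) {
  // Through the accessors this is rejected by the borrow checker, because
  // `items` would still hold `inv` mutably when `log_len` is called:
  //   let items = inv.items_mut();
  //   let n = inv.log_len();
  //   items.iter_mut().for_each(|item| item.push_str(&format!("#{n}")));

  // With public fields the two borrows are visibly disjoint.
  let items = &mut inv.items;
  let log = &inv.log;
  for item in items.iter_mut() {
    item.push_str(&format!("#{}", log.len()));
  }
}
```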

Let’s look at an example that may be given as a motivation for hiding data to achieve encapsulation. Say we have a list of integers that can get large, and we often want to use its average without having to recompute it every time, so we might want to store it with the data. In an OOP style, it may look like this:

#[derive(Default)]
struct AvgVec {
  data: Vec<i64>,
  avg: i64,
}

impl AvgVec {
  pub fn get_avg(&self) -> i64 { self.avg }

  pub fn push(&mut self, val: i64) {
    self.data.push(val);
    self.update_avg();
  }
  
  pub fn pop(&mut self) -> Option<i64> {
    let res = self.data.pop()?;
    self.update_avg();
    Some(res)
  }

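  /// Recompute the cached average (integer division; `0` if the data is empty).
  fn update_avg(&mut self) {
    let len = self.data.len() as i64;
    self.avg = if len == 0 { 0 } else { self.data.iter().sum::<i64>() / len };
  }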
}

Whenever the data changes, which here is only possible via the public push and pop methods, we call update_avg and so we internally enforce data consistency, which is not a bad thing, and after making it generic you might consider the problem solved. But we can achieve the same without hiding any data:

struct AvgVec(pub Vec<i64>);

impl AvgVec {
  /// Push a value and return the new average.
  pub fn push(&mut self, val: i64) -> i64 {
    self.0.push(val);
    self.avg()
  }

  /// If the data is not empty, returns a pair of the popped value and the new average.
  pub fn pop(&mut self) -> Option<(i64, i64)> {
    let res = self.0.pop()?;
    let new_avg = self.avg();
    Some((res, new_avg))
  }

  /// Compute the average value. Returns `0` if the data is empty.
  pub fn avg(&self) -> i64 {
    let len = self.0.len() as i64;
    if len == 0 { 0 } else { self.0.iter().sum::<i64>() / len }
  }
}

Admittedly, this does feel a bit trivial and contrived, but I hope it illustrates the point. If you can think of a better (counter-)example, I want to hear from you!

A nice side effect of this style is that if you forget to return the new average, it’s a type error. In the first version, by contrast, if you remove the call to self.update_avg(); or forget to call it in a new function that mutates the state, you don’t get much help from the compiler.

Of course, it’s still good practice to minimize how much you rely on the internals of a data type outside of its module(s), as you will be less likely to be affected if and when the data type changes. But even if you do rely on them, the type system has your back.

Use the module system to encapsulate logic.

For more complex modules, you can split them up into (private) sub-modules and then re-export (pub use) the public parts from the parent module. The Rust compiler will even check for you when you accidentally expose something that is using another thing in its API that itself isn’t public.
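A minimal sketch of that layout, written inline for brevity (in a real crate each `mod` would typically be its own file). The module and function names `stats`, `mean`, `spread`, `avg` and `range` are hypothetical:

```rust
pub mod stats {
  // Private sub-modules: code outside `stats` cannot name `stats::mean`
  // or `stats::spread` directly.
  mod mean {
    /// Integer mean; returns `0` for empty input.
    pub fn avg(data: &[i64]) -> i64 {
      if data.is_empty() {
        0
      } else {
        data.iter().sum::<i64>() / data.len() as i64
      }
    }
  }

  mod spread {
    /// Difference between the largest and smallest value; `0` for empty input.
    pub fn range(data: &[i64]) -> i64 {
      match (data.iter().max(), data.iter().min()) {
        (Some(max), Some(min)) => max - min,
        _ => 0,
      }
    }
  }

  // The public surface of `stats` is exactly what is re-exported here.
  pub use self::mean::avg;
  pub use self::spread::range;
}
```

Callers write `stats::avg(&data)` and `stats::range(&data)` and never learn how the module is organised internally, so the sub-modules can be reshuffled without breaking them.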

In Rust, methods are just syntactic sugar providing some convenience, but there is often an over-reliance on methods for code that doesn’t need them, which I suspect is at least partly a habit carried over from OOP languages. Taking the example from above, we could just as well do this:

pub type Data = Vec<i64>;

/// Push a value and return the new average.
pub fn push(data: &mut Data, val: i64) -> i64 {
  data.push(val);
  avg(data)
}

/// If the data is not empty, returns a pair of the popped value and the new average.
pub fn pop(data: &mut Data) -> Option<(i64, i64)> {
  let res = data.pop()?;
  let new_avg = avg(data);
  Some((res, new_avg))
}

/// Compute the average value. Returns `0` if the data is empty.
pub fn avg(data: &Data) -> i64 {
  let len = data.len() as i64;
  if len == 0 { 0 } else { data.iter().sum::<i64>() / len }
}

Not everything needs to be a method. In fact, a method can often carry a lot of baggage when you only need a handful of things from self.
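To illustrate the baggage, consider a sketch with a hypothetical `Account` type, where computing a fee needs only two of its four fields:

```rust
pub struct Account {
  pub owner: String,
  pub history: Vec<i64>,
  pub balance_cents: i64,
  pub fee_percent: i64,
}

impl Account {
  /// As a method, this can reach every field of `self`, and a caller
  /// (or a test) has to build a whole `Account` to use it.
  pub fn fee(&self) -> i64 {
    fee(self.balance_cents, self.fee_percent)
  }
}

/// As a plain function, it takes only the two numbers it needs.
pub fn fee(balance_cents: i64, fee_percent: i64) -> i64 {
  balance_cents * fee_percent / 100
}
```

The free function can be called as `fee(10_000, 2)` without constructing an owner name or a history, and its signature tells the reader exactly what the result depends on.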

Strive to reduce the inputs to your functions to the bare minimum needed to implement them.

This makes it easier to test these functions very thoroughly in isolation from things that are not relevant to them. You can build very readable code with plain functions using qualified symbols with similar usability to methods. For example:

// This `mod` might be a file `space.rs`
pub mod space {
  pub struct Ship;
  
  pub fn yeet(ship: Ship) {}
  pub fn is_the_place() -> bool { true }
}

fn main() {
  let space_karen = space::Ship;
  space::yeet(space_karen);
  assert!(space::is_the_place());
}

If you’re used to a drop-down of possible methods coming up after you type . following the name of a variable, typing :: following a module name works much the same.

Last but not least: while Rust’s type system is powerful and will help you find out how to do things right and prevent many bugs, Rust also makes it very easy to add tests, and sometimes it is simpler to test against an issue than to prevent it at the type level.
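As a small sketch of that trade-off (the table `LEVELS` and the function `level_for` are hypothetical), the lookup below relies on a constant table being sorted and starting at zero. Encoding that in types would take real effort, while a unit test states it in a few lines:

```rust
/// Lookup table; `level_for` relies on it being sorted by threshold and
/// on the first threshold being `0`.
pub const LEVELS: [(u32, &str); 3] = [(0, "bronze"), (100, "silver"), (500, "gold")];

/// Returns the name of the highest level whose threshold is at most `points`.
pub fn level_for(points: u32) -> &'static str {
  // Number of thresholds that are <= `points`; the last of those applies.
  let idx = LEVELS.partition_point(|&(threshold, _)| threshold <= points);
  LEVELS[idx.saturating_sub(1)].1
}

#[cfg(test)]
mod table_tests {
  use super::*;

  /// Guards the assumptions `level_for` makes about the table.
  #[test]
  fn levels_are_sorted_and_start_at_zero() {
    assert_eq!(LEVELS[0].0, 0);
    assert!(LEVELS.windows(2).all(|w| w[0].0 < w[1].0));
  }
}
```

If someone later adds an entry in the wrong place, `cargo test` fails right away, which is often good enough.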

Stay pragmatic, not everything needs to be solved at type level.