Learning Rust - Early thoughts
I decided to stretch my programming muscles by taking a peek at Rust. This article has a few thoughts from the early learning journey. Most of it is a kind of stream-of-consciousness notes as I went through The Book, which probably was more valuable for me to write than it is for you to read in detail. I’ve summarized a few general impressions in a first section, which might be useful if you are interested in “what’s Rust like?” for someone with a background in imperative, OOP languages like C++, Java, Python, Dart, …
General impressions
Rust is a compiled language with a C-like syntax and no runtime. It’s aimed at systems programming, with high-performance and strong memory and thread safety guarantees.
Rust’s overall approach to structuring programs felt most similar to Go to me. Both languages turn away from classic OOP with objects, classes and inheritance and instead move the focus back towards data structures and (associated) methods. Rust offers generics, choosing the C++ style of actually expanding generics to concrete types, a process called monomorphization. And - again similar to Go - Rust allows polymorphism through an interface concept called Traits. Traits allow defining a known set of methods that types can implement. And trait bounds along with dynamic dispatch (dyn) then allow handling different types, as long as they implement a given (set of) traits.
Rust’s main novelty is making memory and thread safety core parts of
the language, giving it a kind of stubborn beauty. Because Rust has
no runtime, all the safety checks happen at compile time through
the design of the language and ideas like the borrow checker. By
forcing you to be very deliberate and explicit about things that you
can - at your own peril - choose to ignore in other languages, Rust
eliminates whole classes of subtle errors. Yes, the compiler errors can be daunting, but they often just make things explicit that might bite you at runtime later. I’ve often seen programmers ultimately eschew flexibility in other languages in favor of conventions and best practices that gravitate to a similar place as Rust. For example, the borrowing rules felt very similar to how modern C++ gravitates towards an ownership model with RAII, unique_ptr and std::move (without the added safety/convenience of the compiler checking your work).
Overall I quite enjoyed Rust. Seeing how to approach problems in ways other than traditional OOP helped broaden my horizons. And I loved how the focus on memory and thread safety forced a more explicit mindset around ownership, etc. that’s actually helpful in other languages, too. Rust does have a steeper learning curve than other languages, and I’m not sure I’d choose it for short-lived projects where quick results matter most, say tasks you’d break out Python and Jupyter for. But I think it’s likely a great choice for production software that will run and be maintained for years, and where the increased performance, security, etc. are worth the higher cost of teaching your team(s) Rust and grappling with the rules.
I don’t think I have a great sense yet for how Rust’s various concepts would fit together efficiently in a large-scale project, but I really enjoyed the introduction and want to remark positively on how friendly and approachable both the Rust book and the compiler try to be.
With that said, on to my learning journey…
Setting up
I decided to follow The Book, starting with installing on macOS using rustup. That went very smoothly, and I was off to the races, building Hello World!. Not much interesting to report on that one.
Once I started playing with Cargo - Rust’s package manager and build system - an interesting detail was that cargo new also created a new git repository by default. A neat, batteries-included approach (and great that you can opt out with flags for users of a different version-control persuasion).
NOTE: I also learned about TOML as a config file format, which I immediately liked after struggling with the idiosyncrasies of YAML for Home Assistant. See Ruud’s article on why TOML’s focus on obvious semantics is welcome.
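Just to illustrate the shape of the format (a made-up snippet, not from any real config): bracketed sections, typed values, quoted strings, and no indentation rules to trip over.
[tools]
editor = "vscode"   # strings are quoted
retries = 3         # numbers and booleans have obvious types
verbose = true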
Overall cargo seemed on par with the kind of system you’d expect as a developer for a modern language ecosystem. I liked how cargo has simple and obvious verbs like new, build and check, and that building a Rust project is as simple as
$ git clone example.org/someproject
$ cd someproject
$ cargo build
IDE Support
Before I jumped deeper into the language, I wanted the usual amenities of
an IDE. I’ve been using VS Code a lot lately
so I wanted support for that. I ended up installing the rust-analyzer
extension
from the Marketplace, which got me the usual conveniences of syntax highlighting, error checking and the ability to run the binary from the IDE (remember to restart VS Code after you install Rust; mine didn’t find cargo in the PATH at first).
Chapter 2 - First impressions
Going through the initial examples, Rust looks vaguely C-ish in terms of the syntax. I found it interesting that the println! “function” is a macro, and that it seems to allow for an interesting mix of formatting using both parameters passed to the invocation as well as references to variables in the format string:
let x = 5;
let y = 10;
// That {x} in there was surprising, coming from C++.
println!("x = {x} and y + 2 = {}", y + 2);
I of course immediately tried what would happen if you referenced a variable that doesn’t exist in the current scope and I’m glad to report that the compiler flags that as an error.
In later chapters I also came across the handy dbg! macro, which prints what a given expression evaluates to on stderr for you, allowing for more concise inline debugging:
let y = 10;
let x = dbg!(y + 20);
I learned that Rust seems to take the opposite approach to C++, offering a
mut
keyword instead of a const
one. Having worked on making legacy C++
code const-correct, that strikes me as a good idea…
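A quick sketch of the default-immutable model (my own example):
fn main() {
    let x = 5;
    // x = 6;        // error: cannot assign twice to immutable variable `x`
    let mut y = 5;   // opt in to mutability with `mut`
    println!("{x} {y}");
    y += 1;          // fine, y was declared mut
    println!("{y}");
}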
I’m not sure how I feel about the let
statement and the mutable
reference syntax - function(&mut variable)
. Let often
felt unnecessary in the languages where I encountered it, and the &mut
seems long at first. But I’ll withhold judgement until later, as both
might serve as useful visual sign-posts in the code, and it’s quite possible
that continued exposure to the language will make them feel natural.
Rust’s approach to error handling strongly reminds me of Go
with errors being an explicit part of a function’s return type, but I liked
how the combination of Rust’s enum type with the methods on Result allows dealing with the error in-line (e.g., returning some default value if appropriate)
compared to Go’s syntax that often requires writing an explicit if
after each command.
// Golang
f, err := os.Open("filename.ext")
if err != nil {
log.Fatal(err)
}
// Rust
let mut f = File::open("filename.txt").expect("No such file");
I found it interesting that Rust has ranges as an explicit datatype with built-in literals, which is a neat way to handle the usual slicing magic of languages like Python or Go:
let x = 0..=2; // Gives you a RangeInclusive<usize>
let y = [0, 1, 2, 3, 4];
for i in y[x].iter() {
println!("{i}"); // Prints 0, 1, 2
}
I also found that simply assigning, say, let z = y[x] does not work, as Rust expects an explicit size for local variables (via the Sized trait), giving me a first taste of how people can struggle with Rust’s invariants that let it guarantee safety without a runtime system.
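For what it’s worth, the fix I found (my own sketch, not from the book) is to borrow the slice instead of trying to own it - a reference has a known size:
fn main() {
    let x = 0..=2;
    let y = [0, 1, 2, 3, 4];
    // let z = y[x];  // error: `[i32]` does not have a size known at compile-time
    let z = &y[x];    // a slice reference is Sized, so this works
    println!("{z:?}"); // [0, 1, 2]
}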
Back to Cargo
Adding an external dependency was as easy as expected, with cargo
downloading the dependency tree from http://crates.io.
I was ever so vaguely disappointed that cargo didn’t seem to flag unused crate dependencies as a warning or similar. Having worked on large codebases with many developers has made me appreciate tools that help remove obsolete dependencies. I’m sure the compiler will elide such code, so these dependencies mainly worry me from a mental-load angle.
I liked the approach cargo takes to dependency versioning: both the opinionated semantic versioning - assuming that minor version number changes like 0.8.5 -> 0.8.6 are API-compatible patch releases, while major changes like 0.8.6 -> 0.9.0 may break the API - and the fact that updating to a new dependency version is an explicit action through Cargo.lock and cargo update.
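For example, a dependency line like this in Cargo.toml (using the same 0.8.5 version from above, with the rand crate the book uses) accepts compatible patch updates but not an API-breaking bump:
[dependencies]
rand = "0.8.5"   # 0.8.6 is acceptable, 0.9.0 is not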
A later chapter also mentions the ability to reference local packages through the path syntax, which might come in handy for things that you don’t want to or can’t upload to crates.io.
hello_macro = { path = "../hello_macro" }
hello_macro_derive = { path = "../hello_macro/hello_macro_derive" }
Chapter 3 – The Similarities
Okay, so Rust does offer a const
keyword, too, but that’s for bona-fide
constants and replaces let
. Fair enough. And it seems Rust follows the C++ convention of making constants ALL_CAPS_WITH_UNDERSCORES.
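A quick sketch (constants also require an explicit type annotation):
const MAX_POINTS: u32 = 100_000;

fn main() {
    println!("{MAX_POINTS}");
}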
When discussing variable shadowing I wondered how this interacted with freeing memory, given I had heard Rust’s model is similar to C++’s RAII. So, for example, I wondered when the initial value for x would be freed in a program like this (assume some complex value on the heap, if ints are special):
let x = 5;
let x = 6;
The examples in the chapter don’t answer that, but this answer on Stack Overflow suggests that the value is freed as soon as the compiler can show it to be inaccessible.
The integer types are the usual suspects (i8
, u16
, …) up to 128 bits,
including an architecture-dependent size type (isize
/ usize
). I love how
they made the octal literal consistent with the hex one (0o17
like 0x1F
) and
that there is an explicit binary literal type (0b11001...
), as well as some
support for ASCII (b'F'
). Treating integer overflows as a panic in debug
builds is not something I’m used to, but feels in line with Rust’s aim of
balancing safety and performance. Interesting that integers default to 32
bits while floats are 64 bit by default. char
looks pretty standard, but
seems 4 bytes wide and Unicode by default. Rust seems to be very strict
with type conversion, refusing to even convert i32
to u32
.
The try_into method (from the TryInto trait) can be used for “casting”:
let x: i32 = 10;
let y: u32 = x; // Compiler error
let y: u32 = x.try_into().expect("Argh");
Tuples look pretty standard and use a member accessor like syntax for index
access. For example, t.0 + t.2
adds up the first and third elements of
tuple t
. Looks like tuples can be mutable, but decomposition doesn’t
easily mix with mutability: let mut (a, b, c) = tuple
doesn’t work.
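A small sketch of what I mean (my own example); the accepted form puts mut on the individual bindings instead:
fn main() {
    let mut t = (1, 2.5, 'c');
    t.0 += 1;                  // fields are accessed (and mutated) by index
    let (a, b, c) = t;         // destructuring into immutable bindings works
    // let mut (x, y, z) = t;  // error: `mut` must be attached to each binding
    let (mut x, y, z) = t;
    x += 1;
    println!("{a} {b} {c} {x} {y} {z}");
}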
Rust distinguishes arrays (fixed length) and vectors (variable length).
Arrays have the length baked into their type, and there’s a similar syntax
to initialize arrays to the same value. Array access follows the usual
a[0]
syntax. Array access with a dynamic index is allowed, but will
panic at runtime if an out of bounds access happens.
let a: [i32; 5] = [1, 2, 3, 4, 5];
let a: [i32; 5] = [0; 5]; // same as [0, 0, 0, 0, 0]
let x = a[20]; // Compiler error
let i = 20;
let x = a[i]; // Runtime panic
Functions use snake case and can be defined in any order (nice).
Parameters must have type annotations (to enable type inference
in other places), and use ->
for the return type. While Rust
has a return
keyword (to return early) it’s idiomatic to end
functions on an expression (note the missing semicolon) instead,
which becomes the value of the block:
fn compute_sqrt(x: u32) -> u32 {
println!("Computing sqrt({x})"); // Statement
x*x // Expression (value of the block)
}
It seems statements “return” the unit type () (an empty tuple), so using a statement where an expression is needed will result in a type mismatch.
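A tiny sketch of the kind of mismatch I mean:
fn main() {
    // let x: i32 = { let y = 2; y + 3; };  // error: expected `i32`, found `()` -
    //                                      // the trailing semicolon makes it a statement
    let x: i32 = { let y = 2; y + 3 };      // the block's last expression is its value
    println!("{x}");
}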
if looks similar to C++ but doesn’t require parentheses around the condition. We can also use the expression syntax to cram whole blocks of code into the condition (not sure if that’s useful or idiomatic, but nice to see it works):
let number = 3;
if number < 5 {
println!("condition was true");
}
if {println!("Bla"); number < 5} {
println!("condition was true");
}
Turns out, if is an expression, so we can also use it in a context that needs a value, for example:
let cond = true;
let x = if cond {5} else {10};
Loops come in the usual while and loop flavours. The former has no parentheses around the condition (like if), while the latter is an explicit keyword for loops without a condition. break / continue act as statements for flow control in the loop, with loop labels using a syntax like 'label: loop {...}. Interestingly, loops are expressions and break can take a value to return from the loop:
let mut i = 5;
let x = loop {
i = i * i;
if i > 1000 {
break i; // "Returns" i from the loop.
}
};
for is only used for “range-based” loops, but combined with the range type, this can be used much like a traditional for loop:
let a = [10, 20, 30, 40, 50];
for v in a {
println!("{v}");
}
for i in 0..10 {
println!("{i}");
}
Chapter 4 – Things become different
Okay, the infamous borrow checker rears its head. The ownership
model reminded me of modern
RAII-style C++
with mechanisms like unique_ptr
and std::move
(not sure if the
language designers took a page from each other’s book or if this is
more a case of convergent evolution).
Rust distinguishes copyable types - identified by the Copy trait - and movable ones (everything else). By allowing only a single regular variable to own a movable value, Rust can use the scope of that variable to control memory deallocation. If the owner goes out of scope, the instance gets dropped and the memory on the heap is freed.
References - available as both mutable and immutable - allow referring
to movable types without implying ownership. This is called borrowing
in the sense that you get to use the value but don’t own it.
let i = 5;
let j = i; // Makes a copy as i32 has the Copy trait
let s = String::from("Hello");
let mut t = s; // Moves ownership to t, s is no longer valid.
println!("{s}"); // Error
let u = &t; // Takes a const reference to t, no ownership
println!("{t} {u}");
let v = &mut t; // Takes a mutable reference to t, no ownership
v.push_str(" world");
As the let mut t = s; statement shows, const-ness / mutability is a concept that applies to the variable(s) in Rust. Rust is fine with letting us move things from a const s to a mutable t. The underlying data is mutable; it’s only the variable referring to it that controls whether we can change things. We cannot, however, take a mutable reference to something that’s immutable underneath.
Rust also has restrictions on what references can exist to a given value, allowing an arbitrary number of const references, but only one mutable reference at any given moment. This is done to prevent data races (unsynchronized writes, or mutations while reads may happen). I worried this would be limiting, but given the compiler is smart enough to end borrows the moment they are no longer used, common patterns like passing a data structure into a sequence of functions should work fine.
I wondered if a mutable variable and a mutable reference could co-exist, and it turns out they cannot (which is good, as that would allow for data races again):
let mut t = String::from("Hello");
let v = &mut t;
v.push_str(" world");
t.push_str(" world"); // Things are okay till here,
// v can go out of scope before t was used.
println!("{v}"); // Error, two mutable borrows active at the same time.
Functions can take and return either regular types or references. Using a regular (movable) type implies the function takes or gives ownership, while references imply leaving ownership with the caller or returning a reference owned by something outside of the function:
fn take(s: String) { ... }                        // takes ownership of s
fn give() -> String { ... }                       // gives ownership to the caller
fn borrow(s: &str) { ... }                        // only borrows s
fn owned_somewhere_else() -> &'static str { ... } // returns a reference owned elsewhere
I found it impressive that Rust is smart enough to track references through function calls (yeah, obvious in hindsight, as the whole data-race guarantee wouldn’t be worth much otherwise, but seeing it for the first time was a whoa moment for me).
fn first_word(s: &str) -> &str {
for (i, c) in s.as_bytes().iter().enumerate() {
if *c == b' ' {
return &s[..i];
}
}
s
}
fn main() {
let mut s = String::from("Hello world");
let w = first_word(&s);
s.push_str("Bla"); // Error, s is borrowed as immutable through w.
println!("{w}");
}
String and other slices allow creating a reference to part of some list type. They use the range type we encountered earlier:
let s = "Hello world";
let world = &s[6..]; // world
let hello = &s[0..=4]; // Hello
println!("{hello} {world}");
let a = [10, 20, 30, 40, 50];
let r = 0..2;
let b = &a[r];
Chapter 5 – A whiff of Go
Structs look pretty similar to what you’d expect from other languages with named fields, and the usual ways to access them. Structs use curly braces for initialization:
struct Rectangle {
width: u32,
height: u32
}
let r = Rectangle{width: 20, height: 10};
Structs have some syntactic sugar to make it easier to initialize them from variables with the same names as fields (in functions) and to update them:
let width = 20;
let r = Rectangle{
width, // Same as width: width,
height: 10};
let s = Rectangle {
height: 50,
..r // Take any field values not given from r
};
There is also a tuple struct type in case the fields don’t need names:
struct Point(i32, i32, i32);
let p = Point(10, 20, 30);
let x = p.0;
Interestingly, this is the first time I’m seeing something along the lines of a constructor in Rust. Most of the datatypes so far just had the curly brace initialization.
Similar to Go’s approach, Rust lets you define functions related
to structs (or enums) by using an implementation block with the impl
keyword.
Methods take a Python-esque &self
reference and can be
invoked with the usual object.method(...)
syntax. And we can “associate”
free-standing (or static) functions with a type (no self
reference) and
invoke them using Type::function(...)
. The latter is often used for
“constructors”, although to Rust these seem to be normal functions without
any special treatment. Rust allows functions to have the same name as a
field, which allows for creating getters and setters (struct fields have
private visibility outside a module by default).
struct Rect {
width: i32,
height: i32
}
impl Rect {
fn square(x: i32) -> Rect {
Self{width: x, height: x}
}
fn area(&self) -> i32 {
return self.width * self.height;
}
fn width(&self) -> i32 {
self.width
}
}
fn main() {
let r = Rect::square(10);
println!("The area of a square with width {} is {}", r.width(), r.area());
}
Rust allows referring to the type of the impl block with Self, and &self is a shorthand for self: &Self.
Chapter 6 – let me enumerate the ways
Rust likes its enums, maybe due to the strict type system that prefers having concrete types in lots of places and the built-in support through mechanisms like the match statement.
The basic enum syntax looks similar to C++ or Go, but Rust allows associating
data with each enum value, and even different kinds of data. This allows
creating precise constructs that keep just the required data for each case around.
Enums support both tuple structs
and actual structures. And we can associate methods
with an enum just like a struct:
enum Message {
Quit,
Heartbeat(u64),
Tweet { sender: String, text: String },
}
impl Message {
fn check_crc(&self) {
// Do something with the message.
}
}
fn main() {
let m = Message::Tweet {
sender: String::from("foo"),
text: String::from("bar"),
};
m.check_crc();
}
Two very commonly used enums are Option and Result, which are the idiomatic ways to handle possibly absent values (there is no null in Rust) and operations that can fail.
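A minimal sketch of both in action (my own example): parsing returns a Result, and a lookup that may find nothing returns an Option:
fn main() {
    // Result: the possible error is part of the return type.
    let port: Result<u16, _> = "8080".parse();
    println!("{}", port.unwrap_or(80));

    // Option: a lookup may come up empty; there is no null to forget about.
    let v = vec![10, 20, 30];
    match v.iter().find(|&&x| x > 15) {
        Some(x) => println!("found {x}"),
        None => println!("nothing found"),
    }
}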
Enums are usually handled in code through the use of match
, a kind of
switch-case statement on steroids. match
evaluates an expression,
compares that to a sequence of patterns, and executes the first match.
The compiler ensures the patterns exhaustively cover the options, but we
can use _
as a wildcard to match anything (similar to the default:
case
in C++).
fn main() {
let m = Message::Tweet {
sender: String::from("foo"),
text: String::from("bar"),
};
match m {
Message::Heartbeat(t) => println!("Heartbeat with timestamp {}", t),
Message::Tweet { sender, text } => println!("{} says {}", sender, text),
_ => (), // Ignore all other cases
}
}
Match is an expression, so we can also use it to produce a value by returning values from each arm:
fn main() {
let o = Some(5);
let n: Option<i32> = None;
let a = match o {
Some(t) => t,
None => 42, // no number? default to the answer!
};
}
It seems the case of wanting to handle just one specific case and ignoring
all others in a match
is common enough that there is syntactic sugar for
it in the form of the if let
statement:
fn main() {
let m = Message::Tweet {
sender: String::from("foo"),
text: String::from("bar"),
};
// Only handle the Tweet case, ignore all others.
if let Message::Tweet{sender, text} = m {
println!("{text}");
}
}
Chapter 7 – Would you like a crate for that?
This chapter talks about how larger codebases can be structured with crates, modules and packages. Rust and Cargo seem to be pretty opinionated with how files should be laid out (good), but I struggled a little to get a clear mental model of things from the book’s description.
If we start from the outside in, packages are the level at which the Cargo package manager operates. A package has a manifest in the
form of the Cargo.toml
file, and a tree of Rust source code with
a well-known layout (e.g., src/main.rs
) that encapsulates some
related functionality. Packages can depend on other packages, and
I’d assume packages often map 1:1 to git repositories (e.g., for
packages that get published to http://crates.io). Packages can
contain at most one library and any number of binary crates.
There are well-known default paths for a package’s “top-level”
binary (src/main.rs
) or library (src/lib.rs
) crate, but
other binary crates can exist in subdirectories.
Cargo seems to want to restrict packages to a single library crate so there is no ambiguity what another package will import when it depends on a given package. However, a package can contain multiple binary crates, so example programs, CLIs, etc. can be shipped along with a library.
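My mental model of the default layout (the src/bin/ convention for extra binary crates is something I picked up alongside the book, so treat the exact names as illustrative):
my_package/
├── Cargo.toml          # the package manifest
└── src/
    ├── lib.rs          # crate root of the (at most one) library crate
    ├── main.rs         # crate root of the default binary crate
    └── bin/
        └── extra_tool.rs   # each file here becomes another binary crate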
Crates stayed a little fuzzy to me after reading the book. The
book claims crates are “the smallest amount of code that the Rust
compiler considers at a time”, but they are not just a single
file of source code (e.g., if a module spans multiple files). After
reading some more external sources, it seems Rust’s crate idea is
somewhere between C++’s translation unit and an actual compiler
output artifact. That is, crates start from a single file (the
crate root) with its dependencies expanded. Where C++ would expand
the header files referenced by #include
, Rust will bring in
module code defined in other files and referred to by mod
:
src/main.rs
mod foo;
fn main() {
let f: foo::Bar = ...;
}
src/foo.rs
// NOTE: no `mod foo { ... }` wrapper here - the file itself is the module body.
pub enum Bar {...}
Binary crates compile into executables. They must have a single main function somewhere as the entry point. Library crates
compile into a library to be imported by other library or binary
crates. They don’t have a main
function, but instead will usually
export some functionality via public modules.
Modules seem to be somewhere between a C++ namespace and a Java
package. That is, modules can be created in the code by having (nested)
module sections with the mod
keyword and referring to symbols in them
with paths using the ::
separator (very similar to C++ namespaces).
The use
keyword can be used to bring symbols into the current scope
and use ... as ...
can be used to resolve name clashes:
mod net {
pub mod ip {
pub struct IPAddress;
}
}
fn main() {
let ip = net::ip::IPAddress;
use net::ip;
let ip2 = ip::IPAddress;
use net::ip::IPAddress as MyIp;
let ip3 = MyIp;
}
But - unlike C++ - Rust seems to have a much stronger connection between modules and the file layout, similar to Java’s conventions for packages. Rust allows eliding the definition of a module and then expects another file in a well-known location with the code (note that the files do NOT repeat the module around the code):
src/main.rs
mod net; // Will have the compiler looking for net.rs
fn main() {
let ip = net::ip::IPAddress;
}
src/net.rs
// NOTE: No mod net {...} in this file
pub mod ip; // Compiler will look for net/ip.rs
src/net/ip.rs
// NOTE: Also no mod ip {...} here.
pub struct IPAddress;
Rust’s visibility and encapsulation anchor on modules. Parent modules have no visibility into their child modules by default, and we selectively need to open up visibility using the pub keyword. pub applies both to whole modules and to individual functions, types and even fields within structs. Modules can be re-exported using pub use ...;.
mod backyard {
pub mod vegetables {
pub enum Legumes { // enum variants are public by default
Cucumber,
Squash,
}
pub struct Patch { // struct fields are private by default
pub kind: Legumes,
amount: u32,
}
impl Patch {
// because the struct has private fields, we need some public
// constructor so other modules can create instances
pub fn new(kind: Legumes) -> Self {
Patch { kind, amount: 0 }
}
}
}
}
fn main() {
use backyard::vegetables::*;
let p = Patch::new(Legumes::Cucumber);
}
structs have private fields by default, i.e., their fields need to be explicitly marked pub, while enums default to being open.
Chapter 8 – Collections
This chapter talks about common data structures. Vectors and maps are pretty much what you’d expect; strings, on the other hand, differ more from other languages.
Vectors get created with the vec!
macro (I guess because
the literal syntax [1, 2, 3]
is already taken by Rust’s fixed-length
array type). Access works with the usual vector[10]
syntax,
although it’s common to see &vector[10]
for vectors that
don’t contain copyable types because you can’t really take
ownership away from the vector for a single element. There’s also
a vector.get(10)
syntax that returns an Option
instead of
panicking for out of bounds access. Slices and support for a
range-based for loop make it easier to operate on a vector’s elements.
fn main() {
let mut v = vec![10, 20, 30, 40];
println!("{}", v[2]);
v.push(50);
match v.get(100) {
None => println!("Oops"),
Some(i) => println!("{i}"),
}
for i in &v[0..2] {
println!("{i}");
}
let b = &v[100]; // panic
}
How vectors interact with the borrow checker might be surprising at first: adding an element invalidates references to all existing elements of the vector. But when you stop for a moment and think about how a vector outgrowing its reserved capacity needs to copy its elements to another location in RAM, this makes perfect sense.
fn main() {
let mut a = vec![10, 20, 30, 40];
let first = &a[0];
a.push(50); // borrow error
println!("{first}");
}
The most surprising thing about Maps to me was the lack of
syntactic sugar for them in the language. They are not imported
by default in the preamble and there is no literal or macro to
help construct them (which is quite unlike, say python where they
feel like a first-class language feature). But otherwise
maps are pretty straightforward, with methods like get or insert. At least, there is good iteration support:
use std::collections::HashMap; // maps are not in the prelude
fn main() {
let mut v = HashMap::new();
v.insert(10, String::from("ten"));
v.insert(20, String::from("twenty"));
if let Some(x) = v.get(&10) {
println!("Found {x}!");
}
for (k, v) in &v {
println!("{} => {}", k, v);
}
}
Strings can feel more arcane in Rust because the language
refuses to play fast and loose with Unicode. Most string-ish
operations look as expected. Concatenation works with methods
like push_str
or the +
operator, and there is
a format!
macro that lets us build strings similar to println!
:
fn main() {
let mut s = "Hello".to_string();
s.push_str(" world");
s.push('!'); // pushes a single char
println!("{s}");
let u = s + " Goodbye world!"; // takes ownership of s!
let t = format!("Now we have {u} as our string");
println!("{t}");
}
But breaking up a string into characters is more rigid than
in other languages, as strings are UTF-8 in Rust and the
language tries hard to avoid errors like splitting up
a multi-byte codepoint. You can use the slice syntax on
a string and that treats it like a sequence of bytes, but
Rust will panic if your slice cuts a multi-byte codepoint
in half. Better to use the bytes
and chars
methods
to explicitly treat the string as bytes or Unicode scalar
values:
fn main() {
let s = "Здравствуйте";
for c in s.chars() {
println!("{c}");
}
for b in s.bytes() {
println!("{b}");
}
// Danger, Will Robinson, this will panic.
// The slice treats the string as bytes,
// but there's a runtime check that panics
// if we don't end up on a character boundary.
let t = &s[0..1];
}
Chapter 9 – Error handling
Chapter 9 talks a little more about concepts we’ve already
seen before, panicking for unrecoverable programming
errors and using the Result enum for expected or
recoverable errors.
The split here reminds me of Java’s checked and unchecked exceptions, although the actual error handling feels a lot more like Go’s choice of eschewing exceptions in favor of treating errors as return values.
While there wasn’t much surprising about panicking (your
own code can use the panic!
macro if that’s ever needed),
there were some nice bits of syntactic sugar for dealing with
Result
s in this chapter, in particular the ?
syntax
for returning errors to the caller:
The ? syntax is somewhat reminiscent of how Dart handles null values, but I think it most closely resembles a style of coding that’s very similar to what we do at Google with the ASSIGN_OR_RETURN C++ macro and the StatusOr type.
use std::{fs::File, io::{self, Read}};
fn read_text_from_file(f: &str) -> Result<String, io::Error> {
// Assigns the file handle to f if successful or returns errors to the caller.
let mut f = File::open(f)?;
let mut s = String::new();
// Same here. Calls can also be chained.
f.read_to_string(&mut s)?;
Ok(s)
}
? also works for Option, where it’ll return None to the caller if it encounters a missing value. Neat.
One more nice bit is that main() can return a Result
and there
is a std::process::Termination
trait that will map well-known
error values to numeric return values:
use std::error::Error;
fn main() -> Result<(), Box<dyn Error>> {
let s = read_text_from_file("/does/not/exist")?;
Ok(())
}
Chapter 10 – Generics
Rust’s generics feel closer to C++ than Java, with expansion to concrete types happening at compile time (rather than Java’s erasure concept). Generics use the customary angle-bracket syntax and can show up on structs & enums, functions and impl blocks, allowing us to implement generic data structures, functions and methods:
struct Foo<T> {
t: T,
}
enum Bar<T, U> {
One(T),
Two(U),
}
impl<T> Foo<T> {
fn new(t: T) -> Foo<T> {
Foo{t}
}
}
fn main() {
let f = Foo::new(27);
println!("{}", f.t);
let f = Foo::new(String::from("bla"));
println!("{}", f.t);
}
Rust also allows implementing functions or methods only for specific instantiations of a generic type:
struct Foo<T> {
t: T,
}
impl Foo<i32> {
fn double(&self) -> i32 {
return self.t * 2;
}
}
fn main() {
let f = Foo { t: 32 };
println!("{}", f.double());
let f = Foo { t: false };
println!("{}", f.double()); // Sad trombone, this doesn't compile.
}
Rust solves the problem of letting the compiler check if a given type parameter actually supports the operations a generic method wants to call with Traits. A trait is similar to an interface in other languages, describing the required function signatures a type must provide to fit. However, traits are allowed to provide default implementations for functions (that may call other methods in the trait, even if they don’t have defaults), which allows for a pattern similar to template method.
trait Numeric {
fn to_number(&self) -> i32;
fn to_number_times_ten(&self) -> i32 {
self.to_number() * 10
}
}
struct Foo<T> {
t: T,
}
impl Numeric for i32 {
fn to_number(&self) -> i32 {
*self
}
}
impl Foo<i32> {
// Restrict N to types that implement the
// Numeric trait, ensuring we can call to_number.
fn add<N: Numeric>(&mut self, n: &N) {
self.t += n.to_number();
}
}
fn main() {
let mut f = Foo { t: 32 };
f.add(&10);
}
I was strongly reminded of Go here, whose interfaces work in a very similar way, except that Go makes implementing an interface completely implicit (a type fits if it provides the right signatures, no matter whether they relate to a given interface or not). Rust on the other hand makes this more explicit.
There is a bunch of syntax around traits. We can specify multiple
traits with a +
, like fn Foo<X: TraitA + TraitB>()
, and there
is a shorthand for “some type implementing this trait” using
the impl
keyword, where fn Foo(a: &impl SomeTrait)
can be thought of as a shorthand for fn Foo<A: SomeTrait>(a: &A)
.
In case the trait bounds get unwieldy, they can be broken out
into a separate section with where
. So all these define almost
the same function (one difference is that the version with
impl
doesn’t have a generic type parameter):
trait Foo {
fn foo();
}
fn foo(f: &impl Foo) {}
fn foo2<T: Foo>(f: &T) {}
fn foo3<T>(f: &T)
where
T: Foo,
{
}
impl Trait also works for a function’s return type, but it turns out the lack of an explicit type parameter makes a large difference there. fn foo() -> impl Trait doesn’t give the caller any control over what type the function returns; it simply promises it will implement Trait. However, fn foo<T: Trait>() -> T lets the caller choose the particular type the function has to return, with a syntax like foo::<i32>().
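A sketch of the difference with a made-up trait (the names are mine, not from the book):
trait Greeter {
    fn greet(&self) -> String;
}

#[derive(Default)]
struct English;

impl Greeter for English {
    fn greet(&self) -> String {
        String::from("Hello")
    }
}

// The function body picks the concrete type; callers only know it's "some Greeter".
fn make_greeter() -> impl Greeter {
    English
}

// The caller picks the concrete type, e.g. make_chosen::<English>().
fn make_chosen<T: Greeter + Default>() -> T {
    T::default()
}

fn main() {
    println!("{}", make_greeter().greet());
    println!("{}", make_chosen::<English>().greet());
}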
Combining traits with the ability to only implement functions for some subset of a generic type yields a powerful mechanism where we can implement specializations for cases where the type parameter has some known, well, traits. For example, I could imagine some generic algorithm working more quickly if a given data structure has a trait like random access (similar to how C++ sometimes implements special variants of algorithms for more specialized iterators):
#[derive(PartialEq, Debug)]
struct Pair<T> {
x: T,
y: T,
}
impl<T> Pair<T> {
fn new(x: T, y: T) -> Self {
Self { x, y }
}
}
// Implement the PartialOrd trait for Pair instances
// where the generic type also supports that trait.
// Compare x and use y as a tie-breaker if needed.
impl<T: PartialOrd> PartialOrd for Pair<T> {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
let cx = self.x.partial_cmp(&other.x);
if let Some(std::cmp::Ordering::Equal) = cx {
return self.y.partial_cmp(&other.y);
}
return cx;
}
}
fn main() {
let p = Pair::new(10, 20);
let q = Pair::new(10, 30);
println!("{:?} > {:?} = {}", p, q, p > q);
}
Lifetimes are sometimes needed to give the borrow checker extra information about how long references are expected to be valid. Usually Rust infers the lifetime of a reference, but there are situations where this ends up ambiguous. For example, when a function takes and returns references, it might not be clear which reference gets returned (the borrow checker is a compile time concept, so it doesn’t attempt to understand what will happen at runtime):
// This fails because the borrow checker cannot tell
// if the return value will borrow from `left` or
// `right`.
fn sort(left: &str, right: &str) -> &str {
if left > right {
return right;
}
return left;
}
We can fix these ambiguities by annotating lifetime parameters, which use a syntax like 'a and go in similar places as generics and reference modifiers:
// The lifetime parameter tells the borrow checker that
// it must demand the same lifetime for left and right
// and that the return value will have the same lifetime
// as the parameters.
fn sort<'a>(left: &'a str, right: &'a str) -> &'a str {
if left > right {
return right;
}
return left;
}
Lifetimes can mix with generic types, too:
// This works and tells the borrow checker that
// `left` and the return value have the same lifetime
// (and nothing is said for `right`).
fn return_left<'a, T>(left: &'a T, right: &T) -> &'a T {
left
}
// This fails because `right` and the return value do
// not have the same lifetime, so we cannot return
// right.
fn return_right<'a, T>(left: &'a mut T, right: &T) -> &'a T {
right
}
At this point the &mut syntax started to feel natural to me, as it turns out Rust might accrue a bunch of modifiers for a given reference, and so the mut that felt very long at first doesn’t feel so long anymore when we, say, have a parameter like &'a mut impl SomeTrait.
Lifetime annotations are also needed for reference members in a struct. This is for two reasons: first, we seem to use them to formally ensure the struct does not outlive its references, but also so that methods on the struct can talk about the lifetime(s) of the struct’s members:
struct RefPair<'a, 'b, T> {
a: &'a T,
b: &'b T
}
impl<'a, 'b, T> RefPair<'a, 'b, T> {
fn get_something(&self) -> &'a T {
return self.a;
}
}
I would guess the most common case is that the references in a struct all have the same lifetime, so seeing multiple lifetime variables like here is probably rare.
Chapter 11 – Is this thing on?
Nice to see that Rust has builtin support for testing at the
level of an individual crate. Rust uses the module visibility
rules (remember that submodules can see upward in the tree,
but parent modules cannot look into their children unless they
export symbols with pub
) to distinguish unit tests, which
are tests written in a (special) submodule and thus can perform
white box testing with access to all the internals of a module,
and integration tests, which are tests defined in a sibling
module and that test a module through its public interface(s).
I found calling the latter tests integration tests a little surprising as they were still pretty squarely focused on a single package, rather than how multiple packages might integrate with each other. Arguably you might be indirectly testing the dependencies of a given package in the “integration tests”, but I’ve more often seen the term used to describe large scale tests that combine multiple complex servers, systems, etc. than a black-box, API-level test of a given package.
Unit Tests rely on function annotations, similar to modern
JUnit. Tests go into functions
annotated #[test]
, which get put into a module marked
#[cfg(test)]
. The former allows Rust to enumerate these
functions, the latter instructs the compiler to skip compiling
this module in production builds. Tests use macros like assert!
,
assert_eq!
and assert_ne!
for their assertions:
pub fn add(a: i32, b: i32) -> i32 {
a + b
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn add_works() {
assert_eq!(add(2, 2), 4, "Add should return 4");
}
}
While I think it’s great how Rust encourages adding tests right next to the code, I immediately found myself wondering if there was a way to split the test code from the actual library (maybe because I’ve been working with GUnit/C++ and Java a lot, which both have separate but well-known files for tests). Turns out we can use the external modules from an earlier chapter for this, although it seems idiomatic to start with tests in the same file:
lib.rs
mod foo;
foo.rs
pub fn add(a: i32, b: i32) -> i32 {
a + b
}
#[cfg(test)]
mod test;
foo/test.rs
use super::*;
#[test]
fn add_works() {
assert_eq!(add(2, 2), 4, "Add should return 4");
}
Tests are run with cargo test, which runs unit and integration tests for the whole package by default. It also allows filtering tests by name: cargo test foobar will run test functions matching *foobar*. If we prefer to not run some tests by default (e.g., because they are very large or slow), we can mark the test methods #[ignore]. Such tests can be run alone with cargo test -- --ignored or along with all tests with cargo test -- --include-ignored.
Integration tests are put into a special tests
directory, next
to the package’s src
directory. Rust already treats this as
test-only code, so no #[cfg(test)]
is needed there. Rust
seems to expect each suite of integration tests to start from
a file in the top-level tests directory. Shared code between
integration tests is handled via the older module naming convention
with something like tests/shared/mod.rs
:
tests/adds_nums.rs
mod shared;
use rust_packages::foo;
use shared::*;
#[test]
fn adds_three_nums() {
assert_eq!(get_num() * 3,
foo::add(get_num(), foo::add(get_num(), get_num())));
}
tests/shared/mod.rs
pub fn get_num() -> u64 {
return 2;
}
The book only mentions, but doesn’t yet cover, documentation tests, which seem to be about compiling code snippets in documentation comments. That sounds like an exciting idea.
Chapter 12 – Let’s build something
Exciting, time for a slightly larger project. Seems we’re building a toy version of grep. TIL that grep seems to stand for “globally search for a regular expression and print”, a nice bit of CS history (the “re” was pretty obvious, but I didn’t know about the rest).
The project starts out simple enough, reading a few command-line
parameters, where Rust eschews the C convention of passing these
as parameters to main and instead has a Python-esque library
function for access via std::env::args()
. Also reading the
file stays pretty basic with std::fs::read_to_string
and
we just print out stuff for a start.
NOTE: It hadn’t really sunk in until I read chapter 13, but std::env::args() is an iterator, which is different from the array/vector type in other languages (but a vector can be produced with a collect() if needed).
I like that the book immediately suggests breaking up the main function to help separate concerns, emphasizing that logic should move into lib/ where it can be tested and re-used, and that all that remains in main should be a very simple driver for the logic in lib. So we create a Config structure to hold the arguments (this looks a lot like the various Options structures that we have floating around for libraries in G.).
My first inclination for handling too few arguments
was to print something to stderr and then look for
some way to exit the process (I found std::process::exit
):
use std::process;
fn usage() {
println!("Usage: minigrep <query> <file>");
}
impl Config {
fn build(args: &[String]) -> Config {
if args.len() < 3 {
usage();
process::exit(1);
}
// Parse args
}
}
NOTE: It seems Rustaceans use new only as a plain constructor that should never fail, and build for more logic-heavy construction that might fail.
The book instead suggests a more idiomatic way, returning a Result and printing that in main; in the book, unwrap_or_else with a closure is used. Funny that the book now uses println! and std::process::exit(). I chose to instead make use
of the fact that main can return an error, too,
which allowed me to use the convenient ?
syntax:
fn main() -> Result<(), Box<dyn Error>> {
let args: Vec<String> = env::args().collect();
let config = Config::build(&args)?;
// Snip
}
Once the code was moved to a library main.rs
looked
very tidy:
use std::{env, error::Error};
use minigrep::Config;
fn main() -> Result<(), Box<dyn Error>> {
let args: Vec<String> = env::args().collect();
let config = Config::build(&args)?;
minigrep::run(&config)
}
Next up is writing the actual search functionality, for which the book recommends using test-driven development (TDD). With TDD in mind, having the test module live in the same file as the code made sense (though I still suspect this won’t scale well if modules become larger, but purists would likely argue that this is a signal the module should be split up anyway).
I found it interesting how Rust made me focus on the memory usage here,
making me wonder if my search function should make copies of strings
or just return references to lines that I assumed the caller had in
scope anyway. I toyed around with taking a slice of string slices
&[&str]
for the incoming lines (so I could write stuff like
search("needle", &["lines", "in", "haystack"])
in my tests), but
I quickly found that this was hard to match to the iterator that
string::lines
returned by default. I decided to wait for chapter 13,
which discusses them next.
The chapter wraps up by showing access to environment variables
through std::env::var()
(a nice parallel to std::env::args()
)
and using eprintln!()
to print to stderr instead of stdout.
I was delighted that my errors returned from main already
were printed to stderr by default (vindication! ;).
Chapter 13 – Let’s get functional
Most programming languages have picked up a few functional tricks these days and it looks like Rust has closures and iterators. I was immediately curious how lifetimes and borrow checking would interact with the capturing in closures.
Closures have similarities to functions, but use pipes to hold the list of arguments. Type annotations are allowed but usually omitted, and closures assigned to a variable can be invoked with the function call syntax:
let closure = |x: i32| -> i32 {x + 1};
let closure = |x| x + 1;
let closure_with_no_args = || 20;
let result = closure(24);
NOTE: Rust will infer a single set of types for a closure, so a closure like let c = |x| x; will be assigned types the first time it is used. Code like c(3); c(5.0); won’t compile (because the parameter type will be inferred to i32 from the first call while the second tries to pass a float).
Unlike C++, Rust doesn’t seem to explicitly list values captured
from the environment, but instead examines the code in the closure
to figure this out automatically based on how each value is used.
Captures can be immutable borrows, mutable borrows (e.g., if some
mutating function is called on a variable from the environment)
and finally there is the move
keyword that goes in front of
the entire closure to force moving ownership of captures (e.g.,
when passing data to a new thread):
let mut v = vec![10];
let c_const = || v[0]; // immutable borrow of v
println!("{v:?}"); // other immutable borrows are still fine
println!("{}", c_const());
let mut c_mutable = || { v.push(20); v[0] }; // mutable borrow of v
println!("{}", c_mutable());
println!("{v:?}"); // fine again, the mutable borrow has ended
let c_owned = move || v[0]; // takes ownership of v
println!("{v:?}"); // Error, v is owned by the closure now
println!("{}", c_owned());
I was surprised that there didn’t seem to be ways to, e.g., move
only a subset of the captures, but when I played around with
things it seems such situations can be handled by creating
references in the surrounding code:
let s = String::from("hello");
let t = String::from("world");
let tr = &t;
// Moves ownership of s, but t is untouched
// as we only capture it by reference.
let c = move || {println!("{s} {tr}");};
println!("{s}"); // Error, s moved into closure.
println!("{t}");
Rust automatically makes closures (and functions) implement special(?) traits so functions can specify what kind of closure or function they want to receive:
- FnOnce is a closure that can be invoked at most once, e.g., because it moves an owned capture out of the closure.
- FnMut is a closure that can be called multiple times and may mutate its environment (also acceptable where an FnOnce is required).
- Fn is a closure that can be called multiple times and that doesn’t mutate state (also acceptable where an FnMut or FnOnce is required).
fn accept_fn_once(f: impl FnOnce() -> String) {
f();
f(); // Error cannot invoke FnOnce multiple times
}
fn accept_fn_mut(mut f: impl FnMut() -> String) {
f();
f();
f();
}
fn accept_fn(f: impl Fn() -> String) {
f();
f();
f();
}
fn test_closures() {
let mut t = String::from("bar");
let s= String::from("foo");
accept_fn_once(|| { String::from("baz") }); // Fn
accept_fn_once(|| { t.push_str(" bar"); t.clone() }); // FnMut
accept_fn_once(|| { s } ); // FnOnce
accept_fn_mut(|| { String::from("baz") });
accept_fn_mut(|| { t.push_str(" bar"); t.clone() });
// Error: cannot pass a FnOnce, as function may call
// closure multiple times, but s can only be moved out
// of the closure a single time.
accept_fn_mut(|| { s });
accept_fn(|| { String::from("baz") });
// Error: cannot pass a FnMut or FnOnce, as function expects
// a side-effect free closure that can be called multiple
// times.
accept_fn(|| { t.push_str(" bar"); t.clone() });
accept_fn(|| { s });
}
Iterators in Rust feel relatively Python-esque. Collections supply them via collection.iter(), they are lazy, and iteration has a single next() method that returns None when there are no more elements. Also similar to Python, iterators are equally acceptable in the range-based for loop:
let v = vec![10, 20];
let i = v.iter();
for el in i {
println!("{el}");
}
let mut i2 = v.iter();
i2.next(); // Some(&10)
i2.next(); // Some(&20)
i2.next(); // None
NOTE: One thing I found surprising is that the i iterator in the range-based for loop does not need to be mut while i2 needs to be marked as such. Seems there is some hidden “magic” in the range-based for loop that takes ownership of the iterator and rebinds it as mutable.
Iterators are - of course - implemented through a trait. The
Iterator trait has a single required method called next()
and then
offers lots of convenience methods based on that, e.g., to
map()
an iterator to another type, to filter()
elements
or to collect()
the values of an interator into a collection.
Collections offer an into_iter()
method that takes ownership
of the underlying collection, e.g., for operations like
filtering down a larger collection to a smaller one with
code like:
let large = vec![10, 22, 43, ...];// Some large collection.
let small: Vec<i32> = large.into_iter().filter(|s| s % 10 == 0).collect();
// large can be freed now, as the iterator took ownership.
The chapter concludes by using iterators to make the minigrep program from the previous chapter a bit nicer.
Chapter 14 – We’ll need a bigger boat, more cargo.
This chapter expands on cargo and crates with an eye towards publishing code, e.g. on http://crates.io.
The chapter mentions that the dev and release profiles can be customized through sections in Cargo.toml.
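For example, something like this in Cargo.toml (the opt-level values here should match the defaults, if I remember the book correctly):
[profile.dev]
opt-level = 0    # fast compiles, unoptimized code

[profile.release]
opt-level = 3    # slower compiles, optimized code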
Documentation comments are similar to Javadoc or Pydoc
and use three slashes ///
or //!
syntax. The former
documents the item following the comment (e.g., structs,
enums, functions, etc.) while the latter documents the
thing containing the comment (and is used for file-level
comments in crates or modules). cargo doc
can turn the
documentation comments into HTML documentation, that gets
put into target/doc
by default.
Doc comments support Markdown and regular Markdown sections are used to document common things like Panics, Errors, Safety and Examples.
/// Adds one to the number given.
///
/// # Examples
///
/// ```
/// let arg = 5;
/// let answer = my_crate::add_one(arg);
///
/// assert_eq!(6, answer);
/// ```
pub fn add_one(x: i32) -> i32 {
x + 1
}
One really cool feature of Cargo is that it also compiles and runs example code in doc comments, making sure the examples do not go out of date with code changes. I think that’s a really neat idea.
Publishing to http://crates.io/ doesn’t have many surprises. Seems the site uses GitHub logins as the basis and - once metadata like a description or license has been added to Cargo.toml - packages can be published with cargo publish. Publishing is permanent - seems Cargo learned a lesson from npm’s left-pad fiasco - but versions can be yanked to discourage new projects from depending on a broken package version (this doesn’t delete code, it just adds metadata to discourage dependencies).
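Yanking (and un-yanking) is just another cargo command:
$ cargo yank --vers 1.0.1
$ cargo yank --vers 1.0.1 --undo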
Cargo workspaces allow creating a project that consists
of a number of packages that depend strongly on each other,
allowing both local and remote dependencies and adding some
conveniences like a combined build & testing, as well as
sharing Cargo.lock
so external dependencies with compatible
versions are only fetched and compiled once for all packages.
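The workspace itself is described by a top-level Cargo.toml that just lists its member packages, something like this (the package names are the book’s example, from memory):
[workspace]
members = ["adder", "add_one"]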
NOTE: I wonder if people publish their Cargo workspaces in places like GitHub, or if that’s more of a local developer concept.
The chapter also covers installing binaries using cargo install
,
which downloads binary crates, compiles and installs the
resulting binaries in ~/.cargo/bin
.
Similar to git, cargo can be extended
with new subcommands by putting binaries named cargo-something
in the path (which then allows running them as cargo something
).
I assume it is somewhat common to have ~/.cargo/bin
in your
path and then installing new subcommands using cargo install
.
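For example (cargo-expand is a real third-party subcommand; I’m just using it as an illustration):
$ cargo install cargo-expand
$ cargo expand        # now runs as if it were a built-in cargo subcommand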
Chapter 15 – Smart Pointers
A-ha, it appears we can’t quite live without “pointers” of some kind! This chapter introduces smart pointers like Box<T> or Rc<T>, which let us store owned data on the heap.
Smart pointers come in handy when we have dynamic data
structures on the heap where we still need ownership
in the picture, but cannot tell ahead of time how
the structure will evolve. Take a linked list as an
example (the book uses Cons
instead, but it’s the
same idea). A naive attempt with a plain variable fails
because that gives the struct infinite size
(the
compiler tries to figure out the size of the Node struct,
but looking at the members, one of them is Node itself):
struct Node {
value: i64,
next: Node,
}
We can get a version with references to compile (now the size of the struct is the size of an i32 plus the size for a reference/pointer):
struct Node<'a> {
value: i32,
next: &'a Node<'a>,
}
But that runs into another problem. Who owns all the Node
structures? next
has no ownership, so we would need to
introduce some external owner of all the nodes to
guarantee they all have the same lifetime 'a
, which
adds extra complexity. With a Box
, we can build a
traditional linked list (just like mother used to make),
by having next as a Box<Node>
.
NOTE: Box does not support null values, so unlike in C++, we need to resort to an enum or Option to express the end-of-list case.
struct Node {
value: i32,
next: Option<Box<Node>>,
}
fn main() {
let head = Box::new(Node {
value: 1,
next: Some(Box::new(Node {
value: 2,
next: None,
})),
});
}
Box<T> is pretty much the equivalent of C++’s std::unique_ptr: a smart pointer that owns its memory and frees up the storage when it goes out of scope. Box (and Rc<T>) are structures that store a pointer and a few bytes of metadata (like size or refcount) and implement the Deref and Drop traits.
The Deref trait enables smart pointers to take part in deref coercion, where the compiler silently inserts calls to deref() or deref_mut() as needed so smart pointers can behave like regular references in expressions like *a or a.member. There is also a DerefMut trait that allows for mutable (de-)references. There are some contributed crates around that allow deriving the Deref/DerefMut code.
use std::ops::Deref;
struct MyBox<T>(T);
impl<T> Deref for MyBox<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
return &self.0;
}
}
Drop
essentially implements a destructor, defining
some code to run when a type goes out of scope. The
smart pointers use this to free the memory they own.
impl<T> Drop for MyBox<T> {
fn drop(&mut self) {
// perform cleanup
}
}
Rc<T> is a shared pointer (think C++’s std::shared_ptr) that uses reference counting to free up storage when the last (strong) reference to the data is gone. Rc::clone(&p) creates a copy of an existing Rc p, increasing the number of (strong) references. To help avoid reference cycles, we can also use Rc::downgrade(&p), which returns a Weak<T> that doesn’t own the value and needs to be upgraded before trying to access the value.
NOTE: Rc only allows immutable borrows. That is, you cannot mutate values through an Rc pointer. I believe this is because the borrow checking rules cannot be statically checked in a shared-ownership scenario.
use std::rc::Rc;
fn main() {
let a = Rc::new(27);
// Creates a second Rc<T> increasing the refcount
// and keeping the value until both a and b go
// out of scope.
let b = Rc::clone(&a);
// Gets a Weak<T> reference that doesn't affect
// the lifetime of the value ...
let c = Rc::downgrade(&a);
// ... but that means the value may be gone by
// the time we try to access it, so we must
// `upgrade` it back to a real Rc<T>, which might
// fail at runtime:
let d = c.upgrade().expect("Aiiie, value is gone");
}
The chapter also introduces RefCell<T>, which is a variant of Box<T> that defers some borrow checking to runtime. This allows handling situations where the static rules of the compiler are too rigid to allow some legal situations, for example when using Rc<T> to handle shared ownership.
The book presents the interior mutability pattern where we can use
a RefCell<T>
to allow borrowing a mutable value behind an immutable
reference. The book motivates this by a mock-object scenario where
the methods in some Trait we’re trying to mock take immutable
&self
references, but we need to change values internal to our
mock struct (I was reminded a bit of C++’s
mutable keyword here
which also allows bridging a const
interface with some internal
mutability).
RefCell<T>
has borrow()
and borrow_mut()
functions that
allow retrieving references, checked at runtime:
use std::cell::RefCell;
fn main() {
let a = RefCell::new(Node{value: 42, next: None});
let b = a.borrow_mut();
// Panics at runtime because we cannot have two mutable
// borrows at the same time.
let c = a.borrow_mut();
}
The book also has an example of combining Rc<T>
and RefCell<T>
to combine multiple-ownership and mutability:
use std::{cell::RefCell, rc::Rc};

// Wrap the value in a RefCell to allow mutating it, even though
// we arrive at it by following the Rc<List> pointers that only
// allow immutable borrows.
enum List {
    Cons(RefCell<i32>, Rc<List>),
    Nil,
}
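A quick sketch (building on the List enum just defined) of mutating a value that we only reached through shared Rc handles:
fn main() {
    // A tail shared by two lists; only the Rc handles are shared.
    let shared_tail = Rc::new(List::Cons(RefCell::new(3), Rc::new(List::Nil)));
    let a = List::Cons(RefCell::new(1), Rc::clone(&shared_tail));
    let b = List::Cons(RefCell::new(2), Rc::clone(&shared_tail));
    // Mutate the value inside the shared tail, even though we only
    // reached it through immutable Rc handles.
    if let List::Cons(value, _) = shared_tail.as_ref() {
        *value.borrow_mut() += 10;
    }
    // Both a and b now see 13 at the end of their lists.
    let _ = (a, b);
}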
I wasn't really fond of how this ended up cluttering the data structure with what I'd consider to be more of an encapsulation problem, where we might want to restrict some users of our data structure to immutable operations only (e.g., by having trait methods take an immutable &self) and allow mutations only for other users. But I guess that's a very OOP mindset brought to a language that puts a greater emphasis on data structures and follows Go's idea of not combining data and code into rigid classes.
Here’s an overview of how we can refer to some data now:
+-----------+--------------------------+---------+------------------------------------------------------+
| Construct | Syntax                   | Storage | Ownership                                            |
+-----------+--------------------------+---------+------------------------------------------------------+
| Variable  | a = <expr>               | Stack   | Owns the data                                        |
| Reference | a = &<expr>              | Stack   | No ownership, might need a lifetime                  |
| Box       | a = Box::new(<expr>)     | Heap    | Owns the data                                        |
| Rc        | a = Rc::new(<expr>)      | Heap    | Shared ownership (refcount), only immutable borrows  |
| RefCell   | a = RefCell::new(<expr>) | ?       | Like Box w/ runtime borrow checking                  |
+-----------+--------------------------+---------+------------------------------------------------------+
Chapter 16 – Concurrency
Like most modern languages, Rust has constructs to support concurrent and/or parallel programming. Rust chose to support both the message-passing and the shared-memory model, whereas some other languages lean heavily toward one of them (Go, for example, strongly favors message passing through channels). Rust uses a 1:1 model where Rust threads are OS threads (i.e., no green threads, fibers or similar).
Most of the threading support lives in the std::thread module. Threads are created using thread::spawn, which takes a closure to run (hence the support for move closures that can take ownership) and returns a handle that can be joined.
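A minimal sketch (mine, not from the book) of spawning a thread with a move closure and joining it:
use std::thread;

fn main() {
    let data = vec![1, 2, 3];
    // `move` transfers ownership of `data` into the closure,
    // so the thread can safely outlive the current scope.
    let handle = thread::spawn(move || {
        let sum: i32 = data.iter().sum();
        println!("sum computed in thread: {sum}");
    });
    // Block until the spawned thread finishes.
    handle.join().unwrap();
}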
Message passing is implemented using channels, in a way that’s very
reminiscent of Go’s model (the book even includes
a Rob Pike quote). The mpsc module (multiple producer, single consumer) implements most of the logic here.
Channels are typed, one-way “pipes” for data, created with mpsc::channel, which
returns a tuple with the transmitter and receiver of the channel. Methods
like send
, recv
(and the non-blocking try_recv
) on these can be used
to send and receive messages. Channels take ownership of values meaning
Rusts borrow checking will make sure only one side of the channel gets to
own a value at any given time (and together with the borrowing rules that
prevent data races, this should avoid a whole class of issues). There is a nice
bit of syntactic sugar where the receiver implements IntoIterator, so you
can write code like for value in rx {...}
:
use std::{sync::mpsc::{self, Sender}, thread};

fn send_strings(tx: Sender<String>, thread_id: u32) {
    for i in 0..10 {
        tx.send(format!("I'm string {i} from thread {thread_id}")).unwrap();
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone();
    thread::spawn(move || {
        send_strings(tx, 1);
    });
    thread::spawn(move || {
        send_strings(tx2, 2);
    });
    for r in rx {
        println!("{r}");
    }
}
Rust also supports shared state or shared memory. Rust offers a mutex
to guard access to memory. However, unlike in other languages where mutexes
are standalone structures, Rust’s Mutex<T>
is another smart pointer that
actually owns the data it guards. This means the type system, the borrowing rules and the Drop trait can ensure the mutex is locked before the data is accessed, that references to the guarded data can't escape, and that the lock is released again when the guard goes out of scope. However, a Mutex<T> by itself has a single owner, so there is also Arc<T>, an atomically reference counted smart pointer, for sharing it across threads.
Apparently it’s a common pattern to wrap a Mutex<T>
or one of the types in
std::sync::atomic
into an Arc<T>
:
use std::{sync::{Arc, Mutex}, thread};

fn main() {
    let mtx: Arc<Mutex<i32>> = Arc::new(Mutex::new(0));
    let m = Arc::clone(&mtx);
    let h1 = thread::spawn(move || {
        // Lock the mutex to get (exclusive) access to the guarded value.
        *m.lock().unwrap() += 1;
    });
    // Create more clones of mtx here for other threads.
    h1.join().unwrap();
}
Rust has two Traits for Concurrency in std::marker
called Sync
and
Send
to annotate types as thread-compatible. Send
can be used to annotate
types that are safe to send between threads (through a closure). Rc<T>
is a counterexample. Sync
means a type is safe to share between threads
(like Mutex<T>
) and implies that for a Sync
type T
, &T
would be
marked Send
(i.e., only if a type is Sync
is it fine to Send
references
to it across threads). The book notes that most primitive types are
both Sync
and Send
and that structures made up entirely of types with
one of these traits automatically get marked the same. It also notes that
building custom Sync
or Send
types is tricky and requires unsafe code,
so the recommendation is to start with structures made up of safe pieces.
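A small sketch of my own (not from the book) contrasting Arc<T>, which is Send and Sync, with Rc<T>, which is neither:
use std::{rc::Rc, sync::Arc, thread};

fn main() {
    let shared = Arc::new(42);
    let cloned = Arc::clone(&shared);
    // Arc<i32> is Send + Sync, so a closure capturing it may move to another thread.
    let handle = thread::spawn(move || println!("from thread: {}", *cloned));
    handle.join().unwrap();

    let not_send = Rc::new(42);
    // This would not compile: Rc<i32> is neither Send nor Sync, so a closure
    // capturing it cannot be handed to thread::spawn.
    // thread::spawn(move || println!("{}", *not_send));
    let _ = not_send;
}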
Chapter 17 – OOP
This chapter discusses object-orientation and Rust, highlighting some ideas that are common in OOP and concluding that Rust has equivalents for some but not for others.
Overall, Rust and Go feel similar in their approach here. Both eschew the “traditional” OOP focus on classes, objects and inheritance and instead favor structs with associated functions, using interfaces / traits to support Liskov-style substitution. Rust, however, avoids Go's implicit “if it has all the methods, it implements the interface” duck typing, which always felt a bit too loose to me.
The book brings up the concepts of Combining Data and Behaviour and
Encapsulation and claims Rust meets both. The former, because impl
blocks can associate functions with data types (and you can invoke them
through the common instance.function()
syntax). The latter because
module visibility (via the pub
keyword) can be used to hide certain
data members from callers, forcing them to access them through associated
functions instead. How this anchors on modules/packages rather than a single
datatype or class reminds me a little of Java’s
package private
concept.
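A small sketch of that module-level encapsulation (my own toy example, not the book's): only what is marked pub is visible outside the module.
mod counter {
    pub struct Counter {
        // Private field: not accessible outside this module.
        count: u32,
    }

    impl Counter {
        pub fn new() -> Counter {
            Counter { count: 0 }
        }
        pub fn increment(&mut self) {
            self.count += 1;
        }
        pub fn value(&self) -> u32 {
            self.count
        }
    }
}

fn main() {
    let mut c = counter::Counter::new();
    c.increment();
    // c.count += 1; // would not compile: `count` is private to the module
    println!("count is {}", c.value());
}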
Inheritance is a mechanism Rust (and Go) choose to avoid. Rust instead
favors generics and
traits which are distant cousins to interfaces. Generics work more like C++’s templates, which also
favor monomorphization (i.e., resolving a template to “multiple implementations”
at compile time), rather than Java’s erasure.
Traits are similar to interfaces in the sense that they allow us to ask
for dynamic dispatch at a callsite with dyn Trait
saying we expect
some reference to a datatype that conforms to Trait
, which ensures we
can call the methods defined by that trait, even though we might receive
different types. The ability to implement traits on types we don’t own
(reminiscent of Ruby or Go) and the fact that traits can themselves
specify trait bounds on the objects they want to work with are concepts
that aren’t often present with interfaces in other languages.
The book points to two reasons for why we might reach for inheritance,
namely code re-use and polymorphism. For code re-use, the book points
to default trait method implementations. A good example for this is the
Iterator trait,
which offers over 70 provided methods for only a single required one.
I’m strongly reminded of the
Template Method design pattern
here, where we offer some convenience or structure based on some smaller set
of required methods. And for polymorphism, the book points to traits,
trait bounds and dynamic dispatch via dyn
.
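A tiny sketch of that idea (my own made-up trait, not the book's): a single required method plus a provided method built on top of it.
trait Greeter {
    // The single required method ...
    fn name(&self) -> String;

    // ... and a provided (default) method built on top of it,
    // which implementors get for free but may override.
    fn greet(&self) -> String {
        format!("Hello, {}!", self.name())
    }
}

struct World;

impl Greeter for World {
    fn name(&self) -> String {
        "world".to_string()
    }
}

fn main() {
    println!("{}", World.greet());
}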
The example in the book is a gui library that has a Draw
trait that
various structs for shapes implement, allowing us to keep a list
of different types in a collection like a Vec<Box<dyn Draw>>
and then enumerate over them calling the draw()
trait method
on each of them.
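Roughly what that looks like (my Circle/Square types here are stand-ins for the book's gui components):
trait Draw {
    fn draw(&self);
}

struct Circle { radius: f64 }
struct Square { side: f64 }

impl Draw for Circle {
    fn draw(&self) { println!("circle with r={}", self.radius); }
}
impl Draw for Square {
    fn draw(&self) { println!("square with s={}", self.side); }
}

fn main() {
    // Different concrete types behind one trait object type.
    let shapes: Vec<Box<dyn Draw>> = vec![
        Box::new(Circle { radius: 1.0 }),
        Box::new(Square { side: 2.0 }),
    ];
    for shape in &shapes {
        shape.draw(); // dynamic dispatch via the vtable
    }
}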
The chapter wraps up by implementing the
State pattern where an
object (our blog::Post
in this case) delegates various methods to
an internal “state” trait object that can perform state-specific
behaviour and may request to transition to another state in the process.
The polymorphism here is expressed through a State
trait and multiple
different datatypes that implement this trait, allowing the main class
to have a state: Option<Box<dyn State>>
and delegating various calls
there.
The book then contrasts this with a more idiomatic approach in Rust,
that instead expresses the different states a post can have by
making them different structs, with different methods. The problem
of not handing out the text of a non-published article gets solved
by simply not having a content()
method on the DraftPost
type,
for example.
NOTE: I found it interesting how Post::new now returned DraftPost, which surprised me at first. But given that methods in an impl block are simply associated code, there is nothing but maybe a soft convention to return the same type from new.
I quite liked how this moved some errors to compile time. However, it now seems harder to, e.g., keep a list of all posts regardless of their status. Then again, that might be another example of OOP-inspired thinking, and perhaps the better approach now is to keep separate, single-type lists for the various kinds of posts.
Chapter 18 – Patterns
This chapter discusses patterns that allow matching the
structure of data. We’ve seen these in the context of Rust’s
ubiquitous match
keyword and constructs like if let
.
Notable takeaways from the early chapter for me were
that if let
and if
can combine freely, that there is
a while let
syntax that can come in handy for, e.g.,
consuming a sequence of optional values and that the
destructuring of tuples used in a few places throughout
the book in for
loops with for (k,v) in ...
, let
expressions like let (a, b) = (3,4)
are also patterns
at work, and that this is even allowed in function parameters
with something like fn takes_pair(&(a,b): &(i32, i32))
.
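A quick toy example of both (mine, not from the book):
fn manhattan(&(x, y): &(i32, i32)) -> i32 {
    // The tuple pattern destructures the parameter directly.
    x.abs() + y.abs()
}

fn main() {
    let mut stack = vec![1, 2, 3];
    // `while let` loops as long as the refutable pattern matches,
    // i.e. until pop() returns None.
    while let Some(top) = stack.pop() {
        println!("popped {top}");
    }
    println!("distance: {}", manhattan(&(3, -4)));
}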
Comparing how a pattern is used in match
or if let
where the structure might or might not match with how
this is used in let
or with function parameters leads
to refutable patterns (i.e., patterns that might fail
to match the structure of the data at runtime) vs.
irrefutable patterns (i.e., patterns which will always
match at runtime). As expected, refutable patterns only
work in places like match
, if let
and while let
where
we have a means to deal with the fact that they didn’t match.
NOTE: There was a nice aside in this chapter about shadowed variables introduced by a pattern. These only come into scope at the opening curly brace of the block the pattern guards, so writing code like
if let Some(y) = value && y > 30
might use the wrong y! However, as of 2024 the compiler warns that let in this position is unstable.
The chapter then walks through all the various bits of syntax
that can be used to create a pattern. Notable to me were
multiple patterns with the |
operator, e.g., 1 | 2 | 3
.
This comes in very handy when multiple values require the
same logic in a match
. Similarly you can match ranges.
NOTE: I learned the hard way that it's a good idea to use inclusive ranges for these cases. For example, you'd want to use 1..=9 to match digits, and not 1..9, as the latter would NOT match the 9.
Destructuring allows extracting individual values from
larger data types like tuples or structs. It allows both
testing for values and binding them to variables. And by using
the ..
syntax, we can ignore fields we’re not interested in:
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

fn main() {
    let p = Point3 { x: 0, y: 10, z: 20 };
    match p {
        Point3 { x: 0, y, .. } => println!("On the y,z plane at y={y}"),
        _ => (),
    }
}
Point3{x: 0, y, ..}
is a refutable pattern that will
match if x == 0, and will destructure the Point3, assigning
y
to a local variable of the same name (if we wanted to
rename, we could say y: local_name
).
The @ operator allows combining the binding of a variable with some pattern check, e.g., Point3{x: x @ 0..=10, y, ..} would both bind the x value to a local variable and only match if x is in the range 0..=10.
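For example (a toy match of my own, reusing the Point3 struct from above):
fn main() {
    let p = Point3 { x: 7, y: 10, z: 20 };
    match p {
        // Binds the field to `x` AND requires it to fall in the range 0..=10.
        Point3 { x: x @ 0..=10, .. } => println!("x={x} is in range"),
        _ => println!("x is out of range"),
    }
}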
Destructuring can nest so we can inspect types more deeply if needed:
match msg {
    Message::ChangeColor(Color::Rgb(r, g, b)) => {...}
    Message::ChangeColor(Color::Hsv(h, s, v)) => {...}
}
Ignoring values can come in handy if we only need some
part of a structure’s information. We can ignore values through _
(ignores a single value), _variable
(assigns to the variable but
shuts up compiler warnings about it being unused) and ..
which
ignores other values as long as it is unambiguous. So (first, .., last)
works, but (.., second, ..)
does not.
Match guards allow attaching if statements to a pattern in a match, which is a nice way to express some more complex conditions:
let num = Some(4);
match num {
    Some(x) if x % 2 == 0 => println!("The number {x} is even"),
    Some(x) => println!("The number {x} is odd"),
    None => (),
}
Chapter 19 – Advanced features
This chapter is a grab bag of things that weren’t discussed elsewhere in the book.
The unsafe keyword allows for code that opts out of some of Rust's guarantees, since there are perfectly sound constructs that the compiler's conservative rules cannot verify. I assume C#'s unsafe was the inspiration here.
Unsafe code can deal in raw pointers (specifically, dereferencing these):
// Creating a raw pointer is safe; only dereferencing it requires unsafe.
let address = 0x012345usize;
let r = address as *const i32;
It also allows calling unsafe functions, mutating global variables and implementing unsafe traits.
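A minimal sketch where the unsafe dereference is actually sound, because the pointer comes from a real local value rather than an arbitrary address:
fn main() {
    let x = 5;
    // Creating the raw pointer is safe ...
    let p = &x as *const i32;
    // ... but dereferencing it needs an unsafe block, because the
    // compiler can no longer guarantee the pointer is valid.
    unsafe {
        println!("x through a raw pointer: {}", *p);
    }
}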
Unsafe also plays a role in language interop. Rust supports the Foreign Function Interface (FFI) where functions can be marked with their ABI, usually the “C” one. Calls to such functions must happen in unsafe blocks, as Rust can't reason about what they might do:
extern "C" {
fn abs(input: i32) -> i32;
}
fn main() {
unsafe {
println!("Absolute value of -3 according to C: {}", abs(-3));
}
}
Vice-versa we can ask Rust to make functions available for calls through the C ABI with syntax like:
#[no_mangle]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}
The chapter has a few examples where unsafe might be needed, including mutating a global variable, using union types or having unsafe traits (where implementing the trait promises to uphold some invariant the compiler can't check).
Advanced traits gives a fuller discussion of traits, starting with Associated types, like in the Iterator trait:
pub trait Iterator {
    type Item; // Associated type
    fn next(&mut self) -> Option<Self::Item>;
}
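To see how an implementation fills in the associated type, here's a quick sketch with a made-up CountDown iterator:
struct CountDown(u32);

impl Iterator for CountDown {
    // The implementation picks the concrete Item type.
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

fn main() {
    let values: Vec<u32> = CountDown(3).collect();
    println!("{values:?}"); // [3, 2, 1]
}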
This syntax exists because generics on traits are sometimes too flexible, allowing us to, e.g., implement the same trait multiple times with different types for a single struct (the book touched on this, but it was nice to come up with a concrete example):
struct MyStruct {}

// With generics, Rust treats each concrete instantiation
// of a trait as a separate entity, allowing us to, e.g.,
// implement the "same" trait multiple times on the same
// type.
trait Foo<T> {
    fn foo(&self) -> Option<T>;
}

impl Foo<i32> for MyStruct {
    fn foo(&self) -> Option<i32> {
        Some(5)
    }
}

impl Foo<bool> for MyStruct {
    fn foo(&self) -> Option<bool> {
        Some(true)
    }
}

fn main() {
    let x = MyStruct {};
    // Rust allows multiple methods with the same name
    // from traits, and will distinguish based on types, etc.
    let b: Option<bool> = x.foo();
    let i: Option<i32> = x.foo();
    // If the types, etc. aren't enough, Rust allows
    // using this syntax to clarify which trait's methods we mean.
    let i: Option<i32> = Foo::<i32>::foo(&x);
    let b: Option<bool> = Foo::<bool>::foo(&x);
    // And there is an even wilder, fully qualified syntax that is
    // needed if the method doesn't take &self, meaning we have no
    // value to anchor on (SomeType/SomeTrait are placeholders here):
    // let z = <SomeType as SomeTrait>::plain_method();
}
Default type parameters allow users of a Generic to elide a type parameter
by giving it a default. I was intrigued by the
impl Add<Meters> for Millimeters
example, where the default type parameter
of trait Add<Rhs=Self>
gets overridden to allow specializing addition of two
different types. Somewhat similar to
C++’s default values
for function arguments (which Rust does not support), the default type parameters
allow later introducing generics without having to update all the users of a trait:
// This trait...
trait Foo {
    fn foo(&self) -> u32;
}
// ... could become this
trait Foo<T=u32> {
    fn foo(&self) -> T;
}
// Without needing to change
impl Foo for SomeType {}
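To make the Add<Meters> for Millimeters idea concrete, here's a sketch along the lines of the book's example:
use std::ops::Add;

struct Millimeters(u32);
struct Meters(u32);

// Add<Rhs = Self> defaults Rhs to Self; here we override it so we can
// add Meters to Millimeters.
impl Add<Meters> for Millimeters {
    type Output = Millimeters;

    fn add(self, other: Meters) -> Millimeters {
        Millimeters(self.0 + other.0 * 1000)
    }
}

fn main() {
    let total = Millimeters(500) + Meters(2);
    println!("{} mm", total.0);
}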
Supertraits allow for a kind of “inheritance” between traits (remember that Rust does not implement inheritance in the regular OOP sense), where a trait can also assume its “supertrait” is implemented for a given type:
trait Stringable {
    fn to_string(&self) -> String;
}

trait SuperStringable: Stringable {
    fn superstring(&self) -> String {
        self.to_string() + " super!"
    }
}
NOTE Rust requires a type to implement both traits, so inheritance is a brittle metaphor here. An impl section impl SuperStringable for MyType will NOT be allowed to implement the to_string method. Instead you'll get a compiler error that MyType fails to implement the Stringable trait that SuperStringable requires.
The newtype pattern refers to the idea of creating thin wrapper types to “import” foreign types into the current crate, e.g., if we want to implement some external trait on an external type (remember that the “orphan rule” only allows implementing a trait if either the trait or the type is local to your crate):
use std::fmt;

// Needed because both Vec and fmt::Display are not from this crate.
struct Wrapper(Vec<String>);

// Could implement the Deref trait to allow invoking
// the base type's methods without explicit unwrapping.
impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}
The chapter also discusses advanced uses of the type system,
covering type aliases (pretty much
C++ typedefs)
and comparing that to the newtype pattern. The former just
gives a type another name, but both names refer to the same type, while
the newtype pattern creates a new (wrapper) type that cannot
be directly assigned, etc (I still like the units example they
had with Meters
and Millimeters
there). Interestingly,
type aliases also support generics:
type Result<T> = std::result::Result<T, std::io::Error>;
The ! type is used for functions that never return, and
it allows things like match statements where some arms don’t
return values (e.g., due to a panic!
or continue
).
The automagic Sized
trait denotes all types with a known
size (primitive types, structs of primitive types, pointers, etc).
Counterexamples are str
(strings of different length) or
traits / trait objects. Dynamically Sized Types (DSTs) or
unsized types can only be handled behind references or pointers
(which have a known size). It seems by default, Rust restricts
type parameters in generics to Sized
and there’s a special
?Sized
syntax to also allow accepting unsized types (which
are then forced to come through a reference/pointer).
For advanced functions and closures the book discusses
function pointers. I had already tried out earlier whether I could
pass a function where a closure was expected, which - to my
delight - worked just as expected. For example, we can say
list_of_numbers.map(my_function)
. Turns out functions
have a type fn
(not to be confused with the Fn
trait),
and we can use that to explicitly take function pointers:
fn eff_it(f: fn(i32) -> i32) -> i32 {
    f(5) + f(10)
}
But function pointers implement all the Fn
traits, so
unless there’s a specific reason to only accept function
pointers, a better pattern is to accept impl Fn
, which
can handle both.
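A short sketch of that recommendation (the helper names are my own):
fn apply_twice(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    // Works with a plain function ...
    println!("{}", apply_twice(double, 3));
    // ... and with a closure, which a bare `fn` pointer parameter would
    // reject, since this closure captures `offset` from its environment.
    let offset = 10;
    println!("{}", apply_twice(|x| x + offset, 3));
}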
The chapter wraps up with macros, which come in two flavors. There is a pattern-matchy variant that directly generates source code using something similar to a match statement.
These are known as declarative macros
and are how,
e.g. the vec!
macro is implemented (think a little
more disciplined and powerful C-style macros). There
is also a procedural
family of macros, that instead get
to run arbitrary rust code on a TokenStream
. This is
how #[derive(Display)]
and friends are implemented
(I was a little disappointed that this wasn’t a generic
mechanism, but thinking about it, it can’t be as it
often needs to generate code specific to a given type).
I decided to gloss over the macros for now, as they
are rather different in syntax and it felt like this
might be a topic to pick up when a concrete need came up.
Chapter 20 – Let’s serve some web
Alas, the final chapter, putting it all together in a handwritten Web server.
It was nice to see how simply the basic server came together even without any frameworks, but I wasn't too keen on how the HTTP request's first line was matched wholesale. I decided to go a little fancier and implement a basic handler map, which came together nicely, but I quickly ran into trouble once threads entered the picture.
NOTE The issue I ran into was that Rust didn't like sending my handler map between threads. I had chosen Fn trait objects as my handlers, to allow callers to pass both closures and function pointers. What I didn't realize is that Fn doesn't imply Send by default, which caused Rust to consider my overall handler map not Send-able. Changing the type to Fn(...) + Send, requiring both traits, made things work.
pub mod handlers {
    use std::{collections::HashMap, net::TcpStream};

    pub type Handler = dyn Fn(TcpStream) + Send;

    pub struct HandlerMap {
        map: HashMap<String, Box<Handler>>,
        not_found: Box<Handler>,
    }

    impl HandlerMap {
        pub fn new(not_found: Box<Handler>) -> HandlerMap {
            HandlerMap {
                map: HashMap::new(),
                not_found: not_found,
            }
        }
        pub fn bind(&mut self, path: &str, handler: Box<Handler>) {
            self.map.insert(dbg!(path.to_string()), handler);
        }
        pub fn lookup(&self, path: &str) -> &Handler {
            match self.map.get(dbg!(path)) {
                Some(handler) => handler.as_ref(),
                None => self.not_found.as_ref(),
            }
        }
    }
}
fn main() {
    let handler_map = Arc::new(Mutex::new(HandlerMap::new(Box::new(|h| {
        respond_with(h, "404.html", 404)
    }))));
    handler_map
        .lock()
        .unwrap()
        .bind("/", Box::new(|h| respond_with(h, "hello.html", 200)));
    handler_map
        .lock()
        .unwrap()
        .bind("/slow", Box::new(|h| serve_slow(h, "hello.html", 200)));
    // Same listener code as in the book, but using the handler map to look up
    // the path component of the HTTP request's first line.
}
The later part of the chapter implementing the thread pool served as a nice
reminder of techniques like Arc<Mutex<...>>
to share data between threads,
using Arc
to share ownership and Mutex
to prevent concurrent modification.
One final part that stood out to me was the discussion of why the code consuming jobs in the pool wasn't written as while let ... a.lock()... and how that would cause things to hang: while let (and friends) hold on to temporaries - including the MutexGuard - until the end of the associated block, whereas a plain let x = ... a.lock() ... statement drops them once the value is assigned. This struck me as the kind of subtlety that separates more experienced devs from newer folks.
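Here's a sketch of the shape of that worker loop (heavily simplified from the book's thread pool; the names are mine):
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<u32>();
    let rx = Arc::new(Mutex::new(rx));

    let worker_rx = Arc::clone(&rx);
    let worker = thread::spawn(move || loop {
        // The MutexGuard temporary is dropped at the end of this `let`
        // statement, so the lock is released before we handle the job.
        let job = worker_rx.lock().unwrap().recv();
        match job {
            Ok(n) => println!("working on job {n}"),
            Err(_) => break, // channel closed, no more jobs
        }
        // Writing `while let Ok(n) = worker_rx.lock().unwrap().recv() { ... }`
        // instead would keep the guard - and thus the lock - alive for the
        // whole loop body.
    });

    for i in 0..3 {
        tx.send(i).unwrap();
    }
    drop(tx);
    worker.join().unwrap();
}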
And that’s it, my first taste of Rust. Guess I have to find some actual projects to play with next…