Advent of Code 2024 - In Rust - Days 1-5
After my recent adventures learning Rust, I decided to follow Advent of Code 2024 in Rust.
It was my first AoC and I came for the puzzles and learning, not for the leaderboard. I’ll chip away at the puzzles at my leisure, an occasional evening here or a weekend there. This article covers days 1 through 5. I’ll publish later articles as I solve more puzzles (or never, if I decide to pursue an actual project instead, or life happens).
My goal here is to share a bit about the challenges I encountered and any insights (or lack thereof) I had on the puzzles and on doing them in Rust. I don’t plan to publish the full source code, so I won’t spoil the fun of figuring things out yourself, but maybe having some notes on the problems I ran into helps someone who is stuck.
Before we dive in, let me compliment Eric and the AoC team. It’s clear that a lot of thought has gone into the puzzles, but I also dig the charming story and the fantastic “ASCII with sparkles” presentation.
Day 1 – Historian Hysteria
The first puzzle asks to compute a few things on two lists of numbers, including their pairwise distance and a similarity score obtained by multiplying numbers by their frequency in the second list.
Not much to see here; the main effort went into creating a basic outline of a Rust crate with per-day modules and parsing the input. The actual solutions mostly came down to sorting/mapping, plus a frequency map for the second part.
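Roughly, the two parts boil down to something like this (a simplified sketch, assuming the two columns have already been parsed into vectors):

```rust
use std::collections::HashMap;

/// Part 1: sort both lists and sum up the pairwise distances.
fn total_distance(mut left: Vec<i64>, mut right: Vec<i64>) -> i64 {
    left.sort_unstable();
    right.sort_unstable();
    left.iter().zip(&right).map(|(l, r)| (l - r).abs()).sum()
}

/// Part 2: multiply each number in the left list by its frequency in the right list.
fn similarity_score(left: &[i64], right: &[i64]) -> i64 {
    let mut freq: HashMap<i64, i64> = HashMap::new();
    for &r in right {
        *freq.entry(r).or_insert(0) += 1;
    }
    left.iter().map(|l| l * freq.get(l).copied().unwrap_or(0)).sum()
}
```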
Day 2 – Red-Nosed Reports
This puzzle was a step up in challenge from day 1. It asks you to verify that sequences of numbers (“reactor reports”) are safe by checking that the steps between subsequent elements are within bounds, and that each sequence is either monotonically increasing or decreasing, i.e. it doesn’t “change direction”.
For the first part of the puzzle, an interesting realization is that validating the step size can be done on pairs of numbers (or a single delta between numbers), while validating monotonicity requires looking at triplets (or two deltas):
```
            A               B
           / \          /   |   \
Numbers:  2   4   10   11   12   11
Deltas:     +2   +6   +1   +1   -1
             |               \ /
             A                B

A: Step changes can be checked with two numbers or one delta, but
B: Monotonicity requires three numbers or two deltas
```
I ended up futzing around with itertools' tuple_windows for a while, but this often turned out somewhat unwieldy given that I had to examine both pairs and triplets, plus extra code to handle sequences of fewer than 3 elements. Ultimately, I ended up with a plain loop over elements 1..input.len() that checked each element against its predecessor for step size, and a simple state variable to detect whether the direction changed.
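In rough strokes, the check ended up looking something like this (a simplified sketch, not the exact code):

```rust
/// A report is safe if all steps are 1..=3 in size and go in a single direction.
fn is_safe(levels: &[i64]) -> bool {
    let mut direction = 0; // 0 = unknown yet, 1 = increasing, -1 = decreasing
    for i in 1..levels.len() {
        let delta = levels[i] - levels[i - 1];
        if delta == 0 || delta.abs() > 3 {
            return false; // step size out of bounds
        }
        if direction == 0 {
            direction = delta.signum();
        } else if delta.signum() != direction {
            return false; // direction changed
        }
    }
    true
}
```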
This also helped with the second part of the puzzle, which asks to tolerate up to one error in a sequence: I could use the loop to find the index of the element where the first error occurred, and then re-check the sequence with that element removed.
The second interesting realization - following from the earlier observation about pairs/triplets, but one that took a moment to sink in - was that fixing a sequence might also mean ignoring/removing a neighbor of the element where the issue is detected (slope errors with a single dip can be “seen” on a later element than the one that actually caused the issue):
```
48, 46, 47, 49, 51, 54, 56
^       ^
|       Slope change is seen when comparing 46 / 47
|
Can remove this element to fix the issue
```
Overcoming this issue required adding a “window” where we remove elements surrounding the place where an issue is seen, and re-checking the sequence.
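Sketched out, reusing the is_safe sketch from above and a hypothetical first_problem index from the loop, the retry looks roughly like this:

```rust
/// Part 2: a report also counts as safe if removing a single element fixes it.
/// `first_problem` is the index where the plain check first failed; the culprit
/// may be that element or one of the elements just before it (see the
/// 48, 46, 47, ... example above).
fn is_safe_dampened(levels: &[i64], first_problem: usize) -> bool {
    let lo = first_problem.saturating_sub(2);
    (lo..=first_problem).any(|skip| {
        let mut candidate = levels.to_vec();
        candidate.remove(skip);
        is_safe(&candidate)
    })
}
```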
Day 3 – Mull It Over
Day 3 asks to find mul(x,y) substrings in a given string, multiply their operands, and sum up the results. Of course, the string is full of almost, but not quite, valid variants of the input:

```
xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)...
```
Given all the constraints on the input (especially the fact that expressions do not nest and that the numbers have a fixed upper bound on their number of digits), a quick solution here would have been regular expressions. While there are parentheses involved, there is no nesting, so the valid expressions can easily be recognized by a regular language.
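For illustration, a regex-based version might look something like this (using the third-party regex crate; not the route I took):

```rust
use regex::Regex;

/// Sum all mul(x,y) expressions in the input. The operands are at most
/// three digits, and nothing nests, so a single pattern does the job.
fn sum_muls(input: &str) -> u64 {
    let re = Regex::new(r"mul\((\d{1,3}),(\d{1,3})\)").unwrap();
    re.captures_iter(input)
        .map(|c| c[1].parse::<u64>().unwrap() * c[2].parse::<u64>().unwrap())
        .sum()
}
```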
But I figured I’d go and implement a rudimentary parser for the fun of it. This is completely overblown (see above), but it was fun. I wrote a lexer that returned these tokens:
```rust
enum Token {
    Junk,
    Mul,
    Lparen,
    Rparen,
    Comma,
    Num(u32),
}
```
NOTE: I ran into what’s probably a very common blunder for Rust newcomers here. I tried to match valid characters in the lexer with an expression like '0'..'9' | 'm' | 'u' | 'l' | ... . Can you spot the issue? The '0'..'9' range is exclusive, so this only matches 0 through 8, causing hard-to-debug issues with numbers containing the digit 9. Unit tests helped me find the issue. So when you are matching ranges, remember to use the inclusive range syntax '0'..='9'.
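A tiny unit test demonstrating the difference (exclusive range patterns in match require Rust 1.80 or later):

```rust
#[test]
fn exclusive_vs_inclusive_range_patterns() {
    // '0'..'9' silently excludes '9'...
    assert!(!matches!('9', '0'..'9'));
    // ...while '0'..='9' matches all ten digits.
    assert!(matches!('9', '0'..='9'));
}
```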
Once I had the lexer in place, I created a very simple shift/reduce parser that pushes tokens onto a stack until a valid mul expression appears on top:
```rust
fn parse(&self) -> Vec<Mul> {
    let mut result = vec![];
    let mut stack = vec![];
    for token in self.iter() {
        stack.push(token);
        if stack.len() >= 6 {
            if let [Token::Mul, Token::Lparen, Token::Num(l), Token::Comma, Token::Num(r), Token::Rparen] =
                &stack[stack.len() - 6..]
            {
                result.push(Mul(*l, *r));
                stack.clear();
            }
        }
    }
    result
}
```
We clear the stack each time we find an expression, aggressively discarding the junk characters we skipped over before the most recent mul expression. This isn’t what a regular parser would do (there, tokens left on the stack that don’t reduce back to the grammar’s start symbol signal an error), but it works fine here.
The parser turned out to be useful in the second part of the problem, where do() and don't() instructions are added that mean some of the muls should be skipped. After adding Do & Dont as tokens in my lexer, I could recognize the new expressions in the parser and add a simple “multiplications enabled” state that suppresses emitting Muls in the branch above.
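Sketched out, the part-two loop looks roughly like this (assuming the lexer now also emits Token::Do and Token::Dont; a simplified variant of the parse function above):

```rust
fn parse_with_toggles(tokens: Vec<Token>) -> Vec<Mul> {
    let mut result = vec![];
    let mut stack = vec![];
    let mut enabled = true; // multiplications start out enabled
    for token in tokens {
        match token {
            Token::Do => enabled = true,
            Token::Dont => enabled = false,
            other => stack.push(other),
        }
        if stack.len() >= 6 {
            if let [Token::Mul, Token::Lparen, Token::Num(l), Token::Comma, Token::Num(r), Token::Rparen] =
                &stack[stack.len() - 6..]
            {
                // Only emit the multiplication while we are enabled.
                if enabled {
                    result.push(Mul(*l, *r));
                }
                stack.clear();
            }
        }
    }
    result
}
```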
Day 4 – Ceres Search
This puzzle asks to find words in a grid. The first part asks to find the word XMAS written horizontally, vertically, or diagonally, both forwards and backwards. I chose to implement a brute-force solution that allowed looking up words according to offset patterns in the grid, and then simply checked all 8 possible patterns from each point in the grid, anchoring on the X starting the word.
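A rough sketch of that idea - with a hypothetical Grid type, not my exact implementation - where out-of-bounds lookups simply return None:

```rust
struct Grid {
    cells: Vec<Vec<char>>,
}

impl Grid {
    /// Out-of-bounds lookups return None, so patterns running off the
    /// edge of the grid simply fail to match without any special cases.
    fn get(&self, row: i64, col: i64) -> Option<char> {
        if row < 0 || col < 0 {
            return None;
        }
        self.cells.get(row as usize)?.get(col as usize).copied()
    }

    /// Check whether `word` appears at (row, col), stepping by (dr, dc).
    fn matches(&self, row: i64, col: i64, dr: i64, dc: i64, word: &str) -> bool {
        word.chars().enumerate().all(|(i, ch)| {
            self.get(row + dr * i as i64, col + dc * i as i64) == Some(ch)
        })
    }
}

/// Part 1: try all 8 directions from every cell, anchoring on the X.
fn count_xmas(grid: &Grid) -> usize {
    let dirs = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)];
    let mut count = 0;
    for row in 0..grid.cells.len() as i64 {
        for col in 0..grid.cells[row as usize].len() as i64 {
            for (dr, dc) in dirs {
                if grid.matches(row, col, dr, dc, "XMAS") {
                    count += 1;
                }
            }
        }
    }
    count
}
```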
The second part of the puzzle then asks to find X-MAS patterns, i.e., two strings MAS or SAM crossing in an X pattern, things like:

```
M S    S S
 A      A
M S    M M
```
The investment of creating explicit types to handle patterns in the grid, plus lookup methods that could ignore out-of-bounds accesses, paid off nicely here, as I could express both diagonals of the X as patterns and run a similar brute-force search, this time anchoring on the A in the middle of the pattern.
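And the second part, reusing the same bounds-ignoring lookup, this time anchored on the A (again just a sketch):

```rust
/// Part 2: for every A, check that both diagonals read MAS or SAM.
fn count_x_mas(grid: &Grid) -> usize {
    let is_mas = |a: Option<char>, b: Option<char>| {
        matches!((a, b), (Some('M'), Some('S')) | (Some('S'), Some('M')))
    };
    let mut count = 0;
    for row in 0..grid.cells.len() as i64 {
        for col in 0..grid.cells[row as usize].len() as i64 {
            if grid.get(row, col) == Some('A')
                && is_mas(grid.get(row - 1, col - 1), grid.get(row + 1, col + 1))
                && is_mas(grid.get(row - 1, col + 1), grid.get(row + 1, col - 1))
            {
                count += 1;
            }
        }
    }
    count
}
```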
Day 5 – Print Queue
The first half of this puzzle asks to check print runs like 75,47,61,53,29 against a set of rules like 47|53 and 75|29 that dictate a partial order.
I chose to implement two datatypes, one for the rule (simple) and one for the print run, where the latter also keeps a map from page number to position for constant-time lookups. Omitting the various bits of code to construct, display, and parse these types, the key ideas are:
```rust
use std::collections::HashMap;

/// An ordering rule. Page 0 must print before page 1.
#[derive(PartialEq, Debug)]
struct Rule(u64, u64);

/// A print run, with a sequence of page numbers.
///
/// To allow quick lookup of a given page's position, we also keep a HashMap.
#[derive(PartialEq, Debug)]
struct PrintRun {
    /// Pages in order.
    pages: Vec<u64>,
    /// Maps each page # to its position in `pages`.
    map: HashMap<u64, usize>,
}

impl PrintRun {
    fn new(iter: impl Iterator<Item = u64>) -> PrintRun {
        // Copy items into pages and build a map of (page -> position).
        let pages: Vec<u64> = iter.collect();
        let map = pages.iter().enumerate().map(|(i, &p)| (p, i)).collect();
        PrintRun { pages, map }
    }

    fn lookup(&self, page: u64) -> Option<usize> {
        self.map.get(&page).copied()
    }

    fn satisfies(&self, rule: &Rule) -> bool {
        match (self.lookup(rule.0), self.lookup(rule.1)) {
            (Some(i), Some(j)) => i < j,
            _ => true, // If at least one page isn't in the run, ignore the rule.
        }
    }

    fn mid(&self) -> Option<u64> {
        self.pages.get(self.pages.len() / 2).copied()
    }
}
```
Using these, implementing the logic of filtering the PrintRuns that satisfy their Rules and summing up their mid() values was easy, and I’m proud to report that I got the right result first try on this one :-)
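For reference, the part-one logic boils down to roughly this:

```rust
/// Part 1: sum the middle pages of all runs that satisfy every rule.
fn part1(rules: &[Rule], runs: &[PrintRun]) -> u64 {
    runs.iter()
        .filter(|run| rules.iter().all(|rule| run.satisfies(rule)))
        .filter_map(|run| run.mid())
        .sum()
}
```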
The second half of the problem asks to find invalid runs and fix them, giving examples where a single rule violation is fixed with a swap. I toyed with the swapping idea, but already had a hunch that this would run into issues when rules overlap. Here is a counterexample:
```
97|75
75|47

47,75,97,61,53
```
A correct ordering for this would look like ...,97,75,47,... . But if we naively try to fix things by swapping values rule by rule, “fixing” the second rule messes up the first:
- 47,75,97,61,53 -> swap 97|75
- 47,97,75,61,53 -> swap 75|47
- 75,97,47,61,53 -> oops
I played with the idea of sorting the list using the sort_by method on slices, thinking I could simply return the ordering for rule a|b if asked for the pair (a,b), and any other stable ordering otherwise. But it quickly becomes apparent that this can’t work either, because the rules may define an order only transitively. In our example, 97 and 47 do have an order through two rules, but a simple lookup in the rules won’t find it:
```
[47,75,97,61,53].sort_by(...) with rules 97|75, 75|47 performed these comparisons on my machine:

75 < 47   # great, matches rule 75|47
97 > 47   # Oops, no such rule (but there is a transitive order through rules 97|75 & 75|47)

Result: 75,47,97,61,53 (violating rule 97|75)
```
It was obvious by this point that the rules defined a partial order on the elements, and topological sorting occurred to me. But I somehow kept looking for “simpler” solutions. They didn’t really pan out, but at least they helped build a firm understanding of why using the actual DAG and a topological sort was needed after all.
I played with the idea of adding synthetic new rules for each pair in such a chain, but this scales quadratically. While 1|2 2|3 only needs one extra rule - 1|3 - a chain of three rules already requires three extra rules to cover every pair of numbers. The AoC inputs are usually small enough that this probably would have worked, but given I wasn’t competing against the clock, I chose to look around for other ideas.

```
1|2
2|3
3|4

Would need:

1|3
1|4
2|4
```
The next idea I toyed around with was inspired by the Union-Find data structure. I imagined building trees to define the order, inserting the rules in whatever order they came in (and making pages children of the page they sort after). When I realized that I’d sometimes have to merge partial trees, this started to sound a lot like Union-Find (with the added twist of using the tree hierarchy to denote sort order).
I found this ran into trouble because it “invented” ordering constraints that didn’t really exist in the source data, and those could conflict with later rules. One key idea I had was that when a rule required merging “sets”, I would just move the whole subtree as a leaf of the “before” element in the rule: that ensures the rule requiring the merge is fulfilled, while preserving all the ordering rules we had built up in the subtrees so far. But this can fall apart, as in the example below, where the scheme “invents” an ordering for elements 18 and 20 that isn’t required by any rule (and I couldn’t come up with a way to then swap these elements that didn’t risk losing some of the earlier rules encoded in the trees):
```
Rules: 18|7  20|11  7|11  20|18

  18     20        18           18   OOPS! The merge for 7|11 invented
  /      /         /            /    "18 before 20", which rule 20|18
 7      11        7            7     now contradicts
                  /            /
                 20           20
                 /            /
                11           11
```
Okay, fine: the simpler approaches didn’t really cut it, as they either ignored or invented constraints. So I did break out topological sorting after all, implementing Kahn’s algorithm.
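A sketch of what that looks like for a single run (note that this already uses the “final” rule selection discussed below: every rule whose two pages both appear in the run, violated or not):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Reorder the pages of one run with Kahn's algorithm, using the rules
/// whose pages both appear in the run as the edges of the DAG.
fn reorder(pages: &[u64], rules: &[Rule]) -> Vec<u64> {
    let in_run: HashSet<u64> = pages.iter().copied().collect();
    let mut successors: HashMap<u64, Vec<u64>> = HashMap::new();
    let mut indegree: HashMap<u64, usize> = pages.iter().map(|&p| (p, 0)).collect();
    for Rule(before, after) in rules {
        if in_run.contains(before) && in_run.contains(after) {
            successors.entry(*before).or_default().push(*after);
            *indegree.get_mut(after).unwrap() += 1;
        }
    }
    // Start with the pages that nothing has to precede, then repeatedly emit
    // one and "remove" its outgoing edges by decrementing in-degrees.
    // Note: HashMap iteration order isn't deterministic, which is exactly what
    // exposed the rule-selection bug described below.
    let mut queue: VecDeque<u64> = indegree
        .iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&p, _)| p)
        .collect();
    let mut ordered = Vec::with_capacity(pages.len());
    while let Some(page) = queue.pop_front() {
        ordered.push(page);
        for next in successors.get(&page).into_iter().flatten() {
            let d = indegree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                queue.push_back(*next);
            }
        }
    }
    ordered
}
```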
I ran into one final wrinkle with rule selection at this point. I had assumed it was fine to only use the rules that were currently violated in a given print run to fix the order for that run. But it turns out that isn’t enough. I found counterexamples where this only yields a partial ordering, so the result of the sort isn’t unique and the “middle value” can differ from run to run of the program (yay for randomized HashMaps/HashSets, which made this obvious quickly). For example, 55,71,52,35,89,78,34,53,51 violates the following rules as-is (i.e., both numbers of each rule are in the list, in the wrong order):
```
89|35
53|89
51|89
78|35
53|34
51|71
89|52
78|71
53|35
51|35
53|71
78|89
51|34
51|52
51|53
53|52
78|55
78|52
```
But even taken together, these rules leave enough room for ambiguity that both 51,53,34,78,89,52,35,55,71 and 78,51,53,71,34,55,89,52,35 are valid orderings that satisfy all of them (but they have different middle values). It turns out there is also a rule 89|34 in the full rule set that happened to be satisfied in the original run, but ended up violated after reordering. Once I changed my rule selection to pick any rule where both numbers appear in a given run (whether currently violated or not), things finally worked.
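With that rule selection in place, part two boils down to roughly this:

```rust
/// Part 2: reorder the runs that violate some rule and sum their middle pages.
fn part2(rules: &[Rule], runs: &[PrintRun]) -> u64 {
    runs.iter()
        .filter(|run| !rules.iter().all(|rule| run.satisfies(rule)))
        .map(|run| PrintRun::new(reorder(&run.pages, rules).into_iter()))
        .filter_map(|run| run.mid())
        .sum()
}
```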