Advent of Code 2024 - In Rust - Days 16-20
This is the third article in a series covering Advent of Code 2024 in Rust.
Day 16: Reindeer Maze
This puzzle asks to search for paths in a maze with different costs for going straight (1) vs. taking a 90-degree turn (1000). I thought this looked really simple, completely missing the challenge at first. I figured I could just run Dijkstra’s algorithm, which I already had implemented for day 10’s Hoof It puzzle. But I had built a version that expected a pre-defined graph with fixed edge costs, while our problem here has variable edge costs.
I also briefly thought about directed algorithms like A* but quickly discarded that idea, as the high turning cost would surely wreak havoc with the usual Manhattan distance heuristic and as the maze felt too small to bother with a directed algorithm anyway.
.......
..B....
..XC...
..A....
.......
Traversing the "node" X in our graph will have different costs, depending on whether we're moving A-X-B (1) or A-X-C (1000 + 1).
So the interesting bit here is that we need to track which direction we arrived from at a given node, as that determines the cost(s) of the outgoing edges. I debated updating my Dijkstra algorithm to a variant with a dynamic cost lookup, but decided to just write one from scratch that used (position, arrival direction, cost) tuples as the constituents of the queue, and directly looked up neighboring grid positions when expanding a given node from the frontier. And that worked out quite nicely.
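One wrinkle with this queue: Rust's BinaryHeap is a max-heap, so the Path type needs a reversed ordering by cost for pop() to hand back the cheapest frontier entry first. Here is a minimal sketch of such a type (an assumption on my part; the real Path presumably also records the tiles crossed, which part two needs):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Direction { North, East, South, West }

// A minimal sketch of the queue element described above.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Path {
    pos: (usize, usize),
    heading: Direction,
    cost: usize,
}

// BinaryHeap is a max-heap, so compare by *descending* cost to make
// pop() return the cheapest path first.
impl Ord for Path {
    fn cmp(&self, other: &Self) -> Ordering {
        other.cost.cmp(&self.cost)
    }
}
impl PartialOrd for Path {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let mut frontier = BinaryHeap::new();
    frontier.push(Path { pos: (0, 0), heading: Direction::East, cost: 1001 });
    frontier.push(Path { pos: (0, 1), heading: Direction::East, cost: 1 });
    // The cheapest entry comes out first, despite being pushed last.
    assert_eq!(frontier.pop().unwrap().cost, 1);
}
```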
The second part of the puzzle adds a nice twist, where we want to know all shortest paths in the maze (presumably to take a seat along those so we have a good chance of seeing the race in action). Finding all the paths should be a simple modification to the Dijkstra search, to keep looking until we either arrive at the destination with a higher cost than previously, or we run out of nodes to expand. The other change required is to actually keep track of the paths during the search. The first puzzle just needed the cost, but the second part needs to know which tiles a given path crossed through, so we can count tiles in the end. Here’s part of the core search loop showing these ideas in action:
/// Finds all cheapest paths from start to end.
///
/// Returns an empty set if there is no connection from start to end.
fn find_paths(&self) -> HashSet<Path> {
// All best paths arriving at a given point from a given direction.
let mut reached: HashMap<(Point, Direction), (usize, HashSet<Path>)> = HashMap::new();
// The cheapest cost of path(s) arriving at the end.
let mut cheapest = None;
// Our search frontier (or queue).
let mut frontier = BinaryHeap::new();
frontier.push(Path::new(self.start, Direction::East, 0));
while let Some(head) = frontier.pop() {
// We've arrived at the end.
if head.pos == self.end {
// Stop the search when newly arriving paths become more expensive.
// As our search expands by cost, we can be sure to have explored
// all the cheapest paths then.
if cheapest.is_some() && head.cost > cheapest.unwrap() {
break;
}
// Indicate that the first path has arrived at the end and
// remember the cost.
cheapest = Some(head.cost);
}
// OMITTED: A bunch of stuff to expand nodes and check
// if we have reached intermediate locations faster.
// Otherwise add the current path to our set.
reached
.get_mut(&(head.pos, head.heading))
.unwrap()
.1
.insert(head.clone());
}
// Collect all the paths that arrived at the end.
let mut paths = HashSet::new();
for d in [Direction::North, Direction::East, Direction::South, Direction::West] {
if let Some((_, ps)) = reached.get(&(self.end, d)) {
paths.extend(ps.iter().cloned());
}
}
paths
}
Day 17: Chronospatial Computer
This puzzle asks to implement a very simple virtual machine with 8 opcodes and 3 registers and then run some code against that.
I loved the hilarious details here, like “For legacy reasons, this instruction reads an operand but ignores it.”, which will sound oh so familiar to folks who have spent any time writing code for a living…
Given the somewhat complex combo operand semantics, I chose to implement an explicit type to handle the different cases of where an operand might get its value from:
/// An operand, either a literal value
/// or a reference to one of our registers.
#[derive(Debug, Clone, Copy)]
enum Operand {
Literal(u8),
RefA,
RefB,
RefC,
}
impl Operand {
/// Parses a literal operand.
fn literal(value: u8) -> Operand {
assert!(value <= 7);
Operand::Literal(value)
}
/// Parses a combo operand.
fn combo(value: u8) -> Operand {
assert!(value <= 7);
match value {
0..=3 => Operand::Literal(value),
4 => Operand::RefA,
5 => Operand::RefB,
6 => Operand::RefC,
7 => panic!("Reserved combo operand"),
_ => panic!("Invalid operand value"),
}
}
/// Returns the integer value of this operand.
///
/// Needs a reference to register values in case
/// the operand refers to one of the registers.
fn val(&self, reg: &Registers) -> usize {
match self {
Operand::Literal(x) => *x as usize,
Operand::RefA => reg.a,
Operand::RefB => reg.b,
Operand::RefC => reg.c,
}
}
}
I started out with Operation as an enum type, but that didn’t easily let me write small focused execution methods for each operation. Instead, we would have ended up with a single large match construct in the enum type’s execute method.
So I instead chose to use a trait and rely on dynamic dispatch, which allowed each operation struct to have its own execute method in implementing the trait. This ended up with some repetition in each struct (e.g., storing the operand byte and repeating the operand method), but I liked that this made each Operation a standalone type that we could pass around without external dependencies. The Adv operation is shown below.
trait Operation {
/// Returns the memorand of the operation (e.g., 'adv' or 'bxl')
fn memorand(&self) -> String;
/// Returns the operand of the operation.
fn operand(&self) -> Operand;
/// Executes the operation against the machine.
///
/// Returns a human readable description of what happened.
fn execute(&self, m: &mut Machine) -> String;
}
impl Display for dyn Operation {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{} {}", self.memorand(), self.operand())
}
}
#[derive(Debug)]
struct Adv(u8);
impl Operation for Adv {
fn memorand(&self) -> String {
"adv".to_string()
}
fn operand(&self) -> Operand {
Operand::combo(self.0)
}
/// Adv divides register a by 2^operand
/// and stores the result back into a.
fn execute(&self, m: &mut Machine) -> String {
let numerator = m.reg.a;
let denominator = 1 << self.operand().val(&m.reg);
m.reg.a = numerator / denominator;
m.pc += 2;
format!(
"adv: &a[={}] / 2^{}[={}] => a={}",
numerator,
self.operand(),
denominator,
m.reg.a
)
}
}
I spent a fair bit of time on nice debug output in the terminal and more time than I care to admit on getting a caret pointing to the current location in the program code. I had expected that the second part of the puzzle might throw a much longer piece of code at us (although, in hindsight, the fact that jnz can, by construction, only jump to one of the first 4 instructions in the code would have made that difficult). But things came together nicely in the end, and once I fixed a silly bug with the division operations (2**x is 1 << x and NOT 2 << x >.<), I had a working 3-bit computer!
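The bug is easy to demonstrate: shifting 1 left by x gives 2**x, while shifting 2 left by x gives 2**(x+1):

```rust
fn main() {
    let x = 5u32;
    assert_eq!(1u32 << x, 2u32.pow(x)); // 1 << 5 == 32 == 2^5
    assert_eq!(2u32 << x, 64);          // 2 << 5 == 64 == 2^6, one power too high
}
```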
The second part of the puzzle had a nice twist waiting. Instead of just throwing more code at us (or maybe coming up with some crazy semantics for combo operand 7), the puzzle asks to figure out the right initialization value for Register A so the program emits a copy of itself! I figured I’d first try and understand what the program did a little better, so I could come up with directed guesses rather than just brute forcing things by running the program with lots of different input values. Thanks to the debug output I had implemented in the first part, it was easy to turn the program into a kind of assembler mnemonic format (where ^ means XOR):
bst &a // b = a % 8
bxl 2 // b ^= 2
cdv &a // c = a / 2**b
bxc // b = b ^ c
adv &a // a /= 8
bxl 7 // b ^= 7
out &b // Output b % 8
jnz 0 // Jump to 0, if a != 0
There is one thing we can easily tell from this: register A is serving as a kind of countdown. We keep dividing the value by 8 and stop when it hits zero. Given our desired output is 16 bytes long, this means A would need to start with a value in the range 8**15..8**16 (i.e., 2**45..2**48) that we can divide exactly 15 times before it hits zero. We can also tell that the (low three bits of) register B are where we compute the output, and that B purely depends on the current value of A. But with all the XOR-ing going on, I couldn’t see any obvious way to derive insights here. I decided to try and brute-force things in range 8**15..8**16 and to use multi-threading, so I could play with a different corner of Rust.
My M1 Macbook has 8 cores, so I figured I could run 8 threads that would try different input values. Given we were looking for the smallest value, I decided to interleave things, so threads would start from values 8**15, 8**15 + 1, 8**15 + 2, ... and would try numbers in increments of 8, so we could stop quickly when a value was found without wasting computation on later, larger values. Multi-threading gave rise to two interesting issues: how to figure out if a given value was the winner and how to tell when a thread could stop. But as my parallel solution ran for tens of minutes with no result, I realized that we had a lot of values to check (8**15..8**16 contains ~246 trillion values). With 8 threads I could check ~3.5M / second, so checking all values in that range would take 2-3 years…
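The interleaving and the "who wins / when can a thread stop" questions can be sketched with scoped threads and an atomic "best so far". This is a sketch of the coordination only; check stands in for running the 3-bit computer against a candidate value:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Interleaved parallel search for the smallest value in start..end
/// that satisfies `check`. Worker k tries start+k, start+k+n, ... and
/// gives up once a confirmed winner below its current value exists.
fn parallel_search(start: u64, end: u64, num_threads: u64, check: fn(u64) -> bool) -> Option<u64> {
    let best = AtomicU64::new(u64::MAX);
    thread::scope(|s| {
        for k in 0..num_threads {
            let best = &best;
            s.spawn(move || {
                let mut v = start + k;
                while v < end {
                    // A smaller value already won; ours can't beat it.
                    if best.load(Ordering::Relaxed) < v {
                        return;
                    }
                    if check(v) {
                        best.fetch_min(v, Ordering::Relaxed);
                        return;
                    }
                    v += num_threads;
                }
            });
        }
    });
    match best.load(Ordering::Relaxed) {
        u64::MAX => None,
        v => Some(v),
    }
}

fn main() {
    // Smallest multiple of 97 above 100 is 194, regardless of thread timing.
    assert_eq!(parallel_search(0, 1000, 8, |v| v % 97 == 0 && v > 100), Some(194));
    assert_eq!(parallel_search(0, 10, 4, |v| v > 100), None);
}
```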
Ok, back to the drawing board. I wondered if I could work backwards from a single output to see if we could come up with something. I realized that the last output of the program must come from a value in the range 0..=7 for A (as the value would need to divide to 0 during that run), so I decided to tabulate what output we’d get for those values:
Expected output: ... 5,5,3,0
^
|A | Output|
|------|-------|
|0 | 5 |
|1 | 4 |
|2 | 5 |
|3 | 7 |
|4 | 1 |
|5 | 0 | <== Program ends in [0].
|6 | 3 | So A must be 5 in the last iteration
|7 | 2 |
Based on this knowledge, we can now say things about the previous iteration, where A must have a value that both divides into 5 and produces output 3 as the program’s first output. We know that values 40..=47 divide into 5, so we can check which of them will produce a 3 as the first output of the program:
Expected output: ... 5,5,3,0
^
|A | First Output|
|-------|-------------|
|40 | 7 |
|41 | 1 |
|42 | 5 |
|43 | 3 | <== Candidate producing ... [3,0]
|44 | 1 |
|45 | 0 |
|46 | 1 |
|47 | 3 | <== Candidate producing ... [3,0]
And we can now continue in the same way, building up a search tree in a dynamic programming fashion. I wondered if we could prune larger candidates immediately, but decided to keep them, as I wasn’t sure if we might hit dead-ends (where, say, none of the values dividing into 43 would produce the right output and we’d have to use 47 instead).
5 [0]
/ \
43 47 [3,0]
/ \ / \
... [5,3,0]
And this worked out nicely, producing the correct solution with a very limited number of candidates (the final list had only 8 values):
$ cargo run ...
Examining output 0@15 with targets [0]
Candidate 5 yields 0!
Examining output 3@14 with targets [5]
Candidate 43 yields 3!
Candidate 47 yields 3!
Examining output 5@13 with targets [43, 47]
Candidate 346 yields 5!
Candidate 378 yields 5!
Examining output 5@12 with targets [346, 378]
...
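This backward search can be sketched as follows. Here run is my own transcription of the disassembled loop above (not the article's actual VM), and the search keeps all candidates per step: each output fixed from last to first pins down three more low bits of the initial A value.

```rust
/// One pass of the disassembled program per loop iteration,
/// run until register A counts down to zero.
fn run(mut a: u64) -> Vec<u8> {
    let mut out = Vec::new();
    while a != 0 {
        let mut b = a % 8;       // bst &a
        b ^= 2;                  // bxl 2
        let c = a >> b;          // cdv &a
        b ^= c;                  // bxc
        a /= 8;                  // adv &a
        b ^= 7;                  // bxl 7
        out.push((b % 8) as u8); // out &b
    }
    out
}

/// Works backwards from the last output to the first, keeping all
/// candidate A values at each step, and returns the smallest match.
fn find_a(program: &[u8]) -> Option<u64> {
    let mut candidates: Vec<u64> = vec![0];
    for i in (0..program.len()).rev() {
        let mut next = Vec::new();
        for &c in &candidates {
            for low in 0..8 {
                let a = c * 8 + low;
                // Keep `a` if it reproduces the program's tail.
                if run(a).as_slice() == &program[i..] {
                    next.push(a);
                }
            }
        }
        candidates = next;
    }
    candidates.into_iter().min()
}

fn main() {
    // Matches the search log above: A=5 yields [0]; 43 and 47 yield [3, 0].
    assert_eq!(run(5), vec![0]);
    assert_eq!(run(47), vec![3, 0]);
    assert_eq!(find_a(&[3, 0]), Some(43));
}
```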
Day 18: RAM Run
I’m sure getting a fair bit of mileage out of my 2D grid types. This puzzle asks to simulate bytes “falling” into a 2D grid of RAM at a given speed and asks to find a path to the exit. The first part of the puzzle just asks to run pathfinding in a static maze after the first 1024 bytes have “fallen”.
My guess was that the second part of the puzzle would make use of the speed by having our search try to “outrun” the falling obstacles. This could either allow finding shorter paths that are only possible if we arrive at a given position before it gets corrupted, or maybe there is no solution at all once all the bytes have fallen. I also found myself speculating that we might be asked to compute some minimum speed that would be required to make it to the exit in such a scenario (for which we might try and use some kind of binary search).
But without the time component, the first part of the puzzle ends up being relatively similar to Day 16’s Reindeer Maze, without the expensive turns, so I simply ran a Dijkstra with uniform cost (i.e., a BFS) on the maze and that yielded the expected result.
The second part of the puzzle ended up asking for the first byte that would end up blocking the path. Given the maze was small (even in an unoptimized build a search took ~0.1s), there is an obvious brute force solution that simply drops bytes and runs the full search after each.
Thinking about how we might optimize this, I first thought about computing the shortest path and then just dropping bytes until one landed on this path. That ignores the (very likely) possibility that some other, longer path still exists. But it could be used to avoid some graph searches, by only starting a new search when the known good path becomes impossible. I also thought about treating this as a connectivity problem, where we’d want to run kind of the opposite of a Union-Find algorithm, removing edges until our graph fell apart into more than one connected component. A quick search didn’t find an obvious textbook algorithm for this, but a naive algorithm might want to run a bidirectional search checking if the two nodes we disconnected were still connected via some other path.
I decided to run the brute force algorithm in this case and that took less than 20s in an unoptimized build and about a second in an optimized one.
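The brute force can be sketched as a plain BFS re-run after every dropped byte. This is a sketch on a tiny made-up 3x3 grid, not the puzzle input:

```rust
use std::collections::{HashSet, VecDeque};

/// Returns true if (0, 0) can still reach (size-1, size-1) on a
/// size x size grid while avoiding the corrupted positions.
fn reachable(corrupted: &HashSet<(i32, i32)>, size: i32) -> bool {
    let goal = (size - 1, size - 1);
    let mut seen = HashSet::from([(0, 0)]);
    let mut queue = VecDeque::from([(0, 0)]);
    while let Some((x, y)) = queue.pop_front() {
        if (x, y) == goal {
            return true;
        }
        for (nx, ny) in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)] {
            let inside = nx >= 0 && ny >= 0 && nx < size && ny < size;
            if inside && !corrupted.contains(&(nx, ny)) && seen.insert((nx, ny)) {
                queue.push_back((nx, ny));
            }
        }
    }
    false
}

/// Brute force: drop one byte at a time and re-run the full search.
fn first_blocking_byte(bytes: &[(i32, i32)], size: i32) -> Option<(i32, i32)> {
    let mut corrupted = HashSet::new();
    for &b in bytes {
        corrupted.insert(b);
        if !reachable(&corrupted, size) {
            return Some(b);
        }
    }
    None
}

fn main() {
    // Tiny 3x3 example: the fourth byte walls off the exit.
    let bytes = [(1, 0), (2, 1), (1, 1), (1, 2)];
    assert_eq!(first_blocking_byte(&bytes, 3), Some((1, 2)));
}
```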
Day 19: Linen Layout
This puzzle asks to produce patterns from colored towels, effectively trying to see if we can produce a given string by concatenating an arbitrary number of base towel words.
The two associations I had with this problem were Formal languages, where we could treat each pattern as a sentence and verify if a formal language on the “words” of the towels would accept it, and Covering problems, where we’d try to “cover” the pattern with the towels.
I decided to stick with the domain of formal languages. If we went with automata, we could construct an NFA to accept the pattern sentences by adding a Kleene Star around each towel word and connecting these sub-automata with ε-edges. Here is an example of such an NFA for the towels r, wr, bwu:
If we instead go with the idea of a formal grammar, we can imagine building up a parse tree for each input. As we have an infinite supply of towels and there are no rules governing how towels might follow each other, we’re dealing with a regular language - the kind of language that a finite automaton can recognize. This means we can simply keep “biting off” (all) possible words on the left (or right) side of the input, which produces a search tree like below. Because we only care whether a solution exists at all, we can employ a depth-first search approach here, hoping to bottom out quickly on one of the branches where we can terminate the search early, but backtracking to earlier in the input when we get stuck.
Available towels: r, wr, b, g, bwu, rb, gb, br
brwrr
/ \
b|rwrr br|wrr
| |
b|r|wrr br|wr|r
|
b|r|wr|r
I hadn’t done automata in ages, so I decided to try and implement the NFA approach. Here are two of the key methods to build and run the NFA, and I’m proud to report that this worked beautifully:
/// A nondeterministic finite automaton to accept
/// arbitrary sentences concatenated from a list
/// of words.
///
/// Node 0 is the starting node by convention and
/// the only accepting node.
struct NFA {
nodes: Vec<Node>,
}
impl NFA {
/// Creates an NFA from a list of words.
fn from_words(words: Vec<&str>) -> NFA {
const START: usize = 0;
let mut nfa = NFA::new();
for word in words {
// Ensure there is a path for the given word from
// START.
let n = nfa.ensure(START, word);
// Add an epsilon edge back to the start from
// the node where word ends.
nfa.connect(n, START, None);
}
nfa
}
/// Checks if the automaton accepts the given sentence.
fn accept(&self, sentence: &str) -> bool {
const START: usize = 0;
// Initialize our set of active nodes with the
// START node and anything reachable from there via
// only epsilon edges.
let mut active = HashSet::new();
active.insert(START);
active = self.epsilon(&active);
// Walk over our input, checking for each active
// node if there is an edge that can consume the
// character (and then following any epsilon edges).
for c in sentence.chars() {
let mut next = HashSet::new();
for cur in active {
next.extend(self.neighbor(cur, c));
}
active = self.epsilon(&next);
}
// We accept the sentence if the START node is
// in the active set after consuming all input.
active.contains(&START)
}
}
The second part of the puzzle asks to count all possible ways we produce a given pattern from the towels. This is tricky to express with the NFA, so I also ended up implementing a variant of the “parse tree” approach outlined earlier. I chose to store all of the towel words in a Trie, which lets us efficiently scan for matches across all of them from a given index in our pattern, and implemented a simple recursive search (given the patterns were reasonably short, so we wouldn’t overflow the stack with tens of towels per pattern).
But this ran way too slowly, immediately getting “stuck” on the first pattern. After adding some debug output, I saw that the same sub-patterns were checked over and over again, which makes sense: there will be different ways to represent some prefixes, which then means we have to check the same suffix over and over again. So this was another dynamic programming problem. And lo and behold, after adding a simple memoization cache, the program produced the right output in seconds.
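A minimal sketch of that memoized count, using plain prefix checks instead of the Trie scan from the article:

```rust
use std::collections::HashMap;

/// Counts the ways `pattern` can be formed by concatenating towels,
/// memoizing the result for each suffix so repeated sub-patterns are
/// only solved once.
fn count_ways<'a>(pattern: &'a str, towels: &[&str], memo: &mut HashMap<&'a str, u64>) -> u64 {
    if pattern.is_empty() {
        return 1; // One way to form the empty pattern: use no towels.
    }
    if let Some(&n) = memo.get(pattern) {
        return n;
    }
    let mut total = 0;
    for t in towels {
        // "Bite off" a matching towel on the left and recurse.
        if let Some(rest) = pattern.strip_prefix(t) {
            total += count_ways(rest, towels, memo);
        }
    }
    memo.insert(pattern, total);
    total
}

fn main() {
    // The towel list from the example above; "brwrr" can be made
    // two ways: b,r,wr,r and br,wr,r.
    let towels = ["r", "wr", "b", "g", "bwu", "rb", "gb", "br"];
    let mut memo = HashMap::new();
    assert_eq!(count_ways("brwrr", &towels, &mut memo), 2);
}
```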
Day 20: Race Condition
This puzzle asks to check shortcuts (“cheats”) in a maze with a single path from start to end, where we can jump through one piece of wall. The path cost is uniform (1 “picosecond” per move), and the puzzle asks to find all cheats that save at least 100 picoseconds.
Given there is a single path, I figured we could run a path search first and annotate each tile with the cost, which then allows computing the savings of each cheat without running a new search.
Example: The cheat marked with the ~ has a cost saving of 2 "picoseconds": it takes us from the tile marked 1 to the tile marked 5, replacing 4 steps along the track with a 2-step jump through the wall.
#####
#234#
#1~5#
#S#E#
#####
I wondered if the second part of the puzzle might allow taking multiple such cheats and ask to pick the optimum (but that smells like a hard problem). Instead, the second part sticks with a single cheat per race, but extends the cheat duration to up to 20 picoseconds. That is, programs may now travel through walls (and other tiles) for up to 20 picoseconds. A key property here is that we do NOT care about the exact travel path taken in these 20 picoseconds, but instead keep identifying each cheat simply by its start and end positions.
So this still fits the same basic idea as before, where we operate on a precomputed path. But instead of only examining tiles exactly 2 steps away from a given position as possible landings for a cheat, we’ll now need to look at a diamond-shaped search area of tiles up to 20 steps away.
I realized at this point that I had failed to examine some possible destinations (those reached with a 90-degree turn) in the first part of the puzzle, but these aren’t really interesting at distance 2, as that would just amount to cutting a corner, which will not save any time.
Search pattern for the first part:
....2....
.........
..2.S.2..
.........
....2....
Search pattern for the second part with "radius" 4 (actually 20):
....4....
...434...
..43234..
.432.234.
432.S.234
.432.234.
..43234..
...434...
....4....
In the first part of the puzzle I did not bother with moving the search pattern along the actual path and instead just scanned the whole maze from top-left to bottom-right. I wondered if this would be suboptimal in the second part, where there might be bits of “dead track”. However, a quick sanity check showed that the input actually used all of the track tiles in the maze, so there weren’t any major efficiency gains to be had from moving along the path or checking for tiles that had infinite cost.
I also wondered if we could somehow make use of the fact that the pattern would overlap a lot. I didn’t see an easy way to simplify the actual cost computation, as the start node would change each time. We could move from computing a “single-source” problem for cheats (S, *) to an “all pairs” problem for a given search area, but that would likely be similarly expensive or worse in terms of overall cost. But we can save cost by re-using the work of discovering track nodes in the search area. The actual computation (subtracting two integers) is likely the cheap part, compared to iterating over the diamond and checking which nodes were track. But we could avoid a lot of the latter work by only updating the edge(s) of the search area. For example, when moving the search area 1 to the right, we can “forget” the track parts on the left edge, and only need to check and add the ones on the right edge. I figured I’d first try a brute force approach and would come back to this idea if things didn’t run fast enough, though.
Moving the search area one to the right needs to "forget" nodes
marked X (and add the ones marked 4 on the right hand edge)
....X4....
...X434...
..X43234..
.X432.234.
X432.S.234
.X432.234.
..X43234..
...X434...
....X4....
A final problem was how to produce the search area “diamond” in the first place (I figured I’d keep this as some list of displacement vectors that we could apply to the starting point to fetch all possible destinations). I realized that I could effectively grow this in a kind of flood-fill way by repeatedly applying the same four basic displacement vectors to the outermost points of the previous round:
Paths of length 1: {(-1, 0), (0, -1), (1, 0), (0, 1)}
for each, add [(-1, 0), (0, -1), (1, 0), (0, 1)]
discard (0, 0) vectors and members already in the set
Paths of length 2: {(-2, 0), (-1, -1), (-1, 1),
(0, -2), (1, -1),
(2, 0), (1, 1)
(0, 2)}
continue in the same way until we're at length 20,
create a set union.
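Alternatively (or as a cross-check on the flood-fill growth), the same set of displacement vectors can be generated directly from the Manhattan-distance condition |dx| + |dy| <= radius:

```rust
/// All displacement vectors with Manhattan distance 1..=radius,
/// i.e. the search "diamond", generated directly instead of being
/// grown ring by ring.
fn diamond(radius: i32) -> Vec<(i32, i32)> {
    let mut v = Vec::new();
    for dx in -radius..=radius {
        let rest = radius - dx.abs();
        for dy in -rest..=rest {
            if (dx, dy) != (0, 0) {
                v.push((dx, dy));
            }
        }
    }
    v
}

fn main() {
    // 4 vectors at radius 1, 12 at radius 2, and 2*20*21 = 840 at radius 20.
    assert_eq!(diamond(1).len(), 4);
    assert_eq!(diamond(2).len(), 12);
    assert_eq!(diamond(20).len(), 840);
}
```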
I implemented the search diamond solution (which ran quickly even with a naive search at each grid position). However, this returned too many shortcuts. I looked at the example, which should have 3 shortcuts saving 76 picoseconds, but my approach found 9 of them. I first thought I had misunderstood the problem description, thinking that maybe shortcuts had to move in wall tiles only.
As I was thinking about a BFS in the wall tiles, it occurred to me that we could approach the problem from a different direction, by making use of the minimum savings as a way to choose destinations. Instead of driving the search from the maximum shortcut length with the “search diamond”, we could instead sort the track tiles by their cost and choose any tile with cost current + 101 as a possible destination, turning the problem into a question of whether two tiles were separated by some viable path of length <= 20. I didn’t implement this idea, as I had assumed I would need to implement a “wall search” in any case and already had the diamond working.
It turns out that I had forgotten to account for the cost of traversing the shortcut itself, which allowed finding many shortcuts with identical savings on straight runs of the track by taking a shortcut one step later and arriving one step earlier with a longer shortcut. Once I fixed the issue by computing the savings as (cost at re-entering the track - cost at leaving the track) - length of shortcut, I got the correct solution.
So it turns out I had understood the problem correctly after all, and had missed a small but key piece. Given we can move on any tile, there is a more elegant solution to this, if we combine the above idea of “any track piece +101 picoseconds out” with the Manhattan distance.
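That more elegant variant can be sketched as follows: with the single track stored in path order, a tile's index is also its cost, and the Manhattan distance between two track tiles is both the cheat's length and its feasibility check. This is a sketch assuming the track extraction already exists; it is not the diamond-based solution described above.

```rust
/// Counts cheats of length <= max_cheat that save at least
/// min_saving picoseconds. `track` lists the path's tiles in order,
/// so a tile's index equals the cost of reaching it.
fn count_cheats(track: &[(i32, i32)], max_cheat: i32, min_saving: i32) -> usize {
    let mut count = 0;
    for i in 0..track.len() {
        for j in (i + 1)..track.len() {
            let (x1, y1) = track[i];
            let (x2, y2) = track[j];
            // Length of the cheat, and the minimum time it takes.
            let dist = (x2 - x1).abs() + (y2 - y1).abs();
            // Saving = cost difference along the track minus the
            // picoseconds spent traversing the cheat itself.
            if dist <= max_cheat && (j - i) as i32 - dist >= min_saving {
                count += 1;
            }
        }
    }
    count
}

fn main() {
    // The small example maze from the start of this section,
    // S -> 1 -> 2 -> 3 -> 4 -> 5 -> E in (x, y) coordinates:
    let track = [(1, 3), (1, 2), (1, 1), (2, 1), (3, 1), (3, 2), (3, 3)];
    // Two cheats of length 2 save at least 2 picoseconds:
    // 1 -> 5 (through the ~) and S -> E (through the wall between them).
    assert_eq!(count_cheats(&track, 2, 2), 2);
}
```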