June 2, 2018
Processing Trees with Recursion Schemes
A long time ago, I was in touch with a production system whose purpose was to run a piece of data through a decision tree. At every step, the output could be Good, Bad, Move Left, or Move Right; there were no leaves, since at the end you were supposed to always have returned either Good or Bad (this means it would be an error for you to get there).
The nodes in the decision tree could be pretty much whatever: most of them were expert rules for determining certain cases and automatically marking them as good or bad, others were carefully crafted models. Some of these nodes also required access to internal APIs in order to fetch more data concerning what you were processing.
It operated on an environment that changed frequently, which meant that the patterns the tree was looking for had to be constantly adapted, and thus it was a requirement of the system to hotswap the nodes' code.
The system was unable to do hotswapping, for whatever reasons that may be (I’m no expert in Java, but it seems there are several ways to do so). This led someone to create their own scripting language.
The language itself was pretty bad, but what was good about it is that it was really nicely integrated: they had also built version control on top of the nodes' code, and deployment was reasonably smooth if you were working with these sort of nodes.
The problem came when you wanted to do something with the nodes that was not supported within the scripting language, such as accessing APIs. In these cases, you had to go into each server that had the code in it, and replace the compiled Java classes with your new node’s code, to then restart the server, causing it to detect your new code. Suffice it to say, this was a pain, and of course very much error prone.
A few days ago, I was running through Patrick Thomson’s Introduction to Recursion Schemes and this problem came to mind again. I thought it might be a good exercise in learning recursion schemes; and thus here we are. Before we start, you may want to go through at least the first one or two posts in Patrick’s tutorials. Also, I’ll be using Edward Kmett’s recursion-schemes throughout, along with several language extensions and what not; you can see the finished code in my GitHub repository.
To start with, we’ll have our Tree:
|
|
Observation: this type allows you to write trees that never end. We have no way to brace against this in the type system (that I know of), but you can definitely ask for the left
and right
to be strict, which would mean any such program would hang at the moment you try to make such a value, I think.
This makeBaseFunctor
call ensures that we get the additional TreeF
type with the “continuation holes” for the type. And then we’ll have two crucial types:
|
|
This is how we will encode the output of node: every node has to produce a value of type Outcome d
, and we will just check whether it is a movement or an output value that was just produced. Thus, the type of what produces an outcome (which I’m going to call Rule
because it is the name that makes the most sense to me right now) is pretty clear:
|
|
Where we’ll have a constraint in which we require m
to be a Monad
instance. This type, although pretty simple, will give us all the flexibility we need: Tree (Rule m d)
has monadic actions at each node; when we run the monadic action, we’ll get an Outcome d
, which we can check for what to do next.
With these types, the function we need will have a signature that looks like this:
|
|
Almost. You could reach a Failure
node, and thus there’s no way you could possibly produce an Outcome d
, which means you’d be forced to error in some other way. Since using the Monad
instance’s fail
function is considered evil, I just opted for this:
|
|
There is yet another issue with this function signature: if the Rule
throws an exception, then the entire process will interrupt; and we won’t be able to distinguish between an error in the execution and an error in the rule (at least not a priori). Thus, I introduced yet another abstraction, which is that of an Execution
:
|
|
Indeed, an Execution
is just the result from running a Rule. Furthermore, an execute
with a signature as above won’t be able to provide any debug information: we have no knowledge about the path that was taken in the tree. The easy way to solve this is to just output a Tree
:
|
|
The question now is which recursion scheme to use. At first I thought it would be a catamorphism, however, it simply isn’t because the process we want to do is “first run the rule, then make a decision on which way to go”, which is different from “run the two sides' rules, then decide what to do with the current rule”.
After a lot of time spent looking at the type signatures, it became clear that it is an anamorphism. The way I like to think about it is that, at every step, you have to unfold a new Tree
; this requires you to provide the value
for the Node
, and its two sides. The value
corresponds to the Execution d
, and the two sides depend on the rule execution: if the rule said to finish, then both sides are Failure
, if it said to go L
, then just the right
side is Failure
while the left
side holds the next Rule m d
to execute (the other case is symmetric). Thus, we need a transformation of type:
|
|
Coding this is a fairly simple endeavor, as long as you have a monadic anamorphism. This is not as simple as it looks because you have to pick some sequencing for the effects of the Monad
. There is an old closed bug in the recursion-schemes
GitHub repository with a discussion on the complexities. I just used the anaM
definition from the first comment:
|
|
With some additional foo, it turns out to be really easy to write this function down:
|
|