A great thing about Nix (the language) is that it’s tiny. Tiny
enough that besides some `builtins`

and a few NixOS/Nixpkgs centered
features (e.g. paths), we’ve covered, for most intent and purposes,
all of it.

The next article will be about writing a λ-calculus interpreter. In preparation, we’ll focus here on implementing three small projects:

- A prefix/polish expression evaluator (e.g. something that reads
a string like
`+ + 3 * 5 + 2 2 19`

, and interprets it as`3+5*(2+2)+19=42`

); - A Brainf*ck evaluator (if you don’t know what it is, I guess an example wouldn’t help much);
- A mathematical expression evaluator (e.g.
something that reads and interprets
`3+5*(2+2)+19`

);

The code for this series is still available on github.

# Testing

Our code is going to get slightly more complex than before. As a result, and because we’ll often work iteratively, from simple to complex, we’ll want to have ways to gain a firm hand on the stability of the code.

Here’s a simple, generic test module, a Nix implementation of the one described in this article:

```
with builtins;
rec {
apply = f: l: foldl' (x: y: x y) f l;
run1 = {descr ? "", fun, args, expected, indent ? ""} :
let
# This means, if we want only one argument that is a list,
# we have to use a list with a single element: [theSingleListArg].
got = if isList(args) then (apply fun args)
else (fun args);
r = got == expected;
ok = if r then "OK" else "failed!";
in
trace("${indent}${ok}\t:${descr}") (if r then r else
trace("${indent}\tGot:")
trace(deepSeq got got)
trace("${indent}\tExpected:")
trace(deepSeq expected expected)
r)
;
run = foldl' (ok: t: ok && (run1 t)) true;
}
```

```
{ apply = <CODE>; run = <CODE>; run1 = <CODE>; }
```

Here’s how we could write tests for this test module itself:

```
#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
ftests = (import ./ftests.nix);
tests = [
{
descr = "run1 [1 2 3] ≠ [1 2 3 4]";
fun = ftests.run1;
args = {
descr = "(should fail)";
indent = "// ";
fun = _ : [ 1 2 3];
args = [];
expected = [1 2 3 4];
};
expected = false;
}
];
in ftests.run tests
```

```
trace: // failed! :(should fail)
trace: // Got:
trace: «lambda fun @ ./nix/ftests_test.nix:12:16»
trace: // Expected:
trace: [ 1 2 3 4 ]
trace: OK :run1 [1 2 3] ≠ [1 2 3 4]
true
```

** Exercise:** There

*are*other options for testing. Try to find how Nixpkgs’s

`strings.nix`

module is tested for instance.**[+]**

__Solution:__** Note:** I’ll still be using ad-hoc tests for simple pieces of
code. Also, while I’ll be giving you tests in one format,
feel free to use any other
you might prefer. The key point is to have an automated way to make sure
your code behaves as expected, and keeps doing so despite changes.

# Prefix expressions

## Introduction

The polish notation, or prefix notation, is a way to write down mathematical expressions involving binary operators (a binary operator is one that eats two numbers): addition, subtraction, multiplication and division, in our case.

As the name suggests, the idea is to put operators in front of
the numbers they operate on, e.g. `+ 4 5`

instead of `4 + 5`

.
A noticeable side-effect is that expressions became unambiguous
to the point that parentheses become superfluous.
For instance `(3 + 4) * 5`

can be written as `* + 3 4 5`

:
the `*`

expects to eat two numbers, but what follows it is an
operator, so we must first evaluates `+ 3 4`

before we can
process the `*`

, and so forth.

** Exercise:** There’s an engineering practice called
TDD, which
as the name suggests, consists in first writing tests, then writing
code matching those tests. We’re incidentally going to use it now, not
out of ideology, but so that you can manually handle prefixed expressions
before attempting to parse them, and to help you think about how to handle
some special cases.

So, fill the “blanks” of the following tests:

```
{
descr = ''exec ""'';
fun = prefix.exec
args = "";
expected = a;
}
{
descr = ''exec "3"'';
fun = prefix.exec;
args = "3";
expected = b;
}
{
descr = ''exec "+1 + 2 33"'';
fun = prefix.exec;
args = "+1 + 2 33";
expected = c;
}
{
descr = ''exec "*2 +1 + 2hello33"'';
fun = prefix.exec;
args = "*2 +1 + 2hello33";
expected = d;
}
{
descr = ''exec "*2.1 + 1 4.23"'';
fun = prefix.exec;
args = "*2.1 + 1 4.23";
expected = e;
}
{
descr = ''exec "+ 1 2 3"'';
fun = prefix.exec;
args = "+ 1 2 3";
expected = f;
}
{
descr = ''exec "+ 1"'';
fun = prefix.exec;
args = "+ 1";
expected = g;
}
```

**[+]**

__Solution:__** Exercise:** At this point, you may want to stop reading
and spend some time writing a solution.

## Implementation

** Exercise:** Here’s something to warm you up: write a
function

`parseInt s`

which parses an integer from a string `s`

.
You’ll want to use the `ascii.nix`

and `strings.nix`

module from
the previous article.**[+]**

__Solution:__** Exercise:** Let’s make it slightly more difficult: write a

`parseNum s`

function, which can parse integers, but also floating numbers.**[+]**

__Tip:__**[+]**

__Tip:__**[+]**

__Solution:__There are two main approaches to implement the prefix expression parser:

- Either, you pre-parse the input by tokenizing it: you
build on list of tokens, for instance:
Then you may call our previous
`[ "+" "*" "134" "aaa" ".24" ".25" ]`

`parseNum`

on number-looking tokens to get actual numbers from strings; - Or, you go through the input byte-per-byte, and as it was done for the regular expression matcher.

The second option is perhaps less “intuitive”, but actually leads to a much more compact and elegant code, so I’ll focus on this one.

** Exercise:** Try to find rewrite

`parseNum`

so that
it returns not just the parsed number, but the position in `s`

at
which it stopped parsing.**[+]**

__Tip:__**[+]**

__Solution:__** Exercise:** We’ve written about half the code;
you just need to find a way to tie everything together.

**[+]**

__Tip:__**[+]**

__Solution:__# Brainf*ck

## Introduction

Brainf*ck is an esoteric “programming language”. Despite it lack of usability, it remains turing-complete: it’s a great way to get a feel for what a Turing machine is and what it’s capable of.

The language itself consists of eight commands, operating on an infinite memory band, consisting of integer-valued cells, addressed by a single head/pointer. The “current cell” corresponds to the cell being pointed by the single head/pointer. Here’s an example, assuming we can store characters in the cells instead of just numbers:

```
-----------------------------------------------
... |b|1|1|b|#|#|#|#|o|,| |w|o|r|l|d|!|b| ...
----------^------------------------------------
```

As for the available commands:

Command | Description |
---|---|

`<` |
move the head one step to the left |

`>` |
move the head one step to the right |

`+` |
increment the current cell’s value by one |

`-` |
decrement the current cell’s value by one |

`.` |
output the current cell’s value |

`,` |
read one byte; set the current cell’s value to it |

`[` |
if the current cell’s value is zero, jump to the command right after the corresponding `]` ; otherwise move one step to the right |

`]` |
if the current cell’s value is non-zero, jump to the command right after the corresponding `[` ; otherwise, move one step to the right |

** Exercise:** Here’s a sample program. Can you figure
out what it does and how?

```
++++++++++[>+++++++>++++++++++>+++>+<<<<-]
>++.>+.+++++++..+++.>++.<<+++++++++++++++.
>.+++.------.--------.>+.>.
```

**[+]**

__Tip:__**[+]**

__Solution:__** Exercise:** Without looking further, try to write
an implementation. You can test it to some reasonable degree with
the previous program. You may then want to keep reading,
and compare your solution to the one proposed.

Here’s a variant of the same program:

```
>+++++++++[<++++++++>-]<.>++++++[<+++++>-]
<-.+++++++..+++.>>+++++++[<++++++>-]<++.--
----------.<++++++++.--------.+++.------.--
------.>+.>++++++++++.
```

## Memory tape: list zippers

Most of the code should be rather straightforward, especially considering the “mover” object developed at the end of the previous article. But we can be a little fancy regarding the memory tape.

We could define it as a finite, (boring) regular Nix list. We could
even make it “cyclic”, for instance by going to the beginning
once we reach the end via `>`

or the other way around, with `<`

.
Or, we could make it cyclic by creating a “real” cyclic list,
for instance with sets.

But let’s face it, wouldn’t it be much more fun if we could build an “infinite” band? After all, we’ve already saw that we could build infinite lists, for instance, infinite streams of zeroes. Surely, this shouldn’t be too hard.

** Exercise:** Can you figure out a way of moving
left and right in an infinite stream of zeroes? Obviously, we’ll
later want to be able to update our structure locally so as to
contain other numbers than zeroes.

**[+]**

__Tip:__**[+]**

__Tip:__**[+]**

__Solution:__## State

As already shown in a previous exercise, we can implement the state with a set. Then, “updating” the state is performed by having functions returning the updated set/state. Here’s an example of the state to represent a Brainf*ck interpreter:

```
state = {
# Let's use our fancy memory tape
mem = tape zeroes 0 zeroes;
# stdin/stdout ("in" is a reserved keyword)
xin = "";
out = "";
# Note that there's only one error (jump failure)
err = "";
# Command and pointer to the current character
# in cmd
cmd = "";
pcmd = 0;
};
```

## Finishing it

We’re mostly done. A key element that remains is this one:

** Exercise:** Write two functions

`jmpnxt`

and `jmpprv`

who respectively jump to a matching `[`

or `]`

. You want to
take into account the fact that you can have nested loops. The
code for both functions is rather similar: you should be able
to factorize it.**[+]**

__Tip:__**[+]**

__Solution:__** Exercise:** We’re mostly done, we just need to write
a recursive

`exec s`

function, which takes a state as input and
returns it updated.**[+]**

__Solution:__# Mathematical expressions

## Introduction

We’ve started this article with a prefixed expression parser: we’re going to make
things more difficult now by trying to parse regular mathematical
expressions, always involving the four elementary binary operators:
addition, multiplication, subtraction and division. We’ll furthermore
allow the use of the minus/plus to serve as unary operators, e.g. we
should be able to write `-3`

to get minus three, instead of the more
cumbersome `0-3`

.

Note that the interpreter is almost trivial: 95%, 98% of the work and the difficulties are in the parsing.

We’re going to use a technique commonly referred to as
*recursive descent parsing*. The approach is simple enough
to be introduced intuitively; parsing remains a subtle/tricky subject, so
be on your guard. There’s an “old” book on compilers, informally
called the “*Dragon Book*”, as its cover
depicts a knight engaging a dragon: a metaphor for taming the
inherent complexities of compilers.

I propose the following plan of action:

- I’ll start by giving you general advises on how to proceed, so that you can start working on your own: give it a serious try; it’s an important exercise: it captures most of the difficulties you can encounter when parsing a programming language, once you’ve got this under your belt, it’s not much harder to parse more advanced languages;
- In case you’ll need it, I’ll give you a bunch of tests;
- I’ll then sketch a solution, guiding you through the main difficulties;
- And finally, I’ll give you the code.

** Exercise:** Again, I encourage you to give it a try
before diving further. Work incrementally: start by parsing numbers,
numbers in parentheses. Then, try to make unary expressions work
e.g

`-4`

or `--5`

or `+(-67.))`

. Move on to parsing additions
and subtractions. Finish with multiplications/divisions. First
tip below is about the data structures to be used, and some general
guidance on the parsing; the second is a list of tests.**[+]**

__Tip:__**[+]**

__Tip:__## Sketching a solution

Let’s try to work this out, step-by-step. *Recursive descent parsing*
may be an intimidating name, but the principles are rather intuitive. Let’s
start by writing down a few sample expressions (I’ll be using integers here for
simplicity, but you should aim at making it work with floats):

```
1
-3
+5
-(+42)
1 + 3
1 + 3 + 5
1 - 4 + 3
1 + 3 * 5
1 * 3
1 + 3 * (5 + 1)
1 / 3 + 5
```

As mentioned in a previous tip, our goal will be to create trees out of expressions. What kind of trees? Well let’s think this through:

- Isolated numbers will be the leaves of the tree. Each node
of the tree can be represented either by a list or a set, but a
set will be clearer. We could avoid the use of a
`type`

, but this will also contribute to improve readability. For instance, the number`3`

could be represented by the following:`{ type = "Num"; val = 3; }`

- OK, let’s try to represent an unary operation? For instance
`-3`

:That seems reasonable, but it would break on something like`{ type = "Num"; val = -3; }`

`-(3+5)`

, so we need to adjust things a little:On the other hand, we can simply ignore the`{ type = "Unary"; sgn = -1; expr = { ... }; }`

`+`

e.g. in`+3`

or`+(4+5)`

. But we could also use the previous structure with`sgn`

to`+1`

instead of`-1`

. - What about an addition? Well, now things are getting more
interesting. What would you say of the following option?
Hm, consider
`{ type = "Add"; left = 3; right = 4; }`

`(3+4)+5`

: clearly, one of those addition involves a left part which is yet another addition, a tree. Perhaps it’d be better if`left/right`

were trees/nodes:`{ type = "Add"; left = { ... }; right = { ... }; }`

- From there, simply changing the type to
`"Sub"`

",`"Mult"`

and so forth would give us all the other binary operators; - We could try to go this way, but I’ll take a shortcut: it’s customary
to group additions/subtractions together and call either a “term”. By
comparison, multiplications/divisions can be grouped together and either
one is called “factor”. That distinction allows us to express the fact
that there’s a difference in priority: multiplications/divisions should
be interpreted
*before*additions/subtractions. So this would gives us something like:`{ type = "Term"; op = "+"; left = { ... }; right = { ... }; } { type = "Factor"; op = "/"; left = { ... }; right = { ... }; }`

I’ve said that this is a shortcut because we could have made it work with the previous format, but it’s customary to represent things this way, and will help in formulating what follows more directly.

So essentially, a mathematical expression is either:

- an isolated number;
- an unary expression;
- a factor;
- a term.

This is a neat observation: there are no other choices. This means, if we want to operate recursively on an expression, we just have to manage those four cases. It’s the same kind of idea that when we considered lists recursively as being either empty, or a head attached to a tail.

This also means that during the parsing, we’ll be parsing either one at a given moment. Which implies that we’ll have ~four “states”. Now if you recall the regular expression matcher from the previous article, we can “encode” each state by a function. Say:

```
parseNum = ...;
parseUnary = ...;
parseFact = ...;
parseTerm = ...;
# Here's the entry point
parseExpr = ...;
```

Now what should all those functions eat/return? Essentially, they will be eating the input string and a pointer to that string (the current position in the input string), as we did for the Brainf*ck interpreter.

As for the returned values, they should return the new input position and a tree, or an error. While not strictly speaking mandatory, it’s convenient, as we did for the Brainf*ck interpreter, to wrap all the inputs/outputs in a state variable: all those functions should then take both as input and output such a state:

```
start = {
err = null;
s = "";
p = 0;
expr = {};
};
```

OK, so, how to make all those functions work with each other? If we
want to parse a single number, say `4`

, and if `parseExpr`

is our
entry point, then this means that `parseExpr`

should find a way
to call `parseNum`

, as `parseNum`

is responsible for parsing a
number. So let’s say that `parseExpr`

directly calls `parseNum`

.

Now consider a basic addition:

```
1 + 4
```

Clearly this doesn’t work
anymore: we’d parse `1`

and then return, ignoring everything else.
How should we proceed? Well, we must find a way to call `parseTerm`

between `parseExpr`

and `parseNum`

. The following pseudo-code
stills allows us to parse bare integers

```
parseNum = ...;
parseUnary = ...;
parseFact = ...;
parseTerm = s: parseNum s;
# Here's the entry point
parseExpr = s: parseTerm s;
```

But we can now inject more code in `parseTerm`

to detect
and parse an addition:

```
parseNum = ...;
parseUnary = ...;
parseFact = ...;
parseTerm = s: let t = parseNum s; in
if haveAPlusSign s then
let u = parseNum (skipPlus s); in
u // { type = "Term"; op = "+"; left = t.expr; right = u.expr; }
else s
;
# Here's the entry point
parseExpr = s: parseTerm s;
```

Now, what about:

```
1 + 3 + 6
```

Hm. Perhaps we can find a way to
have some sort of recursive call to `parseTerm`

instead of calling
`parseNum`

for the second term? For instance, parsing the previous
expression as:

```
1 + (3 + 6)
```

Okay, but what if we had:

```
1 - 3 + 6
```

Then we would have parsed it as:

```
1 - (3 + 6)
```

Which is clearly *not* what we want. Instead of calling `parseTerm`

directly, we could have an auxiliary function, allowing us to
build a tree with the correct associativity, so as to parse
that last expression as:

```
(1 - 3) + 6
```

How would this work in pseudo-code?

```
parseNum = ...;
parseUnary = ...;
parseFact = ...;
parseTerm = s: let t = parseNum s; in
let aux = t:
if haveAPlusSign s then
let u = parseNum (skipPlus s); in
aux (u // { type = "Term"; op = "+"; left = t.expr; right = u.expr; })
else
t
; in aux t
;
# Here's the entry point
parseExpr = s: parseTerm s;
```

Now, how about parsing multiplications?

```
1 * 3
```

Clearly, we’d need `parseExpr`

to find a way to call `parseFact`

, but
should it be done directly, or for instance, via `parseTerm`

? Consider
for instance:

```
1 + 3 * 5
```

or

```
1 * 42 + 4
```

We’d want the multiplication to be parsed before we do the addition right? Which means that when we’re on an addition, we’d want to parse the left/right hand side of it as either term, number, or factor:

```
(factor|number) + (term|factor|number)
```

So far, we were only considering the case where we had a number on the left and either a term or a number on the right:

```
(number) + (term|number)
```

So, to make it work we simply need to call `parseFactor`

to get us either a factor or a number:

```
parseNum = ...;
parseUnary = ...;
parseFact = s: let t = parseNum s; in
...
;
parseTerm = s: let t = parseFact s; in
let aux = t:
if haveAPlusSign s then
let u = parseFact (skipPlus s); in
aux (u // { type = "Term"; op = "+"; left = t.expr; right = u.expr; })
else
t
; in aux t
;
# Here's the entry point
parseExpr = s: parseTerm s;
```

I’ll stop here because all the major aspects have been developed: parsing unary expressions requires the same kind of thinking; parsing subtractions or divisions are almost trivial. You should also think about error handling, such as premature end of content, missing parenthesis, etc.

Just take it slow, one feature at a time, and write tests to validate what you’ve been doing so far. I would encourage you to write in that order:

`parseNum`

(you may want to revisit the floating number parsing function we wrote earlier);`parseUnary`

;`parseTerm`

;`parseFact`

;

## Finishing it

** Exercise:** So try to fill in the blanks on your own.

**[+]**

__Solution:__# Conclusion

In the next article, we’ll make things more interesting by targeting a more refined language: the λ-calculus. We’ll see that recursive descent parsing, as we’ve used it so far, has some limitations, and how to overcome them.

And, most importantly, we’ll study a *real* interpreter, which
is at the basis of functional programming languages such as Nix.
While the parser will remain of comparable difficulty to the last
one developed here, the interpreter on the other hand will be
more refined (but still minute in comparison to real life
languages, such as Nix).

## Comments

By email, at mathieu.bivert chez: