Lately, we’ve been hanging around a rather theoretical corner of computer science. However interesting and insightful, that aspect must almost always be balanced. For instance, Church booleans/integers are, at least on modern ordinary hardware, impractical compared to fixed-size 32/64 bits integers.

Often in engineering, pragmatism is a decisive way to select between multiple options, an approach which often leads to simple(r/st) choices.

We’ll use some aspects of the design of Nix itself to exemplify this point. Our roadmap for this article will include some more details on laziness, as in lazy evaluation (and not as a variant of “pragmatism”), and genuine Nix lists/sets. Genuine in the sense, not some hand-implemented ones out of closures, but the ones provided by the language.

Again, the code for this series is available on github.

Oak grove (Дубовая роща, 1887), oil on canvas, 193×183 cm by Ivan Ivanovich Shishkin (Ива́н Ива́нович Ши́шкин, 1832-1898) through wikimedia.org – Public domain

Lists

So, here’s how real Nix lists behave:

Lists are written between brackets [];

#!/usr/bin/env -S nix-instantiate --eval
["hello" "world"]

[ "hello" "world" ]

Elements of the list are white-spaces separated; use parentheses to isolate them when needed;

#!/usr/bin/env -S nix-instantiate --eval
[("hel" + "lo") "world"]

[ <CODE> "world" ]

Lists are actually tuples, meaning, list elements don’t have to have the same type:

#!/usr/bin/env -S nix-instantiate --eval
[1 ("hello") (x: 3)]

[ 1 "hello" <CODE> ]

Lists elements are lazily evaluated: in the previous examples, <CODE> precisely indicated code that hasn’t been evaluated:

#!/usr/bin/env -S nix-instantiate --eval
[(1+3) "hi" ("hello"+" "+"world") 3 (x: 3)]

[ <CODE> "hi" <CODE> 3 <CODE> ]

Compare with:

#!/usr/bin/env -S nix-instantiate --eval
let
	# toString can't coerce a function (e.g. (x: 3)) to a
	# string, hence why we're applying it here
	xs = [(1+3) "hi" ("hello"+" "+"world") 3 ((x: 3) 2)];
in
	"${toString xs}"

"4 hi hello world 3 3"

Or:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	xs = [(1+3) "hi" ("hello"+" "+"world") 3 (x: 3)];
in
	deepSeq xs xs

[ 4 "hi" "hello world" 3 <LAMBDA> ]

You may remember append from the last article: in Nix, lists can be concatenated using the ++ operator
```
#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	xs = []++[1 2]++[]++[3 4]++[5];
in
	"${toString xs}"
```
```
"1 2 3 4 5"
```
Note that, as far as I know, there’s no direct equivalent to our cons, the idiomatic way seems to be to rely on ++, e.g. [x]++xs.

There are two main ways to access individual elements of a list: either via elemAt xs n, which returns the n-th element of the list xs (you may find length xs useful), or via head/tail, which are equivalent to our previous car and cdr:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	map = f: xs: let n = length xs; aux = acc: i:
		if i >= n then
			acc
		else
			aux (acc++[(f (elemAt xs i))]) (i + 1)
		; in aux [] 0;

	map2 = f: xs: if xs == [] then xs
		else [(f (head xs))] ++ (map2 f (tail xs));

	xs = map  (x: 2*x) [1 2 3];
	ys = map2 (y: 2*y) [1 2 3];
in
	trace(deepSeq xs xs)
	trace(deepSeq ys ys)
	"ok"

trace: [ 2 4 6 ]
trace: [ 2 4 6 ]
"ok"

But, in Nix, head/tail are discouraged in favor of elemAt! We’ll see why in a moment.

In a similar vein, Nix has something that almost works like our flatten, a map and a foldl' (there are a few other useful functions, have a look at the builtins documentation):

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	# concatLists' parameter *must* be a list of lists, no
	# scalars allowed down there
	xs = concatLists (map (x: if !isList x then [x] else x) [[1 2] 3]);

	# This is a more efficient and more general "variant" of
	# concatLists (equivalent to concatLists (map f xs))
	ys = concatMap (x: if !isList x then [x] else x) [[1 2] 3 [4]];

	# the ' indicates that foldl is *strict* here, meaning, it
	# will systematically evaluate the list entries before processing
	# them (like deepSeq does).
	n = foldl' add 0 (map (x: 2*x) [1 2 3]);
in
	trace(xs)
	trace(ys)
	trace(n)
	"ok"

trace: [ 1 2 3 ]
trace: [ 1 2 3 4 ]
trace: 12
"ok"

Exercise: Nixpkgs, NixOS’s package manager, is built with Nix. It comes with a library of additional functions on lists. Find it and spend some time reading its code. Try to find foldl'’s code. How does it compare to our implementation?

Solution:[+]

Here’s a simple list exercise to warm you up, this time, on real Nix lists. We’ll need this function for the following section.

Exercise: Write a function take n xs which returns the list made out of the first n elements of the list xs. Implement it as a recursive function thrice: once with head/tail, once with elemAt, and once with foldl'.

\[\begin{aligned} (\text{take}\ n\ [x_0\ x_1\ ...\ x_{n-1}\ ...\ \ x_m]) = [x_0\ x_1\ ...\ x_{n-1}] \end{aligned}\]

Solution:[+]

Exercise: Writing simple code is not always easy; often, simplicity is achieved progressively, by taking the time to identify and remove superfluous elements.

“I have made this [letter] longer than usual because I have not had time to make it shorter.”

Blaise Pascal

Most of the code I demonstrate in this series has gone through such a refinement process. For instance, try to simplify the following piece of code; which function is it?

	XXX = f: l:
		if l == [] then
			(f)
		else let
			x = head l;
			xs = tail l;
		in
			if xs == [] then
				(f x)
			else
				foldl' (x: y: x y) (f x) xs;

Solution:[+]

Lazy lists?

A common pattern in lazy languages such as Haskell, is the ability to generate “infinite” lists. The idea is to write infinitely recursive code: in practice, it will only be executed as far as we need to. Remember, this works because laziness will delay the evaluation as much as possible.

For example, the iterate function in Haskell is used to generate infinite lists by iteratively applying a function to a previous iteration, starting at a given value.

\[\begin{aligned} (\text{iterate}\ \phi\ e) = [\phi(e)\ (\underbrace{\phi\circ\phi}_{=:\phi^2})(e))\ \phi^3(e)\ ... ] \end{aligned}\]

A common way to “stop” the infinite process is to use take, to select the first n elements of such an infinite list. Here’s a typical Haskell implementation of iterate with an example:

-- ghc -Wno-tabs -o /tmp/a.out $fn && /tmp/a.out

iter :: (a -> a) -> a -> [a]
iter f e = (f e):(iter f (f e))

main = do
	print (take 5 (iter (*2) 1))

[1 of 1] Compiling Main             ( /tmp/t.hs, /tmp/t.o )
Linking /tmp/a.out ...
[2,4,8,16,32]

Note: While laziness presented as such can be intellectually delightful, its value for engineering is debatable: it can for instance hinder reasoning on the behavior or performances of realistically more complex programs.

Now, let’s try to do the same thing with Nix, this shouldn’t be too hard.

Exercise: Write a version of iterate in Nix (using Nix’s lists).

Tip:[+]

Solution:[+]

The reason why this happens is that Nix’s lists are actually fixed-size arrays. In the same way that we choose to use 32/64 bits fixed-size integers instead of systematically using “bignums” (or Church integers…), Nix’s authors made the choice to used fix-sized arrays instead of unbounded lists (linked lists was another option for instance).

They’ve traded “expressivity” for efficiency. Given the fact that most applications don’t need unbounded lists, it’s a fairly reasonable choice. The same kind of reasoning led the authors to implement foldl' or map directly in C++, without recursivity: we’ve demonstrated earlier that such common patterns can be used as building blocks for more advanced lists manipulation code: the latter will directly benefit from an efficient implementation.

So, is Nix a lazy language? Well, it is, to up to some pragmatic considerations. The links between the list elements aren’t lazily evaluated, but the elements themselves are. Don’t worry if this last point is a bit mysterious, we’ll come back to it shortly.

Exercise: Find the code of foldl'. Try to understand what it does.

Solution:[+]

Exercise: tail'’s usage is discourage is the builtins documentation. Try to find its code, and try to understand what’s happening, and why elemAt is more idiomatic in Nix.

Also try to find how/where lists are stored/created.

Solution:[+]

Exercise: Find the stack used for function calls.

Tip:[+]

Solution:[+]

Exercise: Recall the first list implementation we had in the previous article (you could use any latter ones, but this one is sufficient here):

rec {
	cons    = h: t: (x: if x then h else t);

	nil     = null;
	isEmpty = l: l == nil;
	access  = x: l: if isEmpty l
		then throw "list is empty"
		else l x
	;

	car = access true;
	cdr = access false;
}

Implement a zeroes “function”, which generates an infinite list containing only zeroes (or, implement iterate f n, as you wish). Implement take n xs for such lists. Does it work? Why/why not?

Solution:[+]

Forest distant view (Лесные дали, 1884), oil on canvas, 113×164 cm by Ivan Ivanovich Shishkin (Ива́н Ива́нович Ши́шкин, 1832-1898) through artpoisk.info wikimedia.org – Public domain

Exercise: There’s yet another way to proceed: if you’ve understood the key point of the previous solution, then you should be able to find a way to use use Nix’s lists to implement lazy lists.

Again, suffice to check that calling take on zeroes work; if your list implementation has the same interface as before (cons, car, nil, cdr, etc.), then zeroes and take are already written.

What’s the advantage of this approach compared to the previous one?

Solution:[+]

Exercise: Recall the implementation of Eratosthenes’ sieve we had in the second article of this series, which allows to compute prime numbers up to a given, fixed number:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	foldlnseq = x: f: s: n: let aux = acc: i:
		if i > n then
			acc
		else
			aux (f acc (x i)) (i + 1)
		; in aux s 1;

	# foldln is now a special case of foldlnseq
	foldln = foldlnseq (x: x);

	toFloat = n: n + 0.1 - 0.1;
	isDiv = i: j: ceil(toFloat(i) / j) * j == i;

	sieve = n: foldln (p: i:
		if p i then
			trace("${toString i} is prime!")
			(j : !(isDiv j i) && (p j))
		else
			trace("${toString i} is not prime")
			p
	) (i: i != 1) n;
in
	sieve 50

trace: 1 is not prime
trace: 2 is prime!
trace: 3 is prime!
trace: 4 is not prime
trace: 5 is prime!
trace: 6 is not prime
trace: 7 is prime!
trace: 8 is not prime
trace: 9 is not prime
trace: 10 is not prime
trace: 11 is prime!
trace: 12 is not prime
trace: 13 is prime!
trace: 14 is not prime
trace: 15 is not prime
trace: 16 is not prime
trace: 17 is prime!
trace: 18 is not prime
trace: 19 is prime!
trace: 20 is not prime
trace: 21 is not prime
trace: 22 is not prime
trace: 23 is prime!
trace: 24 is not prime
trace: 25 is not prime
trace: 26 is not prime
trace: 27 is not prime
trace: 28 is not prime
trace: 29 is prime!
trace: 30 is not prime
trace: 31 is prime!
trace: 32 is not prime
trace: 33 is not prime
trace: 34 is not prime
trace: 35 is not prime
trace: 36 is not prime
trace: 37 is prime!
trace: 38 is not prime
trace: 39 is not prime
trace: 40 is not prime
trace: 41 is prime!
trace: 42 is not prime
trace: 43 is prime!
trace: 44 is not prime
trace: 45 is not prime
trace: 46 is not prime
trace: 47 is prime!
trace: 48 is not prime
trace: 49 is not prime
trace: 50 is not prime
<LAMBDA>

Re-write sieve to use an infinite list instead of a bounded sequence, that is sieve should return an infinite list containing all prime numbers. Use take to select only some of them.

Tip:[+]

Solution:[+]

Sets

Here’s a small primer on Nix’s “attribute sets”, which can be assimilated to read-only maps, mapping a string index to an arbitrary, lazy-evaluated expression (number, function, etc.).

Starting with a simple example; note that we can nest sets, and reference a higher set from a lower one, thus creating cyclic structures is possible:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	x = {
		n = 3;
		s = "hello";
		f = (x: x);
		y = {
			m = 4;
			z = x;
		};
		"what" = "ok";
	};
in deepSeq x x

{ f = <LAMBDA>; n = 3; s = "hello"; what = "ok"; y = { m = 4; z = «repeated»; }; }

But having an attribute referring to one’s sibling is not possible:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	x = {
		n = 3;
		m = n+2;
	};
in deepSeq x x

error:
       … while calling the 'deepSeq' builtin
         at ./nix/set/set1.nix:8:4:
            7|         };
            8| in deepSeq x x
             |    ^
            9|

       … while evaluating the attribute 'm'
         at ./nix/set/set1.nix:6:3:
            5|            n = 3;
            6|           m = n+2;
             |   ^
            7|      };

       error: undefined variable 'n'
       at ./nix/set/set1.nix:6:7:
            5|              n = 3;
            6|           m = n+2;
             |       ^
            7|  };

Unless one uses the rec keyword:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	x = rec {
		n = 3;
		m = n+2;
	};
in deepSeq x x

{ m = 5; n = 3; }

Or, use something that starts to look like an “object”:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	x = {
		n = 3;
		m = 2;
		k = 1;
		l = 3;

		# There are a few equivalent ways to access the values
		# corresponding to a given attribute in a set.
		f = (this: let s = "l"; in
			this."n" + (getAttr "m" this) + this.k + this."${s}"
		);
	};
in x.f(x)

Hm, that’s essentially a getter. How about setters? Well, Nix is a lazy language: sets are immutable, so our setter would need to return the updated set:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	x = {
		n = 3;
		m = 4;
		set = (this: f: v: {
			"${f}" = v;
			m      = this.m;
		});
	};
	y = (x.set x "n" 42);
in
	deepSeq y y

{ m = 4; n = 42; }

But repeating all fields is tedious (and error-prone: didn’t we just forgot to copy our set for instance?), so we can use the s1 // s2 operator : it creates a new set starting from s1, and then add all entries from s2, replacing existing ones:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	x = {
		n = 3;
		m = 4;
		set = (this: f: v: this // {
			"${f}" = v;
		});
	};
	y = (x.set x "n" 42);
in
	deepSeq y y

{ m = 4; n = 42; set = <LAMBDA>; }

The inherit keyword can help to include elements from another set:

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	z = {
		greet = w: "hello, "+w;
		add   = a: b: a + b;
	};
	a = 3;
	x = {
		inherit a;
		inherit (z) greet add;
		n = 3;
		m = 4;
		set = (this: f: v: this //{
			"${f}" = v;
		});
	};
	y = (x.set x "n" 42);
in
	deepSeq y y

{ a = 3; add = <LAMBDA>; greet = <LAMBDA>; m = 4; n = 42; set = <LAMBDA>; }

Exercise: We previously discussed three different implementations of take n xs. The last one relied on foldl', and used a list of two items as an accumulator. Try to re-implement that last version, but using a set instead of a list as an accumulator.

This is a simple exercise to practice basic set manipulation, nothing fancy.

Solution:[+]

Exercise: Use sets to try to implement lazy lists, as we previously did/attempted. Again, use either zeroes or iterate, and take to break free from the infinite recursion.

Solution:[+]

“Modules”

Let’s recall our most basic hello world:

#!/usr/bin/env -S nix-instantiate --eval
"hello world"

"hello world"

As we’ve already discussed, the value printed by Nix corresponds to the (single) value returned by the script. A “module” can be considered as a script that evaluates to a set containing a bunch of functions.

Such a set can then be imported from another .nix file via import. One can import the whole module, or just parts of it:

# Note that rec is mandatory here
rec {
	fib = n:
		if n == 0 || n == 1 then
			n
		else
			fib(n - 1)+fib(n - 2)
	;
}

#!/usr/bin/env -S nix-instantiate --eval
with builtins;
let
	m = (import ./mod-fib.nix);
	f = (import ./mod-fib.nix).fib;
	xs = [ (m.fib 10) (f 10) ];
in
	deepSeq xs xs

[ 55 55 ]

We’ll see a few examples in the following section.

Noon. Neighborhoods of Moscow (Полдень. В окрестностях Москвы, 1869), oil on canvas, 111×80 cm by Ivan Ivanovich Shishkin (Ива́н Ива́нович Ши́шкин, 1832-1898) through wikimedia.org – Public domain

Strings, bytes

The next two articles will involve writing small parsers. Thus, we’ll need to know how to process strings. But Nix’s strings weren’t really designed for what we’re going to inflict upon them.

So, we’ll start here by understanding their limitations, and write a few functions to help us with more sophisticated code in the future.

Exercise: I’ve previously pointed you to Nixpkgs' lists.nix; there’s also a strings.nix, containing a bunch of functions, and showing you how idiomatic Nix’s strings handling is to be done. So have a look at it, don’t hesitate to read Nix’s language documentation either.

Again, reading code really is key to an efficient progress.

Exercise: Now, we’d want to access the ASCII code corresponding to a single character. For instance, if we want to parse a number, say "12345", the common approach is to rely on the ASCII codes, and to be able to detect whether a character corresponds to a digit.

So write isNum c, which returns true iff the character c corresponds to a digit, by first implementing two functions toInt c and toChar i, which respectively return the ASCII code corresponding to a character c, and the character corresponding to the ASCII code i.

Write also a function isWhite c, which returns true iff c is a white character (space, newline, etc.)

Store all those functions in an ascii.nix module.

Tip:[+]

Solution:[+]

Exercise: Write the following functions, and store them in a strings.nix module:

charAt s n, which works like elemAt xs n but for strings instead of lists;
eat1 s, which returns s, without its first character (we now have a head/tail for strings/byte arrays, don’t we?);
skipWhites s n, which returns an index m corresponding to the first non-white character in s after the n-th position.

Solution:[+]

Exercise: Here’s an interesting exercise for an article entitled “Pragmatism, laziness”.

The goal is to re-implement a delightfully simple recursive regular expression matcher, written by Rob Pike (Go, UTF-8, Sam, etc.) for the book The Practice of Programming. Brian Kernighan (Unix, AWK, etc.), despite his decades of work in computer science, and his considerable contributions to the field, said of it:

I was amazed by how compact and elegant this code was when Rob Pike first wrote it – it was much smaller and more powerful than I had thought possible

There’s little reason to doubt Kernighan’s humility: over-thinking is a real problem in computer science, and the “best” solutions often are deceptively simple. The same goes in other domains I guess: think how bothersome it was not to have a zero in math, or how simple relativity’s core principles are.

Parenthesis aside, the regular expression matcher should support:

* : the character preceding the star is expected to be matches zero or more times; e.g. a* matches "", "aaaa" or "a";
. : matches any character; e.g. foo.*bar matches "foobar", or "foo!bar";
^ and $ : which are markers corresponding to beginning/end of line, e.g. ^foo or bar$ both match "foobar" but not "barfoo".

Tip:[+]

Solution:[+]

“OOP”, state

A common exercise in Lisp education is to write a small object system using closures, where “object” is to be understood as in your typical mainstream programming language, wrapping variables and methods on such variables (not in the Smalltalk sense of message passing). This is all rather similar to what we’ve done earlier.

For instance in the SICP §3.1, a basic “object” is defined as a closure whose argument corresponding to a method to be called on this “object”. Upon calling some methods, they may alter the underlying object: Scheme/Lisp are impure languages, meaning, there’s a notion of altering a variable. While we can’t do that with Nix, you should now have some workarounds in mind.

Note: While the SICP sticks to a very simple example, it’s actually possible to build very refined object systems in Scheme/Lisp by leveraging their macro systems, hence allowing for instance to express inheritance, class/instance variables, etc. This is again a feature we don’t have access to with Nix, at least directly.

Exercise: Try to implement such a small “mover” object. It should have two instance variables: a string and a pointer to that string (an integer), and the following methods:

prev: move one byte backward;
next: move one byte forward;
get: retrieve current byte;
setp q: set the pointer to the (integer value) q;
tail: returns the string located after the current position of the pointer.

Finally, write a chain o xs method allowing to iteratively call the methods described in the list xs, using an object o as the first target.

Tip:[+]

Solution:[+]

Teutoburg forest (Тевтобургский лес, 1865), oil on canvas by Ivan Ivanovich Shishkin (Ива́н Ива́нович Ши́шкин, 1832-1898) through wikiart.org – Public domain

Conclusion

We’re now ready to start writing more interesting pieces of code. In the following article, we’re going to study a few simple parsers/interpreters (Brainf*ck and two mathematical expressions interpreters), where we’ll use techniques that we’ll just developed in the current article.

We’ll then conclude this series with one last article, focusing on writing a λ-calculus interpreter, where we’ll apply what we’re about to learn about parsing, and what we’ve already learnt about closures.

In the series:

On Nix's language: Pragmatism, laziness (4/6)

Comments

By email, at mathieu.bivert chez:

On Nix's Language: Pragmatism, Laziness

Table Of Contents

In the series:

Lists

Lazy lists?

Sets

“Modules”

Strings, bytes

“OOP”, state

Conclusion

In the series:

Comments