Lately, we’ve been hanging around a rather theoretical corner
of computer science. However interesting and insightful, that
aspect must almost always be balanced. For instance, Church
booleans/integers are, at least on modern ordinary hardware,
impractical compared to fixed-size 32/64 bits integers.
Often in engineering, pragmatism is a decisive way to select between
multiple options, an approach which often leads to simple(r/st) choices.
We’ll use some aspects of the design of Nix itself to exemplify this point.
Our roadmap for this article will include some more details on laziness,
as in lazy evaluation (and not as a variant of “pragmatism”), and genuine Nix
lists/sets. Genuine in the sense, not some hand-implemented ones out of
closures, but the ones provided by the language.
Again, the code for this series is available on github.
Note that, as far as I know, there’s no direct equivalent to our
cons, the idiomatic way seems to be to rely on ++, e.g. [x]++xs.
There are two main ways to access individual elements
of a list: either via elemAt xs n, which returns
the n-th element of the list xs (you may find
length xs useful), or via
head/tail,
which are equivalent to our previous car and cdr:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let map = f: xs: let n = length xs; aux = acc: i:
if i >= n then acc
else aux (acc++[(f (elemAt xs i))]) (i +1)
; in aux [] 0;
map2 = f: xs: if xs == [] then xs
else [(f (head xs))] ++ (map2 f (tail xs));
xs = map (x: 2*x) [123];
ys = map2 (y: 2*y) [123];
in trace(deepSeq xs xs)
trace(deepSeq ys ys)
"ok"
trace: [246]trace: [246]"ok"
But, in Nix, head/tail are discouraged in favor of elemAt!
We’ll see why in a moment.
In a similar vein, Nix has something that almost works like
our flatten, a map and a
foldl' (there are a few other useful functions,
have a look at the builtins documentation):
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let# concatLists' parameter *must* be a list of lists, no# scalars allowed down there xs = concatLists (map (x: if!isList x then [x] else x) [[12] 3]);
# This is a more efficient and more general "variant" of# concatLists (equivalent to concatLists (map f xs)) ys = concatMap (x: if!isList x then [x] else x) [[12] 3 [4]];
# the ' indicates that foldl is *strict* here, meaning, it# will systematically evaluate the list entries before processing# them (like deepSeq does). n = foldl' add 0 (map (x: 2*x) [123]);
in trace(xs)
trace(ys)
trace(n)
"ok"
trace: [123]trace: [1234]trace: 12"ok"
Exercise: Nixpkgs, NixOS’s package manager, is built with Nix. It comes
with a library of additional functions on lists. Find it and spend some time
reading its code. Try to find foldl'’s code. How does it compare to our
implementation?
Reading code written by experts is a great way to learn how to write
idiomatic code, regardless of the language. Strings and lists are often
a great introductory read for beginners. But reading code written by
less experienced programmers is also insightful in its own way, just
don't spend all your time feeding yourself out of awful code...
Regarding foldl''s code, this is a trick question: it's
implemented directly in C++ (Nix is an interpreted language, the interpreter
is written in C++), so you won't find it written in Nix. There is
however an implementation of foldl in
lists.nix.
And this one differs from ours in that it relies on elemAt
rather than using a head/tail pattern.
You may now be wondering:
Why is foldl' written in C++?
Why do experts use elemAt over head/tail
in a functional language to manipulate lists?
And that's what we're going to progressively answer in the next section.
The quick answer is that, those are pragmatic decisions: on one hand, it's
much less idiomatic in a functional setting to do so, but on the other
hand, it allows for a simple and reasonably efficient implementation, without
requiring too much from the user: integer-indexed arrays really aren't
that impressive, when compared to the mathematical jargon necessary to
understand some abstractions in Haskell for example...
Here’s a simple list exercise to warm you up, this time, on real Nix
lists. We’ll need this function for the following section.
Exercise: Write a function take n xs which returns the
list made out of the first n elements of the list xs. Implement
it as a recursive function thrice: once with head/tail, once with elemAt,
and once with foldl'.
There is an implementation of take
in Nixpkg's
lists.nix,
so you shouldn't have had much to do here. ;-)
If you haven't met it before, take a moment to compare it with the ones below.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let# with head/tail take = n: xs:
if xs == [] || n ==0then []
else [(head xs)]++(take (n - 1) (tail xs))
;
# with elemAt and an auxiliary function take2 = n: xs: let m = length xs; aux = acc: i:
if i >= m then acc
else aux (acc++[(elemAt xs i)]) (i +1)
; in aux [] 0;
# and with foldl' take3 = n: xs: elemAt (foldl' (acc: x:
let m = elemAt acc 0;
ys = elemAt acc 1;
inif m >= n then acc
else [(m +1) (ys++[x])]
) [0 []] xs) 1;
xs = take 5 [12];
ys = take2 2 [12345];
zs = take 0 [];
ts = take3 3 [12345];
us = take3 9 [12345];
in trace(deepSeq xs xs)
trace(deepSeq ys ys)
trace(deepSeq zs zs)
trace(deepSeq ts ts)
trace(deepSeq us us)
"ok"
We may all agree that the foldl'-based implementation
is a bit clumsy. As a comparison, JavaScript's
reduce
would have made such a choice more reasonable, as reduce
provides its parameter function with the current index:
However, the idea of encoding more state in an accumulator,
for use either in fold, or more generally, in recursive
functions, will be very useful to us soon. Here, it's just one
specific additional integer, but you could imagine to store much
more singular data points. In which case by the way, a set is
often a preferable option.
More on that later.
Exercise: Writing simple code is not
always easy;
often, simplicity is achieved progressively, by taking the time
to identify and remove superfluous elements.
“I have made this [letter] longer than usual because I have not
had time to make it shorter.”
Most of the code I demonstrate in this series has gone through
such a refinement process. For instance, try to simplify the following
piece of code; which function is it?
XXX = f: l:
if l == [] then (f)
elselet x = head l;
xs = tail l;
inif xs == [] then (f x)
else foldl' (x: y: x y) (f x) xs;
It's apply f xs, and it was the first version
of it I've written for this series of articles, so unflattering,
"real life code".
apply = f: l: foldl' (x: y: x y) f l;
Lazy lists?
A common pattern in lazy languages such as Haskell, is
the ability to generate “infinite” lists. The idea is to write
infinitely recursive code: in practice, it will only
be executed as far as we need to. Remember, this works because
laziness will delay the evaluation as much as possible.
For example, the iterate function in Haskell
is used to generate infinite lists by iteratively applying a function to
a previous iteration, starting at a given value.
A common way to “stop” the infinite process is to use take, to
select the first n elements of such an infinite list. Here’s a typical Haskell
implementation of iterate with an example:
-- ghc -Wno-tabs -o /tmp/a.out $fn && /tmp/a.outiter:: (a -> a) -> a -> [a]
iter f e = (f e):(iter f (f e))
main=do print (take 5 (iter (*2) 1))
[1 of 1] Compiling Main ( /tmp/t.hs, /tmp/t.o )Linking /tmp/a.out ...
[2,4,8,16,32]
Note: While laziness presented as such can be intellectually
delightful, its value for engineering is debatable: it can for
instance hinder reasoning on the behavior or performances of
realistically more complex programs.
Now, let’s try to do the same thing with Nix, this shouldn’t be too hard.
Exercise: Write a version of iterate in Nix (using Nix’s lists).
You're expected to fail here.
More precisely, you're expected to meet an error: stack overflow
(possible infinite recursion).
Note: Well, Nix's error messages have improved in the
past few months: it's currently error: stack overflow; max-call-depth exceeded.
It might change again in the future, but hopefully you get the idea.
In case you're wondering, what's a
stack overflow:
when you perform a function call, you need to save the state of execution somewhere,
for instance, the values of all your variables. Then, when you return from
a function call, you can then restore the state you were in previously.
When calling a function, conceptually, the state is pushed on
a stack. When you come back from a function call, the state you want
to restore is pop'd from the same stack.
For the record, things are somehow similar in assembly: you
have a "hardware stack", and specific instructions to manipulate
it. While there are quite some differences in the details, the
main idea remains similar.
Whether in assembly or not, that stack always has some kind of maximal
size: a stack overflow precisely is about exceeding that maximal
size.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let iterate = f: n: [f n]++(iterate f (f n));
take = n: xs:
if n ==0then []
else [(head xs)]++(take (n - 1) (tail xs))
;
xs = take 5 (iterate (x: 2*x) 1);
in deepSeq xs xs
error:
… while calling the 'deepSeq' builtin
at ./nix/list/iterate.nix:13:2:
12| in
13| deepSeq xs xs
| ^
14|
… while calling the 'head' builtin
at ./nix/list/iterate.nix:9:6:
8| else 9| [(head xs)]++(take (n - 1)(tail xs)) | ^
10| ;
error: stack overflow; max-call-depth exceeded
at ./nix/list/iterate.nix:4:26:
3| let
4| iterate = f: n: [f n]++(iterate f (f n));
| ^
5| take = n: xs:
The reason why this happens is that Nix’s lists are actually
fixed-size arrays. In the same way that we choose to use
32/64 bits fixed-size integers instead of systematically
using “bignums” (or Church integers…),
Nix’s authors made the choice to used fix-sized arrays instead of
unbounded lists (linked lists was another
option for instance).
They’ve traded “expressivity” for efficiency. Given the fact
that most applications don’t need unbounded lists, it’s a fairly
reasonable choice. The same kind of reasoning led the authors
to implement foldl' or map directly in C++, without recursivity:
we’ve demonstrated earlier that such common patterns can be used as
building blocks for more advanced lists manipulation code: the
latter will directly benefit from an efficient implementation.
So, is Nix a lazy language? Well, it is, to up to some pragmatic
considerations. The links between the list elements aren’t lazily
evaluated, but the elements themselves are. Don’t worry if this last
point is a bit mysterious, we’ll come back to it shortly.
Exercise: Find the code of foldl'. Try to understand
what it does.
For now at least, the code can be found on github,
in
src/libexpr/primops.cc, function prim_foldlStrict.
All the comments are mine.
/*
* Input: **args is an array containing three elements, corresponding to
* the arguments provided to foldl':
* args[0] : the function we want to call at each step
* args[1] : the "nul" value, or initial accumulator value
* args[2] : the list to fold
*
* Output: v on the other hand correspond to the return value of the foldl' (it's
* a reference (~pointer) where the returned value is expected to be stored)
*
* We don't really need to concern ourselves with state/pos.
*/staticvoidprim_foldlStrict(EvalState & state, const PosIdx pos, Value ** args, Value & v)
{
/*
* Those play two roles:
* 1. Force evaluation of args[0]/args[2] (again,
* this is a strict function);
* 2. Assert that they have the proper types (an
* exception is thrown otherwise).
*/ state.forceFunction(*args[0], pos);
state.forceList(*args[2], pos);
/*
* "If the list is not empty"
*/if (args[2]->listSize()) {
/*
* vCur corresponds to the accumulator: we're initializing
* it with args[1], which contains the nul value (the initial
* accumulator value).
*/ Value * vCur = args[1];
/*
* "For each element of the list"
*/for (auto [n, elem] : enumerate(args[2]->listItems())) {
/*
* Create a "Value" to hold the accumulator and the
* current element: those will be the arguments we'll
* pass to args[0], the function being applied at each step.
*/ Value * vs []{vCur, elem};
/*
* This one is interesting: we're testing to see if
* we're on the last iteration of the loop (on the last
* element of the list). If we are, vCur now points to
* the same address as the return value v.
*
* vCur's previously pointed location is still preserved
* in vs.
*
* Otherwise, we allocate a new value. Why do we do that?
* Well, because Nix is a pure language: there are no
* assignments, so "updates" are encoded by creating new
* values.
*/ vCur = n == args[2]->listSize() -1?&v : state.allocValue();
/*
* Finally, we call the function
*/ state.callFunction(*args[0], 2, vs, *vCur, pos);
}
/*
* Remember, the "'" in foldl' mentions that this is a strict
* implementation, meaning, it will force the evaluation of
* all its arguments / of all the list.
*
* We also force here the evaluation of the returned value.
*/ state.forceValue(v, pos);
} else {
/*
* The list is empty, there's no reason to go through it.
*
* We must still force the evaluation of args[1], which
* is the initial accumulator value.
*/ state.forceValue(*args[1], pos);
/*
* And, we're simply "returning" this value.
*/ v =*args[1];
}
}
Exercise:tail'’s usage is discourage is the
builtins documentation. Try to
find its code, and try to understand what’s happening, and why
elemAt is more idiomatic in Nix.
Also try to find how/where lists are stored/created.
So the idea is that we're re-creating a new list each time
we call tail, and that we're manually setting each individual
element, one at a time, hence why it's qualified as being \(O(n)\).
I'm not sure why they don't reuse the previous list though (I'm
confident there's a good reason, but either I don't have enough
time or aren't clever enough to figure it out).
(The comment is not mine.)
/* Return a list consisting of everything but the first element of
a list. Warning: this function takes O(n) time, so you probably
don't want to use it! */staticvoidprim_tail(EvalState & state, const PosIdx pos, Value ** args, Value & v)
{
state.forceList(*args[0], pos);
if (args[0]->listSize() ==0)
state.debugThrowLastTrace(Error({
.msg = hintfmt("'tail' called on an empty list"),
.errPos = state.positions[pos]
}));
state.mkList(v, args[0]->listSize() -1);
for (unsignedint n =0; n < v.listSize(); ++n)
v.listElems()[n] = args[0]->listElems()[n +1];
}
state.mkList() actually calls the mkList
method for the Value type. If you look carefully, you'll
see that they've made a special case for small lists (1 or 2 elements),
for which they have a small array, which avoids them the need
for memory allocation (slow).
// src/libexpr/eval.cc
void EvalState::mkList(Value & v, size_t size)
{
v.mkList(size);
if (size >2)
v.bigList.elems = (Value **) allocBytes(size *sizeof(Value *));
nrListElems += size;
}
// src/libexpr/value.hh
structValue{
...
public: ...
union {
// For curiosity:
NixInt integer;
bool boolean;
...
// And here are the lists
struct {
size_t size;
Value ** elems;
} bigList;
Value * smallList[2];
...
}
...
inlinevoid mkList(size_t size)
{
clearValue();
if (size ==1)
internalType = tList1;
elseif (size ==2)
internalType = tList2;
else {
internalType = tListN;
bigList.size = size;
}
}
...
Value ** listElems()
{
return internalType == tList1 || internalType == tList2 ? smallList : bigList.elems;
}
}
% grep -Rn 'stack overflow'src/libmain/shared.hh:113:/* Install a SIGSEGV handler to detect stack overflows. */
src/libmain/shared.hh:116:/* Pluggable behavior to run in case of a stack overflow.
src/libmain/stack.cc:16: /* Detect stack overflows by comparing the faulting address with
src/libmain/stack.cc:49: /* Install a SIGSEGV handler to detect stack overflows. This
src/libmain/stack.cc:72: char msg[]="error: stack overflow (possible infinite recursion)\n";
src/libmain/shared.cc:216: /* Register a SIGSEGV handler to detect stack overflows. */
doc/manual/src/release-notes/rl-1.6.md:64: - If a stack overflow occurs in the Nix evaluator, you now get a
The src/libmain/stack.cc seems to be what we're after.
There is, in some cases, an attempt to detect stack overflows by
catching SIGSEGV ("segmentation fault"). SIGSEGV is a signal(3),
an old and primitive way for UNIX processes to communicate. This signal
is sent by the kernel to a process that tries to access an out
of reach memory segment (while the reality is more complicated nowadays,
think, a random process trying to read segments owned by a SQL database
containing sensitive data).
It's important to note that it implies Nix doesn't use an ad-hoc
stack for function calls, and instead rely on the "hardware stack"
we briefly mentioned earlier: a Nix function call is mapped to a
C++ function call (more or less, there may be more than one, but you
get the idea): if the C++ code recurses too far, it'll exhaust
the hardware stack, thus forcing the kernel to terminate the
Nix interpreter.
SIGSEGV typically happens when we're going out of bounds in an array,
but because the stack is implemented as a portion of memory (a "segment"),
it's also triggered upon infinite recursion. Consider the following C
program as an example:
Because the error is likely common when using Nix, its authors tried
to distinguish both cases so as to provide a more useful error message: an
infinite recursion caused by Nix users should be perceived differently
than an internal error in Nix's C++ code.
If you want to dig deeper, you can confirm further our previous hypothesis
regarding the fact that Nix doesn't use an ad-hoc stack for evaluation by
looking at
src/libexpr/eval.cc (evaluation code),
and at src/libexpr/nixexpr.hh, which contains the main data
structures of interest to understand the evaluation code.
Exercise: Recall the first list implementation we had
in the previous article
(you could use any latter ones, but this one is sufficient here):
rec {
cons = h: t: (x: if x then h else t);
nil =null;
isEmpty = l: l == nil;
access = x: l: if isEmpty l
thenthrow"list is empty"else l x
;
car = access true;
cdr = access false;
}
Implement a zeroes “function”, which generates an infinite list
containing only zeroes (or, implement iterate f n, as you wish).
Implement take n xs for such lists. Does it work? Why/why not?
As already evoked, a reason why Nix's lists fail to be
"fully lazy" is because the links between list items aren't
lazily evaluated. But when we use functions to encode
such links, because Nix's function call are lazily evaluated,
then we get lazy lists:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
with (import./list.nix);
with (import./list-lib.nix);
let zeroes = cons 0 (zeroes);
take = n: xs: let aux = acc: xs: i:
if i == n then acc
else aux (cons (car xs) acc) (cdr xs) (i +1)
; in aux nil xs 0;
xs = take 5 zeroes;
in trace(print xs)
"ok"
trace: [0, 0, 0, 0, 0]"ok"
Exercise: There’s yet another way to proceed: if you’ve understood
the key point of the previous solution, then you should be able
to find a way to use use Nix’s lists to implement lazy lists.
Again, suffice to check that calling take on zeroes work;
if your list implementation has the same interface as before
(cons, car, nil, cdr, etc.), then zeroes and
take are already written.
What’s the advantage of this approach compared to the previous one?
As for the advantage: when using closure-based lists, we "can't", or
at least, "it's complicated" to have lists containing functions,
because we "can't" distinguish between regular functions and those
used to tie items together.
There are no such complications here.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let nil =null;
cons = h: t: [h t];
car = xs: elemAt xs 0;
cdr = xs: elemAt xs 1;
# And that's exactly the code we have for the# previous version: just the list implementation# has changed. zeroes = cons 0 (zeroes);
take = n: xs: let aux = acc: xs: i:
if i == n then acc
else aux (cons (car xs) acc) (cdr xs) (i +1)
; in aux nil xs 0;
xs = take 5 zeroes;
in deepSeq xs xs
[0[0[0[0[0 null ]]]]]
Exercise: Recall the implementation of Eratosthenes’ sieve
we had in the
second article
of this series, which allows to compute prime numbers up to a given, fixed
number:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let foldlnseq = x: f: s: n: let aux = acc: i:
if i > n then acc
else aux (f acc (x i)) (i +1)
; in aux s 1;
# foldln is now a special case of foldlnseq foldln = foldlnseq (x: x);
toFloat = n: n +0.1 - 0.1;
isDiv = i: j: ceil(toFloat(i) / j) * j == i;
sieve = n: foldln (p: i:
if p i then trace("${toString i} is prime!")
(j : !(isDiv j i) && (p j))
else trace("${toString i} is not prime")
p
) (i: i !=1) n;
in sieve 50
trace: 1 is not prime
trace: 2 is prime!
trace: 3 is prime!
trace: 4 is not prime
trace: 5 is prime!
trace: 6 is not prime
trace: 7 is prime!
trace: 8 is not prime
trace: 9 is not prime
trace: 10 is not prime
trace: 11 is prime!
trace: 12 is not prime
trace: 13 is prime!
trace: 14 is not prime
trace: 15 is not prime
trace: 16 is not prime
trace: 17 is prime!
trace: 18 is not prime
trace: 19 is prime!
trace: 20 is not prime
trace: 21 is not prime
trace: 22 is not prime
trace: 23 is prime!
trace: 24 is not prime
trace: 25 is not prime
trace: 26 is not prime
trace: 27 is not prime
trace: 28 is not prime
trace: 29 is prime!
trace: 30 is not prime
trace: 31 is prime!
trace: 32 is not prime
trace: 33 is not prime
trace: 34 is not prime
trace: 35 is not prime
trace: 36 is not prime
trace: 37 is prime!
trace: 38 is not prime
trace: 39 is not prime
trace: 40 is not prime
trace: 41 is prime!
trace: 42 is not prime
trace: 43 is prime!
trace: 44 is not prime
trace: 45 is not prime
trace: 46 is not prime
trace: 47 is prime!
trace: 48 is not prime
trace: 49 is not prime
trace: 50 is not prime
<LAMBDA>
Re-write sieve to use an infinite list instead of a bounded sequence,
that is sieve should return an infinite list containing all prime numbers.
Use take to select only some of them.
Note that we've just re-imported reverse from an earlier
exercise, simply to have the list in ascending order. Again, we can
reuse the exact same code because it only relies on cons,
car, etc.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let# Any lazy list implementation would do, as# long as it respect our "API". nil =null;
isEmpty = xs: xs == nil;
cons = h: t: [h t];
car = xs: elemAt xs 0;
cdr = xs: elemAt xs 1;
take = n: xs: let aux = acc: xs: i:
if i == n then acc
else aux (cons (car xs) acc) (cdr xs) (i +1)
; in aux nil xs 0;
reverse = xs: let aux = acc: xs:
if isEmpty xs then acc
else aux (cons (car xs) acc) (cdr xs)
; in aux nil xs
;
toFloat = n: n +0.1 - 0.1;
isDiv = i: j: ceil(toFloat(i) / j) * j == i;
sieve =let aux = p: i:
if i !=1&& p i then# This is the most "correct" variant for the accumulated# closure; but both would work OK in this context. cons i (aux (j : (j <= i ||!(isDiv j i)) && (p j)) (i +1))
else aux p (i +1)
; in aux (i: i !=1) 1;
xs = reverse (take 10 sieve);
in deepSeq xs xs
[2[3[5[7[11[13[17[19[23[29 null ]]]]]]]]]]
Sets
Here’s a small primer on Nix’s “attribute sets”, which
can be assimilated to read-only maps, mapping a string index
to an arbitrary, lazy-evaluated expression (number, function,
etc.).
Starting with a simple example; note that we can nest sets,
and reference a higher set from a lower one, thus creating
cyclic structures is possible:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let x = {
n =3;
s ="hello";
f = (x: x);
y = {
m =4;
z = x;
};
"what"="ok";
};
in deepSeq x x
{ f = <LAMBDA>; n = 3; s ="hello"; what ="ok"; y ={ m = 4; z = «repeated»; }; }
But having an attribute referring to one’s sibling is not
possible:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let x = {
n =3;
m = n+2;
};
in deepSeq x x
error:
… while calling the 'deepSeq' builtin
at ./nix/set/set1.nix:8:4:
7| };
8| in deepSeq x x
| ^
9|
… while evaluating the attribute 'm' at ./nix/set/set1.nix:6:3:
5| n = 3;
6| m = n+2;
| ^
7| };
error: undefined variable 'n' at ./nix/set/set1.nix:6:7:
5| n = 3;
6| m = n+2;
| ^
7| };
Unless one uses the rec keyword:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let x =rec {
n =3;
m = n+2;
};
in deepSeq x x
{ m = 5; n = 3; }
Or, use something that starts to look like an “object”:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let x = {
n =3;
m =2;
k =1;
l =3;
# There are a few equivalent ways to access the values# corresponding to a given attribute in a set. f = (this: let s ="l"; in this."n"+ (getAttr "m" this) + this.k + this."${s}" );
};
in x.f(x)
9
Hm, that’s essentially a getter. How about setters?
Well, Nix is a lazy language: sets are immutable, so
our setter would need to return the updated set:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let x = {
n =3;
m =4;
set = (this: f: v: {
"${f}"= v;
m = this.m;
});
};
y = (x.set x "n"42);
in deepSeq y y
{ m = 4; n = 42; }
But repeating all fields is tedious (and error-prone: didn’t we just
forgot to copy our set for instance?), so we can
use the s1 // s2 operator : it creates a new set starting
from s1, and then add all entries from s2, replacing
existing ones:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let x = {
n =3;
m =4;
set = (this: f: v: this // {
"${f}"= v;
});
};
y = (x.set x "n"42);
in deepSeq y y
{ m = 4; n = 42; set = <LAMBDA>; }
The inherit keyword can help to include elements from another
set:
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let z = {
greet = w: "hello, "+w;
add = a: b: a + b;
};
a =3;
x = {
inherit a;
inherit (z) greet add;
n =3;
m =4;
set = (this: f: v: this //{
"${f}"= v;
});
};
y = (x.set x "n"42);
in deepSeq y y
{ a = 3; add = <LAMBDA>; greet = <LAMBDA>; m = 4; n = 42; set = <LAMBDA>; }
Exercise: We previously discussed three different implementations
of take n xs. The last one relied on foldl', and used a list of
two items as an accumulator. Try to re-implement that last version, but
using a set instead of a list as an accumulator.
This is a simple exercise to practice basic set manipulation, nothing fancy.
As you can see, the code is more compact/clearer. We'll rely on
this idea of using a set to encode much more sophisticated states
in later exercises.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let take = n: xs: (foldl' (acc: x:
if acc.m >= n then acc
else { m = (acc.m +1); r = acc.r ++ [x]; }
) { m =0; r = []; } xs).r;
xs = take 5 [12];
ys = take 9 [12345];
in trace(deepSeq xs xs)
trace(deepSeq ys ys)
"ok"
trace: [12]trace: [12345]"ok"
Exercise: Use sets to try to implement lazy lists,
as we previously did/attempted. Again, use either zeroes
or iterate, and take to break free from the infinite
recursion.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let nil =null;
isEmpty = xs: xs == nil;
cons = x: xs: {
car = x;
cdr = xs;
};
car = xs: xs.car;
cdr = xs: xs.cdr;
zeroes = cons 0 (zeroes);
# Actually we would need to reverse the output, or to use# append instead of cons (more expensive), but it's good# enough to demonstrate our point. take = n: xs: let aux = acc: xs: i:
if i == n then acc
else aux (cons (car xs) acc) (cdr xs) (i +1)
; in aux nil xs 0;
xs = take 5 zeroes;
in deepSeq xs xs
{ car = 0; cdr ={ car = 0; cdr ={ car = 0; cdr ={ car = 0; cdr ={ car = 0; cdr = null; }; }; }; }; }
As we’ve already discussed, the value printed by Nix corresponds
to the (single) value returned by the script. A “module” can
be considered as a script that evaluates to a set containing
a bunch of functions.
Such a set can then be imported from another .nix file via
import. One can import the whole
module, or just parts of it:
# Note that rec is mandatory hererec {
fib = n:
if n ==0|| n ==1then n
else fib(n - 1)+fib(n - 2)
;
}
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let m = (import./mod-fib.nix);
f = (import./mod-fib.nix).fib;
xs = [ (m.fib 10) (f 10) ];
in deepSeq xs xs
[5555]
We’ll see a few examples in the following section.
Strings, bytes
The next two articles will involve writing small parsers.
Thus, we’ll need to know how to process strings. But
Nix’s strings weren’t really designed for what we’re going to
inflict upon them.
So, we’ll start here by understanding their limitations, and
write a few functions to help us with more sophisticated
code in the future.
Exercise: I’ve previously pointed you to Nixpkgs'
lists.nix;
there’s also a
strings.nix,
containing a bunch of functions, and showing you how idiomatic
Nix’s strings handling is to be done. So have a look at it, don’t
hesitate to read Nix’s language documentation either.
Again, reading code really is key to an efficient progress.
Exercise: Now, we’d want to access the ASCII code
corresponding to a single character. For instance, if we want
to parse a number, say "12345", the common approach is
to rely on the ASCII codes, and to be able to detect whether
a character corresponds to a digit.
So write isNum c, which returns true iff the character
c corresponds to a digit, by first implementing two functions
toInt c and toChar i, which respectively return
the ASCII code corresponding to a character c, and the character
corresponding to the ASCII code i.
Write also a function isWhite c, which returns true iff
c is a white character (space, newline, etc.)
This is an easy exercise, compared to what we've
done earlier. The trick here is to understand how to do
such things idiomatically with Nix, which doesn't really
come equipped for such tasks.
The already mentioned
strings.nix,
from Nixpkgs, contains a method charToInt precisely
doing that kind of stuff.
Note that we don't really need to rely on ASCII codes,
even to implement a number parsing function; we could
simply associate the corresponding number to each digit
character directly.
We're reusing Nixpkg's idea of having a set to encode
the ASCII table. The main difference is that
we're splitting it into sub-tables, so as to categorize
characters. We could refine the split, and reach something
close to C's
ctype.h,
but we won't need it.
Note that rtable can for the most part be automatically
generated by a simple shell script. Heck, table can too.
There's no builtin way to directly index a single character. However, one can use
substring start len s.
Note that this is really a byte indexing, you won't
get unicode characters or anything fancy. Note also
that substring or stringLength
count bytes.
While such functions are rather trivial, their main
advantage is to improve readability. We'll use them later on,
so keep this module around.
with builtins;
with (import./ascii.nix);
rec {
charAt = s: n: substring n 1 s;
eat1 = s: substring 1 ((stringLength s) - 1) s;
skipWhites = s: n: if isWhite (charAt s n) then skipWhites s (n +1) else n;
}
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
with (import./strings.nix);
let s ="hello\t\t world";
in trace(charAt "hello"4)
trace(charAt "hello"5)
trace(eat1 "hhello")
# Raises an error; that's fine though# trace(charAt "hello" (-1)) trace(substring (skipWhites s 5) 5 s)
"ok"
trace: o
trace:
trace: hello
trace: world
"ok"
Exercise: Here’s an interesting exercise for an article entitled
“Pragmatism, laziness”.
The goal is to re-implement a delightfully simple recursive regular
expression matcher, written by Rob Pike
(Go, UTF-8, Sam, etc.)
for the book The Practice of Programming.
Brian Kernighan (Unix, AWK, etc.),
despite his decades of
work in computer science, and his considerable contributions to the field, said
of it:
I was amazed by how compact and elegant this code was when Rob Pike first
wrote it – it was much smaller and more powerful than I had thought possible
There’s little reason to doubt Kernighan’s humility: over-thinking is a
real problem in computer science, and the “best” solutions often are
deceptively simple. The same goes in other domains I guess: think how bothersome
it was not to have a zero in math, or how simple relativity’s
core principles are.
Parenthesis aside, the regular expression matcher should support:
* : the character preceding the star is expected to be matches zero
or more times; e.g. a* matches "", "aaaa" or "a";
. : matches any character; e.g. foo.*bar matches "foobar", or "foo!bar";
^ and $ : which are markers corresponding to beginning/end of line, e.g.
^foo or bar$ both match "foobar" but not "barfoo".
Here are a few tests on which you can try your implementation.
I would recommend to start validating simple cases, and progressively
add features. Having tests will help make sure that while
adding features, you're not breaking anything that used to work.
Bear in mind that it took Pike, an experienced professional programmer, a
hour or two to develop a solution, which amounts to about 30 lines of C.
The main idea behind Pike's design is to use some sort of
state machine,
where each state is encoded as a recursive function. One switch from
one state to another by calling another function, and we decide
to switch from one state to another depending on the current
character / position in the regexp.
"Backtracking" is naturally handled via recursivity.
OK so this is the last tip before giving you the original C code: here
are the functions/states (xs is the string to test and
re is the regexp to match):
matchstar y xs re;
matchhere xs re;
match xs re (entry point);
Perhaps it can be useful to make a little graph, draw the states,
and see what kind of conditions would require to switch from one
state to another, and thus, what kind of arrows would connect
the states.
/* match: search for regexp anywhere in text */intmatch(char*regexp, char*text)
{
if (regexp[0] =='^')
returnmatchhere(regexp+1, text);
do { /* must look even if string is empty */if (matchhere(regexp, text))
return1;
} while (*text++!='\0');
return0;
}
/* matchhere: search for regexp at beginning of text */intmatchhere(char*regexp, char*text)
{
if (regexp[0] =='\0')
return1;
if (regexp[1] =='*')
returnmatchstar(regexp[0], regexp+2, text);
if (regexp[0] =='$'&& regexp[1] =='\0')
return*text =='\0';
if (*text!='\0'&& (regexp[0]=='.'|| regexp[0]==*text))
returnmatchhere(regexp+1, text+1);
return0;
}
/* matchstar: search for c*regexp at beginning of text */intmatchstar(int c, char*regexp, char*text)
{
do { /* a * matches zero or more instances */if (matchhere(regexp, text))
return1;
} while (*text !='\0'&& (*text++== c || c =='.'));
return0;
}
To be fair, this is a re-write from Pike's code: my
original attempt (from memory) had a few bugs, and overall
could only be improved by making it closer to the original
code.
I've been using eat1 here, which is likely a bit less
efficient that relying on indices. The code though is a bit
clearer without having two additional parameters to each
function I guess.
On the other hand, using indices could help to build on this
matcher, for instance to grab all matches, and/or implement
a substitution mechanism.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
with (import./string/strings.nix);
let matchstar = y: xs: re: let x = charAt xs 0; inif x ==""thenfalseelseif matchhere xs re thentrueelseif x == y || y =="."then matchstar y (eat1 xs) re
elsefalse;
matchhere = xs: re:
let x = charAt xs 0;
r = charAt re 0;
s = charAt re 1;
inif r ==""thentrueelseif s =="*"then matchstar r xs (eat1 (eat1 re))
elseif r =="$"&& s ==""then x ==""elseif x !=""&& (r =="."|| r == x) then matchhere (eat1 xs) (eat1 re)
elsefalse ;
match1 = xs: re:
let# Note that charAt returns "" when there's not enough bytes x = charAt xs 0;
r = charAt re 0;
inif x ==""&& r !=""thenfalseelseif matchhere xs re thentrueelse match1 (eat1 xs) re
;
match = xs: re: let r = charAt re 0; inif r =="^"then matchhere xs (eat1 re)
else match1 xs re;
xmatch = s: re: if match s re then"true"else"false";
in trace(''match("", ""): ${xmatch """"}'')
trace(''match("foobar", ""): ${xmatch "foobar"""}'')
trace(''match("foobar", "foo"): ${xmatch "foobar""foo"}'')
trace(''match("foobar", "bar"): ${xmatch "foobar""bar"}'')
trace(''match("foobar", "baz"): ${xmatch "foobar""baz"}'')
trace(''match("barbaz", "baz"): ${xmatch "barbaz""baz"}'')
trace(''match("barbaz", ".ar"): ${xmatch "barbaz"".ar"}'')
trace(''match("barbaz", "ar."): ${xmatch "barbaz""ar."}'')
trace(''match("aaa", "b*"): ${xmatch "aaa""b*"}'')
trace(''match("aaa", "b.*"): ${xmatch "aaa""b.*"}'')
trace(''match("foobar", "b.*"): ${xmatch "foobar""b.*"}'')
trace(''match("foobar", "o.*a"): ${xmatch "foobar""o.*a"}'')
trace(''match("foobar", "^foo"): ${xmatch "foobar""^foo"}'')
trace(''match("foobar", "^bar"): ${xmatch "foobar""^bar"}'')
trace(''match("foobar", "oob$"): ${xmatch "foobar""oob$"}'')
trace(''match("foobar", "bar$"): ${xmatch "foobar""bar$"}'')
trace(''match("foobar", "o.*z*"): ${xmatch "foobar""o.*z*"}'')
trace(''match("foobar", "o.*z.*"): ${xmatch "foobar""o.*z.*"}'')
trace(''match("fo^bar", "o^b"): ${xmatch "fo^bar""o^b"}'')
trace(''match("foobar", "o^b"): ${xmatch "foobar""o^b"}'')
trace(''match("foobar", "o$b"): ${xmatch "foobar""o$b"}'')
"ok"
A common exercise in Lisp education is to
write a small object system using closures,
where “object” is to be understood
as in your typical mainstream programming language, wrapping
variables and methods on such variables (not in the
Smalltalk sense of message passing). This is
all rather similar to what we’ve done
earlier.
For instance in the SICP §3.1, a basic “object” is
defined as a closure whose argument corresponding to a method to
be called on this “object”. Upon calling some methods, they may
alter the underlying object: Scheme/Lisp are impure languages,
meaning, there’s a notion of altering a variable. While we can’t
do that with Nix, you should now have some workarounds in mind.
Note: While the SICP sticks to a very simple example,
it’s actually possible to build very refined object systems in
Scheme/Lisp by leveraging their macro systems, hence allowing
for instance to express inheritance, class/instance variables, etc.
This is again a feature we don’t have access to with Nix, at
least directly.
Exercise: Try to implement such a small “mover”
object. It should have two instance variables: a string and a
pointer to that string (an integer), and the following methods:
prev: move one byte backward;
next: move one byte forward;
get: retrieve current byte;
setp q: set the pointer to the (integer value) q;
tail: returns the string located after the current
position of the pointer.
Finally, write a chain o xs method allowing to iteratively
call the methods described in the list xs, using an object
o as the first target.
The goal of this exercise is mainly to prepare you for the next
article's exercises. While we won't be using this piece of code,
we'll often rely on a similar technique.
Because Nix is an impure language, we cannot alter the "state
variables", directly. The key idea here then is that, as we've already
hint upon earlier in this article a few times, every "method"
must return a new object, corresponding to the altered
object, in addition to its optional, other, return values.
This require you to define some ad-hoc, arbitrary conventions, to
encode such peculiar output. You also have some freedom of choice
regarding chain's input. More
generally, I could have constrained the exercise a bit more, but
I think it's good that you have room to experiment.
#!/usr/bin/env -S nix-instantiate --evalwith builtins;
let apply = f: l: foldl' (x: y: x y) f l;
charAt = s: n: substring n 1 s;
sAfter = s: n: substring n ((stringLength s) - n) s;
fst = xs: elemAt xs 0;
snd = xs: elemAt xs 1;
isList = xs: typeOf xs =="list";
wrap = x: if!isList x then [x] else x;
mkStep = o: r: { o = o; r = r; };
mkWith = f: s: p: mkStep (mkMover s p) (f s p);
mkMover = s: p: c:
if c =="prev"then mkWith charAt s (p - 1)
elseif c =="next"then mkWith charAt s (p +1)
elseif c =="get"then mkWith charAt s p
elseif c =="setp"then mkWith charAt s
elseif c =="after"then mkWith sAfter s p
elsethrow"unknown command ${c}";
# Code could be simpler were we not trying to# keep track of intermediate results. chain = o: xs: let go = rs: n: o:
if n == length xs then mkStep o rs
elselet x = elemAt xs n;
ret = apply (o (fst x)) (snd x);
in go (rs ++ [ret.r]) (n +1) ret.o
; in go [] 0 o;
# Convenience fmtMoves = map (x: let cmd =if isList x then fst x else x;
args =if! (isList x) || length x ==1then [] else wrap (snd x);
in [cmd args]);
A = chain (mkMover "hello world"0) (fmtMoves [
"get""next""next""next""next""prev" ["setp"6] "after" ]);
in trace(deepSeq A.r A.r)
"ok"
trace: ["h""e""l""l""o""l""w""world"]"ok"
Conclusion
We’re now ready to start writing more interesting pieces of code. In the
following article,
we’re going to study a few simple parsers/interpreters (Brainf*ck and two mathematical
expressions interpreters), where we’ll use techniques that we’ll just developed in
the current article.
We’ll then conclude this series with one
last article, focusing on
writing a λ-calculus interpreter, where we’ll apply what we’re about to learn
about parsing, and what we’ve already learnt about closures.
Comments
By email, at mathieu.bivert chez: