2009-12-24

Functional Macramé

What is the meaning of an expression like

let rec e = f … e …

known Haskell is as "knot-tying"? As Lloyd Allison explains:

A circular program creates a data structure whose computation depends upon itself or refers to itself. The technique is used to implement the classic data structures circular and doubly-linked lists, threaded trees and queues, in a functional programming language. These structures are normally thought to require updatable variables found in imperative languages…

But first, a bit of motivation. OCaml allows for the construction of cyclic values going through a data constructor. For instance, the following is legal:

let rec ones = 1 :: ones in ones

as the expression is a cons cell. This can be extended naturally to mutually recursive values. For instance, a grammar can be built with:

type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol list list

let arith_grammar =
  let rec s  = Nonterminal [[add]]
  and add    = Nonterminal [[mul]; [add; Terminal "+"; mul]]
  and mul    = Nonterminal [[term]; [mul; Terminal "*"; term]]
  and term   = Nonterminal [[number]; [Terminal "("; s; Terminal ")"]]
  and number = Nonterminal [[digit]; [digit; number]]
  and digit  = Nonterminal (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in s

This is a perfectly valid recursive value in OCaml and it is a direct translation from the code in this article. The problem is that an implementation of the Omega monad for breadth-first enumeration of a list of lists requires laziness in an essential way. I'll use a stub (mock?) sequence type:

module Seq = struct
  type 'a cons = Nil | Cons of 'a * 'a t
   and 'a t  = 'a cons Lazy.t
  let rec map f q = lazy (match Lazy.force q with
  | Nil -> Nil
  | Cons (x, q) -> Cons (f x, map f q))
  let rec of_list l = lazy (match l with
  | [] -> Nil
  | x :: xs -> Cons (x, of_list xs))
end

Unfortunately this doesn't work:

type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol Seq.t Seq.t

let rule ss = Nonterminal (Seq.map Seq.of_list (Seq.of_list ss))

let arith_grammar =
  let rec s  = rule [[add]]
  and add    = rule [[mul]; [add; Terminal "+"; mul]]
  and mul    = rule [[term]; [mul; Terminal "*"; term]]
  and term   = rule [[number]; [Terminal "("; s; Terminal ")"]]
  and number = rule [[digit]; [digit; number]]
  and digit  = rule (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in s

The dreaded "This kind of expression is not allowed as right-hand side of `let rec'" error raises its ugly head: rule is a function, not a constructor, and OCaml rightly complains that it cannot lazily evaluate a bunch of strict definitions involving computation. In a lazy language, in contrast, the right-hand-side expression is not evaluated until it is needed. So, again, what is the meaning of an expression like

let rec e = f … e …

When the value of e is required, the function f is called with an unevaluated e. If it doesn't use it, the result is well-defined; if it does, this results in a recursive call to f. In a strict language we must be explicit in the delaying and forcing of thunks:

let f' … e … =
  …
  if e_is_needed then … Lazy.force e …
  …
in
let rec e = lazy (f' … e …)

Alas, if f' itself is lazy, the expected code won't work:

let f' … e … = lazy (
  …
  if e_is_needed then … Lazy.force e …
  …
)
in
let rec e = f' … e …

because lazy works syntactically as a constructor in OCaml, again we're told that "This kind of expression is not allowed as right-hand side of `let rec'". This means that we cannot use knot-tying with lazy abstract data types like infinite lists and streams without going explicitly through lazy.

Stepping back and taking a little distance from the problem at hand, let's revisit Allison example of circular lists. He gives essentially this example:

let circ p f g x =
  let rec c = build x
  and build y = f y :: if p y then c else build (g y)
  in c

(which is equivalent to an unfold followed by a knot-tying). This unsurprisingly doesn't work, but using lazy as outlined above does:

let circ p f g x =
  let rec c = lazy (build x)
  and build y = Seq.Cons (f y, if p y then c else lazy (build (g y)))
  in c

and the knot is tied by actually making a reference to the value as desired:

let x = circ (fun _ -> true) id id 0 in let Seq.Cons (_, y) = Lazy.force x in x == y ;;
- : bool = true

A bit of lambda-lifting to make the binding and value recursion distinct and separate gives:

let circ p f g x =
  let rec build y c = Seq.Cons (f y, if p y then c else lazy (build (g y) c)) in
  let rec c = lazy (build x c) in c

This hints at what appears to be a limitation of strict languages, namely that circular computations seem to require explicit binding management in an essential way, either imperative like in this code or by using a method like Dan Piponi's Löb functor. Applying this technique to our grammar makes for tedious work: all the mutually recursive references must be lambda-lifted, and the knot tied simultaneously through lazy:

let rule ss = Lazy.force (Seq.map Seq.of_list (Seq.of_list ss))

let arith_grammar =
  let make_expr   exp add mul trm num dig = rule [[add]]
  and make_add    exp add mul trm num dig = rule [[mul]; [add; Terminal "+"; mul]]
  and make_mul    exp add mul trm num dig = rule [[trm]; [mul; Terminal "*"; trm]]
  and make_term   exp add mul trm num dig = rule [[num]; [Terminal "("; exp; Terminal ")"]]
  and make_number exp add mul trm num dig = rule [[dig]; [dig; num]]
  and make_digit  exp add mul trm num dig = rule (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in let
  rec exp = Nonterminal (lazy (make_expr   exp add mul trm num dig))
  and add = Nonterminal (lazy (make_add    exp add mul trm num dig))
  and mul = Nonterminal (lazy (make_mul    exp add mul trm num dig))
  and trm = Nonterminal (lazy (make_term   exp add mul trm num dig))
  and num = Nonterminal (lazy (make_number exp add mul trm num dig))
  and dig = Nonterminal (lazy (make_digit  exp add mul trm num dig)) in
  exp

This can become unworkable pretty quickly, but is a solution! Note that the type of sequences forces me to use an explicit evaluation discipline: rule must return an evaluated expression, but the evaluation itself is delayed inside the Nonterminal constructor.

Allison's paper ends with an alternative for strict imperative languages like Pascal: using an explicit reference for the circular structure, something like this:

let f' … e … =
  …
  if e_is_needed then … Ref.get e …
  …
in
let e_ref = Ref.make () in
let e = f' … e_ref … in
Ref.set e_ref e

where Ref.make has type unit -> 'a, that is, it is magic. Unfortunately, Xavier Leroy himself stated that Obj.magic is not part of the OCaml language :-) (although a quick look shows many a would-be apprentice at work). And in this case it is true that no amount of magic wold make this work in the general case since e_ref must refer to an otherwise dummy value of the appropriate type which gets overwritten with the final result, in effect reserving memory of the necessary size. In specific cases, however, this can be made to work with a bit of care.

Merry Christmas!

No comments: