2010-09-14

Parametric Modular Programming

Generic programming as a concept popularized by Stepanov's STL is concerned with the uniform expression of algorithms decoupled from the specific data structures they operate on. By parametric modular programming I mean the uniform expression of a particular algorithm decoupling it from the specific shape of or rules obeyed by the objects it manipulates. It resolves the same driving forces behind the GoF Strategy pattern but statically, at the time of compilation. As an example, it allows me to do the following:

# T.enumerate_to T.is_connected 10 ;;
- : int list = [1; 1; 1; 3; 4; 12; 24; 66; 159; 444]

(cf A070765)

# P.enumerate_to P.is_connected 10 ;;
- : int list = [1; 1; 2; 5; 12; 35; 107; 363; 1248; 4460]

(cf A000104)

# H.enumerate_to H.is_connected 10 ;;
- : int list = [1; 1; 3; 7; 22; 81; 331; 1435; 6505; 30086]

(cf A018190) with the same implementation by just varying the underlying logic. I estimate that I spent a quarter of the total effort to extend my old code to enumerate polyominos actually refactoring and generalizing the algorithm, with most of the effort trying to come up with a good topology for the polyiamonds. I've used the technique in the past and the only mildly nontrivial lesson I take away from it is that coming up with and enforcing good logical invariants for the parameters is more crucial than having a uniform interface for them. This is where almost all the creativity goes into.

For this program, I started with the same foundation I used the last time. I chose a functional, point-free style supported by the use of functional sets, for which I need a number of utility functions:

let (%) f g x = f (g x)
let id x = x
let const k _ = k

let rec nest n f x =
  if n == 0 then [ ] else
  if n == 1 then [x] else
  x :: nest (pred n) f (f x)

and extensions to the standard Set functor:

module FSet (O : Set.OrderedType) = struct
  include Set.Make(O)
  let collect f s = fold (union % f) s empty
  let map     f s = fold (add   % f) s empty
  let of_list l   = List.fold_right add l empty
end

collect is akin to monadic bind, except for the fact that sets do not form monads (but note that indeed map f ≡ collect (singleton % f)). By definition, ominos are connected finite subsets of the Platonic tilings of the plane obeying certain further restrictions. As such, they underlie a lattice on ℝ². By a suitable change of coordinates this lattice can be taken as ℤ² or a subset thereof. Hence I represent cells by their integer coordinates:

module Cell = struct
  type cell = int * int

Cell sets are lexicographically ordered by coordinate:

  module Set = FSet(struct
    type t = cell
    let compare (i, j) (i', j') =
      let c = compare j j' in
      if c <> 0 then c else
      compare i i'
    end)

I need a small number of generic operations on cells or on their sets:

  let origin = (0, 0)
  let of_points = Set.of_list
  let to_points = Set.elements

  let shift (i, j) (di, dj) = (i + di, j + dj)

  let extents cells =
    Set.fold (fun (i, j) (mi, mj, xi, xj) ->
      (min mi i, min mj j, max xi i, max xj j)
    ) cells (max_int, max_int, min_int, min_int)

The simplest transform that "normalizes" a set of cells into a coordinate-independent set is a translation that takes its axis-aligned bounding box to the origin:

  let normalize cells =
    let (mi, mj, _, _) = extents cells in
    Set.map (shift (-mi, -mj)) cells

This will not work if the underlying lattice has more structure than ℤ², but it's a minimal initial default. Finding a connected component of a set of cells requires an adjacency structure, here given by the function expand:

  let find_component expand cells =
    let neighbors = Set.inter cells % expand in
    let rec visit pending visited =
      if Set.is_empty pending then visited else
      let cell      = Set.choose pending in
      let pending   = Set.remove cell pending in
      let unvisited = Set.diff (neighbors cell) visited in
      visit (Set.union unvisited pending) (Set.add cell visited)
    in visit (Set.singleton (Set.choose cells)) Set.empty
end

This is a standard graph search but made non-deterministic by the choice of a set to represent the visit schedule. This ends the module. Now the structure of the omino is represented by a module signature giving both the geometric and the topological structure of the plane graph:

module type TOPO = sig
  val origin     : Cell.cell
  val neighbors  : Cell.cell  -> Cell.Set.t
  (* The exterior of a cell must include its neighborhood *)
  val exterior   : Cell.cell  -> Cell.Set.t
  (* Transform a set of cells in general position to a canonical position *)
  val normalize  : Cell.Set.t -> Cell.Set.t
  (* Plane symmetries, that is, Dn *)
  val symmetries : (Cell.cell -> Cell.cell) list
end

The comments hint at the algebraic properties that the implementations must obey. neighbors gives the connectivity of the omino, while exterior is required to conform to it by returning a superset of the former. normalize choses a representative from the class of ominos translated around the lattice. symmetries gives the plane symmetries of the underlying shape. Every implementation must make sure that these symmetries are dihedral Dn. The simplest way to do this is to show that they are finitely represented by two generators rot and flip, that is, that rotnflip² ≡ rot % flip % rot % flipid. With this, the generation of the actual ominos is straightforward and concise. An omino is, of course, a set of canonical tiles obeying the given topology:

module Omino (T : TOPO) = struct
  type omino = Cell.Set.t

Omino sets are trivial:

  module Set = FSet(struct
    type t = omino
    let compare = Cell.Set.compare
  end)

The symmetries of an omino are given by the symmetries of the underlying shape, modulo translations. Given that ominos are ordered as sets, their sets are canonical (since OCaml's Sets are ordered) and choose-ing one of them as representative of the equivalence class is arbitrary but deterministic:

  let symmetries cells =
    Set.of_list (List.map (fun f -> T.normalize (Cell.Set.map f cells)) T.symmetries)

  let canon = Set.choose % symmetries

This is the crucial difference between ominos and cell sets: the former have more structure as given by the ADT:

  let of_points = canon % Cell.of_points

In other words, canon has type Cell.Set.t → Omino.omino. To extend an omino we consider all the exterior neighbors of each of their cells that make the resulting omino obey a given predicate:

  let extensions (p : omino -> bool) omino =
    let candidates cell     = Cell.Set.diff (T.neighbors cell) omino
    and extend     cell     = canon (Cell.Set.add cell omino) in
    let extend_if  cell set =
      let omino = extend cell in
      if p omino then Set.add omino set else set
    in
    let extend_at  cell set = Cell.Set.fold extend_if (candidates cell) set in
    Cell.Set.fold extend_at omino Set.empty

  let extend_set p = Set.collect (extensions p)

The code is to be read as English, except perhaps for extend_at which accumulates into set the valid extensions of omino around cell.

I'm particularly interested in simple ominos. By definition, an omino is simple or of genus 0 iff its exterior is connected:

  let is_connected omino =
    let exterior = Cell.Set.diff (Cell.Set.collect T.exterior omino) omino in
    Cell.Set.equal exterior (Cell.find_component T.neighbors exterior)

Generating and enumerating the ominos is trivial:

  let monomino         = of_points [T.origin]
  let ominos_to    p n = nest n (extend_set p) (Set.singleton monomino)
  let enumerate_to p   = List.map Set.cardinal % ominos_to p
end

This was the easy part, as the code I had was essentially correct and just required refactoring (as a side benefit, I think it has greatly gained in clarity. Compare with the old version). The next part is to provide the topological structures for each kind of omino. The polyominos fall out of the old implementation:

module PolyominoTopo = struct
  let origin     = Cell.origin
  let normalize  = Cell.normalize
  let neighbors  =
    let deltas = [1, 0; 0, 1; -1, 0; 0, -1]
    in fun cell -> Cell.Set.of_list (List.map (Cell.shift cell) deltas)
  let exterior   =
    let deltas = [1, 0; 1, 1; 0, 1; -1, 1; -1, 0; -1, -1; 0, -1; 1, -1]
    in fun cell -> Cell.Set.of_list (List.map (Cell.shift cell) deltas)

  (* D4: rot^4 = flip^2 = rot.flip.rot.flip = id *)
  let rot  (i, j) = (-j,  i)
  let flip (i, j) = ( i, -j)
  let symmetries = [
    id; rot; rot % rot; rot % rot % rot;
    flip; rot % flip; rot % rot % flip; rot % rot % rot % flip
  ]
end

Polyominos live exactly in ℤ², so I can use the default behavior in Cell. The neighbors of a cell form a Von Neumann neighborhood, and its exterior forms a Moore neighborhood. It's trivial to check that rot and flip generate D₄. Interestingly enough, under a suitable change of coordinates (given by the basis {(√3, 1)/2, (0, 1)}) hexominos also live in ℤ², so that each cell is oriented such that the Y axis is a double apothegm:

module HexominoTopo = struct
  let origin     = Cell.origin
  let normalize  = Cell.normalize
  let neighbors  =
    let deltas = [1, 0; 0, 1; -1, 1; -1, 0; 0, -1; 1, -1]
    in fun cell -> Cell.Set.of_list (List.map (Cell.shift cell) deltas)
  let exterior   = neighbors

  (* D6: rot^6 = flip^2 = rot.flip.rot.flip = id *)
  let rot  (i, j) = (-j, i + j)
  let flip (i, j) = ( j, i    )
  let symmetries = [
    id; rot; rot % rot; rot % rot % rot;
      rot % rot % rot % rot; rot % rot % rot % rot % rot;
    flip; rot % flip; rot % rot % flip; rot % rot % rot % flip;
      rot % rot % rot % rot % flip; rot % rot % rot % rot % rot % flip;
  ]
end

Each hexagonal cell is much more connected than square or triangular cells, to the point that the neighbors of a cell are exactly its exterior. The basis rotation took me some time to discover, but again, checking that rot and flip generate D₆ is very easy. Now for something completely different.

Polyiamonds are in a sense dual to hexominos, in more than one way: either each hexagon corresponds to one triangle or to six. In the first case alternating vertices form one half of the tiling, the other half corresponding to opposing vertices where three hexagons meet. Conway calls them deep and shallow vertices in connection with spherical lattices. Since lattice points correspond to shape centers, this requires a direct sum of the hexagonal lattice with itself shifted by ⅓. Rescaling by 3, we have the subset of ℤ² where coordinates ij ≡ 0 or 1 (mod 3); this does not form a lattice (as 2×(3i + 1, 3j + 1) doesn't belong to it). Crucially, the underlying symmetry group is D₃ which keeps deep and shallow triangles (i.e., those pointing left or right) separated, missing out on mirror symmetries. The alternative is to subdivide each hexagon into six triangles and rescale by 3. The resulting subset of ℤ² has coordinates ij ≡ 1 or 2 (mod 3). Again, this is not a lattice but a direct sum of two lattices, but the underlying symmetry is D₆ with transforms of odd order switching between one copy and the other.

The upshot of this disquisition is that, since the triangular "lattice" is formed of two copies or halves, the topology must be aware of the half each shape lives in. Since machine arithmetic is not modular arithmetic, I need a helper function:

module TriominoTopo = struct
  let rem x y =
    let r = x - y * (x / y) in
    if r >= 0 then r else if y >= 0 then r + y else r - y

This is the Euclidean remainder of x and y which is always in [0, |y|). With this I can tell copies apart:

  let is_deep (i, j) =
    let c = rem i 3 in
    assert (c = rem j 3);
    match c with
    | 1 -> true
    | 2 -> false
    | _ -> assert false

The predicate is a good place to make extra careful that I don't make any errors in the following. The origin cannot be (0, 0) since it doesn't belong to either copy. Thus, the monomino must be one of (1, 1) and its translates or (-1, -1) and its translates. I choose arbitrarily the former:

  let origin = (1, 1)

Since -1 ≡ 2 (mod 3) and vice-versa, the deltas for shallow points are the negatives of those for deep points:

  let neighbors =
    let deltas = [1, 1; -2, 1; 1, -2]
    in fun (i, j as c) -> if is_deep c
      then Cell.Set.of_list (List.map (fun (di, dj) -> (i + di, j + dj)) deltas)
      else Cell.Set.of_list (List.map (fun (di, dj) -> (i - di, j - dj)) deltas)

  let exterior =
    let deltas = [
      -2, 1; -3, 0; -2, -2; 0, -3; 1, -2; 3, -3;
      4, -2; 3, 0; 1, 1; 0, 3; -2, 4; -3, 3
    ] in fun (i, j as c) -> if is_deep c
      then Cell.Set.of_list (List.map (fun (di, dj) -> (i + di, j + dj)) deltas)
      else Cell.Set.of_list (List.map (fun (di, dj) -> (i - di, j - dj)) deltas)

Since the axis-aligned bounding rectangle of a set of centers doesn't necessarily have opposite vertices in either lattice (minimal example: {(1, 1), (-1, 2)}), normalizing a set of cells is a bit delicate. Given that cells are totally ordered, their sets have a unique minimum element which can be taken as the origin, either (1, 1) or (-1, -1) depending on whether it is deep or shallow:

  let normalize cells =
    let (di, dj as c) = Cell.Set.min_elt cells in
    if is_deep c
      then Cell.Set.map (Cell.shift (-di + 1, -dj + 1)) cells
      else Cell.Set.map (Cell.shift (-di - 1, -dj - 1)) cells

The symmetries can be borrowed from the underlying hexagonal lattice:

  let symmetries = HexominoTopo.symmetries
end

It only remains to instantiate the ominos with their corresponding topologies:

module P = Omino(PolyominoTopo)
module H = Omino(HexominoTopo)
module T = Omino(TriominoTopo)

As you can see, the intelligence is neatly segregated to different modules. The price to pay is that knowledge of the data structures (in this case, sets of cells) is shared among them, and changing representations must affect the entire program. Also, and perhaps more problematic is that the enumeration doesn't take into account any special characteristics of the underlying topology except its connectivity structure. Knowing that there are many neighbors of an hexagon requires a refined pruning strategy for searching hexominos; this code doesn't know about that and it's possible that it cannot know, in principle. All in all I'm very satisfied with the result.

2010-09-01

Rewriting the Rules

The last article on pa_do sparked an interesting conversation with its author, Christophe Troestler, and reader bluestorm. It's true that pa_infix can establish optional rewrite rules for operators as a side-effect (or as an "added bonus" in the creator's words) of it installing new symbols with given arity, precedence and associativity. Given that rewrite rules are purely syntactical artifacts, there is no function associated with an operator unless explicitely defined. Of course, once defined they can be re-defined, and it is responsability of authors and users to keep both semantics (reduction and rewrite) in sync.

On the other hand, rewrite rules open the door to interesting optimizations before and beyond what the compiler sees and can perform. In the case of an extension for functional application and composition, the optimization is obvious: the identity function can be stripped when encountered in an applicative expression. Here is the syntax extension pa_compose.ml with these built-in optimizations (Edit: I've expanded a bit the check for the identity to catch references to the qualified module):

open Camlp4.PreCast
open Pa_do
open Pa_infix
module L = Level

let is_id = function
| <:ident< $uid:"Compose"$.$lid:"id"$ >>
| <:ident<                 $lid:"id"$ >> -> true
| _                                      -> false

let () =
  let expr x f _loc = match f with
  | <:expr< $id:f$ >> when is_id f -> x
  | f -> <:expr< $f$ $x$ >>
  in infix "|>" ~expr (L.binary (L.Higher L.assignment) ~assoc:L.LeftA)

let () =
  let expr f x _loc = match f with
  | <:expr< $id:f$ >> when is_id f -> x
  | f -> <:expr< $f$ $x$ >>
  in infix "%%" ~expr (L.binary (L.Higher L.assignment) ~assoc:L.RightA)

let () =
  let expr f g _loc = match f, g with
  | <:expr< $id:f$ >>, g when is_id f -> g
  | f, <:expr< $id:g$ >> when is_id g -> f
  | f, g ->
    let x = Delimited_overloading.new_lid () in
    <:expr< fun $lid:x$ -> $f$ ($g$ $lid:x$) >>
  in infix "%" ~expr (L.binary (L.Higher L.disjunction) ~assoc:L.RightA)

The great thing about CamlP4 is that it extends the syntax of OCaml to allow for writing matchings on the AST directly. The rules match explicitly on the lowercase identifier "id" and optimize away the application or the composition. In order to give executable semantics to the rewritten syntax, it is necessary to provide an implementation for the operators. Here is compose.ml:

external id : 'a -> 'a = "%identity"
let ( |> ) x f   = f x
let ( %% ) f x   = f x
let ( %  ) f g x = f (g x)

and here is its interface, compose.mli:

external id : 'a -> 'a = "%identity"
val ( |> )  : 'a -> ('a -> 'b) -> 'b
val ( %% )  : ('a -> 'b) -> 'a -> 'b
val ( %  )  : ('b -> 'c) -> ('a -> 'b) -> ('a -> 'c)

Both are utterly trivial, but necessary in order that the following:

let test9 () =
  [succ; succ; pred; id; succ] |> List.fold_left (%) id

works as intended. The following program, use.ml is a non-exhaustive test of the syntax extension:

(* Test of composition operators *)
open Compose

let test1 () =
  [1; 2; 3] |> List.map succ

let test2 () =
  List.map succ %% [1; 2; 3]

let test3 () =
  let x = ref 7 in
  x := !x |> succ

let test4 () =
  let x = ref 7 in
  x := succ %% !x

let test5 () =
  [1; 2; 3] |> List.map succ |> List.fold_left (+) 0 |> succ

let test6 () =
  succ %% List.fold_left (+) 0 %% List.map succ %% [1; 2; 3]

let test7 () =
  succ % List.fold_left (+) 0 % List.map succ %% [1; 2; 3]

let test8 () =
  [1; 2; 3] |> succ % List.fold_left (+) 0 % List.map succ

let test9 () =
  [succ; succ; pred; id; succ] |> List.fold_left (%) id

let test10 () =
  succ % Compose.id %% 10

let test11 () =
  id % succ %% 10

let test12 () =
  id % succ % id % pred %% 10

let test13 () =
  10 |> id

let test14 () =
  id %% 10

let test15 () =
  id % Compose.id % id %% 10

let () = Printf.printf "%d (should be 9)\n" (test9 () 7)

A quick-and-dirty Makefile to build the extension, its library, and the test program (provided you have findlib and pa_do installed, of course):

OCAMLC=    ocamlfind c   -w A
OCAMLOPT=  ocamlfind opt -w A -inline 10
PA_DO=     -package pa_do -syntax camlp4o

EXE=       .exe
PROG=      use$(EXE) use.opt$(EXE)
PACKAGE=   META compose.cmi compose.cmo compose.cmx pa_compose.cmo

ALL: $(PROG)

use$(EXE): compose.cmo use.cmo
    $(OCAMLC) -o $@ $^
use.opt$(EXE): compose.cmx use.cmx
    $(OCAMLOPT) -o $@ $^

pa_compose.cmo: pa_compose.ml
    $(OCAMLC) -c $(PA_DO) -ppopt q_MLast.cmo $<

pa_compose.cmx: pa_compose.ml
    $(OCAMLOPT) -c $(PA_DO) -ppopt q_MLast.cmo $<

use.cmo: use.ml
    $(OCAMLC) -c $(PA_DO) -ppopt pa_infix.cmo -ppopt pa_compose.cmo $<

use.cmx: use.ml
    $(OCAMLOPT) -c $(PA_DO) -ppopt pa_infix.cmo -ppopt pa_compose.cmo $<

use.pre.ml: pa_compose.cmo use.ml
    camlp4 -o $@ -parser o -parser op -printer o -I `ocamlfind -query pa_do` \
        pa_do.cmo pa_infix.cmo pa_compose.cmo use.ml

# Dependencies

compose.cmo: compose.cmi
compose.cmx: compose.cmi
pa_compose.cmo:
pa_compose.cmx:
use.cmo: pa_compose.cmo compose.cmi
use.cmx: pa_compose.cmo compose.cmi

# Phony targets

clean:
    rm -f *.cm* *.o *.obj *~

distclean: clean
    rm -f use.pre.ml $(PROG)

install: $(PACKAGE)
    ocamlfind install pa_compose $(PACKAGE)

uninstall:
    ocamlfind remove pa_compose

# Rules

.SUFFIXES: .cmo .cmi .cmx .ml .mli

.ml.cmo:
    $(OCAMLC) -c $<

.mli.cmi:
    $(OCAMLC) -c $<

.ml.cmx:
    $(OCAMLOPT) -c $<

(replace the spaces by tabs!) And a META file to package it as a ocamlfind extension:

name = "pa_compose"
version = "1.0"
description = "Rewrite rules for functional composition"
requires = "camlp4 pa_do pa_do.infix"
archive(syntax) = "@pa_do/pa_infix.cmo pa_compose.cmo"
archive(byte,toploop) = "@pa_do/pa_infix.cmo compose.cmo pa_compose.cmo"
archive(byte) = "compose.cmo"
archive(native) = "compose.cmx"

If you build the pre-processed example with make use.pre.ml, you'll see how the ids get rewritten away. Once built and installed, it can be directly used from the top-level:

$ ocaml
        Objective Caml version 3.12.0

Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

/usr/local/lib/ocaml/str.cma: loaded
/usr/local/lib/ocaml/dynlink.cma: loaded
# #require "pa_compose";;
C:/cygwin/usr/local/lib/ocaml/camlp4: added to search path
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_do: added to search path
/usr/local/lib/ocaml/camlp4/camlp4o.cma: loaded
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_do/pa_do.cmo: loaded
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_do/pa_do_top.cma: loaded
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_compose: added to search path
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_do/pa_infix.cmo: loaded
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_compose/compose.cmo: loaded
C:/cygwin/usr/local/lib/ocaml/site-lib/pa_compose/pa_compose.cmo: loaded
        Camlp4 Parsing version 3.12.0


# 3 |> succ |> succ |> id |> pred |> id |> succ ;;
- : int = 5

Of course, the functions themselves exist:

# open Compose ;;
# id ;;
- : 'a -> 'a = <fun>
# ( % ) ;;
- : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
# ( |> ) ;;
- : 'a -> ('a -> 'b) -> 'b = <fun>

Happy rewriting!