2007-10-05

(Re)Visiting Expressions

Functional languages in the ML family (y compris Haskell) lend themselves naturally to interpreting the abstract syntax tree of a domain-specific language, since the former is quite naturally represented with an algebraic data type. You will see this phenomenon everywhere; to the point that I think that Greenspun's Tenth is not really an axiom but a corollary to the rather plain fact that destructuring an ADT amounts to interpreting it.

The analogous way to do this in an object-oriented language is to use the Interpreter and Visitor patterns in conjunction. I'll develop here a mini-language in Java, and I'll abstract out the different traversal strategies over both the expression and value trees by using Visitors, which will be generic on the type of the result generated.

Every language is contingent. Instead of inventing one I'll simply take Lennart Augustsson's domain-specific language of expressions:

E ::= n ∈ Z | E + E | EE | EE : E

that is, an expression can be an integer constant, an addition, a comparison or a conditional. The result of evaluating such an expression is a value, either integer or Boolean:

V ::= n ∈ Z | b ∈ B

Visitors are used to parameterize traversals by decoupling of the traversal logic from the structure of the tree. That is, instead of adding methods to each class in a composite hierarchy, we plug in different strategies to a single interface.

Note: in traditional Object-Oriented languages the Visitor pattern calls for the accumulation of intermediate results in an instance variable; here, we'll take advantage of parametric polymorphism in Java to apply stateless Visitors whenever possible.

In our little language, a value is an abstract type that accepts a value Visitor:

abstract class Val {
  public abstract <T> T accept(ValVisitor<T> visitor);

This Visitor will produce a result of a generic type, hence the method must be generic on the type parameterizing it. The possible concrete values are integer and Boolean values:

  public static final class Int extends Val {
    public final int value;
    private Int(int value) { this.value = value; }
    public <T> T accept(ValVisitor<T> visitor) {
      return visitor.visitInt(this);
    }
  }

  public static final class Bool extends Val {
    public final boolean value;
    private Bool(boolean value) { this.value = value; }
    public <T> T accept(ValVisitor<T> visitor) {
      return visitor.visitBool(this);
    }
  }

Both are inner to Val and neither can be extended or instantiated directly. They both accept the Visitor by dispatching to the appropriate method, and forcing it to recur into their structure.

These are all the possibilities for Val, so we seal it against extension; also, we have to provide the static factories to allow our code to create values:

  private Val() { }

  public static Val ofInt(int value) { return new Int(value); }
  public static Val ofBoolean(boolean value) { return new Bool(value); }
}

This finishes the encoding of values. Now, a value Visitor must in turn discriminate between value types:

interface ValVisitor<T> {
  T visitInt(Val.Int value);
  T visitBool(Val.Bool value);
}

As a concrete example, this is a simple value printer implemented as a Visitor:

final class ShowVal implements ValVisitor<Void> {
  public Void visitInt(Val.Int value) {
    System.out.printf("%d : int", value.value);
    return null;
  }
  public Void visitBool(Val.Bool value) {
    System.out.printf("%b : bool", value.value);
    return null;
  }
}

Since there isn't really a meaningful return value from the act of printing (other than the side-effect of showing the text to the output), we use the java.lang.Void type. We are forced to return null since Void has no instances; we could have created instead a singleton type Unit, paralleling what Haskell or ML do.

An expression in our DSL is, analogously to values, an abstract type that accepts an expression Visitor:

abstract class Exp {
  public abstract <T> T accept(ExpVisitor<T> visitor);

This Visitor is also generic on the return type. For each expression kind we have a concrete type. Constant expressions are simple:

  public static final class ConI extends Exp {
    public final int value;
    private ConI(int value) { this.value = value; }
    public <T> T accept(ExpVisitor<T> visitor) {
      return visitor.visitConI(this);
    }
  }

(Again, the classes corresponding to each type constructor are sealed.) Both additions and comparisons are binary operations on expressions:

  public static final class Add extends Exp {
    public final Exp left, right;
    private Add(Exp left, Exp right) { this.left = left; this.right = right; }
    public <T> T accept(ExpVisitor<T> visitor) {
      return visitor.visitAdd(this);
    }
  }

  public static final class Leq extends Exp {
    public final Exp left, right;
    private Leq(Exp left, Exp right) { this.left = left; this.right = right; }
    public <T> T accept(ExpVisitor<T> visitor) {
      return visitor.visitLeq(this);
    }
  }

A conditional has three sub-expressions: one for the expression being tested, one for the expression to be evaluated if the test is true, and another if it is false:

  public static final class Cond extends Exp {
    public final Exp test, ifTrue, ifFalse;
    private Cond(Exp test, Exp ifTrue, Exp ifFalse) {
      this.test = test;
      this.ifTrue = ifTrue;
      this.ifFalse = ifFalse;
    }
    public <T> T accept(ExpVisitor<T> visitor) {
      return visitor.visitCond(this);
    }
  }

The definitions mirror exactly the abstract syntax outlined before. Finally, we seal the abstract Exp class and provide static factories:

  public static Exp con(int value) { return new ConI(value); }
  public static Exp add(Exp left, Exp right) { return new Add(left, right); }
  public static Exp leq(Exp left, Exp right) { return new Leq(left, right); }
  public static Exp cond(Exp test, Exp ifTrue, Exp ifFalse) { return new Cond(test, ifTrue, ifFalse); }
}

We now need to define the type of the expression Visitor:

interface ExpVisitor<T> {
  T visitConI(Exp.ConI value);
  T visitAdd(Exp.Add value);
  T visitLeq(Exp.Leq value);
  T visitCond(Exp.Cond value);
}

Again, a simple application of the Visitor is an expression printer:

final class ShowExp implements ExpVisitor<Void> {
  public Void visitConI(Exp.ConI value) {
    System.out.printf("ConI(%d)", value.value);
    return null;
  }
  public Void visitAdd(Exp.Add value) {
    System.out.print("Add(");
    value.left.accept(this);
    System.out.print(",");
    value.right.accept(this);
    System.out.print(")");
    return null;
  }
  public Void visitLeq(Exp.Leq value) {
    System.out.print("Leq(");
    value.left.accept(this);
    System.out.print(",");
    value.right.accept(this);
    System.out.print(")");
    return null;
  }
  public Void visitCond(Exp.Cond value) {
    System.out.print("Cond(");
    value.test.accept(this);
    System.out.print(",");
    value.ifTrue.accept(this);
    System.out.print(",");
    value.ifFalse.accept(this);
    System.out.print(")");
    return null;
  }
}

It now remains to write the code that will evaluate expressions down to a value. I'll present the evaluation Visitor in parts. First, since it is obvious that it will return Vals, it will be parameterized by that interface:

final class EvalExp implements ExpVisitor<Val> {

Evaluating a constant is simply a matter of extracting the value of the expression and wrapping it into an integer value:

  public Val visitConI(Exp.ConI value) {
    return Val.ofInt(value.value);
  }

Evaluating compound expressions is complicated by the fact that not all combination of types are meaningful. For instance, to evaluate an addition we must check that the addends evaluate to integer values before adding them. We begin by evaluating both sides of the sum; as noted in the comment, this corresponds to call-by-value. We then apply a value Visitor to the evaluated left-hand, which will in turn apply another value Visitor to the right-hand, which will finally return the sum:

  public Val visitAdd(Exp.Add value) {
    // Call-by-value semantics
    final Val left = value.left.accept(this);
    final Val right = value.right.accept(this);
    return left.accept(new ValVisitor<Val>() {
      public Val visitInt(final Val.Int left) {
        return right.accept(new ValVisitor<Val>() {
          public Val visitInt(final Val.Int right) {
            return Val.ofInt(left.value + right.value);
          }

This works if both operands evaluate to integers. If, however, any value does not, it is a type error:

          public Val visitBool(Val.Bool _) {
            throw new IllegalArgumentException("Bad right argument to Add");
          }
        });
      }
      public Val visitBool(Val.Bool _) {
        throw new IllegalArgumentException("Bad left argument to Add");
      }
    });
  }

(The name of the arguments are unimportant). The evaluation of a comparison is exactly analogous to that for the sum, except that, in the end, the result of the comparison is a boolean value:

  public Val visitLeq(Exp.Leq value) {
    final Val left = value.left.accept(this);
    final Val right = value.right.accept(this);
    return left.accept(new ValVisitor<Val>() {
      public Val visitInt(final Val.Int left) {
        return right.accept(new ValVisitor<Val>() {
          public Val visitInt(final Val.Int right) {
            return Val.ofBoolean(left.value <= right.value);
          }
          public Val visitBool(Val.Bool _) {
            throw new IllegalArgumentException("Bad right argument to Leq");
          }
        });
      }
      public Val visitBool(Val.Bool _) {
        throw new IllegalArgumentException("Bad left argument to Leq");
      }
    });
  }

Again, we use call-by-value by evaluating first both sides and then chaining visitors on the resulting values. Evaluating a conditional expression is simpler, fortunately. In contrast to the previous calls, a condition must lazily evaluate the correct branch depending on the result of evaluating the test. After doing that, it must visit the value and type-check that it is Boolean:

  public Val visitCond(final Exp.Cond value) {
    final Val test = value.test.accept(this);
    return test.accept(new ValVisitor<Val>() {
      public Val visitInt(Val.Int _) {
        throw new IllegalArgumentException("Bad test argument to Cond");
      }
      public Val visitBool(Val.Bool test) {
        return test.value
          ? value.ifTrue.accept(EvalExp.this)
          : value.ifFalse.accept(EvalExp.this);
      }
    });
  }
}

Note: the evaluator has a type hole, in that a conditional can evaluate to different types depending on the condition. The interpreter which inspired this one had exactly the same type unsoundess. We will see how phantom types can help us to avoid these (and other) kinds of type errors.

As it is, we can use this code to evaluate some simple expressions. We will use a test class to do this from the main entry point. But first, we show a helper method to print the expression, evaluate it and print the result. Since the evaluation can fail for an ill-typed expression, the method handles the exception and informs the programmer:

public final class PhantomTest {
  private static void eval(Exp e) {
    try {
      e.accept(new ShowExp());
      System.out.print(" = ");
      e.accept(new EvalExp()).accept(new ShowVal());
      System.out.println();
    } catch (IllegalArgumentException ex) {
      System.out.println();
      System.out.println(ex.getMessage());
    }
  }

Now we create some expressions in the main entry point and evaluate them:

  public static void main(String[] args) {
    Exp e;
    e = Exp.cond(Exp.leq(Exp.con(3), Exp.con(5)), Exp.add(Exp.con(1), Exp.con(2)), Exp.con(0));
    eval(e);
    e = Exp.cond(Exp.con(1), Exp.con(1), Exp.con(1));
    eval(e);
  }
}

The first expression is 3 ≤ 5 → 1 + 2 : 0, which obviously evaluates to 3:

Cond(Leq(ConI(3),ConI(5)),Add(ConI(1),ConI(2)),ConI(0)) = 3 : int

The second expression is 1 → 1 : 1, which is obviously ill-typed as the condition is an integer:

Cond(ConI(1),ConI(1),ConI(1)) =
Bad test argument to Cond

Next we'll see how we can add safety to our interpreter using phantom types.

No comments: