2007-10-04

Algebraic Data Types and Java Generics (III)

See also Part I and Part II.

As remarked in many a monad introduction, monads are especially suited to containing and managing side-effects. This is because in general monads structure semantics: little packages of special functionality neatly tucked away and generalized under a unit, bind interface.

One typical gripe of imperative programming is resource mana'gement: how to structure the code to make sure that every resource consumed is accounted for. In the days of yore, we C programmers had to ensure that our code didn't leak memory: bytes allocated dynamically had to be freed exactly once no matter what the code path executed.

Of course, most modern languages incorporate a garbage collector that makes most if not all issues of dynamic memory allocation automatically solvable. There are other kinds of resources, however, that must still be manually managed. Typically, files and streams must be correctly opened and closed even in the face of exceptions, lest some data be corrupted and/or lost.

Also, in the absence of some sort of arbitration mechanism, concurrent file reading or writing by various actors is sure to lead to race conditions, where the data accessed depends on the scheduling order among the actors. If the operations they execute depend on the data read, the execution of the overall program is not deterministic.

Both problems could be solved if we had a way to ensure that:

  1. all files are closed after we're done with them
  2. nobody retains a reference to a closed file
  3. at most one user has access to a particular file

All three conditions can be met if we somehow make plain file handles (or streams) inaccessible. A monad is ideal for this: it guarantees that its type statically enforces (i.e., in compiler time) that these properties are upheld:

public class Input<X> implements Monad<X, Input<X>> {

The idea is to encapsulate operations on input files as a function from the stream to the result of type X of the processing:

  private final Function<BufferedReader, X> source;

  private Input(Function<BufferedReader, X> source) {
    this.source = source;
  }

The construction of Input monads will be managed by bind. It takes a function f which turns values read into another Input, and threads the input stream among them, in order:

  public <Y> Input<Y> bind(final Function<X, ? extends Monad<Y, ?>> f) {
    return new Input<Y>(new Function<BufferedReader, Y>() {
      public Y apply(BufferedReader in) {
        return ((Input<Y>) f.apply(source.apply(in))).source.apply(in);
      }
    });
  }

Unfortunately, covariance of the parameter type means that the bound function returns a type that is too general, and a down-cast is required to force the result into a Input. This code is not type-safe as it is. Note that the result of f is destructured in order to supply to it the same input stream; once this is done, it is repackaged as an Input monad of the correct (return) type.

The unit converts a delayed value into the Input monad that discards the stream and instead of its contents always returns this value:

  public static <X> Input<X> unit(final Function<Void, X> source) {
    return new Input<X>(new Function<BufferedReader, X>() {
      public X apply(BufferedReader _) { return source.apply(null); }
    });
  }

The reason why the encapsulated value is delayed is that it usually is the final result, or summary, of processing all the data read. Lastly, we need a way to actually make all this work on a file opened for input. This is as simple as opening the file, passing it to the processing function and returning the result, making sure that the code cleans after itself:

  public X run(String name) throws IOException {
    BufferedReader in = null;
    try {
      in = new BufferedReader(new FileReader(name));
      return source.apply(in);
    } catch (UncheckedException e) {
      throw (IOException) e.getCause();
    } finally {
      if (in != null)
        in.close();
    }
  }

The problem with this is that performing input might throw an exception, but the Function interface is pure. We need to encapsulate the exception in an unchecked exception, then unwrap it and pass it along.

  private static final class UncheckedException extends RuntimeException {
    public UncheckedException(Throwable cause) {
      super(cause);
    }
  }

Unfortunately, there is not much that we can do with Input, as the constructor is private and the only way to bind to an input is by having one in the first place. The simplest Input we can think of is the one that reads characters from the file and returns them, one by one, unchanged:

  public static final Input<Integer> read =
    new Input<Integer>(new Function<BufferedReader, Integer>() {
      public Integer apply(BufferedReader in) {
        try { return in.read(); }
        catch (IOException e) { throw new UncheckedException(e); }
      }
    });

Another common way of reading text files is line by line. In this case, the result of the Input operation is a String with the text of the line:

  public static final Input<String> readLine =
    new Input<String>(new Function<BufferedReader, String>() {
      public String apply(BufferedReader in) {
        try { return in.readLine(); }
        catch (IOException e) { throw new UncheckedException(e); }
      }
    });

The simplest example that I can think of is a char counter. Or a line counter. Thanks to the type wizardry, I can use the same code to do both, or indeed, to count any number of items of type X:

public class InputCounter<X> {
  private int count = 0;

At the end of the stream of items, we need to return the count:

  private final Input<Integer> finalCount =
    Input.unit(new Function<Void, Integer>() {
      public Integer apply(Void _) { return count; }
    });

This Input processor will delay looking at it until the last moment. We also need to get the next item and update the running count:

  private Input<Integer> partialCount;

We still didn't say how we'll count these mythical items: we need a way to parameterize the Input reader, and we also need a sentinel for the last item read (in other words ageneric end-of-file predicate):

  public InputCounter(Input<X> reader, final Function<X, Boolean> sentinel) {

With this, we bind the reader to a function that checks if there are any more items or not:

    this.partialCount = reader.bind(new Function<X, Input<Integer>>() {
      public Input<Integer> apply(X item) {

If not (that is, if we're at end-of-file), it returns the final count:

        if (sentinel.apply(item))
          return finalCount;

Otherwise, we increment and return the partial count.

        else {
          count++;
          return partialCount;
        }
      }
    });
  }

With this, it can run the Input on the supplied file:

  public int count(String file) throws IOException {
    count = 0;
    return partialCount.run(file);
  }
}

Let's use this in a test. We count both characters and lines by constructing the InputCounter with appropriate parameters:

public class InputTest {
  private static final Function<Integer, Boolean> isLastChar =
    new Function<Integer, Boolean>() {
      public Boolean apply(Integer ch) { return ch == -1; }
    };

  private static final Function<String, Boolean> isLastLine =
    new Function<String, Boolean>() {
      public Boolean apply(String l) { return l == null; }
    };

  public static void main(String[] args) {
    if (args.length != 1) {
      System.err.println("usage - java InputTest <file>");
      System.exit(2);
    }

    try {
      final int nchars = new InputCounter<Integer>(Input.read, isLastChar).count(args[0]);
      final int nlines = new InputCounter<String>(Input.readLine, isLastLine).count(args[0]);
      System.out.printf("File \"%s\": %d chars, %d lines\n", args[0], nchars, nlines);
      System.exit(0);
    } catch (IOException e) {
      System.err.println(e.getMessage());
      System.exit(1);
    }
  }
}

The monad ensures that no file remains open, even in the event of an exception.

No comments: