2008-06-16

When there's nothing on the other side

Much has been written of how a Maybe monad translates cleanly into generic Java or C#. I have done so in this blog. The fact remains that, more often than not, it is not even necessary to wrap your code in a monad to avoid testing an object reference for null. There is a pattern language of object-oriented programming called null object. Whether it is well-known or not, I can't say; anecdotal evidence suggests it isn't as much as it should be.

The idea behind it is simple: since polymorphic dispatch amounts to a conditional on the run-time type of the receiver, translate a conditional into two subclasses with a method whose body is the respective block in both branches of the original if. This pattern, when specialized to the case of the reference != null case calls for the creation of a special, do-nothing subclass of the object in question with a method whose body is empty. This is entirely analogous to having a sentinel node in a complicated data structure that allows trading time complexity for space complexity. In modern object-oriented languages, tracing just-in-time compilations might or might not make the use of a null object faster than a conditional.

Some well-used data structures lend themselves to this usage without having to code especially around them. If you'd like an example transliterated into a slogan, I could do worse than saying: an empty list is never null. Let me illustrate with a post.

Over at his blog, Dare Obasanjo is raving about how cool is functional programming, C# style. Not that I disagree, far from it; moreover, I must disclose that it is easier to criticize code other people's code than it is writing (and exposing!) code that is above criticism. Some of my earlier posts would have benefited of such a review. The first snippet he presents is:

public void UpdateAllFeeds(bool force_download)
{
   List<SubscriptionRootNode> rootNodes = this.GetAllSubscriptionRootNodes();
   if (rootNodes != null)
   {
      if (_timerRefreshFeeds.Enabled)
         _timerRefreshFeeds.Stop();
      _lastUnreadFeedItemCountBeforeRefresh = rootNodes.Sum(n => n.UnreadCount);
      FeedSources.Sources.ForEach(s => s.RefreshFeeds(force)); 
   }
}

What I first noticed was not the intent behind this code, or how it illustrated his main point, or anything else really beyond the big honkin' if (rootNodes != null). On a list reference. Returned from this.GetAllSubscriptionRootNodes(), a method in the receiver! If you ever wondered what could be a poster-example, a paradigm for cargo cult programming, I can think of no better example.

Essentially, the cost of ensuring that this.GetAllSubscriptionRootNodes() never returns null is nil. This can be documented and enforced by an invariant in a language that supports them, by an assertion in the body of a client method, or by one or more tests in the relevant harness. It simply amounts to writing something akin to:

private List<SubscriptionRootNode> GetAllSubscriptionRootNodes()
{
   List<SubscriptionRootNode> result = new List<SubscriptionRootNode>();
   // fill the list according to some logic...
   return result;
}

Repeat with me: the result of new is guaranteed in C# and in Java never to be the null reference. Hence, the result of the method is, by trivial dataflow analysis, never null. Hence, the test is never necessary.

Now, you might counter with something along the lines of "yeah, but what if some condition makes frobbing the buzzwits inapplicable? Can't I return null to signal upstream code to not frob the buzzwits?" Well, yes, but we're in "you're better off rethinking that" territory. Back to the original code, maybe GetAllSubscriptionRootNodes is really written like this:

private List<SubscriptionRootNode> GetAllSubscriptionRootNodes()
{
   if (/* some hairy condition */)
      // we don't want to touch the subscriptions right now...
      return null;

   List<SubscriptionRootNode> result = new List<SubscriptionRootNode>();
   // fill the list according to some logic...
   return result;
}

Then the original client code would be justified in testing for null. There are at least three ways out of this: the simple one is to, instead of null, return the empty list:

private List<SubscriptionRootNode> GetAllSubscriptionRootNodes()
{
   List<SubscriptionRootNode> result = new List<SubscriptionRootNode>();

   if (/* some hairy condition */)
      // we don't want to touch the subscriptions right now...
      return result;

   // fill the list according to some logic...
   return result;
}

The client code would then be something like this:

public void UpdateAllFeeds(bool force_download)
{
   List<SubscriptionRootNode> rootNodes = this.GetAllSubscriptionRootNodes();

   int totalUnread = rootNodes.Sum(n => n.UnreadCount);
   if (totalUnread != 0)
   {
      if (_timerRefreshFeeds.Enabled)
         _timerRefreshFeeds.Stop();
      _lastUnreadFeedItemCountBeforeRefresh = totalUnread;
      FeedSources.Sources.ForEach(s => s.RefreshFeeds(force)); 
   }
}

You might now say, "yes, but I really need to stop the feeds before I can count the unread posts in the root feeds, or else all manner of nasty things might happen!" Like, for instance, having an approximate count of unread feeds (or what have you). This might not be bad in itself, but you're entitled to your precisions in your own code. This brings me to the second, not-so-simple solution: Instead of conflating two unrelated results into a single return value ("either a condition makes getting the root nodes inapplicable, or here they are those root nodes"), split the two functionalities into different methods:

private bool CanUpdateSubscriptions()
{
   return !/* some hairy condition */;
}

private List<SubscriptionRootNode> GetAllSubscriptionRootNodes()
{
   List<SubscriptionRootNode> result = new List<SubscriptionRootNode>();
   // fill the list according to some logic...
   return result;
}

Aside: I'm well aware that I could be embarking in a gigantic straw man, and the original code is simply wrong as opposed to subtly wrong. I mean, it might be that the original intent behind the null test was cargo cultishness. Bear with me, because I feel this point deserves being made.

Now, the client code is:

public void UpdateAllFeeds(bool force_download)
{
   if (!CanUpdateSubscriptions())
     return; // early, return often
   
   List<SubscriptionRootNode> rootNodes = this.GetAllSubscriptionRootNodes();
   if (_timerRefreshFeeds.Enabled)
      _timerRefreshFeeds.Stop();
   _lastUnreadFeedItemCountBeforeRefresh = rootNodes.Sum(n => n.UnreadCount);
   FeedSources.Sources.ForEach(s => s.RefreshFeeds(force)); 
}

As you can see, splitting the intent behind return-list-except-for-null-as-a-flag into a bona-fide predicate lets the client logic to be simpler. In fact, the code testing for applicability could be moved further upstream, to —for example— disable menu items or other UI elements.

The third approach, and the one I feel should be preferred whenever possible, would be to make sure that all methods are idempotent; in this case, if there aren't root subscriptions to update, make sure that the calls to stop synchronization and to refresh the sources succeed without unwanted side-effects. This would in effect make the following code:

public void UpdateAllFeeds(bool force_download)
{
   List<SubscriptionRootNode> rootNodes = this.GetAllSubscriptionRootNodes();
   if (_timerRefreshFeeds.Enabled)
      _timerRefreshFeeds.Stop();
   _lastUnreadFeedItemCountBeforeRefresh = rootNodes.Sum(n => n.UnreadCount);
   FeedSources.Sources.ForEach(s => s.RefreshFeeds(force)); 
}

always correct and applicable, no matter what state the object is in. Now I realize it might not always be possible, or desirable, to enforce such a strong invariant, especially in the early stages of development; however, the general heuristic principle that says that you should be lenient in what you accept and strict in what you emit leads to more robust, widely reusable code.

As a side note, the other heuristic principle that says that you should not break symmetry without good reason to do so makes me think that, after updating all feeds, the refresh timer should be re-enabled. Can it be that the code has a bug?

No comments: