Feature Literals

In the current open source release of Gosu there is a new feature called, er, feature literals. Feature literals provide a way to statically refer to the features of a given type in the Gosu type system. Consider the following Gosu class:

  class Employee {
    var _boss : Employee as Boss
    var _name : String as Name
    var _age : int as Age

    function update( name : String, age : int ) {
      _name = name
      _age = age
    }
  }

Given this class, you can refer to its features using the '#' operator (inspired by the Javadoc @link syntax):

  var nameProp = Employee#Name
  var ageProp = Employee#Age
  var updateFunc = Employee#update(String, int)

These variables are all various kinds of feature references. Using these feature references, you can get to the underlying Property or Method Info (Gosu’s equivalents to java.lang.reflect.Method), or use the feature references to directly invoke/get/set the features.

Let’s look at the using the nameProp above to update a property:

  var anEmp = new Employee() { :Name = "Joe", :Age = 32 }
  print( anEmp.Name ) // prints "Joe"
  
  var nameProp = Employee#Name

  nameProp.set( anEmp, "Ed" )
  print( anEmp.Name ) // now prints "Ed"

You can also bind a feature literal to an instance, allowing you to say “Give me the property X for this particular instance“:

  var anEmp = new Employee() { :Name = "Joe", :Age = 32 }
  var namePropForAnEmp = anEmp#Name

  namePropForAnEmp.set( "Ed" )
  print( anEmp.Name ) // prints "Ed"

Note that we did not need to pass an instance into the set() method, because we bound the property reference to the anEmp variable.

You also can bind argument values in method references:

  var anEmp = new Employee() { :Name = "Joe", :Age = 32 }
  var updateFuncForAnEmp = anEmp#update( "Ed", 34 )
  
  print( anEmp.Name ) // prints "Joe", we haven't invoked 
                      // the function reference yet
  updateFuncForAnEmp.invoke()
  print( anEmp.Name ) // prints "Ed" now

This allows you to refer to a method invocation with a particular set of arguments. Note that the second line does not invoke the update function, it rather gives you a reference that you can use to evaluate the function with later.

Feature literals support chaining, so you could write this code:

  var bossesNameRef = anEmp#Boss#Name

Which refers to the name of anEmp‘s boss.

You can convert method references to blocks quite easily:

  var aBlock = anEmp#update( "Ed", 34 ).toBlock()

Finally, feature references are parameterized on both their root type and the features type, so it is easy to say “give me any property on type X” or “give me any function with the following signature”.

So, what is this language feature useful for? Here are a few examples:

  1. It can be used in mapping layers, where you are mapping between properties of two types
  2. It can be used for a data-binding layer
  3. It can be used to specify type-safe bean paths for a query layer

Basically, any place you need to refer to a property or method on a type and want it to be type safe, you can use feature literals.

Ronin, an open source web framework, is making heavy use of this feature. You can check it out here:

  http://ronin-web.org

Enjoy!


Two Ways to Design A Programming Language

Method One:

Have a legend of the field think deeply and make precisely reasoned arguments for features.

Method Two:

Look at code-gen features in IntelliJ and figure out how to avoid needing them for your language:

Implicit Type Casting

IntelliJ has a macro for the common pattern of checking the type of something and then immediately downcasting/crosscasting the expression to that type:

instanceof

Java:

  Object x = aMethodThatReturnsAnObject();
  if( x instanceof List ) {
    System.out.println( "x is a List of size " + ((List) x).size() );
  } 

Gosu:

  var x : Object = aMethodThatReturnsAnObject()
  if( x typeis List ) {
    // hey, you already told us x was a list.  Why make you cast?
    print( "x is a List of size ${x.size()}" )
  } 

Delegation

IntelliJ has a wizard to assist you in delegating all the implementations of an interface to a field:

delegation

Gosu:

class MyDelegatingList implements List {

  delegate _delegateList represents List

  construct( delegateList : List ) {
    _delegateList = delegateList
  }

  override function add( Object o ) : boolean {
    print( "Called add!" )
    return _delegateList.add( o )
  }

  // all other methods on List are automatically
  // delegated to _delegateList
}

I omit the java version out of respect for your eyes.

I should note, Gosu is the new name for GScript, our internal programming language


Automation of Burn-up and Burn-down charts using GScript and Entrance

I have always found that the burn-up and burn-down charts are very informative and fit to the iterative story-based development very well. Every project that I work on, I will try to figure out different ways to generate burn-up and burn-down charts.

Two months ago, I took the job on putting Platform on Sprints. After some consideration, I have decided to follow the setup that I have for the AF team, creating the stories in the form of JIRA issues. However, the chart generation that I had for AF team was still semi-manual, which means that it takes a couple of minutes to download, and a couple of minutes to update the stats every morning. The worst part is that when I get busy or sick, I will forget.

So my first action item was to figure out how to generate the same kind of charts with the push of a button. The idea seems to be easy enough:

  1. Figure out the search criteria to retrieve all the JIRA issues of the backlog
  2. Count the issues that are in different states
  3. Update the data with the counts, and check it into Perforce.
  4. Refresh the chart with the updated data

Number one and two were actually not that hard, because Guidewire GScript has a nice WebServices support. With a few tries, I was able to count the beans.

Here is an example of the data generated. I think you get the idea just looking at it.

Date,Day,Closed,Deferral Requested,In QA,Open Stories,Open New Features,Open Bugs
10/09/2009,41,55,0,1,13,7,40
10/12/2009,42,55,0,1,14,7,40
10/13/2009,43,56,0,0,14,7,40
10/14/2009,44,56,0,0,16,7,41
10/15/2009,45,56,0,0,21,8,42
10/16/2009,46,58,0,1,19,8,42
10/19/2009,47,58,0,2,28,8,42
10/20/2009,48,58,0,6,26,8,42
10/21/2009,49,58,0,6,26,8,42
10/22/2009,50,58,0,7,25,8,44

Number three took less time but a bit of research because Perforce Java library’s API is not exactly straightforward.

It took me a while to figure out how to do the last one. After looking into JFreeChart and Google Chart API, I eventually turned to my dear friend, Tod Landis, who is also my partner at Entrance, and he quickly drafted an entrance script for me. Based on it, I was able to write a template that can be used for all teams within a few hours.


PLOT
  very light yellow  area,
  light yellow filled circles and line,
  DATALABELS
  very light orange  area,
  light orange filled circles and line,
  very light red  area,
  light red filled circles and line,
  very light blue area,
  light blue filled circles and line,
  very light gray area,
  dark gray filled circles and line,
  very light green area,
  dark green filled circles and line,
  all AXISLABELS
WITH
  ZEROBASED
  TITLE "Sprint"
  TITLE X "Days"
  TITLE Y "Points"
  SCALE Y 0 100 0
  CLIP
  legend
  gridlines
  collar
  no sides
SELECT
  `Open Bugs`,
  `Open Bugs`,
  date,
  `Open New Features`,
  `Open New Features`,
  `Open Stories`,
  `Open Stories`,
  `In QA`,
  `In QA`,
  `Deferral Requested`,
  `Deferral Requested`,
  `Closed`,
  `Closed`,
  day
from report;

Please note this is the final PLOT script, there are other SQLs run before this to import the data into the MySQL database, sum up the data to produce a stacked chart, and even out the labels.

And I now have this chart generated automatically every morning with the help of a windows scheduler.


I Am Hate Method Overloading (And So Can You!)

My hatred of method overloading has become a running joke at Guidewire. My hatred is genuine, icy hot, and unquenchable. Let me explain why.

First Principals
First of all, just think about naming in the abstract. Things should have good names. A good name is unique and easy to understand. If you have method overloading, the name of a method is no longer unique. Instead, the real name of the method is the sane, human chosen name, plus the fully qualified name of each argument’s type. Doesn’t that just seem sort of insane? If you are writing a tool that needs to refer to methods, or if you are just trying to look up a method reflectively, you have to know the name, plus all the argument types. And you have to know this even if the method isn’t overloaded: you pay the price for this feature even when it isn’t used.

Maybe that strikes you as a bit philosophical. People use method overloading in java, so there must be some uses for it. I’ll grant that, but there are better tools to address those problems.

In the code I work with day to day, I see method overloading primarily used in two situations:

Telescoping Methods
You may have a function that takes some number of arguments. The last few arguments may not be all that important, and most users would be annoyed in having to figure out what to pass into them. So you create a few more methods with the same name and fewer arguments, which call through to the “master” method. I’ve seen cases where we have five different versions of a method with varying numbers of arguments.

So how do I propose people deal with this situation without overloading? It turns out to be a solved problem: default arguments. We are (probably) going to support these in the Diamond release of GScript:

  function emailSomeone( address:String, subject:String, body:String,
                         cc:String=null, logToServer:boolean=false, 
                         html:boolean = false ) {
    // a very well done implementation
  }

A much cleaner solution. One method, with obvious syntax and, if your IDE is any good, it will let you know what the default values of the option arguments are (unlike method overloading.)

True Overloading
Sometimes you truly want a method to take two different types. A good example of this is the XMLNode.parse() method, which can take a String or a File or an InputStream.

I actually would probably argue with you on this one. I don’t think three separate parse methods named parseString(), parseFile() and parseInputStream() would be a bad thing. Code completion is going to make it obvious which one to pick and, really, picking a unique name isn’t going to kill you.

But fine, you insist that I’m a terrible API designer and you *must* have one method. OK, then use a union type (also probably available in the Diamond release of GScript):

  function parse( src:(String|File|IOStream) ) : XMLNode {
    if( src typeis String ) {
      // parse the string
    }
    ...
  }

A union type lets you say “this argument is this type or that type.” It’s then up to you to distinguish between them at runtime.

You will probably object that this syntax is moderately annoying, but I’d counter that it will end up being fewer lines of code than if you used method overloading and that, if you really want a single function to handle three different types, you should deal with the consequences. If it bothers you too much, just pick unique names for the methods!

So?
Let’s say you accept my alternatives to the above uses of method overloading. You might still wonder why I hate it. After all, it’s just a feature and a pretty common one at that. Why throw it out?

To understand why I’d like to throw it out, you have to understand a bit about how the GScript parser works. As you probably know, GScript makes heavy use of type inference to help developers avoid the boilerplate you find in most statically typed languages.

For example, you might have the following code:

  var lstOfNums = {1, 2, 3}
  var total = 0
  lstOfNums.each( \ i -> { total = total + i  } )

In the code above, we are passing a block into the each() method on List, and using it to sum up all the numbers in the list. ‘i‘ is the parameter to the block, and we infer it’s type to ‘int’ based on the type of the list.

This sort of inference is very useful, and it takes advantage of context sensitive parsing: we can parse the block expression because we know the type of argument that each() expects.

Now, it turns out that method overloading makes this context sensitive parsing difficult because it means that when you are parsing an expression there is no guarantee that there is a single context type. You have to accept that there may be multiple types in context when parsing any expression.

Let me explain that a bit more. Say you have two methods:

  function foo( i : int ) {
  }
  
  function foo( i : String ) {
  }

and you are attempting to parse this expression:

  foo( someVar )

What type can we infer that the context type is when we parse the expression someVar? Well, there isn’t any single context type. It might be an int or it might be a String. That isn’t a big deal here, but it becomes a big deal if the methods took blocks, or enums or any other place where GScript does context type sensitive parsing. You end up having lists of context types rather than a single context type in all of your expression parsing code. Ugly.

Furthermore, when you have method overloading, you have to score method invocations. If there is more than one version of a method, and you are parsing a method invocation, you don’t know which version of the method you are calling until after you’ve parsed all the arguments. So you’ve got to run through all the argument types and see which one is the “best” match. This ends up being some really complicated code.

Complexity Kills

Bitch, bitch, moan, moan. Just make it work, you say. If the java guys can do it, why can’t you? Well, we have made it work (for the most part.) But there’s a real price we pay for it.

I’m a Berkeley, worse-is-better sort of guy. I think that simplicity of design is the most important thing. I can’t tell you how much more complicated method overloading makes the implementation of the GScript parser. Parsing expressions, parsing arguments, assignability testing, etc. It bleeds throughout the entire parser, its little tentacles of complexity touching places you would never expect. If you come across a particularly nasty part of the parser, it’s a good bet that it’s there either because of method overloading or, at least, is made more complicated by it.

Oh, man up! you say. That’s the parser’s and tool developer’s problem, not yours.

Nope. It’s your problem too. Like Josh Bloch says, projects have a complexity budget. When we blow a big chunk of that budget on an idiotic feature like method overloading, that means we can’t spend it on other, better stuff.

Unfortunately, because GScript is Java compatible, we simply can’t remove support for method overloading. If we could though, GScript would have other, better features and, more importantly, fewer bugs.

That is why I am hate method overloading. And so can you.


I hate to beat a dead horse and all . . .

. . . but this is pretty egregious.  I got so sick of writing my own partition code in Java (since I’m so used to being able to do it easily in GScript) that I pushed it out into a utility method so I wouldn’t have to rewrite the same code over and over.  Thanks to the overhead of Java generics, anonymous inner classes, and type declarations, I’m not sure it was even a win.  Now my code looks like:

  Map<String, List<FormPatternLookup>> partitionedLookups =
    CollectionUtil.partition(lookups,
    new CollectionUtil.Partitioner<FormPatternLookup, String>() {
      public String partitionKey(FormPatternLookup formPatternLookup) {
        return formPatternLookup.getState() +
          ";;" + formPatternLookup.getUWCompanyCode();
      }
  });

Ugh.  If I had written this code in GScript, it would have been:

  var partitionedLookups = lookups.partition(\l -> l.State +
    ";;" + l.UWCompanyCode)

I mean, honestly . . . that’s pretty brutal.


The Necessity of Type Inference

Type inference is a subject near and dear to our heart here at Guidewire; one of the primary features of GScript from the very start has been type inference of local variables, and it’s proven over the years to be one of the more invaluable features in the language.  So invaluable, in fact, that it’s hard for me to stomach the prospect of using statically-typed languages without type inference.

For GScript, we’ve actually taken a relatively limited approach to type inference, at least compared to languages like Haskell or OCaml; we only infer the types of local variables or private member variables that have an initializer statement.  Method parameters and return types and non-private mamber variables always have to be explicitly typed, as do variables that have no initializer.  It might seem like that doesn’t buy you a whole lot, but in reality it makes your code tighter and more flexible without sacrificing any safety.

The code elimination part is fairly obvious if you try it out.  In Java, creating a new generified Map might look like:

Map<String, String> myMap = new HashMap<String, String>();

In GScript, it’s:

var myMap = new HashMap<String, String>()

The astute reader will notice that those don’t do exactly the same thing, but I’ll get to why that’s not really a problem later on.  In Java you can lessen the pain somewhat by using an IDE that automatically extracts variables or by ignoring the compiler warnings and dropping the type parameters on the right-hand side, but relying constantly on the IDE to do every little thing is a bit painful.  If you need to go back and change things, rather than just changing the type you have to use an IDE refactoring to make sure it changes the variable declarations.  Situations where you assign to the result of a method call are even more of a waste in Java:

Map<String, String> myMap = myMethodCall();

instead of:

var myMap = myMethodCall()

One of the classic developer refrains is Don’t Repeat Yourself, and explicit type declarations for things that are obviously inferrable clearly violates that.  If you want to change the type returned by myMethodCall(), you again have to be sure to use an IDE refactoring to make sure all the variables are changed, and you could end up needing to touch plenty of files to propagate it through.  The redundant type declarations add friction to your code base that not only make initial code creation harder but also make subsequent changes more difficult.

There’s a more subtle way in which type inference enables your code base to be flexible, though:  it provides some measure of duck-typing in the right situations.  For example, if you have the code:

var myList = myMethodCall()
print(myList[0])

It doesn’t matter whether myMethodCall() returns a List<String> or a String[] or any other variant as long as the subsequent calls are still valid.  Drop-in replacements for classes or interface extraction become much easier as a result of type inference, since the changes don’t have to propagate as heavily through the code.

You could go further with type inference than we have and attempt to infer the types themselves, infer union types, or infer method return types, but we’ve chosen to stop at variables for a few reasons.  The first one is that method parameters, return types, and variables exposed outside of the current class all create a contract between two parts of your code, and in those cases you generally want to control that contract more explicitly and use interfaces like List and Map instead of concrete implementation classes like ArrayList and HashMap.  Local variables and private member variables, by definition, don’t leak outside of a fairly limited scope, so it’s generally not worth worrying about whether they’re interfaces or implementation classes.  Secondly, it simplifies the rules about when explicit type declarations are needed; we could try to infer method return types but there would be some arbitrary conditions where that broke down thanks to cyclic dependencies between classes and methods and variables, and it wouldn’t be at all obvious to the user when or why.  Having a simple set of rules, we hope, makes that less confusing.  Lastly, anything more aggressive, like inferring parameter types based on usages, would require a completely different type system from what we have in GScript and a completely different direction for the language.  That’s not to say it’s a bad thing (I think a lot of what OCaml does with types is quite cool, for example), it’s just a radical departure from the more traditional Java/C++/C lineage than we want to make.

So given all the advantages, why does Java still avoid adding some amount of type inference in?  I honestly don’t know.  If anything should have provided the impetus to add type inference, it was the introduction of generics, which have caused countless redundant keystrokes over the years.  Personally I can’t construct a credible case against it, and I haven’t really heard one presented to me, though I do hear two arguments fairly often.

The first argument is the “use interfaces over impl classes” argument.  Yes, every good Java programmer knows that you want to use more generic interface types like List and Map instead of concrete implementation types like ArrayList or HashMap to protect the flexibility of your implementation and to increase encapsulation.  And that’s true, up to a point:  the point at which your implementation is exposed to the outside world.  For local variables and private member variables, however, it’s all implementation details at that point.  If you’re assigning the variable to the result of a method call, that call should be coded to return a List instead of an ArrayList.  If you’re passing the variable out to a method call, the parameter should be typed as a List instead of an ArrayList, so there’s no encapsulation leak there.  So the only thing you protect against by using interface types for local variables is against using implementation-specific methods without realizing it on objects created locally within that method or within the class.  That’s just not much of a danger in my book; you’re protecting a method or class against changes to itself that couldn’t possibly affect anything outside of itself, which seems a wee bit silly:  if I find that I’m using methods on ArrayList that aren’t on List, and I change the local variable I’m using so it’s a LinkedList, I’ll just deal with the fallout right then in that method.  It’s certainly not worth avoiding type inference to try to “protect” against that situation.

The second argument is that type inference makes the code harder to read.  This one has a little more credence, but most of the time it’s pretty obvious what’s going on.  New instance creation is always obvious, and in most other cases the variable is well-enough named, the method being called is well-enough named, or subsequent usages are obvious enough to make it easy to figure out.  The worst case is that you have to dig one level in to see what type a method is declared to return.  Any decent modern IDE will basically do this for you, though:  code-completing on the variable will tell you what type it is, bringing up the javadoc for the method being called will tell you, clicking through to the method will tell you, etc.  In our IDE, we’ve even added a shortcut, Control-T, that shows you the type of any variable or expression at any point in the code.  Even then, it’s not that much different than looking at a variable somewhere that’s been declared elsewhere:  you don’t repeat the type information on every usage and somehow people find a way to muddle through.  Yet somehow the redundant information on the declaration is critical and can’t be lived without?  Again, it seems like a post hoc justification for being afraid of something different, and in practice it’s not that much of an issue:  without any IDE it’s not that much different from what you have to do now to deal with variables and method calls you aren’t familiar with, and with an IDE it’s a complete non-issue.

To me the issue is pretty clear-cut: a type system is a tool to help catch errors early and to improve the ability for tools to understand and manipulate code, but it does so at a heavy cost in terms of verbosity and inflexibility.  Dynamically typed languages obviously have a huge advantage on those two fronts.  Type inference is a way to preserve the benefits of static typing while reducing its overhead, and at this point in the history of language, compiler, and IDE development it should be a part of every modern language, including Java.


Why I’m Not a Fan of Java’s Auto-Unboxing

Starting with Java 1.5, the Java compiler started automatically taking care of converting boxed versions of primitive types into primitive types where necessary, and vice-versa.  While I’m generally in favor of anything that makes programming languages easier to use, provided it doesn’t overly-complicate things, I’ve never been a huge fan of the way it was done.

Most languages either have C or C++ style primitives or object-style numbers and booleans, but Java has always had an odd mix of the two.  The easiest thing for a programmer is to simply go all the way to the object model and be done with it; my assumption is that Java didn’t do that both for the sake of continuity with C and for the sake of raw performance.

But unfortunately, the solution of auto-converting between the two worlds doesn’t really solve the problem that there are differences there that you have to be aware of.  The main difference is around null; a primitive type can’t be null, and a boxed type can, so every time you unbox an object you have to worry about how to handle null values.  If you do the unboxing by hand, at least you might have to think about it, and if you don’t at least it’ll be obvious what the error is.  But the auto-unboxing both manages to not handle null for you and manages to completely hide the errors when they do happen, which is basically a huge lose-lose in my book.  I managed to spend far too long the other day trying to figure out why the following line of code was throwing an NPE:

modifier.setEligible(!pattern.getDisplayEligibility());

My instinct is that an NPE is caused by a method call or field reference on a null object, so clearly either “modifier” or “pattern” has to be null here, right?  So after staring hard at the code for several minutes to try to figure out how that could be possible I had to go to the trouble of setting everything up in the debugger, walking through the code . . . and finding out neither was null.  Huh?  Of course, not at all obvious from reading the code is the fact that getDisplayEligibility() returns a Boolean rather than a boolean, and in this case it was returning null, meaning that the ! operator was throwing the NPE.

Normally, if you tried to apply ! to an object you’d get a compile error, but thanks to the magic of auto-unboxing Java now has all sorts of exciting ways to throw NPEs that you wouldn’t think to find.  Right after fixing that bug I ran into another, very similar error:

public boolean isScheduleRate() {
  return getPattern().getScheduleRate();
}

Again, getScheduleRate() actually returns a Boolean instead of a boolean, so if it’s null the return statement will NPE.  Combined with the way Java implemented the new for loop, that means that instead of an NPE only being possible on the . operator or [] references, you now have to look out for !, return, arithmetic operators like + and -, and for loops.

In GScript we, for better or worse, auto-coerce null to false or 0 depending on the context, which also requires you to understand that that can happen, but at least prevents the sort of hidden NPEs that the current Java approach causes.  It probably behaves like you’d expect most of the time and is generally consistent with how other languages behave.