The Necessity of Type Inference

Type inference is a subject near and dear to our hearts here at Guidewire; one of the primary features of GScript from the very start has been type inference of local variables, and over the years it has proven to be one of the most valuable features in the language.  So valuable, in fact, that it’s hard for me to stomach the prospect of using statically-typed languages without type inference.

For GScript, we’ve actually taken a relatively limited approach to type inference, at least compared to languages like Haskell or OCaml; we only infer the types of local variables or private member variables that have an initializer expression.  Method parameters, method return types, and non-private member variables always have to be explicitly typed, as do variables that have no initializer.  It might seem like that doesn’t buy you a whole lot, but in reality it makes your code tighter and more flexible without sacrificing any safety.

The reduction in code is fairly obvious if you try it out.  In Java, creating a new Map with type parameters might look like:

Map<String, String> myMap = new HashMap<String, String>();

In GScript, it’s:

var myMap = new HashMap<String, String>()

The astute reader will notice that those don’t do exactly the same thing (the Java variable is declared as a Map, while the GScript variable is inferred to be a HashMap), but I’ll get to why that’s not really a problem later on.  In Java you can lessen the pain somewhat by using an IDE that automatically extracts variables, or by ignoring the compiler warnings and dropping the type parameters on the right-hand side, but relying constantly on the IDE to do every little thing gets tiresome.  If you need to go back and change things, rather than just changing the type in one place you have to use an IDE refactoring to make sure the variable declarations change too.  Assigning the result of a method call is even more wasteful in Java:

Map<String, String> myMap = myMethodCall();

instead of:

var myMap = myMethodCall()

One of the classic developer refrains is Don’t Repeat Yourself, and explicit type declarations for things that are obviously inferable clearly violate that.  If you want to change the type returned by myMethodCall(), you again have to use an IDE refactoring to make sure all the variables are changed, and you could end up needing to touch plenty of files to propagate the change.  The redundant type declarations add friction to your code base that makes not only the initial code harder to write but also subsequent changes more difficult.
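For the constructor case, a common pre-inference workaround in Java was a generic static factory method: the compiler infers a generic method's type arguments from the assignment target, so the type parameters are written only once, on the left-hand side.  Guava's Maps.newHashMap() works this way.  The sketch below is a minimal illustration under that assumption; the class name MapFactory is hypothetical.  Note that it only removes the repetition on the right-hand side; the declared type still has to be written, and still has to be changed by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the name MapFactory is illustrative only.
class MapFactory {
    // Generic static factory: K and V are inferred from the assignment
    // target, so the type arguments appear only on the left-hand side.
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        // K = String and V = String are inferred; no unchecked warning.
        Map<String, String> myMap = newHashMap();
        myMap.put("key", "value");
        System.out.println(myMap.get("key")); // prints "value"
    }
}
```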

There’s a more subtle way in which type inference enables your code base to be flexible, though:  it provides some measure of duck-typing in the right situations.  For example, if you have the code:

var myList = myMethodCall()

It doesn’t matter whether myMethodCall() returns a List<String> or a String[] or any other variant as long as the subsequent calls are still valid.  Drop-in class replacements and interface extraction become much easier as a result, since the changes don’t have to propagate as far through the code.
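Java has no inferred variables, but it offers a small glimpse of the same effect.  When the result of a call is consumed directly instead of being stored in a declared variable, no type is written down, so the call site tolerates a change of return type as long as the usage stays valid.  The enhanced for loop is one such case, since it accepts both arrays and Iterable types.  In this minimal sketch, namesAsList(), namesAsArray() and the two join methods are hypothetical stand-ins for the two versions of myMethodCall().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example; the class and method names are illustrative only.
class ReturnTypeFlexibility {
    // Two versions of the same producer with different return types.
    static List<String> namesAsList() {
        return Arrays.asList("a", "b", "c");
    }

    static String[] namesAsArray() {
        return new String[] {"a", "b", "c"};
    }

    // The enhanced for loop accepts both arrays and Iterables, so the loop
    // body is identical in both consumers; only the call being made differs.
    static String joinList() {
        StringBuilder sb = new StringBuilder();
        for (String s : namesAsList()) {
            sb.append(s);
        }
        return sb.toString();
    }

    static String joinArray() {
        StringBuilder sb = new StringBuilder();
        for (String s : namesAsArray()) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinList());  // prints "abc"
        System.out.println(joinArray()); // prints "abc"
    }
}
```

Storing the result first, as in List<String> names = namesAsList();, would pin the type down, and switching to the array version would then mean editing that declaration as well.  That is the friction that inferred local variables remove.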

You could go further with type inference than we have and attempt to infer the types themselves, infer union types, or infer method return types, but we’ve chosen to stop at variables for a few reasons.  The first one is that method parameters, return types, and variables exposed outside of the current class all create a contract between two parts of your code, and in those cases you generally want to control that contract more explicitly and use interfaces like List and Map instead of concrete implementation classes like ArrayList and HashMap.  Local variables and private member variables, by definition, don’t leak outside of a fairly limited scope, so it’s generally not worth worrying about whether they’re interfaces or implementation classes.  Second, it keeps the rules about when explicit type declarations are needed simple; we could try to infer method return types, but there would be arbitrary-seeming cases where that broke down because of cyclic dependencies between classes, methods, and variables, and it wouldn’t be at all obvious to the user when or why.  A simple set of rules, we hope, is less confusing.  Lastly, anything more aggressive, like inferring parameter types based on usages, would require a completely different type system from what we have in GScript and a completely different direction for the language.  That’s not to say it’s a bad thing (I think a lot of what OCaml does with types is quite cool, for example); it’s just a more radical departure from the traditional Java/C++/C lineage than we want to make.

So given all the advantages, why does Java still resist adding some form of type inference for variables?  I honestly don’t know.  If anything should have provided the impetus, it was the introduction of generics, which have caused countless redundant keystrokes over the years.  Personally I can’t construct a credible case against it, and I haven’t really heard one presented to me, though I do hear two arguments fairly often.

The first argument is the “use interfaces over impl classes” argument.  Yes, every good Java programmer knows that you want to use more generic interface types like List and Map instead of concrete implementation types like ArrayList or HashMap to protect the flexibility of your implementation and to increase encapsulation.  And that’s true, up to a point:  the point at which your implementation is exposed to the outside world.  Local variables and private member variables, however, are purely implementation details.  If you’re assigning the variable to the result of a method call, that method should be declared to return a List instead of an ArrayList.  If you’re passing the variable out to a method call, the parameter should be typed as a List instead of an ArrayList, so there’s no encapsulation leak there either.  So the only thing that using interface types for local variables protects against is accidentally calling implementation-specific methods on objects created locally within that method or class.  That’s just not much of a danger in my book; you’re protecting a method or class against changes to itself that couldn’t possibly affect anything outside of itself, which seems a wee bit silly:  if I find that I’m using methods on ArrayList that aren’t on List, and I change the local variable to a LinkedList, I’ll just deal with the fallout right then in that method.  It’s certainly not worth avoiding type inference to try to “protect” against that situation.
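The distinction drawn above, interface types at the boundary and whatever is convenient inside, can be sketched in plain Java.  In this minimal, hypothetical example (the class Boundary and method evenLabels are illustrative names, not from any real code base), the method's contract exposes only List, while the local variable uses the concrete ArrayList and even calls an ArrayList-only method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; the class and method names are illustrative only.
class Boundary {
    // Public contract: callers only ever see the List interface.
    static List<String> evenLabels(int n) {
        // Local variable with a concrete type; it is invisible outside this method.
        ArrayList<String> result = new ArrayList<String>();
        for (int i = 0; i < n; i += 2) {
            result.add("item" + i);
        }
        // ArrayList-specific call. Switching the local to LinkedList would
        // break compilation here, and only here.
        result.trimToSize();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evenLabels(5)); // prints "[item0, item2, item4]"
    }
}
```

Callers cannot tell which List implementation they receive, so the concrete local type leaks nowhere; if it changes, the only code that has to be fixed is inside evenLabels itself.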

The second argument is that type inference makes the code harder to read.  This one has a little more merit, but most of the time it’s pretty obvious what’s going on.  New instance creation is always obvious, and in most other cases the variable is well enough named, the method being called is well enough named, or subsequent usages are clear enough to make the type easy to figure out.  The worst case is that you have to dig one level in to see what type a method is declared to return.  Any decent modern IDE will basically do this for you, though:  code-completing on the variable will tell you what type it is, bringing up the javadoc for the method being called will tell you, clicking through to the method will tell you, etc.  In our IDE, we’ve even added a shortcut, Control-T, that shows you the type of any variable or expression at any point in the code.  And even without those aids, it’s not much different from encountering a variable whose declaration is somewhere else:  you don’t repeat the type information on every usage, and somehow people find a way to muddle through.  Yet the redundant information on the declaration is critical and can’t be lived without?  Again, it seems like a post hoc justification for being afraid of something different, and in practice it’s not much of an issue:  without any IDE it’s not much different from what you already have to do to deal with variables and method calls you aren’t familiar with, and with an IDE it’s a complete non-issue.

To me the issue is pretty clear-cut: a type system is a tool to help catch errors early and to improve the ability of tools to understand and manipulate code, but it does so at a heavy cost in terms of verbosity and inflexibility.  Dynamically typed languages obviously have a huge advantage on those two fronts.  Type inference is a way to preserve the benefits of static typing while reducing its overhead, and at this point in the history of language, compiler, and IDE development it should be a part of every modern language, including Java.

3 Comments on “The Necessity of Type Inference”

  1. Raoul Duke says:

    then there’s the whole thing of going all Hindley-Milner and ending up with a system that gives you really freaked out error messages because it magically figured out some dope smoking inferred types for everything long after it should have given up because you wrote something incorrectly a while back. so i hear that people end up manually writing a fair bit of type annotations in Haskell, O’Caml to prevent the system from getting *too* far off course.

    not sure how that relates to the GScript/Scala level of inference.

  2. Alan Keefer says:

    That’s a big reason why we’ve kept the type inference model as simple as possible and only used it to infer the types of variables, and that type is always the type of the expression used as the initializer. That’s obviously something the compiler already knows in any statically-typed language, since it’ll complain if you have a mismatch, and it never really does anything unexpected or incomprehensible as a result.

    What some other languages like OCaml do is essentially synthesize types based on the usages, and that gives you some cool things but also adds a huge layer of complexity that you need to understand in order to deal with them.

    For better or worse, I think imperative languages with simple type systems match how people think about programming pretty well, so with GScript we’ve basically gone in the direction of taking something familiar and improving upon it by adding type inference, closures, and some other language features to make it more useful, rather than going in a radically different direction.

  3. Raoul Duke says:

    re: non/radically different direction.

    that makes a lot of sense to me.

    although, i have to wonder if for some programming concerns, the way most people think about programming is something which can be fraught with troubles that actually do require significantly different approaches to get right.

    e.g. concurrency.
