The Necessity of Type Inference
Posted: July 15, 2008
Type inference is a subject near and dear to our hearts here at Guidewire; one of the primary features of GScript from the very start has been type inference of local variables, and it’s proven over the years to be one of the most valuable features in the language. So valuable, in fact, that it’s hard for me to stomach the prospect of using statically-typed languages without type inference.
For GScript, we’ve actually taken a relatively limited approach to type inference, at least compared to languages like Haskell or OCaml; we only infer the types of local variables or private member variables that have an initializer statement. Method parameters, return types, and non-private member variables always have to be explicitly typed, as do variables that have no initializer. It might seem like that doesn’t buy you a whole lot, but in reality it makes your code tighter and more flexible without sacrificing any safety.
The code elimination part is fairly obvious if you try it out. In Java, creating a new generified Map might look like:
Map<String, String> myMap = new HashMap<String, String>();
In GScript, it’s:
var myMap = new HashMap<String, String>()
The astute reader will notice that those don’t do exactly the same thing, but I’ll get to why that’s not really a problem later on. In Java you can lessen the pain somewhat by using an IDE that automatically extracts variables or by ignoring the compiler warnings and dropping the type parameters on the right-hand side, but relying constantly on the IDE to do every little thing is a bit painful. If you need to go back and change things, rather than just changing the type you have to use an IDE refactoring to make sure it changes the variable declarations. Situations where you assign to the result of a method call are even more of a waste in Java:
Map<String, String> myMap = myMethodCall();
var myMap = myMethodCall()
One of the classic developer refrains is Don’t Repeat Yourself, and explicit type declarations for things that are obviously inferrable clearly violate that. If you want to change the type returned by myMethodCall(), you again have to be sure to use an IDE refactoring to make sure all the variables are changed, and you could end up needing to touch plenty of files to propagate it through. The redundant type declarations add friction to your code base that not only makes initial code creation harder but also makes subsequent changes more difficult.
There’s a more subtle way in which type inference enables your code base to be flexible, though: it provides some measure of duck-typing in the right situations. For example, if you have the code:
var myList = myMethodCall()
print(myList)
It doesn’t matter whether myMethodCall() returns a List<String> or a String or any other variant as long as the subsequent calls are still valid. Drop-in replacements for classes or interface extraction become much easier as a result of type inference, since the changes don’t have to propagate as heavily through the code.
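Here is a rough sketch of the same idea in Java, assuming a version with local-variable var (Java 10 or later, which postdates this post). The class and method names are made up for illustration. The two describe methods have identical bodies apart from the producer they call, which is the point: swapping the return type from a List<String> to a String needs no change to the variable declaration.

```java
import java.util.List;

// Hypothetical example class; none of these names come from GScript or the post.
class InferenceFlexibility {
    // Producer, first version: returns a list of names.
    static List<String> namesAsList() {
        return List.of("Ann", "Bob");
    }

    // Producer, second version: returns one comma-separated string.
    static String namesAsString() {
        return "Ann, Bob";
    }

    static String describeFromList() {
        var names = namesAsList();      // inferred as List<String>
        return "names: " + names;       // string concatenation works for any type
    }

    static String describeFromString() {
        var names = namesAsString();    // inferred as String
        return "names: " + names;       // same line, no declaration to update
    }
}
```

With an explicit declaration, the first line of each method would have had to change along with the producer’s return type.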
You could go further with type inference than we have and attempt to infer the types themselves, infer union types, or infer method return types, but we’ve chosen to stop at variables for a few reasons. The first one is that method parameters, return types, and variables exposed outside of the current class all create a contract between two parts of your code, and in those cases you generally want to control that contract more explicitly and use interfaces like List and Map instead of concrete implementation classes like ArrayList and HashMap. Local variables and private member variables, by definition, don’t leak outside of a fairly limited scope, so it’s generally not worth worrying about whether they’re interfaces or implementation classes. Secondly, it simplifies the rules about when explicit type declarations are needed; we could try to infer method return types but there would be some arbitrary conditions where that broke down thanks to cyclic dependencies between classes and methods and variables, and it wouldn’t be at all obvious to the user when or why. Having a simple set of rules, we hope, makes that less confusing. Lastly, anything more aggressive, like inferring parameter types based on usages, would require a completely different type system from what we have in GScript and a completely different direction for the language. That’s not to say it’s a bad thing (I think a lot of what OCaml does with types is quite cool, for example), it’s just a more radical departure from the traditional Java/C++/C lineage than we want to make.
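To make the split between contract and implementation detail concrete, here is a small sketch in Java, again assuming Java 10 or later for var. It is only an approximation of the GScript rules: Java’s var applies to local variables with initializers and nothing else, so the private field below is typed explicitly even though GScript, as described above, would infer it. The class and its members are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class illustrating where types are written out.
class InferenceRules {
    // Private field with an initializer. Java requires the explicit type here;
    // per the rules above, GScript would infer it.
    private final Map<String, Integer> counts = new HashMap<>();

    // Parameter and return types are part of the method's contract,
    // so they are always declared explicitly.
    int record(String key) {
        var next = counts.getOrDefault(key, 0) + 1;   // local with initializer: inferred as int
        counts.put(key, next);
        return next;
    }
}
```

A declaration with no initializer, such as a bare var next; would be rejected by the Java compiler, which matches the rule that variables without initializers need an explicit type.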
So given all the advantages, why does Java still avoid adding some amount of type inference? I honestly don’t know. If anything should have provided the impetus to add type inference, it was the introduction of generics, which have caused countless redundant keystrokes over the years. Personally I can’t construct a credible case against it, and I haven’t really heard one presented to me, though I do hear two arguments fairly often.
The first argument is the “use interfaces over impl classes” argument. Yes, every good Java programmer knows that you want to use more generic interface types like List and Map instead of concrete implementation types like ArrayList or HashMap to protect the flexibility of your implementation and to increase encapsulation. And that’s true, up to a point: the point at which your implementation is exposed to the outside world. For local variables and private member variables, however, it’s all implementation details at that point. If you’re assigning the variable to the result of a method call, that call should be coded to return a List instead of an ArrayList. If you’re passing the variable out to a method call, the parameter should be typed as a List instead of an ArrayList, so there’s no encapsulation leak there. So the only thing you protect against by using interface types for local variables is using implementation-specific methods without realizing it on objects created locally within that method or within the class. That’s just not much of a danger in my book; you’re protecting a method or class against changes to itself that couldn’t possibly affect anything outside of itself, which seems a wee bit silly: if I find that I’m using methods on ArrayList that aren’t on List, and I change the local variable I’m using so it’s a LinkedList, I’ll just deal with the fallout right then in that method. It’s certainly not worth avoiding type inference to try to “protect” against that situation.
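A short sketch of that argument in Java (assuming Java 10 or later for var; the class and method names are invented for the example). The local variable is inferred as the concrete ArrayList<String>, but the only thing the rest of the code sees is the List<String> parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the names are illustrative only.
class LocalImplTypes {
    // The contract with callers is expressed with the interface type.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    static int demo() {
        var words = new ArrayList<String>();   // inferred as ArrayList<String>
        words.add("type");
        words.add("inference");
        words.trimToSize();                    // ArrayList-only method, used only inside this method
        return totalLength(words);             // crosses the boundary as a List<String>
    }
}
```

If words later became a LinkedList, the trimToSize() call would stop compiling, and the fix would be confined to demo(); totalLength and its callers would not notice.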
The second argument is that type inference makes the code harder to read. This one has a little more credence, but most of the time it’s pretty obvious what’s going on. New instance creation is always obvious, and in most other cases the variable is well-enough named, the method being called is well-enough named, or subsequent usages are obvious enough to make it easy to figure out. The worst case is that you have to dig one level in to see what type a method is declared to return. Any decent modern IDE will basically do this for you, though: code-completing on the variable will tell you what type it is, bringing up the javadoc for the method being called will tell you, clicking through to the method will tell you, etc. In our IDE, we’ve even added a shortcut, Control-T, that shows you the type of any variable or expression at any point in the code. Even then, it’s not that much different than looking at a variable somewhere that’s been declared elsewhere: you don’t repeat the type information on every usage and somehow people find a way to muddle through. Yet somehow the redundant information on the declaration is critical and can’t be lived without? Again, it seems like a post hoc justification for being afraid of something different, and in practice it’s not that much of an issue: without any IDE it’s not that much different from what you have to do now to deal with variables and method calls you aren’t familiar with, and with an IDE it’s a complete non-issue.
To me the issue is pretty clear-cut: a type system is a tool to help catch errors early and to improve the ability for tools to understand and manipulate code, but it does so at a heavy cost in terms of verbosity and inflexibility. Dynamically typed languages obviously have a huge advantage on those two fronts. Type inference is a way to preserve the benefits of static typing while reducing its overhead, and at this point in the history of language, compiler, and IDE development it should be a part of every modern language, including Java.