Some Ways To Improve On Java’s Generics

Plenty of ink has already been spilled over the issue of Java generics, so perhaps I’m just adding to the noise with this; hopefully I’ll manage to add something useful instead. When I first started diving into generics, I honestly didn’t think they were that bad: in simple cases they’re clearly a huge improvement in both clarity and safety over just casting all over the place. There are also plenty of cool things you can do with generics to make reflective programming more typesafe, and plenty of other uses you can put them to as well.

Now that I’ve used them fairly extensively and pushed the limits in pretty much every direction, I’m starting to learn when to back off and avoid using them. Generics, like static typing in general, provide a tool to help you reduce the number of errors in your program. If the cost of using the tool outweighs the benefit you get from the tool, you just shouldn’t bother using it. Too often I’ve found myself spending a long time puzzling over complicated generics statements that, in reality, will help me prevent one really obvious bug that will be caught by our automated tests and fixed in about 5 minutes; spending 2 hours to lower the chance from 3% to 0% that I’ll have to spend 5 minutes fixing something isn’t a win.

For example, a few months back I was rewriting the part of our system that defines how policy products are defined. It’s a highly customizable area of the application, and to generate new configurations on the fly for test purposes we use a builder pattern. I found myself writing code with method signatures like:

protected <L extends ProductModelObjectWithCode, O extends ProductModelObjectBase>
    ProductModelFKPopulator<L, O> createFKPopulator(
        ProductModelLinkField<L, O> field,
        ProductModelBuilderBase<L, ? extends ProductModelBuilderBase> builder) {
  return new ProductModelFKPopulator<L, O>(field, builder);
}

What in the world was I thinking? Here we have a foreign key-like field generified on the linkee object type (L) and the owning object type (O), and our builders are generified on what they build and on the builder type itself (for covariant returns). So this is saying that you can create a field populator based on a nested builder if the builder you pass in builds objects that are the same type as the linkee of the field you want to populate. I’m not sure that declaration is even correct, since it should probably be ? extends L in the ProductModelBuilderBase argument declaration. The point is that clearly I was temporarily insane when I wrote this: it’s trying to prevent an error that I’m unlikely to make and that, if I make it, will be easy to detect and fix. In order to prevent it, I’ve managed to make the method signature completely incomprehensible and introduced the risk that someone will suffer a seizure merely from looking at it. And on top of that, it’s probably not even strictly correct and will prevent some valid calls from being made. Writing that was not exactly the most productive use of my time.

So certainly it’s helpful to know when generics just aren’t worth the trouble. Some parts of generics could be made more useful or lightweight, however. With that in mind, GScript supports Java’s generics, but with some changes that we think make them easier to work with, more useful, and less heavyweight. There are also a few other directions we’ve considered going in but haven’t yet decided are worth the additional complexity.

Wildcards Aren’t Worth It

My understanding is that wildcards exist primarily to plug the array type hole. Suppose you have Shape with two subclasses, Circle and Square. In Java, Circle[] is assignable to Shape[], but you can’t safely treat a Circle[] as a Shape[], since you can store a Square in a Shape[] but not in a Circle[]. If you try to do that, you’ll get an ArrayStoreException at runtime; the mistake can’t be caught at compile time.
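As a minimal sketch of the array hole (the class and method names are made up for illustration), the assignment and the store both compile, and the failure only shows up when the code runs:

```java
class ArrayCovarianceDemo {
    static class Shape {}
    static class Circle extends Shape {}
    static class Square extends Shape {}

    // Returns true if storing a Square through a Shape[] view of a Circle[]
    // is rejected at runtime.
    static boolean storeFailsAtRuntime() {
        Circle[] circles = new Circle[1];
        Shape[] shapes = circles;        // compiles: arrays are covariant
        try {
            shapes[0] = new Square();    // compiles, but the array is really a Circle[]
            return false;
        } catch (ArrayStoreException e) {
            return true;                 // the JVM checks the store against the real element type
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayStoreException thrown: " + storeFailsAtRuntime());
    }
}
```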

The need to catch that statically with generics was probably even greater because generics aren’t reified the way array types are, so in Java there’s not even a runtime check to catch that you’re storing a Square in a List<Circle>. You’ll only get a ClassCastException later, when yanking the Square out and treating it like a Circle.
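To make the erasure point concrete, here is a small sketch (illustrative names again) that sneaks a Square into a List<Circle> through a raw type. The add succeeds silently; the failure only appears at the read, where the compiler has inserted a cast:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    static class Shape {}
    static class Circle extends Shape {}
    static class Square extends Shape {}

    @SuppressWarnings({"unchecked", "rawtypes"})
    static String run() {
        List<Circle> circles = new ArrayList<Circle>();
        List raw = circles;              // raw type: allowed, with a compiler warning
        raw.add(new Square());           // no runtime check, the list has no element type
        try {
            Circle c = circles.get(0);   // the compiler-inserted cast to Circle fails here
            return "no exception";
        } catch (ClassCastException e) {
            return "ClassCastException on read";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```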

To combat that, in generics List<Circle> is not a subtype of List<Shape>. Instead, it’s a subtype of List<? extends Shape>. If you have a variable of type List<? extends Shape> you can read from it (you get back a Shape), but you can’t add to it, because you don’t know what the actual concrete type is. More generally, you can’t call a method that takes the wildcarded type variable as a parameter, but a method can return it, and the result will be treated as the bounding type.
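A short sketch of that read-but-not-write rule, using hypothetical Shape classes with an area() method that I’ve added just for the example:

```java
import java.util.ArrayList;
import java.util.List;

class WildcardDemo {
    static abstract class Shape { abstract double area(); }
    static class Circle extends Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        double area() { return Math.PI * radius * radius; }
    }
    static class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
        double area() { return side * side; }
    }

    // Accepts List<Shape>, List<Circle>, List<Square>, ...
    static double totalArea(List<? extends Shape> shapes) {
        double total = 0.0;
        for (Shape s : shapes) {         // reading is fine: every element is at least a Shape
            total += s.area();
        }
        // shapes.add(new Square(1.0)); // rejected by the compiler: this might be a List<Circle>
        return total;
    }

    public static void main(String[] args) {
        List<Square> squares = new ArrayList<Square>();
        squares.add(new Square(2.0));
        squares.add(new Square(3.0));
        System.out.println(totalArea(squares)); // prints 13.0
    }
}
```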

The problem is that the wildcards tend to filter through your code in all sorts of horrific ways, especially with interfaces. You have to be really, really careful to make sure that interface methods return wildcard types everywhere. Having an interface method that returns List<Shape> means that someone building up a List<Square> for a local variable can’t return that as the value of the method, so you almost always have to be careful to make it List<? extends Shape>. That’s confusing and verbose, and it also tends to ripple through the code, since all the other methods relating to lists of shapes need to be properly wildcarded as well in case someone wants to pass the return value from one method as an argument to the next method. Fixing a situation where an interface was written to return a List<Shape> instead of List<? extends Shape> can quickly balloon into an exercise that requires changes to dozens of files that really shouldn’t be related.
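Here’s a rough illustration of how that plays out. ShapeSource and its two methods are invented for the example; the point is only the difference between the two return types and the way the wildcard has to follow the value into whatever consumes it:

```java
import java.util.ArrayList;
import java.util.List;

class InterfaceReturnDemo {
    static class Shape {}
    static class Square extends Shape {}

    interface ShapeSource {
        List<Shape> exactShapes();            // implementors must hand back a List<Shape>
        List<? extends Shape> anyShapes();    // implementors may hand back a List<Square>
    }

    static class SquareSource implements ShapeSource {
        private final List<Square> squares = new ArrayList<Square>();

        SquareSource() {
            squares.add(new Square());
            squares.add(new Square());
        }

        public List<Shape> exactShapes() {
            // return squares;                    // does not compile: List<Square> is not a List<Shape>
            return new ArrayList<Shape>(squares); // so we are forced to copy
        }

        public List<? extends Shape> anyShapes() {
            return squares;                       // fine
        }
    }

    // A consumer has to be wildcarded too, or it can't accept the result of anyShapes().
    static int count(List<? extends Shape> shapes) {
        return shapes.size();
    }
}
```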

In GScript, we’ve basically relaxed that constraint, such that List<Square> is, in fact, assignable to List<Shape> so you don’t have to use wildcards. It might mean that we don’t statically catch as many problems, but in my experience those array store type problems come up so infrequently and are so easy to fix that it’s just not worth the cost of putting wildcards everywhere. Generics work how you intuitively think they should, you don’t have to think about it too much, and the price of letting a few array-store-type exceptions through every now and then isn’t really very high.

Generics Need Variable Type Inference

Typing generic type names can get pretty old after a while. The most obvious, annoying case is something like:

Map<String, List<String>> myMap = new HashMap<String, List<String>>();

I shouldn’t have to repeat the information on both sides. You can avoid that in Java by leaving the type arguments off the right-hand side (in which case the compiler will warn you) or by calling a generic helper method and letting the compiler infer its type parameters from the assignment. But neither trick gets you out of this situation:
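The helper-method trick looks roughly like this. newHashMap is a hypothetical helper written for the example, not something from the JDK; the compiler infers K and V from the type of the variable being assigned:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InferenceDemo {
    // Hypothetical helper: the type parameters are inferred at the call site.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    static Map<String, List<String>> build() {
        // The type arguments appear only once, on the left-hand side.
        Map<String, List<String>> myMap = newHashMap();
        myMap.put("a", new ArrayList<String>());
        myMap.get("a").add("x");
        return myMap;
    }
}
```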

Map<String, List<String>> myMap = someMethod();

someMethod has to be declared to return a Map<String, List<String>>, but I still have to declare my variable with that type. You’d better hope your IDE has good refactoring tools if you want to change the return type of that method, or else you’ll be doing a lot of typing or cutting and pasting. A lack of type inference is already painful in a statically typed language; generics just make it worse by making the type declarations even longer.

GScript deals with this by letting you omit the type declaration for any variable that is initialized where it’s declared, so you could write either of:

var myMap = new HashMap<String, List<String>>();
var myMap = someMethod();

which makes the generics overhead more bearable.

Self-Parameterization Should Be Easier

We haven’t done this one yet, but a common pattern in a lot of our more reflectively-driven code is to parameterize a type on itself. In the builder example above, we parameterize all of our data builders on the type of the builder, so we can have methods like:

B create();

where the method will automatically return the right thing. You could accomplish the same goal by covariantly overriding methods, which is the easier route to go if you only need to do it a few times. In our builder case, where there are methods on supertypes that return the builder back out, self-parameterization is much more convenient than covariantly overriding every possible method on every superclass in the chain. Self-parameterization can also be useful for writing reflective libraries, though it can also be overkill as I’ve demonstrated above.
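For comparison, the covariant-override route looks something like the sketch below (all names are invented for the example). It works, but every fluent method inherited from a supertype needs its own override in every subclass that wants to keep the chain typed:

```java
class CovariantOverrideDemo {
    static class BaseBuilder {
        protected String code = "";

        BaseBuilder withCode(String c) {
            code = c;
            return this;
        }
    }

    static class CoverageBuilder extends BaseBuilder {
        private int limit;

        // Without this override, withCode() would return BaseBuilder and the
        // chained call to withLimit() below would not compile.
        @Override
        CoverageBuilder withCode(String c) {
            super.withCode(c);
            return this;
        }

        CoverageBuilder withLimit(int l) {
            limit = l;
            return this;
        }

        String describe() {
            return code + ":" + limit;
        }
    }

    static String demo() {
        return new CoverageBuilder().withCode("COLL").withLimit(500).describe();
    }
}
```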

Unfortunately, self-parameterization is a pain; you have to type it in everywhere, leading to declarations like:

CoveragePatternBuilder<B extends CoveragePatternBuilder> extends ProductModelBuilderBase<CoveragePattern, B>

On leaf types, you have to specify the type explicitly in the extends clause, whereas for non-leaf types you have to declare the type variable and then pass it through in the extends clause. It’s doable, it’s just cumbersome.
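A stripped-down sketch of the pattern with a root, a non-leaf, and a leaf type. These classes are simplified stand-ins invented for the example, not our real builders, and the self() helper with its unchecked cast is one common way to implement it rather than the only one:

```java
class SelfTypeDemo {
    // Root: B is the concrete builder type, so fluent methods can return it.
    static abstract class BuilderBase<T, B extends BuilderBase<T, B>> {
        protected String code = "";

        @SuppressWarnings("unchecked")
        protected B self() {
            return (B) this;   // safe as long as each leaf binds B to itself
        }

        B withCode(String c) {
            code = c;
            return self();
        }

        abstract T create();
    }

    // Non-leaf: has to redeclare the type variable and pass it through.
    static abstract class PatternBuilder<T, B extends PatternBuilder<T, B>>
            extends BuilderBase<T, B> {
        protected int priority;

        B withPriority(int p) {
            priority = p;
            return self();
        }
    }

    static final class Coverage {
        final String code;
        final int priority;
        final boolean required;

        Coverage(String code, int priority, boolean required) {
            this.code = code;
            this.priority = priority;
            this.required = required;
        }
    }

    // Leaf: names itself explicitly in the extends clause.
    static final class CoverageBuilder extends PatternBuilder<Coverage, CoverageBuilder> {
        private boolean required;

        CoverageBuilder required() {
            required = true;
            return this;
        }

        @Override
        Coverage create() {
            return new Coverage(code, priority, required);
        }
    }

    static Coverage demo() {
        // Every inherited method returns CoverageBuilder, with no overrides needed.
        return new CoverageBuilder().withCode("COLL").withPriority(2).required().create();
    }
}
```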

One thing we’d kicked around is the idea of having a special SELF parameter type declaration that indicates that the type variable should be bound to the value of that type; i.e. it would implicitly be treated like T extends Foo when you used the SELF type variable on class Foo and T would automatically be bound appropriately on subtypes. We haven’t really nailed down exactly how it would work or decided to implement it, but it’s something that’s on the table for the future that might make it much easier to deal with something that, at least in our code, has turned out to be a reasonably common pattern.

Runtime Generic Information Is Helpful

The Java type erasure horse has been beaten to death by this point, so I’m not going to harp on it: it’s a complicated decision, they did it for certain reasons, and it’s not going to change. I’ll merely say that in GScript we can do some amount of un-erasure and reification (though not completely, since Java is still managing the bytecode for those classes and any Java code that creates or uses them); basically we just have first-class types in our system for the different generified versions of a type, and enhancement methods can be added specifically to those types. For example, our base set of enhancements provides sum() and average() methods on Collection<Number>. In Java, there is no such type, so there’s nowhere to hang those sorts of methods. So the types exist statically in our system enough to let you enhance them, but they’re still essentially lost at runtime.
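In plain Java, the closest you can get is a static utility class, since there’s no Collection<Number> type to attach instance methods to. A sketch of what that looks like (NumberCollections is a made-up name, and throwing on an empty collection is just my choice for the example):

```java
import java.util.Arrays;
import java.util.Collection;

class NumberCollections {
    // Static helper standing in for a method you'd like to call as numbers.sum().
    static double sum(Collection<? extends Number> values) {
        double total = 0.0;
        for (Number n : values) {
            total += n.doubleValue();
        }
        return total;
    }

    static double average(Collection<? extends Number> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("cannot average an empty collection");
        }
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3)));     // prints 6.0
        System.out.println(average(Arrays.asList(1, 2, 3))); // prints 2.0
    }
}
```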


One Comment on “Some Ways To Improve On Java’s Generics”

  1. […] Posted on May 1, 2008 by Carson Gross In Keef’s post pointing out some problems with generics in java, he made a quick mention of enhancements of […]

