Sorting A List


Note:I’ve rewritten this post based on feedback from Neal and Ted in the comments. I’ve left the inflammatory comments in (this is a blog, after all), but tried to do a more apples-to-apples comparison between GScript and java. The original article can be viewed here.


My first post showing some GScript on this blog compared sorting in Java with sorting in GScript. Today I spent a lot of time going over the psychosis of Comparators in java, and it got me thinking again about how differently Java and GScript approach the problem of sorting lists. I’ll split the discussion into two parts: the code developers must write to sort a list and the actual signatures of the sorting methods used.

The “Client Side”

Comparators, as everyone knows, are java’s way of defining orderings for collections of objects. In Java 1.5 and greater, they are parameterized on T: if you want to sort a List<T>, you pass in a Comparator<T>. Sort of. We’ll get to that. First, let’s look at a simple (!!!) example of sorting in java:

   List>Employees> someEmployees = getEmployees();
   Collections.sort( someEmployees, new Comparator<Employee>(){
    public int compare( Employee e1, Employee e2 ) {
      return e1.getSalary().compareTo( e2.getSalary() );
    }
  });
  return someEmployees

What that code does is sort a collection of employees by Salary. You are forgiven if you can’t tease that fact out of that mess.

As I wrote in the previous post, the GScript to accomplish the same task is:

  return Employees.sortBy( \ e -> e.Salary )

GScript uses closures to boil the operation down to a single line of code and type inference to minimize the syntax of that line. What do we want to do? Sort the list by something. What is that something? The employees Salary. We know that Employees is a list of Employees, so why should we have to specify the type of the “e” parameter to the closure? A GScripter can let the compiler take care of a lot of the details, and still gets nice code completion since everything is statically typed.

The “Server Side”

Now let’s take a look at the other side of the fence: the signatures of the methods used for sorting.

The signature of Collections#sort(list, c) is:

static <T> void sort(List<T> list, Comparator<? super T> c)

What the hell does that mean? Basically, it means you can pass in any comparator that is parameterized on T or a supertype of T. This is because the comparator is going to be invoked with objects of type T, so only comparators that take that type or *higher* in the inheritance chain will work. In particular, Comparator<? extends T> will *not* work, because the comparator might be expecting a subtype of T. This is a case of contra-variance, which is fairly rare in type-systems. We are used to the opposite, variance, where a subtype of T is acceptable in place of T. Some programming languages, notably Scala, allow you to annotate type variables with the particular variance they allow, usually using a ‘+’ or ‘-‘ sign.

(As a side note, as I’ve said elsewhere, any non-trivial application of generics requires me to stop and put on my generics hat, and, ten minutes after I’ve understood and solved the task at hand, *poof*, it’s gone.)

Interestingly, since java lacks any way to specify variance, internally Sun engineers have to cast the Comparators to the generic type to do comparisons. I’m not lying. Check out the implementation. The implementation of TreeSet is even better, with the developer putting in little /*-*/ comments in the places to indicate variance more correctly when he has to cast to the generic type.

So here we have a lot of syntax flying around to make sure sorting lists is absolutely, positively, 100% type safe. And, despite all that, the sun engineers still have to crosscast to the generic type to make the damned thing work. All to prevent what has to be one of the rarest programming errors I can imagine: accidentally sorting a List with a bad Comparator. I’ve never seen that happen. I’ve never heard of that happening. I’ve never even heard it mentioned that it might happen. And yet all that complexity has been thrown at this problem in java.

The signature of sortBy() is:

  function sortBy( value(T):Comparable ) : List

and it is injected onto List via GScript Enhancements.

The parameter is declared to be a block that takes an argument of type T and returns a Comparable object, which will be used to order the list. (We eventually delegate to Collections.sort() internally so that the sort occurs in java, rather than GScript. Hey, we still want it to be fast, right?)

Note that Comparable is left generic because there is almost zero chance someone will screw a call to sortBy() up from a typing perspective, and if they did, the incomprehensible generics error message would be far harder to understand than then inevitable runtime exception that would occur. It just isn’t worth the complexity to specify it.

Yeah, so?

Well, to me, GScript throws complexity at the right part of the problem. It takes a common operation on Lists, sorting by some attribute of their component elements, and boils it down to its essence using closures. Closures, like generics, are somewhat complicated and can take a bit to get your head around. But once you grock closures in GScript you end up having to know less, not more, about the complexities of sorting. This is in stark contrast with the generics in Collections.sort(). That application of complexity doesn’t help developers much, if at all. The only time you will ever notice it is when it gets in your way.

The overriding theme of GScript, formally spelled out by Scott at the start of our Bedrock release: “GScript features are about making developer’s lives easier.” I’m convinced that if static language designers make pragmatic concessions to real-world developer needs and the acknowledge the practical limitations of static typing, the current excitement around dynamic languages would temper dramatically.


17 Comments on “Sorting A List”

  1. Neal Gafter says:

    It seems as if you’re comparing the declaration site in Java today (which includes the wildcard) with the use site in GScript. Isn’t that apples to oranges?

  2. Ted Young says:

    Isn’t this post rather disingenuous? 99.99% of Java developers will never need to write nor look at the implementation for Sort. I’ve been programming in Java for 12 years and I’ve never looked at the implementation (or at least not in any detail).

    And yes, Java has a lot of syntax stuff around inner classes, but if you’ve been reading and writing Java for any length of time, you get better at reading past the syntax to the meat of the method. Note that I’m not saying it’s good, just that it’s not all _that_ obscure.

    What does obscure what the sorting is keying off of is the use of compareTo(), because presumably you’re using BigDecimal instead of a “real” money class that would have built-in greaterThan() or lessThan() methods.

    Finally, I would say that it’s more likely for the inner class to have been refactored into a static inner class on Employee, so that your sort would look like:

    Collections.sort( someEmployees, Employee.SALARY_ASCENDING_COMPARATOR).

    Even better would be for whatever is the container for the Employee collection does the sorting for you, e.g.:

    getEmployeesSortedBy(Employee.SALARY_ASCENDING_COMPARATOR)

    or, even:

    getEmployeesSortedBy(Employee.SALARY)

    So, closures/blocks in GScript are awesome, but Java is still the primary language and we shouldn’t make it look worse than it really is just to make GScript look better.

  3. Raoul Duke says:

    @”Note that I’m not saying it’s good, just that it’s not all _that_ obscure.”

    please let us not be apologists. i’ve been doing Java for long enough and it still drives me nuts to have to deal with that kind of junk.

    problems like that interact with other problems like that in a sort of exponential way, i think.

    (not that there’s any perfect language out there.)

  4. Doug says:

    I’m sure Alan will have more to say, but :

    To Neal:
    Compare the GScript salary sort to the java salary sort. The complexity of the underlying implementation of the java sort in terms of wildcard generics is a separate issue, illustrating what will happen to you if you wind up being the person who has to implement a method like sort for generics. Both sides are actually easier in GScript, just be sure to compare the apples to apples and the oranges and oranges. Perhaps Alan should show what the java generic implementation of sort looks like in detail, and implement generic sort in gscript for comparison.

    To Ted: breaking up the hideous pieces of the java implementation certainly makes java look better, but consider what that implies: you need to break up this really trivial method to hide the horrible complexity involved in the java side, whereas the gscript version is so straightforward there is no need to break it up.

  5. Doug says:

    Oops Alan -> Carson

  6. Carson Gross says:

    Neal,

    Yeah, you are right. It was a bit of unintentional slight-of-hand on my part, and wasn’t really fair. Given my opening paragraph, I should have stuck with why I thought the declaration site syntax wasn’t worth the cost of the bugs it intends to prevent.

    The core idea, which Doug summarized better than I did, was to contrast the application of complexity in language design to a common problem: sorting a list. If I were to rewrite the post, I would start off with that rather than a rant about how I’m not smart enough to understand generics. That just happened to be on my mind after the day I had had.

    Ted-

    Yeah, I’m obviously painting GScript in the most favorable light possible here, but I don’t think you are giving the flexibility and expressiveness of blocks enough credit. If you are in an environment where code is changing frequently changing and often being thrown away, not having the overhead of creating and maintaining a bunch of code artifacts is a big win.

    I agree with both of you that the original post was a bit unfair. I’ll try to be more balanced in future posts.

    Cheers,
    Carson

  7. Alan Keefer says:

    As mentioned in a previous post (https://guidewiredevelopment.wordpress.com/2008/05/01/some-ways-to-improve-on-javas-generics/), we basically ignore wildcards in GScript, for what it’s worth, which loses some type safety in return for simplifying the language. For comparison, the declaration site in GScript is:

    function sortBy( value(T):Comparable ) : List

    As Carson has pointed out, it isn’t entirely a fair comparison (and our sortBy method eventually chains to CollectionUtils.sort() anyway).

    As far as the caller is concerned, the primary difference with regards to generics is around the type inference that GScript does. In my experience, type inference does a lot to make generics more user-friendly for users of generified classes and methods.

  8. Carson Gross says:

    All,

    I’ve rewritten the post based on the feedback above, and included a link to the original content.

    Thanks all for keeping me honest.

    Cheers,
    Carson

  9. Carson Gross says:

    problems like that interact with other problems like that in a sort of exponential way, i think.

    That’s absolutely right, and that fact works in the both directions. Good type inference + closures bring you back down the exponential complexity curve dramatically because the curve is so steep out there. But you have to have both, or you end up going further up the curve. That’s what I see when I look at the current java closure proposals.

    Cheers,
    Carson

  10. Ted Young says:

    With respect to not giving blocks enough credit, my point about breaking up the Java code (what in English I’d call a run-on sentence) was more about appropriate separation of concerns and making Java look worse than it could.

    As the author of the Company class (the one that has the getEmployees() method), I might write the comparator (or whatever) once and then consumers get it for free. If it wasn’t already there, as a consumer I might write the comparator and keep it to “myself”, or put it in the Company class, or perhaps somewhere else that makes more sense depending on usage.

    Certainly blocks (along with more type inferencing) would be awesome to have in Java to get rid of the overly wordy inner classes and far be it for me to argue against that, but as I mentioned, we don’t have to write bad Java code to make GScript look good in comparison.

  11. Carson Gross says:

    Ted,

    I hear what you are saying and I think this is a point about which we are going to have to agree to disagree. In my opinion, there are no concerns here to separate. It’s one concern: I just want to sort the damned list.

    My take is that Java forces you to over-engineer the problem. The small example above is repeated many times over in most of the frameworks and libraries available for java. One of the reasons why dynamic languages are gaining inroads is because the lack of type-safety *forces* library developers to create succinct, expressive APIs. If they are too verbose and baroque, no one can use them. So a weakness in one area leads to a strength in another.

    Cheers,
    Carson

  12. Patrick says:

    I think it’s important to note that with the java sort, most of the code is written by the (free) IDE (and many taps of CTRL-SPACE). The gscript code IDE is not free and if I need help – there is this blog and maybe 2 others on the internet.. compared to the gazzilion java threads.

    • Alan Keefer says:

      @Patrick, while the language-formerly-known-as-GScript (now known as Gosu) is not yet officially open sourced, when it is released it will include a fairly full-featured Eclipse plugin, so there will indeed be a free IDE for it. As to the documentation issue . . . well, we’ll do our best to provide good documentation along with the open source release, but certainly that’s one of the downsides of any language that isn’t yet well-established.

  13. Carson Gross says:

    Patrick,

    That’s a fair point. We are trying to address that by making Gosu open source, which will we hope will generate a larger community, and free IDE support is a part of that effort.

    There is no debating the fact that the shortcomings of Java has caused an explosion of great tools around the language, which actually makes adopting a new language much more painful. Funny thing.

    Cheers,
    Carson

  14. Patrick says:

    I very much look forward to a “good” Eclipse plugin. GScript has some nice features but really… if the tooling is poor, then take up will be slow.

    I run Guidewire Studio on a 4 CPU, 8GB memory desktop (ie. its way hefty) and studio is incredibly unresponsive. Eclipse is running on the same machine and is ultra-snappy.

  15. Carson Gross says:

    Patrick,

    We definitely hope to simplify and improve the performance of our tooling when we release the Eclipse plugins. We recognize the importance of tooling for a programming language (see https://guidewiredevelopment.wordpress.com/2010/08/05/the-effect-of-language-design-on-tooling/) and we’ve designed Gosu specifically to support good tooling, so that’s the goal, even if it has not been perfectly achieved at this point.

    Can you get in touch with us via your local guidewire services people? While I can’t guarantee that we can solve the performance issues you are seeing, at least getting a ticket opened for them should give you some clarity around what the performance issues are.

    Cheers,
    Carson

  16. Patrick says:

    HI Carson,

    I have actually checked with our team and I believe there is a portal incident already open for this very issue (I believe patch 7 of Guidewire Studio 5.0.5 was meant to address this very issue).


Leave a reply to Neal Gafter Cancel reply