One Language to Rule Them All: The Importance of Scaling Up and Scaling Down

As anyone who’s programmed in Java can attest, Java does a horrible job of scaling down to small projects. Part of that is due to the Java syntax and standard libraries: there’s no way to write a little code to solve a little problem. If you want to write a program that does something simple like looping through every line in a file and counting the number of times a particular word appears, good luck doing it in only a few lines of code. But even if those problems get fixed, Java has an additional problem: it’s a compiled, statically-typed language, which necessitates a toolchain that’s unnecessary for dynamic languages that can execute as scripts. You can’t just write some .java files and then say “java” and watch them execute: you have to compile them down to Java classes, which means making decisions about where to store the .class files and how to make sure they end up on your classpath at runtime. And of course, to do anything interesting in Java you’ll probably need to download some third-party libraries and figure out where to put them and how to get them onto your classpath, and the Java syntactic overhead is really only manageable with an IDE . . . so basically, doing just about anything in Java requires installing an IDE, deciding on your project layout, creating a project for your IDE with the right path settings, maybe writing some build scripts to do the compilation, and probably writing some other scripts to actually run your program with the right classpath arguments. The overhead of doing that is only worth it if you’re writing at least a few thousand lines of code. Once you have the environment and build scripts set up, the environmental issues are less of a pain, but they represent a fairly large barrier to entry for small-scale projects where the development time is measured in hours rather than months or years.
The same set of problems holds, to a greater or lesser extent, for pretty much any other compiled language like C or C++: the extra overhead of creating build scripts, figuring out where to store compilation artifacts, and figuring out how to link in additional libraries is a serious pain (though at least on Linux you have things like a default library path, which makes compiling a “hello world” program in C without a makefile a little less painful).
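To give a feel for the word-counting example mentioned above, here is a rough sketch of what it might look like in plain Java. The class name WordCount and the helper countWord are made up for illustration, and matching on whitespace-separated tokens is an assumption about what “a particular word appears” means.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class WordCount {
    // Hypothetical helper: counts whitespace-separated tokens that are
    // exactly equal to `word` across all of the given lines.
    static long countWord(List<String> lines, String word) {
        long count = 0;
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (token.equals(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java WordCount <file> <word>");
            System.exit(1);
        }
        // Read the whole file into memory; fine for a small one-off script.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(countWord(lines, args[1]));
    }
}
```

With the traditional workflow this still has to be compiled with “javac WordCount.java” and then run with “java WordCount somefile.txt word” from a directory where the resulting .class file can be found, which is exactly the kind of ceremony a one-off script shouldn’t need.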

Dynamic languages, on the other hand, scale down very well to those small-scale projects: just write your hello-world.rb file and run “ruby hello-world.rb” and you’re done. How long did that take? Ten seconds? And since most mature dynamic languages have some sort of built-in packaging scheme like Ruby gems or Python eggs, for simple scripts you can easily pull in any libraries you’ve already got on your system, and you can easily pull in new ones without worrying about build scripts and classpath shenanigans.

The ability to scale down is one reason why we think it’s important that Gosu has first-class programs, which can have an embedded classpath directive in them, so that you can use Gosu for simple one-off scripts and not just for larger projects. It’s also why we ship an editor that’s usable out of the box and can do things like code completion, error highlighting, and other basic stuff, so that you can write simple scripts and explore the language without needing additional tools (we’re working on making it better). One thing we don’t have an answer for yet is the default packaging, installation, and usage of Gosu libraries: it’s definitely something to handle better in the future.

That’s the scaling down story. But what does it mean to scale up? Scaling up means handling projects with more developers, more lines of code, more features, more versions, and more customers. This is where things get a little more contentious: fans of static typing, like myself, tend to argue that statically typed languages scale up to larger projects better than dynamically typed languages. Why? First of all, static typing helps when coordinating across large numbers of people (both concurrently and through time) by providing more up-front error checking and discoverability: if you want to add another argument to the foo() method, static typing makes it easier for tools to identify all usages of that method, and if you change it in an incompatible way, there’s a good chance the program won’t compile. It’s not foolproof, and it’s not nearly as useful to someone who understands the entire code base, but it’s a very valuable tool when the code base is large enough that people often have to make changes that will in turn affect code they’re not even aware of. It also makes it easier to discover how someone else’s code works, thanks to things like auto-completion, reference traversal, usage finding, and other such tools that are enabled by static types. Secondly, static typing makes it easier to refactor and clean up large code bases, since it makes it possible for tools like IDEs to make those changes safely and automatically. If I want to rename the MyClass.getName() method to getShortName(), my IDE can do that while only changing references to MyClass.getName() and not references to OtherClass.getName(); if there are only 5k lines of code and there’s only one method named getName(), that doesn’t matter so much, but if there are 500k lines of code it’s more likely that there are several methods with that same name, and static typing makes it possible to differentiate between their uses statically.
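Here’s a small sketch of the rename scenario, just to make it concrete. MyClass, OtherClass, and getName() come from the example above; the wrapper class RenameDemo, the describe() method, and the returned strings are invented for illustration.

```java
public class RenameDemo {
    static class MyClass {
        // The method we'd like to rename to getShortName().
        public String getName() {
            return "my-class";
        }
    }

    static class OtherClass {
        // Same method name on an unrelated type; a rename of
        // MyClass.getName() must leave this one alone.
        public String getName() {
            return "other-class";
        }
    }

    // Hypothetical caller. The declared parameter types tell the compiler
    // (and any refactoring tool) that a.getName() binds to MyClass and
    // b.getName() binds to OtherClass, so only the first call site needs
    // to change when MyClass.getName() is renamed.
    static String describe(MyClass a, OtherClass b) {
        return a.getName() + "/" + b.getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new MyClass(), new OtherClass()));
    }
}
```

In a dynamically typed version of describe(), both calls would look identical and a tool would have to guess (or run the code) to know which getName() each one refers to.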
Lastly, static typing makes it easier to create hard APIs: using a construct like a Java interface makes it clear what the contract is that a particular class will adhere to, which makes it easier to understand what sorts of changes will end up breaking the API and thus affect caller code and which changes can be safely made without changes to callers. Managing large, long-lived projects is highly dependent on your ability to properly modularize code into components that have well-defined APIs to other parts of the system. Defining those APIs tightly, and telling when they’re changing, is made much easier by static typing.
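As a rough illustration of the “hard API” point, here is a minimal sketch of an interface acting as the contract between two parts of a system. All of the names here (ApiContractDemo, NameSource, Customer, greet) are hypothetical and only meant to show the shape of the idea.

```java
public class ApiContractDemo {
    // The contract: anything callers are allowed to rely on lives here.
    // Changing this signature is an API change and breaks callers at
    // compile time.
    interface NameSource {
        String getName();
    }

    // One implementation. Its private field and constructor are not part
    // of the contract and can change without touching callers of greet().
    static class Customer implements NameSource {
        private final String name;

        Customer(String name) {
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }
    }

    // Caller code depends only on the interface, not on Customer.
    static String greet(NameSource source) {
        return "Hello, " + source.getName();
    }

    public static void main(String[] args) {
        System.out.println(greet(new Customer("Ada")));
    }
}
```

If getName() is removed from NameSource or its return type changes, both Customer and greet() stop compiling, which is how you find out that the API changed before your callers do.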

Now, none of the statements I made above is contention-free. The most common argument I hear is that dynamic languages are so much more efficient that you simply don’t have large projects with large numbers of people. Your 500k LOC Java project might turn into a 50k LOC Ruby project, and your team of 50 might instead become a team of 10, so many of the arguments about dealing with large code bases and large teams don’t apply. There’s some truth to that argument, and it’s definitely worth pointing out, but acting like it’s a discussion-ending trump card tends to display a certain amount of ignorance/arrogance around A) the scale of other people’s systems and B) the efficiency of code in a dynamic language versus what the best engineers can do even in a syntactically-crippled language like Java. So while it’s true that syntactically-powerful languages can help minimize the problem, and thus keep more projects below the threshold at which the project becomes so big that it can’t be done by a small team, that doesn’t mean that there simply aren’t any projects that are fundamentally big and complicated and require a lot of work and a lot of people. I also don’t personally lend much credence to arguments that refactoring can be done well in a dynamic language (I’ll believe it when I see it, and Smalltalk isn’t a counter-example: show me refactoring tools for Python or Ruby or JavaScript working anywhere near as well as they work for Java and I’ll believe it).

That isn’t to say that there aren’t tradeoffs, or that static typing is always a win: you could reasonably argue that you think the benefits of static typing in terms of tools and API clarity aren’t worth the other associated costs relative to your preferred dynamically-typed language. My point isn’t to convince anyone that static typing is better than dynamic typing, but rather just to argue that static typing has certain benefits that are valuable when working on large projects.

The ideal language to me, then, is one that is able to both scale up and scale down. I want something that I can use to write one-off scripts to do simple things, that’s still fine for a 50k LOC program, and that still excels when I have 5 million lines of code. I have a pretty good memory, but my mental capacity is still limited, so I don’t want to have to learn a bunch of different languages that are each well-suited to a different task: I don’t want to constantly be trying to juggle a bunch of different syntax rules/execution ordering rules/libraries/toolsets in my head, I just want one set of tools I can set up and use for whatever I need to work on. To me, at least, that means something that has first-class scripts, reasonable default libraries, a concise syntax, static typing, and excellent tooling with support for automated refactoring. It would help if the performance is good enough to never be an issue, and if the language itself is relatively easy to learn (which is yet another highly-contentious metric). Right now, I don’t see such a language out there. It’s certainly the niche we’re hoping to aim Gosu at, though, so perhaps a few years from now we’ll be able to credibly say that we think it’s a good candidate for a language that can scale both up and down.