Revising the Development Process: Getting More Agile in a Real-World Project

When I started working at Guidewire, back in 2002, the company was all of 15 people, maybe 10,000 lines of code, and one unreleased product.  No one really knew anything about Test Driven Development or unit testing in general and we didn’t really have a continuous integration server, but we did attempt to generally follow the scrum model and have daily sprint meetings, month-long sprints of development work that combined design and implementation (and testing, such as it was before we had any QA folks), and a backlog of work for the release organized in priority order that we’d pull from to plan the next sprint.  For a long time, that’s pretty much how we continued to develop:  at the start of each sprint people would estimate out what they thought was 20 days of work, and at the end of the sprint we’d see what had actually gotten done, discuss what went well and what we should change, and use that to inform the plan and process for the next sprint.

Eventually we set up an auto-build, started to try things like unit testing, and after several fumbling false starts there we managed to make it a core part of our process and culture.  Now, as we’ve mentioned before, we have our own in-house test harness application that manages running 40,000+ automated tests across dozens of branches over a farm of servers.

But somewhere along the line, the scrum process kind of broke down for us, in my opinion.  It happened at different points on different teams, and you can point to a lot of factors as the culprit:

  • Communication breakdowns as the team got larger
  • Increased inaccuracy of estimates as the product(s) got much, much larger and more complicated
  • Increased maintenance costs as we increased the number of customers and releases
  • Increased maintenance costs in the form of test maintenance
  • Poorer estimates, increased complexity, and increased product surface area led to internal date slippages, putting pressure on everyone to scramble to still meet external date commitments, which in turn led to process breakdowns and increased technical debt

There are probably other factors in there that I’m forgetting, but the upshot was probably a pretty classic software development story:  the methodology that worked well with 10 developers, tens of thousands of lines of code, one or two customers, and hardly any maintenance releases didn’t work so well with 50 developers, half a million lines of code, more customers, more releases to maintain, and 4-year-old crufty tests that often did more harm than good.

So what do you do about that?  Clearly, we needed to change our development methodology somehow, but for a long time we avoided really looking more seriously at anything like XP:  after all, we were already doing lots of testing, refactoring, month-long timeboxed iterations, iteration retrospectives, continuous integration, and maintaining a backlog of work that we pulled from each iteration.

Unfortunately, the agile community generally isn’t too helpful about dealing with real-world situations like ours:  large teams, large codebases, years of stacked development, legacy unit tests, multiple customers to please, hard release commitments to be met, no real on-site customers, etc.  Most of the literature just kind of tells you to change those things:  split the team up, simplify your codebase, make your tests faster and more independent, get customers on-site, avoid hard release commitments, etc.  Reading the agile literature can be frustrating at times as a result, and it can be easy to read through it and say “that won’t work for us” because, well, as strictly written it won’t.  (I’ll expand on those issues in some later posts).

The result, unfortunately, was that we didn’t end up tweaking our process all that much.  We did our best to deal with the test and code maintenance issues, and we attempted to split each product team into smaller “pods” within the team to address some of the management and coordination issues with larger teams.  As we did it at the time, however, I don’t think it was a particularly successful approach.

And then came the 3.0 release of PolicyCenter, where we rewrote a huge percentage of the application from the inside out (i.e. without changing the end-user behavior all that much) in an attempt to address some major architectural issues that led to an explosion in complexity that had made the product buggy and difficult to work on.  That kind of cleanup, however, is inherently a huge unknown:  you’re not changing functionality, so you can’t really measure progress in terms of end-user changes, the changes were violent enough that going halfway on any of them wasn’t even close to an option, and the changes were also so drastic that most of our existing tests wouldn’t run or compile anymore, meaning that we had to start over from scratch on a lot of our testing efforts (realizing that we didn’t know what to test, incidentally, led to the creation of the Riki).  We attempted to organize into sprints, but the reality was that we had no idea how long things would take, product managers weren’t able to provide much oversight or exert much control, and it was a whole lot of controlled chaos.  The fact that we made it out basically on time (as of our revised timeline) with a stable, functional product is a testament to the quality of the team, but there’s no way we could continue to work the way we have for the past year.

Meanwhile, one of Guidewire’s other products, BillingCenter, was being run much differently from the other application teams:  they were using a methodology much closer to stricter agile methodologies like XP, with two-week iterations, story cards, point-based estimation, and a focus on getting things “done done” before moving on to the next feature.  That was working much better for them than our scrum process ever had for the other teams (except, perhaps, back when we were tiny and had hardly any code), so naturally the rest of the teams have moved to adopt that model.  Our ClaimCenter team already has, and PolicyCenter will be in a few weeks when we start on our next release.

Of course, it’s never that easy, and we’ve got the disadvantages of a larger team than BillingCenter, a more complicated product, a more configurable product, much more disparity between our customers, a much larger long-term desired feature set, and a lot of resulting date pressure (both internal and external) around particular features.  Unsurprisingly, those are some of the problems that got the PolicyCenter project in trouble in the first place.

Even so, I’m confident a process change will keep us on the right track and will help to alleviate some of the issues that have killed us in the past.  So what, specifically, are we doing differently?

  • Cross-functional pods – Our original mistake with pods was to only really include development in them.  We’ve attempted to re-arrange our seating several times to include PM and QA in with the developers, which has helped, but we’re now going to more formally create sub-teams that officially include PM, QA, development, and (if we can) docs.  We’ll reduce cross-pod communication as much as we can, optimize for high-bandwidth communication within the pods, estimate and assign work at the pod level, and do our best to let the teams have latitude to self-organize and experiment with what works best for them.
  • Focus on “done done” – We fell into the classic development trap of leaving too much bug-fixing until the end, creating uncertainty, stress, and piling up deep-seated architectural issues until far too late in the cycle.  In my view, the lack of doneness is always largely driven by date pressure:  with date-based estimation and long-range release plans, developers always want to hit their estimates, and they’ll (often unconsciously) cut corners and skimp on testing to do it.  Making “doneness” an explicit, shared criterion ought to fix that, though it’ll slow down our perceived rate of progress (but in the long run increase our actual rate of progress).
  • More up-front agreement on features prior to development – PolicyCenter functionality is complicated, hard to get right, and contentious, and it requires much more up-front research, experimentation, and debate than normal features do.  In the past, we’ve started working on features before those issues were worked out, and the aforementioned date pressure would make people feel they had to build something even though there wasn’t necessarily agreement on what to build.  Doing that work in-process, as it were, was often pretty fatal:  the product managers would be rushed and the developers would be frustrated or would just make assumptions.  We’re focusing now on using stories and doing more up-front work to figure out what to build so that when it comes time to plan an iteration we only schedule work that’s already been fully agreed upon by all parties.
  • Shorter iterations and stricter timeboxing – Four weeks just turns out to be too long to really adjust, which means that our timeboxing was never that strict and priorities would be shifted mid-sprint when new issues came up because people just couldn’t wait.  Our lack of up-front agreement pretty much always guaranteed that unexpected issues would crop up as well and ensured that our estimates would be inaccurate and wishful.  Moving to two week iterations with more up-front agreement and more reliable estimates should make it possible to better avoid mid-iteration corrections.
  • Tracking velocity rather than estimates – Estimating work in days seems natural, but it’s just a horrible, horrible mistake.  Doing it meant that we never corrected when our estimates were skewed by maintenance burdens or just chronic optimism about how fast we could work, and combined with a lack of up-front agreement our estimates were usually fairly inaccurate.  The real upshot was that the team couldn’t commit to its estimates, further exacerbating the timeboxing problems, and we never really had a great indication of how fast we were actually going since a lack of “done doneness” threw things off as well.

That, of course, is merely my hope for how things will work out.  We’re starting development of the next release a couple of weeks from now, using that process, and I’m sure we’ll learn plenty and make plenty of tweaks as we go.  I’ll do my best to report back on how it actually works out, what difficulties we find, and what we try to do to overcome them.


Two Sprint Equations

What order should things be tackled in when installing a Sprint process from scratch? In my coaching days, we would ask for a one-to-one ratio between coaches and the rest of the developers (plus a project manager), go all out for a couple of sprints, and give everyone a chance to adjust to the process before adjusting the process to the team.

For a single person, the strategy has to be different: you need to look at all the practices and make trade-offs with an eye on the big picture. The following two equations are what kept popping into my mind as I installed the process on two teams.

Value = Scope * (Feature Quality * Code Quality)

One comment to make here is that during product development, these three factors are often working against each other. A good business analyst or product manager is one who knows how and when to balance them, and one with a strong enough personality to hold that balance can bring the best value to the product.

Scope is measurable, as the second equation will show. Feature quality and code quality, however, are simply not things that can be determined by objective measurement, or at least not purely by it. When a team is pushed on something measurable like scope, the things that are not measurable get sacrificed, and everyone ends up paying for it sooner or later.

Scope = Velocity * Number of Sprints

Assuming that the quality of the product is under control, scope is the next thing to watch during a project. This is pretty easy to understand: the more the team can do without sacrificing quality (both feature quality and code quality), the better.

Velocity is something that can only be affected by tuning the team’s Sprint process; it can never be demanded. What is left to make this equation work is to adjust either the scope of the project or the time of the project (the number of sprints), and most of the time both. This is probably one of the most commonly stated facts in software development, and at the same time it is probably also one of the most ignored.
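
To make the arithmetic concrete, here’s a minimal sketch in Java (the point values and velocity are made-up numbers, purely for illustration): given a measured velocity and a backlog estimated in points, the only honest outputs are how many sprints the scope will take, or how much scope fits into a fixed number of sprints.

    public class SprintMath {
      // Scope = Velocity * Number of Sprints, so:
      //   sprints needed  = ceil( scope / velocity )
      //   scope that fits = velocity * sprints
      static int sprintsNeeded( int scopeInPoints, int velocityPerSprint ) {
        return ( scopeInPoints + velocityPerSprint - 1 ) / velocityPerSprint;  // ceiling division
      }

      static int scopeThatFits( int velocityPerSprint, int numberOfSprints ) {
        return velocityPerSprint * numberOfSprints;
      }

      public static void main( String[] args ) {
        System.out.println( sprintsNeeded( 240, 18 ) );   // a 240-point backlog at 18 points/sprint: 14 sprints
        System.out.println( scopeThatFits( 18, 10 ) );    // only 10 sprints before the date: 180 points of scope
      }
    }

Demanding that the team “just go faster” changes neither input; only measuring velocity over a few sprints and then adjusting scope or dates does.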

Boosting velocity is the same as boosting the productivity of the team, which is the job of the team lead. This is the purpose of a lot of the XP practices: TDD, pair programming, co-location, shared ownership, continuous integration.


Getting Back to Business Fundamentals

The economic news over the last few weeks has, obviously, been fairly grim, and after an initial shrug by Silicon Valley, over the last week or so it seems that everyone in the valley has started to realize that this really will, in fact, affect them, that VC and angel funding will dry up, that advertising spending will likely decline, and that in general everyone had better actually focus on business fundamentals and actual cash flow again.  The Sequoia Capital presentation was the most well-reported example, but in general it seems like everyone is starting to get it.

Working at a company that was founded at the end of 2001 (and being here since mid-2002 myself), my perspective is naturally a bit different.  Guidewire started out with the intention of building a solid, sustainable business that involved selling valuable, mission-critical software to an underserved market.  In any rational universe, it shouldn’t really require a massive economic calamity to get people running back to trying to actually build software and provide services that are good enough that people are willing to pay actual money for them.  But for the last few decades, at least, Silicon Valley hasn’t really qualified as a rational place. 

In a way, the hype machine/echo chamber around here has, for the last several years, gone back to rewarding/emphasizing very similar tactics to what happened during the first internet bubble, with the exceptions that this time everyone’s goal was to get acquired instead of to IPO (since everyone could plainly see the IPO pipeline for software companies has been non-existent for several years now) and people tended to take less VC funding and be correspondingly less aggressive (thanks to some serious advances in commodity hardware and hosting that made it much cheaper to start something).  The focus was essentially still on making a big splash, and getting as many users as possible before really worrying about a business plan or cash flow, with the hope of selling the company off and leaving the dirty businesses of creating a revenue stream and growing/maintaining the product to someone else.

If there’s any silver lining to this whole catastrophe as far as the Valley is concerned, it’s that times like this always force people to go back to fundamentals, which results in stronger companies that develop real value instead of massively-hyped, overvalued startups that are just trying to piggy-back on whatever’s currently trendy (“Look, it’s a web 2.0 social networking mashup microblogging site for people who wear sandals!”).

Really building a business for the long haul is hard, and there are plenty of challenges you encounter as you transition from 10 employees to 50 employees to 500 employees, or from one product to three products, or from one released version to 50, or as you move into your sixth or seventh year of working on the same codebase and have to struggle with all the decisions you made years ago when you had no real idea where things were going.  And of course, actually building something that people need, want, and are willing to pay for is the core challenge that drives everything else.  Grinding through those issues might not be as glamorous as the initial work is, but they’re still hard problems that are worth solving and that are critical to the survival of the business.  While I can’t compare it to the satisfaction of some sort of buyout (never having been there myself), I can say that seeing something be sustained and grow over that period of time provides an entirely different kind of gratification than just getting the first version out does.

Times like this really separate the wheat from the chaff, as it were, so it’ll be interesting to see who manages to survive and actually generate revenue.  And in a lot of ways it’s a good time to start a company, if you can swing it, since the talent pool is generally better than it is when times are better, and there’s less pressure to do things that damage the long-term health of your business in order to grow more quickly than is prudent.  It’s certainly a good time to be working at a company that builds real products that sell for real money, and I’m proud to be working at a company that has been in it for the long haul since day 1.  Did I mention that we’re hiring?


A More Clearly-Stated Version of My Argument

My last post was perhaps a little poorly stated and a little (intentionally) combative, so I figure I should clarify a few points.  Thanks to everyone who commented or otherwise pointed out errors in my reasoning, over-reaching statements, or other ways in which I’m wrong.  I think this is a subject upon which highly-intelligent, reasonable people will disagree, so this is really just my take on the direction things will go in the future (and not necessarily which direction they should go), based on my (sometimes more poorly-informed than I’d like) opinions about things like the ease of mastering particular languages, regardless of the actual utility of those languages.

My real targets were the arguments I’ve heard time and again lately, the first of which is basically that the coming increase in multicore processors and the basic halt to clockspeed improvements will change programming languages fundamentally, finally pushing high-concurrency languages like Erlang or functional programming in general into the mainstream while leading to a slow abandonment of existing languages that don’t have that level of concurrency support.  The second argument I was targeting is the general subcurrent of “But language X is so much more powerful, so everyone should use it” that always seems to exist in the programming community.

So to make my first argument a bit more explicit, it goes something like this:

  1. Small-scale parallelism (i.e. parallelizing what was previously a single thread of execution across multiple cores on a local box) and large-scale parallelism (i.e. scaling an application across multiple cores on multiple boxes to handle large user/data volumes) are different problems
     
  2. Small-scale parallelism isn’t really helped too much by just using a functional language.  While in theory a pure functional language can allow for parallelism of certain operations without the programmer having to do anything, in reality sustained usage of multiple cores requires more explicit parallelism on the part of the programmer, where the algorithm is broken down into explicitly independent pieces (there’s a sketch of what that looks like after this list).  In other words, if you want your HTML template rendering system or your video-encoding program to use multiple cores, you’ll have to design the algorithms with that goal in mind.  Using a functional language might help with the implementation, or it might help with the thought process, but mere use of the language won’t be any kind of a magic bullet, and the same algorithmic approach is generally translatable to an imperative language as well.  The algorithm is the important part if you really want to scale on that level, not the language (though, again, programming in a functional language might help get you thinking the right way).
     
  3. In general, with more software moving off of individual desktops, the need for small-scale parallelism is even more minimized, as per-user/per-request parallelism makes it easy to saturate an N-core box.  If you really care about efficiency at that level, you’ll be more worried about absolute language/framework performance rather than concurrency support.
     
  4. Large-scale parallelism is helped by using functional techniques to parallelize and distribute work, but doesn’t necessarily require a functional language, as either per-request parallelism or explicit work-queue-like models can often allow horizontal scaling up to fairly large volumes regardless of the language.
     
  5. More advanced techniques might be required above a certain performance threshold, but the vast majority of applications never make it there and those techniques can impose a huge tax on development, so it’s more important for most developers to focus on getting the application to work rather than getting it to scale infinitely.
     
  6. Therefore, cloud computing/multicore processors aren’t going to be the “killer app” that switches people over to more concurrency-friendly languages.  General developers will switch to those languages or not based on their merits as a problem-solving language and not due to their concurrency support.  There’s a place for high-concurrency languages, but support for concurrency won’t be enough to propel a language into the mainstream.
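
To illustrate what point 2 means by explicitly independent pieces, here’s a minimal sketch in Java (the names like encodeChunk are hypothetical, and this is purely an illustration, not code from our products) of parallelizing a task by hand: the programmer, not the language, has to decide where the independent boundaries are, and the same decomposition would look essentially the same in a functional language.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ChunkedEncoder {
      // Hypothetical stand-in for some CPU-heavy unit of work that doesn't depend on any other chunk
      static byte[] encodeChunk( byte[] chunk ) {
        return chunk;  // real work elided
      }

      // The programmer breaks the input into independent chunks; the executor
      // just farms those chunks out across however many cores are available.
      static List<byte[]> encodeInParallel( List<byte[]> chunks ) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool( Runtime.getRuntime().availableProcessors() );
        try {
          List<Future<byte[]>> futures = new ArrayList<>();
          for ( byte[] chunk : chunks ) {
            futures.add( pool.submit( () -> encodeChunk( chunk ) ) );
          }
          List<byte[]> results = new ArrayList<>();
          for ( Future<byte[]> f : futures ) {
            results.add( f.get() );  // reassemble the results in the original order
          }
          return results;
        } finally {
          pool.shutdown();
        }
      }
    }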

So will functional languages go mainstream on their own merits as programming languages?  I honestly don’t think so, though that’s an even-more-contentious argument.  My reasoning there was essentially:

  1. Functional programming techniques don’t come naturally to most people; in my opinion, the human brain is designed to solve problems sequentially, with imperative algorithms, since people are really only capable of doing one thing at once.  Some people are probably wired a bit differently and have an easier time with functional programming, but most people naturally think in imperative terms, which will always make functional languages more difficult for most people to grasp.
     
  2. Many functional languages tend to be a bit more academic in nature, and as a result include even-more-obscure language features that can be powerful but further increase the barrier to entry for most developers.  Monads already throw most people for a loop for a while, but the syntax and type system in Haskell further add to the difficulty of understanding it, as does the prefix syntax in most Lisp variants.  As a result, in general the most popular functional languages tend to be harder to learn and to master than the most popular imperative languages (at least if you count Python, Ruby, and Javascript as imperative).
     
  3. Being harder to learn and master means that the languages are also restricted to a subset of the current development community.  There are a large number of people out there who can be reasonably competent in Java or Ruby but who would flail if they were asked to learn Haskell or Scheme or OCaml.  More “advanced” languages tend to be magnifiers; while they can make really-talented developers more productive, they can make less talented developers far less productive.
     
  4. Network effects, community size, talent pool, and barriers to entry matter for a lot of projects.  Most software companies have a significant turnover or growth rate among their staff, making the ability to recruit people and bring them up to speed important.  The best companies will hire for general ability rather than particular skills, but out-sourcing and contract work generally isn’t done that way, and ramp-up time still matters even in the optimal case.  In addition, most companies can’t restrict themselves to the top 5% or 10% of development talent and can’t afford to limit their talent pool by choosing a language many developers won’t ever be able to master.  As a result, the languages that are hardest to learn will inherently be a bit marginalized, since fewer people will already know them, the prospective talent pool will be smaller, and the ramp-up times will be longer.
     
  5. The scalability of certain languages to large, long-lived codebases with large development teams is suspect due to a small sample size.  Most developers and project leads would rather choose a proven technique that they’ve seen work and that’s been used by hundreds or thousands of other teams, or that at least is similar enough, rather than trying to push the envelope with a radically different approach.  There will always be outliers, but most developers are going to choose a well-trod path that they know can work rather than a less-clear path that might lead to increased developer productivity.
     
  6. Mainstream languages will continue to pull in more functional concepts (like closures), eroding some of the advantages that functional languages might otherwise have, meaning that the functional concepts become more mainstream but the languages that they originated in won’t.

So again, my arguments aren’t around what should happen or which languages are better in any sense, but rather they’re observations about what I think is happening and what will continue to happen in the future.  It also is part of the reasoning that informs the direction we’ve gone with GScript; we’ve tried to emphasize ease of use, readability, speed of development, and suitability for building tools and frameworks, and we’ve avoided adding in features that we feel will complicate the language but which might allow for better concurrency support, avoidance of side-effects, or more flexible and extensible syntax.


API Design

API design is hard. You can tell it is hard because there are so many bad APIs out there, often written by pretty smart people. Why is this?

I believe that one reason is that, in order to do APIs right, you often need to layer their complexity. This layering should make simple stuff dead simple for people who only casually use the API, and then provide greater functionality (and complexity) for more advanced use cases.

A lot of developers get uncomfortable with this idea because it means that There is More Than One Way To Do It, a development philosophy that has earned Perl infamy amongst sane developers. Java developers, in particular, seem to dislike redundancy and would rather eliminate it. Unfortunately, the reasonable goal of eliminating redundancy can lead to miserable-to-use APIs.

Reading a File

As a canonical example of the problems caused by this dichotomy, consider reading a file. It’s as common an I/O operation in day-to-day development as you are likely to find.

This is how you might do it in Java:

    String str = null;
    BufferedReader in = null;
    try {
        in = new BufferedReader( new FileReader( "C:/tmp/tempfile.txt" ) );
        StringWriter sw = new StringWriter();
        char[] buf = new char[1024];
        while (true) {
          int count = in.read(buf);
          if (count < 0) {
            break;
          }
          sw.write(buf, 0, count);
        }
        str = sw.toString();
    } catch ( IOException e ) {
      // uhhhhhh...
      throw new RuntimeException( e );
    } finally {
      if ( in != null ) {
        try {
          in.close();
        } catch ( IOException e ) {
          // double uhhhhhhh.....
          throw new RuntimeException( e );
        }
      }
    }

Yeeeeeeeeeeehaw.

Now, why is this so complicated?

It’s due to the fact that the I/O library is written against a very abstract notion: streams. This was done for good reasons: streams are relatively high performance and they generalize to any kind of input (e.g. network connections). You can see why someone writing an I/O API might find such an abstraction enticing. There is only one* way to do I/O in Java, regardless of what sort of I/O you are doing.

Unfortunately, that way sucks.

A Layered Solution

It is unacceptable to us that reading a file be so complex in GScript. To address this, we’ve created a layered approach to file I/O for our users.

Layer 1: Dead Simple

The simplest notion I can imagine for I/O is reading a file into a String. These are two relatively easy to understand objects that nearly all developers have experience with. To make this as easy as possible, we have introduced an enhancement method to java.io.File:

  var file = new File( "C:/tmp/tempfile.txt" )
  var str = file.read()

The implementation of File#read() is, of course, much like the Java code above. However, just by adding this simple method, GScript users doing simple I/O need not worry about exceptions, what try/catch structure is needed, or what streams even are. Just read a file into a string. Simplicity itself.

Layer 2: A bit more Complex

You may object that file.read() wastes memory by reading the entire file into a single string. While I think that this objection is often overstated given today’s hardware, there are situations where it is valid. To handle these cases, we introduced another enhancement method to java.io.File:

  var file = new File( "C:/tmp/tempfile.txt" )
  file.readLines( \ line -> print( line ) )

The readLines() method takes a block that is called with each line of text from the file. It’s somewhat analogous to a SAX-style parser. Note that the user still does not need to know anything about streams to use this API, although they have much more control over the memory footprint of their program.
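
For the curious, here’s a minimal sketch in plain Java of what these two layers might look like under the covers (the class and method names are hypothetical, not our actual implementation): a read-everything-into-a-String helper built on top of a callback-per-line helper, both hiding the stream plumbing and the try/catch noise from the caller.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.function.Consumer;

    // Hypothetical utility class illustrating the two convenience layers in plain Java
    public class FileConvenience {

      // Layer 1: read the whole file into a String; no streams visible to the caller
      // (note: line separators are normalized to "\n" in this sketch)
      public static String read( File file ) {
        StringBuilder sb = new StringBuilder();
        readLines( file, line -> sb.append( line ).append( "\n" ) );
        return sb.toString();
      }

      // Layer 2: hand each line to a callback, so the whole file never has to sit in memory at once
      public static void readLines( File file, Consumer<String> callback ) {
        try ( BufferedReader in = new BufferedReader( new FileReader( file ) ) ) {
          String line;
          while ( ( line = in.readLine() ) != null ) {
            callback.accept( line );
          }
        } catch ( IOException e ) {
          throw new RuntimeException( e );  // the caller never writes a try/catch for the common case
        }
      }
    }

A caller then just writes FileConvenience.read( new File( "C:/tmp/tempfile.txt" ) ) and never touches a Reader unless they choose to drop down a layer.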

Layer 3: Whole Hog

Finally, someone might actually need the level of control or performance that streams provide and, of course, are free to use them. But they need not become experts in Java I/O unless absolutely necessary.

A Bit of Redundancy For A Lot of Ease

With this layered approach to File I/O, GScript users do not need to become familiar with the complicated Java I/O libraries to do simple and even moderately complicated things. Yes, there is now some redundancy, but each layer fulfills a particular band of the complexity/performance continuum. GScript programmers are only required to know as much about I/O as is necessary to solve the problem they have with acceptable performance.

Some redundancy, done right, is often the right thing.

* – Actually, this isn’t true anymore, since the NIO library came along for high-performance I/O situations. Let’s not dwell on inconvenient facts that compromise my argument, OK?


Don’t Believe the Hype

I should begin this with a disclaimer: while I normally try to make these posts well thought-out and to stick to subjects I know a lot about, this time around I’ll be getting a little less neutral and talking a little more about things I have limited experience with. I’m not a functional programming guy, and I’ve never programmed extensively in any sort of functional language. I’ve also never actually tried to write anything serious in Erlang or Haskell or Scala, and the only time I wrote in any kind of Lisp was for one class back in college. I am, in other words, probably not qualified to have the kinds of opinions I have. And of course, as someone pushing a new, as-yet-unreleased language with a more traditional imperative programming model and syntax, my opinions are naturally biased.

But well-founded or not, I do have strong opinions, and the seeming constant parade of blog posts over the last year or two about the rise of functional languages and the coming multi-core apocalypse really kind of annoys me. I think they do a lot more harm than good by focusing too much time and energy on problems that seem “cool” to solve but are largely irrelevant for most developers, and I think they’re generally symptomatic of the fact that a lot of engineers like trying out new toys and enjoy thinking about clever, cool solutions to hypothetical problems sometimes more than they like actually solving real-world ones. To that end, here are the developer hype bubbles I’d most like to see burst.

Multiple Cores And Cloud Computing Will Not Fundamentally Change Programming Languages

This seems to be the primary driver behind a few other misconceptions that I’ll get to next, but the bottom line to me is that programming languages themselves are not going to be heavily affected by the increasing number of desktop cores or the increasing move toward cloud computing. There are several reasons for that. First of all, most web applications already parallelize pretty obviously at the request level. Whatever language you’re writing in, you can spawn a thread or a process per user request (or pool them), and continue on your merry way worrying about threading only when you need to access something like a shared cache. That level of concurrency is already taken for granted by everyone, and generally lets web applications scale up to make use of N processors/cores where N is the number of active users at any one time (ignoring blocking IO and memory constraints, etc.). But after that, it’s generally difficult to parallelize: if your request looks like “add a record to the User table, then read the list of current users and render them to the UI” it’s just not easy to figure out how to parallelize. Could you read data while you’re updating? Maybe, if you have to read some bits of data that you know won’t be updated, or if you can later merge in the results of what you updated. Could you render different parts of the template in parallel? Sure, provided that you knew those parts didn’t interact and had some good way to stitch them back together later, and assuming you wanted to buffer the page in-memory instead of writing straight to the response stream in sequence. At most you could, with a lot of effort, find a way to use an extra core or two here or there. But you’re not suddenly going to make your request->action->response loop use N threads for a single user.
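
To picture the kind of per-request parallelism described above, here’s a minimal sketch in Java (the handler and request strings are hypothetical, with no real web framework involved): one task per request pulled from a pool, which keeps N cores busy as long as roughly N users are active, without the request handler itself ever needing to be parallel.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class RequestLoop {
      // Hypothetical handler: add the record, run the query, render the page, all sequentially
      static String handle( String request ) {
        return "response for " + request;
      }

      public static void main( String[] args ) {
        // One pool, one task per incoming request: the concurrency most web apps already get for free
        ExecutorService pool = Executors.newFixedThreadPool( Runtime.getRuntime().availableProcessors() );
        String[] incomingRequests = { "GET /users", "POST /users", "GET /users/42" };
        for ( String request : incomingRequests ) {
          pool.submit( () -> System.out.println( handle( request ) ) );
        }
        pool.shutdown();
      }
    }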

Likewise, desktop applications aren’t going to get all that much more parallel. Game developers already have a tough time using more than a few cores; look at how hard fully utilizing the PS3 is for people. Likewise, iTunes and Windows Media probably aren’t going to start using 16 cores any time soon, and neither is Word or Quicken. Could certain tasks occasionally be split up and parallelized? Sure. But the vast majority of the application code will remain relatively sequential.

In other words, the inability of most programs to take advantage of an arbitrary number of cores is inherent to the problem space, not the result of a lack of proper tools or languages. You could have the best language support for concurrency in the world and it wouldn’t make parallelizing page rendering any easier because you’d still need to do the hard work of figuring out which parts were independent.

The primary ways in which the new multi-core processors will be used will be to 1) offload desktop apps into the cloud, where per-user parallelism is an obvious win by allowing one server to handle work for multiple users, and 2) frameworks that allow for easy parallelism for certain parts of applications, the way web-servers make it easy to parallelize requests per user. But those frameworks really won’t depend too much on underlying language support, at least not to the level that will restrict them only to certain languages (certain languages might do it better, of course, but the bar will really be “good enough” not “optimal”).

There are some problems that lend themselves to parallelism, and those will benefit from more cores and from languages specifically suited to that situation. But for the majority of programmers writing the majority of programs, there’s just not going to be any fundamental shift in how programs are written, regardless of what the hardware guys keep saying. There will be a shift in the frameworks and, more importantly, in where the programs are deployed and run, but not so much in how they’re written.

You’re Not Going To Have To Scale That Much

The Ruby guys have been saying this for a while, every time someone complains that Rails isn’t fast enough: most people just don’t have to scale that much, and those people that do can just add a few more processors or servers. Cloud computing and multi-core processors make that even easier, meaning you have to worry less about some performance issues. Google and Facebook and eBay might have to deal with volumes of data and user requests that make your eyes bug out, but again, they’re an incredibly small subset of the programming population. Scaling to those levels requires some serious black magic and some major architectural changes, but working with those constraints makes everything far more difficult, so you naturally don’t want to impose them on yourself if you don’t have to. Most smaller web-applications can get away with a database server or two and horizontal scaling of servers as they add more users; again, per-user parallelism makes that fairly easy. Scaling at that level still isn’t trivial (you have to load test, know how to performance tune, index properly, optimize queries, and do a ton of caching), but it doesn’t require you to fundamentally build everything with scaling up to 100 million users in mind. Step 1: Build something that works. Step 2: Make it fast enough for your users. Step 3: Worry about scaling to 100 million users when it becomes apparent that you might actually have 100 million users instead of just 100.

Erlang Is Not Going Mainstream

So those two general observations bring me to the point that Erlang isn’t going anywhere. It’s rightly getting attention as a great language to write programs that are massively parallelizable and fault-tolerant (in the sense that individual processes can die and be restarted). But otherwise, the syntax is fairly primitive and restrictive compared to any modern programming language, the tools and libraries are as well, and no one is going to start writing their web application in Erlang instead of using Rails or Django or even ASP.NET, unless they’re doing it just to show off that, “Hey, look, I wrote something in Erlang!” The benefits of the language are negligible for most standard applications. If you need to scale up to 100 million users, would the concurrency advantages potentially outweigh the language disadvantages? Sure. If you’re writing a back-end transaction processing system for a bank, or an instant messaging server that’ll need to handle hundreds of thousands of users, should you consider Erlang? Definitely. Will more than about 1% of software projects benefit from using Erlang? Nope.

Functional Programming Is Not Going Mainstream

Which brings me to a related point, which is that functional programming in general is not going mainstream either. Will languages like Ruby and Python that have some functional ideas in them get more popular? Probably. Will Java potentially add some better functional support? Hopefully. Will “mainstream” developers start writing in Lisp or Haskell or Erlang or any of the other functional languages out there? Not a chance. Why not? My theory is that most people just don’t think that way. The real world is imperative and largely sequential; people can really only do one thing at a time, and they do things in order. So if you’re going to make a pb&j sandwich you think in terms of imperative steps: take out ingredients, apply peanut butter to bottom slice, add jelly on top of peanut butter, put on top slice, slice sandwich, eat sandwich. In other words, you’d program it as:

var ingredients = fridge.removeIngredients()
var sandwich = new Sandwich()
applyPeanutButter(sandwich, ingredients)
applyJelly(sandwich, ingredients)
addTopSlice(sandwich, ingredients)
sliceSandwich(sandwich)
eatSandwich(sandwich)

You don’t think of it as:

eatSandwich(sliceSandwich(addTopSlice(applyJelly
                    (applyPeanutButter(removeIngredients)))))

It doesn’t matter whether you think the first version is too verbose, or that side effects are evil, or that the second version parallelizes more easily. People just don’t think that way. They can learn to think that way, sure, but it’s my firm belief that the first version is just how most people’s brains are wired. So people will naturally always be put off by functional programming, regardless of whether or not it’s technically superior. Furthermore, languages like Haskell (and even Scala) have so many complicated concepts baked into the language that most people will just fall back to something easier to use and learn. Sure, monads might be great and powerful and necessary, but anything that takes dozens of wiki pages to explain and multiple tries for most experienced developers is imposing a pretty high barrier to entry. If you’re programming a project just for yourself that you’ll work on for 10 years, and if libraries or platform support weren’t an issue, Lisp (or maybe Haskell) would probably be the right language to write it in. The expressive power and flexibility would let you get more done over time. But if you have to have other people understand your code? Or you need to bring new people onto your team? Wrong answer. The barrier to entry is just too high. It doesn’t matter if something is the best tool if it’s too hard to learn to use, unless you’re the only one that will ever need to use it. And because of that, the libraries and community support just won’t really ever come, which provides a further barrier to entry.

To me, it’s really just a fundamental problem: (most) people naturally think imperatively and will always have an easier time learning imperative languages and reading imperative code, and thus imperative programming is always going to be far and away dominant, regardless of whether or not it’s truly superior in a technical sense.