Thoughts on Performance Tuning, Part 1: Making It Faster Versus Not Doing It At All

Performance tuning, as I’ll define it, is the art of making a program run faster. (Note that pure performance tuning is different from scalability as a general practice, which could also involve things like parallelization to take advantage of more hardware). It’s an area that I’ve worked in a fair amount over the years, and it’s a large enough topic that it can’t really be covered in a single blog post, so I’ve decided to do a semi-regular series of posts about various aspects of it. I should perhaps add the caveat that I can’t clam to be a serious expert on all aspects of performance tuning, so by no means will this serious be an exhaustive catalog of all the ways to go about it; there’s simply a huge amount of depth to the subject. Hopefully, however, it will serve as a useful introduction and give some insight into how at least one person thinks about the subject.

At the end of the day, performance tuning comes down to one thing: accomplishing the same end goal (as far as the client is concerned) while using fewer resources: fewer CPU instructions, less memory, fewer disk accesses. But it’s often helpful to break that problem down further into two related sides of one coin. For any given part of the program that looks problematic, you have two high-level options: you can make that part faster, or you can avoid doing it at all. For example, if your profiler tells you that a lot of your application’s time is spent in a function called writePageHeader(), you fundamentally have two options: you can try to make writePageHeader() faster, by optimizing the method or its downstream components, or you can try to avoid calling it, perhaps by caching the output when possible. Even though technically they amount to the same overall thing, i.e. doing less stuff, it’s helpful to think about the problem from those two different directions. Where you decide to target will determine how you look at the problem: making writePageHeader() faster might mean eliminating calls to some downstream writePageElement() function, so depending on where you’re looking you might either be “making something faster” (the writePageHeader() funtion) or “not doing something” (the writePageElement() function).

There are a lot of well-known techniques to use in both camps. When it comes to making something faster, you tend to look at familiar things like algorithmic complexity, for example using hashing to turn an O(N^2) algorithim into an O(N log N) algorithm. There are also other optimizations that make a big real-world difference but don’t show up in big-O analysis; reducing constant factors is often a critical part of optimization, for example by storing the results of a function in a local variable rather than calling it three different times inline, or by conditionally skipping some parts of the computation. And there are other, even stranger optimizations you can make, which I’ll likely talk about in a later post, which often depend on the interaction between your program, the compiler, or the hardware you’re running on: for example, re-arranging functions so the compiler or JIT can inline them, or so that memory accesses run along cache lines instead of across them. There’s a nearly infinite number of techniques to learn, and many of them are platform-dependent.

There’s a similarly large array of techniques when it comes to attempting to avoid doing something, though often times the answer comes down to caching something, either trivially (such as in a local variable as mentioned above) or in a more complex manner (for example by caching page fragments so you don’t have to regenerate them). Lazy computation can also help sometimes, if certain uses of the application or code paths will ultimately avoid triggering the computation at all, or will delay it usefully (perhaps amortizing the cost and reducing latency). Restructuring upstream code in certain ways can also be used to eliminate calls (that two sides of the same coin thing again). Occasionally, you might have to take more drastic measures by simply eliminating some kinds of functionality; for example, if your application colors text red when a certain condition is met, and evaluating that condition is taking up half your page rendering time, you might decide that the functionality simply isn’t worth the cost, or you might show the information in a tooltip when the user hovers over the area, delaying computation until that point and making the initial page render faster.

This isn’t intended to be anything close to an exhaustive list of techniques; the more of them that you know, though, the more options you’ll have when faced with a problem.

Ultimately, being successful when doing performance tuning requires the ability to be flexible in how you approach the problem, and to look at it from many different perspectives. Perhaps your profiler shows that writePageHeader() is a hotspot, but you can’t seem to figure out how to change that method and make it any faster. So instead, you start trying to figure out ways to just not call the function at all. If you’re unsuccessful there, you can try moving down the stack, and find the core methods writePageHeader() calls and squeeze some extra performance out of them instead. If that’s not good enough, you move way up the stack, back to your overall renderResponse() method, to see if there’s anything at the higher level, six calls up the chain from writePageHeader(), that can be changed to do some really high level page fragment caching. If that doesn’t work out, perhaps you decide that writePageHeader() isn’t going to get any faster right now, so you move on to a different hot spot to see if you can make progress there. Performance tuning really requires understanding multiple different perspectives of the same thing, even though you can only view the world through one lens at a time. In some ways, your job is akin to what the cubist painters were trying to do: the mental juxtaposition of a large number of normally-exclusive views on things to help reveal some larger truth about the relations between pieces. The key, then, is to have as many different perspectives as you can, to have tools at your disposal to help you get those perspectives, to know what techniques are useful for each perspective, and then to be able to shift perspectives freely whenever you get blocked.