Clojure performance tuning

2009-07-20

Having done some actual coding in Clojure I can post notes that will hopefully help others. I have code that gets executed a lot (it’s a stack-based language interpreter) and needed to bring it up to reasonable performance. Here are some notes from the process:

You really should know the difference between a seq and a list. If you use any of the seq functions (such as drop or take), your list will no longer have an O(1) count operation. Instead, it will become a LazySeq and count will become O(n). If count is something you call frequently (I do), you will want to avoid this. In my case, my stacks were implemented as lists and count gets called very frequently, so this was a serious and surprising problem. Surprising, because most tutorials skim over the difference, only emphasizing how general seqs are. So if your count performance takes a dive, see if you can replace (drop 2 mylist) with (pop (pop mylist)). The latter will keep the PersistentList structure.
I implemented my stacks as both lists and vectors. There was almost no performance difference between the two (lists were actually slightly faster). I found this to be a surprising result, I expected vectors to be faster. I still think vectors might have lower memory requirements, but I don’t know how to check.
For vector access, (v n) is faster than (nth v n) which is faster than (get v n). This is not something I expected. I think this will get ironed out in the future, as Clojure matures.
Attempts to replace Clojure vectors containing integers with primitive Java int arrays produced no performance gains. In fact, they managed to hurt performance because of some necessary conversions.
As expected, there are very few places where type declarations improve performance. But if you have loops that get executed a lot, you might want to check. Always measure, as the results might be unexpected, and remember that less is more.
When measuring anything running on top of the JVM, let it run for a while before you draw any conclusions. The Hotspot JIT compiler does really cool things with the code, but it does them after a while. In my case, I run code for at least 10-20 seconds and watch the results. I take them into account only after they stabilize. It is common to see a 2x improvement between the first run and the last one.
Accessing vars costs cycles. Clojure has no constants, so many people use *global-parameters*. This is not a good idea in performance-sensitive code. There are two ways around it:
1. Define macros that expand to constants:
  (defmacro global-parameter [] *global-parameter*)
  and use (global-parameter) in your code.
2. Enclose your function definitions in let forms, rebinding global constants to lexical variables:
  (let [global-parameter *global-parameter*] (defn my-function [] ...))

As for profiling tools, the old and tried method of actually measuring wall time worked best. I tried the YourKit profiler, but wasn’t impressed, at at $499 it is way overpriced. If they come out with a Clojure edition (drop some features) at $79, I’ll consider. I also tried JVisualVM, but it turned out that it is buggy on a Mac and the profiler doesn’t work. I hope this will be fixed in the future.

Comments

Another gotcha with vectors:
(v (count v)) is O(1)
(last v) is O(n)

Sam Merritt2009-07-20

Nice post. For profiling, I've used the NetBeans profiler in the past, which is good, once you've taken a couple of hours to find your way around it. There's an old open source project called Japex that can be used to run microbenchmarks, and it will take care of the warm-up time for the optimizing JIT for you. Also, there are usually flags you can pass the VM to have it print inlining, method compilation, garbage collection and other detailed information if you want a low-level view. See for example the "debugging" section of the "Java HotSpot VM Options" page (HotSpot VM Options Page).

Keep us informed on your progress, this in interesting stuff.

Patrick

Patrick2009-07-20

Sam: interesting -- I didn't know that, thanks!

Patrick: I tried the -Xrunhprof flag, but couldn't find anything relevant in the results produced. I guess I should read up on this and try it again.

Jan Rychter2009-07-21

This is helpful. Thanks Jan.

Garth Sheldon-Coulson2009-08-02