The hazards of microbenchmarks

I recently had lunch with my team and had the dubious fortune of sitting across the table from Mr. Knowitall. We had recently installed SONAR, and he mentioned his surprise at our "critical performance" issues, namely that we're using a + b to concatenate strings instead of StringBuilder in a few dozen places. I dismissed this as "not really a critical problem", which spawned a heated discussion about the merits of using StringBuilder over normal string concatenation in Java.

He ended up issuing a direct challenge: StringBuilder is 25% faster than string concatenation. He also said that I could build a test harness and see for myself, and bet that if he was right I would buy lunch...

I was kicking myself for getting into a ridiculously pointless argument that I've had a million times before. (He also tried to drag me into the "you'd better check .isDebugEnabled() before calling .debug() in log4j because otherwise you're taking a performance hit" debate, but I resisted that one.)

I tried to get a word in edgewise about how I once had this discussion with a guy from Sun (Brian Goetz) who REALLY knows this crap a LOT better than me. He basically said it was a fool's errand to rely on microbenchmarks as real performance indicators, because the JIT compiler, the garbage collector, and a million other things will make your benchmark crap. There was no reconciling with Mr. Knowitall though; he was absolutely certain he was right and determined to make sure I was beaten into submission about my wrongness.

So after dropping my daughter off at cross country, I created this class:

package javaapplication1;

import java.util.Random;

public class Main {

    // Constant value returned through a method call.
    public String getValue() {
        return "AAAAABBBBBcccccdddddEEEEEfffff";
    }

    // Fixed seed so every run sees the same pseudo-random sequence.
    Random r = new Random(6);

    // First four characters of a pseudo-random number, as a stand-in for "real" work.
    private String getRandom() {
        String a = String.valueOf(r.nextInt() * 100900);
        return a.substring(0, 4);
    }

    public static void main(String[] args) {
        // One instance per test case; each accumulates its own total duration.
        Main one = new Main();
        Main two = new Main();
        Main three = new Main();
        Main four = new Main();
        Main five = new Main();
        Main six = new Main();
        int count = 10;
        // 999 iterations; each test method calls start() itself, finish() stops the clock.
        for (int i = 1; i < 1000; i++) {
            two.stringBuilder(count);
            two.finish();
            one.stringaddition(count);
            one.finish();
            three.stringBuilderMethod(count);
            three.finish();
            four.stringadditionMethod(count);
            four.finish();
            six.stringadditionMethodRandom(count);
            six.finish();
            five.stringBuilderMethodRandom(count);
            five.finish();
        }
        System.out.println(one.time());
        System.out.println(two.time());
        System.out.println(three.time());
        System.out.println(four.time());
        System.out.println(five.time());
        System.out.println(six.time());
    }

    // Test name plus total elapsed milliseconds.
    public String time() {
        return name + ": " + duration;
    }

    private String name;
    private long start;

    private void start(String name) {
        this.name = name;
        this.start = System.currentTimeMillis();
    }

    public long duration = 0;

    private void finish() {
        duration += (System.currentTimeMillis() - start);
    }

    public String stringBuilder(int count) {
        start("stringBuilder");
        StringBuilder sb = new StringBuilder();
        for (long i = 0; i < count; i++) {
            sb.append("AA");
        }
        return sb.toString();
    }

    public String stringaddition(int count) {
        start("stringaddition");
        String output = "";
        for (long i = 0; i < count; i++) {
            output += "AA";
        }
        return output;
    }

    public String stringadditionMethod(int count) {
        start("stringadditionMethod");
        String output = "";
        for (long i = 0; i < count; i++) {
            output += getValue();
        }
        return output;
    }

    public String stringBuilderMethod(int count) {
        start("stringBuilderMethod");
        StringBuilder sb = new StringBuilder();
        for (long i = 0; i < count; i++) {
            sb.append(getValue());
        }
        return sb.toString();
    }

    public String stringadditionMethodRandom(int count) {
        start("stringadditionMethodRandom");
        String output = "";
        for (long i = 0; i < count; i++) {
            output += getRandom();
        }
        return output;
    }

    public String stringBuilderMethodRandom(int count) {
        start("stringBuilderMethodRandom");
        StringBuilder sb = new StringBuilder();
        for (long i = 0; i < count; i++) {
            sb.append(getRandom());
        }
        return sb.toString();
    }
}

I put the random stuff in there just for fun to see variations with the overhead of "normal" stuff one might do.

Here are the results (each number is the total elapsed milliseconds over the 999 loop iterations):
stringaddition: 6
stringBuilder: 6
stringBuilderMethod: 5
stringadditionMethod: 22
stringBuilderMethodRandom: 15
stringadditionMethodRandom: 11
BUILD SUCCESSFUL (total time: 0 seconds)

Whoa! First off, in the trivial case, there was no real difference... In the method-call case, string concatenation was 4x slower, and in the "do some work" version, StringBuilder was SLOWER.

Case closed, I win!!! Of course, if I cared to investigate further I would discover something... like, when I ran it a second time, I got these results...

run:
stringaddition: 6
stringBuilder: 1
stringBuilderMethod: 10
stringadditionMethod: 24
stringBuilderMethodRandom: 7
stringadditionMethodRandom: 12
BUILD SUCCESSFUL (total time: 0 seconds)

Huh... Now it appears that StringBuilder is a CLEAR winner: 6x faster in the best case, and still taking about 40% less time in the worst (7 vs. 12). The really bad news (I think) is that our margin of error is now much larger than our potential savings from this sort of refactoring.

Well, what now? It turns out, by fiddling with the JVM settings I can get crazily different results that make one or the other of these look a little better or worse. Also, if I change the JIT settings and/or call the methods in a different order, I can make it look like StringBuilder is consistently slower.
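One way to watch this happen, as a rough sketch rather than a proper benchmark: time the exact same workload in several consecutive rounds inside one JVM and compare the rounds with each other. The class name WarmupDemo and the round and call counts are made up for illustration; build() has the same shape as stringBuilder() in the class above. The numbers will differ from machine to machine and run to run, which is the point.

```java
// Hypothetical sketch (WarmupDemo is not part of the original test class):
// time the same workload in several consecutive rounds within one JVM.
class WarmupDemo {

    // Same shape as stringBuilder(count) in the class above.
    static String build(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("AA");
        }
        return sb.toString();
    }

    // Runs `calls` invocations per round and returns elapsed nanoseconds per round.
    static long[] timeRounds(int rounds, int calls, int count) {
        long[] elapsed = new long[rounds];
        int sink = 0; // consume the results so the work is not trivially dead code
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int c = 0; c < calls; c++) {
                sink += build(count).length();
            }
            elapsed[r] = System.nanoTime() - start;
        }
        if (sink < 0) { // not reached for these small inputs; keeps `sink` in use
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] rounds = timeRounds(5, 1000, 10);
        for (int r = 0; r < rounds.length; r++) {
            System.out.println("round " + r + ": " + rounds[r] + " ns");
        }
    }
}
```

If identical code gives visibly different timings from one round to the next, then a difference between two *different* pieces of code measured once each doesn't tell you much.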

After cracking open the source, I found that string + string actually USES StringBuilder under the hood (the compiler emits the StringBuilder calls for you). The only real difference is that every concatenation MUST call toString(), so additional garbage is generated. What I was trying to say in my (now mostly wasted) conversation with this guy was: "it's more complicated than simply answering which method is faster".
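To make that concrete, here is a rough sketch of the three shapes involved. The class and method names (ConcatDesugar, withPlus, desugared, oneBuilder) are mine, and desugared() is only an approximation of what javac produced on the JDKs of that era; newer compilers (JDK 9 and later) use a different strategy for + on strings, so treat this as an illustration, not a description of every JVM.

```java
// Hypothetical sketch: what the source says versus roughly what older javac emitted.
class ConcatDesugar {

    // What you write.
    static String withPlus(int count) {
        String output = "";
        for (int i = 0; i < count; i++) {
            output += "AA";
        }
        return output;
    }

    // Approximately what the compiler turned it into: a fresh StringBuilder
    // and a toString() call (one throwaway String) on every iteration.
    static String desugared(int count) {
        String output = "";
        for (int i = 0; i < count; i++) {
            output = new StringBuilder().append(output).append("AA").toString();
        }
        return output;
    }

    // The hand-written version: one builder, one toString() at the end.
    static String oneBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("AA");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // All three produce the same string; only the amount of garbage differs.
        System.out.println(withPlus(3).equals(desugared(3)));
        System.out.println(withPlus(3).equals(oneBuilder(3)));
    }
}
```

So the difference only shows up when concatenation happens in a loop; a single a + b outside a loop is one builder and one toString() either way.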

The lessons to learn from this are:

#1 Don't prematurely optimize. The fact is, the code that this person was worried about would be called about once or twice per day. This means changing the code would have a net gain in performance of a couple of milliseconds per year... Hardly a stunning victory for performance. I had to call it a hundred thousand times to get any reasonable gain in performance.

#2 Don't try to reason with unreasonable people. Some folks just REALLY want to be right; they don't really want to have a conversation. When they start bullying and pounding their chests, figure out a way to get out of the conversation without engaging in a pointless debate... (but don't necessarily back down, as this will often encourage the boorish behavior).
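For the curious, the back-of-the-envelope math behind lesson #1 looks something like this. It takes the plain concatenation numbers from the second run at face value (6 ms versus 1 ms over 999 calls), which this whole post argues you shouldn't, and assumes two calls a day. The class and method names are invented for the sketch.

```java
// Hypothetical sketch: yearly savings implied by a benchmark gap.
class YearlySavings {

    // slowMs and fastMs are totals measured over `benchmarkCalls` calls.
    static double yearlySavingsMillis(double slowMs, double fastMs,
                                      int benchmarkCalls, int callsPerDay) {
        double savedPerCall = (slowMs - fastMs) / benchmarkCalls;
        return savedPerCall * callsPerDay * 365;
    }

    public static void main(String[] args) {
        // Second run: stringaddition 6 ms, stringBuilder 1 ms, 999 calls, 2 calls per day.
        double saved = yearlySavingsMillis(6, 1, 999, 2);
        System.out.printf("about %.1f ms saved per year%n", saved);
    }
}
```

That works out to a few milliseconds per year, well inside the noise of a single run.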

Comments

Unknown said…
Will send beer for more posts like this. :o)

Excellent topic, analysis and closing arguments.
