One of Java’s biggest advantages over C is its support for strings. The String class is so well-integrated into the language (it is part of the java.lang package) that newbies easily mistake it for a primitive type; in a typical Java introductory course, you’ll almost always find someone complaining that comparing two strings with identical contents using == returns false (== compares object references, not characters). One of its most useful features is how it allows concatenation using the + operator. Very intuitive.
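Here is a minimal sketch of that == surprise. It assumes one of the two strings is created at run time with new String(...); two identical literals are shared by the compiler and would actually compare as equal. The class and method names are made up for illustration.

```java
class StringEqualityDemo {
    // Reference comparison: true only when both variables point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Content comparison: true when both strings hold the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "Hello World";
        // new String(...) always allocates a distinct object.
        String constructed = new String("Hello World");

        System.out.println(sameObject(literal, constructed));  // prints false
        System.out.println(sameContent(literal, constructed)); // prints true
    }
}
```

The takeaway for everyday code: compare String contents with equals(), and keep == for the rare case where you really care about object identity.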

However, this convenience comes at a price, as I will discuss in this post. We’ll also look at one way to mitigate the computational overhead.

Java’s Ruse
Tell me, what’s wrong/inefficient with the following lines of code?

String[] names = {"Homer", "Lenny", "Karl", "Barney"};
String greetEveryone = "Hello";
 
for(int i = 0, limit = names.length; i < limit; i++){
  greetEveryone += " " + names[i];
}
 
System.out.println(greetEveryone);

See nothing wrong? As I noted above, Java’s support of concatenation through the + operator is very intuitive. So intuitive that it has led a lot of people (myself included!) to overlook what the Java API says about String. To quote the Java SE 7 API specification [1]:

Strings are constant; their values cannot be changed after they are created. …String objects are immutable…

Strings are immutable. Incrementing the index variable i simply overwrites the old value in place, but concatenating to a String creates a whole new object and copies the old contents into it, which is a comparatively costly operation. The greetEveryone reference is then just pointed at the newly created object so that, by the time the final println call runs, you get to greet the whole gang.
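To see the immutability at work, here is a small sketch that keeps a second reference to the original String before concatenating. The class and variable names are only for illustration.

```java
class ImmutabilityDemo {
    // Returns true if concatenation left the original String object untouched.
    static boolean originalUnchanged() {
        String greeting = "Hello";
        String before = greeting;   // second reference to the same object
        greeting += " Homer";       // builds a new String and repoints greeting

        // The old object still says "Hello", and greeting now refers to a different object.
        return before.equals("Hello") && before != greeting;
    }

    public static void main(String[] args) {
        System.out.println(originalUnchanged()); // prints true
    }
}
```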

Stop Shooting Your Own Foot
Conceptually, Java treats the concatenation inside the loop as something like,

String space = " ";
String withName = space.concat(names[i]); 
String updatedGreeting = greetEveryone.concat(withName);
greetEveryone = updatedGreeting;

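Each pass builds two brand-new Strings (withName and updatedGreeting; the " " literal is shared) just so it can update one variable, and each of them copies the characters accumulated so far. (A compiler may emit a temporary StringBuilder for each such statement instead, but the effect is the same: a fresh object and a full copy on every pass.) With every iteration of the loop, you leave more discarded Strings floating in memory. Putting the variable declarations outside the loop does not help either because, as I said, Strings are immutable: it is the concatenation, not the declaration, that produces a new String object every time.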

Can’t we do better? Fortunately, Java already has a solution to this costly operation: StringBuilder and its thread-safe brother StringBuffer.

And I quote from StringBuilder’s API 7 specification [2]:

(StringBuilder is) [a] mutable sequence of characters.

The principal operations on a StringBuilder are the append and insert methods, which are overloaded so as to accept data of any type.
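A quick sketch of those overloaded append and insert methods follows. The values are made up; the point is only that one StringBuilder accepts Strings, ints, and chars and is modified in place.

```java
class BuilderOpsDemo {
    static String build() {
        StringBuilder sb = new StringBuilder("Homer");
        sb.append(" is ")         // append(String)
          .append(39)             // append(int)
          .append('!');           // append(char)
        sb.insert(0, "Hello, ");  // insert(int offset, String) at the front
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints Hello, Homer is 39!
    }
}
```

Every call above works on the same underlying buffer; a String is only produced when toString() is called at the end.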

With StringBuilder, we can rewrite our code above as follows:

String[] names = {"Homer", "Lenny", "Karl", "Barney"};
StringBuilder greetEveryone = new StringBuilder("Hello");
 
for (int i = 0, limit = names.length; i < limit; i++) {
  // Append in place; no intermediate String is created per iteration.
  greetEveryone.append(' ').append(names[i]);
}
 
// toString() builds the final String once, after the loop.
System.out.println(greetEveryone.toString());

But, that doesn’t seem to do that much…
For some of your typical uses of Strings (printing a debug trace with System.out or creating UI labels, for instance), using a StringBuilder might be overkill; debug traces may be disabled without affecting the program and UI labels are rarely concatenated with something else. But some Java operations rely so heavily on Strings that a few extra lines declaring StringBuilders and calling their toString() later on are a worthy investment. IBM developerWorks [3] notes a throughput increase of 2.3x using StringBuffer over plain-old concatenation (which, again, does not seem like much unless you read the rest of this post).
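If you want to get a feel for the difference on your own machine, here is a rough timing sketch. It is not a rigorous benchmark and makes no attempt to reproduce the 2.3x figure; the repeat count is arbitrary and the numbers will vary with your JVM and hardware. The class and method names are hypothetical.

```java
class ConcatTimingSketch {
    // Plain concatenation: a new String is built on every pass.
    static String withPlus(String[] words, int repeats) {
        String result = "";
        for (int r = 0; r < repeats; r++) {
            for (String w : words) {
                result += " " + w;
            }
        }
        return result;
    }

    // Same output, but appended in place to one StringBuilder.
    static String withBuilder(String[] words, int repeats) {
        StringBuilder result = new StringBuilder();
        for (int r = 0; r < repeats; r++) {
            for (String w : words) {
                result.append(' ').append(w);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Homer", "Lenny", "Karl", "Barney"};
        int repeats = 2_000; // arbitrary workload size

        long t0 = System.nanoTime();
        String a = withPlus(names, repeats);
        long t1 = System.nanoTime();
        String b = withBuilder(names, repeats);
        long t2 = System.nanoTime();

        System.out.println("same result:   " + a.equals(b));
        System.out.println("+ operator:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

For numbers you can trust, a proper harness with warm-up runs is the way to go; this sketch only shows the shape of the comparison.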

An instance where you’ll be very thankful for StringBuilder and StringBuffer is when you work with JDBC. While most of your queries will be pre-determined and so hardcoded, every now and then you will most likely have to generate a dynamic query (based on, say, user input). Querying a large database is already costly even without the expense of generating temporary objects for String concatenation. Suddenly, the meager 2.3x throughput increase seems a very generous deal. And heck it is.
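As an illustration, here is a minimal sketch of assembling such a dynamic query with a StringBuilder. The table name, column names, and the buildSelect helper are all hypothetical. The sketch assumes the column names come from a trusted list and that the actual values are bound later through PreparedStatement placeholders (the ? marks), not concatenated into the SQL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DynamicQuerySketch {
    // Builds "SELECT * FROM <table> WHERE col1 = ? AND col2 = ?" from the given filters.
    static String buildSelect(String table, Map<String, Object> filters) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        String separator = " WHERE ";
        for (String column : filters.keySet()) {
            sql.append(separator).append(column).append(" = ?");
            separator = " AND "; // every condition after the first is joined with AND
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the filters in insertion order.
        Map<String, Object> filters = new LinkedHashMap<>();
        filters.put("name", "Homer");
        filters.put("sector", "7G");

        System.out.println(buildSelect("employees", filters));
        // prints SELECT * FROM employees WHERE name = ? AND sector = ?
    }
}
```

The resulting String would then be handed to Connection.prepareStatement, with the filter values set in the same order as the placeholders.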

Bonus Links
Looking for more ways to optimize Java code? Read Peter Sestoft’s paper on Java performance. For the numerical algorithms geek, he also has a paper on numeric performance in C, C#, and Java.

  1. http://docs.oracle.com/javase/7/docs/api/java/lang/String.html
  2. http://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html
  3. http://www.ibm.com/developerworks/websphere/library/bestpractices/string_concatenation.html
