
Just a test post from my Droid. I’m on a Galaxy Y and it’s a shame to report that landscape mode isn’t very nice here. Shame.

Elsewhere in my life I’ve been doing lots of JSP and Apache. Quite some XML here and there, and bits of Oracle. Oh, and PyGame too!

One of Java’s biggest advantages over C is its support for strings. The String class is so well-integrated into the language (it is part of the java.lang package) that newbies easily mistake it for a primitive type; in a typical Java introductory course, you’ll almost always find someone complaining that comparing two identical-looking strings with == returns false. One of its most useful features is how it allows concatenation using the + operator. Very intuitive.
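To make the == complaint concrete, here is a minimal sketch. The class and method names are made up for illustration. Two identical literals usually end up as the very same object, so the surprise tends to show up only when one of the strings is built at run time (simulated here with new String).

```java
class StringEqualityDemo {
    // == asks "same object?"; equals() asks "same characters?".
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Hello World";
        String built = new String("Hello World"); // forces a separate object

        System.out.println(sameObject(literal, "Hello World")); // true: identical literals are shared
        System.out.println(sameObject(literal, built));         // false: different objects
        System.out.println(literal.equals(built));              // true: same characters
    }
}
```

The takeaway is the usual one: compare String contents with equals(), not ==.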

However, this convenience comes with a price, as I will discuss in this post. Also, we’ll look at one way to mitigate this computational overhead.

Java’s Ruse
Tell me, what’s wrong/inefficient with the following lines of code?

String[] names = {"Homer", "Lenny", "Karl", "Barney"};
String greetEveryone = "Hello";
 
for(int i = 0, limit = names.length; i < limit; i++){
  greetEveryone += " " + names[i];
}
 
System.out.println(greetEveryone);

See nothing wrong? As I noted above, Java’s support of concatenation through the + operator is very intuitive. So intuitive that it has led a lot of people (me included!) to overlook the String specification in the Java API. And I quote the Java SE 7 API specification [1]:

Strings are constant; their values cannot be changed after they are created. …String objects are immutable…

Strings are immutable. Unlike incrementing the index variable i, which stores the new value in the same block of memory as the old one, concatenating onto a String creates a whole new object and copies every existing character into it, which gets costlier as the string grows. The greetEveryone reference is then simply pointed at the newly created object so that, by the time the println call on line 8 runs, you get to greet the whole gang.
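A quick way to see the immutability for yourself is to keep a second reference to the original String and concatenate through the first. This is only a sketch; the class name and the addName helper are hypothetical.

```java
class ImmutableStringDemo {
    // Returns the greeting with the name tacked on; the String passed in is never modified.
    static String addName(String greeting, String name) {
        return greeting + " " + name;
    }

    public static void main(String[] args) {
        String greetEveryone = "Hello";
        String before = greetEveryone;                 // remember the original object
        greetEveryone = addName(greetEveryone, "Homer");

        System.out.println(before);                    // Hello
        System.out.println(greetEveryone);             // Hello Homer
        System.out.println(before == greetEveryone);   // false: two different objects
    }
}
```

The old "Hello" object is still there, untouched; only the variable now points somewhere else.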

Stop Shooting Your Own Foot
Conceptually, Java treats line 5 (the += inside the loop) as something like,

String space = " ";
String withName = space.concat(names[i]); 
String updatedGreeting = greetEveryone.concat(withName);
greetEveryone = updatedGreeting;

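Every pass through the loop allocates new String objects just to update one variable, and the old ones are left floating in memory until the garbage collector gets to them. (A real compiler is a bit smarter and typically turns the + into a temporary StringBuilder, but it still builds and discards a fresh copy of the whole greeting on every iteration.) Putting the variable declarations outside the loop does not help either because, as I said, Strings are immutable: every concatenation produces a new String object no matter where the variable was declared.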

Can’t we do better? Fortunately, Java already has a solution to this costly operation: StringBuilder and its thread-safe brother, StringBuffer.

And I quote from StringBuilder’s API 7 specification [2]:

(StringBuilder is) [a] mutable sequence of characters.

The principal operations on a StringBuilder are the append and insert methods, which are overloaded so as to accept data of any type.

With StringBuilder, we can make Java read our code above as follows:

String[] names = {"Homer", "Lenny", "Karl", "Barney"};
StringBuilder greetEveryone = new StringBuilder("Hello");
 
for(int i = 0, limit = names.length; i < limit; i++){
  // Append straight into the builder; no temporary Strings are created.
  greetEveryone.append(" ").append(names[i]);
}
 
System.out.println(greetEveryone.toString());

But, that doesn’t seem to do that much…
For some of your typical uses of Strings (System.out.printing a debug trace or creating UI labels, for instance), using a StringBuilder might be overkill; debug traces may be disabled without affecting the program and UI labels are rarely concatenated with something else. But some Java operations rely so heavily on Strings that a few extra lines declaring StringBuilders and calling their toString() later on are a worthy investment. IBM developerWorks [3] notes a throughput increase of 2.3x using StringBuffer over plain-old concatenation (which, again, does not seem like much unless you read the rest of this post).
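If you want to get a feel for the difference on your own machine, a rough timing sketch like the one below will do. The class name, method names and the array size of 10000 are arbitrary choices of mine, and this is nowhere near a rigorous benchmark (no JIT warm-up, a single run), so treat the numbers as a hint only and do not expect them to match the figure quoted above.

```java
import java.util.Arrays;

class ConcatTiming {
    // Plain += concatenation: copies the whole accumulated string on every pass.
    static String withPlus(String[] names) {
        String result = "Hello";
        for (String name : names) {
            result += " " + name;
        }
        return result;
    }

    // StringBuilder: appends into one growable buffer.
    static String withBuilder(String[] names) {
        StringBuilder result = new StringBuilder("Hello");
        for (String name : names) {
            result.append(' ').append(name);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String[] names = new String[10000]; // arbitrary size, chosen for illustration
        Arrays.fill(names, "Homer");

        long start = System.nanoTime();
        String viaPlus = withPlus(names);
        long middle = System.nanoTime();
        String viaBuilder = withBuilder(names);
        long end = System.nanoTime();

        System.out.println("+= took            " + (middle - start) / 1000 + " us");
        System.out.println("StringBuilder took " + (end - middle) / 1000 + " us");
        System.out.println("same result: " + viaPlus.equals(viaBuilder));
    }
}
```

Both methods produce the same String; only the amount of copying along the way differs.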

An instance where you’ll be very thankful for StringBuilder and StringBuffer is when you work with JDBC. While most of your queries will be pre-determined and so hardcoded, every now and then you will most likely have to generate a dynamic query (based on, say, user input). Querying a large database is already costly even without the expense of generating temporary objects for String concatenation. Suddenly, the meager 2.3x throughput increase seems a very generous deal. And heck it is.
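Here is a minimal sketch of what generating such a dynamic query with a StringBuilder might look like. The table name employees, the column names and the buildQuery helper are all made up for illustration; the actual JDBC connection and execution are left out.

```java
import java.util.Arrays;
import java.util.List;

class DynamicQuery {
    // Builds a SELECT with one "column = ?" filter per requested column.
    // The table name "employees" is hypothetical.
    static String buildQuery(List<String> filterColumns) {
        StringBuilder sql = new StringBuilder("SELECT * FROM employees");
        for (int i = 0; i < filterColumns.size(); i++) {
            sql.append(i == 0 ? " WHERE " : " AND ");
            sql.append(filterColumns.get(i)).append(" = ?");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT * FROM employees WHERE department = ? AND city = ?
        System.out.println(buildQuery(Arrays.asList("department", "city")));
    }
}
```

The ? placeholders are there so that the user’s values can be bound later through a PreparedStatement instead of being glued into the SQL text.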

Bonus Links
Looking for more ways to optimize Java code? Read Peter Sestoft’s paper on Java performance. For the numerical algorithms geek, he also has a paper on numeric performance in C, C#, and Java.

  1. http://docs.oracle.com/javase/7/docs/api/java/lang/String.html
  2. http://docs.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html
  3. http://www.ibm.com/developerworks/websphere/library/bestpractices/string_concatenation.html

Hey! Did you hear? PyCon is happening in the Philippines! And it happens right at my alma mater, UP Diliman (at the UP Alumni Engineers’ Centennial Hall’s Accenture Ideas Exchange Room—a.k.a. Lecture Hall), from June 30 to July 1. That’s a weekend so I have no problems with it conflicting with my work schedule.

But…registration, with the cheapest being at PhP 295, most expensive at PhP 695, requires the use of PayPal, which requires a credit card. I don’t have a credit card and, right now, I really don’t have any plans of getting one. That sucks.

Although right now, at the PhPUG Facebook Group, there are requests to have payment done through GCash. That, I think, is a better idea since how do you honestly expect students to turn up if registration requires a credit card?

I keep my fingers crossed.

EDIT (06/10/12): Update from the PhPUG Facebook group: It seems that a GCash option will be rolling out on Monday (tomorrow). Yeah!

EDIT (06/11/12): And yes, finally, PyCon Philippines is accepting GCash (since yesterday, late update :P). They also seem to support over-the-counter payments. Don’t know how that one goes though.

Whenever I code in C, one of my biggest annoyances is that there is no built-in function for determining an array’s size. So I either resort to hard-coding the maximum size of the arrays my program will use (which is a waste of memory and is inflexible) or I allocate arrays dynamically (which is cumbersome).

When I mastered my bits a bit more, I realized that an array’s size can actually be determined by doing sizeof(array)/sizeof(array[0]). I then coded a function that will do just that for me. However, there are two problems with this:

  1. I’d have to define this function for every possible data type I’ll make an array of. AFAIK, C doesn’t have something like Java’s Object super-root class; and
  2. C is one hell of a liar. Among the things I (as a beginner and even as someone with moderate C experience) find very messy with C is that it is a pass-by-value language but it uses explicit pointers and so manages to act like a pass-by-reference language. Confusing, right?

Let’s dwell a bit more on #2. Whenever you pass an array to some C function, say foo, that function can modify the contents of the array for the whole program to see. It would seem to us that foo was given the whole array upon function call. But no. C actually went behind your back [1] and passed a pointer to the array’s first element. (Structs, by contrast, really are copied unless you pass a pointer to them yourself.) That pointer is just an address, which C uses to reach into memory and modify the contents of your array.

And boom. There goes the problem with doing

int size(int *array){
	/* array is a pointer here, so sizeof(array) is the pointer's size, not the array's */
	return sizeof(array)/sizeof(array[0]);
}

as sizeof(array) will always return the size of a pointer (typically 4 bytes on a 32-bit machine and 8 on a 64-bit one), no matter how many elements the array holds.

I gave up my hopes on making C a bit more Java-ish for my taste. Until I (recently!) learned what macros are for.

Macros are the #define statements at the beginning of a C code listing, just after the #includes. I knew that they can contain virtually any kind of C code but I’d neglected that and, up to now, only used them to define constants. Defining a macro as sizeof(array)/sizeof(array[0]) gives me just the thing I’m looking for.

So why would a lowly macro work where functions failed? It’s because of how macros are expanded. Whenever the C preprocessor encounters a macro, the bit of code assigned to that macro is, shall we say, copy-pasted directly into your code before it gets compiled. So when we write

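#include <stdio.h>
 
/* Element count of a true array (not a pointer): total bytes / bytes per element. */
#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0]))
 
int main(void){
	int foo[9] = {1,4,1,5,9,2,6,5,3};
	printf("%zu\n", ARRAY_SIZE(foo)); /* sizeof yields a size_t, hence %zu */
	return 0;
}

C actually understands the printf line as

printf("%zu\n", (sizeof(foo) / sizeof((foo)[0])));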

No parameter passing occurs, and hence no pointers: sizeof is applied to the array itself, and the whole expression is worked out at compile time. While this may make preprocessing take a bit longer, it surely is worth more than what it costs. And though it still may not be as nifty as Java (see what happens when you use that macro inside some function that takes an array as a parameter), C sure becomes less awkward with this.

Post Script. I originally saw this trick here (CTRL+F to Chapter 17). Apparently, Linux has this macro pre-defined in linux/kernel.h. kernel.h includes fine on my machine but it doesn’t seem to have the aforementioned macro. I opened my kernel.h and, indeed, it is not there. Weird.

  1. I just realized that C gurus will squirm at my choice of words. But hey, this is for non-gurus like me, alright? ;D

For some time now, I’ve been talking about my experience porting Octave code to Java. While I still have a few things to say on the matter, I decided to put up the source code of our Java port. It is accessible at my portfolio website.

I don’t really find it an interesting code listing; there are a lot of functions doing the same thing only on different data types and most of them are elementary processes you learn the first time you learn about matrices and vectors. But, if someone needs to port Octave/Matlab to Java, I think I’ve laid a pretty nice groundwork here.

It’s not perfect but there you go. Caveat emptor.