Monday, July 28, 2008

Why Java Generics Suck

When I first saw Java's Generics, I thought they were a pretty neat addition (rather harsh syntax, but effective). You get stronger type safety without all the extra casts. Less code with less room for mistakes, right? Well, yes and no. The problem is, Sun wanted to add this new feature, but at the same time preserve bytecode compatibility. There are probably a lot of reasons why maintaining bytecode compatibility is supposed to be a good thing, but I just don't buy it.

So, what am I talking about, you ask? Good question. Generics are implemented via a thing called Type Erasure. What this essentially means is that the type information is there only for the compiler. So, if you have a List<String>, you don't ACTUALLY have a list of strings. You have a List that will only let you store and retrieve strings, but ONLY as long as the compiler knows the type of that list. I will get to an example in a little bit. My best guess at why Sun did this is they wanted to add this neat feature that C++ already had (as templates) and that C# was getting, but at the same time they didn't want to cause upgrade problems where clients of your code MUST be using the Java 1.5 runtime. Except, if you use any of the new API, your clients are going to have to upgrade to Java 1.5 anyway, so it seems like a myth that they are ACTUALLY benefiting anyone by being backwards compatible. By trying to appease two crowds (those wanting more features and those wanting to be able to run on old runtimes), they ended up creating a bit of a Frankenstein's Monster in the process.
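To make erasure a bit more concrete, here is a minimal sketch (the class name ErasureDemo and the method sameRuntimeClass are made up for illustration). It asks two differently parameterized lists for their runtime class, which comes back identical because the type arguments exist only at compile time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are only for illustration.
public class ErasureDemo {

    // Returns true when a List<String> and a List<Integer> report
    // the same runtime class, i.e. the type argument left no trace.
    public static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> integers = new ArrayList<Integer>();
        return strings.getClass() == integers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
        // The runtime class name carries no <String> at all.
        System.out.println(new ArrayList<String>().getClass().getName()); // prints "java.util.ArrayList"
    }
}
```

Both lists are plain java.util.ArrayList objects as far as the JVM is concerned, which is why a cast to List<Integer> cannot be verified at runtime.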

So if that garbled mess of a paragraph didn't help you understand the problem (which it probably didn't... I was really just ranting), then hopefully a concrete example will help.

Have you ever seen the warning "Type safety: The cast from Object to List<Integer> is actually checking against the erased type List" or something like it? You can get it from the following code:

public static void storeInt(Object object, int value) {
    List<Integer> list = (List<Integer>) object;
    list.add(value);
}

What this means is that you took a reference without Generic type information (Object for example), and you tried to cast it to a type that has some (List<Integer>)... except the Integer part was never stored in the list object in the first place. The compiler erased it, so at runtime the cast can only check that the object is a List. Why? Because in order to preserve backwards compatibility, the generic type of Integer is not ACTUALLY available at runtime. This means you can end up casting your list to the wrong generic type and adding elements of the wrong type, which can later cause a ClassCastException. Consider the following code:

import java.util.ArrayList;
import java.util.List;

// Wrapper class (the name is arbitrary) so the example compiles as-is.
public class GenericsDemo {

    public static void storeString(Object object, String value) {
        List<String> list = (List<String>) object; // unchecked cast warning
        list.add(value);
    }

    public static void storeInt(Object object, int value) {
        List<Integer> list = (List<Integer>) object; // unchecked cast warning
        list.add(value);
    }

    public static void printList(Object object) {
        List<?> list = (List<?>) object;
        for (Object value : list) {
            System.out.println(value);
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        storeString(list, "Hello World");
        storeInt(list, 1); // an Integer goes into a List<String>
        printList(list);
    }
}

The above code will not only compile, it will also print two lines, the first saying "Hello World" and the second saying "1"... even though we have clearly violated our initial list of strings and stored an integer in there. However, let's add some more code. After the printList call in the main method, try:

for (String value : list) {
    System.out.println(value);
}

Now you will finally see the runtime error you might have been expecting in the previous example. This time you will see "Hello World" followed by "1" followed by "Hello World" followed by a ClassCastException. This is because Java is trying to take the "1" and put it in the String value reference (the compiler quietly inserts a cast to String there)... but it's not a String, so we have a casting problem! If Sun had implemented Generics right (which probably would have meant bytecode incompatibility with 1.4), then the first time you tried to cast that list of strings as a list of integers, you would have gotten a ClassCastException. Furthermore, you would have been able to do things like this:

public static <T> T createAndStore(List<T> list) {
    T value = new T(); // does not compile: T cannot be instantiated
    list.add(value);
    return value;
}

and this:

public static <T> T[] toArray(List<T> list) {
    T[] array = new T[list.size()]; // does not compile: generic array creation
    for (int i = 0; i < list.size(); i++) {
        array[i] = list.get(i);
    }
    return array;
}

But with type erasure those two examples are impossible, because the type is unknown at runtime.
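For what it's worth, the usual way around this is to pass the missing type in by hand as a Class object. The sketch below is only one way to do it, and it changes the method signatures: the class name TypeTokenDemo and the extra type parameter are my own additions, and createAndStore assumes T has an accessible no-argument constructor.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the extra Class<T> parameter is the workaround.
public class TypeTokenDemo {

    // Creates a T reflectively. Assumes T has an accessible no-arg constructor.
    public static <T> T createAndStore(List<T> list, Class<T> type) {
        try {
            T value = type.getDeclaredConstructor().newInstance();
            list.add(value);
            return value;
        } catch (Exception e) {
            throw new IllegalStateException("Could not instantiate " + type.getName(), e);
        }
    }

    // Builds a real T[] via reflection; the cast is unchecked but safe here
    // because Array.newInstance creates an array of exactly the given type.
    @SuppressWarnings("unchecked")
    public static <T> T[] toArray(List<T> list, Class<T> type) {
        T[] array = (T[]) Array.newInstance(type, list.size());
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i);
        }
        return array;
    }

    public static void main(String[] args) {
        List<StringBuilder> builders = new ArrayList<StringBuilder>();
        createAndStore(builders, StringBuilder.class);
        System.out.println(builders.size()); // prints "1"

        List<String> names = new ArrayList<String>();
        names.add("a");
        names.add("b");
        String[] array = toArray(names, String.class);
        System.out.println(array.length); // prints "2"
    }
}
```

It works, but notice that the caller now has to repeat the type (once in the generic parameter, once as the Class argument), which is exactly the kind of information the runtime would already have without erasure.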

You may think that I am just ranting about these and that the issues never come up, but I assure you they do come up. If you are working with any legacy code or APIs that were not built when Generics were around and did not upgrade to use Generics (or were built when they were around, but the author(s) didn't think to use them)... or if Generics just can't solve the problem that a simple list of Objects can, then there can be issues.

Case in point: servlets. Take a look at the servlet API, particularly HttpSession's getAttribute method. I can only store and retrieve Objects, which makes sense. I should be able to store anything I want in the Session. However, the moment I store an object with Generic type in there... especially one that needs to be modified, I have thrown type safety out the window.

Let's say I store a list of strings in the session, and I want to add a string to that list periodically throughout my user's session. This doesn't seem so far-fetched, right? Well, now how are you supposed to add items to that list once you have lost the String generic type information?

public static void addString(Object object, String value) {
    List<String> list = (List<String>) object;
    list.add(value);
}

causes a warning at the casting time like we discussed before.

public static void addString(Object object, String value) {
    List list = (List) object;
    list.add(value);
}

also causes a warning, this time at the point where we call add... because Java thinks we should be using Generics... after all, they spent so much time designing the system, you should always use it!

public static void addString(Object object, String value) {
    List<?> list = (List<?>) object;
    list.add(value);
}

causes an actual compiler error on the add call... because Java can't give you any guarantees that you are safely calling add with the correct type, so it just doesn't let you.

The solution? Well, you just have to put the @SuppressWarnings("unchecked") annotation on the method where you want to add that string! Seems like a hack to me, but maybe you can live with that. Personally, I would rather just ditch Java and go use Ruby. None of this is even remotely an issue when you go dynamic.
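For completeness, the suppressed version might look something like the sketch below. The class name SessionListDemo is made up, and a plain Map<String, Object> stands in for the HttpSession so that the example runs without a servlet container.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; a Map stands in for HttpSession attributes.
public class SessionListDemo {

    // The annotation silences the unchecked cast warning. It does NOT
    // make the cast any safer; we are simply promising the compiler
    // that the object really is a list of strings.
    @SuppressWarnings("unchecked")
    public static void addString(Object object, String value) {
        List<String> list = (List<String>) object;
        list.add(value);
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<String, Object>();
        session.put("names", new ArrayList<String>());

        // Everything comes back out of the "session" as a plain Object.
        addString(session.get("names"), "Hello World");
        System.out.println(session.get("names")); // prints "[Hello World]"
    }
}
```

Keeping the annotation on one small helper method at least limits how much code gets to skip the compiler's checks.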

Monday, July 7, 2008

C# Regions and Style

Often, Jeff Atwood has insightful and interesting things to say. I very often agree with his viewpoint, though every now and then it just doesn't quite sit well. That's probably important, to be sure that I am reading with a critical eye and not just taking his word on blind faith. The reason I start with this is that his most recent post just didn't sit well with me.

I worked with C# at my last job. It had its perks and its problems, and it was a Microsoft product, which adds instant negative baggage (sorry Jeff, despite enjoying your writing and podcast material, I don't think I will ever understand your appreciation of Microsoft products). One thing I did like was #regions. C# is about as verbose as Java (at least it was before 3.0, which I never got to use, though I have heard it improved things a bit). This verbosity often leads to large code files, especially in GUI code that has tons of event handlers and special setup code. I found at my last job that a small sprinkling of regions helped organize my code quite well. I could keep all methods of a particular type grouped together, with the ability to collapse and expand the group at will. Some of my favorite groupings were "Event handling", "Constructors" and "Utility methods". It just seemed cleaner to keep things together that are semantically similar. Though I prefer Java and (even more so) Ruby, regions are one thing I wish I could take from the Microsoft world into the rest of the world.

He is completely right that it can be abused, though. I saw plenty of other code in the project that would simply have regions around every method definition, which just seemed redundant to me. I know Eclipse will allow this kind of code folding automatically, and I could swear Visual Studio did it also (but it has been long enough that I could just be thinking of Eclipse). At that point, the regions aren't grouping semantically similar concepts, and it ends up being quite a mess visually and organizationally. At least to me... I'm sure the author preferred it that way.

This leads me to conclude that the first part of his article is dead on. Despite my disagreement with the details of regions, I fully agree that a team must be aligned on code style. There is one class I missed in college that in retrospect I truly regret not taking. It was a class that everyone feared as being too hard, but everybody who took it ended up loving the professor. It was some kind of individual software project course, and the class was known for the professor enforcing a standard coding style on the students, with some rules purposefully contrary to common style guides. Thus, the student walks away having learned to bend his or her will for the good of the team. Style is often a "religious" debate that has no answer. Both sides are valid, because it's just a matter of personal style. The style I consider beautiful is ugly to the next person, and likewise his or her style is ghastly to me.

The style I appreciate most of all, though, is consistent style. If that means bending my style a little sometimes, then so be it.