Which of the following two Java package structures is the less well designed?
Figure 1: Spaghetti structure.
Oops! Let's try that again.
Figure 2: Two Java package structures: JUnit and Spoiklin Soice.
This blog has banged on and on about how much better the package structure on the right in figure 2 is than that on the left. (Circles are packages, straight lines represent down-the-page dependencies and curved lines up-the-page dependencies.) The structure on the right presents clearer dependencies, making update costs easier to predict and updates themselves often easier to implement.
Two problems, however, persist.
Firstly, graphical evaluation is subjective. Yes, most would agree that the structure on the right is better, but consider figure 3.
Figure 3: Two more, messy Java package structures: Lucene and Struts.
Which do you think is the messier structure in figure 3? (Answer in table 2 below.)
Secondly, diagrams such as these offer insight when we evaluate a small number of nodes, as at package level, but fail before the ghastly nodepocalypse of class and method level.
Figure 4: A Java method-level structure. Good luck with that.
What we need is to objectively quantify spaghettiness[1]: its messiness, its ... disorder. But how on earth do we do that? What makes a structure messy?
Fortunately, mathematics has already defined what spaghettiness is by defining its opposite: partial order. And we can apply this to computer programs, with just one teensy supposition.
Mathematics says if a set has a binary relation with just two specific properties, then that set enjoys partial ordering. Let's go through it.
Consider the three methods in figure 5, where method a() calls b() and b() calls c(), forming the single transitive dependency: a → b → c.
Figure 5: Three simple methods.
From this diagram we must extract a set of numbers, and we'll extract our old friend depth, where depth is a method's position in the transitive dependency: thus a() is at position 0 (because programmers), b() is at position 1 and c() is at position 2.
Figure 6: Three simple methods numbered by their depth in a transitive dependency.
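The numbering in figure 6 can be sketched in a few lines of Java. This is only an illustration under a strong assumption: each method calls at most one other method, so the program is a single chain. The class name DepthNumbering, the method depths() and the map-based representation of calls are all invented for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DepthNumbering {
    // Walks one chain of calls from the root and numbers each method by its
    // position, starting at 0. Assumes each method calls at most one other
    // method; the walk stops if it meets a method it has already numbered.
    static Map<String, Integer> depths(String root, Map<String, String> calls) {
        Map<String, Integer> result = new LinkedHashMap<>();
        String current = root;
        int depth = 0;
        while (current != null && !result.containsKey(current)) {
            result.put(current, depth++);
            current = calls.get(current); // null when the method calls nothing
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> calls = new LinkedHashMap<>();
        calls.put("a", "b"); // a() calls b()
        calls.put("b", "c"); // b() calls c()
        System.out.println(depths("a", calls)); // prints {a=0, b=1, c=2}
    }
}
```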
Mathematics tells us that this "program" is partially ordered with respect to depth if, when you extract these depth values and iterate over them in pairs - a pair of depth values being, say, d1 and d2 - the following properties hold:
The first property is rather trivial, but the second property says that if we write out our transitive dependencies then depth values should never decrease. And in figure 6, they do not: a(0) → b(1) → c(2).
As our program's depth values satisfy these properties, the program is partially ordered. We have achieved mathematical objectivity. (Thanks to Johannes Zick for pointing out that a program is not totally ordered, as not all methods call all others.)
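Taking the text's reading of the second property at face value (depth values never decrease along a transitive dependency), the check might look something like the Java sketch below. The names OrderCheck and neverDecreases() are invented for illustration, and the depth values are simply passed in call order.

```java
class OrderCheck {
    // True when the depth values, read in call order along one transitive
    // dependency, never decrease from one method to the next.
    static boolean neverDecreases(int... depths) {
        for (int i = 1; i < depths.length; i++) {
            if (depths[i] < depths[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // a(0), b(1), c(2) from figure 6
        System.out.println(neverDecreases(0, 1, 2)); // prints true
    }
}
```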
Figure 7 shows a slightly more complicated program of two transitive dependencies, again with methods' depths indicated.
Figure 7: Ooooo! TWO transitive dependencies.
Both transitive dependencies separately satisfy the properties required.
Now let's look at a bad boy. Suppose someone grabs this code and calls e() from c(), that is, creating a dependency from c() back up to e().
Figure 10: Our first messiness.
Recall that curved lines represent dependencies that go up-the-page and, with c() now depending on e(), we have the transitive dependency: a(0) → b(1) → c(2) → e(1) → f(2), in which the depth value decreases at one node[2]. This transitive dependency is therefore not partially ordered, so we cry carbohydrate!
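To pin down exactly where that transitive dependency goes wrong, a small Java sketch can list every call whose target sits at a lower depth than its caller. The depth values are those quoted in the text; the class DisorderSpotter and the method upwardCalls() are hypothetical names, and depths are assumed to be known in advance rather than computed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DisorderSpotter {
    // Returns every call "x -> y" along the path where y's depth is lower
    // than x's, i.e. each up-the-page dependency.
    static List<String> upwardCalls(List<String> path, Map<String, Integer> depth) {
        List<String> found = new ArrayList<>();
        for (int i = 1; i < path.size(); i++) {
            String from = path.get(i - 1);
            String to = path.get(i);
            if (depth.get(to) < depth.get(from)) {
                found.add(from + " -> " + to);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Depths as given in the text: a(0) b(1) c(2) e(1) f(2)
        Map<String, Integer> depth = new LinkedHashMap<>();
        depth.put("a", 0);
        depth.put("b", 1);
        depth.put("c", 2);
        depth.put("e", 1);
        depth.put("f", 2);
        List<String> path = Arrays.asList("a", "b", "c", "e", "f");
        System.out.println(upwardCalls(path, depth)); // prints [c -> e]
    }
}
```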
Thus we can now define our metric. No, not, "Spaghettiness." Let us channel our inner squares and call it, "Structural disorder." A transitive dependency is structurally disordered if it does not satisfy the partial order properties above, and a program's overall structural disorder is then the percentage of disordered transitive dependencies.
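As a rough Java sketch of that definition, the percentage falls out of a simple count. The class StructuralDisorder and its methods are invented here, each transitive dependency is represented only by its sequence of depth values, and the four sequences in main() are made-up sample data rather than measurements of any real program. How a tool enumerates the transitive dependencies in the first place is left out entirely.

```java
import java.util.Arrays;
import java.util.List;

class StructuralDisorder {
    // A transitive dependency is disordered if its depth values ever decrease.
    static boolean isDisordered(int[] depths) {
        for (int i = 1; i < depths.length; i++) {
            if (depths[i] < depths[i - 1]) {
                return true;
            }
        }
        return false;
    }

    // Percentage of the given transitive dependencies that are disordered.
    static double percentDisordered(List<int[]> dependencies) {
        if (dependencies.isEmpty()) {
            return 0.0;
        }
        long disordered = dependencies.stream()
                .filter(StructuralDisorder::isDisordered)
                .count();
        return 100.0 * disordered / dependencies.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample: three ordered dependencies and one disordered.
        List<int[]> dependencies = Arrays.asList(
                new int[] {0, 1, 2},
                new int[] {0, 1, 2},
                new int[] {0, 1},
                new int[] {0, 1, 2, 1, 2});
        System.out.println(percentDisordered(dependencies)); // prints 25.0
    }
}
```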
Let's take this puppy out for a spin.
Looking at the two package structures in figure 2 once again, we would intuitively expect the structure on the right to be far less disordered than that on the left, and it turns out to be so:
Figure 11: JUnit disorder: 76% but Spoiklin Soice disorder: 3%.
And although we seek an objective measure, we nevertheless expect that as structures become subjectively messier-looking, their structural disorder values should rise. We can test this by taking two perfectly structured systems, "refactoring" them by applying random dependencies between nodes and checking whether their disorder values generally rise as their structures collapse. See figure 12.
Figure 12: Two sad, decaying systems.
We can even simplify matters by defining the (admittedly arbitrary) categorization whereby a program suffering from 50% disorder or more is spaghetti. The threshold might have been 40% or 60% - feel free to choose your own. In fact, we'll have four categories, distinguished by garish, child-friendly colour-coding: red/black=naughty, green/white=nice.
Structural disorder | Evaluation |
0% - 24% | Good |
25% - 49% | Fair |
50% - 74% | Spaghetti |
75% - 100% | Just absolute darkness |
Table 1: The four categories of structural disorder.
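Table 1 translates directly into a lookup. The Java sketch below assumes whole-number percentages, as used throughout this post; DisorderCategory and evaluate() are names made up for the example, and the thresholds are just as arbitrary here as they are in the table.

```java
class DisorderCategory {
    // Maps a structural disorder percentage (0-100) to its table 1 category.
    static String evaluate(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("Not a percentage: " + percent);
        }
        if (percent < 25) {
            return "Good";
        }
        if (percent < 50) {
            return "Fair";
        }
        if (percent < 75) {
            return "Spaghetti";
        }
        return "Just absolute darkness";
    }

    public static void main(String[] args) {
        // The two package-level values quoted in figure 11.
        System.out.println(evaluate(76)); // prints Just absolute darkness
        System.out.println(evaluate(3));  // prints Good
    }
}
```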
Let's point our disorder-binoculars at 15 Java programs, some quite well-known. Table 2 shows the programs and their structural disorder percentages at method, class and package level. You'd expect most professionally designed programs to sit between "Good" and "Fair" on the disorder spectrum, so the table should appear overwhelmingly green and white, yes?
Program | Method | Class | Package |
Cassandra | 41 | 82 | 84 |
Zookeeper | 28 | 85 | 93 |
ActiveMQ Broker | 24 | 80 | 89 |
Jenkins | 26 | 72 | 90 |
JUnit | 34 | 78 | 76 |
Camel | 22 | 90 | 70 |
Lucene | 33 | 70 | 73 |
FitNesse | 33 | 55 | 61 |
Tomcat (Coyote) | 22 | 81 | 40 |
Maven | 30 | 30 | 74 |
Log4j | 25 | 59 | 47 |
Struts | 11 | 42 | 74 |
Spring | 27 | 60 | 35 |
Netty | 22 | 69 | 20 |
Spoiklin Soice | 26 | 25 | 3 |
Average | 27 | 65 | 62 |
Table 2: The structural disorder percentages of 15 Java programs[3].
Oh.
It seems that we, as professional programmers, can write more or less well-structured methods, but above that ... RRRR MRRRR GRRRRD.
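For anyone who wants to check the Average row of table 2, the arithmetic is a column mean rounded to the nearest whole number. The Java sketch below copies the fifteen rows from the table; the class TableAverages and its members are invented for the example.

```java
class TableAverages {
    // Rows of table 2 in order: {method, class, package} disorder percentages.
    static final int[][] TABLE_2 = {
        {41, 82, 84}, // Cassandra
        {28, 85, 93}, // Zookeeper
        {24, 80, 89}, // ActiveMQ Broker
        {26, 72, 90}, // Jenkins
        {34, 78, 76}, // JUnit
        {22, 90, 70}, // Camel
        {33, 70, 73}, // Lucene
        {33, 55, 61}, // FitNesse
        {22, 81, 40}, // Tomcat (Coyote)
        {30, 30, 74}, // Maven
        {25, 59, 47}, // Log4j
        {11, 42, 74}, // Struts
        {27, 60, 35}, // Spring
        {22, 69, 20}, // Netty
        {26, 25, 3},  // Spoiklin Soice
    };

    // Mean of each of the three columns, rounded to the nearest integer.
    static long[] averages(int[][] rows) {
        long[] result = new long[3];
        for (int col = 0; col < 3; col++) {
            int sum = 0;
            for (int[] row : rows) {
                sum += row[col];
            }
            result[col] = Math.round((double) sum / rows.length);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] avg = averages(TABLE_2);
        System.out.println(avg[0] + " " + avg[1] + " " + avg[2]); // prints 27 65 62
    }
}
```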
Three points are noteworthy.
First, we chest-thump endlessly about refactoring. Yet refactoring definitionally involves just one thing: improving software structure. Table 2 suggests that we fail to consider refactoring at class- and package-level.
Second, higher-level structure can provide a model, a simplified view, of the lower levels: a good package-structure, for example, can offer a great map of functionality without pushing the programmer's nose into foul code. Yet our higher-level models seem vastly more disordered than that which they model. Table 2 suggests that we fail to maximize the benefits of higher-level structure.
Third, Oracle will release Java 9 any day now (honest!) with its new modules, offering a level of structure above even package level[4]. Yet we apparently lack the desire or competence to manage the levels we already have. Table 2 predicts the rise of spaghetti modules.
Figure 13: Not another code review ...
So, are we still writing spaghetti code?
Hell, yes! Not only are we still writing spaghetti code, we're living in the golden age of spaghetti code, an age in which we professional programmers don't just observe and casually ignore spaghetti, we don't even recognize it in the first place.
The GOTO statement used to be the alarm that forced programmers to manage control flow in their programs. Abandoning the GOTO statement, however, in no way removes this concern but rather migrates control flow to the realm of inter-method, inter-class and inter-package dependency where - in those last two cases at least - its complexity now thrives, far from the programmer's gaze.
The greatest trick spaghetti code ever pulled was convincing the world that it didn't exist.
This is my structure. There are many like it, but this one is mine.
My structure is my best friend. It is my life. I must master it as I must master my life.
Without me, my structure is useless. Without my structure, I am useless. I must design my structure true. I must design cleaner than my enemy who is trying to out-structure me. I must embarrass him before he embarrasses me. I will...
My structure and I know that what counts in programming is not the variables we rename, the methods we extract, nor the conditionals we replace with polymorphism. We know that it is reduced disorder that counts. We will reduce disorder...
CC image Spaghetti courtesy of David Purhouse on Flickr.