Is your code too concrete?


A post of note.

Mark Tiele Westra wrote a splendid post two years ago, "The ball of mud transition," in which he elaborated on an idea nabbed from a Stuart Kauffman book. He writes,

"Take 20 buttons and a ball of string, and scatter the buttons over the floor. Then take two buttons at random, and connect them using a thread. Do this for a while ... pick up a random button, and count the number of buttons it is attached to."

Apparently, a threshold number of connections emerges below which picking up a button lifts few others with it, but above which a large number of buttons will be snagged (the referenced paper shows that this, "Giant component," will contain half the buttons).

Considering software, Mark then wonders whether such a threshold of dependencies might signal a transition to the infamous big ball of mud pattern and investigates with some Clojure experiments.

This post will tread a similar path, but will analyze the effect on two types of dependency. The experiment above focuses on direct dependencies, dependencies between just two methods (strictly speaking, "Binary dependencies," may be a more accurate term, but the connection to compiled code may confuse).

Good structure, however, concerns the reducing of the impact costs of ripple effects, and ripple effects do not limit themselves to direct dependencies but instead race along transitive dependencies, which are direct dependencies laid end to end. Figure 1 below, for example, contains three direct dependencies - a() -> b(), b() -> c() and b() -> d() - but contains only two transitive dependencies - a() -> b() -> c() and a() -> b() -> d().

Spoiklin Soice image

Figure 1: Three direct dependencies, two transitive dependencies.

So we shall explore the relationship between direct dependencies and the transitive dependencies they spawn.

Let us generate dozens of systems of 1000 Java methods randomly connected by increasing amounts of direct dependencies, then count the number of transitive dependencies that result. Is there a "transition point" at which the number of transitive dependencies suddenly takes off, reflecting a horribly interconnected and expensive-to-maintain structure? See figure 2.

Graph of how transitive dependencies as random direct dependencies are added

Figure 2: Systems of random direct dependencies.

The red crosses in figure 2 show (the blue line is the theoretical trend) that as long the number of random direct dependencies remains well below the number of methods (1000), then the total number of transitive dependencies stays relatively low, indicating systems that would be cheap to maintain.

As the number of direct dependencies approaches the number of methods, however, then the number of transitive dependencies does indeed soar from around 800 to 10,000. When, furthermore, the number of direct dependencies rises to just 10% more than the number of methods, then the structure completely collapses with nightmare scenarios of 70,000 transitive dependencies and more.

So, can we expect real systems with more direct dependencies than methods to also show this catastrophic structural degradation? Table 1 shows the relevant data for some recently reviewed systems.

Methods Direct dependencies Transitive dependencies Abstract
Ant 5058 8995 596935 ???%
Fitnesse 6198 10113 33586 18%
Struts 3114 3049 7764 16%
JUnit 1291 1707 2650 16%
Spoiklin Soice 2783 4787 5050 62%

Table 1: Analyzed programs.

Table 1 shows that almost all these programs have more direct dependencies than methods, yet only Ant exhibits the expected disastrous structure with half a million transitive dependencies. Why do the others show so few transitive dependencies?

It turns out that the model above is a little too simplistic to be applied to source code: it suffers from two flaws.

Firstly, it takes no account of Java interfaces. An interface method is abstract; having no body it cannot call others methods, and so cannot be the source of a direct dependency.

Secondly, it presumes direct dependencies to be uniformly distributed, but real programs demonstrate a power-law of dependency distributions, with some methods being the target of vastly more direct dependencies than others.

In well-structured systems, furthermore, both these points reinforce one another. With GoF's, "Program to an interface, not an implementation," and Martin's, "Depend on abstractions not on concretions," large percentages of direct dependencies terminate on abstract methods. Table 1 shows the actual percentage figures for those analyzed programs in the right-most column (several ironically low), with Ant's figure hidden till later as a test of a theory.

Let us run the 1000-method experiment again, but this time let us force 16% (casually chosen from table 1) of all direct dependencies to terminate on abstract methods, see figure 3.

Graph of how transitive dependencies as random direct dependencies are added

Figure 3: Random direct dependencies with 16% terminating on abstract methods.

Comparing figures 2 and 3, we see that in figure 2, the number of transitive dependencies rose to 10,000 (on average) with just 10% more direct dependencies than methods. In figure 3, with 16% abstract-terminating dependencies, the system rises to 10,000 transitive dependencies at around 20% more direct dependencies than methods.

Whether we choose 10,000 to represent structural chaos or some other number, the principle of, "Program to an interface not an implementation," seems to delay the onset of that chaos.

A figure of 16% abstract-terminating dependencies indeed seems quite low. How does the chaos-threshold move as we add more and more interfaces? See figure 4.

Graph of how transitive dependencies as random direct dependencies are added

Figure 4: Random direct dependencies with 16%, 25%, 35% and 50% terminating on abstract methods.

Figure 4 suggests the trend continues, with a system of 50% abstract-terminating dependencies remaining well structured with vastly more direct dependencies than less well-abstracted systems.

Of course, all of these analyses lack the scope and statistical rigour to form definitive conclusion. They suggest, however, the crucial importance of abstraction in the fight against structural complexity.

As a final finger-in-the-air test, this theory would predict that Ant's embarrassing showing in Table 1 is due to its having a very low number of direct dependencies terminating on abstract methods.

Its actual figure?

6%.

Summary.

Program to an interface not an implementation.

Depend on abstractions not on concretions.

Each your vegetables.

These principles exist for a reason.