What is good source code structure?

Three objective, syntactic properties.


What on earth is the point of all this eye-wateringly tedious drivel?

Good question.

The point of all this analysis is making washing machines cheaper.

And cars. And phones. And Magnetic Resonance Imaging scanners. If source code structure is about cost reduction - as is claimed - then, "Good," source code structure is that which saves the cost of, "Bad," source code structure.

So, what is this, "Good," source code structure?

Well, if such a thing exists then we are going to make a demand of it. We shall demand that it is objective, that it lies in independently verifiable properties of the structure itself rather than springs from our oh-so-subjective brains. We make such a demand because we are not interested in Mary's opinion. Or Sebastian's. We ultimately want a machine to chew our source code and spit out potential bad structure so that we can improve it and save money.

This really shouldn't be that difficult, should it? We've been changing source code for decades: we've bolted on countless new features and we've heaved it sweatily through unending, purgatorial maintenance. Can we really have toiled for so long yet remain ignorant of costly structure?

Of course not.

There are already plenty of good structure proposals guiding our path. Three present themselves here, distinguished for certain categorical reasons to which we shall return later.

The principle of depth

Our first objective quality of good structure stems from the statistical. Statistics slither notoriously through the history of bad argument but we'll give it a bash. We're going to base a conclusion on two assumptions. If these assumptions hold our conclusion should obtain. If not, sayonara structure.

We shall assume that all source code change is random.

That surely sounds odd given that so few programmers support themselves by making random character changes to large, corporate code-bases.

Taking a long-term view, however, we see that, yes, we programmers make precise and meticulous changes, planned in at least some detail and iterated to perfection via the blinding rapidity of unit tests, but few of us can predict such precision and meticulousness six months in advance. Or a year. Or five.

Markets transform overnight. Customer wallets bulge then bleed. Fads go before they've come.

No one knows what the customer will want in a year or what bugs we'll have to fix. The changes we make, furthermore, range fearlessly over the full breadth of the source code. No function's safe.

In this sense, given our inability to make precise impact predictions and given the law of large numbers involved in making thousands of changes per year to a large code-base, our updates might be modeled - in advance - as approximating a sequence of random or pseudo-random events. Hence, source code change is random.

Second assumption: a change to a function will have a greater probability of impacting its calling functions than those functions it calls. (We'll use functions as examples throughout, though classes or packages would also suffice.)

Let's say function a() calls b() and b() calls c(). Consider we then have to fix a bug and change b(). Which is more likely to suffer a ripple-effect change: a() or c()?

The very nature of software dependency suggests that, given that c() does not depend on b() and a() does, a() has a higher probability of being impacted. Of course, a() won't always change and sometimes c() will change, but a casual probability analysis seems to justify our claim.

We further note that a change to c() can potentially force changes to both b() and a(): this transitive nature of impact, indeed, is what gives the ripple-effect its name.

So, given both these assumptions, here is our simple question to carry us through to our conclusion: should we have long dependency chains or short dependency chains?

That is, if we have a choice, should we choose our structure to reflect (1) or (2)?

  1. a() → b() → c() → d() → e()
  2. a() → b(), a() → c(), a() → d(), a() → e()

Let's use a mathematical tool called the impacted set (also known as, "Counting"). This quantifies the worst-case scenario by gathering all the functions potentially impacted by the change of a single function and then counting them. This is then repeated for all functions, that is, by presuming each function in turn is changed and counting the potential ripple-effect. The bigger the impacted set - or rather, the higher the impacted set cardinality - the worse the structure.

Both cases above involve dependencies between 5 functions.

For case (1), if e() is changed then in the worst case a(), b(), c() and d() would have to change: e() has an impacted set cardinality of 4 because 4 functions are potentially impacted if it changes. d() has an impacted set cardinality of 3, etc. The impacted set cardinality of case (1) in its entirety is 10. The impacted set cardinality of case (2) is 4.
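The counting above can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: the graph maps each caller to its callees, and the impacted set of a function is everything that transitively calls it. The totals of 10 and 4 drop out of the arithmetic.

```python
# A minimal sketch of impacted-set counting over a call graph.
# Edges point from caller to callee; impact ripples from a changed
# function back up to everything that transitively calls it.

def impacted_set(graph, changed):
    """All functions that transitively call `changed`."""
    impacted = set()
    frontier = [changed]
    while frontier:
        g = frontier.pop()
        for caller, callees in graph.items():
            if g in callees and caller not in impacted:
                impacted.add(caller)
                frontier.append(caller)
    return impacted

def total_cardinality(graph):
    """Sum of impacted-set cardinalities across every function."""
    funcs = set(graph) | {c for cs in graph.values() for c in cs}
    return sum(len(impacted_set(graph, f)) for f in funcs)

chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}   # case (1)
flat  = {"a": ["b", "c", "d", "e"]}                        # case (2)

print(total_cardinality(chain))  # 10
print(total_cardinality(flat))   # 4
```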

Impacted set analysis stands, however, as merely a specific case of a more general principle: the principle of depth. The depth of a transitive dependency is merely the number of functions it contains. The depth of an impacted set is the average number of functions in all its transitive dependencies. The principle of depth simply states that shallower structures cost less to update than deep structures. Depth is bad.

Yes, a worst-case analysis skews our figures and this portrays matters far worse than they probably are and not every function is equally likely to be updated and there are all sorts of semantic and encapsulation-related and goodness-knows-what reasons why we sometimes need a long dependency chain, etc., etc.

Fine. Granted. This whole approach must be taken with a pinch of salt. It cannot apply to all cases. Got it.

In the far corner, however, stands that large number effect. And he's swingin' furiously.

Given thousands or tens of thousands of function chains, will we have good semantic and encapsulation reasons why most long chains should be preferred over shorter ones? And if impacted set analysis - based on two reasonable assumptions - offers even a slight indication that shorter chains will save money, will this not become increasingly important at scale?

We can swap punches with our snarling bruiser for a week or a month, but sooner or later we'll make a change and watch helplessly as it ripples out from here to Samoa. We'll wonder what that crunching sound is as the blow lands.

This, then, is our first objective quality of good structure: the longer the average dependency chain length - that is, the deeper the structure - the worse the source code structure.

Cyclic dependencies

[Image: a penny farthing bicycle]

The second quality of good structure is well-known and merely an infinite extension of the first. This concerns cyclic dependencies.

Authors have spilled oceans of ink already in deriding the evil that is the cyclic dependency. Basically, if you have a() → b() → c() → a() and you have to make a change to any function, there is a non-zero probability of being forced to change all the functions because they all transitively depend on one another.

So well-known is this gorgon that we need spend little time on it, though we might note one point. The length of a cyclic dependency is significant. There are many cases where a function may wish to recursively call itself; functional programming positively encourages it.

Cyclic dependencies really only earn their prison-tatts when they grow large, with function cyclic dependencies sprawling across multiple classes and packages. Such brutes render useless all sorts of otherwise intelligent encapsulation mechanisms without offering a scrap of benefit. The advice is unanimous: avoid.
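As a sketch of how a machine might hunt these brutes, a plain depth-first search over the call graph suffices; the graph and function names below are invented for illustration.

```python
# A rough sketch of cyclic-dependency detection in a call graph
# via depth-first search; grey nodes are on the current path, so
# reaching a grey node again means we have walked a cycle.

def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {f: WHITE for f in graph}
    stack = []

    def visit(f):
        colour[f] = GREY
        stack.append(f)
        for g in graph.get(f, []):
            if colour.get(g, WHITE) == GREY:      # back-edge: cycle found
                return stack[stack.index(g):] + [g]
            if colour.get(g, WHITE) == WHITE:
                found = visit(g)
                if found:
                    return found
        stack.pop()
        colour[f] = BLACK
        return None

    for f in list(graph):
        if colour[f] == WHITE:
            found = visit(f)
            if found:
                return found
    return None

print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))  # ['a', 'b', 'c', 'a']
print(find_cycle({"a": ["b"], "b": ["c"]}))              # None
```

A real tool would report every cycle and weigh its size, but even this toy shows that the red flag can be raised from structure alone.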

This is our second objective quality of good structure: the fewer the significant cyclic dependencies, the better the source code structure.


Duplication

Our final structural quality reveals itself whenever programmers confess their nastiest code sins. Heads bowed and eyes raised (only programmers can accomplish this uncomfortable feat) they will usually reply with a soft, sad exhalation: duplication.

And with good reason.

With code duplicated in two functions, a system requiring change to that specific code must be updated in both locations or risk inconsistent behaviour. The problem being, of course, that poor, stressed programmers easily overlook that such dual-entry updates are required.

That duplication is a source code evil is not in doubt. The question is: is it a structural issue? Or more particularly, can we use structure to identify (and hence eliminate) duplication thereby improving our structure?

Machines already thrive at hunting repeated source code sequences that suggest duplication. If two or more such sequences are found the machine can suggest extracting them to their own function. Yet whereas such an extraction would change and improve the source code structure, the duplication itself is not identifiable from the structure (as we have defined it) in the first place.

Structure may, however, offer assistance in the detection of another type of duplication. Consider that multiple client functions call a group of target functions in sequence, consistently. Let's say functions f() and g() call target functions a(), b() and c().

Based purely on the three functions they call, we are perfectly justified in suspecting that f() and g() contain duplicated sequences and that these three invocations should be extracted to a single function called by both f() and g() - otherwise some change in the calling sequence might be updated in f() but forgotten in g().

But it is by no means certain. Conditions may prevail to negate any benefit from collapsing the invocations, or it may simply be incorrect to do so. Nevertheless, a machine can raise a red flag based only on the structure and this is precisely what we are looking for.
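One hedged sketch of such a red-flag raiser: record each client function's ordered sequence of target invocations and group clients whose sequences match. All the names below are invented for illustration.

```python
# A sketch of flagging clients that invoke the same sequence of targets.
# Map each client function to the ordered list of functions it calls,
# then group clients with identical call sequences.

from collections import defaultdict

def duplicated_invocations(call_sequences, min_length=2):
    """Group client functions whose call sequences are identical."""
    groups = defaultdict(list)
    for client, sequence in call_sequences.items():
        if len(sequence) >= min_length:
            groups[tuple(sequence)].append(client)
    # Only groups with two or more clients are worth a red flag.
    return [clients for clients in groups.values() if len(clients) > 1]

calls = {
    "f": ["a", "b", "c"],
    "g": ["a", "b", "c"],   # same sequence as f(): candidate for extraction
    "h": ["a", "c"],
}
print(duplicated_invocations(calls))  # [['f', 'g']]
```

The machine cannot know whether extraction is correct, only that the question is worth asking - which is all we demanded of it.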

This is our third and final objective quality of good structure: the fewer the questionable invocation duplications, the better the source code structure.


So, there we have them. Our three objective properties of good source code structure are:

  1. Shallowness.
  2. Few significant cyclic dependencies.
  3. Few questionable invocation duplications.

Remarkably little new stains these pages. This is more an attempt - in advance of deriving consequent principles - at objective and syntactic categorization than novelty.

Photo credit attribution.

CC Image "Rumble in the Jungle" Boxing Night at Osmani Youth Centre courtesy of andriux-uk events on Flickr.

CC Image DSC_0626-1.JPG courtesy of Clarence Risher on Flickr.