Sip tea.


A previous post showed how to find objective correlations [Note 1] between the number of times a method was updated and its structural properties, thus suggesting properties that produced expensive code and hence principles for their avoidance.

The correlation mechanism used was Pearson's coefficient, which exposes linear correlations.

Programmers, however, don't care too much that a linear relationship exists between, say, method size and cost. They already feel that large methods cost more to maintain and just want objective evidence to support this. They care that the relationship exists, not that the relationship is linear, quadratic, or ... anything else.

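Fortunately, another correlation lies at hand for just this purpose: Spearman's rank correlation coefficient. It identifies not just linear but any monotonic relationship, that is, any relationship in which one quantity consistently rises (or consistently falls) as the other rises, whatever the shape of the curve.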
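To make the difference concrete, here is a minimal Java sketch that computes both coefficients for a relationship that is monotonic but not linear. It is an illustration only, not the code used for the analysis: the class name CorrelationSketch and the sample data (update counts growing as the cube of method size) are invented for the example. Spearman is computed here as Pearson applied to ranks, with tied values sharing an average rank.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CorrelationSketch {

    // Pearson's coefficient: covariance divided by the product of the spreads.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Ranks start at 1; tied values share the average of the positions they occupy.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] result = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double shared = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                result[order[k]] = shared;
            }
            start = end + 1;
        }
        return result;
    }

    // Spearman's coefficient is Pearson's coefficient applied to the ranks.
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Hypothetical data: update count grows as the cube of method size.
        double[] size = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] updates = new double[size.length];
        for (int i = 0; i < size.length; i++) {
            updates[i] = Math.pow(size[i], 3);
        }
        System.out.printf("Pearson:  %.3f%n", pearson(size, updates));
        System.out.printf("Spearman: %.3f%n", spearman(size, updates));
    }
}
```

On this made-up data Spearman reports a perfect 1.0, because the ordering of the two columns is identical, while Pearson falls short of 1.0 because the points do not sit on a straight line.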

So the entire analysis was re-run using Spearman, with methodological improvements [Note 2] to increase accuracy and with hundreds of thousands more method revisions [Note 3] thrown in for fun. Table 1 below shows the results, presenting the Spearman coefficients for all the structural properties [Note 4] appraised, averaged [Note 5] over all programs analyzed; a zero implies no correlation and a 1.0 implies a perfect monotonically increasing relationship.

Property  S.    D. f.  C. c.  T. d.  I.S.  A. p. c.  A.    C. d.  Impd. S.  C.    M.    D.    D. o.
Avg.      0.36  0.31   0.28   0.25   0.29  0.08      0.13  0.16   0.01      0.15  0.13  0.18  0.01

Table 1: Spearman correlations of structural properties with number of times a method was updated.

As before, no blockbuster correlations shine. Even these higher ranges, above 0.20, are merely weak correlations. But they are not negligible. Some properties do clearly correlate, albeit weakly, with number of updates. The five properties on the left show relatively higher correlations than those on the right, thus offering statistically significant evidence that the more such properties maraud through your code, the more expensive your code will be to develop, even if the influence is small.

The five properties that correlate with more expensive code are:

  1. Size - size of the given method.
  2. Dependencies from - number of dependencies leading from the method.
  3. Conditional count - number of conditionals in the method.
  4. Transitive dependencies - number of transitive dependencies involving the method.
  5. Impact set - the number of all other methods that the given method depends on, directly or transitively.
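As an illustration of how the fifth property differs from the second, the sketch below computes an impact set by walking a dependency graph. It rests on assumptions that are not from the analysis itself: the graph is a plain map from a method name to the methods it depends on directly, the method names are invented, and "dependencies from" is taken to mean the count of those direct dependencies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImpactSetSketch {

    // Returns every method reachable from `method` along "depends on" edges,
    // directly or transitively, excluding the method itself.
    static Set<String> impactSet(Map<String, List<String>> dependsOn, String method) {
        Set<String> reached = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(method);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String target : dependsOn.getOrDefault(current, List.of())) {
                // Set.add returns false for methods already seen, so cycles terminate.
                if (!target.equals(method) && reached.add(target)) {
                    pending.push(target);
                }
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        // Hypothetical graph: a -> b, a -> c, b -> d, d -> b (a cycle), e -> a.
        Map<String, List<String>> dependsOn = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "d", List.of("b"),
                "e", List.of("a"));
        System.out.println("Dependencies from a: " + dependsOn.get("a").size());
        System.out.println("Impact set of a:     " + impactSet(dependsOn, "a").size());
    }
}
```

In this toy graph, method a has two dependencies from it but an impact set of three, since d is reached only through b. The two properties can therefore diverge even though, as a rule, they tend to rise together.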

These are the properties you should design out of your code to minimize development cost.

You might, however, object, "Hang on! Of course big methods have more conditionals and dependencies-from: that's why they're big. Are all these not measuring the same thing?"

Excellent question. You're essentially asking whether correlations exist between the properties themselves, so that if everything correlates with size then we can reap maximal benefit by doing the minimal work of just keeping methods small, ignoring these other properties. To find such a correlation, we'd need to have some sort of statistical correlation mechan--- CALLING MISTER SPEARMAN AGAIN!

Table 2 below shows the average [Note 6] of how all the significant properties correlate with method size. Again the value is between 0 and 1, and the higher the value, the more the property correlates with method size.

Size  S.    D. f.  C. c.  T. d.  I.S.  A. p. c.
Avg.  1     0.57   0.69   0.32   0.52  0.09

Table 2: Spearman correlations of method size with all other significant structural properties.

Table 2 suggests that one property - conditional count - correlates so strongly with method size that simply keeping your methods small should go a long way to managing it (so much for rushing to reduce cyclomatic complexity).

Transitive dependencies, by contrast, correlate far less with method size (0.32), so you should probably manage those separately.

The remaining two properties - dependencies from and impact set - are something of a mid-way point, correlating both with cost and size. As correlation with cost must take primacy, one of these must surely make our short-list: we cannot leave both to management-by-small-method. Fortunately, further examination [Note 7] reveals that the two correlate massively with one another, so managing one largely manages the other. And as method size correlates less with impact set (0.52) than with dependencies from (0.57), small methods alone do less to contain impact set, so controlling specifically this property - impact set - would seem to improve structural quality more.

Thus these three properties - method size, impact set and transitive dependencies - may be proposed as foundational properties with which programmers might evaluate method structure, with the associated principles to minimize each property grouped under the acronym: SIPT.

These properties are "foundational" not in that they render others obsolete, but in that, having objective evidence, however weak, to support their importance, they should be considered perhaps before all others - a first tier - and only once they are tackled should less evidenced properties consume attention. As such, this approach merely attempts to prioritize between properties (and the principles for their management) rather than curtail any analyses per se.

Summary.

This post continued the search for objective evidence of good method-level code structure. The search is not ended: future analysis may find stronger correlations to depose one or all of the properties identified here.

Nevertheless, while these three reign, later posts will elaborate on the properties and also apply Spearman analysis to package-level structure in the hope of finding more goodies.

If you like good Java, SIPT.