The usual caveat: correlation is not causation. The purpose of this analysis is to find objective correlations between properties and costs, and it is true that such a correlation does not necessarily mean that reducing those properties also reduces costs. But having no objective correlations is not causation either, and this is the situation we find ourselves in regarding most principles we use today. Consider this post an attempt to examine our beliefs rather than to definitively justify them.
The actual measurements made were as follows. If measuring size, then we measured the size of a method at each of the times it was updated over multiple releases. Let's say it was updated five times. Then the two numbers correlated for this method were the number of times it was updated (5) and the method's average size over all 5 updates.
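As a minimal sketch of the pairing described above (not the analyzer actually used), the snippet below turns one method's history into the two numbers that get correlated. The function name and the sizes are hypothetical, chosen only for illustration.

```python
from statistics import mean


def update_size_pair(sizes_at_each_update):
    """Return (times updated, average size) for a single method.

    `sizes_at_each_update` holds the method's size as measured at each
    update across releases (hypothetical input format).
    """
    if not sizes_at_each_update:
        raise ValueError("method was never updated")
    return len(sizes_at_each_update), mean(sizes_at_each_update)


# Illustrative values only: a method updated five times.
sizes = [40, 44, 52, 52, 62]
pair = update_size_pair(sizes)
print(pair)  # (number of updates, average size over those updates)
```

One such pair is produced per method, and the correlation is then taken over all pairs in a program.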
Also, it was noticed that some Java bytecode changes even when the source code does not (and even when the compiler version does not change). As there was no way around this fault, it introduces some error into the analysis.
As usual, the caveat must be made that the only way to ensure that a method has been updated between project releases is to manually check all methods, and this was NOT done in this case.
Instead, the project releases were sieved through a code analyzer which checked automatically for changes by comparing the before and after bytecode of each method. This will not catch all method changes, and could conceivably flag unchanged methods as changed. Hence the results of this experiment are not definitive.
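The before-and-after comparison can be sketched roughly as follows. This is an assumption-laden illustration, not the analyzer used here: it supposes each release has already been reduced to a mapping from method signature to raw bytecode, and the signatures and byte strings are made up.

```python
import hashlib


def changed_methods(before, after):
    """Return signatures present in both releases whose bytecode differs.

    `before` and `after` map a method signature (str) to that method's
    bytecode (bytes). Extracting bytecode from class files is outside
    the scope of this sketch.
    """
    def digest(code):
        return hashlib.sha256(code).hexdigest()

    common = before.keys() & after.keys()
    return sorted(s for s in common if digest(before[s]) != digest(after[s]))


# Hypothetical data: one method unchanged, one changed, one newly added.
before = {"Foo.bar()V": b"\x2a\xb1", "Foo.baz()I": b"\x03\xac"}
after = {
    "Foo.bar()V": b"\x2a\xb1",
    "Foo.baz()I": b"\x04\xac",
    "Foo.qux()V": b"\xb1",
}
changed = changed_methods(before, after)
print(changed)
```

A byte-level comparison like this flags any method whose bytecode differs, including cases where the source did not change, which is the source of error noted above.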
The programs analyzed were:
Only core jar files were analyzed where programs were very large.
The analysis requires that successive releases use the same Java compiler version, as whether a method has changed is identified by whether its bytecode changed. Hence Hadoop releases, for example, are split in two around a compiler update.
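A minimal sketch of that split, assuming each release is tagged with the compiler version that built it. The release names and compiler labels below are placeholders, not the actual Hadoop values.

```python
from itertools import groupby


def split_by_compiler(releases):
    """Group consecutive releases that share a compiler version.

    `releases` is an ordered list of (release_name, compiler_version)
    tuples; each returned run can be analyzed as its own series.
    """
    return [
        [name for name, _ in run]
        for _, run in groupby(releases, key=lambda r: r[1])
    ]


# Placeholder names and versions, for illustration only.
releases = [
    ("r1", "javac-A"),
    ("r2", "javac-A"),
    ("r3", "javac-B"),
    ("r4", "javac-B"),
]
series = split_by_compiler(releases)
print(series)
```

Each resulting run corresponds to one row in the tables, in the way Hadoop_1 and Hadoop_2 appear as separate entries.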
All data available on request.
The properties investigated were:
The program-by-program Spearman results are shown below:
Program | S. | D. f. | C. c. | T. d. | I. s. | A. p. c. | A. | C. d. | Impd. S. | C. | M. | D. | D. o. |
Junit | 0.16 | 0.14 | 0.12 | 0.19 | 0.14 | -0.12 | 0.07 | 0.14 | 0.04 | 0.07 | -0.01 | 0.25 | 0.06 |
ActiveMQ | 0.37 | 0.31 | 0.32 | 0.24 | 0.29 | 0.1 | 0.14 | 0.1 | 0.03 | 0.02 | 0.08 | 0.17 | 0.05 |
Camel_1 | 0.31 | 0.26 | 0.27 | 0.2 | 0.24 | 0.05 | 0.08 | 0.19 | -0.11 | 0.06 | 0.13 | 0.21 | -0.12 |
Camel_2 | 0.34 | 0.29 | 0.31 | 0.2 | 0.25 | -0.03 | 0.1 | 0.13 | -0.06 | 0.08 | 0.02 | 0.24 | -0.04 |
FitNesse_2 | 0.28 | 0.26 | 0.17 | 0.21 | 0.25 | -0.1 | 0.09 | 0.1 | 0 | 0.16 | 0.14 | 0.2 | 0.02 |
Hadoop_1 | 0.49 | 0.37 | 0.39 | 0.31 | 0.32 | 0.26 | 0.08 | 0.25 | 0.02 | 0.27 | 0.18 | 0.15 | -0.01 |
Hadoop_2 | 0.42 | 0.43 | 0.33 | 0.24 | 0.29 | 0.25 | 0.09 | 0.18 | -0.05 | 0.26 | 0.2 | 0.21 | -0.03 |
Log4j_2 | 0.58 | 0.49 | 0.37 | 0.39 | 0.49 | -0.13 | 0.21 | 0.27 | 0.01 | 0.18 | 0.23 | 0.18 | 0.03 |
Lucene_1 | 0.28 | 0.28 | 0.25 | 0.23 | 0.27 | 0.1 | 0.21 | 0.15 | 0.05 | 0.15 | 0.12 | 0.22 | 0.02 |
Lucene_2 | 0.18 | 0.18 | 0.19 | 0.14 | 0.18 | 0.05 | 0.12 | 0.17 | 0.05 | 0.16 | 0.14 | 0.09 | 0.06 |
Maven | 0.35 | 0.31 | 0.24 | 0.28 | 0.31 | 0.18 | 0.18 | 0.24 | 0.05 | 0.14 | 0.15 | 0.12 | 0.06 |
Struts | 0.49 | 0.34 | 0.43 | 0.27 | 0.38 | 0.32 | 0.12 | -0.07 | 0.01 | 0.16 | 0.09 | 0.11 | 0.02 |
Zookeeper | 0.4 | 0.33 | 0.31 | 0.3 | 0.32 | 0.14 | 0.19 | 0.23 | 0.03 | 0.22 | 0.23 | 0.18 | 0.03 |
Derby | 0.37 | 0.29 | 0.32 | 0.22 | 0.26 | -0.05 | 0.07 | 0.16 | 0 | 0.13 | 0.08 | 0.25 | 0.01 |
Avg. | 0.36 | 0.31 | 0.28 | 0.25 | 0.29 | 0.08 | 0.13 | 0.16 | 0.01 | 0.15 | 0.13 | 0.18 | 0.01 |
Table: Spearman correlations of structural properties with number of times a method was updated in all programs.
For completeness, the Pearson results are shown below for comparison.
Program | S. | D. f. | C. c. | T. d. | I. s. | A. p. c. | A. | C. d. | Impd. S. | C. | M. | D. | D. o. |
Junit | 0.23 | 0.18 | 0.18 | 0.06 | 0.08 | -0.09 | 0.03 | 0.07 | -0.03 | 0.03 | -0.01 | 0.11 | 0.06 |
ActiveMQ | 0.35 | 0.32 | 0.33 | 0.02 | 0.21 | 0.3 | 0.05 | 0.19 | -0.04 | 0.04 | 0.07 | 0.12 | 0.03 |
Camel_1 | 0.2 | 0.24 | 0.16 | 0.12 | 0.17 | 0.08 | 0.08 | 0.06 | -0.02 | 0.09 | 0.03 | 0.03 | -0.05 |
Camel_2 | 0.33 | 0.37 | 0.29 | 0.16 | 0.27 | 0.17 | 0.07 | 0.07 | -0.03 | 0.15 | 0.01 | 0.09 | 0 |
FitNesse_2 | 0.26 | 0.32 | 0.12 | 0.05 | 0.2 | -0.14 | 0.03 | 0.06 | 0.01 | 0.11 | 0.1 | 0.15 | 0 |
Hadoop_1 | 0.38 | 0.38 | 0.28 | 0.13 | 0.28 | 0.27 | 0.04 | 0.1 | 0.03 | 0.29 | 0.15 | 0.21 | 0.02 |
Hadoop_2 | 0.38 | 0.45 | 0.27 | 0.01 | 0.27 | 0.22 | -0.02 | -0 | -0.06 | 0.23 | 0.18 | 0.08 | -0.05 |
Log4j_2 | 0.43 | 0.43 | 0.32 | 0.2 | 0.31 | -0.02 | 0.13 | 0.12 | 0.03 | 0.12 | 0.14 | 0.09 | -0.01 |
Lucene_1 | 0.07 | 0.29 | 0.19 | 0.19 | 0.26 | 0.16 | 0.12 | 0.1 | 0.02 | 0.19 | 0.12 | 0.06 | -0.01 |
Lucene_2 | 0.16 | 0.23 | 0.16 | 0.1 | 0.15 | 0.07 | 0.08 | 0.01 | -0.01 | 0.24 | 0.13 | -0.02 | -0.02 |
Maven | 0.41 | 0.31 | 0.24 | 0.29 | 0.33 | 0.01 | 0.12 | 0.14 | -0.01 | 0.16 | 0.14 | 0.14 | -0 |
Struts | 0.45 | 0.23 | 0.36 | -0.03 | 0.12 | -0.19 | -0.01 | -0.05 | -0.05 | 0.04 | 0.03 | 0.05 | -0.05 |
Zookeeper | 0.47 | 0.45 | 0.42 | 0.32 | 0.39 | 0.08 | 0.13 | 0.14 | 0.01 | 0.23 | 0.18 | -0 | 0.01 |
Derby | 0.22 | 0.35 | 0.27 | 0.04 | 0.25 | 0.1 | 0 | 0.04 | -0.02 | 0.13 | 0.05 | 0.09 | -0.01 |
Avg. | 0.32 | 0.32 | 0.25 | 0.12 | 0.23 | 0.07 | 0.06 | 0.08 | -0.01 | 0.15 | 0.1 | 0.09 | -0.01 |
Table: Pearson correlations of structural properties with number of times a method was updated in all programs.
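For readers who want to reproduce coefficients of the kind tabulated above, here is a small pure-Python sketch of both measures. It is not necessarily how the figures in this post were computed; the function names and sample numbers are illustrative. Spearman is taken as Pearson applied to ranks, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Illustrative data: average sizes with one outlier, and update counts.
avg_sizes = [10, 20, 30, 40, 1000]
updates = [1, 2, 3, 4, 5]
rho = spearman(avg_sizes, updates)
r = pearson(avg_sizes, updates)
print(round(rho, 2), round(r, 2))
```

Because Spearman depends only on rank order, a single very large method moves it far less than it moves Pearson, which is one reason the two tables can differ for the same program.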
The table below shows how the various properties correlated with method size in the latest version of all programs analyzed.
Size (bytecode) | S. | D. f. | C. c. | T. d. | I. s. | A. p. c. | A. | C. d. | Impd. S. | C. | M. | D. | D. o. |
Junit | 1 | 0.51 | 0.63 | 0.23 | 0.36 | 0.15 | 0.3 | 0.22 | 0.06 | 0.22 | 0.34 | 0.09 | -0.01 |
ActiveMQ | 1 | 0.65 | 0.68 | 0.3 | 0.63 | 0.02 | 0.31 | 0.15 | -0.15 | 0.16 | 0.29 | 0.21 | -0.19 |
Camel_1 | 1 | 0.57 | 0.64 | 0.33 | 0.54 | 0.18 | 0.23 | 0.16 | -0.01 | 0.08 | 0.3 | 0.15 | -0.02 |
Camel_2 | 1 | 0.64 | 0.66 | 0.37 | 0.6 | 0.26 | 0.28 | 0.21 | -0.08 | 0.1 | 0.31 | 0.29 | -0.11 |
FitNesse_1 | 1 | 0.56 | 0.61 | 0.37 | 0.5 | 0.02 | 0.28 | 0.3 | 0.09 | 0.26 | 0.39 | 0.32 | 0.05 |
FitNesse_2 | 1 | 0.55 | 0.63 | 0.37 | 0.51 | 0.12 | 0.29 | 0.26 | 0.11 | 0.2 | 0.37 | 0.29 | 0.04 |
Hadoop_1 | 1 | 0.62 | 0.73 | 0.39 | 0.57 | 0.05 | 0.3 | 0.34 | -0 | 0.26 | 0.36 | 0.19 | -0.07 |
Hadoop_2 | 1 | 0.64 | 0.74 | 0.39 | 0.57 | 0.04 | 0.33 | 0.32 | 0.01 | 0.25 | 0.38 | 0.24 | -0.05 |
Log4j_2 | 1 | 0.58 | 0.68 | 0.35 | 0.51 | 0.1 | 0.34 | 0.21 | 0.13 | 0.22 | 0.35 | 0.19 | 0.1 |
Lucene_1 | 1 | 0.54 | 0.79 | 0.25 | 0.48 | 0.06 | 0.33 | 0.16 | 0.01 | 0.17 | 0.3 | 0.27 | -0.03 |
Lucene_2 | 1 | 0.55 | 0.79 | 0.24 | 0.5 | 0.01 | 0.33 | 0.17 | -0.02 | 0.18 | 0.31 | 0.3 | -0.06 |
Maven | 1 | 0.52 | 0.7 | 0.24 | 0.51 | 0.27 | 0.3 | 0.19 | -0.03 | 0.19 | 0.33 | 0.16 | -0.12 |
Struts | 1 | 0.45 | 0.65 | 0.36 | 0.45 | -0.06 | 0.2 | 0.14 | 0.15 | 0.1 | 0.24 | 0.14 | 0.14 |
Zookeeper | 1 | 0.59 | 0.69 | 0.25 | 0.56 | 0.03 | 0.32 | 0.17 | -0.12 | 0.21 | 0.36 | 0.33 | -0.13 |
Derby | 1 | 0.69 | 0.74 | 0.36 | 0.63 | 0.3 | 0.33 | 0.19 | -0.1 | 0.22 | 0.36 | 0.41 | -0.15 |
Avg. | 1 | 0.57 | 0.69 | 0.32 | 0.52 | 0.09 | 0.29 | 0.21 | 0.01 | 0.18 | 0.33 | 0.23 | -0.03 |
Table: How method size correlates with all other structural properties.
The two tables below show how Dependencies from and Impact set correlate with all other significant properties. Note the strong correlation of 0.93 between the two, with the lower correlation of Impact set with Size deciding the winner.
Impact set | S. | D. f. | C. c. | T. d. | I. s. | A. p. c. | A. | C. d. | Impd. S. | C. | M. | D. | D. o. |
Junit | 0.36 | 0.87 | 0.15 | 0.47 | 1 | 0.13 | 0.33 | 0.5 | -0.26 | 0.19 | 0.37 | 0.08 | -0.24 |
ActiveMQ | 0.63 | 0.96 | 0.47 | 0.45 | 1 | 0.08 | 0.34 | 0.16 | -0.23 | 0.19 | 0.31 | 0.23 | -0.23 |
Camel_1 | 0.54 | 0.93 | 0.28 | 0.62 | 1 | 0.15 | 0.28 | 0.22 | -0.14 | 0.16 | 0.33 | 0.15 | -0.12 |
Camel_2 | 0.6 | 0.93 | 0.36 | 0.6 | 1 | 0.28 | 0.31 | 0.34 | -0.18 | 0.17 | 0.32 | 0.27 | -0.18 |
FitNesse_1 | 0.5 | 0.92 | 0.21 | 0.59 | 1 | 0.03 | 0.35 | 0.51 | -0.14 | 0.29 | 0.43 | 0.32 | -0.12 |
FitNesse_2 | 0.51 | 0.92 | 0.25 | 0.56 | 1 | 0.13 | 0.36 | 0.44 | -0.13 | 0.23 | 0.41 | 0.3 | -0.13 |
Hadoop_1 | 0.57 | 0.93 | 0.4 | 0.62 | 1 | 0.11 | 0.34 | 0.65 | -0.15 | 0.3 | 0.4 | 0.18 | -0.18 |
Hadoop_2 | 0.57 | 0.91 | 0.36 | 0.63 | 1 | 0.09 | 0.35 | 0.66 | -0.14 | 0.29 | 0.37 | 0.22 | -0.16 |
Log4j_2 | 0.51 | 0.94 | 0.34 | 0.53 | 1 | 0.06 | 0.37 | 0.32 | -0.13 | 0.23 | 0.39 | 0.24 | -0.11 |
Lucene_1 | 0.48 | 0.92 | 0.31 | 0.44 | 1 | 0.09 | 0.39 | 0.31 | -0.14 | 0.22 | 0.34 | 0.25 | -0.17 |
Lucene_2 | 0.5 | 0.92 | 0.35 | 0.44 | 1 | 0 | 0.4 | 0.34 | -0.16 | 0.24 | 0.36 | 0.3 | -0.19 |
Maven | 0.51 | 0.98 | 0.35 | 0.37 | 1 | 0.18 | 0.36 | 0.28 | -0.13 | 0.22 | 0.38 | 0.19 | -0.17 |
Struts | 0.45 | 0.98 | 0.3 | 0.57 | 1 | 0.1 | 0.28 | 0.16 | -0.03 | 0.15 | 0.25 | 0.16 | -0.02 |
Zookeeper | 0.56 | 0.96 | 0.32 | 0.48 | 1 | 0.13 | 0.37 | 0.27 | -0.19 | 0.26 | 0.42 | 0.37 | -0.19 |
Derby | 0.63 | 0.9 | 0.44 | 0.57 | 1 | 0.32 | 0.33 | 0.36 | -0.26 | 0.25 | 0.34 | 0.37 | -0.27 |
Avg. | 0.52 | 0.93 | 0.32 | 0.53 | 1 | 0.11 | 0.34 | 0.37 | -0.15 | 0.22 | 0.36 | 0.23 | -0.16 |
Table: How impact set correlates with all other structural properties.
Dependencies from | S. | D. f. | C. c. | T. d. | I. s. | A. p. c. | A. | C. d. | Impd. S. | C. | M. | D. | D. o. |
Junit | 0.51 | 1 | 0.27 | 0.4 | 0.87 | 0.14 | 0.43 | 0.38 | -0.23 | 0.26 | 0.45 | 0.11 | -0.22 |
ActiveMQ | 0.65 | 1 | 0.5 | 0.41 | 0.96 | 0.07 | 0.36 | 0.15 | -0.24 | 0.2 | 0.33 | 0.25 | -0.24 |
Camel_1 | 0.57 | 1 | 0.33 | 0.57 | 0.93 | 0.15 | 0.33 | 0.18 | -0.14 | 0.19 | 0.37 | 0.19 | -0.11 |
Camel_2 | 0.64 | 1 | 0.43 | 0.54 | 0.93 | 0.25 | 0.36 | 0.28 | -0.18 | 0.2 | 0.36 | 0.33 | -0.19 |
FitNesse_1 | 0.56 | 1 | 0.27 | 0.52 | 0.92 | 0.01 | 0.42 | 0.44 | -0.14 | 0.35 | 0.49 | 0.38 | -0.12 |
FitNesse_2 | 0.55 | 1 | 0.29 | 0.48 | 0.92 | 0.12 | 0.41 | 0.36 | -0.14 | 0.27 | 0.45 | 0.34 | -0.14 |
Hadoop_1 | 0.62 | 1 | 0.45 | 0.54 | 0.93 | 0.1 | 0.39 | 0.47 | -0.19 | 0.31 | 0.43 | 0.23 | -0.22 |
Hadoop_2 | 0.64 | 1 | 0.43 | 0.52 | 0.91 | 0.07 | 0.42 | 0.45 | -0.17 | 0.31 | 0.44 | 0.28 | -0.19 |
Log4j_2 | 0.58 | 1 | 0.43 | 0.48 | 0.94 | 0.06 | 0.43 | 0.3 | -0.09 | 0.26 | 0.45 | 0.27 | -0.07 |
Lucene_1 | 0.54 | 1 | 0.39 | 0.37 | 0.92 | 0.07 | 0.44 | 0.24 | -0.17 | 0.25 | 0.38 | 0.29 | -0.2 |
Lucene_2 | 0.55 | 1 | 0.42 | 0.37 | 0.92 | -0.01 | 0.44 | 0.25 | -0.2 | 0.26 | 0.39 | 0.35 | -0.22 |
Maven | 0.52 | 1 | 0.37 | 0.35 | 0.98 | 0.15 | 0.37 | 0.27 | -0.13 | 0.22 | 0.4 | 0.2 | -0.16 |
Struts | 0.45 | 1 | 0.32 | 0.55 | 0.98 | 0.08 | 0.3 | 0.16 | -0.04 | 0.16 | 0.28 | 0.17 | -0.02 |
Zookeeper | 0.59 | 1 | 0.33 | 0.44 | 0.96 | 0.1 | 0.4 | 0.22 | -0.19 | 0.24 | 0.47 | 0.44 | -0.19 |
Derby | 0.69 | 1 | 0.51 | 0.46 | 0.9 | 0.35 | 0.4 | 0.25 | -0.25 | 0.28 | 0.39 | 0.45 | -0.26 |
Avg. | 0.57 | 1 | 0.37 | 0.47 | 0.93 | 0.1 | 0.39 | 0.3 | -0.16 | 0.25 | 0.41 | 0.27 | -0.16 |
Table: How dependencies from correlates with all other structural properties.