Thank you, Mr Pearson: notes.
1. The eight programs analysed were:
- P1 - Apache Hadoop, 8 revisions: 0.1.0 - 0.5.0.
- P2 - Apache Hadoop, 9 revisions: 0.6.0 - 0.9.2.
- P3 - Fitnesse, 15 revisions: 20121220 - 20151230.
- P4 - JUnit, 12 revisions: 4.0 - 4.92.
- P5 - Log4J, 16 revisions: 1.2.1 - 1.2.17.
- P6 - Apache Lucene, 15 revisions: 3.0.1 - 5.4.0.
- P7 - Maven, 9 revisions: 3.0.0 - 3.3.9.
- P8 - Struts, 12 revisions: 2.0.5 - 2.3.8.
Only core jar files were analysed where programs were very large.
The analysis requires that successive releases use the same Java
compiler version, because a method is identified as changed by
whether its bytecode changed. Hence the Hadoop releases are split in
two around a compiler update.
All data available on request.
2. As usual, the caveat must be made that the only
way to ensure that a method has been updated between project releases
is to manually check all methods, and this was NOT done in this
case.
Instead, the project releases were fed into a code analyser which
checked automatically for changes by comparing the before and after
bytecode of each method. This will not catch all method changes, and
could conceivably flag unchanged methods as changed. Hence the results
of this experiment are not definitive.
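The comparison described above can be sketched roughly as follows. This is only an illustration, not the analyser actually used: it assumes each release has already been reduced to a mapping from method signature to raw bytecode, and the names `changed_methods`, `release_a` and `release_b`, along with the byte strings, are invented for the example.

```python
import hashlib


def changed_methods(before, after):
    """Return methods present in both releases whose bytecode differs."""
    changed = set()
    for name, code in before.items():
        if name not in after:
            continue  # method removed; not counted as a change here
        if hashlib.sha256(code).digest() != hashlib.sha256(after[name]).digest():
            changed.add(name)
    return changed


# Hypothetical per-method bytecode for two successive releases.
release_a = {"Foo.run()V": b"\x2a\xb4\x00\x01\xb1", "Foo.stop()V": b"\xb1"}
release_b = {
    "Foo.run()V": b"\x2a\xb4\x00\x02\xb1",  # one operand differs
    "Foo.stop()V": b"\xb1",                 # identical
    "Foo.init()V": b"\xb1",                 # new method, ignored
}

print(sorted(changed_methods(release_a, release_b)))
```

A byte-for-byte comparison like this only sees what the compiler emitted, which is one reason the results depend on both releases being built with the same compiler version.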
3. The properties investigated were:
- Size - size (in bytecode) of the given method.
- Dependencies from - number of dependencies leading from a method.
- Conditional count - number of conditionals in the bytecode, which roughly corresponds to conditionals in the source code, for example if-statements and loop boundary checks.
- Impact set - size of the impact set of a given method, that is, the complete set of all other methods that the given method depends on, either directly or transitively.
- Middle-man - shows whether a method is a middle-man, that is, a method that could potentially be removed, with its parent instead doing all the work that it did.
- Complectation - number of complected methods of multiple transitive
dependencies. If a transitive dependency has multiple dependencies on
another transitive dependency then it may be possible to access one
method of the target transitive dependency through multiple
paths. This sometimes (but by no means always) suggests an unnecessary
duplication of method invocation, artificially raising the impact set
of the system and thus exposing the system to unnecessary potential
ripple effects.
- Transitive dependencies - number of transitive dependencies involving a method.
- Potential coupling - the absolute potential coupling of this method. For example, the absolute potential coupling of function A is the number of other functions that A could depend on, i.e., that are accessible from A.
- Amplification - the amplification generated by this method. This is essentially a measure of how the number of transitive dependencies in which this method is involved is increased as a combinatorial effect of dependencies on and from this method.
- Circular dependencies - number of circular dependencies between methods. For example, if function a() calls function b() and function b() calls function c() and function c() calls function a(), then a(), b() and c() form a circular dependency.
- Dependencies on - number of dependencies on a method.
- Impacted set - size of the impacted set of a given method, that is, the complete set of all other methods that depend on the given method, either directly or transitively.
- Duplication - number of common sequences of method invocations greater than a minimum value (currently 2). For example, if function a() calls functions x(), y() and z(), and function b() calls functions x(), y() and z(), then both a() and b() will be shown as sharing common calling sequences x(), y() and z().
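Several of the properties above are simple functions of a call graph. The sketch below is one possible way to compute the direct dependency counts and to find circular dependencies, using the a(), b(), c() example from the list. It is not the analyser's own code: the graph representation and the function names `reachable`, `circular_groups` and `dependencies_on` are assumptions made for illustration, as is the extra caller d().

```python
def reachable(graph, start):
    """Return every method reachable from start by following one or more calls."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def circular_groups(graph):
    """Return the groups of methods that can all reach one another."""
    reach = {m: reachable(graph, m) for m in graph}
    groups = set()
    for m in graph:
        if m in reach[m]:  # m lies on at least one cycle
            groups.add(frozenset(n for n in reach[m] if m in reach.get(n, ())))
    return groups


def dependencies_on(graph, method):
    """Count the methods that call the given method directly."""
    return sum(1 for callees in graph.values() if method in callees)


# Hypothetical call graph: caller -> list of callees.
calls = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}

print(len(calls["a"]))              # dependencies from a()
print(dependencies_on(calls, "a"))  # dependencies on a()
print(circular_groups(calls))       # a(), b() and c() form one group
```

Here d() calls into the cycle but is not part of it, so it does not appear in the circular group.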
4. We would like our properties to be independent of
one another, but many are likely to depend on method size:
larger methods are more likely to contain more dependencies,
more cyclomatic complexity, etc. This is possibly suggested by
the attempt, below, to find One Metric To Rule Them All by
combining existing ones, which produced results no better than
the uncombined versions. We will examine this in a later post.
| Property combination                                   | Change coefficient |
|--------------------------------------------------------|--------------------|
| Size                                                   | 0.43               |
| Dependencies from                                      | 0.45               |
| Condition count                                        | 0.35               |
| Impact set                                             | 0.35               |
| Size + Impact set                                      | 0.45               |
| Size + Dependencies from                               | 0.43               |
| Size + Dependencies from + Condition count + Impact set | 0.45              |
5. There are two separate metrics, impacted set and impact set; see figure 1 below.
Impacted set is the set of all methods that depend directly or indirectly on a method. Impact set is the set of all methods that a given method directly or indirectly depends on. The impact set seems to correlate with method updates whereas the impacted set does not. Who knew?
Figure 1: Left: all methods. Middle: Impacted set of a selected method. Right: Impact set of that same selected method.
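The distinction between the two sets can be illustrated with a small sketch: the impact set follows dependencies forwards from a method, while the impacted set follows them backwards. The call graph and the names `closure`, `invert`, `impact_set` and `impacted_set` are invented for this example and are not taken from the analyser.

```python
def closure(graph, start):
    """Return all other methods reachable from start, directly or transitively."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    seen.discard(start)  # "all other methods": exclude the method itself
    return seen


def invert(graph):
    """Reverse every edge, so callee -> set of callers."""
    inverted = {m: set() for m in graph}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, set()).add(caller)
    return inverted


def impact_set(graph, method):
    """Methods that the given method depends on."""
    return closure(graph, method)


def impacted_set(graph, method):
    """Methods that depend on the given method."""
    return closure(invert(graph), method)


# Hypothetical call graph: caller -> set of callees.
calls = {
    "main": {"parse", "run"},
    "parse": {"read"},
    "run": {"read", "log"},
    "read": set(),
    "log": set(),
}

print(sorted(impact_set(calls, "run")))     # what run() depends on
print(sorted(impacted_set(calls, "read")))  # what depends on read()
```

In this toy graph run() has an impact set of two methods, while read() has an empty impact set but an impacted set of three.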