Notes: How to avoid messy code?


Note 1.

The analysed systems were (presented here as their versioned jar files):

  1. torque-runtime-4.0.jar
  2. uima-cpe.jar
  3. uima-tools.jar
  4. xstream-1.4.4.jar
  5. zookeeper-3.5.2-alpha.jar
  6. abdera-1.1.3.jar
  7. ant-1.5.2.jar
  8. apache-cassandra-1.0.0-beta1.jar
  9. c3p0-0.9.1.1.jar
  10. camel-core-2.9.8.jar
  11. catalina-tribes.jar
  12. catalina.jar
  13. cxf-core-3.1.8.jar
  14. cxf-rt-frontend-jaxrs-3.1.8.jar
  15. derby-10.9b10.2.0.jar
  16. ecs-1.4.2.jar
  17. fitnesse-20151230.jar
  18. gremlin-core-3.2.3.jar
  19. gremlin-shaded-3.2.3.jar
  20. hadoop-0.9.2.jar
  21. hppc-0.7.1.jar
  22. ivy-2.3.0.jar
  23. javax.mail-1.5.2.jar
  24. jaxb-impl-2.2.1.1.jar
  25. jcs-1.3.jar
  26. jenkins-core-2.7.1.jar
  27. jgroups-3.6.8.Final.jar
  28. junit-4.92
  29. log4j-o-1.2.16.jar
  30. lucene-core-4.3.1.jar
  31. maven-core-3.3.9.jar
  32. netty-all-4.0.40.Final.jar
  33. q-activemq-broker-5.13.4.jar
  34. quartz-2.0.0.jar
  35. snakeyaml-1.15.jar
  36. spring-core-3.2.2.jar
  37. struts2-core-2.3.8.jar
  38. tomcat-coyote-9.0.0.mten.jar


Note 2.

Correlations were calculated not against a raw property value but against the average value of that property, that is, the property value divided by the number of methods, classes or packages, depending on the level.

This was done to avoid scaling effects. Most properties scale with size: the bigger the program, the more circular dependencies it has, the more public methods it has, the greater its depth, and so on. These are extensive properties. But structural disorder is an intensive property: its value is a percentage, never more than 100, and it does not double when the system doubles in size.

To ensure, therefore, that we are comparing like with like, we take the average of each property value to obtain an intensive property, and find the correlation of this average with the structural disorder.
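
As a minimal sketch of that averaging step (the function name and the figures are made up for illustration and are not taken from the analysed systems), the following shows a raw count doubling with system size while its per-method average stays put:

```python
def average_property(total, element_count):
    """Turn an extensive total into an intensive per-element average.

    element_count is the number of methods, classes or packages,
    depending on the level being analysed.
    """
    if element_count <= 0:
        raise ValueError("element_count must be positive")
    return total / element_count


# Hypothetical figures: a system, and the same system doubled in size.
small = {"public_methods": 120, "methods": 400}
doubled = {"public_methods": 240, "methods": 800}

small_avg = average_property(small["public_methods"], small["methods"])
doubled_avg = average_property(doubled["public_methods"], doubled["methods"])

# The raw count has doubled, but the two averages are equal.
print(small_avg, doubled_avg)
```

The average behaves like the disorder percentage: it describes the system's make-up rather than its size, so the two can be compared across programs of very different sizes.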

As mentioned, the post shows only the highest correlations, but you can see the full matrices of all properties correlated with one another (both averages and raw property values) on all levels: method, class and package.

The programs were analysed with Spoiklin Soice.

It is amazing to see how seemingly unrelated properties correlate with one another, and how uncorrelated others are that we might have suspected to be closely related. (And please excuse the "Average average" property: it is an artifact of generating the table.)


Note 3.

The correlation used was Spearman's rank correlation coefficient.
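
For readers unfamiliar with it, here is a rough sketch of how that coefficient can be computed: rank each series (tied values share the mean of their ranks), then take the Pearson correlation of the ranks. The function names and the sample figures are illustrative assumptions; this is not necessarily how the numbers in the post were produced.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: average depth and structural disorder (%) per system.
average_depth = [1.2, 2.5, 2.5, 4.0, 6.3]
disorder = [10.0, 22.0, 35.0, 30.0, 61.0]
print(spearman(average_depth, disorder))
```

Because only the ranks matter, the coefficient measures whether one property tends to rise as the other rises, without assuming that the relationship is a straight line.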

The usual caveat: correlation is not causation. The purpose of this analysis is to find objective correlations between structural disorder and various code properties, and it is true that such correlations do not necessarily mean that reducing those properties also reduces structural disorder. This blog has, however, argued for a long time that depth causes structural problems, and finding such a high correlation between depth and structural disorder does not weaken that view.