As programmers, most of us keep our methods small.
Indeed, Robert C. Martin's famed "Clean Code" tells us, "The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that." (Just apply sed s/function/method/g.)
But is this actually true? And how small is small?
Let's side-step trying to identify an absolute number of lines of code or size of bytecode. Instead let's try a relative size.
Take all the methods in a code-base, and order those methods in terms of size (how much bytecode is in each compiled method). Then take the top, say, 20% of those methods. Let's call these the biggest 20%.
How much code is in these methods? More specifically: what percentage of the entire code-base do these biggest 20% contain?
If all methods were of equal size, then these biggest 20% would contain 20% of the entire code-base.
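The measurement described above can be sketched in a few lines of Java. This is only an illustration, not the tooling behind the figures in this post: the class name `BiggestMethods`, the method `shareInBiggest` and the sample size arrays are all made up for the example, and it assumes you already have one size (say, bytes of bytecode) per method. Rounding the number of "biggest" methods up is also an assumption.

```java
import java.util.Arrays;

class BiggestMethods {

    /**
     * Returns the percentage of all code held by the biggest `percent`
     * of methods. `sizes` holds one (hypothetical) size per method,
     * for example the number of bytecode bytes in each compiled method.
     */
    static double shareInBiggest(int[] sizes, int percent) {
        if (sizes.length == 0) {
            return 0.0;
        }
        int[] sorted = sizes.clone();
        Arrays.sort(sorted); // ascending, so the biggest methods sit at the end

        // Number of methods in the "biggest" group, rounded up (an assumption).
        int count = (int) (((long) sorted.length * percent + 99) / 100);

        long total = 0;
        for (int size : sorted) {
            total += size;
        }
        if (total == 0) {
            return 0.0;
        }

        long top = 0;
        for (int i = sorted.length - count; i < sorted.length; i++) {
            top += sorted[i];
        }
        return 100.0 * top / total;
    }

    public static void main(String[] args) {
        int[] equal = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10};
        int[] skewed = {5, 5, 5, 5, 5, 5, 5, 5, 80, 80};
        System.out.println(shareInBiggest(equal, 20));  // prints 20.0
        System.out.println(shareInBiggest(skewed, 20)); // prints 80.0
    }
}
```

With ten equally sized methods the biggest 20% hold exactly 20% of the code, as expected. In the second made-up array, two large methods hold 160 of 200 units, so the biggest 20% hold 80%.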
But of course code is not evenly distributed over all methods. So what percentage of code should these biggest 20% contain? Think about this before you look at the table below.
If the biggest 20% contain 30% of all code, you might think that's good: the system isn't too skewed and there aren't many monster methods lurking out there. (Or perhaps all methods are monsters, but thankfully this is seldom encountered.)
If the biggest 20% contain 90% of the code, however, you might consider this a highly skewed system, with many monster methods which are sure to attract most of the code-changes and hence be the most expensive methods in the code-base. And managers really don't like expensive methods.
Perhaps 50% of the code residing in the biggest 20% would seem like a fair threshold, above which you might think a refactoring is due.
Let's examine some random, open-source Java programs on GitHub and see what percentages they reveal.
Program | # methods | % code in biggest 20% |
swagger-core | 2307 | 78 |
checkstyle | 7561 | 78 |
spark (core) | 42743 | 77 |
santuario-java | 4679 | 77 |
tomcat | 26840 | 77 |
ant | 11173 | 74 |
atmosphere | 4164 | 73 |
zxing | 1979 | 73 |
jackson | 6732 | 71 |
mybatis-3 | 3113 | 71 |
dubbo | 12093 | 71 |
dropwizard | 3102 | 70 |
logstash (core) | 2519 | 67 |
redisson | 13475 | 66 |
junit4 | 1836 | 61 |
RxJava | 10159 | 57 |
Table 1: How much code resides in the biggest 20% of methods
Of course, this isn't a rigorous, statistical sample of the entire Java code-sphere, but it's hardly insignificant at 154,475 methods. And it looks like we programmers don't write small methods after all, but stuff more than half the code into just one-fifth of the methods.
Does anyone know why?