The four best refactorings.


Pinch-of-salt time again.

What are the best refactorings?

Well, programmers use refactorings as tools and the best tools are those that best suit the purpose for which they are used. So, what is the, "Best," purpose for a refactoring? Given that refactorings change program structure without altering behaviour, the purpose of any refactoring must be the improvement of program structure. So how do programmers, "Best," improve structure?

Here, at last, accounting departments the world over reach a question that can be answered in glorious deafening unison: make it cheaper! Cheaper to understand and cheaper to modify (whether for correction or feature addition).

The two pincers of this, "Cheaper," movement fall loosely into the two modes in which any structure can be said to exist: syntactic and semantic. The syntactic aspect of a structure brutally states its elements and their inter-relationships; the semantic aspect interprets the meanings of these elements and relationships.

Programmers improve structure semantically by clarifying element and relationship meaning so that development teams spend less time figuring out what on earth is going on and more time adding revenue-generating features.

Programmers improve structure syntactically by reducing the degree to which ripple effects propagate, that is, by minimizing the likelihood that a change to one part of the software will cause changes in another, which in turn will cause more changes, etc. The probability of such ripple effects rests entirely on the transitive dependencies between code elements: the more transitive dependencies that buttress a system, the more likely it is that a change to one element will spark changes elsewhere.
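
To make the reach of a ripple concrete, here is a minimal sketch in Java. It rests on assumptions not made in the text above: elements are represented as plain names, the structure as a map from each element to the elements on which it directly depends, and RippleReach and impactedBy are hypothetical names chosen for illustration. It gathers every element that depends, directly or transitively, on a changed element, which is to say every element to which a change might ripple.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not from the article: a structure is modelled as a map
// from each element's name to the names of the elements it depends on directly.
final class RippleReach {

    // Returns every element that depends on `changed`, directly or
    // transitively, and so might have to change when `changed` does.
    static Set<String> impactedBy(String changed, Map<String, List<String>> dependsOn) {
        Set<String> impacted = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(changed);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                // entry.getKey() depends on `current`, so the change may reach it.
                if (entry.getValue().contains(current) && impacted.add(entry.getKey())) {
                    pending.push(entry.getKey());
                }
            }
        }
        // In a dependency loop the changed element can reach itself; exclude it.
        impacted.remove(changed);
        return impacted;
    }
}
```

With A depending on B and B on C, a change to C may reach both B and A, whereas a change to A reaches nothing: the fewer and shorter the transitive dependencies, the smaller this set tends to be.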

Of course, the two modes overlap somewhat: smashing corroded inter-dependencies between packages, for example, usually renders a system more comprehensible, but the modes differ sufficiently to warrant a splitting in two of the original question so that we can instead ask: which refactorings best clarify element and relationship meaning, and which refactorings best reduce transitive dependency (either in number or in length)? Let us take one question at a time, beginning with the latter.

Which refactorings best reduce transitive dependency?

Fowler classifies his refactorings under six pragmatic headings, but a timid step towards finding the, "Best," refactorings might be to re-classify Fowler's refactorings with respect to structure. This will identify the structural element to which each refactoring primarily applies - these elements, in Java, being the method, class and package - and will show whether the refactoring creates (C) or destroys (D) elements, or merely transforms (T) element relationships without creating or destroying elements. This page presents the full list but a sample suffices here; table 1 shows the refactorings that Fowler categorizes as the, "Composing methods," refactorings.

Refactoring Method Class Package
Extract Method C
Inline Method D
Inline Temp
Replace Temp with Query C
Introduce Explaining Variable
Split Temporary Variable
Remove Assignments to Parameters
Replace Method with Method Object C
Substitute Algorithm

Table 1: Composing methods, viewed as Creating, Deleting or Transforming.

Two points stand out.

Firstly, this new classification concerns the, "Primary," effect of a refactoring. "Replace Method with Method Object," for instance, has at least two effects: the deletion (or perhaps alteration) of the relevant method and the creation of the new class to harbor the itinerant behaviour. Most programmers, perhaps, would see the primary effect as the creating of the new class, hence the classification; all models discard some information and here method-deletion, judged of lesser significance, vanishes from view. Granted, some subjectivity seeps into the analysis here, but hopefully not so much as to invalidate the exercise.

Secondly, the majority of the refactorings in the table appear to have no structural impact: they have no C, D or T under any heading. This, of course, sounds ludicrous. The cause of the confusion lies in the half-truth told above, namely that Java has only three structural elements: method, class and package. A fourth structural element of course exists: the lines of source code themselves, their variables, for-loops, operators, etc. Crucial though these are, such micro-structures find themselves excluded from analysis because they fail to contribute powerfully to ripple effect. Ripple effect necessarily involves non-local change and Java jails source code within methods (and initialization blocks): if a method invokes no others then it cannot be affected by changes in any others.

True, a change to a line of code in a method usually does cause changes to other lines of code within that method, but given that most programmers keep method length within reasonable limits such changes pale compared to the monstrous trans-package code-quakes that lay waste to higher structural levels. True, too, that methods can read fields directly from other classes, by-passing method-invocation, but again few programmers indulge in such lawlessness. So though useful, all such source-code-only refactorings cannot be considered to greatly impact the probability of ripple effect and so cannot be the, "Best," refactorings.

Excluding refactorings that impact source-code only, then, takes us a step along our path, the reduced set of table 1 being shown in table 2.

Refactoring Method Class Package
Extract Method C
Inline Method D
Replace Temp with Query C
Replace Method with Method Object C

Table 2: Composing methods: method-, class- and package-impacting only.

Examining the four remaining refactorings in Table 2 raises yet another issue. A structure's resilience to ripple effect depends not so much on its elements as on its relationships. Noting that a program contains 100 classes says little about whether it will cost more to update than a program of 200 classes, all else being equal. But given two programs of 100 classes each, the first having 10 transitive dependencies per class and the second having 20, the first program will probably be the cheaper to update.

Thus refactorings that create or destroy elements do not contribute to ripple effect to the same extent as those that target dependencies alone. Yes, the creation or destruction of elements also affects dependencies, but these refactorings primarily concern the creating and destroying of elements, with entailing relationship alteration being somewhat incidental. So a search for the, "Best," refactorings means the gathering of transformational refactorings alone, of which Table 2 holds, alas, none.

As it happens, Fowler has kept the total number of purely transformational refactorings quite small, table 3 listing them all.

Refactoring Method Class Package
Move Method T
Hide Delegate T
Remove Middle Man T
Change Unidirectional Association to Bidirectional T
Change Bidirectional Association to Unidirectional T
Pull Up Method T
Pull Up Constructor T
Push Down Method T
Extract Interface T
Form Template Method T
Replace Inheritance with Delegation T
Replace Delegation with Inheritance T

Table 3: The transformational refactorings.

Peering closer still, programmers can readily flick some refactorings from this shortlist.

Both, "Replace Inheritance with Delegation," and, "Replace Delegation with Inheritance," clearly transform relationships but only in their type, not in number of elements involved, and hence they do not alter transitive dependencies (which do not depend on the dependency type). These both fall off the shortlist.

The four, "Pull Up Method," "Pull Up Constructor," "Push Down Method," and, "Form Template Method," all concern implementation inheritance, a language mechanism which hardly dominates normal cases of dependency: most classes do not stand in an inheritance relationship with the classes with which they interact, and so most ripple effects do not travel predominantly over inheritance relationships. These four refactorings can therefore be discarded. Again, this does not imply their uselessness, only their unsuitability as candidates for the, "Best," refactorings from a cost perspective.

"Move Method," can also be discarded because herding methods from one class to another does not alter then number of dependencies on or from those methods: such a translations rarely contribute to a reduction in transitive dependencies.

This leaves only the refactorings listed in Table 4.

Refactoring Method Class Package
Hide Delegate T
Remove Middle Man T
Change Unidirectional Association to Bidirectional T
Change Bidirectional Association to Unidirectional T
Extract Interface T

Table 4: A further shortlist reduction.

Crunch time.

Fowler himself masterfully explains the need for the, "Change Bidirectional Association to Unidirectional," refactoring: "Bidirectional associations force an interdependency between the two classes. Any change to one class may cause a change to another. If the classes are in separate packages, you get an interdependency between the packages. Many interdependencies lead to a tightly coupled system, in which any little change leads to lots of unpredictable ramifications."

Forged precisely and singularly to fight ripple effect, this is the first of the four best refactorings. Unfortunately, Fowler artificially restricts this refactoring to class-level when it effortlessly extends to all. Fowler further constrains the refactoring by suggesting its application only between two mutually dependent elements when this offers just a specific case of the more general circular dependency, in which any number of elements ultimately form a dependency loop, as in: A → B → C → D → A. Eliminating bidirectional and circular dependencies between methods, classes and packages improves program structure because it usually reduces the number and length of transitive dependencies clawing at a program. This is a first-rate refactoring.
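
Before such loops can be eliminated they must be found. What follows is a minimal sketch of one way to look for them, a depth-first search, and not a method the refactoring itself prescribes. It assumes elements are plain names and the structure is a map from each element to the elements it directly depends on; CycleFinder and hasCycle are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the article: reports whether any chain of
// dependencies leads from an element back to itself (A -> B -> ... -> A).
final class CycleFinder {

    private enum State { VISITING, DONE }

    static boolean hasCycle(Map<String, List<String>> dependsOn) {
        Map<String, State> states = new HashMap<>();
        for (String element : dependsOn.keySet()) {
            if (visit(element, dependsOn, states)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String element,
                                 Map<String, List<String>> dependsOn,
                                 Map<String, State> states) {
        State state = states.get(element);
        if (state == State.VISITING) {
            return true;   // reached an element still on the current path: a loop
        }
        if (state == State.DONE) {
            return false;  // already explored in full, no loop through here
        }
        states.put(element, State.VISITING);
        for (String target : dependsOn.getOrDefault(element, List.of())) {
            if (visit(target, dependsOn, states)) {
                return true;
            }
        }
        states.put(element, State.DONE);
        return false;
    }
}
```

Fed the loop A → B → C → D → A, the sketch reports a cycle; drop the dependency from D back to A and it reports none. The same check serves a simple bidirectional association, which is merely a loop of length two.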

The curiously un-sung, "Remove Middle Man," holds that if class A depends on B and B depends on C then refactor so that A depends on both B and C. Syntactically, at least, this is the very definition of reducing transitive dependency length and so wins this refactoring a place in the prizes. Again, Fowler constrains the scope of this refactoring somewhat unnecessarily. Firstly, he discusses only delegates - a role unimportant to syntactic analysis - when it potentially applies to all transitive dependencies regardless of class role. Secondly, he limits the scope to classes when it applies to all structural elements: method, class and package. Thirdly, he offers the example of only three elements where this shy refactoring reveals its benefit least. Reducing the single transitive dependency A → B → C → D → E → F to the three transitive dependencies A → B, A → C → D and A → E → F showcases far greater benefit than the example cited (and two steps closer to sunburst) with little loss of generality. Despite these caveats, however, the refactoring captures a truly essential idea.
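
The arithmetic of that six-element example can be checked with a small sketch. It assumes an acyclic structure held as a map from each element to the elements it directly depends on, and it takes the length of a transitive dependency to mean the number of dependencies on the longest chain leaving an element, which is one plausible reading rather than a definition given here. ChainDepth and depth are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not from the article: measures how far the longest
// chain of dependencies stretches from a given element.
final class ChainDepth {

    // Counts the dependencies on the longest chain starting at `element`.
    // Assumes the structure holds no dependency loops; a loop would recurse forever.
    static int depth(String element, Map<String, List<String>> dependsOn) {
        int deepest = 0;
        for (String target : dependsOn.getOrDefault(element, List.of())) {
            deepest = Math.max(deepest, 1 + depth(target, dependsOn));
        }
        return deepest;
    }
}
```

Under these assumptions the chain A → B → C → D → E → F gives A a depth of five, while the refactored A → B, A → C → D and A → E → F gives A a depth of two, at the price of A now holding three direct dependencies instead of one.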

"Extract Interface," might, at first, seem an element-creation refactoring rather than a transformational refactoring, but again: what is the primary purpose of this refactoring? It does not simply create an interface which then affects little but the class from which it peels. Instead, it creates an interface to act as new target for all those other classes whose dependencies previously fell on some methods of the original class: its primary purpose is to deflect dependencies. Being thus dependency-oriented, the refactoring comfortably sits in the transformational pigeonhole. And what a transformation! The reasons are subtle, but because interfaces lack implementation and hence lack implementation dependencies, the, "Extract interface," refactoring has more power to reduce program depth than any other. It, too, rightly claims to be one of the, "Best," refactorings.

Table 5, then, presents these three best ripple-effect refactorings: each reduces program depth, and with it the cost of patching and upgrading a program, more effectively than other refactorings. Of course, this does not imply the encouragement of blind application. Superfluous interface and artificially snipped dependency can create costly chaos if pushed too far; the most well-worn tool in the experienced programmer's toolbox remains prudence. Nevertheless, programs under-interfaced and over-dependent proliferate far more widely today than those at the other end of the spectrum, whatever this says about modern software development.

Refactoring Method Class Package
Remove Middle Man T
Change Bidirectional Association to Unidirectional T
Extract Interface T

Table 5: The three best ripple-effect refactorings.

As a bonus, the difference between tables 4 and 5 also presents the very worst syntactical refactorings: the almost inexcusable, "Hide Delegate," and the CV-freshening, "Change Unidirectional Association to Bidirectional."

Which refactorings best clarify element and relationship meaning?

This, however, compiles only three of the four best refactorings: the three best syntactic refactorings. What is the fourth and best semantic refactoring?

Fowler makes this choice surprisingly easy. For of his sixty-eight refactorings, sixty-seven are syntactic: they change either the number of elements or the syntactic relationships between elements. Only one is semantic: it leaves the number of elements and all relationships precisely as they were and thus has no effect on ripple effect. Instead, it changes the meaning of an element in an obvious way: it changes the element's name. Perhaps the most widely used refactoring, "Rename Method," presents an invaluable refactoring, well-deserving its place on the podium though here, yet again, Fowler limits its scope of application too rigidly, applying it to methods only when it applies equally to classes and packages, too.

And the winners are ...

Table 6, finally, presents the four best of Fowler's refactorings.

Refactoring Method Class Package
Remove Middle Man T
Change Bidirectional Association to Unidirectional T
Extract Interface T
Rename Method

Table 6: The four best refactorings.

Even in this short final table, something striking surfaces. Consider the difficulty of the full list of refactorings: most seem easy. They can be implemented in minutes. Split Temporary Variable, Move Method, Encapsulate Field, Decompose Conditional, etc. "Rename Method," the semantic refactoring of Table 6, also falls under this heading (especially given the power of modern IDEs).

"Change Bidirectional Association to Unidirectional," and especially "Remove Middle Man," can, on the other hand, be difficult refactorings in poorly designed systems. They require precisely the, "Big picture," view, the non-local view of code that characterizes the costliness of transitive dependencies themselves. "Extract Interface," also taxes the programmer mind, for though any IDE can strip an interface from a class, only careful thought ensures a minimum of exposed behaviour and that it becomes exposed only to interested clients.

Can it be a coincidence that many Java programs suffer from such poor macro-structure while the refactorings that cry out for application are the very ones that would take most effort? Might programmers take unmerited pride in reporting to their boss, "I refactor as I go, minute-by-minute," while in reality they spend their time effecting relatively trivial refactorings while the system as a whole collapses beneath their fingertips? A patient gasping for a triple heart bypass will thank no one for a cosmetic nip-tuck. The morgue of software development needs fewer handsome corpses.

This bottom-heaviness, this over-emphasis on micro-structure to the exclusion of systemic health, finds disturbing reflection in Fowler's refactorings, all of which distribute quite evenly over the source-code, method and class elements, none of which targets packages explicitly. This is a shame. Ripple effects are beasts of range and connectivity. A system with an excellent macro-structure yet poor individual methods may still defend itself from wildly cascading impacts; a system with a poor package structure has no such immunity regardless of method perfection.

Summary.

The unit of refactoring is the Fowler (F).

Guilt-laden managers have been known to occasionally pull shell-shocked programmers from the front-lines for some R & R, often leading with the words, "Look, take a break. Have a 100 Fowler refactoring on me. Any parts of the code you like. You've got a week."

When such lightning strikes, the programmer does well to remember that the universe did not create all refactorings equal. When the code-base stands firm and proud and structurally sound, by all means dump the entire 100F into breathless beautification. When the code-base creaks and totters, however, perhaps those Fowlers would better be spent in the sweaty and rather thankless drudgery of cutting, bolting and welding at the very core of the girder-work.

Great programming dirties the face.