Who's afraid of the big bad class?

The heart of a black hole, by Phil Plait

Ferocious, guzzling black holes of the software universe, super-massive classes result from runaway positive feedback: the class grows so huge that programmers fear any attempt at refactoring and instead simply dump more functionality into it, thereby making it even more likely that the next programmer will also pale before the terrifying refactoring and jettison yet more functionality into the maw. All stare at the hypnotizing accretion disc knowing that this class must be destroyed, its functionality scattered to a host of smaller classes, but no one knows where to start. Which methods should be moved first? How should this choice be made? What criteria might aid the decision?

Even the revered admit vacillation. Fowler, in his fantastic Refactoring book, writes the puzzling, "If I am not sure whether to move a method, I go on to look at other methods. Moving other methods often makes the decision easier. Sometimes the decision is still hard to make. Actually it is then no big deal. If it is difficult to make the decision, it probably does not matter that much. Then I choose according to instinct; after all, I can always change it again later."

This rather rapid resorting to instinct seems regrettable when several objective engineering criteria stand at hand eager to assist.

Take figure 1, for example, showing a spoiklin diagram - in which each circle represents a method, each line a dependency - of the SequenceDiagram class of (alas) Spoiklin Soice itself.

Spoiklin Soice image

Figure 1: Class SequenceDiagram.

This class has gorged itself fat, an unhappy state which calls for the creation of a new class into which some of SequenceDiagram's functionality may move. Where should a programmer start in the search for methods to move? Four criteria help.

First, the refactoring aims to strike some balance between the donor class and the new class. There seems little point in moving most of the methods from the donor class into the new class as this just transports the complexity from one class to another, solving nothing.

Second, the new class should have integrity in and of itself. It would not do to move randomly chosen and unconnected methods into the new class: its methods must serve some common purpose. Because shared dependencies strongly indicate shared responsibilities, the methods of the new class should form an inter-dependent unit, a goal best achieved by selecting sets of already collaborating methods from the donor class. Thus the programmer seeks not to move methods ad hoc but to move sets of dependent methods.

Third, the migration of any candidate set of methods should cause minimal disruption to the rest of the system. Modern tools and IDEs greatly ease the mechanics of moving methods but only the programmer can protect the design left behind from undue mutilation; just as the new classes must enjoy integrity so too must the donor stump. The number of methods that depend on any of those in the candidate set offers an indication of the degree of disruption caused by the refactoring. This number should weigh heavily in judging a candidate's suitability.

Armed with these three criteria (the fourth will be encountered shortly), the programmer can approach the search systematically and objectively. The criteria do not guarantee a solution, but they at least avoid premature resignation. The diagrams that follow - Spoiklin Soice's "Encapsulatable" analysis - investigate impact sets, an impact set being the set of methods touched by all transitive dependencies from any particular method. Each method's circle is coloured red according to the number of dependencies on that method's impact set: a deep-red circle indicates a method on whose children (and grandchildren, etc.) many dependencies fall, suggesting an unsuitability for relocation. Pale-red circles, on the other hand, indicate methods on whose descendants few dependencies fall and thus suggest excellent candidates for transportation. The tool-tip gives the actual number of these dependent methods. (Black circles represent methods on which fewer than two dependencies exist.)
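The impact-set idea can be sketched in a few lines of Python. This is an illustration only, not Spoiklin Soice's implementation: the call graph and its method names are hypothetical, and counting the starting method as a member of its own impact set is an assumption made here because that method would move along with its descendants.

```python
from collections import deque

# Hypothetical call graph: each method maps to the methods it calls directly.
DEPS = {
    "execute": ["draw", "layout"],
    "draw": ["drawLine"],
    "drawLine": ["arrowHead"],
    "layout": ["measure"],
    "report": ["measure"],
    "arrowHead": [],
    "measure": [],
}


def impact_set(deps, start):
    """Return start plus every method it transitively depends on (assumed definition)."""
    seen = {start}
    queue = deque([start])
    while queue:
        method = queue.popleft()
        for callee in deps.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def external_dependents(deps, candidate):
    """Return the methods outside the candidate set that call into it."""
    return {
        caller
        for caller, callees in deps.items()
        if caller not in candidate and any(c in candidate for c in callees)
    }


draw_set = impact_set(DEPS, "draw")
draw_dependents = external_dependents(DEPS, draw_set)
```

The size of draw_dependents plays the role of the number shown in the tool-tip: the fewer outside methods that reach into a set, the less the move disturbs what stays behind.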

In figure 2 below, the programmer has clicked on the execute() method to highlight its impact set (again: the methods on which it transitively depends) and found that only five other methods depend on elements of this entire set.

Spoiklin Soice image

Figure 2: Probing for movable methods: too many.

This selection, however, fails because of the first criterion: moving this large a set of methods to the new class might make the new class almost as large and complicated as the donor. Undeterred, the programmer probes further, clicking on the drawDiagram() method in figure 3 below.

Spoiklin Soice image

Figure 3: Probing for movable methods: still too many.

Unfortunately, here again the impact set of the drawDiagram() method appears too bulky to balance the behaviours between the two classes and it, too, must be rejected. The drawClassLines() method then sweeps into the cross-hairs (see figure 4).

Spoiklin Soice image

Figure 4: Probing for movable methods: too much impact.

Here the set of methods to move looks, at first, ideal, the set being neither too large nor too small. This selection is only scuppered by the number in the tool-tip: twelve other methods depend on various methods of this set, rendering it too disruptive an extraction (the method's deep red colour should have been a warning). The search proceeds. In figure 5, below, the programmer clicks on the drawLineFromCallingToCalled() method.

Spoiklin Soice image

Figure 5: Probing for movable methods: a suitable candidate.

Success. This selection offers a set of six methods all serving the same master, on whom only six other methods depend. This strikes a good balance between set-size and disruption and makes a good candidate for a first refactoring. Figure 6 below shows the refactoring in action, with each of the six methods yanked in turn from the SequenceDiagram class.
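The probing done by hand in figures 2 to 5 could, in principle, be approximated by a small screening routine. The sketch below is an assumption-laden illustration: the call graph is hypothetical, every method is assumed to appear as a key in it, and the thresholds (max_share, max_dependents, min_size) are arbitrary values chosen for the example rather than figures taken from the tool.

```python
# Hypothetical call graph; every method appears as a key.
DEPS = {
    "execute": ["draw", "layout"],
    "draw": ["drawLine"],
    "drawLine": ["arrowHead"],
    "layout": ["measure"],
    "report": ["measure"],
    "arrowHead": [],
    "measure": [],
}


def impact_set(deps, start):
    """Return start plus everything it transitively depends on (assumed definition)."""
    seen, stack = {start}, [start]
    while stack:
        for callee in deps.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def screen_candidates(deps, max_share=0.5, max_dependents=2, min_size=2):
    """List (method, set size, dependent count) for sets that pass both checks."""
    total = len(deps)
    results = []
    for method in deps:
        members = impact_set(deps, method)
        dependents = {
            caller
            for caller, callees in deps.items()
            if caller not in members and any(c in members for c in callees)
        }
        balanced = min_size <= len(members) <= max_share * total  # first criterion
        gentle = len(dependents) <= max_dependents                # third criterion
        if balanced and gentle:
            results.append((method, len(members), len(dependents)))
    # Least disruptive first; among equals, prefer the larger set.
    return sorted(results, key=lambda r: (r[2], -r[1]))


candidates = screen_candidates(DEPS)
```

Such a ranking only narrows the field. Whether the surviving methods really serve a common purpose remains a judgment for the programmer.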

Spoiklin Soice image montage

Figure 6: Moving six methods.

After this refactoring, the SequenceDiagram class appears as shown in figure 7.

Spoiklin Soice image

Figure 7: First refactoring results.

Note that some extra methods were added to the top line: these were getters necessary for allowing the new class to read some data from SequenceDiagram which maintains data-ownership; these methods will eventually sink into an interface sloughed off to avoid circular dependencies. Being mere getters, they add little complexity to the donor class and thus constitute a worthwhile compromise. For interest's sake, figure 8 shows the new class, CallingLine, with its proud new tenants.
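The getter arrangement can be sketched as follows. The sketch uses Python's typing.Protocol where a Java code base would use an interface, and the getter names, the data fields and the line_positions() method are all hypothetical, invented only to show the shape of the dependency: the new class sees an interface while the donor keeps ownership of the data.

```python
from typing import Protocol


class DiagramData(Protocol):
    """Read-only view needed by the extracted class (hypothetical getters)."""

    def get_line_spacing(self) -> int: ...

    def get_class_names(self) -> list[str]: ...


class CallingLine:
    """Extracted class: depends on the interface, not on SequenceDiagram."""

    def __init__(self, data: DiagramData) -> None:
        self._data = data

    def line_positions(self) -> list[int]:
        spacing = self._data.get_line_spacing()
        return [i * spacing for i, _ in enumerate(self._data.get_class_names())]


class SequenceDiagram:
    """Donor class: still owns the data and exposes it through getters."""

    def __init__(self) -> None:
        self._line_spacing = 40
        self._class_names = ["Caller", "Called"]
        self._calling_line = CallingLine(self)

    def get_line_spacing(self) -> int:
        return self._line_spacing

    def get_class_names(self) -> list[str]:
        return list(self._class_names)

    def draw(self) -> list[int]:
        # Delegate to the extracted class.
        return self._calling_line.line_positions()


positions = SequenceDiagram().draw()
```

Because CallingLine names only DiagramData, the source-level dependency runs from SequenceDiagram to CallingLine and from both to the interface, never back from CallingLine to SequenceDiagram.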

Spoiklin Soice image

Figure 8: A new class is born: CallingLine.

The task, of course, remains unfinished. Despite a successful operation, the donor class still weighs too much and further method sets must be transplanted. Here, the fourth and final criterion comes into play: depth. Good design dictates the minimizing of transitive dependency length where feasible. Studying figure 7 reveals two new method sets that have exposed themselves as glaring candidates: they "stick out" from the bottom of the diagram as the deepest reachable methods. Figure 9 shows the first, the positionOwningSetNameBoxes() method's impact set.
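Depth, in the sense used here, can be read as the length of the longest chain of calls below a method. A minimal sketch follows, assuming a hypothetical call graph that contains no cycles (recursive or mutually recursive methods would need extra handling that this example omits).

```python
from functools import lru_cache


def dependency_depth(deps, start):
    """Length of the longest call chain below start (0 for a leaf).

    Assumes the call graph is acyclic.
    """

    @lru_cache(maxsize=None)
    def depth(method):
        callees = deps.get(method, ())
        if not callees:
            return 0
        return 1 + max(depth(callee) for callee in callees)

    return depth(start)


# Hypothetical donor class before and after one impact set has moved out.
before = {
    "execute": ["draw", "layout"],
    "draw": ["drawLine"],
    "drawLine": ["arrowHead"],
    "layout": ["measure"],
    "arrowHead": [],
    "measure": [],
}
after = {
    "execute": ["layout"],
    "layout": ["measure"],
    "measure": [],
}

depth_before = dependency_depth(before, "execute")
depth_after = dependency_depth(after, "execute")
```

In this toy graph the longest chain inside the donor shrinks from three calls to two once the draw() set has left, which is the kind of reduction the fourth criterion looks for.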

Spoiklin Soice image

Figure 9: Continued probing, a first new candidate.

Figure 10 shows the second, that of the stripeBackground() method.

Spoiklin Soice image

Figure 10: Continued probing, a second new candidate.

Though neither impact set boasts great size, each suffers only four incoming dependencies, so their relocation incurs only minor disruption. Figure 11 shows the migration of both impact sets from SequenceDiagram. Note the depth reduction enjoyed by SequenceDiagram's structure as the process unfolds.

Spoiklin Soice image montage

Figure 11: Moving seven more methods.

This second refactoring yields a SequenceDiagram class as shown in figure 12.

Spoiklin Soice image

Figure 12: SequenceDiagram after second refactoring.

Finally, studying the deeper structures of figure 12 unearths yet another candidate: the impact set of the processOwningSetNamesPrinting() method (see figure 13 below).

Spoiklin Soice image

Figure 13: Final probe and candidate.

Although small, this set of methods depends on the newly created class, and so belongs in the new class as much as in the donor. Figure 14 displays the extraction of these three methods.
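One way to make this pull towards the new class visible is to count where a candidate set's outgoing calls land. The sketch below is an illustration under assumptions: the method names, the call graph and the ownership table are hypothetical, not taken from the real SequenceDiagram.

```python
from collections import Counter


def call_targets_by_class(deps, owner, candidate):
    """Count calls leaving the candidate set, grouped by the class owning the callee."""
    counts = Counter()
    for method in candidate:
        for callee in deps.get(method, ()):
            if callee not in candidate:
                counts[owner[callee]] += 1
    return counts


# Hypothetical three-method candidate set and the methods it calls.
deps = {
    "printNames": ["formatName", "placeName"],
    "formatName": ["getFont"],
    "placeName": ["lineStart", "lineEnd"],
}
# Hypothetical ownership of the methods outside the candidate set.
owner = {
    "getFont": "SequenceDiagram",
    "lineStart": "CallingLine",
    "lineEnd": "CallingLine",
}
candidate = {"printNames", "formatName", "placeName"}

targets = call_targets_by_class(deps, owner, candidate)
```

Here two of the three outgoing calls land in CallingLine, a hint (no more than that) that the set would sit at least as comfortably there as in the donor.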

Spoiklin Soice image montage

Figure 14: Moving final three methods.

Figure 15 shows the final result, the refactored SequenceDiagram class, a class - taken in isolation - easier to understand with dependencies far easier to trace.

Spoiklin Soice image

Figure 15: Class SequenceDiagram, reloaded.

Compare this class to the original, pre-refactored class, shown in figure 16.

Spoiklin Soice image

Figure 16: The original SequenceDiagram.

Figure 17 shows the final form of the new CallingLine class, it, too, uncomplicated of structure.

Spoiklin Soice image

Figure 17: The final CallingLine class.

Summary.

Fear is inexperience.

Some programmers dread the task of refactoring large classes because they simply do not know where to begin.

Others know better. They know that, no matter how large the class, a suite of simple criteria, iteratively applied, can help illuminate the entire process (a suite, furthermore, that applies to overblown packages, too). These refactorers learn to relish such tasks, seeking out the gigantic monstrosities that lie deep in the heart of most systems, plunging through their event horizons, laughing maniacally at the tickling spaghettification.

Physics-killers.

Photo credit attribution.

CC Image The heart of a black hole courtesy of Phil Plait on Flickr.