CSC207 Software Design
Lectures
Refactoring

Background

Previous lecture looked at static design patterns

How the code looks at a moment in time

This lecture looks at dynamic patterns

How code changes over time

The key is to ask "Why does code change?"

Answer: to make it better

So looking at why code changes tells us:

What ought to be fixed

How it ought to be fixed

Problem #0: Unreadable Code

Different people have different definitions of readable

But inconsistency is always hard to read

Variable naming conventions

Indentation

Code organization (e.g. order of methods in source file)

When adding new code, always conform to existing style

Problem #1: Duplicated Code

Any code that appears in two or more places will eventually be wrong in at least one

Very hard errors to track down...

...since you "know" that you fixed it

Solution: extract common code

Make it a utility method in the class

Or a method in a common parent class

Or create a new class to encapsulate common operations

Note: utility method with five or six Boolean flags is not an improvement
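
A minimal sketch of extracting common code into a utility method. The Invoice class, its fields, and its method names are hypothetical and chosen only for illustration; the point is that the formatting logic, which would otherwise be repeated in both callers, now lives in one place.

```java
// Hypothetical example: class and method names are illustrative, not from the lecture.
class Invoice {
    private final long subtotalCents;
    private final int taxPercent;

    Invoice(long subtotalCents, int taxPercent) {
        this.subtotalCents = subtotalCents;
        this.taxPercent = taxPercent;
    }

    // The shared calculation is written exactly once.
    private long totalCents() {
        return subtotalCents * (100 + taxPercent) / 100;
    }

    String summaryLine() {
        return "Total: " + formatCents(totalCents());
    }

    String reminderLine() {
        return "Amount still owing: " + formatCents(totalCents());
    }

    // Utility method extracted from code that both callers used to repeat.
    // Assumes a non-negative amount.
    private static String formatCents(long cents) {
        return String.format("%d.%02d", cents / 100, cents % 100);
    }
}
```

A fix to the formatting now has to be made in only one method, so the two lines of output cannot drift apart.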

Problem #2: Long Method

Short-term memory can hold seven items (plus or minus two)

So your readers will only understand a quarter of a method that does thirty things

Solution: keep methods short

Should be possible to describe what the method does in a single sentence

A page or screen can show roughly 80 lines

If your method is longer than this, break it into parts

Problem #3: One Big Class

Each class should do one thing well

Should be possible to describe its role in a single sentence, too

Don't add members/methods just because the class is there

Solution: divide class into logical pieces

Siblings

Parent/child

Inner classes

Interface should be no more than a page

Problem #4: Long Parameter List

Methods should only have a few parameters

Seven plus or minus two again

If a method takes eleven strings as parameters, sooner or later you'll pass them in the wrong order

Solutions:

Store values in object's state

But do not use state for passing temporaries around

Store co-occurring parameters in utility objects

E.g. use a Rectangle instead of x0, y0, x1, and y1

Saves you a lot of typing, too
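
A minimal sketch of the Rectangle idea. The lecture names Rectangle but does not define it, so the class below (and the Canvas class that uses it) is a hypothetical stand-in, assuming integer coordinates.

```java
// Hypothetical utility object that groups four co-occurring parameters.
final class Rectangle {
    final int x0;
    final int y0;
    final int x1;
    final int y1;

    Rectangle(int x0, int y0, int x1, int y1) {
        this.x0 = x0;
        this.y0 = y0;
        this.x1 = x1;
        this.y1 = y1;
    }

    int width() {
        return Math.abs(x1 - x0);
    }

    int height() {
        return Math.abs(y1 - y0);
    }
}

class Canvas {
    // Before: static int area(int x0, int y0, int x1, int y1)
    // After: callers pass one value instead of four.
    static int area(Rectangle r) {
        return r.width() * r.height();
    }
}
```

The four coordinates are now ordered in one place, the Rectangle constructor, rather than at every call site.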

Problem #5: Divergent Derivation

Child classes do something conceptually different from parent

E.g. deriving PathName from String

Or deriving IndividualBird from SpeciesPenguin

Liskov Substitution Principle: it must be possible to use any instance of any derived class where an instance of the parent class could be used

Quick test: if you have a list containing instances of both parent and child types, can you iterate over that list and call arbitrary methods safely and sensibly?

Another quick test: can you pass a child object to every method declared to take the parent type as a parameter?

Solutions

Contain, don't derive

Start a new class hierarchy when you need to
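
A minimal sketch of "contain, don't derive" for the PathName example. In Java, String is final, so deriving from it would not compile anyway; the PathName class below is hypothetical and assumes '/' as the separator.

```java
// Hypothetical PathName that contains a String instead of deriving from one.
final class PathName {
    private final String path;   // contained, not inherited

    PathName(String path) {
        this.path = path;
    }

    // Only operations that make sense for a path are exposed.
    String baseName() {
        int slash = path.lastIndexOf('/');
        return path.substring(slash + 1);
    }

    PathName resolve(String child) {
        String joined = path.endsWith("/") ? path + child : path + "/" + child;
        return new PathName(joined);
    }

    @Override
    public String toString() {
        return path;
    }
}
```

Because PathName is not a String, callers cannot apply string operations that make no sense for a path, such as reversing it.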

Design by Contract

A class is specified by the pre-conditions and post-conditions on its methods

What the caller must guarantee is true before

What the class guarantees is true after

A method in a derived class can:

Weaken pre-conditions (i.e. take more kinds of input than its parent class)

Strengthen post-conditions (i.e. produce a subset of the output its parent class could produce)
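
A minimal sketch of both rules, using hypothetical Scaler and ClampedScaler classes. The contracts are written as comments and enforced by hand; Java itself does not check them.

```java
// Hypothetical classes illustrating contracts; names are not from the lecture.
class Scaler {
    // Pre-condition: value >= 0.
    // Post-condition: result >= 0.
    int scale(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be >= 0");
        }
        return value / 2;
    }
}

class ClampedScaler extends Scaler {
    // Weaker pre-condition: any int is accepted (negatives are treated as 0).
    // Stronger post-condition: 0 <= result <= 100, a subset of the parent's outputs.
    @Override
    int scale(int value) {
        int nonNegative = Math.max(value, 0);
        return Math.min(super.scale(nonNegative), 100);
    }
}
```

Any caller written against Scaler still works when handed a ClampedScaler: every input it was allowed to pass is still accepted, and every result it gets back still satisfies the guarantee it relied on.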

Problem #5: Scattered Changes

A change in one place requires you to change things in lots of other places

E.g. changing the type of the map you create in one constructor means changing it in the other four constructors as well

Solution: factor out common code

Create a utility method

Possibly taking some flags as parameters

Create a utility class
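
A minimal sketch of the map example, assuming a hypothetical Registry class. Every constructor funnels into one, and the concrete map type is named in a single utility method, so switching to another map type is a one-line change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; names are illustrative only.
class Registry {
    private final String name;
    private final Map<String, Integer> entries;

    Registry() {
        this("unnamed");
    }

    Registry(String name) {
        this(name, Collections.<String, Integer>emptyMap());
    }

    // The other constructors delegate here instead of creating their own maps.
    Registry(String name, Map<String, Integer> initial) {
        this.name = name;
        this.entries = newMap();
        this.entries.putAll(initial);
    }

    // The only place that names the concrete map type.
    private static Map<String, Integer> newMap() {
        return new HashMap<>();
    }

    void put(String key, int value) {
        entries.put(key, value);
    }

    int size() {
        return entries.size();
    }

    String name() {
        return name;
    }
}
```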

The wood teaches the carpenter

As you work on an application, you will discover new things about it

Patterns will emerge in code

Similar problems will arise in different places

In theory, rigorous up-front design could identify all of these in advance

In practice, design that rigorous is actually coding

Only experience can teach you when to switch from design to implementation and back

Hint: I have never met an undergraduate who spent too little time on design...

Problem #6: Method in Wrong Class

A class spends all of its time calling methods on another class

E.g. the methods of class A do nothing but make calls to class B

Solution: put the methods where they belong

But do not create One Big Class

Explicit Switch Statements

The switch statement makes decisions by matching simple values

A dumbed-down if-then-else

Largely a legacy of C

In many cases, can be replaced by defining and overriding methods

Adding new cases then doesn't require scattered changes

Exception is switching on atomic types (like characters)
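
A minimal sketch of replacing a switch with overridden methods. The Shape, Circle, Square, and Shapes types are hypothetical; the "before" version is assumed to have switched on a type code inside a single area method.

```java
// Before (assumed): one method containing
//     switch (shape.kind) { case CIRCLE: ... case SQUARE: ... }
// After: each case becomes a class that overrides area().
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

class Shapes {
    // No switch: dynamic dispatch picks the right area() for each element.
    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }
}
```

Adding a Triangle means adding one class; totalArea and every other caller stay untouched.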

Complex Conditionals

if ((A && B) || ((!A) && (!C))) is:

Hard to read

Fragile

Chain two or three of these together in an if-then-else, and it is impossible for readers to figure out what's going on

Solutions:

Break into pieces, using well-named temporaries

Use classifier methods

Use tables

Original Conditional

if ((a.getX() < b.getX()) && (a.getY() < b.getY())) {
    return -1;
}
else if ((a.getX() > b.getX()) && (a.getY() > b.getY())) {
    return 1;
}
else {
    return 0;
}

Break Conditionals Into Pieces

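boolean xLess = a.getX() < b.getX();
boolean yLess = a.getY() < b.getY();
boolean xGreater = a.getX() > b.getX();
boolean yGreater = a.getY() > b.getY();
if (xLess && yLess) {
    return -1;
}
else if (xGreater && yGreater) {
    return 1;
}
else {
    // Mixed or equal coordinates, exactly as in the original conditional
    return 0;
}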

Conditional Classifiers

if (bothLess(a, b)) {
    return -1;
}
else if (bothGreater(a, b)) {
    return 1;
}
else {
    return 0;
}
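
The classifier methods used above are not defined in the lecture. One possible sketch follows, assuming a hypothetical Point type with integer coordinates and a hypothetical PointOrder class to hold the methods.

```java
// Hypothetical Point type; the lecture only shows getX() and getY() being called.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() {
        return x;
    }

    int getY() {
        return y;
    }
}

class PointOrder {
    // Classifier: true when a is strictly less than b in both coordinates.
    static boolean bothLess(Point a, Point b) {
        return a.getX() < b.getX() && a.getY() < b.getY();
    }

    // Classifier: true when a is strictly greater than b in both coordinates.
    static boolean bothGreater(Point a, Point b) {
        return a.getX() > b.getX() && a.getY() > b.getY();
    }

    static int compare(Point a, Point b) {
        if (bothLess(a, b)) {
            return -1;
        }
        else if (bothGreater(a, b)) {
            return 1;
        }
        else {
            return 0;
        }
    }
}
```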

Conditional Tables

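// Class member. Row and column index: 0 = less, 1 = equal, 2 = greater.
static final int[][] RESULT = {{-1, 0, 0},
                               { 0, 0, 0},
                               { 0, 0, 1}};
// Inside the comparison method:
int xIndex = a.getX() < b.getX() ? 0 : (a.getX() > b.getX() ? 2 : 1);
int yIndex = a.getY() < b.getY() ? 0 : (a.getY() > b.getY() ? 2 : 1);
return RESULT[xIndex][yIndex];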

Problem #7: Inconsistent Initialization

Some constructors initialize members that others don't

Or some paths through code assign to variables, but others don't

Java compiler tries to warn you...

...but other languages aren't as helpful

Solution: guarantee object's state in between operations

State assertions

Often write an isValid() method to check this
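
A minimal sketch of a state assertion, assuming a hypothetical Range class whose invariant is low <= high. Constructors delegate to one another so no member is left unset, and every mutating operation re-checks the invariant.

```java
// Hypothetical class; names are illustrative only.
class Range {
    private int low;
    private int high;

    Range(int low, int high) {
        this.low = low;
        this.high = high;
        checkValid();
    }

    // Delegates, so both members are always initialized the same way.
    Range(int single) {
        this(single, single);
    }

    // State assertion: the invariant that must hold between operations.
    boolean isValid() {
        return low <= high;
    }

    void shift(int delta) {
        low += delta;
        high += delta;
        checkValid();
    }

    int width() {
        return high - low;
    }

    private void checkValid() {
        if (!isValid()) {
            throw new IllegalStateException("low must be <= high");
        }
    }
}
```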

Problem #8: Inconsistent Interfaces

Consistency lessens the burden on users' memories

Similar parameters should always be passed in the same order

E.g. input file, then output file, then filter object

Methods that do similar things should have similar names

The standard C library does a reasonable job of this

People will see consistency even when it doesn't exist

Do not use English spelling as a model

"ghoti" is pronounced "fish"

A Note on Style

Lots of arguments over trivia

Where the curly braces should go

m_name vs. name_ vs. szName vs. fName

Readability is the most important thing

Even more important than consistency

Laying out code well is an uncelebrated art form

If anyone else tells you that one style is better than another, ask for their proof

Not their reasoning

Not their anecdotes

But their independent double-blind study

Remember, life is short...


$Id: refactor.html,v 1.1.1.1 2004/01/04 05:02:31 reid Exp $