The Art of the Spreadsheet. Copyright 2008 John F. Raffensperger

1. Why is spreadsheet style important?

2. Make your spreadsheet read from left to right and top to bottom.

3. Omit unneeded bytes.

4. Omit unneeded sheets.

5. Organize blocks with care.

6. Attend to blank space.

7. Omit unneeded cells.

8. Format with caution.

9. Show all the information.

10. Spreadsheet errors.

11. How to audit a spreadsheet.

12. Suggestions for operations researchers.

13. Teaching the art of the spreadsheet.

Appendix. Checklist for a spreadsheet.

A spreadsheet should be written the way we read

The dictionary defines the word “context” as follows:

the part or parts of a written or spoken passage preceding or following a particular word or group of words and so intimately associated with them as to throw light on their meaning.

When the passage preceding or following a particular group of words is *not* associated with the particular
group of words, the group of words is out of context. The surrounding passage sheds no illumination on the bit we are
studying.

Gopen and Swan (1990) used linguistics and cognitive psychology to study style for technical documents. They wrote,

Since we read from left to right, we prefer the context on the left, where it can more effectively familiarize the reader. We prefer the new, important information on the right, since its job is to intrigue the reader. Information is interpreted more easily and more uniformly if it is placed where most readers expect to find it.

They go on to say that the reader considers contextual information on the left as old, and the reader considers subsequent information on the right as new. Then:

In our experience, the misplacement of old and new information turns out to be the No.1 problem in American professional writing today.

Misplacement of old and new information is also the No. 1 problem in spreadsheets.

Conway and Ragsdale (1977), citing Microsoft (1995), wrote:

D. G. Conway and C. T. Ragsdale, “Modelling optimization problems in the unstructured world of spreadsheets,”The English-reading human eye scans left to right, top to bottom. This fact should be considered and reflected in the spreadsheet design to enhance the auditability of the model.

The Microsoft reference reads:

In general, follow the layout conventions for how information is read. In western countries, this means left-to-right, top-to-bottom, with the most important information located in the upper left corner.

Old and new information in a spreadsheet are input data and formulas. The input data is old, since the spreadsheet user is expected to know it. The output is new, since it is derived information that the reader seeks. Data together with a formula that uses it, then, corresponds to a syntactic unit like a sentence or paragraph, with a beginning and an end. We expect to see the data first, at the beginning, and then when we have understood that, we expect to find the output formula, nearby, to the right or below nearby.

Here, the writer wants to say, “If you have an interest rate per year of 7%, a 20-year time horizon, and you want to borrow $100,000, then the payment per year will be $9,439.” The old information is the interest rate, the term, and the loan. The new information, the bit we are curious about, is the payment.

Since intermediate formulas logically are data for later formulas, the rule applies recursively to all numeric cells in a spreadsheet. Therefore, a formula should generally depend only upon cells above and to the left. To write clearly, we should not write formulas that refer to cells below or to the right of the current cell. When new information is placed where we naturally expect it, the spreadsheet is more predictable to the reader. Interestingly, spreadsheets recalculate faster when written this way.

Steve J. Davis, “Tools for spreadsheet auditing,” *International Journal of Human Computer Studies*, Oct., 1996, vol. 45, p. 429-42.

Archer (1989) and Davis (1996) observed that cell relationships can be represented as a directed graph, which Archer called a cell relationship diagram. Davis gave evidence of users' difficulty in understanding the spreadsheet structure, because precedents and dependents were hard to see.

Why are precedents and dependents hard to see? First, a formula requires the reader to look up its dependents in a verbal way. “So I am looking at cell B3. The formula is C1*(A5+B5). So the dependents are C1, A5, and B5. Okay, where is cell C1. Let's see, column A, B, C, row 1. Oh there it is.” The search is not a visual one of following shapes. Instead, it requires the verbal skills of remembering the references in the address and searching for the reference in the columns and rows.

The second reason that precedents and dependents are hard to see is that writers often do not help their reader by placing cells logically. Precedents and dependents are hard to see if they are not where the reader expects them.

Tufte, Edward,Tufte wrote (with the original emphasis),

Central to maintaining clarity in the face of the complex are graphical methods that *organize and
order the flow of graphical information* presented to the eye.

Organizing the flow of information means, first and foremost, organizing the spreadsheet from left to right and top to bottom.

Excel has an auditing toolbar (select Tools, Auditing, Show Auditing Toolbar), with buttons to insert blue temporary arrows that point to the dependents or precedents of a cell. We will call these
graphical segments - Archer's cell relationship diagram - the *arcs of precedence. *A spreadsheet reads from left
to right and top to bottom if, for every numeric cell, all the cell's arcs of precedence start above and to the left of
the cell.

Here, these two examples are mathematically the same. The one at left is confusing. The one at right, organized from left to right and top to bottom, shows more clearly how earlier values relate to the last cell.

Again, these are mathematically identical, but the left is confusing. On the right, we see faster that the 7.8 is unrelated to the other cells. On the left, this is easily lost in the confusion, even with the arcs of precedence displayed. On the left, to make sense of the formulas, we find ourselves relying on verbal skills, putting cell addresses in short-term memory.

If a spreadsheet's precedence tree includes blank cells, it is *perverse*. Similarly, a numeric formula is
perverse if it references labels. A perverse spreadsheet is one that depends on information that is missing. The
spreadsheet is incomplete or wrong. Here, cell AE18 perversely depends on the blank cell Z19. To discover a large blank
area on which other cells depend is disturbing indeed.

Clicking several times on the Trace Precedents button displays the complete precedence tree of the cell. This tree can be quite illuminating. If a precedence tree is tangled like a bowl of spaghetti, with arcs of precedence pointing in all directions like the monster here, the spreadsheet legitimately can be called a spaghetti spreadsheet!

A spaghetti spreadsheet is unpredictable, since the reader has no clear idea of its structure after examining a few formulas.

Exceptions to the left-to-right and top-to-bottom rule depend on reader expectations. An accounting balance sheet typically has years in columns. A year's profit in one column at the bottom may flow to the next year in the next column at the top. So the reader may reasonably expect one year's bottom line to flow into next year.

A singleton at the bottom of the spreadsheet may violate this rule properly. The rule “precedents above and left” would put a singleton at the bottom right of a spreadsheet. However, the reader's eye starts scanning at the left, as at the start of a sentence. So singletons may reasonably have precedents above and to the right.

In a spreadsheet, a formula is defined by its precedents. They should be close by. This rule of style has parallels in prose, mathematical writing, and graphics.

George D. Gopen and Judith A. Swan, “The Science of Scientific Writing,”Nicholas J. Higham, *Handbook of Writing for the Mathematical Sciences*, 1993, SIAM, p. 15.

According to Gopen and Swan (1990), old information should be viewed first, and then the related new information immediately following. It is best to introduce or define a concept just before the concept is discussed.

Higham (1993) states,

The practice of giving a long sequence of definitions at the start of a work is not recommended. Ideally, a definition should be given in the place where the term being defined is first used. If it is given much earlier, the reader will have to refer back, with a possible loss of concentration (or worse, interest). Try to minimize the distance between a definition and its place of first use.

Tufte said,

Tufte, Edward,If the visual task is contrast, comparison, and choice - as so often it is - then the more relevant information within the eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, requires viewers to rely on visual memory - a weak skill - to make a contrast, a comparison, a choice.

Micro/macro designs enforce both local and global comparisons and, at the same time, avoid the disruption of context switching.

Later, he wrote,

*Comparisons must be enforced within the scope of the eyespan,* a fundamental point occasionally
forgotten in practice. [Italics in original.]

So in text, in mathematics, and in graphics, we see that related pieces of information should be visually located together. Fortunately, this has not been totally lost on the best spreadsheet writers. Conway and Ragsdale (1997) put it like this:

D. G. Conway and C. T. Ragsdale, “Modelling optimization problems in the unstructured world of spreadsheets,”Things which are logically related...should be arranged in close physical proximity and in the same columnar or row orientation.

So do not put related cells far from each other on the spreadsheet. Arrange cells to minimize the number of columns and rows between a formula and the cells to which it refers. A spreadsheet is easier to read, if data (or intermediate calculations) are close to the formulas that use the data.

Short distance and consistent direction create context. Here, cell BO47 is out of context, far from its dependents in BG47 and AG47. Once again, we have to switch from easy visual skills to harder verbals skills. We have to look for cell BG47 over a long distance, and we feel the need to verify it verbally, and to remember which reference is off-screen. The writer should move the column in BO next to the column in BG, and both of those to the left, adjacent to the information in the column off screen.

Context can also be described by the measuring units. The input-formula-output layout ignores measuring units. The business-flow layout will naturally align related measuring units together.

Besides measure units, the business-flow layout produces a better visual display. The beauty of the spreadsheet is its visual function. You make a change and you can instantly see the result on the same screen. You can change the number back, and again, you can instantly see the result on the same screen. The numbers become an animation. If we have the constants very close to the formulas that use them, then we use the power of the spreadsheet in the way it was originally meant to be: visual numerical animation.

Back in the 1970s at Northwestern University, I remember punching FORTRAN cards, giving the stack to an operator, the interminable wait in the basement of the engineering building, and the disheartening errors on the printout. At one point, I got so fed up with punch cards that I swore I would never use a computer again.

In the 1980s, along came Lotus 1-2-3. One sheet, little formatting, no mouse. It was glorious! The beauty of the spreadsheet is visual immediacy. Change a value or formula, and we instantly see the result, on the same screen, without printing, without more keystrokes. I changed my mind about computers.

Now, with Excel, we can have many sheets, lots of formatting capability, VBA, OLE, etc. *Power*. What do we get?
Models posterised on many sheets. “Type input. Click-click. See output.” To see the results of sensitivity analysis, we have to print out the spreadsheet, with penciled notes about what was changed. Developers have forgotten the why of the spreadsheet: visual immediacy.

If you locate a formula close to its precedents, you are more likely to immediately catch an error when creating or changing a formula. For example, if you insert a row, but you do not correct a SUM() formula, you are more likely to notice the problem if the SUM() formula is nearby.

Short arcs of precedence reduce or eliminate duplication, whether the serious error of duplicate input, or the benign confusion and verbosity of redundant output.

Worse than too-long arcs of precedence are arcs of precedence that cannot be seen. In this sheet, the precedence arc for the Preference Total cell points helplessly off screen. The first numeric cell is the Preference Total cell, but no other information about preferences appears on the Model sheet. After hunting, we find the Preferences sheet on a sheet tab at the bottom. The Preference Total cell is out of context, because it is too far from its precedents.

If we use multiple sheets, we are back to Type Input, Run, See Output, and the spreadsheet is just a calculator rather than a visual tool that animates calculations.

The above spreadsheet, used often in this text, was produced by LINDO Systems, Inc. Used with permission.You can find the off-screen precedents by double-clicking the precedence arc. The “Go To” box will open, and you can then select the precedent cell. However, this is rather arcane information - most of your readers will not know that they can find a precedent by double-clicking the arc of precedence.

Even with the best designs, many spreadsheets cannot fit entirely within the eyespan. For large spreadsheets, it is even more important that each cell relate to those nearby, to give the reader a sense of the spreadsheet as a whole cloth, a connected quilt, rather than a gerry-mandered plumbing job.

In this sheet, modified from the one above, we can see the precedents in a single click. If a cell is close to its dependents, the reader will more easily see the relationship between them. The spreadsheet will be naturally organized by blocks of meaning, blocks that reflect the business rather than the mechanical requirements of the spreadsheet. By contrast, arranging by input-calculation-output separates related cells by a large visual distance. A cell in an input block and a cell in a calculation block will be far from each other, so their logical relationship is harder to see.

If a spreadsheet has two unrelated parts, like this one, the precedence tree is unconnected. In these cases, the spreadsheet should probably be split into two separate files. Writers sometimes collect somewhat related bits and pieces into one place for their own convenience, but when writing for an audience, viewing unrelated information together can be confusing.

A cell derives its meaning from its precedents, but it gets its context - the reader's perception of connected meaning to precedents and dependents - by being close to its precedents and dependents. A cell located far from its precedents and dependents is out of context. A cell near its precedents and dependents is in context. If the cell is in context, the reader will more easily perceive the related meaning (the signal). So make formulas refer only to cells above and to the left, and put related cells close together.