The Art of the Spreadsheet. Copyright 2008 John F. Raffensperger

1. Why is spreadsheet style important?
2. Make your spreadsheet read from left to right and top to bottom.
3. Omit unneeded bytes.
4. Omit unneeded sheets.
5. Organize blocks with care.
6. Attend to blank space.
7. Omit unneeded cells.
8. Format with caution.
9. Show all the information.
10. Spreadsheet errors.
11. How to audit a spreadsheet.
12. Suggestions for operations researchers.
13. Teaching the art of the spreadsheet.
Appendix. Checklist for a spreadsheet.

4. Be concise with sheets

A widely held opinion about spreadsheets is that multiple sheets help readability. Spreadsheet writers will tell you they use multiple sheets for user friendliness, because they think that breaking the model into modules makes a large spreadsheet easier for the reader to digest. The modules may be abstract, with different modules for data, formulas, and output. Or the modules may be concrete, with modules associated with specific aspects of a business problem, such as people, time, and preferences.

Sheet tabs give the spreadsheet an outline, so the reader can jump directly to a major section without having to scroll. Often, however, the writer has not organized the spreadsheet very well, so Sheet 2 is a module that really fits between Sheet 1, parts 1 and 2. In text, a writer should not insist that the reader jump back and forth between sections; most writers know that. Text is written to flow from top to bottom and left to right. The outline helps the reader keep track of the main ideas. But hyperjumping back and forth from Sheet 1 to Sheet 2 can be nauseating.

Three-dimensional data, intuitively, seems naturally put on multiple sheets. We picture in our minds a set of sheets layered like blankets on a bed, and we assume that the reader will immediately grasp this image as well, at the first view.

Separate sheets seem to be useful for specially formatted reports. The report is formatted one way on one sheet, and formatted a different way on a different sheet. Never mind that a change in one place often requires chasing down the same annoying details everywhere else.

Multiple sheets may be useful for databases that change in size. The large set of changeable data is on one sheet, with the query and reports on other sheets. Related to this, those who used to advocate a stair step structure, to avoid problems with row or column insertion, now advocate the use of multiple sheets. By putting blocks on different sheets, changing one block will not affect the other blocks.

These uses for multiple sheets seem to be compelling, but I am going to make a case for fewer sheets. In my own experience, I have never seen the need for more than one, though I have sometimes seen that a second one did not hurt. Multiple sheets are a negative feature - something you can do easily, but probably should do sparingly, if at all.

D. Mather, “A framework for building spreadsheet based decision models,” Journal of the Operational Research Society, 1999, vol. 50, pp. 70-74.

The suggestion that multiple sheets are bad is controversial (Mather, among many others, disagrees), and one sheet may not be optimal in every case (just in every case that I have seen).

What is wrong with multiple sheets? Multiple sheets create a surprising number of usability and readability problems.

First a definition. A spurious cell is a cell that simply points to another, as in “G8=Input!G15.” A spurious cell has exactly one precedent. Mathematically, this is the same as a = b. The writer cannot see data on another sheet, so he adds cells that simply refer to it.

Multiple sheets breed spurious cells and labels, the way dogs have fleas. Spurious cells appear as information is “copied” from other sheets, usually from the “Input” sheet. The information is there “just in case the reader wants to see it.” It is ironic that an “input” should be thus copied onto other sheets - is it an input, or is it a formula? A reference to a constant is a formula, but the writer wants to remind us of constant data. Such copying can duplicate the entire “input block” on every sheet! Failure to be concise in the number of sheets motivates the use of “otiose symbols,” as Higham put it.

Spurious cells give the reader the temptation to check back and forth, to check the formula visually. If the reader understood the meaning and context in the previous screen, his first reaction will be to go back and look at it - “What was that again? - oh yes, this is the same as that.” A waste of time. We will discuss spurious cells again later.

 

Context comes from building the spreadsheet from left to right and top to bottom, and from short visual distances between related items. Multiple sheets dislocate related blocks; the reader must remember a context from one screen to the next. When moving between screens, the reader worries, “I hope it's laid out the same as the previous one I just spent all this time learning.” If it is, the reader has good luck, and has only to remember the layout from the previous screen to understand the new one. If not, if it is a differently designed “module” altogether, the reader must learn a new format. Heaven forbid that it be the same data type broken up or transposed!

Here, SCREEN3 has been badly rearranged from the original, sorted by Shift, rather than by Name. The reader will prefer that SCREEN 2 and SCREEN3 had the same layout, rather than this jarring discrepancy.

Multiple sheets are hard to navigate. Compared to navigating one sheet, keystrokes and mouse controls to change sheets are more complicated. Most people do not know the key sequence for changing sheets (Ctrl-Pg Up, Ctrl-Pg Dn). So they must point with the mouse. But most users know how to navigate a single sheet just tab or Pg-Up and Pg-Dn.

Multiple sheets make information harder to find. The search function in Excel does not scan different sheets by default, but only the selected sheet. To search several sheets at once, the reader must click each sheet tab while pressing the control or shift key. Following this, the search will scan the selected sheets. Most users will not know such arcana. Here, we have tried to find the word “Formulas” on the Formulas sheet, but the search fails.

To be perfectly fair, Excel cannot search the sheet tabs, so the search in the figure would fail even if the user had selected all sheets, unless the word “Formulas” were in a cell somewhere. The point is that a multi-sheet search is harder than a search on one sheet.

Multiple sheets even make it harder to find blank. Writers frequently leave blank sheets in the file, because they may need them later, and do not bother to delete them. This can be disconcerting to the reader. Above, is there information on Sheet6? There is no way to know except to open up Sheet6 and look, a waste of time. Imagine sending a business proposal with a bunch of blank pages at the end! So if you must use multiple sheets, at least delete the blank ones.

Multiple sheets make auditing tools harder to use. As we saw above, an arc of precedence to off-sheet cells displays an uninformative “off the sheet” icon. The reader cannot visually see how cells are related, because they are not on the same sheet. It is like having an important reference or definition at the end of the book, rather than where we need to see it. You can click on the arc, and Excel will open a dialog box to let you go to the off-sheet cell, but auditing is no longer a purely visual process. It is just the same as reading the formula, and then clicking your way to the reference.

A common argument for using extra sheets is because the data are three-dimensional. However, what should be done if the data have four or more dimensions? Or worse - suppose the writer has a nice three-dimensional model on multiple sheets, then his boss tells him to add another dimension. Now his elegant structure becomes convoluted. The writer should address the problem more directly by finding a logical layout for the multi-dimensional data in just one or two visual dimensions.

Assign.xls has at least three dimensions - ex-president, the day of the week, and the shift. We may consider “assignments, preferences, and constraints” to be a fourth dimension. The original writer chose to use this last dimension as the one to align across sheets - one sheet for assignments, one for preferences, and one for constraints - but he could have chosen the dimension of ex-president instead.

 

 

 

 

What about large spreadsheets? Isn't it more convenient for large files if they are broken up in different sheets? In fact, large spreadsheets usually can be reduced considerably in size simply by moving everything to one sheet, then eliminating the duplicate labels and spurious cells.

For example, this spreadsheet requires only one sheet, three screens tall, but it duplicates mathematically the above monster “OTBT,” which has 5 sheets and 11 screens.

The structure here is quite typical of the better multi-sheet models that I have seen. The writer thought that separating the Input, Part A, and Part B into different sheets was an improvement. For this type of data structure, I think there may be a still better way.

The spurious cells made you look back and forth, didn't they? Suppose the boss comes along and asks us to make just one little change: have a two-week planning horizon instead of just one. Now we have to fix every sheet.

You can test your model structure by asking yourself what you would have to do if you needed to make a change like this. Change “days” to “hours,” or “presidents” to “government bodies.“

Here, I transposed the previous model, and put it all on one sheet. Then I deleted the duplicate information. The duplicate information should have been a clue that the data belonged together. Also, the duplicate information added significantly to the bulk of the file.

Spreadsheets on multiple sheets are often big because they are on multiple sheets. By aligning all the data on the shared dimension, we find a concise model that fits within the eyespan.

This quote from Tufte makes the point better than I can.

Tufte, Edward, Envisioning Information, 1990, Graphics Press, Cheshire, Conn., p. 24.

In flatland, after all, every opportunity to spread additional data over an already-available dimension must be cherished.

But in the wild, spreadsheets are not even as simple as the one above. More commonly, a big multi-sheet model will have complex flows, like this one.

Well, this is a headache. This spread-out model is confusing, even with the arrows. Matching references from sheet to sheet can be quite challenging. Because we cannot see the references, we are likely to create one-off errors, where formulas point to the header label at the top or left. Links between sheets are harder to get right than links on one sheet - we need to clicky-click back and forth to make the link and then to make sure it is right.

The data is inanimate - if you change a number in one place, you do not immediately see the result in another.

The context is shattered in pieces. Imagine a book that holds a map of a large city. Each page is a different section. To get from one place to another, you first have to find the page with your current location. Next, you find the page with your destination. Finally, you have to trace through several pages to find the way to your destination. Because you cannot see the big picture all in one place, you may not be sure that you have the best route.

By aligning the data on a single sheet, in left to right and top to bottom flow, we find that the monster becomes a single map of whole cloth.

Furthermore, with the inputs on the margin, we get visual immediacy. If we change an input, we can see the effect instantly, without changing screens.

Tufte wrote,

Tufte, Edward, Envisioning Information, 1990, Graphics Press, Cheshire, Conn., p. 67.
William Strunk and E.B. White, The Elements of Style, third edition, 1979, MacMillan Publishing Co., Inc., New York, p. 28.

Constancy of design puts the emphasis on changes in data, not changes in data frames.

And again, from Strunk & White,

The position of the words in a sentence is the principal means of showing their relationship. Confusion and ambiguity result when words are badly placed. The writer must, therefore, bring together the words and groups of words that are related in thought and keep apart those that are not so related.

Separating information onto different sheets by any dimension causes a loss of view and a breaking of context. The basic rule is that if cells can be aligned on any dimension (e.g. time or people), then the cells belong on the same sheet. If you do use separate sheets, at least use them only for fundamentally different data types that have little direct relationship.

Multiple sheets lengthen formulas. Having an Input sheet and a Formulas sheet requires the formulas to have a reference to the name of the Input sheet. By placing all the information on the same sheet, the formulas can be shortened.

A single sheet will calm your reader - there's a guarantee of less information to check, less screen area to search. A single sheet helps the reader feel there will be less effort required to understand the spreadsheet, and make the spreadsheet more predictable. Here, for a different model, the writer has taken the effort to put everything on one sheet, and also to delete unneeded sheets.

In Excel, you can easily set the default number of sheets to one. Select Tools, Options, General. Change Sheets in new workbook to 1.

Now we can get rid of some screen clutter by eliminating the display of tabs altogether (in Tools Options View).

Incidentally, irrelevant noise can take the form of too many tool bars. If we turn on Excel's Standard, Formatting, Auditing, and Drawing toolbars, we can display almost 162 cells in 640 by 480 screen resolution. Almost 162, because the last column is truncated slightly.

If we turn off these toolbars, most of which are rarely used, we have less noise on screen, and we can see more of the spreadsheet, 198 cells, about 22% more. The trick is to customize the tool bars and to learn to turn them on and off quickly.

Of course, your reader is responsible for his own screen. Help him by writing your spreadsheets on a single sheet.