The Art of the Spreadsheet. Copyright 2008 John F. Raffensperger
1. Why is spreadsheet style important?
2. Make your spreadsheet read from left to right and top to bottom.
3. Omit unneeded bytes.
4. Omit unneeded sheets.
5. Organize blocks with care.
6. Attend to blank space.
7. Omit unneeded cells.
8. Format with caution.
9. Show all the information.
10. Spreadsheet errors.
11. How to audit a spreadsheet.
12. Suggestions for operations researchers.
13. Teaching the art of the spreadsheet.
Appendix. Checklist for a spreadsheet.
How much effort should you spend on checking a spreadsheet? The answer depends on the amount of money involved: the more money that rides on the spreadsheet, the more time you should spend auditing it. The time required to check each formula depends on the readability of the spreadsheet, but bet on more than a minute per formula.
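The rule of thumb turns into a quick estimate. Here it is as a small Python function; the 2,000-formula model is hypothetical, and one minute per formula is the floor from the text, not a measured rate.

```python
def min_audit_hours(formula_count: int, minutes_per_formula: float = 1.0) -> float:
    """Lower bound on checking time, in hours.

    Assumes at least `minutes_per_formula` minutes per formula, the floor
    suggested in the text; a hard-to-read spreadsheet will take longer.
    """
    return formula_count * minutes_per_formula / 60.0


# Hypothetical model with 2,000 formulas.
print(f"{min_audit_hours(2000):.1f} hours")  # 33.3 hours
```

A spreadsheet of that size is therefore several days of careful work, before any rewriting.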
Sometimes the best way to audit a spreadsheet is to rewrite it. Rather than throwing away the previous author's work, you can edit the spreadsheet so that it is in good spreadsheet style. As you do this, you will come to understand the spreadsheet quite well, and be able to check the author's assumptions against your own. In effect, the final spreadsheet will be co-written by you and the first author. So here are some steps you can take to rewrite a spreadsheet, auditing as you go.
Suppose you have just been given the Spreadsheet of Linguini, which optimistically justifies a $200 million project, and you are supposed to make sure it is correct.
We will assume that the spreadsheet has little or no Visual Basic code, it does not query outside databases, and it does not require any special add-in utilities. It is a stand-alone monster.
We will also assume that nothing is password protected (or that you have the password). If you are asked to audit a password-protected spreadsheet, I advise you to decline. You cannot properly audit a password-protected spreadsheet.
Generally, you should not try to audit your own spreadsheet. We like our own work. We have affection for it that gets in the way of our objectivity. This is the overconfidence that Panko and Sprague (1998) discovered.
Raymond R. Panko and Ralph H. Sprague, Jr., “Hitting the wall: errors in developing and code inspecting a ‘simple’ spreadsheet model,” Decision Support Systems, vol. 22, no. 4, April 1998, pp. 337-353.
Plan on saving multiple backup copies. Make sure at every stage that your spreadsheet reproduces the results of the original. To do this, you need to be sure that the new spreadsheet is mathematically the same as the original, or that you can document a clear error in the old one. If you save a new version frequently, you can go back to a previous version if you find you have broken the spreadsheet.
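Checking that a revised version reproduces the original is easy to mechanize once you have written down the key results. A minimal sketch in Python, assuming you copy each result by hand under a descriptive name (the names and values below are hypothetical). Names make better keys than cell addresses, because addresses change as you rearrange the spreadsheet. The comparison allows a tiny tolerance because rearranging arithmetic can change the last digits of a floating-point result.

```python
import math


def mismatched_results(original: dict, revised: dict, rel_tol: float = 1e-9) -> list:
    """Return the names of results whose values differ between two versions.

    Keys are descriptive names chosen by the auditor (e.g. "Net present value"),
    not cell addresses. A result missing from the revised version counts as
    a mismatch.
    """
    bad = []
    for name, old in original.items():
        new = revised.get(name)
        if new is None or not math.isclose(old, new, rel_tol=rel_tol, abs_tol=1e-9):
            bad.append(name)
    return bad


original = {"Net present value": 1_250_000.0, "Payback years": 4.0}
revised = {"Net present value": 1_250_000.0, "Payback years": 4.5}
print(mismatched_results(original, revised))  # ['Payback years']
```

An empty list means the new version still agrees with the original on everything you recorded; anything else means undo, or document the error you found.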
The process is one of pruning and cutting, and converting to good style. As you do this, the issues will become apparent. Here is an overview:
Step 1. Put the spreadsheet on a single sheet.
Step 2. Turn off all non-numeric formatting. Force the display of all rows and columns.
Step 3. Format formulas.
Step 4. Delete unnecessary rows and columns.
Step 5. Delete unnecessary dangling cells.
Step 6. Rearrange so the spreadsheet's logic flows from left to right and top to bottom.
Step 7. Nest and erase to eliminate spurious cells. Simplify formulas.
Step 8. Format numeric cells for description.
Make sure you get the same numerical results at each stage. If not, undo!
Step 1. Put the spreadsheet on a single sheet.
1. Select Sheet 2. Press Go To, Special, Last cell. This will move the cursor to the last cell for which memory has been allocated.
2. Holding the Shift key, move the cursor to A1. This will select everything active on the sheet.
3. Press Control-x to cut everything on the sheet.
4. Select Sheet 1. Press Go To, Special, Last cell. Press the down arrow once, then move left in the same row to the A column.
5. Press Control-v to paste Sheet 2 onto Sheet 1.
6. Delete Sheet 2.
7. Go back to item 1 and repeat until only one sheet remains.
8. Save the file under a new name.
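The paste destination in this procedure follows from how Excel defines the last cell: the largest used row combined with the largest used column, a cell that may itself be empty. A sketch of that arithmetic in Python, assuming a sheet is represented by the set of addresses Excel considers in use (in real Excel this includes cells that were merely formatted or once held data); the helper names are hypothetical.

```python
import re


def col_to_num(letters: str) -> int:
    """Column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """1-based column number back to letters: 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def split_address(addr: str) -> tuple:
    """'C12' -> (3, 12), i.e. (column number, row number)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", addr)
    if m is None:
        raise ValueError(f"not a simple cell address: {addr}")
    return col_to_num(m.group(1)), int(m.group(2))


def last_cell(used: set) -> str:
    """Largest used column combined with largest used row (Go To, Special, Last cell)."""
    cols, rows = zip(*(split_address(a) for a in used))
    return f"{num_to_col(max(cols))}{max(rows)}"


def paste_anchor(used: set) -> str:
    """One row below the last cell, in column A: where the cut sheet is pasted."""
    _, rows = zip(*(split_address(a) for a in used))
    return f"A{max(rows) + 1}"


sheet1_used = {"A1", "B2", "D7", "C12"}
print(last_cell(sheet1_used))     # D12 (empty in this example)
print(paste_anchor(sheet1_used))  # A13
```

Pasting one row below the last cell guarantees that the pasted block cannot land on top of anything already in use.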
Here, I have moved the Formulas sheet to the Input sheet, and then deleted the blank Formulas sheet.
You may see vast expanses of unnecessary blank space between the original information at the top and the pasted information below. We will deal with them later.
When you paste, you may overwrite blank cells that other formulas refer to (blank references). If #REF errors appear, undo, then look at the instructions for finding blank references in Step 4.
What if everything won't fit on one sheet? Come back to this step later; it probably will fit after we have done some work!
Step 2. Turn off all non-numeric formatting. Force the display of all rows and columns.
Fonts, alignments, borders, colors, etc., are crayon marks obscuring the numerical model. Do not let the writer make you look where he wants you to look, because this is probably not where the error will occur. Treat every cell as an equal for now. The goal is to find the numerical structure of the spreadsheet apart from its graphical or typographical structure.
Highlight the entire sheet either by pressing Control-A or by clicking the button just above the row headers and to the left of the column headers.
Format Cells Alignment Right, and Orientation to 0 degrees. Later, you will need to align the longer labels left. Clear any checks in Wrap text, Shrink to fit, and Merge cells.
Select Format Cells Font, and make everything Arial, Regular, 10 points, Underline None, with Color Automatic.
Select Format Cells Border None.
Format Cells Patterns No Color.
Format Cells Protection, and make sure Locked contains a check. Make sure Hidden does not contain a check.
Format Column Width 12.
Format Row Height 16.
Save the spreadsheet with a new name.
It may be helpful to widen the A column, since it often contains long labels. But at this point, do not worry too much about being able to read every label.
Step 3. Format formulas.
Select Edit, Go To, Special, Formulas. This will highlight all formulas in the spreadsheet.
With all the formulas highlighted, format the cells in some way. Try using a very light gray with a medium gray border.
Save the spreadsheet with a new name.
Now anything with default formatting is a constant or text.
This method of finding constants can be tricked if the writer inserts an equal sign in front of the number. Keep an eye out for such evil, and if you find it, correct it by removing the equal sign.
If you find that the spreadsheet contains many formulas and few constants, you can format the constants instead (Go To, Special, Constants). Format in a way that minimizes the amount of formatting, but still differentiates constants from formulas.
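The same classification can be sketched in Python for raw cell contents, with one addition: a formula that contains no cell reference, such as =250*1.08, is flagged as a constant in disguise, the trick described above. This is an illustration under simplifying assumptions, not a parser for Excel's formula language. It recognizes only single-cell references like B7 or $B$7, so formulas built on named ranges or on functions such as TODAY() would also be flagged and deserve a manual look.

```python
import re

# Simplified single-cell reference, e.g. B7 or $B$7. The look-arounds keep
# function names such as LOG10( from being mistaken for references.
CELL_REF = re.compile(r"(?<![A-Za-z0-9_])\$?[A-Z]{1,3}\$?\d+(?![A-Za-z0-9_(])")


def classify(contents: str) -> str:
    """Classify raw cell contents as constant, formula, disguised constant, or text."""
    if contents.startswith("="):
        # A formula with no cell references behaves like a typed-in constant.
        return "formula" if CELL_REF.search(contents) else "disguised constant"
    try:
        float(contents)
        return "constant"
    except ValueError:
        return "text"


for raw in ["1200", "=B7*1.08", "=250*1.08", "Revenue"]:
    print(raw, "->", classify(raw))
# 1200 -> constant
# =B7*1.08 -> formula
# =250*1.08 -> disguised constant
# Revenue -> text
```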
Step 4. Delete unnecessary rows and columns.
Warning: you may be about to see some major errors in your spreadsheet.
Delete blank external rows and columns. An external blank row is a row between the apparent last cell (where the data looks like it ends) and the true last cell (Go To, Special, Last cell).
Delete unneeded internal rows and columns that appear within and between tables.
Delete blank rows and columns to condense the spreadsheet, to use blank space sparingly, and to reduce the footprint of the spreadsheet. An ideal spreadsheet would fit on one screen in 10-point type. Unfortunately, most complicated work requires more space.
Be aggressive with deletion. Are separator columns and rows really important? If a row has one useless label in it (such as the name of the project or the SCREEN number), consider deleting the entire row. Delete columns or rows with repetitive labels, like the days of the week here; use Window, Freeze Panes instead. Separate tables can often be combined into a single table.
Try to put cells within the eye span, within reason, so you can see the model on one screen.
Now what could go wrong here? Why might serious errors appear? We hope that we are only changing the visual layout of the model. This step is not intended to touch the numerical model in any way. However, if the spreadsheet perversely depends on blank cells, errors are most likely to appear at this point, when you are deleting rows and columns.
Here is a sorry example of a perverse spreadsheet (displayed with View Zoom at 15%). A blank cell, E85, in the middle of a vast blank area is referenced by formulas far below, as shown by the blue arc of dependence. When I used Delete Row to move the tables together, #REF errors appeared everywhere. The spaghetti fell off the plate and made a mess on the floor.
The spreadsheet may be correct anyway, if these references are not relevant to the last numeric cell. No need for panic yet, though a blank reference is a red flag indeed.
If you find #REF errors suddenly appear after you delete a blank row or blank column, you can go through the blank cells with the auditing toolbar to determine which blank cells are referenced.
Or, if you wish, you can use some Visual Basic code to find all blank references. This macro, FindBlankReferences(), inserts a red zero (as in “danger!”) in any blank cell with a dependent. You can then delete apparently blank rows and columns with more confidence (though cells could still be hidden). Once you have made sense of the blank references, set them back to the default format.
Sub FindBlankReferences()
    ' Put a red zero in every blank cell that some formula refers to.
    Application.ScreenUpdating = False
    Dim c As Range, precedentCell As Range
    For Each c In ActiveSheet.Cells.SpecialCells(xlCellTypeFormulas)
        If NumberOfPrecedents(c) >= 1 Then
            For Each precedentCell In c.DirectPrecedents.Cells
                If IsEmpty(precedentCell) Then
                    precedentCell.Formula = 0
                    precedentCell.Font.ColorIndex = 3 ' Red
                End If
            Next precedentCell
        End If
    Next c
    Application.ScreenUpdating = True
End Sub

' Returns 0 when Excel finds no precedents (DirectPrecedents raises an error).
' Long rather than Integer, so a large referenced range cannot overflow.
Function NumberOfPrecedents(cellij) As Long
    NumberOfPrecedents = 0
    On Error Resume Next
    NumberOfPrecedents = cellij.DirectPrecedents.Count
End Function
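Why do blank references turn into #REF errors? Deleting a row renumbers every reference below it, but a reference into the deleted row has nowhere to go. The toy Python model below, which stores a sheet as a dictionary from address to contents, mimics that behavior for single-cell references. The E85 case from the text shows the difference between deleting a truly unused row and deleting the row that holds a blank reference. The model ignores ranges, which Excel usually shrinks rather than breaks.

```python
import re

# Single-cell reference with optional $ anchors; groups are
# ($ before column, column letters, $ before row, row number).
REF = re.compile(r"(?<![A-Za-z0-9_])(\$?)([A-Z]{1,3})(\$?)(\d+)(?![A-Za-z0-9_(])")


def delete_row(sheet: dict, row: int) -> dict:
    """Toy model of Excel's Delete Row.

    Cells below `row` move up one row and references to them are renumbered.
    A reference to any cell in the deleted row, even a blank one, becomes #REF!.
    """
    def fix_ref(m):
        col_anchor, col, row_anchor, r = m.groups()
        r = int(r)
        if r == row:
            return "#REF!"
        return f"{col_anchor}{col}{row_anchor}{r - 1 if r > row else r}"

    result = {}
    for addr, contents in sheet.items():
        col, r = re.fullmatch(r"([A-Z]+)(\d+)", addr).groups()
        r = int(r)
        if r == row:
            continue  # the cell is deleted along with its row
        if contents.startswith("="):
            contents = REF.sub(fix_ref, contents)
        result[f"{col}{r - 1 if r > row else r}"] = contents
    return result


# E85 is blank (absent from the dictionary) but referenced far below.
sheet = {"A1": "1000", "B120": "=A1+E85"}
print(delete_row(sheet, 50))  # {'A1': '1000', 'B119': '=A1+E84'}
print(delete_row(sheet, 85))  # {'A1': '1000', 'B119': '=A1+#REF!'}
```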
Step 5. Delete unnecessary dangling cells.
By now, you will probably be getting a vague feel for the spreadsheet. Now we start editing the numerical part.
Find the last numeric cell (not the last cell in memory). Which cell gives the answer? Which one tells you whether to do that $200 million project?
Select the last numeric cell, then repeatedly press Trace Precedents, until no more blue arrows appear.
Most likely, you will see some numerical cells with no blue arcs of precedence. These cells are dangling. Unless you have a compelling reason to keep them (for example, you are solving a math model such as a linear program), delete the dangling cells. Yes, delete them, formats, labels, and all, with Edit, Clear, All. The goal is to delete the “cells that are there just for your information,” and retain the cells that supposedly produce the justification for the $200 million project.
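Pressing Trace Precedents until no more arrows appear is a search through the dependency graph. A sketch of the same search in Python, using a hypothetical dictionary representation (address to contents) and single-cell references only: it collects everything that feeds the last numeric cell and lists the numeric cells outside that set. Labels are left alone; delete them yourself along with the cells they describe.

```python
import re

REF = re.compile(r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?(\d+)(?![A-Za-z0-9_(])")


def direct_precedents(sheet: dict, addr: str) -> set:
    """Cells referenced by the formula in addr (empty for constants, text, blanks)."""
    contents = sheet.get(addr, "")
    if not contents.startswith("="):
        return set()
    return {col + row for col, row in REF.findall(contents)}


def all_precedents(sheet: dict, target: str) -> set:
    """Like pressing Trace Precedents until no new arrows appear."""
    seen, stack = set(), [target]
    while stack:
        for p in direct_precedents(sheet, stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_numeric(contents: str) -> bool:
    if contents.startswith("="):
        return True
    try:
        float(contents)
        return True
    except ValueError:
        return False


def deletion_candidates(sheet: dict, target: str) -> list:
    """Numeric cells that do not feed the last numeric cell."""
    keep = all_precedents(sheet, target) | {target}
    return sorted(a for a, c in sheet.items() if is_numeric(c) and a not in keep)


sheet = {
    "A1": "Price", "B1": "20",
    "A2": "Units", "B2": "5000",
    "A3": "Revenue", "B3": "=B1*B2",
    "A4": "Revenue in thousands", "B4": "=B3/1000",  # just for information
    "A5": "Cost", "B5": "60000",
    "A6": "Profit", "B6": "=B3-B5",  # last numeric cell
}
print(deletion_candidates(sheet, "B6"))  # ['B4']
```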
Save the spreadsheet with a new name.
Another way to find dangling cells is with a Visual Basic macro, such as the one here. FindDanglingCells() will chase down all the dangling cells and turn them green. You can then choose the one dangling cell as the last numeric cell, and delete all the others, with their labels and formats.
Sub FindDanglingCells()
    ' Color green every formula that no other formula uses.
    Application.ScreenUpdating = False
    Dim c As Range
    For Each c In ActiveSheet.Cells.SpecialCells(xlCellTypeFormulas)
        If 0 = NumberOfDependents(c) Then
            c.Font.ColorIndex = 43 ' Green
        End If
    Next c
    Application.ScreenUpdating = True
End Sub

' Returns 0 when Excel finds no dependents (DirectDependents raises an error).
Function NumberOfDependents(cellij) As Long
    NumberOfDependents = 0
    On Error Resume Next
    NumberOfDependents = cellij.DirectDependents.Count
End Function
If you delete the unnecessary dangling cells, you can run this routine again to try to find more.
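Running the routine again matters because deleting one dangling cell can leave its own precedents dangling. The Python sketch below repeats a FindDanglingCells-style pass until a pass finds nothing; like the macro, it looks only at formulas, and it keeps the last numeric cell. The dictionary representation and function names are assumptions for illustration. In the example, B7 depends on B4, so B4 becomes dangling only after the first pass deletes B7.

```python
import re

REF = re.compile(r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?(\d+)(?![A-Za-z0-9_(])")


def refs(contents: str) -> set:
    """Cells referenced by a formula; empty for constants and text."""
    if not contents.startswith("="):
        return set()
    return {col + row for col, row in REF.findall(contents)}


def prune_dangling(sheet: dict, target: str):
    """Delete unreferenced formula cells (other than target) until none remain.

    Returns the pruned sheet and the number of passes that deleted something.
    """
    sheet = dict(sheet)
    passes = 0
    while True:
        referenced = set().union(*(refs(c) for c in sheet.values()))
        dangling = [a for a, c in sheet.items()
                    if c.startswith("=") and a != target and a not in referenced]
        if not dangling:
            return sheet, passes
        for a in dangling:
            del sheet[a]
        passes += 1


sheet = {"B1": "20", "B2": "5000", "B3": "=B1*B2", "B4": "=B3/1000",
         "B5": "60000", "B6": "=B3-B5", "B7": "=B4*2"}
pruned, passes = prune_dangling(sheet, "B6")
print(passes, sorted(pruned))  # 2 ['B1', 'B2', 'B3', 'B5', 'B6']
```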
At this point, you are probably getting quite familiar with your Spreadsheet of Linguini, and it may even be looking less like linguini and more like a clean model.
Step 6. Rearrange so the spreadsheet's logic flows from left to right and top to bottom.
Now things get hard. The goal is to have all cells refer only to cells above and to the left. If a cell refers to a cell to the right or below, work to move it.
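Before moving anything, it helps to list the offending references. A Python sketch, assuming the same kind of toy dictionary of cell contents and reading “above and to the left” as “earlier in left-to-right, top-to-bottom reading order”: a precedent must sit in a higher row, or in the same row and further left. A cell that refers to itself is flagged too, since that is a circular reference.

```python
import re

REF = re.compile(r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?(\d+)(?![A-Za-z0-9_(])")


def position(addr: str) -> tuple:
    """'C2' -> (2, 3): row first, so tuples compare in reading order."""
    letters, row = re.fullmatch(r"([A-Z]+)(\d+)", addr).groups()
    col = 0
    for ch in letters:
        col = col * 26 + ord(ch) - ord("A") + 1
    return int(row), col


def backward_references(sheet: dict) -> list:
    """(cell, precedent) pairs where the precedent is not earlier in reading order."""
    found = []
    for addr, contents in sheet.items():
        if not contents.startswith("="):
            continue
        for col, row in REF.findall(contents):
            ref = col + row
            if position(ref) >= position(addr):
                found.append((addr, ref))
    return sorted(found)


sheet = {"B1": "=B3*2", "B2": "10", "B3": "=B2+1", "C2": "=B2*3"}
print(backward_references(sheet))  # [('B1', 'B3')]
```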
Using Control-x and Control-v, move constants to the top of the sheet. You may have to insert some blank rows to make room.
Using Control-x and Control-v, move the dangling cell(s) to the bottom of the spreadsheet.
Warning: using the Auditing Toolbar clears the Undo stack. Save your file after any change, under a new name!
On the last numeric cell, click the Auditing Toolbar Trace Precedents button once. This will display the last numeric cell's direct precedents.
Using Control-x and Control-v, move the last numeric cell's direct precedents down to a row (or rows) immediately below the rest of the spreadsheet, but above the last numeric cell. You may have to move an entire column, row, or table, and you may first have to insert rows to make room.
Keep an eye out for #REF errors. If these appear, undo the changes, and try to find out what is wrong.
Continue moving cells, putting direct precedents towards the bottom, so the logic flows from left to right and top to bottom.
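The rearrangement amounts to a topological sort: every cell must come after all of its precedents. A Python sketch using Kahn's algorithm on a toy dictionary model, breaking ties by current reading order so that cells already in a good position tend to keep it. It raises an error on a circular reference, which no arrangement can fix. The output is an order for placing cells, not a finished layout.

```python
import heapq
import re

REF = re.compile(r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?(\d+)(?![A-Za-z0-9_(])")


def position(addr: str) -> tuple:
    """'C2' -> (2, 3): row first, so tuples sort in reading order."""
    letters, row = re.fullmatch(r"([A-Z]+)(\d+)", addr).groups()
    col = 0
    for ch in letters:
        col = col * 26 + ord(ch) - ord("A") + 1
    return int(row), col


def layout_order(sheet: dict) -> list:
    """Order cells so each one comes after all of its precedents (Kahn's algorithm).

    Ties go to the cell that comes first in the current reading order.
    Raises ValueError if the formulas contain a circular reference.
    """
    dependents = {a: set() for a in sheet}
    waiting = {a: 0 for a in sheet}  # precedents not yet placed
    for addr, contents in sheet.items():
        if not contents.startswith("="):
            continue
        for col, row in set(REF.findall(contents)):
            ref = col + row
            if ref in sheet:  # blank precedents impose no ordering
                dependents[ref].add(addr)
                waiting[addr] += 1
    ready = [(position(a), a) for a, n in waiting.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, addr = heapq.heappop(ready)
        order.append(addr)
        for d in dependents[addr]:
            waiting[d] -= 1
            if waiting[d] == 0:
                heapq.heappush(ready, (position(d), d))
    if len(order) < len(sheet):
        raise ValueError("circular reference: no arrangement puts every precedent first")
    return order


sheet = {"B1": "=B3*2", "B2": "10", "B3": "=B2+1", "C2": "=B2*3"}
print(layout_order(sheet))  # ['B2', 'C2', 'B3', 'B1']
```

Here the formula in B1 belongs below B3, exactly the kind of move the procedure above makes by hand.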
Step 7. Nest and erase to eliminate spurious cells. Simplify formulas.
The goal here is to reduce the number of cells and reduce the number of characters in each cell. Rather than a sequence of steps, here is a list of some things you can do.
Simplify formulas. Perhaps the first thing to do, since the model is all on one sheet, is to use search and replace to eliminate sheet names within formulas.
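Once everything is on one sheet, the sheet prefixes carry no information. A Python sketch of the search and replace, assuming sheet names are either bare words (Inputs!) or quoted ('Cash flow'!). It does not distinguish text inside string literals, so check any formula that has an exclamation mark inside quotes.

```python
import re

# A sheet prefix: a quoted name ('Cash flow'!, with '' for an embedded quote)
# or a bare name (Inputs!). Simplification: string literals are not skipped.
SHEET_PREFIX = re.compile(r"(?:'(?:[^']|'')+'|[A-Za-z_][A-Za-z0-9_.]*)!")


def strip_sheet_names(formula: str) -> str:
    """Remove every sheet prefix from a formula."""
    return SHEET_PREFIX.sub("", formula)


print(strip_sheet_names("=Inputs!B3*'Cash flow'!C7"))  # =B3*C7
```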
Look for spurious cells and eliminate them with nest and erase. Here is some Visual Basic code that finds spurious cells and turns them green: it colors any formula cell that has either exactly one precedent or exactly one dependent. For these cells, nest and erase will usually improve the spreadsheet.
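Sub FindSpuriousCells()
    ' Color green every formula with exactly one precedent or exactly one dependent.
    ' Uses NumberOfPrecedents and NumberOfDependents from the macros above.
    Application.ScreenUpdating = False
    Dim c As Range
    For Each c In ActiveSheet.Cells.SpecialCells(xlCellTypeFormulas)
        If 1 = NumberOfPrecedents(c) Or 1 = NumberOfDependents(c) Then
            c.Font.ColorIndex = 43 ' Green
        End If
    Next c
    Application.ScreenUpdating = True
End Sub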
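Nest and erase itself can be sketched the same way. A minimal Python version on a toy dictionary model, assuming single-cell references: it pastes a cell's formula, wrapped in parentheses to preserve the order of operations, into every formula that refers to it, then erases the cell. Because it substitutes into every dependent, use it on cells with one dependent or on simple pass-throughs; otherwise the same expression gets copied several times.

```python
import re

REF = re.compile(r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?(\d+)(?![A-Za-z0-9_(])")


def nest_and_erase(sheet: dict, cell: str) -> dict:
    """Substitute cell's formula into each formula that refers to it, then erase cell."""
    contents = sheet[cell]
    if not contents.startswith("="):
        raise ValueError(f"{cell} does not hold a formula")
    nested = "(" + contents[1:] + ")"  # parentheses keep the order of operations

    def swap(m):
        return nested if m.group(1) + m.group(2) == cell else m.group(0)

    return {addr: (REF.sub(swap, text) if text.startswith("=") else text)
            for addr, text in sheet.items() if addr != cell}


sheet = {"B1": "20", "B2": "5000", "B3": "=B1*B2", "B4": "=B3-60000"}
print(nest_and_erase(sheet, "B3"))
# {'B1': '20', 'B2': '5000', 'B4': '=(B1*B2)-60000'}
```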
If the spreadsheet is wider than it is tall, consider transposing it. (Create a new sheet, Select All on the old sheet, then use Edit, Copy, and Edit, Paste Special, Transpose.) This will tend to put more of the model within the eye span. After transposition, long labels at the top, which may span multiple rows, will fit more neatly into one column at the left. Check that the model still produces the correct numbers.
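Paste Special, Transpose does the bookkeeping for you. To see what it has to do, here is a Python sketch on a toy dictionary model: the cell at column c, row r moves to column r, row c, and each reference in a formula is swapped the same way. It assumes relative references only and refuses formulas containing $, since anchored references are a complication this sketch does not model. Transposing twice returns the original, which makes a handy self-check.

```python
import re

REF = re.compile(r"(?<![A-Za-z0-9_])([A-Z]{1,3})(\d+)(?![A-Za-z0-9_(])")


def col_to_num(letters: str) -> int:
    n = 0
    for ch in letters:
        n = n * 26 + ord(ch) - ord("A") + 1
    return n


def num_to_col(n: int) -> str:
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def swap_address(col_letters: str, row_digits: str) -> str:
    """Column c, row r -> column r, row c."""
    return f"{num_to_col(int(row_digits))}{col_to_num(col_letters)}"


def transpose_sheet(sheet: dict) -> dict:
    new = {}
    for addr, text in sheet.items():
        if text.startswith("="):
            if "$" in text:
                raise ValueError("this sketch handles relative references only")
            text = REF.sub(lambda m: swap_address(m.group(1), m.group(2)), text)
        col, row = re.fullmatch(r"([A-Z]+)(\d+)", addr).groups()
        new[swap_address(col, row)] = text
    return new


wide = {"A1": "Year", "B1": "2024", "C1": "=B1+1",
        "A2": "Sales", "B2": "100", "C2": "=B2*1.1"}
print(transpose_sheet(wide))
# {'A1': 'Year', 'A2': '2024', 'A3': '=A2+1', 'B1': 'Sales', 'B2': '100', 'B3': '=B2*1.1'}
```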
Edit labels aggressively to eliminate acronyms and abbreviations, yet still make the labels shorter and neater. Sentence case, rather than upper or title case, will allow more text to fit in a cell.
Remember, at every point, make sure that the numbers stay the same. If the numbers in the modified spreadsheet are different from the numbers in the original, you have probably made a mistake. Undo!
Step 8. Format numeric cells for description.
Here, consistency is a powerful ally. Make sure all cells in units of currency are formatted as currency. Make similar numeric cells display the same number of decimal places, and not more than are numerically significant. Take some time to make sure that every numeric cell has the proper numeric format and alignment.
So part of auditing a spreadsheet is reconstructing it. It is an enormous amount of work, and the satisfaction at the end is roughly comparable to the satisfaction one feels after editing someone else's writing. It is not even as pleasant as mopping the basement. But now you probably have a much better idea whether that $200 million project will work.