The Art of the Spreadsheet. Copyright 2008 John F. Raffensperger

1. Why is spreadsheet style important?

2. Make your spreadsheet read from left to right and top to bottom.

3. Omit unneeded bytes.

4. Omit unneeded sheets.

5. Organize blocks with care.

6. Attend to blank space.

7. Omit unneeded cells.

8. Format with caution.

9. Show all the information.

10. Spreadsheet errors.

11. How to audit a spreadsheet.

12. Suggestions for operations researchers.

13. Teaching the art of the spreadsheet.

Appendix. Checklist for a spreadsheet.

How many different types of spreadsheet errors are possible? The mind reels. Honestly, no one knows. The best we can do is give a taxonomy of the known beasts.

I will not cover macros and Visual Basic. These probably belong more in an art of computer programming book. However, avoid gratuitous dialog boxes to capture inputs that can be handled by typing directly into a cell. Prefer Excel's native formulas to custom formulas or macros. Less documentation and explaining is needed when the spreadsheet has fewer complications.

There are several strategies for preventing and discovering errors in spreadsheets.

1. Use a good spreadsheet design, with short arcs of precedence. Format all the numbers properly. Design the spreadsheet for readability and auditability.

2. Use “reality checks.” Given the inputs, use the back of an envelope or a calculator to calculate the outputs, even approximately. Are the outputs at least close to an intuitive value?

3. Check the measure units. For example, if you add apples to oranges, you should be aware that the result is fruit pieces. This is laborious work when done by hand, especially when the formulas have scientific functions (such a logarithms, exponents, or trigonometry).

As of this writing, I am aware of only one developer working on auditing software that will check measure units, but such software should be quite powerful, catching many kinds of errors, even complex logic errors. An automatic check of measure units will catch references to blanks, to labels, and to the wrong cells; it will catch bad logic, and possibly even overwriting formulas with constants.

4. Have someone else check your spreadsheet. You should routinely have someone else check your inputs, even if the model is proved correct.

5. Compare the output to known values. For example, if you are trying to create a statistical function, you may be able to find a table in a book to confirm your calculations.

6. Use shadow modelling. Shadow modelling is where you create the same model with different software, or another person (or team) creates the same model in Excel. Some professional spreadsheet auditors (e.g. Price-Waterhouse-Coopers, Inc.) use an algebraic modelling program to check spreadsheet models.

It makes little sense to use a modelling language to check a spreadsheet that has not yet been implemented - the point of using a spreadsheet is the easy of construction. Using the algebraic modelling program for the application itself would be wiser. However, if a spreadsheet model is widely implemented in an organization, auditing the model with a separate program will be useful.

A couple people are working on software to create Excel spreadsheets directly from algebraic modelling languages. These are interesting developments, and may popularize algebraic modelling, which would be good.

Detection: Always check your inputs. Before making a decision based on the spreadsheet, show the inputs to someone else, separately from the rest of the spreadsheet.

Some numerical errors can be discovered by formatting, e.g. dollars or percents. If the input should be a percentage such as 0.20, either 20 or 0.20 will appear as 20%, which may surprise you, but you will probably notice, especially if you wanted 2,000%. “1000000” has too many zeroes in a row to parse visually, but “1,000,000” is instantly recognizable. One zero too many or too few means the value is wrong by a factor of 10. Format matters.

Prevention: You can use data validation formulas, but these add bulk to the spreadsheet and distract the reader. Remember that the data validation function is hidden, and can be hard to change.

Detection: Always check your inputs. Before making a decision based on the spreadsheet, show the inputs to someone else, separately from the rest of the spreadsheet.

Prevention: For time sensitive data, show the date when the value was entered, immediately adjacent to the data. You can create a formula that displays an error when the value becomes out of date. For example, if the entry date of the value is 1 December 2002, and the value is correct for two months, you could use the following:

A2 = 'Price entered on: | B2 = 1-Dec-02 |

A3 = 'Months price is valid: | B3 = 2 |

A4 = 'Is price valid today? | B4 = NOW() <= B2 + B3*30 |

You understood what the model should be, but accidentally made it wrong. This error is often due to bad layout.

Often, an entire column or row points to the wrong column or row. Almost certainly the arcs of precedence are too long.

Detection: Use Excel's standard auditing tools.

Prevention: Make sure the reference is correct when you create the formula. The long the arc of precedence, the more time you should take to make sure that the correct cells are referenced. If the reference is adjacent, you should check that the reference is correct, and use the auditing toolbar. If the reference is off-screen, you should check twice with the auditing toolbar. If the reference is on a different sheet, you should triple check - literally check three times.

This could be an innocuous “one off” mistake at the beginning or end of a list, e.g. inventory control. The start and end formulas may have to be different.

Detection: Excel XP automatically detects blank references, but earlier versions do not. If you have an earlier version, you can detect blank references with the Visual Basic code in the “How to audit a spreadsheet” section.

Prevention: Design your spreadsheet to have short arcs of precedence, so you can see the precedent cells on the same screen. When you copy formulas, routinely check immediately the first and last formulas in the row or column.

This is similar to pointing to a blank.

Detection: The cell may show an error message (if “transitional formula evaluation” is not checked).

Prevention: Design your spreadsheet to have short arcs of precedence, so you can see the precedent cells on the same screen.

For example, you calculated the area of a pizza as PI()*diameter^2, when area is PI()*radius^2.

Detection: Occasionally check your spreadsheet with the “reality check” test. For example, you can approximate the area of a 12" diameter pizza as something less than if it were square, 144 square inches, but PI()*12^2 = 452.4, which is far too big.

Try a range of numbers. Zero is often interesting, as it uncovers divide-by-zero errors (which is usually an error of logic!). Then try negative numbers. Then try extremely large values, such as 10^10.

Check the measure units. Measure units multiply and divide in an analogous way to numbers. For example, (0.5 kilograms/tile)*(400 tiles) = 200 kilograms. The tile units cancel out. If we were hoping for square meters, we should check the input measure units.

Use shadow modelling. A few large consulting firms have begun using shadow modelling to check their clients' spreadsheet. In shadow modelling, a different person or team writes the same business model as the spreadsheet model. Usually, the shadow model team does not use spreadsheet software, but instead uses an algebraic modelling system. The shadow modelling system then can be used to check the spreadsheet.

Prevention: Write each formula very carefully! Check a formula's output against known values.

Detection: Check against the real world. Validate your model by finding out how closely it predicts the actual system. Compare to someone elses's model.

Prevention: Where possible, start with an existing logical specification from the domain literature or field.

If you accidentally overwrite a formula with a constant, you can detect it by occasionally reformatting all constants (or all formulas), so that constants and formulas are formatted differently. Then the error will show up instantly.

Excel 2002 displays formulas that are different from the surrounding, but this fails when the different cell is a constant.

There are very few true Excel bugs, but sometimes the program does not behave the way we expect.

A construction firm filed a lawsuit against Lotus Corporation because Lotus 1-2-3 did not behave as the analyst expected, and the firm underbid on a big project. The analyst had inserted a row at the bottom of a block, but the sum() under the block did not automatically update to include the new row.

Excel XP has built-in smarts that can sometimes avoid the error. Suppose you wrote the list in cells B1:B6, and then you wrote the SUM(B1:B6) in cell B8. Later, you entered the 56 in cell B7. Excel would automatically modify the formula in cell B8 from SUM(B1:B6) to be SUM(B1:B7).

Detection: check dependent formulas when you make a change.

Prevention: experience. Learn the software.

Some people think that spreadsheets have rounding errors, but in fact, modern spreadsheets work at the precision of the underlying microprocessor. What is displayed may look like a rounding error (usually because a fixed number of decimals are shown), when it fact the numbers are correct.

Spreadsheet errors are caused by using a spreadsheet. The worst spreadsheet error is to use a spreadsheet when you should have used a different kind of software.

The spreadsheet's strengths are that it is (deceptively) easy to write, and data, calculations, and their changes can be instantly viewed on the screen.

A spreadsheet has weaknesses. First, it is easy to make mistakes. Second, it lacks scalability - it is hard to enlarge a database of 2,000 to a database of 200,000. Third, it is hard to automate, especially if it must connect to a corporate information system.

These strengths and weaknesses suggest that spreadsheets are best for applications that are small and informal, where the formulas can easily be checked, where the interest is in seeing immediate changes in the data, and where the reader wants to see all the data and all the calculations.

Here are some cues that you should probably switch your application to some other program.

Raymond Panko, “Applications Development: Finding Spreadsheet Errors,” www.iweek.com, 29 May 1995.- Your spreadsheet is turning into a monster, even though you have tried to be concise. Size is a critical cue, because the typical spreadsheet contains errors in about 4% of the cells. A larger spreadsheet is less likely to be correct.
- You want or need to hide information from the reader.
- You suspect that you will need to enlarge the model or database significantly, or you will need to make the application available to many other people.
- You are trying to build a corporate information system that must scale up.
- Someone has audited your spreadsheet and found many errors.

Be honest with yourself, even though you have invested a lot of time in your spreadsheet. Is this really the right software?

If your firm depends on spreadsheets for critical systems, consider implementing a policy of “spreadsheet sunsets,” which means that each spreadsheet has a planned expiration date, when the spreadsheet should be replaced with a more robust system.