The Art of the Spreadsheet. Copyright 2008 John F. Raffensperger

1. Why is spreadsheet style important?
2. Make your spreadsheet read from left to right and top to bottom.
3. Omit unneeded bytes.
4. Omit unneeded sheets.
5. Organize blocks with care.
6. Attend to blank space.
7. Omit unneeded cells.
8. Format with caution.
9. Show all the information.
10. Spreadsheet errors.
11. How to audit a spreadsheet.
12. Suggestions for operations researchers.
13. Teaching the art of the spreadsheet.
Appendix. Checklist for a spreadsheet.

9. Show all the information.

Thirty-seven ways to hide information in Excel

Why learn ways to hide information in Excel? Because we have many secrets, and we have a Machiavellian paranoia of losing our intellectual capital? No. We learn ways to hide information in Excel so we can audit spreadsheets written by people with a Machiavellian paranoia. Auditing a spreadsheet is about exposing information.

If you want to hide information in Excel, there are many ways to do it. Here are a few of them.

Do not share the file. This is the most common way of hiding information, and the most effective.

Hide the sheet. You need at least two sheets first, then: Format, Sheet, Hide.

Hide the row: Format, Row, Hide.

Hide the column: Format, Column, Hide.

Hide the cell and protect the sheet: Format, Cells, Protection, Hidden, then Tools, Protection. This shows a display, but hides the formula: =if(1, “Peace!”, “Attack at dawn.”).

Make the column too narrow: Format, Column, Width, 0.

For formulas that are likely to be zero, use Tools, Options, View, and clear the Zero values box. For example: =IF(1, 0, “Attack at dawn.”).

Use a formula that returns a blank: =IF(1, “ ”, “Attack at dawn.”).

Create a complicated formula that displays the information, but format it as text (with Format, Cells, Number, Text, or just use a leading quote character), so the formula is displayed rather than the output.

Format the font with Wingdings: Format, Cells, Font, Font, Wingdings. This displays unintelligible characters.

Format the font to be 1 point high: Format, Cells, Font, Size, 1.

Format the font to be 72 points high: Format, Cells, Font, Size, 72, then Format, Row, Height, 12.

Format the font color the same as the background: Format, Cells, Font, Color, then pick white.

Format the cell background the same as the font color, but this is more obvious: Format, Cells, Patterns, then pick black.

Enter 20 leading spaces before the string, then: Format, Cells, Alignment, Indent, 15.

Enter 20 leading spaces before the string, then: Format, Cells, Alignment, Horizontal, Wrap text, then Format, Row, Height, 12.

Enter 20 leading spaces before the string, then: Format, Cells, Alignment, Vertical, Justify, then Format, Row, Height, 12.

Enter 20 leading spaces before the string, then Format, Cells, Alignment, Left, then enter anything in the cell to the right.

Enter 20 trailing spaces after the string, then Format, Cells, Alignment, Right, then enter anything in the cell to the left.

Enter 20 trailing spaces after the string, then Format, Cells, Alignment, Orientation, and select either 0 or 90 degrees, then Format, Row, Height, 12.

To hide a value, format it as time or date, likely to confuse the reader.

To hide a value, fix the number of decimals to a large number, such as 16, then narrow the column. However, the reader will see the telltale “#####” characters.

To hide a value, use custom numerical formatting: Format, Cells, Custom, Type, and enter a string in quotes.

Use conditional formatting: Format, Conditional Formatting, then select Cell Value Is, then “not equal to,” then use an unlikely value such as -100000000, then Format, Font, Color white, or Format, Font, Pattern black.

Use a data validation message: Data, Validation, Settings, Allow Decimal, and Data greater than, then enter an unlikely value such as -100000000, then in Error Alert, insert the secret message “Attack at dawn.” This is especially effective if you only put it in the title. The reader is likely to focus on the text and ignore the title.

Use custom numerical formatting: Format, Cells, Custom, Type, and enter three semicolons to show nothing. This works for strings and numbers.

Split the window or freeze panes with the secret cell off screen.

Use Insert, Comment, add the secret message, then Tools, Options, View, Comments, None.

Use Insert, Name, Define, to name a constant. This constant could be a number or a string.

Add the secret message as a string after a formula, such as =SUM(A1:A10)+“Attack at dawn.”. Then use Tools, Options, Transition, Sheet Options, and check Transition formula evaluation.

Use Tools, AutoCorrect, Replace “password” With “message”. This is especially effective if AutoCorrect is turned off.

Insert a background picture (with Format, Sheet, Background) with the same color as the font color. You can make this particularly disconcerting by using a screen image of Excel (or just a cropped image of a section of spreadsheet) itself as the background image.

Put the data in a Visual Basic function or dialog box and hide the module.

Cover the data with a chart, graphic, or VB dialog box.

Put the data as a constraint in a Solver dialog box.

Put the data as a constant or label in a chart. Optionally format the graph to obscure the data, such as everything white or transparent.

Name a sheet with the secret message, then Tools, Options, View, Window options, and clear the Sheet tabs box.

There are more ways. As spreadsheet programs grow in features, the number of ways to hide information will grow too. If I could use understatement, please avoid such tricks.

 

Put labels on the left

Western languages read from left to right. Your reader is likely to look for the introduction, for the explanation, for the headline, on the left. Then, when he has understood that, he will look for the number on the right. This is the key thing to remember when labeling. The reader expects the label on the left, not a formula.

Nicholas J. Higham, Handbook of Writing for the Mathematical Sciences, 1993, SIAM, p. 25.

Higham recommends, “Avoid starting a sentence with a mathematical expression, particularly if the previous sentence ended with one, otherwise the reader may have difficulty parsing the sentence.” So do not put formulas or data in that special “text only” location, the A column. Label columns, too, of course.

Of course, elsewhere in the spreadsheet, labels belong on the left, too.

A cell has context. It gets its meaning from the cells that it depends on. If the cell is close to those cells, the cell will be in context. Understanding labels, even imperfect labels, is easier, since the labels will also be in context. If you violate the “put the label on the left” rule, or if your label is not expressed very well, your reader will have a lower probability of understanding the cell.

Below left, the constant cell labeled “#Work” is used by a cell on a different sheet. What does “#Work” mean? We want to understand it, but we are at a disadvantage because the cell is not in context. Below right, the cell previously called “#Work” has been moved adjacent to the cell that uses it, and relabeled “Shifts/week.” Even if the reader does not understand the label, he will have a chance at figuring this out, because the cell is near its related cells.

Spell out labels correctly

Tufte, Edward, The Visual Display of Quantitative Information, 1983, Graphics Press, Cheshire, Conn., p. 183.

For some reason, three-letter acronyms and computers go together. Some people call them TLAs (the three-letter acronyms, that is). Acronyms and abbreviations inhibit comprehension. I will cite Tufte (1983), but I suppose if I have to work hard at making a strong case, it is a lost cause. Acronyms and abbreviations require interpretation, even when the reader knows what they mean. Writers of text should rarely abbreviate - neither should writers of spreadsheets.

There is the amusing story of the analyst who gave written instructions A and B, meaning “Do A first, then do B.” The factory supervisor interpreted A as “after” and B as “before.”

“Commonly understood” means everyone knows it, even an 8-year-old child. It is usually better to have the extra space of a wider column or an extra row than to take the chance that the reader will not understand.

We are tempted to abbreviate especially when most of the data are much narrower than the labels. Here, we have single-digit data. Abbreviating the days is motivated by the righteous desire to put more within the eyespan. We can probably get away with days of the week, but probably nothing that is from a particular domain, profession, or art. So use the grade school child test - if a grade school child probably knows the abbreviation, you can use it safely.

Use correct case. The U.S. Navy used to pass coded messages via a system that permitted only upper case. Similarly, Morse code has no case, so messages were decoded in upper case. We occasionally see old movies with URGENT messages, from the military or Western Union, in all upper case. We remember the urgency from the movie, and somehow translate that to thinking that upper case in a spreadsheet will seem urgent and important.

Text is not clearer or easier to read if written in capitals. All upper case is hard to read. It is harder to find the beginning and end of a sentence, and it is not what readers expect.

Tufte quotes Josef Albers,

Tufte, Edward, The Visual Display of Quantitative Information, 1983, Graphics Press, Cheshire, Conn., p. 183, quoting from Josef Albers, Interaction of Color, New Haven, Conn., 1963, revised edition 1974, p.4.
Chicago Manual of Style, 13th Edition, 1982, The University of Chicago Press, Chicago, IL, sections 12.38 and 12.66.

Ophthalmology has disclosed that the more letters are differentiated from each other, the easier is the reading... [W]ords consisting of only capital letters present the most difficult reading -- because of their equal height, equal volume, and with most, their equal width.

The Chicago Manual of Style gives this suggestion:

Capitalization of items in the stub [row heading] of a table is invariably sentence style, and no periods are used at the ends of the items.

Later:

The word TABLE may be set in full caps, but titles and headings should never be: full caps take up too much room and are hard to read.

Besides readability, there is an extremely pragmatic reason to use correct case.

MULTIPLE SHIFT ASSIGNMENT MODEL

takes up more horizontal space than

Multiple shift assignment model.

A shorter eyespan, more differentiated characters, a stronger signal.

Use the spell checker. Spell checking is easy, takes just a few minutes, and is cheap quality control for the readability of your spreadsheet. The spell checker may discover abbreviations that you think are natural and understandable. Most likely, your reader will comprehend your writing more easily if you give in to the spell checker, in a wise way, as a writer of prose would.

You don't need inserted comments

Excel has an Insert Comment feature. An inserted comment is a complement to, not a substitute for, an adequate label. Simply copying the label into the comment is useless, but one or two sentences of fresh explanation will help. Write comments in concise, complete sentences. First write too much information, and then edit out extra words.

The problem with these comments is that they are hidden, except for a small triangle, unless the mouse hovers over the cell. Even the triangle display may be turned off.

If you want to add extra documentation with inserted comments, place them consistently, so that the reader knows where to expect them. The reader will soon notice comments in the top label of every column and the left label of every row. Scattered comments in random cells are hard to find.

But start with good cell labeling, which is not hidden.

Use range names sparingly, if at all

Why is the writer tempted to use range names? There are two reasons. But range names are probably a negative feature.

First, the writer wants the reader to see a “meaningful” formula in the formula bar. But “meaningful” for the writer can mean “hidden” for reader. Since range names hide the reference, the careful reader (or auditor) must do extra work to find the reference.

The reader's ability to understand the meaning depends mightily on the writer's ability to communicate, on the writer's care in selecting good names, and even the writer's ability to spell. If the reader finds the names “monthly_houses_built” and “monthly_huses_built” in the same sheet, there will be a felt need to compare them.

Range names tend to have more characters than a cell address, making formulas longer, as in the monster, Owner_s_Required_Rate_of_Return_on_Equity. Which takes more time, to type out this long name, or to click on the correct address? Also, as we see here, these long range names cannot be viewed in the address box at top left.

If the same range name appears on different sheets, which one is the “real” one? We might guess that the writer has a spurious reference.

Once the developer has begun using range names, where should he stop? Should every single cell parameter be named, or just cells that are used “often”? To be consistent, many or all cells should be named. Otherwise, the developer has to remember which cells are named and which are not.

So using range names for enhanced meaning is tricky at best.

The other reason the writer is tempted to create a formula without needing to point and click on the address, because that address is far away. The writer does not want to have to remember the cell address, but only the meaning. So he can simply type in “Discount_rate” instead of “J55”. But this is hard to remember, too! Was it “Discount_Rate”? No - let's try “DiscountRate.” It is not as convenient as the writer hoped. The writer can view range names (and jump to a named cell) by clicking the dropdown on the cell address box (or "name box"), just to the left of the formula bar; but this feature is not well known.

This second temptation comes about because the spreadsheet has long arcs of precedence. The formula under construction is far from J55. It would be better to rearrange the spreadsheet so the cell with the discount rate is close to the cells that directly depend on it.

Then there is the range name with which the reader may just disagree, such as Print_Area. Which area is that? Is it the one I want printed, or the one that the writer wants me to print? If I make changes to the sheet, this Print_Area range name is useless.

Range names stay in the sheet, even when we would rather they quietly disappeared after deletion: DiscountRate=Input!#REF!

Range names often become relics, as the defined name is later forgotten, and then refers to a blank, or an apparently random, set of cells.

Range names take a lot of work to change or delete. Deletion is especially hard. To delete a range name, first search and replace the name with the correct address. (Be sure you replace the name with the correct address!) Then use Insert Name Define Delete to get rid of the range name.

Also, range names sometimes collect gibberish characters, as in "INP.

Range names can refer to the wrong range, or the writer may use the wrong name. These are hard to trace. Here, we see that the writer named the wrong cell “Principal.” Writing a formula with “Principal” substituted for “B5” will produce the wrong result. On inspection of the formula, the reader will see #8220;Principal,” and assume this refers to B5, when it in fact refers to C5. This must be checked by pressing F2 (to see the reference cells highlighted in colored borders, assuming the reference is on the same screen), or with the auditing toolbar (assuming it is on the same sheet), or with a very careful inspection of the range name list with Insert, Name, Paste, Paste list.

In editing a spreadsheet, the writer can become confused as to which range a name refers. Inserting a row or column inside a named range expands the named range. However, inserting a row or column adjacent to a named range does not expand the named range - but it expands the formatted area, giving the impression that the named range has expanded!

Auditing requires the auditor to prepare a list of range names, then to review the list carefully to make sure the names match the proper cells. If range names were not used in the first place, this step could be eliminated. Range names are convenient for the writer, but not for the auditor or the reader. Rather than making understanding easy, range names add an extra layer of abstraction.

There is a perverse way - no, a maniacal way - to misuse range names: by naming a constant. Here, “discount_rate” does not name a range. It names the number 0.09. Now the number 0.09 is hidden inside the Define Name dialog box! The writer should have put the discount rate in a clearly labeled cell. And the Trace Precedents button will do nothing.

Range names can be confused with the natural language interface. Here is an example. Without defining a name, type these labels in cells A1:A3:

a
b
d

then in cells B1:B3, type:

2
4
5

In cell B5, type the formula

=a+b+d

You get the answer 11. These are not defined range names. The feature uses Excel's natural language interface.

Now, there are some mighty interesting things here. First, you can not use “c”. You can not even name a range “c”. (Excel uses “c” to mean the current cell.) That's okay, let it go.

Second, you can put the same information on the same worksheet in columns D and E, with the same labels, different numbers, and the same formula, and you get the correct result from columns D and E. The same formula appears, E5=a+b+d. Excel works the way it should and the way superstar Excel developers might expect, but this is really going to throw off a reader who is not brilliant with Excel!

Third, define a range name, “d”, say, cell G1. Type 100 in that cell. The natural language feature automatically updates the natural language formulas, surrounding “d” with single quotes, so the two “=a+b+d” become “=a+b+'d'”. Excel does not confuse the defined range name “d” with the natural language “d”. But now we have absolutely lost our reader.

You might respond that the parameters a, b, and d should appear only once in a spreadsheet. Maybe. If you're going to copy worksheets to use the same formulas for different data, that problem still occurs, though it is not so bad. The confusion with the range name still occurs. Only the plain address is authoritative.

Fourth, go to cell D8, type “=d”. Wham, Excel is lost, you get an error. For C8=d, you get zero, no error.

Fifth, if you change the label of A3 from “d” to “e”, the natural language interface automatically adjusts the formula, so cell B5=a+b+e. That's great, but it's harder to remember a name that can be changed so easily.

The need for range names goes away if the writer uses good layout and short arcs of precedence. Plus, the reader can visually see the result, on screen, in real time, when he makes changes. If we use a spreadsheet as simply a place to allocate memory for a whole lot of constants and formulas, with no intention whatever of having the reader actually look at it or interact with it, then we can do what we like. If we use a spreadsheet like a spreadsheet, we don't need range names.

Never hide cells.

A spreadsheet with hidden cells is perverse, because hidden cells are effectively blank and inaccessible dependents. The logic is literally hidden. Do not write in invisible ink.

Here, cell I32 is formatted as hidden. There is a formula in the cell with “Surplus = $1 Million”, but the formula will not appear in the formula bar at the top. Auditing - or understanding - is impossible.

Hiding cells often goes hand-in-hand with password protection. Besides protecting confidential data, writers password-protect a spreadsheet for a couple different reasons.

A worst case is where a writer tries to “prove” a point (such as the financial viability of a major project) with a spreadsheet in which formulas are hidden and password protected. Unfortunately, hiding cells or preventing changes in a spreadsheet can feel patronizing to the reader and reduces the reader's confidence in the spreadsheet. If the reader takes the time to audit a spreadsheet with hidden cells, the model cannot be proved correct, because he cannot see the formulas.

Writers sometimes want to prevent a reader from changing part or all of the spreadsheet (especially formulas). The concern is that the reader may wreck the spreadsheet, and this can happen. Again, the reader may feel patronized. The writer assumes the reader cannot improve the work. Preventing derivations from the spreadsheet diminishes its utility.

Protection to prevent changes should be done sparingly. If you protect a spreadsheet just because some of your readers tend to change things, there is no reason to also hide formulas.

In Excel, see Tools, Track Changes.

Excel allows the writer to password protect the spreadsheet in such a way that he can track changes made by others in a shared spreadsheet. Tracking changes is good. Hiding information that others deserve to see is bad.

If you want to prevent changes, teach your readers what they need to know. Use the change-tracking feature to find out who has changed the spreadsheet, and then take the time to help that particular person learn about spreadsheets.

 

Take the time to get it right

Just as writing requires editing, a spreadsheet sometimes must be rewritten two or three times, as we find that we have misoriented data types, understood the problem better, or found a more succinct way to express the model. If the spreadsheet is for someone else's consumption, take the time to make it right and make it readable.

Stewart, William, and Flanagan, Jack, “Spreadsheet Design: Some Simple Principles,” Australian Accountant, Dec. 1987; 57(11): 56-59.

Try to make your spreadsheet resemble a system that your reader already knows. The reader may already be familiar with an existing or related form; the spreadsheet may be automating or extending an existing system (Stewart and Flanagan 1987). Try to give your reader instant recognition when they see your spreadsheet.

Show it to someone else in printed. Can they make sense of it without seeing it on screen? If they can, then you have probably produced a good spreadsheet. If they cannot, you may have more work to do.

But don't try to “idiot-proof” the spreadsheet. Writers try to “idiot-proof” a spreadsheet to make it easy for the reader. They want the reader to simply enter the data and admire the output, without the need to understand the logic. The assumption is that the reader (1) has no interest in the logic, (2) could not understand the logic even if he were interested, (3) or might break the spreadsheet.

Strunk, William, and White, E.B. 1979, The Elements of Style, third edition, MacMillan Publishing Co., Inc., New York, p. 70.
Tufte, Edward 1983, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Conn., p. 81.

Strunk and White (1979) said, “No on can write decently who is distrustful of the reader's intelligence, or whose attitude is patronizing.”

Tufte (1983) wrote “Contempt for graphics and their audience, along with the lack of quantitative skills among illustrators, has deadly consequences for graphical work: over-decorated and simplistic designs, tiny data sets, and big lies.”

Idiot-proofing, a heinous attempt to simplify, involves lots of formatting, heavy lines around the inputs, many colors, an input sheet, a summary sheet, password protection, etc. In short, idiot-proofing means creating a monster.

Instead, have the fewest cells necessary to produce the result, flow the logic from top to bottom and left to right, and put related cells close together. And put it all on one sheet. “Keep it simple” could be restated as “Keep it small.”

Here are two spreadsheets that model the same information. One has 5 sheets and 11 screens. The other is on 1 sheet, 3 screens. Which would you rather use and audit?