The Art of the Spreadsheet. 8. Format with caution.. Copyright 2008 John F. Raffensperger
1. Why is spreadsheet style important?
2. Make your spreadsheet read from left to right and top to bottom.
3. Omit unneeded bytes.
4. Omit unneeded sheets.
5. Organize blocks with care.
6. Attend to blank space.
7. Omit unneeded cells.
8. Format with caution.
9. Show all the information.
10. Spreadsheet errors.
11. How to audit a spreadsheet.
12. Suggestions for operations researchers.
13. Teaching the art of the spreadsheet.
Appendix. Checklist for a spreadsheet.
Rule 1 of formatting is “Don't format.” Rule 2 of formatting is “Format to give your reader information.” If in doubt, use Rule 1. This is a little tongue in cheek, but only a little. Err on the side of Rule 1.
Above, the writer lost control of the formats. He was attempting to increase signal, but managed only to increase noise. Below is the same spreadsheet, reformatted.
A format is descriptive if it displays information and if the reader already knows the format. A format is decorative if it displays no information. Decorations are almost always noise.
Writers format cells for various reasons: to get attention, to try to look professional, from perceived necessity (they feel they have to format), to encode data types (“blue means interest rates, purple means dollars, green means widgets”) and to make a cell self-descriptive. These types of formats overlap somewhat, and are variously decorative or descriptive. The worst are cryptic; the best are crucial.
When a writer tries to decorate, he should follow rules of style for those elements. If you are a trained graphic artist and you know your way around a quantitative model, good on you, let's see some well-decorated spreadsheets. But most spreadsheet writers do not have a degree in graphic arts, and should not attempt to decorate a spreadsheet. It is a big job just to get the numerical parts right, and the numerical parts are the most important.
Use formats to get attention or to look professional very sparingly. To look professional, strive for a clean look rather than distracting big titles and heavy colors. Decoration in a spreadsheet is like crayon in a business letter. Format to explain to the reader, not to impress the reader. To get attention on a few specific cells, use a logical layout and concise labels.
Here, the colorful spreadsheet at left is overwhelming. There is so much formatting, that the formatting is meaningless. Would you write a business letter like this? In the monochrome sheet, it has been calmed down, simply by turning off most of the formats. It's a relief to get rid of the noise. Isn't it amazing how much a spreadsheet can be improved by simply turning off the formatting? Also, we have information - the cells in gray are constants, and this key is noted at the top (the rest are labels or formulas). Why does the Formulas sheet have constants? Shouldn't constants be on the Inputs sheet? So we see that a format should instantly tell us something useful, something about the data or the structure of the spreadsheet. This last one is by no means perfect. It still has spacing and alignment problems, but now those problems are obvious.
Diminish ink. Do careful visual editing. Have less clutter. Use grey instead of strong colors and black lines.
Use only one font size. Large fonts take up more screen space than small fonts. Large fonts take time to produce, and they require changes in row heights, which affect the height of the entire spreadsheet. Less information can fit on one screen, within the eyespan.
Under the heading, “Self-Promoting Graphics: The Duck,” Tufte wrote,Tufte, Edward, The Visual Display of Quantitative Information, 1983, Graphics Press, Cheshire, Conn., p. 183.
When a graphic is taken over by decorative forms or computer debris, when the data measures and structures become Design Elements, when the overall design purveys Graphical Style rather than quantitative information, then that graphic may be called a duck in honor of the duck-form store, “Big Duck.” For this building the whole structure is decoration, just as in the duck data graphic.
Tufte's duck is a graphic that consists of so much decoration, that if the decoration were removed, there would be little or nothing left. These can be quite amusing, actually.
The “template” here looks mighty impressive.
But killing the formats demonstrates that we have a duck, seven calculations on eight inputs. In the calmer sheet, we now can see lots of blank space that could be deleted.
Now we can clean it up.
When word processors first allowed anyone to print with a multitude of pretty fonts, everyone did. The result of lots of pretty fonts was pretty ugly. One might even say the result was font ugly.
You can kill all the ugly formats at once by selecting the tab at the top left of the spreadsheet (above the row number 1 and left of the column letter A), then select Format Cells. Now make all fonts the same, say Arial, 10 points, Regular, automatic color. Turn off all borders. Make the background pattern automatic, too. Voilá! A better spreadsheet.
Avoid multiple styles in one cell, e.g. bold and italic, and use these rarely.
Underscored spreadsheet text is ugly.
For the writer, default formats take no time to apply. For the reader, the default format is best in most cases.
Rarely does formatting add information to text. The Chicago Manual of Style prescribes:Chicago Manual of Style, 13th Edition, 1982, The University of Chicago Press, Chicago, IL, section 12.66.
For some kinds of tables, italic or boldface may work for titles; but both are hard to read in large doses and difficult to handle aesthetically so it is better not to use them for column heads.
Avoid using many colors. There are two reasons. First, readers may be colorblind. (For some reason, many writers discount colorblindness.) The second reason affects everyone.
The ironic problem with color is that it tends to prevent interpretation via a visual process. Rather, it forces the reader to use a verbal process:Tufte, Edward, The Visual Display of Quantitative Information, 1983, Graphics Press, Cheshire, Conn., p. 183
A sure sign of a puzzle is that the graphic must be interpreted through a verbal rather than a visual process...Color often generates graphical puzzles... Attempts to give colors an order result in those verbal decoders and the mumbling of little phrases.
So color encoding forces the reader to refer to the key, and we end up, not with a visual process, but a linguistic one.
To make matters worse, we each have our own idea of what a given color should mean. In Envisioning Information, Tufte goes on for fourteen pages about problems with color. Here are quotes from pages 81 and 93:Tufte, Edward, Envisioning Information, 1990, Graphics Press, Cheshire, Conn., pp. 83, 93.
The often scant benefits derived from coloring data indicate that even putting a good color in a good place is a complex matter. Indeed, so difficult and subtle that avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm. [Italics in original.]
... Color itself is subtle and exacting. And, furthermore, the process of translating perceived color marks on paper into quantitative data residing in the viewer's mind is beset by uncertainties and complexities. These translations are nonlinear (thus gamma curves), often noisy and idiosyncratic, with plenty of differences in perception found among viewers (including several percent who are color-deficient).
Here, cells with blue text on gray background are decision cells. Cells with black text on gray background are formulas. Cells in purple are constraint formulas. Cells in yellow are constants. Not shown are cells in turquoise, which are formulas not referenced by the main part of the model. The green bar is just a divider. Got that?
Some day, the advice “Write spreadsheets in monochrome” may be as common as “Don't put constants in formulas.”
Format numbers according to their meaning, using widely understood conventions to make information self-descriptive. A format of self-description is a format that gives information that the reader is very likely to know, without any need for a key. Formatting currency with a dollar sign is a format of self-description. Formatting currency in green is not.
If the number is a percent, format it as a percent, so it displays “%”.
Format currency appropriately. If the number is money, format it to display a monetary symbol, such as “$”. Spreadsheet readers who are accountants may be comfortable with negative values in red. However, a cell may be formatted to have red text, possibly confusing even the accountant. Also, red prints as black on a monochrome printer. A negative sign is unambiguous.
All cells of a given data type should display the same number of decimal places - the reader wants a number to self-descriptively indicate its magnitude by its width.David Freeman, “How to make spreadsheets error-proof,” Journal of Accountancy, May, 1996, vol. 181, no. 5, pp. 75-77.
Freeman also recommends using the ROUND() function so that the number displayed on the screen is the same as the number represented in the cell, something sure to give operations researchers heartburn. If you really want to wreck the precision, select Tools, Options, Calculation, Precision as displayed. Information is permanently lost.
However, the motivation to quash precision is sometimes understandable. Spurious decimal places, such as “Sales Forecast for 2006: $7,984,362.21” can be quite amusing. The solution is to format the numbers appropriately and to think a bit about significant digits.
What are significant digits? A digit's significance depends on the accuracy of the values used to compute it. Suppose we carefully measure a stainless steel rod and find it is 1.0000 meters long. Then we cut it about in half with an old hacksaw. Is each piece 0.5000 meters? No, each piece is about half a meter. There are lots of pesky rules about finding significant digits, but generally, this intuitive definition is sufficient; spreadsheet writers have a general idea of the precision of their data. Roughly speaking, your model has as many significant digits as its least precise number.
The millions are significant in “Sales Forecast for 2006: $7,984,362.21.” The hundreds of thousands might be too. But hiding the insignificant hundreds and tens and ones is hard, without resorting to ROUND() or some other unfortunate trick. Verbally, the “Sales Forecast for 2006: $7,984,362.21” would best be reported as “about $8 million,” without resorting to a rounding formula. On screen, assume the best of your reader. Assume that your reader will understand significant digits. But also turn off the pennies places to show that you understand significant digits.
Show cents only if they are significant, otherwise decrease the number of displayed decimal places to zero. Show one decimal place (“$24.4”) if the number is not in ones (e.g., if it is thousands), and label the cell, row, or column appropriately (e.g., with “(000)”). Large numbers (“1000000”) are much easier to read if they are formatted to show separators (“1,000,000”), as shown here.
An important way to cue your reader to quantity is with the width of the number. At a glance,
$7,984,362.21 looks similar in size to
because they have about the same width. To discern the difference, we have to search for the decimal place. By making the number of decimal places the same, the difference in magnitude pops out. In the margin image, the unformatted number with the smallest width is the largest in value.
Write the spreadsheet as though you were preparing a paper document. Think about how the format will look to your audience. Print it out and look at it. Does it make sense printed? If so, then the spreadsheet will probably be clear on screen.
Understanding the spreadsheet printed is an acid test for understanding the spreadsheet on screen. Problems in print will tend to appear on screen, though they may not be as obvious on screen, due to screen clutter and the cell grid. Here, when we can see the grid, we can easily associate each number with its currency symbol. In the middle, we have simply dropped the grid, and the association of number to currency symbol is easily confused. Last, the currency symbol is adjacent to the value, and the confusion is reduced. The crude solution is to print the grid, but that is unnecessary.
On a screen, spreadsheets seem interesting, interactive, alive, useful, full of promise, and authoritative, even to the writer. If they are printed, they lose the flimsy “on television” cachet. They don't glow anymore. They don't often have color. Their formulas are dead. The fact that it was produced On A Computer is lost on the reader, as it should be; the writer has lost all false authority and maybe some false confidence, too.
On a monochrome laser printer, gray usually prints fine, but yellow becomes white. Blue prints as black.
Does the structure and layout of the numbers make any sense? How does the spreadsheet look as a calculator? As a disconcerting experiment, try deleting all the labels from a spreadsheet. Do this with Edit, Go To, Special, Constants, and tick only Text. (Clear the checks from Numbers, Logicals, and Errors.) Then print it out.
If your unlabeled spreadsheet appears to be scattered numbers on a page like the one here, you probably should reorganize it. We see lots of empty space, misaligned columns, and unformatted numbers. Once you have seen how bad your spreadsheet really is, you can delete your experiment and fix the original.
Right justify numeric cells. Justify column headings the same as the numeric cells underneath them - to the right.
The most common format of necessity is the column width. Columns must be wide enough to display the data in each cell, but narrow enough to get a reasonable amount of data on each screen.
Try to make all columns as narrow as possible, subject to two constraints, and with one exception. The first constraint is to give all columns (with one exception) about the same width. The second constraint is to avoid abbreviating labels, unless the abbreviation is well-known. The one exception is the A column, which should contain longish labels.
Why have narrow columns? Here, we turn again to Tufte: “The Shrink Principle has wide application. Graphics can be shrunk way down.” The goal is to increase the data density, put more information within the reader's eye span, while maintaining an orderly and readable spreadsheet. Here, abbreviating the names of the days allowed higher density.
We would like to reduce column widths and reduce blank space, without losing labeling. The fast and ugly way to adjust columns is to select the entire spreadsheet and adjust all columns at once with Autofit. Autofit makes the columns uneven and does not allow labels to run across cell boundaries.
Even or nearly even widths look calmer than uneven columns.
It can be a challenge to write clearly, without abbreviations, and have good labels while keeping reasonably narrow and even column widths.
To make the column widths more even, look for the widest label and improve that.
Try to put row headings only in the A column; this also keeps the label on the left where the reader expects it. Do not be tempted to widen the row - a wide column is easier to read than a thick row. A wide column allows the reader to read primarily from left to right. The Chicago Manual of Style:Chicago Manual of Style, 13th Edition, 1982, The University of Chicago Press, Chicago, IL, section 12.77.
You may be able to save the requisite number of characters by running over some of the stub [row heading] items, by dividing words in the column headings, or by introducing abbreviations. A few characters of excess width may be accommodated by setting column heads one size smaller...In an emergency you can have the type in the column headings set on edge, so they read up the page rather than across, but this is undesirable if any other solution is available, because vertical heads are hard to read.
Why keep all column widths about the same width? There are two reasons.
First, readers try to find meaning in a width. We see a special column width and it jumps out. So we look for information in it. Most of the time, there is none. A column width carries no more meaning than a margin width.
This suggests you should not use thin columns simply as dividers. The first thing a reader will do in a narrow column is make sure it is empty, which is an annoying and unnecessary chore.
Second, uniform columns are more aesthetically pleasing. The spreadsheet looks better, and the eye easily becomes accustomed to the spacing, naturally looking for data where it should be, at even intervals. Give all numbers about the same screen area.
Looks silly in print, doesn't it?Tufte, Edward, The Visual Display of Quantitative Information, 1983, Graphics Press, Cheshire, Conn., p. 162.
Avoid varying row heights. A thick row will tend to force the reader to read primarily downwards rather than from right to left.
But more importantly, line-wrapped labels force information downwards off screen, and make the layout itself a puzzle.
In row 1 here, the wrapped text in C1 and D1 create large blank spaces to the left and right. Similarly, the text in A2 forces the entire row downwards. By keeping the text to one line, as in the bottom example, more information is brought upwards into a smaller eyespan. All that is lost is useless blank space.
In the top example, the eye first focuses on the phrase “Quantity to produce”, then scans right to the ones and the dollars, then zig-zags back to the “Total profit”. Why is “Total profit” all by itself? It uses up an entire row, and it depends on cells above and to the right.
In the top example, is “Materials used” a formula or a constant? From the meaning of the label, it would seem to depend on the materials required, but then that information should be to the right of the requirements. Sure enough, “Materials used” is a formula, so it was moved to the right where it belongs.
The double-line labels and the separate row for “Total profit” force the Materials section so far downwards that the writer felt the need to duplicate the row 1 header in row 6. Double-headers are great in baseball, but are bad layout in spreadsheets!
After some modest editing, the labels have been fixed, and the space is used more effectively. We then erase all the meaningless format, and add meaningful format. The result is a cleaner, quieter, smaller spreadsheet.
For the same reason, do not rotate text (Tufte 1983, p. 183). Rotated text forces the reader to turn her head to read your carefully crafted label. Best to write left to right.
Here, the writer rotated the text because the long names, such as “Bartholemew”, are too long to fit into the narrow column. Rather than have uneven columns, he rotated the text, and set the screen to view at 75%. The problem can be solved transposing the table. Even viewed at 100%, the block is more compact. Column A is so convenient for long labels!
The writer had another reason for rotating the the text - it looks “cooler”, more like an application. Sure enough, the second version looks boring, like a plain old spreadsheet.
Notice that the values of a given data type, such as Days/week, are now much closer. At a screen resolution of 1024x768, in the top example, the Days/week values are 60 pixels apart. In the bottom example, the Days/week values are only 17 pixels apart (even though zoom is at 100%), and comparing Days/week values should therefore be easier. In the top version, comparing values requires visual hopping across the wide blank space. In the bottom version, the values can be scanned more smoothly, top to bottom. The implication is that we should try to align the repetitive data vertically rather than horizontally.
Do not shrink fonts - the reader's display may be small. Your boss may be reading your spreadsheet on a laptop, on the airplane, with the seat in front leaned back.
Spreadsheets with boxes are almost always improved by removal of the boxes. Why?
Tufte favors no grid. Ideally, information should be “transparently organized by an implicit typographical grid, defined simply by the absence of type.”Tufte, Edward, Envisioning Information, 1990, Graphics Press, Cheshire, Conn., pp. 55, 62-63.
Gray grids almost always work well and, with a delicate line, may promote more accurate data reading and reconstruction than a heavy grid. Dark grid lines are chartjunk. When a graphic serves as a look-up table (rare indeed), then a grid may help with reading and interpolation. But even then the grid should be muted relative to the data.
... Careful visual editing diminishes 1+1=3 clutter. These are not trivial cosmetic matters, for signal enhancement through noise reduction can reduce viewer fatigue as well as improve accuracy of readings from a computer interface, a flight-control display, or a medical instrument.
...Unless deliberate obscurity is sought, avoid surrounding words by little boxes, which activate negative white spaces between word and box.
Unfortunately, a spreadsheet's cells are objects. They are not the same as blank, even when empty. So many problems arise, and we must compromise a bit. Especially on screen, a light grid is useful to show us the location of each cell. The grid displays the cells as objects. Anything beyond this is probably chartjunk.
If you need a grid, use delicate lines, the lightest gray possible. On screen, the default gray grid works fine. You may need something slightly darker to print and reproduce. The best possible approach is to layout the table so it needs no grid when printed, but this is difficult graphic art. It is beyond most of us. Use the lightest gray grid reasonably possible.
A helpful structural distinction to make in a spreadsheet is the one between constant input data and formulas. The reader always wants to know which is the “original” data, that is, the assumptions, and which is derived from that. He will want to think about how the formulas work only after discovering the input data.D. G. Conway and C. T. Ragsdale, “Modelling optimization problems in the unstructured world of spreadsheets,” OMEGA, 1997, vol. 25, no. 3, pp. 313-322.
Conway and Ragsdale (1997) wrote, “Use shading, borders and protection to distinguish changeable parameters from other elements of the model.”
Why is it important to distinguish constants from formulas? Constants can be interpreted faster than formulas, so the reader will see obvious constants with relief. The old approach to making this distinction was to put the data in a separate block, but this produces long arcs of precedence, and moves information out of context.
The worst case is input that is text, e.g. “Favorite ball player:” The writer wishes that the reader would type something (“Babe Ruth”) into the adjacent cell. But at first glance, the input field may simply look like a label.
Tufte, Edward, The Visual Display of Quantitative Information, 1983, Graphics Press, Cheshire, Conn., p. 183.
Without a compelling reason, and only if you are very sure that your reader will naturally and automatically know what the encoding means without your needing to add a key, do not try to use color to encode data types beyond distinguishing formulas from constants. Additional encodings will not naturally convey information and can easily become cryptic (Tufte 1983). Instead, use a good block layout and clear labeling to make other distinctions between data types.
Edwards, John S., and Finlay, Paul N. 1997, Decision Making with Computers: the Spreadsheet and Beyond, Pitman Publishing, London.
Edwards and Finlay (1997) distinguish between constant data and formulas by protecting formulas. This prevents accidentally overwriting a formula with a constant. Their argument is legitimate, but formula protection can be irritating, especially if a formula is wrong and password protected. Cell protection is a hidden format - no visual cue indicates protection - so other formatting is required to indicate which cells may be changed. Readers should be informed that the spreadsheet is protected and told how to turn it off.
Try occasionally checking to see which cells are constant. To find all numeric constants in Excel, select Edit, Go to, Special, then check only Constants and Numbers. This highlights all constants at once, and they may be formatted all at once.
Which cells are constant in Assign.xls? The constant cells are the Assignments and Preferences blocks for each ex-president. Here, the constants are formatted gray. My first reaction upon seeing so many constant cells was amazement.
Note that the What'sBest! add-in automatically made the decision cells blue.
With proper formatting, the reader would probably see that Assign.xls is rather simple. Most cells are preference data or decision variables. Should decision variables be formatted as inputs? After all, the solver chooses those values, not the reader or writer. Since the reader may wish to change these to play “what-if,” the decision variables should be formatted as inputs.
In the second spreadsheet, the formulas are gray. The second one is better - there is less formatting overall. So format constants differently than formulas, and format in such a way as to format the least.
This very light gray is not on the default Excel palette. The light gray must be added with Tools Options Colors Modify. Try using the lightest shade of gray that is not white - you will find that you get enough contrast with the least irritation.
For the border, try a medium (40%) gray, which prevents cells from bleeding into one another. Here, I have modified the Chart fills color palette, adding several shades of gray. These colors are available for any fill, not just chart fills.