The Art of the Spreadsheet. Copyright 2008 John F. Raffensperger

1. Why is spreadsheet style important?
2. Make your spreadsheet read from left to right and top to bottom.
3. Omit unneeded bytes.
4. Omit unneeded sheets.
5. Organize blocks with care.
6. Attend to blank space.
7. Omit unneeded cells.
8. Format with caution.
9. Show all the information.
10. Spreadsheet errors.
11. How to audit a spreadsheet.
12. Suggestions for operations researchers.
13. Teaching the art of the spreadsheet.
Appendix. Checklist for a spreadsheet.

12. Suggestions for operations researchers

Sorry, this section is still unfinished. - jfr 25 May 2008

Proper writing of spreadsheets is of special concern to operations researchers and educators. Operations researchers use spreadsheets to produce mathematical models for clients; these models can be large and complicated. Educators should teach good habits to new users and must read students' work. The combination is especially important.

Operations research and management science are now widely taught with spreadsheets, so teachers of mathematical modeling must also teach spreadsheet use. Winston (1996), for example, enthusiastically urges the use of spreadsheets with real world applications in teaching management science. So we operations researchers have a special interest in the art of the spreadsheet.

Some interesting work in operations research done with spreadsheets is that of Neuwirth (1995) ??? REF NEEDED. In his article, he demonstrates how a spreadsheet can give insight into recursive relations for combinatorial formulas. He uses the visual environment to construct formulas visually. The layout on the screen elegantly implies the mathematical relations. In his words,

Using the table approach we also will see that by visualizing certain geometric transformations of the tables under investigation we can derive proofs for conjectures in a very intuitive way.

So some interesting work can be done with spreadsheets to make operations research more understandable. One of the most common application in operations research is optimization. This section offers some suggestions for using solvers and for organizing linear and integer programs.

Using a solver

Solver in Excel

I don't like Solver. In all versions I have seen thus far (through Excel 2002), it has been numerically unstable. But worse -- it hides the model. Instead of having the model in the spreadsheet where we can see it, the model is buried in dialog boxes. We cannot see which cells contain constraints or decision variables without looking inside Solver's gory guts. In fact, we can have constraints and values that do not appear at all as cells, just like defining a range name for a constant as we saw above.

It is possible that an apparently unused input is actually used in a formula hidden in Solver. The reader has no way to tell except by opening and carefully inspecting the Solver Parameters dialog. The arcs of precedence give no indication that Solver is using a reference.

But Solver can hide constants, too. Here we see a mild-mannered spreadsheet with six constants and one formula. But open up Solver and we see all kinds of secret information, hidden formulas and constants. Except for decision variable cells and the objective function cell, it is possible to put an entire linear program inside Solver.

Because so much can be hidden, and because we must dig through dialog boxes, Solver constraints are hard to debug and hard to edit. A large model can be a nightmare, all clicky-clicky through “Add,” “Change,” and “Delete.” Errors are too easy to make. Here are two suggestions, if live with Solver you must.

First, just as you should not put a constant in a formula, do not put a constant (other than the constants 0 and maybe 1) in a Solver dialog box.

Second, use only non-negativity constraints. Put all your constraints explicitly in the spreadsheet, so infeasibility shows up as a negative value in the constraint formula. This way, you can probably put in all the constraints with a single line in the Solver constraints box. Show the model in the spreadsheet. Do not hide it in a dialog box.

In the model above, the Solver constraint “$B$1 >= 3-$C$1” becomes lower model's formula in F4, “SUMPRODUCT($B$2:$D$2,B4:D4)-E4.” It is now identical in structure to all the other formulas, such as the objective function displayed here. The formula can be conveniently copied to cells F3 through F8. So the model is completely exposed in the spreadsheet, and will be much easier to modify.

How did I get this formula? “$B$1 >= 3-$C$1” can be rearranged to “$B$1+$C$1-3>= 0.” Cells $B$1 and $C$1 are the decision cells, which are $B$2 and $C$2 in the new model. Cells B4, C4, and D4 are simply the constraint coefficients: 5*1+2*1+5*0-3 >= 0.

One last thing. Linear programs solve much faster if you select Solver Options and Assume Linear Model.

What'sBest!

What'sBest! (LINDO Systems 1996) is a commercial spreadsheet solver add-in, produced by LINDO Systems, Inc. A demonstration version can be downloaded from http://www.lindo.com. Assign.xls, which has been used in this book, was written by LINDO Systems.

In contrast to Solver, What'sBest! puts everything in the spreadsheet. No part of the model is hidden under a dialog box.

What'sBest! makes decision variables blue, but blue looks black when printed. So I sometimes also make them italic. This way, when notes are photocopied in black and white, my audience can see which cells are decisions.

Here is a tip for working with integer programming. Format integer variables as general, with no fixed number of decimal places. This way, if the model returns a fractional solution, the fraction is displayed rather a rounded (false and infeasible) integer value. The rounded display may look feasible, but is infeasible if it is actually fractional.

Linear and integer programming model structures

A given model can be written in a spreadsheet in various ways. Different spreadsheet structures can inform the writer and reader, and suggest different ways to think about a given problem. That is, after deciding on a structure and building the model, the writer will then be informed by that structure in different ways. By way of example, I will use a nine-city traveling salesman problem (TSP), and solved with What'sBest.

The TSP can be modeled many ways. Here are presented a matrix format, and a record format, and a tableau format. Here is the standard math programming formulation (with hand-waving for the complicated subtour breaking constraints):

Minimize

subject to

, for each i,

, for each j,

subtour breaking constraints,

xij= 0 or 1 for all i, j.

Here is the What'sBest! model. The spreadsheet shows the optimal solution.

The model layout is mostly a matrix form, which is compact and easy to manage. The layout follows the structure of the data and the decision variables.

The subtour breaking constraints, however, must be tacked on separately, almost as an after thought, but this follows the order of the solution method - solve the model, find a subtour, add a subtour breaking constraint, solve the model again. Eventually, we find a feasible tour, which is also optimal. Note that the decision cells on the diagonal are turned off, as non-adjustable by the solver.

Consider a different layout for this problem. Instead of two matrices, we can write the model in list form, with a row for each pair of cities.

Here is our TSP model in list form. The top part of the model is in the top graphic, and the bottom part in the bottom graphic.

Columns E through M contain 1 to indicate an outflow, else 0.

Columns N through V contain 1 to indicate an inflow, else 0. (Of course, these could be converted to constants with no loss.)

Columns W, X, and Y contain the subtour-breaking constraints, with the coefficients in rows 4 to 84, and the constraint formulas in row 86.

This format is much less compact than the above model. It takes at least 3 full screens to see the whole model, in contrast to the matrix form above, which could be viewed in a single screen. Also, the coefficients in columns E through V are a bit annoying. (If What'sBest! supported the SUMIF() function, they would be unnecessary.)

Another drawback to this list format is that a record may accidentally be left out, especially if the From and To fields contained city names rather than numbers. With numbers, we can scan vertically, and a jump from 1 - 4 to 1 - 6 would suggest that 1 - 5 is missing, but this would be much harder if the labels were city names rather than numbers.

However, this list format brings a new way to look at the problem and the solution. All flow constraints can be put in a single row at the bottom. Subtour breaking constraints are naturally and easily handled with a column, rather than tacked on as an afterthought. Best of all, this format allows us to sort the information in a couple different ways.

First off, we can sort by distance. Numerous ideas suddenly come to mind. We could probably leave out the longest distances and still get the optimal solution, or at least a good solution. Do we need symmetry everywhere? Probably not. So if the model were too big to solve, we can easily delete rows and still find a solution. At the bottom, we find all the degenerate links, which can certainly be deleted. After deleting all links that are longer than the average distance, and after deleting the degenerate links, the model fits on one and a half screens, while still giving the optimal solution (though optimality would not be guaranteed in the general case).

A heuristic comes to mind - why not sort in ascending order of distance, then select the smallest reasonable link? We could easily write a bit of Visual Basic to accomplish that for us. This would give us an excellent upper bound for our model.

We can also sort by the solution. If we find a subtour, it is easy to add a column at the right to constrain it away in the next run. Finding such a subtour could probably be done automatically. Also, we can see the solution concisely in a small area of the screen.

So how we display a model in the spreadsheet can affect how we think about it and how we might want to modify it.

A third way to set out our model is to mirror the structure of the conventional format for writing the algebra of a math model. The top is the objective function, and each following line is a constraint.

Some adjustment must be made for a spreadsheet format. In the spreadsheet, the objective function is actually at the far right, calculated with SUMPRODUCT(); similarly for the constraints. I put the What'sBest formulas in the same location as the “=” operator in the algebra, to the left of the right hand side constant of the constraint. However, the What'sBest formulas are not operators, but really the whole constraint. So the mapping from algebraic layout to spreadsheet layout is at best an approximation.

This model is 5 screens wide, much less visually compact than the other two. It is really just the same as the list format we saw before, but transposed, and re-labeled with some operations research jargon. Now, how could it be less compact than the previous model, when it is the same thing transposed? Simple: cells are 38 pixels wider, but only 17 pixels tall. There are no more than 20 columns per screen, but 28 rows. So, as we have seen before, a spreadsheet set out vertically tends to be more compact.

This structure is most appropriate for an operations research course, where the attention is on the model structure itself rather than solving the business problem. The linear programming style format is an abstract structure, but it is very general. Virtually any linear or integer programming model can be structured this way, and the format will be immediately recognized by a management scientist or operations researcher. Use it for teaching linear programming. Use a concrete structure for a business audience.

The decision tree

A spreadsheet handles tables and matrices beautifully, but a tree structure is somewhat more difficult. A decision tree can look fine on paper, and decision tree software can make decision trees look as they do on paper, even inside a spreadsheet. Unfortunately, the result is often visually low-density, spread over many screens. A simple calculation becomes confused into complexity by bad layout.[ADD MORE]

Using spreadsheet images

Put a gray box around the screen shot. Screen shots sometimes contain white that would appear to bleed into the surrounding white paper. A gray box gives some contrast.

Crop screen shots of blocks to show only the important parts. To show a spreadsheet, most of the time the reader is not interested in the title block, or menu bar, scroll bars, or an empty formula block. Crop those out to improve the graphic. This allows you to enlarge the graphic so the interesting part is larger.

Crop edges to be neater. Sometimes clipping just 0.05 centimeters from a side is enough to make a screen shot look much cleaner. Otherwise there can be distracting edge lines.

Put the cursor on an informative formula before you take the screen shot. This displays the formula for your key cell in the formula bar.