Excel Modeling Teaching Guide
Structured Spreadsheet Modelling and Implementation
by Paul Mireault
All rights reserved. This book or any portion thereof may not be reproduced or used in any manner
whatsoever without the express written permission of the publisher except for the use of brief
quotations in a book review.
Second Edition
ISBN 978-0-9948834-4-5
URL: www.ssmi.international
Email: book@ssmi.international
Table of Contents
Table of Tables
Table of Figures
Preface to the Second Edition
Foreword
Part I: Introducing SSMI
Chapter 1 Understanding Spreadsheets
Managing Spreadsheet Risk
Understanding the Spreadsheet Life Cycle
Developing Information Systems
Developing a Spreadsheet
Part I Overview
Part II: Learning the SSMI Methodology
Chapter 2 Understanding the SSMI Methodology
Learning the Components of a Structured Model
Using Modelling Rules
Case Study: Marco’s Widgets
Chapter 2 Overview
Chapter 3 Developing a Simple Model
Chapter 3 Overview
Chapter 4 Implementing a Simple Model
Using Three-Tier Architecture
Using Cell Names
Using Modules
Moving from Formula Diagram to Spreadsheet
Step 1: Data Sheet
Step 2: Model Sheet
Step 3: Interface Sheet
Chapter 4 Overview
Chapter 5 Developing a Repeating Sub-Model
Modelling Without a Repeating Sub-Model
Modelling With a Repeating Sub-Model
Conceptualizing Aggregate Functions
Chapter 5 Overview
Chapter 6 Implementing a Repeating Sub-Model
Step 1: The Data Sheets
Step 1.1: Non-Repeating Data Variables
Step 1.2: Repeating Data Variables
Step 2: The Model Sheets
Step 2.1: The Non-Repeating Model Sheet
Step 2.2: The Repeating Sub-Model Sheet
Step 3: The Interface Sheet
Implementing Aggregate Functions
Exercise: Expanding Regions
Chapter 6 Overview
Chapter 7 Understanding Time in a Spreadsheet
Using the Inventory Formula
Chapter 7 Overview
Chapter 8 Developing a Temporal Model
Marco’s Widgets With Monthly Demand
Temporal Formula Diagram
Chapter 8 Overview
Chapter 9 Implementing a Temporal Model
Implementing a Three-Tier Architecture
Step 1: Entering the Data Variables
Step 1.1: Non-Repeating Data Variables
Step 1.2: Repeating Data Variables
Step 2: Defining the Calculated Variables
Step 2.1: The Model Sheet
Step 2.2: The Temporal Model Sheet
Step 2.1 (Continued): The Finished Model Sheet
Step 3: Implementing the Interface Sheet
Implementation Sequence
Chapter 9 Overview
Part II Overview
Part III: Learning Advanced SSMI Topics
Chapter 10 Modifying a Model
Modifying the Formula Diagram
Modifying the Spreadsheet
Chapter 10 Overview
Chapter 11 Using Advanced Modelling Techniques
Modelling Special Cases
Using an Indicator Data Variable
Using a Distribution Data Variable
Using an Allocation Data Variable
Using a Calculated Indicator Variable
Using a State Indicator Variable
Using State Variables
Example: Using State Indicator Variables
Looking Forward in Time
Chapter 11 Overview
Chapter 12 Managing Spreadsheets
Checking for Errors
No Change
Wrong Direction of Change
Wrong Magnitude of Change
Negative Values
Extreme Results
Using Model Management Formulas
Input Validation Formulas
Model Validation Formulas
Unit-Changing Formulas
Protecting Worksheets
Chapter 12 Overview
Part III Overview
Conclusion
Appendix A Excel Toolbox
Naming Cells
Naming Single Cells.
Naming Rows or Columns.
Naming Ranges.
Using Names
Formatting Cells
Variable Definitions
Numbers
Input Variables
Conditional Format
Rounding
Using Date Arithmetic
Date System
Date Display
Building Text Strings
Table Lookup
Using the LOOKUP Function
Using the MATCH Function
Using the INDEX Function
Appendix B Glossary
Table of Tables
Table 2-1 Example of a Formula List
Table 2-2 Simplicity Rule Examples
Table 3-1 Complete Formula List, Simple Model
Table 5-1 Formula List, Marco’s Widgets South Region
Table 5-2 Formula List, Repeating Model
Table 8-1 Distribution of Annual Demand
Table 8-2 Temporal Model
Table 8-3 Formula List, Temporal Model
Table 10-1 Formula List, Revised Temporal Model
Table 11-1 State Indicator Variables
Table of Figures
Figure 1-1 Information System Analysis, Design and Implementation
Figure 1-2 Physical Model, SQL Statements
Figure 1-3 Logical Model, DSD of a Cruise Ship
Figure 1-4 Conceptual Model, UML Diagram of Cruise Ship
Figure 1-5 Spreadsheet Analysis, Design and Implementation
Figure 1-6 SSMI Process
Figure 2-1 Example of a Formula Diagram
Figure 2-2 Two Modelling Alternatives, Normal View
Figure 2-3 Two Modelling Alternatives, Formula View
Figure 2-4 Diagram of a Complex Formula
Figure 2-5 Diagram With Simple Formulas
Figure 2-6 Single Complex Formula vs. Many Simple Formulas
Figure 2-7 Spreadsheet With an Error
Figure 2-8 Relationship between Demand and Price
Figure 3-1 Input and Output Variables
Figure 3-2 Introducing the Data Variables
Figure 3-3 Defining Demand Using the Forward Approach
Figure 3-4 Defining Profit Using the Backward Approach
Figure 3-5 Defining Total Cost Using the Backward Approach
Figure 3-6 Defining Variable Cost Using the Backward Approach
Figure 3-7 The Final Diagram
Figure 4-1 Excel Sheet Names
Figure 4-2 Implementing a Module in a Block Structure
Figure 4-3 Step 1, Defining the Data Variables
Figure 4-4 Step 1, Naming the Data Variables
Figure 4-5 Step 2, Referencing the Variables Used In a Formula
Figure 4-6 Step 2, Defining the Calculated Variable
Figure 4-7 Step 2: Naming the Calculated Variable
Figure 4-8 Step 2, Including a Border
Figure 4-9 Step 2, Formula View of the First Block
Figure 4-10 Step 2, Repeating the Block Structure
Figure 4-11 Step 2, The Finished Model
Figure 4-12 Step 2, Formula View of the Completed Model Sheet
Figure 4-13 Two Ways of Representing a Subtraction
Figure 4-14 Step 3, Entering Input Values
Figure 4-15 Step 3, Input Values for Data Sheet
Figure 4-16 Step 3, References For Output Values
Figure 4-17 Analysis of a Single Scenario
Figure 5-1 Region-Independent Variables
Figure 5-2 Region-Dependent Variables
Figure 5-3 Calculating Revenue South
Figure 5-4 Calculating Demand South
Figure 5-5 Calculating Variable Cost South
Figure 5-6 Calculating Total Cost South
Figure 5-7 Calculating Fixed Cost South
Figure 5-8 Calculating Unit Cost South
Figure 5-9 Formula Diagram Without a Repeating Sub-Model (Three Regions)
Figure 5-10 Formula Diagram With a Repeating Sub-Model
Figure 5-11 Formula Diagram With Total Profit Added
Figure 6-1 Step 1.1, Naming Single-Value Data Variables
Figure 6-2 Step 1.2, Multiple-Value Data Variables
Figure 6-3 Step 1.2, Naming Multiple-Value Data Variables
Figure 6-4 Step 1, Verifying Data Variable Names
Figure 6-5 Step 2.1, The Model Sheet
Figure 6-6 Step 2.2, Setting Up the Repeating Entity
Figure 6-7 Step 2.2, First Block of the Regions Sheet
Figure 6-8 Step 2.2, Defining Regional Demand
Figure 6-9 Step 2.2, Formatting the Variable Row
Figure 6-10 Step 2.2, Naming the Variable Row
Figure 6-11 Step 2.2, Complete Model for One Region
Figure 6-12 Step 2.2, Selecting the Full Model of One Region
Figure 6-13 Step 2.2, Copying the Full Model
Figure 6-14 Step 3, Entering Input Values
Figure 6-15 Step 3, Referencing Input Values in the Data Sheet
Figure 6-16 Step 3, Reference Formulas For One Region
Figure 6-17 Step 3, Output Variables For All Regions
Figure 6-18 Setting up the Calculation of a Non-Repeating Variable
Figure 6-19 Aggregate Formula for Defining a Non-Repeating Variable
Figure 6-20 Naming a Variable that Uses an Aggregate Function
Figure 6-21 Referencing a New Output Variable
Figure 6-22 Adding a New Region to the Data Sheet
Figure 6-23 Adding a New Region to the Regions Sheet
Figure 6-24 Adding a New Region to the Model Sheet
Figure 6-25 Adding a New Region to the Interface Sheet
Figure 7-1 Inventory Model, One Month
Figure 7-2 Inventory Model, Three Months
Figure 7-3 Inventory Model With Split Variables
Figure 7-4 Inventory Model in a Repeating Sub-Model
Figure 7-5 A Three-Month Inventory Model
Figure 8-1 Variables Carried Over From the Non-Temporal Model
Figure 8-2 Calculating Monthly Demand
Figure 8-3 Inventory Variables and Formula
Figure 8-4 Calculating Inventory Cost and Sales Cost
Figure 8-5 Calculating Variable Cost, Option 1 of 2
Figure 8-6 Calculating Variable Cost, Option 2 of 2
Figure 9-1 Structured Model
Figure 9-2 Sheet Names of the Temporal Three-Tier Architecture
Figure 9-3 Step 1.1, Entering Single-Value Data Variables
Figure 9-4 Step 1.1, Naming Single-Value Data Variables
Figure 9-5 Step 1.2, Naming the Initial Values of Multi-Value Data Variables
Figure 9-6 Step 1.2, Defining the Repeating Entity and its Data Variables
Figure 9-7 Step 1.2, Naming the Repeating Entity and its Data Variables
Figure 9-8 Step 1.2, Validating a Multi-Value Data Variable
Figure 9-9 Step 2.1, Starting the Model Sheet
Figure 9-10 Step 2.2, Starting the Months Sheet
Figure 9-11 Step 2.2, First Block of the Temporal Model Sheet
Figure 9-12 Step 2.2, Defining the Inventory End Formula
Figure 9-13 Step 2.2, Initializing a (t–1) Variable
Figure 9-14 Step 2.2, Preparing the Inventory Beg Formula
Figure 9-15 Step 2.2, Defining Inventory Beg
Figure 9-16 Step 2.2, Finishing the Inventory End Block
Figure 9-17 Step 2.2, The Full Temporal Model
Figure 9-18 Step 2.2, Formula View of the Full Temporal Model
Figure 9-19 Step 2.2, The 12-Month Model
Figure 9-20 Step 2.1 (Continued), Defining Result Variables
Figure 9-21 Step 2.1 (Continued), Setting Up the Variable Cost Formula
Figure 9-22 Step 2.1 (Continued), Reference Formulas for all Periods
Figure 9-23 Step 2.1 (Continued), Defining Variable Cost
Figure 9-24 Step 2.1 (Continued), The Completed Model Sheet
Figure 9-25 Step 2.1 (Continued), Formula View of the Completed Model Sheet
Figure 9-26 Step 3, Input Variables in the Interface Sheet
Figure 9-27 Step 3, Input Variables in the Data Sheet
Figure 9-28 Step 3, Output Variables in the Interface Sheet
Figure 10-1 Starting Formula Diagram
Figure 10-2 Adding Sales
Figure 10-3 Influence of Sales on Sales Cost and Inventory End
Figure 10-4 Calculating Monthly Revenue
Figure 10-5 Defining Sales
Figure 10-6 The Final Formula Diagram
Figure 10-7 Inserting Rows for a New Block
Figure 10-8 Definition Block for Quantity Available
Figure 10-9 Definition Block for Sales
Figure 10-10 Definition Block for Monthly Revenue
Figure 10-11 Modifying Definition Blocks, Inventory End and Sales Cost
Figure 10-12 Formula View of the Modified Model
Figure 10-13 Copying the Modified Month in All Time Periods
Figure 10-14 Deleting the Old Total Revenue Definition Block
Figure 10-15 Inserting a New Total Revenue Definition Block
Figure 10-16 The New Total Revenue Definition Block
Figure 10-17 Testing the Modified Model
Figure 11-1 Original Model With Regions
Figure 11-2 Modified Model, Special Tax
Figure 11-3 Modelling With an Indicator Data Variable
Figure 11-4 The Loan Amount Data Variable Definition
Figure 11-5 The Loan Indicator Data Variable Definition
Figure 11-6 The Loan Calculated Variable Definition
Figure 11-7 Distribution Data Variable
Figure 11-8 Scholarship Calculation
Figure 11-9 Modelling an Allocation Data Variable
Figure 11-10 Irregular Case Variables
Figure 11-11 Modelling a Repeating Calculated Indicator Variable
Figure 11-12 Example of State Indicator Variables
Figure 12-1 Using a Unit-Changing Formula
Figure A-1 Naming Single-Value Cells
Figure A-2 Naming a Row
Figure A-3 Naming Columns
Figure A-4 Naming Ranges
Figure A-5 Name Auto-Completion
Figure A-6 The “Paste Name” Dialogue Box
Figure A-7 The “Use in Formula” Icon
Figure A-8 Formula Rounding and Display Rounding
Figure A-9 Discount Table, Approximate Lookup
Figure A-10 LOOKUP Function
Figure A-11 Product List
Figure A-12 LOOKUP by Product Description
Figure A-13 LOOKUP by Product Code
Figure A-14 MATCH Function
Figure A-15 INDEX and MATCH
Figure A-16 INDEX and MATCH (Formula View)
Preface to the Second Edition
This second edition was prompted mainly by feedback I received from spreadsheet developers and
from academics to whom I presented the SSMI methodology.
I did a lot of computer programming during some periods in my studies and my career, and the term
parameter has a very specific meaning to me: it is a value supplied to a program (or a function, or a
module) so it can perform some calculation and produce a result. So, it was natural for me to use that
term to refer to the constants that are used in a spreadsheet. When talking to people who did not have
a programming background, like most spreadsheet developers working in different business activities
like finance and accounting, and like many academics teaching in similar areas, I almost always got
the reaction “Oh! You mean data.”
Since I developed SSMI for the spreadsheet developers, not for programmers, there was no reason
for me to use a term that did not convey the proper meaning. So, I did a major nomenclature change:
parameter became data throughout the book.
I also took the opportunity to correct errors in the text and in some figures.
Foreword
In my 35 years of teaching, I’ve always striven to make sure that my students learn the best
strategies for creating effective, error-free spreadsheets. What interests me most, after my decades of
teaching, is the way spreadsheet literacy is taught—or, often, not taught.
I’ve never liked the way Excel is typically taught in business schools, where lectures are either
template-based or “point-and-click”-oriented. In these types of courses, students rarely understand
how to conceptualize a problem and translate it into a spreadsheet, from start to finish. So, I
developed a methodology that would teach students exactly that: how to use Excel.
(Of course, Excel isn’t the only spreadsheet software available today. But since it’s arguably the most
widely available tool, and the one students are most likely to be familiar with, in this book I focus on
that particular program.)
Whenever I address audiences—whether they’re undergraduates, MBAs or executives—I always ask
them: “How did you learn to use Excel?” Most people say they “learned by doing”—sometimes with
the help of colleagues, who would show them a few tricks. Only a few actually attended classes, and
these were generally short (from a few hours to a couple of days).
Such brief exposure doesn’t make a person very adept at using this tool. Those classes generally
show only what Excel can do—the different functions available—rather than what people really need
to know: how to use the software in a business context.
This approach gives no real instruction on modelling, the often-neglected skill that puts the use of
the tool in context. As a result, the experience is like trying to learn a language by reading a
dictionary. You may learn the meaning of the words (the what), but not the syntax, structure or
subtleties (the how).
The same can be said of books. They may explain all Excel’s features, menus and functions; they may
even describe how to use the program in specific contexts such as finance, accounting, or marketing.
But they do not always offer the business cases that users need.
Another drawback is that they generally present students with a “blank canvas,” or at best only a
limited number of templates—and leave it to readers to figure out how to modify or adapt them to
their specific needs.
My book, however, takes a different approach. Instead of showing readers many examples to teach
them how to develop spreadsheets, it presents a methodology that they can apply to a wide range of
situations and needs.
Excel itself follows precise rules, specifically designed to reduce the possibility of errors—both
during the initial creation of a spreadsheet, and during its later maintenance or modification. My
methodology builds on this foundation to make the process easier and less error-prone.
In my view, two of the major problems that bedevil new Excel users are the illusions of simplicity
and productivity—as I describe below.
The Illusion of Simplicity. One of the reasons Excel is such a powerful and popular tool is its
relative lack of structure. This “free-form” concept often appeals to novice users, who are comforted
by its lack of intimidating detail. But as they become more experienced with the program, they often
impose some structure of their own. They may identify input cells with colour, format the labels that
describe the cells’ content, or separate data from formulas in different areas of their spreadsheets.
In my view, this apparent need for self-imposed structure indicates that the simplicity of Excel is an
illusion. More to the point, this lack of structure hinders the program’s ability to be shared,
maintained and audited.
Let me illustrate this with an analogy. Many industries have adopted codes: rules that guide people
on what the accepted norms are. All kinds of professionals (plumbers and engineers, dentists and
architects, drivers and chefs) are taught the importance of norms, and must follow strict guidelines.
Electricians, for example, must use certain colours to differentiate specific types of wires. Nobody
disputes the benefit of this: all-black wiring would be problematic, even dangerous. The different
colours signal to all electricians exactly what has been done, allowing one person to work on wiring
installed by someone else—even many years later.
The same is true for software programmers, who also often use coding conventions for tasks such as
naming variables. As with electricians, this allows one programmer—perhaps the person responsible
for maintenance—to see at a glance what the original programmer did, and what previous
modifications may have been made. Otherwise, they would have to spend hours trying to understand
the original intent.
This is a major benefit of such standards: they allow work to be handed off. This concept—passing
on a spreadsheet to someone else, either for their personal use or for further development—is one of
the major benefits of the SSMI approach.
I’d like to emphasize this point, simply because—under existing conditions—the handing off of a
spreadsheet from one person to another is often simply not possible. Most users have no training in
using norms, and are usually on their own when defining the terms they need.
And because any spreadsheet they create is based on nothing more than their personal preferences, it
can’t easily be interpreted by others who are unfamiliar with the norms they invented and used. (It
may even become a mystery to the creator, who a few weeks later may have forgotten the original
principles!)
That fact leads to the second common illusion.
The Illusion of Productivity. Most people fire up Excel in order to solve a specific problem—and
then they sit and stare at Cell A1, thinking: “What do I do now?!” They may start by writing a few
labels and formulas. They decide that these don’t do the job; they delete their work; look up some
Help topics; go in another direction; try another approach; find it’s a dead end; and so on.
In fact, a study of the typical spreadsheet creation process found that most participants spent very
little time planning their strategy before launching into the creation process; and they then spent 21%
of their time pausing—presumably to engage in the above activities[1].
Eventually these people either achieve some kind of result, or give up in frustration. But either way,
this is a futile and time-wasting process. Such users are under the impression that they’re being
productive, but that’s an illusion.
So what can be done to disillusion Excel users? I like to think that the best strategy is to learn to use
the SSMI methodology, which I introduce in this book. Quite simply, its goal is to make it simpler for
you to become more productive at creating effective spreadsheets.
I developed the process from proven concepts used in the fields of information systems and software
engineering, and fine-tuned it over twenty years of teaching. The aim of this methodology is to
get users to split spreadsheet development in two: first developing a model, and then
implementing the model as a spreadsheet. This book teaches you to treat those activities as separate,
with the one a necessary precursor to the other.
What exactly does Structured Spreadsheet Modelling and Implementation mean?
Structured Spreadsheet. As I mentioned above, Excel’s relative lack of structure hinders many users
from using the program as well as they might. It limits the extensibility of a spreadsheet, and its
ability to be shared. SSMI methodology focuses on teaching developers how to structure their
spreadsheets. A well-constructed spreadsheet should be quick and easy for people to understand—so
it can be easily handed off to other developers, or understood by colleagues, bosses, clients, auditors,
etc.
Modelling. The basis of the SSMI methodology is the separation of creative and mechanical tasks.
The activity of spreadsheet modelling—the creative process—is how you conceptualize the problem.
The idea is to build the model without even touching Excel. This allows you to focus on the
conceptual process, and relieves you of the need to refer to the software.
Implementation. This is the industry term for translating the conceptual model of the problem into a
spreadsheet. Since effective implementation depends almost totally on effective modelling, the SSMI
methodology renders this mechanical process relatively straightforward.
Using this methodology has many benefits—both for users (the people whose work depends on the
spreadsheets), and even more for developers (the people who create the spreadsheets). When
developers follow SSMI methods, both they and the users find it faster and easier to:
understand what they’re doing when they create a spreadsheet, which reduces the probability
of errors;
review a spreadsheet, and test it for errors;
explain to others how a spreadsheet works;
modify a spreadsheet to adapt it to changing situations;
maintain a spreadsheet throughout its lifecycle;
manage a spreadsheet in an organizational setting.
To cover all this material, and to show you in detail how to become a spreadsheet expert, this book is
divided into three parts.
Part I provides a general overview of the spreadsheet development process (Chapter 1).
Part II introduces a case study: Marco’s Widgets, a fictional company I’ll use throughout much of the
book to illustrate SSMI principles and processes. Part II also explains the three main types of
spreadsheet model, and outlines how to apply the methodology—including how to:
understand basic SSMI concepts (Chapter 2)
develop a simple model (Chapter 3)
implement a simple model (Chapter 4)
develop a repeating sub-model (Chapter 5)
implement a repeating sub-model (Chapter 6)
understand time in a spreadsheet (Chapter 7)
develop a temporal model (Chapter 8)
implement a temporal model (Chapter 9).
Part III looks at more advanced spreadsheet techniques, including how to:
modify a model (Chapter 10)
use advanced modelling techniques (Chapter 11)
manage spreadsheets (Chapter 12).
Finally, the book concludes with Appendices that give extra information and reference material.
Part I: Introducing SSMI
Chapter 1: Understanding Spreadsheets
Managing Spreadsheet Risk
Understanding the Spreadsheet Life Cycle
Developing Information Systems
Developing a Spreadsheet
Chapter 1 Understanding Spreadsheets
How to use this chapter
This chapter is a leisurely read. Pay attention to the importance of the conceptual model, which is a
way of representing your problem without referencing the technology that will be used for the
implementation. This is a key concept of the SSMI methodology.
Managing Spreadsheet Risk
For many people nowadays, the spreadsheet is a work tool they can’t imagine living without. A
powerful and flexible way to accomplish many common tasks, spreadsheets are widely used in a
variety of fields. Despite this ubiquity, however, the sad truth is that most spreadsheets—over 90%,
according to some research—contain at least one error[2].
And all these errors can lead to serious consequences. Depending on the area the spreadsheet is used
in, results might include losses of things such as profits, share value, investor or shareholder
confidence, financial reputations, or even loss of jobs or careers. Other damage might include false
declarations, public embarrassment, overestimation of revenues, underestimation of costs, extra audit
costs, and so on. Spreadsheet errors can cause a whole lineup of horror stories.
Some of the most notorious of these include the following news items.
An accounting error in a financial reporting spreadsheet wrongly valued a British
pension-fund deficit, leading to a £4.3 million write-down of profits (and the resignation of
the company’s CEO)[3].
At the 2012 London Olympic Games, an error in a spreadsheet caused 10,000 more tickets to
be sold than there were seats for a synchronized swimming event—leading to a lot of
frustrated spectators[4].
An error in a decision support spreadsheet caused the loss of $2.4 billion for the financial
giant JPMorgan Chase. The company’s internal review procedure failed to catch the error,
which meant that the difference in rates was divided by their sum, rather than by their
average[5].
An error in a data analysis spreadsheet led some economics researchers to wrong conclusions.
Unfortunately, this was discovered only after their research paper was published, and used as
a basis for government economic policies[6].
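The JPMorgan Chase item above turns on a small piece of arithmetic. As a minimal sketch, using two made-up rates that are not taken from the actual spreadsheet, dividing a difference by the sum of two values rather than by their average gives exactly half the intended result:

```python
# Hypothetical rates, chosen only to illustrate the arithmetic.
old_rate = 0.040
new_rate = 0.050

difference = new_rate - old_rate

# Intended calculation: divide the difference by the average of the two rates.
average = (old_rate + new_rate) / 2
intended_change = difference / average

# Erroneous calculation: divide the difference by the sum of the two rates.
erroneous_change = difference / (old_rate + new_rate)

print(f"intended:  {intended_change:.4f}")
print(f"erroneous: {erroneous_change:.4f}")
print(f"ratio:     {erroneous_change / intended_change:.2f}")
```

Because the sum is twice the average, the erroneous figure understates the relative change by a factor of two, whatever the two rates are.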
The consequence of such a high error rate is that many executives now distrust the accuracy of
spreadsheets, and worry that decision-makers are not getting the full benefit from the time and effort
spent building complex models[7]. One result has been an increase in the activity of spreadsheet
verification. Some large businesses now have internal groups dedicated to auditing the organization’s
spreadsheets.
But although this aspect of risk governance and control is important, it’s also often an extremely
tedious and time-consuming task—and hence is sometimes even skipped altogether. This is especially
true when spreadsheets lack consistency. This is why the SSMI methodology insists on building
consistent spreadsheets right from the beginning of their life cycle.
Understanding the Spreadsheet Life Cycle
Most spreadsheets start their lives as productivity tools, developed to meet a specific need. To
illustrate the real-world usefulness of the SSMI methodology, let’s look at a typical example.
David works in the marketing department of a kitchen appliance company, where he’s responsible for
product management. He built a spreadsheet to keep track of the activities in the product lines he
supervises. Because he’s the only user, and he knows its workings intimately, he doesn’t need any
documentation. Any codes or norms are in his own head; any errors affect only him; and any
corrections he makes don’t need to be passed on to others.
However, his spreadsheet makes David so productive that his colleagues (the other product
managers) clue in to the existence of this useful tool, and beg him for a copy. David’s a nice guy,
willing to share, so he spends some time showing them how to use it. As word spreads, and more
people want in on the action, he has to spend more and more of his time training his colleagues and
answering their questions. He decides to document his spreadsheet by writing labels that are more
descriptive.
Colleagues like his tool so much that they start suggesting modifications and enhancements, to make it
more useful to them. David is in a quandary. On the one hand, he derives some satisfaction from being
appreciated by his peers. But on the other hand, the task of improving his creation is taking many
hours of time away from his own work. Unless he can convince his boss to acknowledge the valuable
contribution he is making to the overall organization, his spreadsheet will actually have a negative
impact on his own performance.
Let’s say that David’s spreadsheet does become an accepted business tool, one his colleagues all
depend on. But a few years later, he leaves the company. What happens then? People can continue to
use his program—but unless another expert can be found to maintain and upgrade it, it will gradually
become less and less useful. It may become the kind of tool commonly known to engineers as a black
box: a device that people use without actually knowing how it works. David’s former co-workers
may still plug information into his system, and get results out of it. But unfortunately, nobody knows
exactly how the results were obtained.
In fact, such a story is quite common. Research has shown that the lifespan of the average spreadsheet
is five years, and in that time it’s usually used by twelve different people[8]. Given this fact, it is
important to create spreadsheets that can be understood, maintained and modified by any of the
people who use them during their lifetimes. This means reducing a spreadsheet’s opacity. In this situation,
for instance, if David had created a structured spreadsheet from the beginning, he would not have had
to spend so much time explaining it, and training his coworkers.
To address these issues, it helps spreadsheet developers to be familiar with some basic concepts
from other fields, such as software engineering and computer science. Particularly important is the
field of information systems, which I’ll describe below.
Developing Information Systems
I’ll begin by briefly showing you how information systems are generally developed—the typical
process of analysis, design and implementation.
An information system is the set of tools—computer programs, databases, and processes—that are
needed to make an organization function efficiently. Over the years, Information Systems specialists
have devised a methodology to help Information Technology (IT) developers create systems
that satisfy their users’ requirements. This methodology ensures that the creators fully understand even
complex processes before they begin the actual programming.
In this section, I will describe the three models of the typical development process for information
systems:
the physical model is a set of data tables created with an SQL statement, and managed by a
database management system;
the logical model is represented by a data structure diagram, a description of data tables that
specifies the relationships between them;
the conceptual model is represented with a diagram in Unified Modelling Language.
These three models are illustrated in Figure 1-1. In this simplified view, they are used in succession.
Even though the process appears to be sequential, feedback loops allow for changes to previous
steps. I explain the three models (in reverse order) in more detail below.
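To make the physical model more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (ship, cruise and so on) are hypothetical, loosely inspired by the cruise-ship example named in the figure titles, and are not the book's actual SQL statements.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Physical model: the SQL statements that create the data tables.
# Table and column names are hypothetical.
conn.executescript("""
CREATE TABLE ship (
    ship_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    capacity  INTEGER NOT NULL
);
CREATE TABLE cruise (
    cruise_id INTEGER PRIMARY KEY,
    ship_id   INTEGER NOT NULL REFERENCES ship(ship_id),
    departure TEXT NOT NULL
);
""")

conn.execute("INSERT INTO ship VALUES (1, 'Example Ship', 2000)")
conn.execute("INSERT INTO cruise VALUES (10, 1, '2024-06-01')")

# The relationship between the two tables, which the logical model
# documents, is what makes this join possible.
rows = conn.execute("""
    SELECT cruise.cruise_id, ship.name, ship.capacity
    FROM cruise JOIN ship ON cruise.ship_id = ship.ship_id
""").fetchall()
print(rows)
```

The CREATE TABLE statements play the role of the physical model, while the REFERENCES clause encodes one relationship of the kind a data structure diagram would show.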
Modelling is not an easy task for the developer. It’s a creative process of linking variables with
formulas that compute their values. To do this successfully, you must have a certain
facility with math and logic.
You should also be familiar with the principles of the specific field you’re working in. For instance,
if you’re building a model designed to make decisions in the field of business management, it’s
important to know about the concepts of managing inventory, sales force, human resources, etc.
While input and output variables are usually fairly easy to determine, the calculated variables require
some imagination. This is not necessarily the kind of artistic imagination you need to write a book or
paint a picture. Rather, it’s a more structured way of thinking that can visualize the relationships
between variables, and represent them as precise formulas.
Later in this chapter, I’ll introduce a case study that will be used as a step-by-step demonstration of
the SSMI methodology. My goal is to help you to understand the fundamentals by using a relatively
simple business situation, with common problems. I’ll explain how to create spreadsheets for these
models, which can be used for more efficient decision-making. This process can then be scaled up to
deal with larger and more complex models.
Learning the Components of a Structured Model
The modelling process we’re about to develop has two major components: a Formula List, and a
Formula Diagram.
The Formula List is a definition of all the variables, giving their names and their values or
mathematical formulas. These variables are categorized into four types:
data variables
input variables
calculated variables
output variables.
An example of a simple Formula List is shown in Table 2 1.
Table 2 1 Example of a Formula List
The Formula Diagram is a graphic representation of the variables and their relationships. Figure 2 1 (a
copy of Figure 3 7, from the next chapter) shows how a simple Formula Diagram looks.
Figure 2 1 Example of a Formula Diagram
I define all these types in more detail below; but first, a note about the formatting you’ll see in the
following material. To give you a quick visual guide, I’ve set all the variable names and formulas—
everything you can enter into an Excel cell—in a bold and italic font. This means when you see
something in that kind of type, you know it’s a cell entry.
Using Modelling Rules
When a formula contains only a few elements, it’s easy to explain and share with others—and also
easy to maintain or modify.
Mathematically, the same result is obtained with each of the two alternatives shown in Figure 2 2. But
one is more easily comprehensible than the other. In the top left-hand cell are four numbers:
$2,500,000, 13,062, $180 and $4,851,160. Looking at this, a reader might need time to understand
and validate the underlying formula.
By contrast, the right-hand side of the spreadsheet shows the structured implementation created from
the Formula Diagram. Without even having to look at the formula in the cell, each block can be
readily understood.
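As a rough illustration in Python rather than Excel, the same contrast can be sketched with the four numbers quoted above. The variable names are my own, and I am assuming that $4,851,160 is a total cost (fixed cost plus demand times unit cost), which the arithmetic supports; the figure itself may lay the calculation out differently.

```python
# Hypothetical names; the values are the four numbers quoted in the text.
fixed_cost = 2_500_000   # dollars per year
demand = 13_062          # widgets
unit_cost = 180          # dollars per widget

# One compact formula: correct, but the reader has to unpick it.
total_cost_compact = 2_500_000 + 13_062 * 180

# Structured version: one operation per named variable.
variable_cost = demand * unit_cost
total_cost = fixed_cost + variable_cost

print(variable_cost)  # 2351160
print(total_cost)     # 4851160
```

Both versions give the same number, but only the second tells you what each intermediate value means.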
The Rental Cost formula is represented on the left side of the table as one big formula, whereas on
the right it has been broken down into five very simple formulas, creating four new variables. At
first glance, it’s much easier to understand the set of short formulas on the right, each restricted to one
operation, than to decipher the complex formula on the left.
The same principle—breaking down a complex formula into several simple formulas to improve
readers’ comprehension—applies to the two diagrams shown in Figure 2 4 and Figure 2 5. Again,
it’s easier to explain the diagram of Figure 2 5 to others.
Figure 2 4 Diagram of a Complex Formula
Figure 2 5 shows a diagram with simple formulas, which facilitates its implementation as a
spreadsheet. This deconstructed model gives a spreadsheet developer (either the original builder of
the Formula Diagram, or another person) a clearer idea of how to build the physical model.
Figure 2 5 Diagram With Simple Formulas
Once again, looking at the two representations of Rental Cost in Excel, you can see how much
simpler the broken-down formulas are. Figure 2 6 shows how the complex formula and the simple
formulas would appear in a spreadsheet. Readers can quickly understand each calculation on the
right, but would be hard pressed to verify the calculation on the left.
Figure 2 6 Single Complex Formula vs. Many Simple Formulas
Breaking down a complex formula into many simple formulas also has the advantage of making errors
easier to notice and correct. Figure 2 7 shows a spreadsheet that contains an error. Without
comparing the numbers with those in Figure 2 6, can you find the error on the left side? You should
find it easily on the right side.
Figure 2 7 Spreadsheet With an Error
Let’s take these basic principles of conceptual modelling, and apply them to a case study.
Case Study: Marco’s Widgets
Marco is the owner of a widget plant. He specializes in manufacturing platinum widgets, which he
sells to distributors worldwide. Each widget costs $180 to make, and Marco’s plant has a fixed cost
of $2,500,000 per year. In an attempt to increase his market share, Marco hired a marketing specialist
to evaluate the potential demand for his product. She found that the relationship between demand (D)
and average selling price (P) could be approximated by the following formula:
D = 376,000 × 1.009^(-P)
This relationship is illustrated in Figure 2 8.
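To get a feel for this relationship, here is a minimal Python sketch of the demand formula. The function name and the sample prices are my own choices for illustration; only the constants 376,000 and 1.009 come from the case.

```python
def demand(price: float) -> float:
    """Demand approximated by D = 376,000 * 1.009^(-P)."""
    return 376_000 * 1.009 ** (-price)


# Sample prices chosen for illustration only.
for price in (300, 375, 450):
    print(price, round(demand(price)))
```

At a price of $375, the result rounds to 13,062 widgets, and demand falls as the price rises.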
The Formula Diagram is designed to help you clearly visualize the variables in the Formula List. As
you can see in Figure 3 7, each element of the Formula Diagram is shown in a specific way:
Data Variables are represented by triangles,
Input Variables are represented by squares,
Calculated Variables are represented by circles,
Output Variables are represented by ovals.
To indicate the relationships between variables—when one is part of the formula of another variable
—the diagram uses arrows. Every calculated variable should have at least one incoming arrow.
Creating a Formula Diagram is a skill, and practice and experience are important. The following
seven steps illustrate some basic techniques to help the process.
In the example shown in Figure 3 1, based again on Marco’s situation, you can see that the input
variable (Price) is on the left side of the diagram, and the output variable (Profit) is on the right side.
This means you must now determine which variables and formulas—represented by the question mark
—will eventually link the former to the latter. Identifying these is the creative task of the conceptual
model.
When you draw arrows from each of the three variables used in the definition formula to the variable
being defined, you see the diagram shown in Figure 3 3.
Let’s add the two new calculated variables to the Formula Diagram, shown in Figure 3 4.
With this information, you can add one new variable and two arrows, as illustrated in Figure 3 5.
Finally, you get to actually create a spreadsheet! First, though, I want to talk about some development
strategies, borrowed from best practices in the fields of information systems, software engineering
and computer science. There are three of these, described in detail below.
Using Three-Tier Architecture
Using Range Names
Using Modules
Using Three-Tier Architecture
This concept was developed to improve the performance and management of information systems. It
consists of separating the three main aspects of systems; building them separately; and connecting
them with the appropriate relationships.
These three parts are:
the interface, the interactive tool that allows people to use the system;
the application, the programs that implement the system’s business logic;
the services, the application’s auxiliary services—such as getting data to and from a database,
or accessing network resources.
In terms of spreadsheets, this three-tier architecture is achieved with the use of single-purpose Excel
worksheets, each one having a specific task. The three basic sheets you’ll always need are Data,
Model, and Interface.
The Interface sheet is where you enter the input and output variables. Spreadsheet users will
work mainly in this sheet, assigning values to the input variables and examining the results
obtained as output variables.
The Model sheet is where you define the calculated variables, with blocks of simple formulas.
The Data sheet is where you define all the data variables, including the inputs. These values
can be either constants, or references to the Interface sheet, as long as you name them here. (As
well, this sheet should not contain any formulas—except for unit changing or validation
formulas that are not part of the model. This is explained in Chapter 11.)
The creation of these three sheets is illustrated in Figure 4 1.
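If you are more used to programming than to spreadsheets, the same separation can be sketched in Python. This is only an analogy under my own assumptions: the names DATA, model and interface are hypothetical, and the revenue, cost and profit formulas are the usual textbook ones rather than formulas quoted from the case.

```python
# Data tier: named constants only, no model logic.
DATA = {"unit_cost": 180.0, "fixed_cost": 2_500_000.0}


# Model tier: small formulas, each defining one calculated variable.
def model(price: float, data: dict) -> dict:
    demand = 376_000 * 1.009 ** (-price)
    revenue = demand * price
    variable_cost = demand * data["unit_cost"]
    total_cost = data["fixed_cost"] + variable_cost
    profit = revenue - total_cost
    return {"demand": demand, "profit": profit}


# Interface tier: the only place where inputs are supplied and outputs read.
def interface(price: float) -> str:
    result = model(price, DATA)
    return f"Price {price:.2f} -> Profit {result['profit']:,.0f}"


print(interface(375))
```

The point of the sketch is that a change to a constant touches only the data tier, and a change to the logic touches only the model tier.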
In some situations, you may find that sets of variables have similar formulas. The obvious solution
seems to be to give them different names: Profit Region A, Profit Region B and Profit Region C, for
example. But if you use the straightforward modelling technique shown earlier, the resulting model is
unwieldy and difficult to modify. The simpler way is to identify variables and formulas that are
similar, and group them into a repeating sub-model.
Let’s take a look at how a repeating sub-model can reduce complexity. First, let’s develop a model
without this modelling technique. Then you can do it again with a simplified Formula Diagram.
In Marco’s case, he sells his widgets in three different regions: South, East and North-West. He now
wants a model that will show him his profit per region, as well as his total profit. In terms of demand
for his product, he knows that the regions break down as follows:
South: 48%
East: 23%
North-West: 29%.
Previously, Marco had used an across-the-board Unit Cost of $180. But now that he is considering
regions, he can separate that figure into Manufacturing Cost and Delivery Cost: The
Manufacturing Cost is $120, irrespective of region; and the Delivery Cost varies by region:
$50 for the South region
$80 for the East region
$60 for the North-West region.
To calculate the profit for each of these regions, Marco must allocate the Fixed Cost to each region,
with the same distribution as the demand.
Modelling Without a Repeating Sub-Model
Let’s first look at how you would build the model with the methodology shown in Chapter 4.
Beginning with the South region, you start the Formula Diagram with the six variables that are region-
independent—that is, they do not depend on the specific regions. These variables are presented in the
formula list below.
Another important variable is Demand, which is defined with the following formula.
The diagram resulting from adding the region-independent variables is shown in Figure 5 1.
Figure 5 1 Region-Independent Variables
As you can now see, all the other variables from the previous model are region-dependent: that is,
they change according to the region. These variables are added below in the formula list, which also
allows you to define the Profit South formula.
If you leave the South Region and continue to the East Region, you notice that the variables and the
formulas are similar. The only difference is that the variables use the suffix East instead of South.
The same is true of the North-West region, with the suffix N-W. Without using a repeating sub-
model, you would end up with an extremely complex Formula Diagram—as illustrated in Figure 5 9.
Figure 5 9 Formula Diagram Without a Repeating Sub-Model (Three Regions)
This model has obvious shortcomings. The most serious one is that it’s very complicated, and not at
all easy to understand. It also does not scale at all well. Imagine how incomprehensible it would
become if you had to expand it to cover more regions, such as provinces, states or countries! Canada
has 13 provinces and territories, India has 29 states, the USA has 50 states, France has 96
departments—so expanding the model to cover such divisions is completely infeasible.
Another shortcoming of this model is that any modifications (such as adding variables) would have to
be repeated many times—and repetition always increases the risk of introducing errors. For these
reasons, the repeating sub-model is easily the best solution.
Modelling With a Repeating Sub-Model
The advantage of using a repeating sub-model is that it greatly reduces complexity. The key to its
effectiveness lies in naming the variables without region suffixes. Instead of assigning one variable to each
region, you use one variable to represent any region. This allows you to replace the variables
Delivery Cost South, Delivery Cost East and Delivery Cost N-W with a single repeating variable:
Delivery Cost. Following Marco’s calculation of regional delivery costs at the start of this chapter,
you can define this variable as a set of three values: $50 (South), $80 (East) and $60 (N-W).
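In programming terms, a repeating variable behaves like a list with one entry per region. The short Python sketch below uses the delivery and manufacturing costs from this chapter; the variable names are mine, and Unit Cost is taken to be the sum of the two costs, as the split described earlier implies.

```python
regions = ["South", "East", "N-W"]
manufacturing_cost = 120        # single value, the same for every region
delivery_cost = [50, 80, 60]    # repeating variable: one value per region

# One formula, written once, applied to every instance of Region.
unit_cost = [manufacturing_cost + d for d in delivery_cost]

print(dict(zip(regions, unit_cost)))  # {'South': 170, 'East': 200, 'N-W': 180}
```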
When you use the same variable name more than once, you need to somehow distinguish its different
roles. For instance, if you use Demand to represent both a single value and multiple values, you
might rename the two Total Demand and Regional Demand.
Applying the same principle to other repeating variables, you can redefine them as Distribution,
Regional Demand, Delivery Cost, Unit Cost, Revenue, Variable Cost, Regional Fixed Cost, Total
Cost, and Profit.
In your Formula Diagram, you represent a repeating sub-model by using a rectangle with a dotted
border. In the top right-hand corner of the rectangle, enter the name of the repeating entity (in this
case, Region). Then place all the region-dependent variables inside this sub-model rectangle, with
all the other variables outside—as illustrated in Figure 5 10.
Figure 5 10 Formula Diagram With a Repeating Sub-Model
All the variables outside the rectangle represent single values, and all the variables inside the
rectangle represent multiple values.
Conceptualizing Aggregate Functions
Looking at the current diagram, Marco notices that one important output is missing: the Total Profit.
He wants this included in the result. To make the necessary adjustments, you should look at Figure
5 10, above, to see where you can insert that new variable.
The Total Profit is calculated by summing all the regional Profits. Because this new output is a
single-value variable, it’s outside of the rectangle. You should produce something similar to the
adjusted Formula Diagram, in Figure 5 11.
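The following Python sketch mirrors that structure: the loop body plays the role of the dotted rectangle, and the sum afterwards is the aggregate function that sits outside it. The price of $375 is a hypothetical input, and the formulas for Revenue, Variable Cost, Regional Fixed Cost and Total Cost are my own reading of the variable names, not formulas quoted from the Formula List.

```python
regions = ["South", "East", "N-W"]
distribution = [0.48, 0.23, 0.29]
delivery_cost = [50.0, 80.0, 60.0]
manufacturing_cost = 120.0
fixed_cost = 2_500_000.0
price = 375.0  # hypothetical input value

# Single-value variable, outside the repeating sub-model.
total_demand = 376_000 * 1.009 ** (-price)

# Repeating sub-model: the same formulas for every region.
profit = []
for share, delivery in zip(distribution, delivery_cost):
    regional_demand = total_demand * share
    unit_cost = manufacturing_cost + delivery
    revenue = regional_demand * price
    variable_cost = regional_demand * unit_cost
    regional_fixed_cost = fixed_cost * share
    total_cost = variable_cost + regional_fixed_cost
    profit.append(revenue - total_cost)

# Aggregate function: many values in, one value out.
total_profit = sum(profit)

for name, value in zip(regions, profit):
    print(f"{name}: {value:,.0f}")
print(f"Total Profit: {total_profit:,.0f}")
```

Adding a fourth region would mean adding one value to each list; none of the formulas would change.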
Like the simple model shown previously, the repeating sub-model follows a precise process. This
time, rather than the basic three sheets described earlier, you’ll need five sheets. The process of
creating them again follows the three-tier architecture, as outlined below:
Step 1: Two data sheets, one for the non-repeating data variables (Data) and one for the
repeating data variables (Data-Regions);
Step 2: Two model sheets, one for the definition formulas of all non-repeating calculated
variables (Model), and the other for the definition formulas of all repeating variables
(Regions);
Step 3: The Interface sheet.
Step 1: The Data Sheets
Then you modify your spreadsheet to take this change into consideration. In the Data sheet, enter the
following information:
change the name of the South region to Upper South,
change the values $50 to $62, and 48% to 30%,
insert a column after Column B,
in the empty Column C, add the values Lower South, $30 and 18% for the data variables
Region, Delivery Cost and Distribution.
These adjustments are shown in Figure 6 22.
This chapter presents a special case of the repeating sub-model: the temporal model. When you
introduce time periods to a model, you can use their implicit ordering to reference the next period, the
previous period, or even two or three periods ago.
Using the Inventory Formula
To understand how to develop a temporal model, you need to use the inventory formula. Businesses
use this to calculate their inventory at the end of a certain period (without having to send an employee
to physically count the items on the shelves). For the month of January, for instance, the formula can
be written as:
Inventory End January = Inventory End December + Purchases During January – Sales During
January
This formula is illustrated in Figure 7 1.
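Written as a small Python function, the inventory formula looks like this. The function name and the three sample quantities are hypothetical; only the structure of the formula comes from the text.

```python
def inventory_end(previous_end: float, purchases: float, sales: float) -> float:
    """Inventory at the end of a period, from the inventory formula."""
    return previous_end + purchases - sales


# Hypothetical quantities for illustration.
inventory_end_december = 500
inventory_end_january = inventory_end(
    inventory_end_december, purchases=1_200, sales=1_050
)
print(inventory_end_january)  # 650
```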
Jan  Feb  Mar  Apr  May  June  July  Aug  Sept  Oct  Nov  Dec
8%   9%   10%  15%  11%  7%    5%    10%  6%    5%   6%   8%
As well as the distribution of demand, Marco wants to add another improvement to his model. His
first version did not consider monthly inventory separately, so he included the cost of the inventory in
the data variable Unit Cost. But now that he can calculate his monthly inventory, he wants the model
to track its cost as well. To do this, he splits the former Unit Cost into two variables:
inventory cost (Unit Inventory Cost = $20)
the cost of producing and selling the widgets (Unit Sales Cost = $150).
The Unit Inventory Cost will be used with the inventory at the end of each month and the Unit Sales
Cost will be used with the monthly Demand.
At the moment, Marco’s machines are set to produce 1,200 widgets per month. Resetting the machines
to produce at a different rate is a complicated operation, one that can only be done once a year.
Marco wants to use the model to evaluate the impact on his profit of different monthly production-rate
scenarios. The next section will examine this process.
Temporal Formula Diagram
To start developing a temporal model, draw a dotted-line rectangle in your Formula Diagram to
represent the repeating sub-model. Inside it, insert your repeating variables (all those whose values
correspond to an instance in time); and around it, place the single-value variables. Figure 8 1
shows such a model, using the variables from Chapter 5 and Chapter 7.
Now continue building the model, using either the forward or the backward approach. With the
former, from Demand and Monthly Distribution you can calculate the formula as shown below.
Since this new variable is calculated using a temporal variable, it’s in itself a temporal variable; so
you should put it inside the temporal sub-model rectangle, as shown in Figure 8 2.
Figure 8 2 Calculating Monthly Demand
You now have all the elements you need to calculate the variables and formulas of the inventory
formula. Monthly Demand represents sales (output), and Monthly Production represents purchases
(input). The two new formulas to calculate the inventory are shown below.
The initial value of a variable involved in a time reference formula, Inventory End in this case, is
specified in its definition formula. This will ensure that the first column of the model’s
implementation is the same as all the others.
The initial value is implicit, and we don’t need to show it in the Formula Diagram, as illustrated in
Figure 8 3.
Figure 8 3 Inventory Variables and Formula
This allows you to calculate the monthly Inventory Cost and the monthly Sales Cost as shown below.
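A minimal Python sketch of these temporal calculations follows. Several things here are assumptions of mine rather than values from the case: the annual demand of 13,062, the initial inventory of 1,000, and the exact form of the cost formulas (inventory cost on the month-end inventory, sales cost on the monthly demand, as described in the previous chapter). Like the model at this stage, the sketch assumes that production always covers demand.

```python
monthly_distribution = [0.08, 0.09, 0.10, 0.15, 0.11, 0.07,
                        0.05, 0.10, 0.06, 0.05, 0.06, 0.08]
demand = 13_062            # hypothetical annual demand
monthly_production = 1_200
unit_inventory_cost = 20
unit_sales_cost = 150
initial_inventory = 1_000  # hypothetical Inventory End for December

inventory_end = []
inventory_cost = []
sales_cost = []

previous = initial_inventory  # the (t-1) reference for January
for share in monthly_distribution:
    monthly_demand = demand * share
    current = previous + monthly_production - monthly_demand
    inventory_end.append(current)
    inventory_cost.append(current * unit_inventory_cost)
    sales_cost.append(monthly_demand * unit_sales_cost)
    previous = current  # becomes the (t-1) value for the next month

print(round(inventory_end[-1]))  # inventory at the end of December
```

Because the initial value is supplied as data, January is computed with exactly the same formula as every other month.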
The last chapter examined two equivalent models, which differed only in the way they calculated
Variable Cost. This chapter will look at how to create a temporal model from the following
structured conceptual model, shown in Figure 9 1.
Figure 9 5 Step 1.2, Naming the Initial Values of Multi-Value Data Variables
Now define the repeating entity and the multi-value data variables. In this case the repeating entity is
Month, so enter it and the multi-value data variable names into Column A. Next enter the name of the
instances, alongside the entity name, beginning with the previous month (to let you model the (t–1)
references properly). Since this model starts in January, the first name in Column B is December, as
shown in Figure 9 6.
Figure 9 6 Step 1.2, Defining the Repeating Entity and its Data Variables
Finally, name the entity and the multiple-value data variables. Select the entire rows (by clicking on
the Row Number on the left side). Then use the Create Names From Selection dialogue box and make
sure that Left column is the only checkbox selected, as shown in Figure 9 7.
Figure 9 7 Step 1.2, Naming the Repeating Entity and its Data Variables
When dealing with data variables that represent a distribution, you can verify their correctness by
checking that their sum is 100%. For that purpose, you may want to insert a validation formula below
the data variable to check its sum, as shown in Figure 9 8. (Validation formulas are presented in more
detail in Chapter 12.)
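The idea behind such a check can be sketched in a few lines of Python, using the monthly distribution from Chapter 8. The variable names and the tolerance are my own assumptions; a small tolerance is used because percentages stored as binary fractions rarely add up to exactly 1.

```python
import math

monthly_distribution = [0.08, 0.09, 0.10, 0.15, 0.11, 0.07,
                        0.05, 0.10, 0.06, 0.05, 0.06, 0.08]

total = sum(monthly_distribution)
# Compare with a tolerance rather than with == to absorb rounding noise.
is_valid = math.isclose(total, 1.0, abs_tol=1e-9)

print("OK" if is_valid else f"Distribution sums to {total:.2%}, expected 100%")
```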
In some situations, you may need to modify a model—either after it has been built, if you discover
some flaws during testing; or later in its lifecycle, if you decide to reconfigure it to meet new
requirements. Such corrections and enhancements to satisfy changing needs are called maintenance.
To illustrate the steps of a maintenance operation, consider Marco’s Widgets. In Chapter 8, you may
have noticed an important flaw: the model assumed that Marco’s monthly production level would
always allow him to satisfy demand. In practice, Marco could possibly produce less than the demand.
Let’s re-examine that final model from Figure 8 5. It’s shown again in Figure 10 1.
However, the diagram is not yet complete: Sales is not defined, and Monthly Demand influences no
other variable. You have to find the relationship between these two variables. You know that Sales
may, at times, equal Monthly Demand; but by definition it can’t be higher than Monthly Demand.
When might it be lower? When Marco does not have enough products to sell. To address this
situation, you can add the variable Quantity Available to our model, and define Sales as shown
below.
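One way to express this reasoning, sketched here in Python, is to take the smaller of the two quantities. Both the use of a minimum and the definition of Quantity Available as the previous inventory plus the month's production are my assumptions, drawn from the reasoning above and not from the Formula List; the numbers are hypothetical.

```python
def sales(monthly_demand: float, quantity_available: float) -> float:
    """Sales can never exceed demand, nor what is available to sell."""
    return min(monthly_demand, quantity_available)


previous_inventory = 100   # hypothetical Inventory End of the prior month
monthly_production = 1_200
quantity_available = previous_inventory + monthly_production  # assumption

print(sales(1_000, quantity_available))  # 1000: demand fully satisfied
print(sales(1_959, quantity_available))  # 1300: limited by availability
```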
The key to developing a good repeating sub-model is building a model that’s valid for all instances of
a repeating entity. It’s important that you not program exceptions into the model to influence its
behaviour. Rather, you should transform any exceptions into a regular situation by carefully choosing
values for repeating data variables.
Modelling Special Cases
In some irregular cases, one or more instances of a repeating entity do not follow the same rules as
the others. In the case of Marco’s Widgets model with regions from Chapter 5, for example, there
might be a special tax levied only in the East region. To model this fact, you might think of using an IF
function, as defined in this formula:
Tax = IF(Region="East", Special Tax, 0).
The problem with this approach is that the formula contains the constant East—and constants in
formulas complicate future maintenance. There may still be problems even if you create a new data
variable, Special Tax Region, and change the formula to:
Tax = IF(Region=Special Tax Region, Special Tax, 0)
In this case, you assign the value East to the new data variable. But what would happen if another
region also imposed a special tax? What would happen if the tax rate in the other regions was
different? Obviously, these would require complex IF functions.
Similarly, looking at Marco’s temporal model from Chapter 8, if you want to model an increase in
inventory cost starting in October, you might use the formula:
Monthly Unit Inventory Cost =
IF(Month>=10, Unit Inventory Cost + Cost Increase, Unit Inventory Cost).
However, this is also problematic, for two reasons. First, the formula mixes an operator and a
function, which goes against the Simplicity Rule. Second, it’s not flexible. What would happen if the
increase in monthly inventory cost changed, or if the increase was spread gradually over a few
months?
Luckily, there is a simple solution to these problems. You just approach all irregular cases as if they
were regular, and as if they existed for all instances of a repeating entity. But if you create a variable
to handle such a case, it’s important that you apply it to all instances of the repeating entity.
You must choose which approach you prefer: either a single value used consistently in all instances,
or a set of values with one value for each and every instance.
This rule lets you avoid the use of an IF function to treat irregular cases. In our first example of
Marco’s regional model from Chapter 5, to model the special tax, you start with the original model,
re-illustrated in Figure 11 1.
Figure 11 1 Original Model With Regions
Then you introduce a multi-value data variable, Special Tax (with values of 0%, 5% and 0%). This
data variable, used with Regional Demand, calculates the Tax variable with the following formula:
Tax = Regional Demand * Special Tax.
You can see the new variable in Figure 11 2.
Figure 11 2 Modified Model, Special Tax
Where in the diagram do you specify that the special tax is only for the East region? The answer is
nowhere. The model describes the calculations that need to be made in each region, and you do not
model exceptions. In the Data sheet, you specify that the Special Tax is only for the East region. In
fact, this approach also allows you to specify different tax rates for different regions, making it more
flexible. You do not need to change the formulas if the situation changes.
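The same idea in a Python sketch: the exception lives entirely in the data, and the formula is identical for every region. The Regional Demand values are hypothetical; the Special Tax values and the Tax formula are those given above.

```python
regions = ["South", "East", "N-W"]
regional_demand = [6_270, 3_004, 3_788]  # hypothetical values
special_tax = [0.00, 0.05, 0.00]         # multi-value data variable

# No IF and no region name in the formula: the data carries the exception.
tax = [d * rate for d, rate in zip(regional_demand, special_tax)]

for name, value in zip(regions, tax):
    print(f"{name}: {value:.2f}")
```

If another region later introduces its own tax, you change one number in special_tax and leave the formula alone.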
There are three different ways of modelling special cases. These are often interchangeable, and you
can select the one that best suits the way the data is presented. These three approaches involve
different types of data variable:
an indicator data variable,
a distribution data variable,
an allocation data variable.
To illustrate these approaches, let’s consider a new case: Josie is a student and she wants to model
her budget for the next school year. Among the items the model has to track are her income and
expenses:
$2,250 in tuition fees, spread over three payments ($950 each in September and January, and
the remaining $350 in June);
a student loan of $2,915 she will receive in October;
a $2,085 scholarship she will receive (in three equal parts in November, January and March).
Using an Indicator Data Variable
This is a repeating data variable to which you assign the value 1 when the modelled event does occur,
and the value 0 when it doesn’t occur. The multi-value indicator data variable is usually accompanied
by a single-value data variable to calculate the repeating calculated variable to be used for each
instance. To calculate this, you simply multiply the indicator data variable and the single-value data
variable. Since the indicator is always either 0 or 1, the repeating calculated variable takes the value
of the single-value data variable when the indicator is 1, and the value 0 otherwise.
In Josie’s budget, you use Loan Indicator as the indicator data variable, Loan Amount as the value
to be used, and Loan as the amount she’ll receive every month. Figure 11 3 illustrates the model
portion.
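Here is a minimal Python sketch of the indicator approach for Josie's loan. The ten-month school year running from September to June is my assumption; the loan amount and the month in which it is received come from the case.

```python
# Assumed school year; the case does not list the months explicitly.
months = ["Sept", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "June"]

loan_amount = 2_915                               # single-value data variable
loan_indicator = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 1 only in October

# Repeating calculated variable: indicator times amount, for every month.
loan = [indicator * loan_amount for indicator in loan_indicator]

print(dict(zip(months, loan)))
```

Moving the loan to another month, or splitting the school year differently, only requires editing the indicator values.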
Since you have a (t–2) reference here, the temporal model will have two initialization periods,
Weeks numbered -1 and 0. The user needs to supply an initial value for Trip Odometer End for
Week 0, and two initial values for Maintenance Start for Weeks -1 and 0. As the developer, you
must also initialize the variable Maintenance in Progress End in Week 0. For this, you have two
choices:
you supply an initial value—in which case you run the risk of having an inconsistent initial
situation (such as Maintenance Start of Week 0 set to 1, and Maintenance in Progress End
of Week 0 set to 0);
you define formulas in the initialization columns to ensure that the value of Maintenance
in Progress End is always correct.
This chapter addresses basic approaches to the management and maintenance of spreadsheets,
focusing on verification—in the hope that you’ll be able to avoid that 90% error rate. This body of
knowledge isn’t one you can pick up quickly, though: spreadsheet management can be a full-time
occupation, even a profession. The two basic tools you’ll find most useful are error checking, and
model management formulas.
Checking for Errors
Unfortunately, this activity is often glossed over by spreadsheet developers, even experienced
ones. There are three different types of common errors:
logic errors: a formula may be wrong, or an important variable omitted;
implementation errors: a bad reference or a wrong function;
usage errors: a wrong value may be entered, or a correct value in the wrong place (these
types of errors usually happen during decision-making, analysis or operations).
As you can tell by now, some of these errors are specific to certain stages of the spreadsheet process:
developing or conceptualizing the model, implementing the logic, or using the finished product.
All errors are hard to identify, but usage errors are especially so. If you’re creating a model for other
people to use, obviously you’re not responsible for any errors they make. Still, you can play an
important role by providing users with the tools to more easily spot usage errors.
Developers themselves are usually bad at finding their own mistakes: they’ve been involved with the
model for so long that they may look straight at a wrong formula without noticing anything amiss. And
it’s important to catch any errors (existing or potential) as soon as possible, to prevent them from
accumulating. Think of your spreadsheet errors as compounding interest charges: the longer they
exist, the costlier they are. You might call such a situation spreadsheet debt.
That’s why this section offers a few techniques to help you identify the presence of errors. Once you
know they exist, it’s easier for you to find where they are. One useful strategy, if you’ve become too
familiar with a model to spot any mistakes, is to have a colleague verify your spreadsheet.
Another strategy is to test whether the values of your calculated and output variables make sense in
terms of your original input variables. How do you recognize an obviously wrong value? Here are a
few warning signs to look out for.
No Change
When you change the value of an input data variable, is there a calculated variable that doesn’t
change? The absence of such a change—in a variable that, according to your Formula Diagram, has a
direct or an indirect relationship with the changed variable—is a sure sign that there’s an error in
some formula path linking the two.
Negative Values
A negative profit is not unusual for a business having a bad sales period. But watch out for times
when your spreadsheet tries to tell you that you have negative sales, or a negative number of
employees.
Extreme Results
To find any errors or inconsistencies in your model, try some extreme values for input variables and
data variables. Do the output values make sense? If not, try to identify the range where they do. It’s
normal for a model to be valid only within certain ranges of values for its input and data variables.
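These warning signs can also be checked mechanically. The Python sketch below applies the "no change" and "negative values" checks to a small version of Marco's model; the model function, its formulas and the trial prices are all hypothetical stand-ins for your own spreadsheet.

```python
def model(price: float, unit_cost: float = 180.0,
          fixed_cost: float = 2_500_000.0) -> dict:
    """A stand-in model: every calculated variable is returned by name."""
    demand = 376_000 * 1.009 ** (-price)
    revenue = demand * price
    variable_cost = demand * unit_cost
    total_cost = fixed_cost + variable_cost
    profit = revenue - total_cost
    return {"demand": demand, "revenue": revenue,
            "total_cost": total_cost, "profit": profit}


# No Change: every variable downstream of Price should react to a new Price.
base = model(375)
changed = model(376)
unchanged = [name for name in base if base[name] == changed[name]]

# Negative Values and Extreme Results: demand must never be negative,
# even at extreme prices.
extreme_prices = (0, 100, 1_000, 10_000)
negative = [p for p in extreme_prices if model(p)["demand"] < 0]

print("Unchanged variables:", unchanged)
print("Prices giving negative demand:", negative)
```

An empty list in both cases does not prove the model is right, but a non-empty one tells you where to start looking.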
Using Model Management Formulas
As a developer, a useful way for you to enhance a spreadsheet—one that makes it easier to use, and
prevents users from inputting values that don’t make sense—is to include model management
formulas. There are three types of these:
input-validation formulas
model-validation formulas
unit-changing formulas.
This section explains each type in detail.
Unit-Changing Formulas
Input data may come from sources that are outside the control of users. They may also be in
measurement units that don’t match users’ own model: temperatures in Fahrenheit rather than Celsius,
for example, or distances in miles rather than kilometers.
As you use the model, if it’s built with one set of units but data is supplied in another, you have two
options when you refresh the values of the input variables.
The spreadsheet instructs you to convert the values manually to the proper units.
You enter the values as you obtain them, and use unit-changing formulas in the spreadsheet to
convert them to the proper units.
The first method is error-prone and time-consuming. The second method is not only quicker, since it
requires less manipulation, but it also leaves the original data intact and visible—allowing you to
validate the input if needed. However, it’s not quite error-proof. For example, a source may change
its units, thereby rendering the use of the unit-changing formula incorrect.
Should the Formula Diagram show the unit-changing formulas? No, it should not. Unit-changing
formulas do not contribute to the model, and they are usually only needed in specific circumstances
that could change over time.
Your model should use a single system of units, to avoid user confusion and to facilitate future
maintenance. It’s your choice whether to use metric or imperial, but make sure you don’t have some
variables using one system, and the rest using another.
As you create the spreadsheet, name the cell that contains the changed value. As shown in Figure
12 1, Cell B4 contains the formula to change the units of the source data recorded in Cell C4 from
degrees Fahrenheit to degrees Celsius. It also shows that the cell name Average Temperature refers
to Cell B4.
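The conversion itself is the standard Fahrenheit-to-Celsius formula. In the Python sketch below, the function name and the sample reading of 68 degrees are hypothetical; the two variables simply stand in for the source cell and the named cell.

```python
def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Standard conversion: subtract 32, then multiply by 5/9."""
    return (degrees_f - 32) * 5 / 9


source_temperature_f = 68.0  # plays the role of the source data cell
average_temperature = fahrenheit_to_celsius(source_temperature_f)  # named cell

print(average_temperature)  # 20.0
```

The source value stays visible and untouched, while the rest of the model refers only to the converted, named value.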
Variable Definitions
To visually distinguish where variables are defined, it’s a good idea to format them in bold-italic. In
the case of calculated variables, it’s also useful to set them off with a top border to help you
differentiate the variables used in the formula (above the line) from the result (below the line).
Numbers
Formatting cells as you develop a model is a time-wasting activity. Whether for currency (price or
cost), or for values such as quantities, distances, grades or percentages, you should limit yourself to
only the most basic formatting. Use a thousand separator (comma or space, depending on your
country’s standard), and use an appropriate number of decimal places. You should be consistent in
this: if you represent percentages with two decimals, do so throughout the model.
(It’s best not to use any other formatting in the model sheets. You may do some formatting in the
interface and data sheets, if you wish; but save it until the end, so it won’t interfere with your
implementation process.)
Input Variables
If you wish, you may identify Input cells with colour shading. It should be light enough so that the
cell’s content is easily visible, but dark enough so that the colour is visible in a black-and-white
printout.
Conditional Format
This is a way to attract the user’s attention to results when some predefined condition is met. For
example, a high profit may be coloured green, and a loss red.
Rounding
The number of digits to the right of the decimal point is called the value’s precision. Dollar values,
for example, often have a precision of 2, though some other currencies use three decimal places.
Excel has two types of rounding: Value Rounding and Display Rounding.
Value Rounding changes the precision of a number, and all subsequent calculations use the
new precision.
Display Rounding affects only the value shown on the screen or in a printout. All subsequent
calculations use the full, unrounded, precision.
The latter type can cause some apparent incoherence, as illustrated in Figure A-8.
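The difference between the two types of rounding can be sketched with a small Python example. The two amounts below are hypothetical and are not taken from the figure; they were chosen so that no value falls exactly on a rounding boundary.

```python
# Hypothetical amounts, stored with full precision.
amounts = [1.4, 1.4]

# Display Rounding: the stored values are unchanged; only the text shown
# is rounded (here to zero decimal places).
shown_items = [f"{a:.0f}" for a in amounts]
shown_total = f"{sum(amounts):.0f}"

# Value Rounding: each amount is rounded first, and the later calculation
# uses the rounded values.
rounded_items = [round(a) for a in amounts]
rounded_total = sum(rounded_items)

print("Display Rounding:", " + ".join(shown_items), "=", shown_total)
print("Value Rounding:  ", " + ".join(str(r) for r in rounded_items), "=", rounded_total)
```

With Display Rounding the screen shows 1 + 1 = 3, because the total is computed from the unrounded 1.4 values (2.8) and only then rounded for display. With Value Rounding the total is 2.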
Date System
Computer programs use a date system based on an origin date, which associates a serial number with
each date. The program counts the days sequentially, starting at 1 for the origin date. For example, if
January 1, 2015 is your origin date, then January 15, 2015 would be serial number 15, and February 1
of that same year would be serial number 32, since January has 31 days. December 31, 2017 would be
serial number 1096 (365, the number of days in 2015, plus 366, the number of days in 2016, which is
a leap year, plus 365, the number of days in 2017), and so on.
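The counting rule can be checked with a short Python sketch. The function name serial_number is hypothetical, and the origin date is the January 1, 2015 example from the text rather than Excel's own origin.

```python
from datetime import date


def serial_number(day: date, origin: date) -> int:
    """Return the serial number of a day, counting the origin date as 1."""
    return (day - origin).days + 1


origin = date(2015, 1, 1)

print(serial_number(date(2015, 1, 1), origin))    # 1, the origin date itself
print(serial_number(date(2015, 1, 15), origin))   # 15
print(serial_number(date(2015, 2, 1), origin))    # 32, since January has 31 days
print(serial_number(date(2017, 12, 31), origin))  # 1096 = 365 + 366 + 365
```

Subtracting two serial numbers gives the number of days between the two dates, which is why date arithmetic works directly on the stored values.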
Excel’s default origin date (serial number 1) is January 1, 1900, but you don’t need to know this, or
the exact serial number of the dates you use. Excel automatically computes each date’s serial number
and stores it as the cell’s value; then, rather than displaying the serial number in your spreadsheet,
the program displays it as a formatted date.
Date Display
The way Excel displays a serial number is governed by the cell’s format. When you enter a value that
Excel interprets as a date, it calculates and stores a serial number. Then it automatically changes
the cell’s format to display the date in a format that resembles the input value. However, sometimes
Excel’s choice is not what you want: for example, a cell may show the serial number when you would
rather see the date right away, without having to work it out yourself. In such cases you can
specify another number format.
Building Text Strings
Your spreadsheet might often contain labels identifying the periods covered. For example, a sheet of
sales forecasts for the next year may have a cell containing the text Sales Forecast Oct-2015 to
Sept-2016. The next month, this would need to be changed to Sales Forecast Nov-2015 to Oct-2016. This
means you may often have to navigate through the different sheets and manually change the labels,
which is a tedious and error-prone task. So is using Excel’s Find and Replace feature, since this
may inadvertently change some values that shouldn’t be changed.
The solution to this problem is to create a special Labels sheet for labels and titles that change
periodically. This sheet is not part of the model, but you should develop it using the same techniques:
labels should be named and constructed with formulas, using calculated variables as needed. If you
do this, minimal input is required. For example, changing every label that reads Oct-2015 to
Sept-2016 so that it reads Nov-2015 to Oct-2016 can be done with just one input value.
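The idea of deriving every label from a single input can be sketched in Python. The function names and the month abbreviations are assumptions made for this illustration (the list uses "Sept" to match the labels in the text); in a spreadsheet the same result would be built with named cells and text formulas on the Labels sheet.

```python
from datetime import date

# Abbreviations chosen to match the labels used in the text (note "Sept").
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sept", "Oct", "Nov", "Dec"]


def month_label(d: date) -> str:
    """Format a date as a month label such as Oct-2015."""
    return f"{MONTH_ABBR[d.month - 1]}-{d.year}"


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` months after d."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def forecast_label(first_month: date) -> str:
    """Build the label for a 12-month forecast starting at first_month."""
    last_month = add_months(first_month, 11)
    return f"Sales Forecast {month_label(first_month)} to {month_label(last_month)}"


# The single input value: the first month covered by the forecast.
first_month = date(2015, 10, 1)
print(forecast_label(first_month))                 # Sales Forecast Oct-2015 to Sept-2016
print(forecast_label(add_months(first_month, 1)))  # Sales Forecast Nov-2015 to Oct-2016
```

Changing first_month is the only edit needed to roll every dependent label forward by a month.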
Table Lookup
This feature is often used when data is presented in a tabular format, like a price list, a rebate form or
a class roster. In such tables, some columns have unique values: no two lines have the same value. In
a class roster, for example, student names may not necessarily be unique, but student IDs always are.
In a price list, product names and descriptions are usually not unique, but product numbers are. The
column used to identify each row of a table is the key column, and the lookup will be made there.
There are two kinds of lookup: exact lookup and approximate lookup.
Exact lookup is pretty straightforward: you want to look up a student’s name and program
from the Student ID number, or a product’s description, price and available quantity from its
number. If the key column doesn’t contain the lookup value, you need to know this fact so you
can treat the situation with an appropriate formula.
Approximate lookup can be considered as a form of range lookup: the key column indicates
the beginning of each successive range.
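An exact lookup, including the case where the key is missing, might be sketched in Python as follows. The price list, its product numbers, and the function name are all hypothetical; the point is only that a missing key is reported explicitly so that the caller can handle it.

```python
from typing import Optional

# Hypothetical price list. The product number is the key column; the
# descriptions are deliberately identical to show they are not unique.
PRICE_LIST = {
    "P-100": {"description": "Widget", "price": 12.50, "quantity": 40},
    "P-200": {"description": "Widget", "price": 15.00, "quantity": 0},
}


def exact_lookup(product_number: str) -> Optional[dict]:
    """Return the row for a product number, or None if the key is absent."""
    return PRICE_LIST.get(product_number)


for number in ("P-200", "P-999"):
    row = exact_lookup(number)
    if row is None:
        # The key column does not contain the lookup value.
        print(f"{number}: unknown product number")
    else:
        print(f"{number}: {row['description']} at {row['price']:.2f}")
```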
As an example, let’s consider a discount system where customers save more by buying a product in
bulk. The discount offered is:
0% for quantities less than 5
4% for quantities of 5–9
6% for quantities of 10–24
8% for quantities of 25–49
10% for quantities of 50–99
15% for quantities of 100 and up.
As you can see in Figure A-9, the key column is named From. If the quantity bought is 15, you’ll use
the row identified by 10 to calculate the discount, 6%, even though there’s no actual row with the
key value of 15. (The To column is not really important, but most tables include it for ease
of reading.)
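The approximate lookup for this discount table can be sketched in Python with a binary search over the From column. The function name is hypothetical, and the first key value of 0 is an assumption, since the text only says "less than 5".

```python
from bisect import bisect_right

# Key column (From): the beginning of each successive range.
# The first key, 0, is an assumption for "quantities less than 5".
FROM = [0, 5, 10, 25, 50, 100]
DISCOUNT = [0.00, 0.04, 0.06, 0.08, 0.10, 0.15]


def discount_for(quantity: int) -> float:
    """Return the discount of the last row whose From value is <= quantity."""
    if quantity < FROM[0]:
        raise ValueError("quantity is below the first range of the table")
    row = bisect_right(FROM, quantity) - 1
    return DISCOUNT[row]


print(discount_for(15))   # 0.06, found in the row identified by 10
print(discount_for(4))    # 0.0
print(discount_for(100))  # 0.15
```

Only the From column takes part in the search, which is why the To column can be omitted without changing the result. The key column must be sorted in ascending order for this kind of lookup to work.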
Multiple-value variable: A variable that represents a set of values, with one value for each instance of its repeating entity.
Name reference: A reference formula that refers to a variable by its defined name.
Output: The results the user wants to see.
Physical model: The implementation of the information system, which requires a technical vocabulary and knowledge specific to that system.
Range name: The name used to define either a single spreadsheet cell, or a set of cells.
Reference formula: A simple formula that refers to where a variable is originally defined.
Repeating entity: The subject of a repeating sub-model that is implemented with each of its instances.
Repeating sub-model: A portion of a model where each variable represents a set of values, one for each instance of a repeating entity.
Sign-changing reference formula: A way to represent a subtraction by including the negative sign directly in the reference formula. It visually indicates to the user when a value is subtracted.
Spreadsheet architecture: The layout of a spreadsheet; how information is organized and presented.
Spreadsheet life-cycle: The stages of a spreadsheet’s life, from its development to testing, maintenance, and end of use.
System analyst: The intermediary between the user and the IT specialist. The analyst produces a conceptual model from information provided by the user, and translates it into a logical model that can be understood by the IT specialist.
Three-tier architecture: A software engineering technique that consists of separating three major tasks performed by systems, building them separately, and connecting them with the appropriate relationships.