Calibrating Building Energy Simulation Models

Energy & Buildings 253 (2021) 111533
Calibrating building energy simulation models: A review of the basics to guide future work
Adrian Chong a,⇑, Yaonan Gu a, Hongyuan Jia b,c
a Department of the Built Environment, School of Design and Environment, National University of Singapore, 4 Architecture Drive, Singapore 117566, Singapore
b School of Civil Engineering and Architecture, Chongqing University of Science and Technology, Chongqing 401331, China
c SinBerBEST Program, Berkeley Education Alliance for Research in Singapore, Singapore 138602, Singapore
Article info
Article history: Received 3 August 2021; Revised 17 September 2021; Accepted 28 September 2021; Available online 04 October 2021
Keywords: Building performance simulation; Building simulation; Calibration; Reproducibility; Optimization; Uncertainty

Abstract
Building energy simulation (BES) plays a significant role in buildings, with applications such as architectural design, retrofit analysis, and the optimization of building operation and controls. There is a recognized need for model calibration to improve the credibility of simulations, especially as building data become increasingly available and given the promise that a digital twin brings. However, BES calibration remains challenging due to the lack of clear guidelines and best practices. This study aims to provide a foundation for future research through a detailed systematic review of the vital aspects of BES calibration. Specifically, we conducted a meta-analysis and categorization of the simulation inputs and outputs, data type and resolution, key calibration methods, and calibration performance evaluation. This study also identifies reproducible simulations as a critical issue and proposes an incremental approach to encourage reproducibility in future research.
© 2021 Elsevier B.V. All rights reserved.
Contents
1. Introduction
1.1. Calibration in building energy simulation
1.2. Calibration, validation, and verification
1.3. Related work
1.4. Aim and objectives
2. Method
2.1. Search and eligibility criteria
2.2. Study selection
3. Study context
3.1. Simulation engine
3.2. Location
4. Key calibration approaches
4.1. Optimization-based calibration
4.2. Calibration under uncertainty
4.2.1. Types and sources of uncertainty
4.2.2. Uncertainty quantification
4.3. Analytical tools and techniques
4.3.1. Analytical techniques by approach and application
4.3.2. Sensitivity analysis
4.4. Multi-stage calibration
5. Data requirements
5.1. Inputs and outputs
5.2. Most common observed outputs
5.3. Most common observed inputs
5.4. Mapping calibration parameters to outputs
6. Calibration performance evaluation
6.1. Current approaches
6.2. Evaluating probabilistic predictions
6.3. Validation using out-of-sample data
7. Discussion
7.1. Inputs and outputs
7.2. Calibrating urban-scale models
7.3. Credibility or absolute predictive accuracy
7.4. Reproducible research in BES
8. Conclusion
Data availability
Declaration of Competing Interest
Acknowledgements
References

⇑ Corresponding author.
E-mail address: adrian.chong@nus.edu.sg (A. Chong).
https://doi.org/10.1016/j.enbuild.2021.111533
0378-7788/© 2021 Elsevier B.V. All rights reserved.
1. Introduction

1.1. Calibration in building energy simulation

Building energy simulation (BES) can broadly be defined as a physics-based mathematical model that allows the detailed calculation of a building’s energy performance and occupant thermal comfort under the influence of various inputs such as weather, building geometry, internal loads, HVAC systems, operational schedules, and simulation-specific parameters. Originally intended for use during the design phase, BES is increasingly being used throughout a building’s lifecycle [1]. However, concerns about model credibility are growing within the building industry. Significant discrepancies between simulated and measured energy use are becoming more apparent with the rapid deployment of smart energy meters and the internet of things (IoT) [2]. For instance, Turner and Frankel [3] analyzed 121 LEED buildings and found that measured energy use can be between 0.5 and 2.75 times the predicted energy use. Mantesi et al. [4] showed that default settings and methods of modeling thermal mass can result in up to 26% divergence in the simulation predictions.

Previous studies [5,6] observed that the main causes of discrepancies between predicted and actual energy performance stem from four sources: (a) specification uncertainty arising from assumptions made due to a lack of information; (b) model inadequacy arising from simplifications and abstractions of actual physical building systems; (c) operational uncertainty arising from a lack of feedback regarding the actual use and operation of buildings; and (d) scenario uncertainty arising from specifying model conditions such as weather conditions and building occupancy. Consequently, model calibration is often undertaken to better match simulation predictions to actual observations and to increase the model’s credibility for making predictions. Although calibration is not essential for BES research, it is becoming increasingly important for establishing model credibility. The International Energy Agency’s Energy in Buildings and Communities (IEA-EBC) Annex 53 also reported the significance of developing and applying model calibration and uncertainty analysis for BES [7].

The typical reason for calibrating a BES model is to make predictions with the simulation more confidently. Although predictions may be wrong, they can still be useful. Trucano et al. [8] list examples of such predictions that are also relevant for BES, including:

• Simulating an experiment without knowledge of its results or prior to its execution. E.g., retrofit analysis that compares the cost-effectiveness of different energy conservation measures (ECMs).
• Making scientific pronouncements about phenomena that cannot currently be studied experimentally. E.g., observing changes in building energy performance under different climate change scenarios, or in a scenario where occupancy and building usage become sporadic in the event of a pandemic.
• Using computation to extrapolate existing understanding into experimentally unexplored regimes. E.g., using BES to create a baseline for quantifying energy savings for buildings with multiple interacting ECMs.

1.2. Calibration, validation, and verification

The terms calibration, validation, and verification are commonly used in the existing literature to indicate consistency between model predictions and actual observations. This usage can be misleading because the terms are not synonymous. Their semantics have been the subject of philosophical debate because of the paradoxical view that models are representations of reality and thus, by definition, not true [9]. As such, it has been argued that simulation models can never be validated and can only be improved through invalidation [10]. Nevertheless, BES models are typically created for practical purposes such as architectural design, HVAC design and operation, retrofit analysis, building operational optimization, and urban-scale energy efficiency analysis. Therefore, from a computational simulation and engineering perspective, the purpose of validation is not to establish the truth of a scientific theory. Instead, it is to evaluate and quantify whether the model is acceptable for its intended purpose.

Within this context, BES calibration, validation, and verification can be formally defined following the guide by the American Institute of Aeronautics and Astronautics (AIAA) [11], which is also compatible with the US Department of Energy’s Advanced Simulation and Computing (ASC) definitions [8]:

• Calibration: The process of adjusting numerical or physical modeling parameters in the computational model for the purpose of improving agreement with experimental data.
• Validation: The process of determining the degree to which a model is an accurate representation of the real world from the perspective of the intended uses of the model.
• Verification: The process of determining that a model implementation accurately represents the developer’s conceptual description of the model and the solution to the model.

1.3. Related work

Over the past two decades, three literature review articles [12–14] have been published on BES calibration. In 2005, as part of an ASHRAE-initiated research project (RP-1051), Reddy [12] classified calibration methodologies from the existing literature into four classes:
(1) calibration based on manual, iterative, and pragmatic intervention;
(2) calibration based on a suite of informative graphical comparative displays;
(3) calibration based on specific tests and analytical procedures;
(4) analytical/mathematical methods of calibration.
In 2014, Coakley et al. [13] extended these classifications to include advancements in optimization techniques, Bayesian calibration, and alternative modeling techniques such as meta-modeling. They also proposed a broader classification of calibration approaches as either manual or automated. In the following year, Fabrizio and Monetti [14] built upon the study by Coakley et al. [13] by discussing the issues affecting BES calibration in further detail.

1.4. Aim and objectives

There have been numerous BES calibration studies over the past decade. However, most focused on applying a specific calibration methodology to specific case study buildings. Combined with the lack of open code and data, this makes BES calibration studies challenging to replicate. Additionally, as described in the preceding paragraph, existing review articles focus on providing an overview of current calibration methodologies. However, proper specification of model inputs and outputs is equally important. To date, there has been little quantitative analysis of model inputs and outputs, calibration methods, and the criteria for evaluating calibration performance. Determining all these details is highly subjective, often requiring a high level of expertise, experience, and domain knowledge. Despite its importance, there is little guidance on best practices to facilitate BES calibration.

With the aim of enhancing reproducibility and enabling others to build upon published work more easily, the objectives of this review article are to:

• Synthesize relevant BES literature and the relationship between various model inputs and outputs.
• Perform a detailed meta-analysis of the calibration methods and measures of calibration performance currently used in the existing literature.
• Provide recommendations to facilitate reproducibility in BES.

We believe that meeting these objectives will provide a solid foundation and platform for future research to advance the current state of BES calibration. This is also the first systematic review on the subject.

In Section 2, we describe the methodology of our systematic review. Section 3 contextualizes the review with an overview of the simulation engines used and the locations of the case studies. Section 4 describes the state-of-the-art calibration approaches, including a comparison against the 2014 review by Coakley et al. [13]. Section 5 analyzes the relationship between the inputs and outputs used for calibration. Section 6 summarizes the metrics commonly used to evaluate calibration performance. Section 7 discusses the significant findings and identifies areas for future research. Section 8 concludes the paper.

2. Method

A systematic review was adopted to provide a comprehensive and unbiased summary of evidence on calibration methods and techniques, model inputs/outputs, and calibration performance metrics.

Table 1
Search criteria for systematic literature review.
Criteria | Description
Keywords | [“calibration” OR “model calibration”] AND [“building performance simulation” OR “building energy model” OR “building energy modeling” OR “building energy simulation” OR “building simulation” OR “energy simulation” OR “whole building energy model”]
Database | Scopus
Search date | 16 January 2021
Limit to | Year: 2015–2020; Language: English; Document type: Article; Source type: Journal; Subject areas: Engineering, Environmental Science, Energy; Journals: Energy and Buildings, Applied Energy, Automation in Construction, Building and Environment, Solar Energy, Applied thermal energy, Journal of Building Performance Simulation, Journal of Building Engineering, Building Simulation, HVAC and R Research
Exclude | Subject areas: Material science, Social Sciences, Chemical engineering
Total number of publications returned | 186

2.1. Search and eligibility criteria

Table 1 presents the search strategy used to identify relevant publications from the Scopus database. The keywords “model calibration” and “building energy simulation” were used to identify an initial list of publications. To capture as many relevant publications as possible, the search string also included synonyms that are used interchangeably with “model calibration” and “building energy simulation” in the BES literature.

The initial search returned 2,762 publications. Limiting the search to English journal articles published between 2015 and 2020 resulted in 781 publications. We limited the review to the six most recent years to reflect recent trends and the state of the art in BES. Additionally, the most recent review papers on BES calibration were published in 2014 [13] and 2015 [14]. The search criteria were further refined by including relevant subject areas (Engineering, Environmental Science, and Energy) and explicitly excluding irrelevant subject areas (Material science, Social Sciences, and Chemical engineering). These criteria excluded 338 studies and left 443 for the review. The titles and abstracts of the 443 studies were then screened to identify relevant publications and their corresponding source journals, yielding 186 publications.

2.2. Study selection

The full texts of the 186 publications were subsequently screened for relevance to this review. A study was retained if it met three criteria:
(1) it involved the use of building energy simulation;
(2) it applied calibration methods or techniques;
(3) it was not vague about the proposed calibration approach and was not ambiguous about the model input(s) and output(s).
Of the 186 studies, 107 were selected for the review.

3. Study context

This section provides the context for this review by summarizing the 107 selected papers in terms of the simulation engine used and the location of the calibration case studies.
Fig. 1. Geographic distribution of case study buildings and the corresponding scale of the simulation (component/system, building, or urban) (top plot), and the distribution of the case study buildings across the Köppen climate zones (bottom plot).

3.1. Simulation engine

A majority of the papers reviewed (60%) used EnergyPlus (Table 2), for several reasons. EnergyPlus is an open-source whole-building energy simulation engine that has been, and continues to be, supported by the U.S. Department of Energy (DOE) [15]. Moreover, many application software tools support EnergyPlus [16]. TRNSYS and DOE-2 were used in 15% and 7% of the papers reviewed, respectively. TRNSYS [17] is a transient systems simulation program designed to provide flexibility in conducting energy simulations through a modular structure and extensive add-on component libraries. DOE-2 [18] is a building energy simulation program that performs hourly simulation given descriptions of the building layout, constructions, operating schedules, HVAC systems, and utility rates.

What stands out in Table 2 is that Resistance–Capacitance (RC) networks (also known as lumped parameter models) were used in 10% of the studies reviewed. The other three commonly used simulation engines (EnergyPlus, TRNSYS, and DOE-2) are white-box models. An RC network, by contrast, is a gray-box model: it combines simplified physical representations of the building with operational data that are used to identify the model’s coefficients [19]. The benefit of an RC network lies in retaining a physical description of the building while being computationally more efficient than white-box models. Further easing implementation, RC models may be developed following the well-established international standard ISO 13790:2008 [20], which was subsequently superseded by ISO 52016-1:2017 [21].

3.2. Location

Fig. 1 shows the geographic distribution of the case study buildings extracted from the papers reviewed, across simulation scales (top map plot) and within each Köppen climate zone (bottom bar plot). From the figure, it is apparent that a majority of case study applications are located in the U.S. or Europe, with several in China and South Korea. These applications are situated between latitudes 30°N and 65°N. Consequently, 96% of the studies belong to an arid (dry), temperate (mild mid-latitude), or continental (cold mid-latitude) climate group. Only 4% of the applications were in the equatorial region, which is characterized by a warm and humid climate all year round.

An inspection of the data in Fig. 1 reveals that over three-quarters of the case studies are at the building scale (77%). A further 15% are at the urban scale, and the remaining 8%¹ calibrate a single building component or system. The data also show that urban-scale case studies are located only in the U.S. (54%), Europe (34%), and the Middle East (12%). None of the urban-scale studies were located in the tropics.

Table 2
Simulation engines used in the reviewed calibration applications (N = 107).
Simulation engine | Type | Percentage of papers reviewed
EnergyPlus | White-box | 60%
TRNSYS | White-box | 15%
DOE-2 | White-box | 7%
Resistance–Capacitance (RC) | Gray-box | 10%
Others | NA | 8%

4. Key calibration approaches

In general, calibration approaches can be classified as either manual or automated [13]. Automated approaches employ some form of computerized process to tune model parameters by maximizing the model’s fit to observations. In contrast, manual approaches rely on iterative, pragmatic intervention by the modeler. The number of papers using an automated calibration approach has roughly tripled since the 2014 review by Coakley et al. [13] (Fig. 2).

In this review, a majority of the automated approaches employ either mathematical optimization (58.5%) or Bayesian calibration (33%). The remainder use sampling methods to select a subset of models with the best fit (8.5%). To aid future applications, Table 3 lists packages, libraries, code repositories, and applications for sensitivity analysis, optimization, and Bayesian calibration.
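The sampling-based approach draws many candidate parameter sets and retains the subset that best fits the observations. The following minimal Python sketch illustrates the idea. It is not taken from any of the reviewed studies, and several elements are illustrative assumptions:
• the two-parameter toy_model, a steady-state UA-value heating model standing in for a full BES run;
• the parameter bounds and the synthetic "measured" data;
• plain uniform random sampling;
• CV(RMSE) as the goodness-of-fit measure.

```python
import math
import random


def toy_model(ua, base_load, outdoor_temps, setpoint=21.0):
    """Hypothetical stand-in for a BES run: hourly heating energy (kWh) from a
    steady-state UA-value model plus a constant base load."""
    return [base_load + ua * max(setpoint - t, 0.0) for t in outdoor_temps]


def cv_rmse(simulated, measured):
    """CV(RMSE) in percent, dividing by n (no degrees-of-freedom correction)."""
    n = len(measured)
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / n)
    return 100.0 * rmse / (sum(measured) / n)


def sample_and_select(measured, outdoor_temps, bounds, n_samples=2000, keep=20, seed=0):
    """Draw parameter sets uniformly within bounds, simulate each one, and
    return the `keep` best-fitting sets as (CV(RMSE), params), best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        simulated = toy_model(outdoor_temps=outdoor_temps, **params)
        results.append((cv_rmse(simulated, measured), params))
    results.sort(key=lambda r: r[0])
    return results[:keep]


# Synthetic "measurements" generated from known parameters ua=0.8, base_load=2.0.
temps = [5.0 + 10.0 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
measured = toy_model(0.8, 2.0, temps)

best = sample_and_select(measured, temps, bounds={"ua": (0.1, 2.0), "base_load": (0.0, 5.0)})
error, best_params = best[0]
print(f"best of {len(best)} retained: CV(RMSE) = {error:.2f}%, params = {best_params}")
```

Each sample here costs one call to a cheap function. With a real simulation engine, each sample would be a full run. Space-filling designs and parallel execution are therefore usually preferable to the plain uniform draws assumed above.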
4.1. Optimization-based calibration
The genetic algorithm (GA) [34–42], particle swarm optimization (PSO) [43–51], and the Hooke-Jeeves (HJ) algorithm [51–55] are the most widely used algorithms for optimization-based calibration. Both GA and PSO belong to the class of population-based evolutionary algorithms with metaheuristic characteristics. In particular, the non-dominated sorting genetic algorithm II (NSGA-II) has been extensively applied to the optimization-based calibration of BES models [34–39]. It achieves a better spread of solutions and better convergence than other multi-objective evolutionary algorithms [56]. Proposed by Kennedy and Eberhart [57], PSO optimizes via swarm intelligence and is inspired by the social behavior of organisms in groups, such as a bird flock or a fish school. Lastly, the HJ algorithm [58] belongs to the family of generalized pattern search (GPS) algorithms. It has gained popularity in BES because the number of function evaluations increases only linearly with the number of design parameters [24].
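To make the pattern-search idea concrete, the sketch below implements a basic Hooke-Jeeves search in Python. It has three ingredients:
• exploratory moves of plus or minus one step along each coordinate;
• pattern moves that extrapolate along a successful direction;
• step halving when no improvement is found.

This is a simplified illustration, not the GenOpt implementation. The hypothetical toy_model, the synthetic data, the bounds, the initial steps, and the tolerance are all assumptions. The objective is CV(RMSE), computed here by dividing by n (some guidelines divide by n − p instead). Parameter bounds are enforced by simple clipping. Only objective values are compared, so no gradients of the simulation are required.

```python
import math


def toy_model(params, outdoor_temps, setpoint=21.0):
    """Hypothetical stand-in for a BES run: hourly heating energy (kWh) from a
    steady-state UA-value model plus a constant base load."""
    ua, base_load = params
    return [base_load + ua * max(setpoint - t, 0.0) for t in outdoor_temps]


def cv_rmse(simulated, measured):
    """CV(RMSE) in percent, dividing by n (no degrees-of-freedom correction)."""
    n = len(measured)
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / n)
    return 100.0 * rmse / (sum(measured) / n)


def clip(x, bounds):
    """Keep every parameter inside its [lower, upper] range."""
    return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]


def explore(f, x, fx, steps, bounds):
    """Exploratory move: try +step then -step along each coordinate in turn,
    keeping the first trial that lowers the objective."""
    x = list(x)
    for i, step in enumerate(steps):
        for delta in (step, -step):
            trial = list(x)
            trial[i] += delta
            trial = clip(trial, bounds)
            f_trial = f(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break
    return x, fx


def hooke_jeeves(f, x0, steps, bounds, shrink=0.5, tol=1e-6, max_iter=10_000):
    """Minimize f with a basic Hooke-Jeeves pattern search (gradient-free)."""
    base = clip(x0, bounds)
    f_base = f(base)
    steps = list(steps)
    for _ in range(max_iter):
        x_new, f_new = explore(f, base, f_base, steps, bounds)
        if f_new < f_base:
            # Pattern moves: extrapolate along the successful direction and
            # explore around the extrapolated point while this keeps improving.
            while f_new < f_base:
                pattern = clip([2 * a - b for a, b in zip(x_new, base)], bounds)
                base, f_base = x_new, f_new
                x_new, f_new = explore(f, pattern, f(pattern), steps, bounds)
        elif max(steps) < tol:
            break
        else:
            steps = [s * shrink for s in steps]  # no improvement: refine the mesh
    return base, f_base


# Synthetic "measurements" generated from known parameters ua=0.8, base_load=2.0.
temps = [5.0 + 10.0 * math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
measured = toy_model([0.8, 2.0], temps)


def objective(params):
    return cv_rmse(toy_model(params, temps), measured)


best_params, best_cv = hooke_jeeves(
    objective, x0=[1.5, 4.0], steps=[0.5, 1.0], bounds=[(0.1, 2.0), (0.0, 5.0)]
)
print(f"ua = {best_params[0]:.3f}, base_load = {best_params[1]:.3f}, CV(RMSE) = {best_cv:.4f}%")
```

Each objective evaluation here is cheap. In a real calibration, each evaluation would be a full external BES run, so the evaluation count, rather than the search logic, dominates the cost.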
Fig. 2. Comparison of the type of calibration approach (automated or manual) in this review with that by Coakley et al. (2014) [13].

A common feature of the GA, PSO, and HJ algorithms is that they are all gradient-free. They are therefore suitable for optimization frameworks that minimize a cost function evaluated by an external BES program. Additionally, population-based metaheuristic algorithms such as PSO and GA initialize the optimization with a population of randomly distributed points to reduce the risk of converging to local minima. However, it can be hard to detect when solutions remain far from the Pareto-optimal front, so defining a stopping criterion is difficult. Guidelines [59–61] specifying thresholds for accuracy metrics such as CV(RMSE) and NMBE are often used. However, these thresholds have been shown not to be proper stopping criteria for optimization-based calibration [38]. Nonetheless, the same study found minimizing CV(RMSE) to be the most robust cost function across different combinations of error metrics, calibration outputs, and time resolutions of the calibration dataset [38]. Constraints in the form of a specified range for each modeling parameter are often added to prevent unreasonable values [62].

Table 3
Applications, R packages, Python libraries, and code repositories for performing sensitivity analysis, optimization, and Bayesian inference algorithms on building energy simulation models.
Name | Type | Language | Method(s) | Ref.
Sensitivity analysis
sensitivity | CRAN | R | SRC, SRRC, PRCC, Morris, FAST, Sobol | [22]
SALib | PyPI/GitHub | Python | Morris, FAST, Sobol | [23]
Optimization-based calibration
GenOpt | Application | Java | GPSHJ, PSO | [24]
jEPlus | Application | Java | NSGA-II | [25]
DEAP | PyPI/GitHub | Python | NSGA-II, PSO | [26]
ecr | CRAN/GitHub | R | NSGA-II, PSO | [27]
Bayesian calibration
SAVE | CRAN/GitHub | R | Bayesian emulation, calibration, and validation following Bayarri et al. [28], with roots in Kennedy and O’Hagan [29] and Higdon et al. [30] | [31]
bc-stan | GitHub | R | Bayesian emulation and calibration following Kennedy and O’Hagan [29] and Higdon et al. [30] | [32]
pySIP | GitHub | Python | Bayesian emulation and calibration for continuous-time stochastic state-space models (e.g., RC networks) | [33]
Abbreviations: PyPI (Python Package Index); CRAN (Comprehensive R Archive Network); SRC (Standardized Regression Coefficients); SRRC (Standardized Rank Regression Coefficients); PRCC (Partial Rank Correlation Coefficient); FAST (Fourier Amplitude Sensitivity Testing); GPSHJ (Generalized Pattern Search Hooke-Jeeves); PSO (Particle Swarm Optimization); NSGA-II (Non-dominated Sorting Genetic Algorithm II).

Optimization-based calibration has been widely applied in BES (Fig. 2). A prominent example is the Autotune project [63,64]. It aims to replace manual calibration with an automated method that leverages supercomputing, large databases of simulation results, and an evolutionary algorithm. Sun et al. [65] proposed a pattern-based optimization approach that selects the parameters to tune based on the bias identified in monthly utility bills. Yang and Becerik-Gerber [66] performed independent single-objective optimizations at the component, zone, and building levels. The union of the independent solution sets was then used for the subsequent multi-objective optimization.

Optimization has also been used to calibrate building components and sub-systems, such as models of BIPV [48], absorption thermal energy storage [67], and components of the air-handling unit [49]. Likewise, optimization has been used to calibrate urban-scale building energy models (UBEMs). Santos et al. [68] calibrated 56 buildings in a district using GA while considering the urban heat island effect. Zekar and Khatib [69] applied optimization to calibrate an urban-scale RC model. Numerical optimization can be computationally impractical at the urban scale. Optimization-based calibration of UBEMs therefore often begins with parameter reduction, in the form of day typing, zone typing, and the use of archetypes.

4.2. Calibration under uncertainty

Uncertainty is an inevitable characteristic of BES models because of the complexity of, and interactions between, different building systems. In many building energy applications, uncertainty management is an important part of accounting for risk in the decision-making process. It is somewhat surprising that only 32 of the 107 papers reviewed (30%) involved some form of uncertainty quantification during model calibration.

Fig. 3. Forward approach to uncertainty quantification by propagating input uncertainties to obtain uncertainties in the output of interest.

4.2.1. Types and sources of uncertainty

In general, uncertainty can be classified as either aleatory or epistemic [70,71]. Aleatory (or irreducible) uncertainty is caused by inherent variations or randomness of the building system or sub-system under investigation and cannot be explained by the data collected. In contrast, epistemic (or reducible) uncertainty arises from a lack of knowledge (or data). Distinguishing between the two is useful for identifying which uncertainties have the potential to be reduced [72]. However, developing BES models involves a significant degree of subjectivity that depends on the available data, which may also change over a building’s lifecycle. As a result, most uncertainties combine aleatory and epistemic components, making it difficult to distinguish between the two.

Related to the types of uncertainty is the identification and classification of uncertainty by source, which forms an important part of a comprehensive uncertainty quantification
framework [71,73]. In BES calibration, the sources of uncertainty can be classified as follows:
- Parameter uncertainty: Uncertainty associated with influential model inputs that are not known with certainty.
- Model form uncertainty: Model discrepancy (also called model inadequacy) that results from all assumptions, conceptualizations, abstractions, and approximations of the real-world physical processes.
- Observation uncertainty: Uncertainty that results from observation errors.

4.2.2. Uncertainty quantification
Uncertainty quantification in BES can be broadly categorized as either forward or inverse. Forward approaches quantify uncertainty in the model output(s) by propagating the uncertainties in the model parameters through the model (Fig. 3). Statistical sampling techniques such as Monte Carlo simulation or Latin Hypercube Sampling are easy to apply and are the most common in the field of BES [74–78].
Inverse approaches involve quantifying various sources of uncertainty given a set of observations from the building system being modeled. In particular, the calibration paradigm known as Bayesian calibration has gained popularity in BES due to its ability to naturally incorporate uncertainty and combine prior information with measured data to derive posterior estimates of the model parameters (Eq. 1), where $t$ denotes the calibration parameters and $y$ the measured data.

$$p(t \mid y) \propto p(y \mid t)\, p(t) \tag{1}$$

A notable approach within the Bayesian calibration paradigm is the formulation proposed by Kennedy and O'Hagan (KOH) [29]. KOH's approach differs from traditional approaches by allowing for various sources of uncertainty and attempting to correct for any model inadequacy or bias (Eq. 2). There have been several applications of KOH's approach in the field of BES calibration [79–87], including a detailed guideline for its application in the field [32].

$$y(x) = g(x, t) + \delta(x) + \varepsilon(x) \tag{2}$$

where $y(x)$ is the observed field measurement, $g(x, t)$ is the output of the BES given observable inputs $x$ and calibration parameters $t$, $\delta(x)$ is the model inadequacy, and $\varepsilon(x)$ is the observation error.
Other noteworthy approaches include Bayesian hierarchical modeling for the calibration of urban-scale building energy models [88,89] and sequential updating, which takes advantage of Bayes' theorem to keep the model up to date without losing past knowledge [85,90].
Given the complexity of BES models, posterior distributions often cannot be derived analytically. Consequently, Markov chain Monte Carlo (MCMC) is often used in Bayesian calibration to sample from the posterior distributions because of its flexibility and straightforward application to complex problems. However, it is well known that performing Bayesian inference via MCMC is computationally expensive, especially when likelihood evaluations involve computationally expensive models, as in the case of BES. The Gelman-Rubin statistic ($\hat{R}$), also known as the potential scale reduction factor, is often used to determine whether convergence to a stationary distribution has been achieved [88,91,92,81,32].
To alleviate the high computation cost of Bayesian inference, metamodels have been proposed as surrogates of the energy model. Gaussian processes (GP) [79,80,32,91,81,82,93,85,86] and linear regression [94,95,83,96,87] are the most popular, offering competing trade-offs between computation cost and accuracy. Additionally, more efficient MCMC sampling strategies such as Hamiltonian Monte Carlo (HMC) [81,97] and Approximate Bayesian Computation (ABC) methods [98] have also been proposed to reduce computation cost.

4.3. Analytical tools and techniques
Analytical tools and techniques are often applied in both manual and automated calibration approaches. Coakley et al. [13] list these techniques with detailed explanations. Table 4 presents the subset of these techniques [13] that is relevant to this review. We do not extend the classifications proposed in [13], but we augment their descriptions so that they encompass the publications reviewed.

Table 4
Analytical tools and techniques used to support the calibration process in the papers reviewed. Adapted from [13].
Acronym | Name | Description
--- | --- | ---
SA | Sensitivity analysis | Used to provide insights on how variations in uncertain inputs map onto the outputs. Can be used to identify non-influential parameters that are ignored during calibration or to help set priorities for future efforts (e.g., identify important parameters for measurements or detailed investigation).
HIGH | High-res data | Utilizes data at hourly (or sub-hourly) resolution as opposed to daily or monthly temporal resolution data.
AUDIT | Detailed audit | Conducting detailed audits to gain a better knowledge of the building systems or sub-systems.
UQ | Uncertainty quantification | The assessment of parameter uncertainty as part of the calibration process.
EXPERT | Expert knowledge/templates/model database | Approach that utilizes expert knowledge or judgment as a key element of the process. Often involves the use of databases or templates of typical building parameters and components to reduce user input requirements.
PARRED | Parameter reduction | Involves reducing the number of model parameters. Examples include day-typing (reducing detailed schedules into typical day-type schedules), zone-typing (aggregating similar thermal zones), or building archetypes (grouping buildings into representative archetypes for urban-scale model calibration).
BASE | Base-case modeling | The use of measured base-loads to calibrate the building model. Calibration is carried out during the base-case when (a) heating and cooling loads are minimal and the building is dominated by internal loads, thus minimizing the impact of weather-dependent variables, or (b) internal loads are minimal and the HVAC system is not operating, to better characterize weather-dependent variables such as the building envelope when internal temperatures are free-floating.
EVIDENCE | Evidence-based model development | Approaches that implement a procedural approach to model development, making changes according to source evidence rather than ad hoc intervention. Often requires model development version control to keep track of the changes.
SIG | Signature analysis | The use of graphical analysis techniques to identify the impacts of different model parameters on the output of interest.
STEM | Short-term energy monitoring | On-site measurements for a short period of time. Typically used to identify typical energy end-use profiles and/or base-loads.
INT | Intrusive testing | Intrusive techniques involving interventions in the operation of the actual building.

4.3.1. Analytical techniques by approach and application
Fig. 4 provides an overview of the number of papers employing a certain analytical technique to assist or complete the calibration process. What stands out in the figure is that sensitivity analysis (SA) and the use of high-resolution data (HIGH) are the most frequently applied techniques. By contrast, in the review by Coakley et al. [13], SA and the use of high-resolution data were not common analytical techniques in the calibration process.
Fig. 4. Analytical techniques (for both manual and automated calibration approaches) used in this review (top plot) and the review by Coakley et al. (2014) [13].
The increase in the use of high-resolution data could be attributed to the proliferation of IoT devices and sensor networks in buildings, which makes hourly and sub-metered data more readily available for calibration. The increase in the use of SA could be associated with the growth in the utilization of automated approaches (Fig. 2). This is further corroborated by Fig. 5, which shows the breakdown of analytical techniques according to the calibration approach (manual or automated) and the spatial scale (component/system, building, or urban). The results in Fig. 5 indicate that sensitivity analysis (SA), high-resolution data, uncertainty quantification (UQ), and building audits are the most commonly applied techniques in automated approaches. Comparatively, SA is not as widely used in manual calibration approaches.
Fig. 5. Analytical techniques used in the papers reviewed grouped by the corresponding spatial scale (left plot) and calibration approach (right plot). Calibration processes can involve more than one analytical technique. Therefore, the values do not add up to the total number of papers reviewed (N = 107).
Moving on to model calibration at different spatial scales, it can be observed that SA, high-resolution data, UQ, and building audits are prevalent at the building scale. In contrast, parameter reduction and expert knowledge are the predominant analytical techniques for urban-scale model calibration efforts. Parameter reduction aims to reduce the number of model inputs by characterizing and grouping similar inputs to reduce the complexity of the model while preserving the final decision based on the full set of parameters. Well-known examples of parameter reduction techniques in BES are day-typing (grouping schedules with similar profiles) and zone-typing (grouping similar thermal zones) [13]. At the urban scale, archetypes are commonly used to reduce the number of model inputs and therefore the effort and cost of modeling distinct buildings [95,100,89,87,92,101,102].
Archetype generation involves two steps: segmentation followed by characterization [103]. Segmentation groups buildings with similar characteristics based on key parameters such as building type, construction year or period, floor area, building height, and/or shape (if geometry data is not available) [95,100,89,87]. What follows is the characterization of building construction and operation properties based on expert knowledge, which involves deriving input values from existing databases, building codes and standards, and representations of national building typologies (also referred to as reference buildings). For example, several studies [89,92,101] modeled construction properties (inferred from construction year) using information from the TABULA (2009–2012) and EPISCOPE (2013–2016) projects [104,105], which aimed to provide national residential building typologies for various European countries. Another example is the use of the U.S. Commercial Buildings Energy Consumption Survey (CBECS) and
the U.S. Residential Energy Consumption Survey (RECS) databases to derive detailed information on the construction and operation of the buildings (e.g., insulation levels, internal loads and schedules, mechanical systems, and hot water consumption) [106,95]. Likewise, Chen, Deng and Hong [102] derived input values based on the minimum energy efficiency requirements in California's building energy efficiency standards (Title 24), while Krayem et al. [107] defined internal loads and schedules following the ASHRAE 90.1 Standard.

4.3.2. Sensitivity analysis
The results of this review confirm the close association between sensitivity analysis (SA) and automated model calibration processes (Fig. 6). Only about 20% of the papers utilizing manual approaches employ SA, in contrast with 65% of the papers using automated approaches. A possible explanation is that equifinality issues are especially challenging for automated approaches, since objective functions are normally designed to minimize discrepancies between simulated and observed responses. This might produce a model with a higher prediction accuracy, but it might not inform the modeler about the true parameter values [108]. In contrast, manual approaches adjust the calibration parameters using heuristics drawn from the expertise of an experienced modeler. Of the studies that employed a manual approach without sensitivity analysis, the dominant analytical techniques (86%) include conducting detailed audits [109–115], utilizing expert knowledge or judgment [116–118,100,119,106,107], implementing an evidence-based approach [120–122,100,112,123,124], or using high-resolution data [120–122,100,112,124].
Fig. 6. Types of sensitivity analysis used in building energy simulation split by automated vs manual calibration approaches.
It is evident from Fig. 6 that global sensitivity analyses are more commonly used in automated approaches. A possible explanation is the ability of global methods to provide an overall view of the importance of different inputs while considering their interactions [125]. Specifically, screening methods (46%) are the most popular, followed by perturbation (23%), regression (13%), metamodel (10%), variance (4%), and regional sensitivity analysis (RSA) (4%) (Fig. 6).
In this review, variance-based SA methods are not common because they are computationally demanding, requiring large sample sizes for accurate approximations of the sensitivity indices. Suppose that there are $t$ uncertain parameters; the number of model evaluations required is approximately $t$ for perturbation (local) SA methods, $10t$–$100t$ for screening methods, $100t$–$1000t$ for regression and RSA methods, and $> 1000t$ for variance-based methods [125]. Consequently, metamodels, surrogates, or emulators are typically used in place of the computationally expensive simulation runs required by demanding SA methods [125]. Specific choices of metamodels may also provide sensitivity measures that can be used to rank model parameters according to their influence on the output of interest. Examples include the use of random forest variable importance [96,91,94] and estimates of marginal posterior using Gaussian processes [82].
Screening methods are popular due to their low computation cost compared with other global SA methods, making them suitable for BES models, which are typically non-linear with a high-dimensional parameter space. The method of Morris [126] is the most established and widely used screening method. Sampling for the Morris method is carried out by randomly selecting $r$ starting points that are perturbed One-At-a-Time (OAT). The computation cost is therefore $r(t + 1)$ model evaluations for $t$ model parameters. A measure of global sensitivity is commonly obtained using the $r$ trajectories to compute the mean $\mu$ of the elementary effects [126] or the modified mean $\mu^*$ (the mean of their absolute values) proposed by [127]. In general, most studies rely on graphical plots of $\mu$ (or $\mu^*$) against $\sigma$, the standard deviation of the elementary effects, for better interpretability when screening out non-influential parameters [79,88,81,32,85,52,128,36,66,37,49,39,50]. Others consider only $\mu$ (or $\mu^*$) to rank and identify dominant parameters [84,86,101,47,78,45].
Perturbation methods are the simplest type of SA and involve varying (perturbing) the model inputs from their base or nominal values One-At-a-Time (OAT). Compared with global SA methods, the advantages of perturbation methods include (1) their ease of application and interpretation [129], and (2) requiring the fewest model evaluations [125]. The order of influence is often used to identify a subset of parameters to be calibrated [130,65,131–133,128,134,135]. Sun et al. [65] illustrate this clearly by using parametric perturbation to identify a priority list of 17 calibration parameters that would be adjusted within a pattern-based automated calibration framework. On the other hand, studies have also applied perturbation methods after calibration to investigate possible causes of the remaining discrepancies between simulated and measured data [136] or to determine whether the calibrated model is robust to uncertainties in particular input factors [137].
Regression methods refer to the use of regression or correlation coefficients to derive information about output sensitivity to variations in the uncertain inputs. The input/output dataset is often generated using Monte Carlo simulation or Latin hypercube sampling. Several types of regression or correlation coefficients have been used as sensitivity measures in building energy analysis [129]. The choice depends on linearity and monotonicity assumptions between the inputs and output [125]. In this review, we found that Standardized Regression Coefficients (SRC) were most commonly applied [94,62,91,96], with some studies employing the partial rank correlation coefficient (PRCC) [46] and standardized rank regression correlation (SRCC) [138].
Fig. 7. Schematic illustrating the calibration process given that the simulation is a representation of reality.

4.4. Multi-stage calibration
Supporting multi-stage calibration through a combination of data from building information models (BIM), as-built documents, on-site audits, occupancy sensors, indoor environmental quality (IEQ) sensors, the building management system (BMS), and metered HVAC component energy consumption may not be a far-fetched reality limited to state-of-the-art buildings, as building data becomes more easily available and accessible. This review found that more than 90% of the papers reviewed calibrated the model against a maximum of one (62%) or two (29%) outputs. About 8% of the studies calibrated the model against three outputs, while only 1% performed the calibration using three outputs.
Multi-stage calibration is of interest because it is often proposed as a way to represent the building being modeled more accurately. Since calibration is an under-determined problem [32], a model that is calibrated at a coarser spatial or temporal level can meet the most stringent error thresholds without accurately representing the building at finer spatial or temporal levels [37,139,99]. It has been argued that achieving simultaneous accuracy at multiple levels of the simulation is crucial for correctly providing insights at the respective levels. For instance, Yang and Becerik-Gerber [66] assert that building-level accuracy is needed to provide insights on overall energy performance, but ECM-level accuracy would be needed for estimating the energy savings potential of different ECMs. Similarly, Li et al. [140] showed using statistical hypothesis testing that an energy model calibrated for one ECM cannot be used to accurately cross-estimate the energy consumption of another ECM.
Our review found that calibrating the model with data of the building under free-floating conditions is a dominant feature of multi-stage approaches [76,141,51,50]. Such a base-case method is often employed because the number of uncertain parameters is substantially reduced when there are few or no internal loads (in particular, occupancy) and the HVAC system is not operating. Additionally, it is well known that occupancy is a highly uncertain model input [142] with significant influence on the predictive accuracy of a calibrated energy model [140,143,144]. Consequently, parameters such as envelope material thermal properties and infiltration rates are more straightforward to identify during periods when the building is free-floating (see Section 5.4).
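The point made in Section 4.4, that a model can meet coarse-resolution error thresholds while misrepresenting finer resolutions, can be illustrated with a minimal Python sketch. It computes CV(RMSE) and NMBE as defined in Eqs. (3) and (4) of Section 6.1 and compares them with the ASHRAE Guideline 14 limits listed in Table 6. The hourly profiles are hypothetical synthetic data (not taken from any reviewed study), and the helper names are illustrative assumptions: the simulated occupied period is shifted by four hours, so the monthly totals match exactly while the hourly values do not.

```python
import math

# ASHRAE Guideline 14 limits from Table 6, stored as (|NMBE| %, CV(RMSE) %).
GUIDELINE_14_LIMITS = {"monthly": (5.0, 15.0), "hourly": (10.0, 30.0)}


def cv_rmse(measured, simulated, p=0):
    """CV(RMSE) in percent following Eq. (3); p = number of adjustable parameters."""
    n = len(measured)
    if n != len(simulated) or n - p <= 0:
        raise ValueError("series must have equal length and n - p > 0")
    m_bar = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / m_bar


def nmbe(measured, simulated, p=0):
    """NMBE in percent following Eq. (4); positive means under-prediction."""
    n = len(measured)
    if n != len(simulated) or n - p <= 0:
        raise ValueError("series must have equal length and n - p > 0")
    m_bar = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * m_bar)


def monthly_totals(values, hours_per_month):
    """Sum consecutive blocks of hourly values into monthly totals."""
    return [sum(values[i:i + hours_per_month])
            for i in range(0, len(values), hours_per_month)]


def is_calibrated(measured, simulated, resolution, p=0):
    """Return (CV(RMSE), NMBE, passes) against the Guideline 14 limits."""
    nmbe_limit, cv_limit = GUIDELINE_14_LIMITS[resolution]
    cv = cv_rmse(measured, simulated, p)
    bias = nmbe(measured, simulated, p)
    return cv, bias, cv <= cv_limit and abs(bias) <= nmbe_limit


# Hypothetical data: 12 "months" of 30 days, base load 10 plus 10 when occupied.
DAYS, HOURS_PER_MONTH = 360, 30 * 24
measured = [10 + (10 if 8 <= h < 18 else 0) for _ in range(DAYS) for h in range(24)]
simulated = [10 + (10 if 12 <= h < 22 else 0) for _ in range(DAYS) for h in range(24)]

hourly = is_calibrated(measured, simulated, "hourly")
monthly = is_calibrated(monthly_totals(measured, HOURS_PER_MONTH),
                        monthly_totals(simulated, HOURS_PER_MONTH), "monthly")
print(f"hourly:  CV(RMSE)={hourly[0]:.1f}%  NMBE={hourly[1]:.1f}%  calibrated={hourly[2]}")
print(f"monthly: CV(RMSE)={monthly[0]:.1f}%  NMBE={monthly[1]:.1f}%  calibrated={monthly[2]}")
```

Under these assumptions the monthly check passes with CV(RMSE) and NMBE of 0.0%, whereas the hourly CV(RMSE) is roughly 40.8%, above the 30% hourly limit; the hourly NMBE is also 0.0% because the positive and negative errors cancel each day.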
Fig. 8. Classification of model inputs when calibrating an energy model.
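The Morris screening procedure described in Section 4.3.2 can be sketched in Python as follows. This is a simplified variant under stated assumptions: parameters are scaled to the unit interval, each trajectory starts on a random grid point and steps every parameter once by +Δ in random order (the original method also allows −Δ steps), and a cheap hypothetical function stands in for a BES run. The sketch uses r(t + 1) model evaluations and returns μ* and σ for each parameter; `morris_screening` and `toy_model` are illustrative names only.

```python
import random
import statistics


def morris_screening(model, t, r=20, levels=4, seed=0):
    """Simplified Morris screening on the unit hypercube [0, 1]^t.

    Returns (mu_star, sigma, n_evals): the mean absolute elementary effect and the
    standard deviation of the elementary effects for each parameter, plus the
    number of model runs, which equals r * (t + 1).
    """
    rng = random.Random(seed)
    delta = levels / (2 * (levels - 1))            # common choice of step size
    grid = [k / (levels - 1) for k in range(levels)]
    starts = [g for g in grid if g + delta <= 1.0 + 1e-12]  # keep x + delta in [0, 1]
    effects = [[] for _ in range(t)]
    n_evals = 0
    for _ in range(r):
        x = [rng.choice(starts) for _ in range(t)]
        y = model(x)
        n_evals += 1
        order = list(range(t))
        rng.shuffle(order)                          # one-at-a-time moves, random order
        for i in order:
            x_next = list(x)
            x_next[i] += delta
            y_next = model(x_next)
            n_evals += 1
            effects[i].append((y_next - y) / delta)  # elementary effect of parameter i
            x, y = x_next, y_next                    # continue along the trajectory
    mu_star = [statistics.fmean(abs(e) for e in ee) for ee in effects]
    sigma = [statistics.pstdev(ee) for ee in effects]
    return mu_star, sigma, n_evals


def toy_model(x):
    """Hypothetical stand-in for a BES run: one linear, one quadratic, one weak input."""
    return 5.0 * x[0] + 2.0 * x[1] ** 2 + 0.01 * x[2]


mu_star, sigma, n_evals = morris_screening(toy_model, t=3, r=20)
ranking = sorted(range(3), key=lambda i: mu_star[i], reverse=True)
print("runs:", n_evals, "ranking (most to least influential):", ranking)
for i in ranking:
    print(f"x{i}: mu*={mu_star[i]:.3f}, sigma={sigma[i]:.3f}")
```

With this toy model, 20 trajectories cost 80 runs; parameter 0 ranks first with σ close to zero (a purely linear effect), parameter 1 ranks second, where a non-zero σ reflects its non-linear effect, and parameter 2 would be screened out as non-influential.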
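For the MCMC convergence check mentioned in Section 4.2.2, a minimal Python sketch of the classic (non-split) Gelman-Rubin statistic $\hat{R}$ for one scalar calibration parameter is given below. It assumes m ≥ 2 chains of equal length stored as plain lists; the chains here are hypothetical draws from a random number generator rather than the output of an actual MCMC sampler. Some current software reports split-chain or rank-normalized variants, so values may differ slightly from those tools.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor R-hat for one scalar parameter.

    `chains` is a list of m >= 2 equal-length lists of posterior draws.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # between-chain
    w = statistics.fmean(statistics.variance(c) for c in chains)         # within-chain
    var_plus = (n - 1) / n * w + b / n   # pooled estimate of the posterior variance
    return math.sqrt(var_plus / w)


rng = random.Random(42)
# Hypothetical draws: four chains sampling the same distribution ...
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# ... and four chains where two are stuck around a different mode.
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 0.0, 3.0, 3.0)]
print(f"R-hat (mixed): {gelman_rubin(mixed):.3f}")
print(f"R-hat (stuck): {gelman_rubin(stuck):.3f}")
```

Values close to 1 suggest that the chains are sampling the same distribution (a threshold of about 1.1 is a commonly used rule of thumb), while the stuck chains give a value of roughly 2.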
Fig. 9. Most common observed output used for calibrating building energy simulation models split by the temporal resolution used for the calibration and the scale of the
energy model (i.e., calibration of Component/ System, Building, or Urban scale building energy models).
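The optimization-based calibration discussed earlier in Section 4 (a population-based metaheuristic, CV(RMSE) as the cost function, and parameter ranges as constraints) can be sketched as a basic global-best particle swarm optimizer in Python. Everything here is an illustrative assumption. The two-parameter `toy_model` stands in for a BES run, and the "measurements" are synthetic. The inertia and acceleration coefficients are common textbook values rather than settings from any reviewed study. Because the measurements are generated from known parameters, the result can be checked against them.

```python
import math
import random


def cv_rmse(measured, simulated, p=0):
    """CV(RMSE) in percent following Eq. (3); p = number of adjustable parameters."""
    n = len(measured)
    m_bar = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / m_bar


def toy_model(theta, degree_days):
    """Hypothetical stand-in for a BES run: daily load = ua * degree_days + base_load."""
    ua, base_load = theta
    return [ua * d + base_load for d in degree_days]


def pso_minimize(cost, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; positions are clipped to the parameter ranges."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [list(x) for x in pos]                       # each particle's best position
    p_cost = [cost(x) for x in pos]
    g_idx = min(range(n_particles), key=lambda i: p_cost[i])
    g_best, g_cost = list(p_best[g_idx]), p_cost[g_idx]   # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Clip to the admissible range to avoid unreasonable parameter values.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < p_cost[i]:
                p_best[i], p_cost[i] = list(pos[i]), c
                if c < g_cost:
                    g_best, g_cost = list(pos[i]), c
    return g_best, g_cost


# Synthetic "measurements" generated from known parameters (ua=2.5, base_load=40).
degree_days = [float(d % 15) for d in range(60)]
measured = toy_model((2.5, 40.0), degree_days)
bounds = [(0.5, 5.0), (0.0, 100.0)]   # parameter ranges used as box constraints


def calibration_cost(theta):
    return cv_rmse(measured, toy_model(theta, degree_days), p=len(theta))


theta_hat, best_cv = pso_minimize(calibration_cost, bounds)
print(f"ua={theta_hat[0]:.2f}, base_load={theta_hat[1]:.2f}, CV(RMSE)={best_cv:.3f}%")
```

Here a fixed iteration budget acts as the stopping criterion. As noted in Section 4, meeting a CV(RMSE) threshold is not by itself a proper stopping criterion for optimization-based calibration [38].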
5. Data requirements

5.1. Inputs and outputs
As illustrated in Fig. 7, BES models represent aspects of reality that are manipulated and experimented with using a variety of simulations [145,146]. Therefore, the goal is for models to adequately represent actual building performance over a sufficiently wide range of inputs that encompasses the simulation aim and thus the application. Since BES models are complex computer models with many inputs and outputs, a comparison of the input and output mapping is carried out through a meta-analysis of the existing literature. To avoid confusion, we refer to the following input classification scheme (Fig. 8) for model calibration.
First, model inputs can be observed or unobserved. We use the terms observed/unobserved instead of observable/non-observable because what is observable may differ across calibration cases. We also make a distinction between the model's variables and parameters. Variables refer to inputs to the model that vary over time and are not always observable. By contrast, parameters do not relate to time-varying values but to quantities that influence the output or behavior of the model. In some contexts, a quantity could be either a variable or a parameter depending on how it is modeled. Window opening, for example, could be modeled as a variable using a schedule to define when the window is open or closed, or parameterized to be based on occupant comfort [147]. Out of the unobserved parameters, those assumed to be responsible for the discrepancy between simulated and measured output(s) are then calibrated. Fig. 7 illustrates the typical calibration process with reference to Fig. 8. In this diagram, observed input(s) refers to both input parameters and variables whose value could be determined directly or derived/estimated from available evidence or data. The model is just a representation of reality, and the selected calibration parameters are tuned to match simulated to observed output(s).

5.2. Most common observed outputs
It is apparent from Fig. 9 that most of the studies were conducted at the building scale. A clear trend of decreasing temporal data resolution as we move from component/system to building to urban-scale building energy simulations can also be observed. A closer inspection of the figure shows that models of building components and sub-systems are often calibrated using sub-hourly or hourly data regardless of the type of outputs used for the calibration. Additionally, HVAC energy was the most commonly used output at this scale [81,51,148,49].
In contrast to components and sub-systems, building-scale energy models are most frequently calibrated against electricity and dry bulb temperature. Specifically, the adjustment of parameters using indoor air temperature is often carried out during free-floating periods, when the indoor temperatures are allowed to float without any HVAC system intervening. Consequently, outdoor air temperature is also frequently monitored concurrently to provide the boundary conditions of the simulation [79,149,150,43,110,134,120,36,45,76,151,37,97,141,152,50]. However, what stands out in building-scale studies is that calibration against indoor dry bulb temperature [149,150,43,54,110,134,137,35,131,120,55,45,122,76,37,97,38,77,78,50,124,90], HVAC energy [135,139,121,66,48,153], and equipment electricity consumption [53,134,143,139] is almost always carried out at an hourly resolution. In contrast, calibration against electricity and gas/steam energy is generally carried out with monthly resolution data.
Fig. 10. Most common observed inputs used for calibrating building energy simulation models. The color indicates the class of the model parameter.
Fig. 11. Most common calibration parameters used for calibrating building energy simulation models split by sensitivity analysis and calibration approach.
Turning now to urban-scale building energy models (UBEM), about two-thirds of the studies used monthly or annual electricity, gas/steam, total load/energy, and/or cooling load/energy for the calibration. The use of monthly or annual measurements for the calibration is not surprising because using higher resolution data
might be computationally intractable at the urban scale. Additionally, UBEM studies often utilize utility data that are only available at a monthly resolution [95,40,107].

5.3. Most common observed inputs
Fig. 10 provides an overview of the most commonly used observed inputs. What stands out from Fig. 10 is the widespread use of local weather data (dry bulb temperature, solar radiation, relative humidity, wind speed and direction) as observed inputs to the model. If local site measurements are not available, an actual meteorological year (AMY) weather file from the nearest weather station is used. These observations indicate the importance of using actual weather data for the calibration, since the weather file forms the energy simulation's boundary conditions. The weather file's importance was also demonstrated in previous research, which showed that the annual building energy consumption and the monthly building loads could vary by 7% and 40%, respectively, depending on the weather data provided [154].
Interestingly, several studies used measured indoor environmental conditions as inputs to the model to obtain a model that is better calibrated at the zone level. For instance, Mihai and Zmeureanu [137] showed that using measured indoor air temperatures in place of those from the technical specification led to more accurate predictions of zone airflow rates. Yin, Kiliccote, and Piette [139] used air temperature and airflow measurements to derive the zone thermostat setpoint and the VAV box minimum/maximum airflow, respectively. Infiltration rate is sometimes derived from measurements because it is highly uncertain and can have a significant influence on a building's energy use [155]. In this review, infiltration rates are typically derived from airtightness values obtained using the blower door test [156,34,111,36,45,37,100,113].

5.4. Mapping calibration parameters to outputs
Fig. 11 lists the model parameters most commonly adjusted to match simulation output to the measurements, faceted by the type of sensitivity analysis conducted and by whether the calibration utilized an automated or manual approach. The figure reveals two interesting observations. First, SA, especially global SA, is less likely to be used when the calibration involves schedules (occupant, equipment, lighting, and HVAC operation). Second, automated calibration approaches tend to calibrate parameters such as material properties, infiltration rate, and internal load densities rather than schedules. By contrast, manual approaches are equally likely to calibrate material properties and schedules.
Fig. 12. The magnitude of the relationship between the calibration parameters and their corresponding observed outputs for calibrating building energy simulation models.
Fig. 12 shows the magnitude of the relationship between the most commonly used calibration parameters and their corresponding observed outputs. The mapping reveals that parameters concerning the building envelope (material properties and infiltration rate), internal gains (occupant, lighting, and equipment power density), and zone cooling and heating setpoints are often adjusted when calibrating the energy model to building electricity consumption. Closer inspection of the first column of Fig. 12 shows that HVAC component efficiency and zone outdoor air levels were also calibrated in a considerable number of papers. Not surprisingly, hot water usage was also adjusted in several
studies that were calibrating models of residential typologies [157,95,130,158]. About six to seven articles calibrated the schedules of internal loads [157,143,65,40,159,102,107].
Fig. 12 reveals several other interesting observations. First, the same calibration parameters were used when calibrating against both building electricity and gas/steam energy consumption. Second, the parameters calibrated when matching simulation predictions to total building load/energy are similar, except that equipment and lighting schedules are less likely to be adjusted.
Turning to zone dry-bulb temperature as the observed output, parameters concerning envelope material properties are the most commonly adjusted, followed by infiltration rate. A similar observation can be made when the model is calibrated to the building's heating or cooling load/energy. Material properties and infiltration rate are selected because indoor temperature measurements are often used to investigate the relative changes in building envelope performance under varying boundary conditions [150,149,51,131]. Additionally, studies have found parameters associated with infiltration rate and material properties to influence indoor air temperature [79,131,45,76,50].
Finally and intuitively, Fig. 12 shows that HVAC component capacity and efficiency are typically adjusted when the observed output is the HVAC component's energy consumption. Likewise, equipment power density (EPD) and equipment schedules are adjusted when the model is calibrated against equipment energy consumption.

6. Calibration performance evaluation

6.1. Current approaches
Table 5 ranks the metrics used to assess calibration performance based on the number of occurrences in the papers reviewed. A large proportion of the papers use CV(RMSE) or NMBE to determine whether a BES model was calibrated. This result is not unexpected, since BES models are often deemed "calibrated" if they meet the CV(RMSE) and NMBE limits (Table 6) specified by ASHRAE Guideline 14 [59], the International Performance Measurement and Verification Protocol (IPMVP) [60], or the Federal Energy Management Program (FEMP) [61]. Interestingly, approximately half of the papers reviewed (51%) used two metrics in their evaluation, 24% utilized one metric, and 19% used three metrics simultaneously. The remaining 7% of the papers reviewed used either four or five metrics for the assessment.

Table 5
Metrics used for the evaluation of calibration performance. Each paper may employ more than one metric when assessing calibration performance; therefore, the cumulative sum of the column "No. of papers" is greater than the total number of papers reviewed (N = 107).
Metric | Acronym | No. of papers
--- | --- | ---
Coefficient of Variation of the Root Mean Square Error | CV(RMSE) | 72
Normalized Mean Bias Error | NMBE | 59
Root Mean Square Error | RMSE | 20
Coefficient of Determination | $R^2$ | 12
Goodness of Fit | GOF | 11
Annual Percentage Error | APE | 8
Coefficient of Variation | CV | 4
Mean Absolute Percentage Error | MAPE | 4
Mean Absolute Error | MAE | 3
Gelman-Rubin statistic | $\hat{R}$ | 3
Others† | – | 18
† Metrics with ≤ 2 counts.

Table 6
Error limits specified by various guidelines and protocols for a building energy simulation model to be deemed calibrated.
Guideline/Protocol | Monthly NMBE (%) | Monthly CV(RMSE) (%) | Hourly NMBE (%) | Hourly CV(RMSE) (%)
--- | --- | --- | --- | ---
ASHRAE Guideline 14 [59] | 5 | 15 | 10 | 30
IPMVP [60] | – | – | 5 | 20
FEMP [61] | 5 | 15 | 10 | 30

CV(RMSE) (Eq. 3) provides an indication of how close the simulation predictions are to the measured data, while NMBE (Eq. 4) serves as an indicator of overall bias in the simulation predictions. However, NMBE suffers from cancellation between positive and negative errors, which can lead to misleading interpretations of predictive performance [160]. This review also confirms the findings of Ruiz and Bandera [161] that NMBE is often erroneously referred to as MBE even though the formula used is correct (i.e., MBE (%) = formula for NMBE). NMBE is the MBE normalized by the mean of the observed values so that results are comparable. Several papers also utilize RMSE, which provides a measure of the variability of the residuals and is the non-normalized form of CV(RMSE).

$$\mathrm{CV(RMSE)} = 100 \times \frac{1}{\bar{m}} \sqrt{\frac{\sum_{i=1}^{n} (m_i - s_i)^2}{n - p}} \tag{3}$$

$$\mathrm{NMBE}(\%) = 100 \times \frac{1}{\bar{m}} \, \frac{\sum_{i=1}^{n} (m_i - s_i)}{n - p} \tag{4}$$

where $m_i$ and $s_i$ are the measured and simulated values, respectively, $\bar{m}$ is the mean of the measured values, $n$ is the number of data points, and $p$ is the number of adjustable model parameters.
Around 10% of the papers use GOF and $R^2$ to assess calibration performance. GOF (Eq. 5), which was proposed by ASHRAE RP-1051 [162], incorporates both variance and bias errors through a formulation that considers both CV(RMSE) and NMBE. Since GOF combines CV(RMSE) and NMBE into a single composite function, it has the advantage of being able to identify a single optimal solution and, to some extent, of solving multi-objective optimization problems more effectively. It has therefore been used to define the cost function in several optimization-based calibrations [40,149,36,35,49]. For similar reasons, studies that utilize sampling methods (such as Monte Carlo sampling [102] and Latin Hypercube sampling [75–77]) have also used GOF to rank and identify suitable solutions.

$$\mathrm{GOF} = \frac{\sqrt{2}}{2} \sqrt{\mathrm{CV(RMSE)}^2 + \mathrm{NMBE}^2} \tag{5}$$

$R^2$ (Eq. 6) indicates the proportion of the variability of the dependent variable about its mean that is explained by the regression model. ASHRAE Guideline 14 [59] recommends the use of CV(RMSE) and $R^2$ to select the best whole-building energy use regression models, such as the algorithms of the ASHRAE Inverse Model Toolkit (IMT), which was developed from RP-1050 [163,164]. Although there is currently no prescribed minimum value for $R^2$, IPMVP [165] advised that an $R^2$ value of 0.75 provides a reasonably good causal relationship between energy use
trast to out-of-sample buildings, Hedegaard et al. [92] calibrated 159 BES models using one month of hourly data, and evaluated
and the independent variables. Using EnergyPlus simulations, their predictions using the subsequent month. Wang et al. [101]
Chakraborty and Elzarka [160] demonstrated that R2 used in tan- calibrated 84 residential buildings using five years of monthly
dem with a range normalized RMSE (RN(RMSE)) (Eq. 7) would pro- data and evaluated their predictions using the subsequent two
vide a better representation of the predictive performance of years of data.
system-level energy models. At the building-scale, the out-of-sample dataset typically com-
prises either a randomly sampled subset of the time-series data
Pn
ðmi si Þ2 that was not used for the calibration [81,32], data from a period
R2 ¼ 1 Pni¼1 ð6Þ
2 after the model was calibrated [82,139,83,42,158,66], or a selected
i¼1 ðmi mÞ
period based on occupancy levels and season [124].
where mi and si are the measured and simulated values respec-
is the mean of the measured values, and n is the number
tively, m
of data points. 7. Discussion
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Pn 2 7.1. Inputs and outputs
1 i¼1 ðmi si Þ
RN ðRMSEÞ ¼ 100 ð7Þ
rangeðmÞ np
The most prominent finding from the meta-analysis is that
where mi and si are the measured and simulated values respec- monthly building electricity and hourly indoor dry bulb tempera-
is the difference between the maximum and minimum
tively, m ture measurements are most commonly used to calibrate BES
of the measured values, n is the number of data points, and p is models, especially at the building scale. A possible explanation is
the number of adjustable model parameters. that electricity and gas/steam data are often obtained from utility
providers who typically provide monthly data. In comparison,
6.2. Evaluating probabilistic predictions measurements of the other outputs such as HVAC energy, equip-
ment electricity, and indoor dry bulb temperature would involve
Calibration methods that involve uncertainty quantification installing sub-meters and/or accessing the building automation
often provide probabilistic predictions to support risk- system where data is usually available at sub-hourly resolution.
conscious decision-making. However, almost all of the evalua- Another finding is that material thermophysical properties,
tion methods in the literature evaluate probabilistic predictions infiltration rate, and internal load densities are frequently selected
in a deterministic manner. Specifically, central tendency mea- for calibration, especially in automated calibration frameworks. It
sures such as the mean or median are used to compute accu- is well known amongst researchers in BES calibration that these
racy metrics, some of which are CV (RMSE) parameters are the main model parameters used to describe a
[94,32,91,81,82,95,93,84,101,89], NMBE [32,81,93,89], APE building and often represent a significant source of uncertainty
[166,82,95,101], RMSE [91,84,87], and MAPE [87] (Table 5). when estimating building energy performance. One well-known
However, it has been shown that using a single value such as early study that is often cited for uncertainties in infiltration rates
the mean to represent the entire distribution may result in an is that of Persily [168]. Likewise, material properties have also been
optimistic bias of the model’s prediction accuracy [85]. There- shown to be uncertain due to various reasons such as poor detail-
fore, these metrics are often accompanied by graphical plots ing/workmanship and thermal bridges [2,169,170].
comparing probabilistic predictions (e.g. using box-plots or error Previous studies have demonstrated the importance of schedule
bars) to the observed values [32,88,82,84,89]. adjustment in model calibration [143,144]. Therefore, it is some-
Alternative assessment methods have also been proposed to what surprising that schedules are typically not considered in
more precisely evaluate probabilistic predictions. For example, automated calibration frameworks. This inconsistency might be
assessing performance by comparing CV (RMSE) and NMBE median due to the sharp increase in computation cost if every schedule
or mean values with their 95% confidence intervals [88,40,98]. The parameter were considered in the calibration. Another possible
Kolmogorov–Smirnov (KS) test has also been used to assess cali- explanation for not considering schedules is that it could result
bration performance by comparing the predicted and measured in identifiability issues if a comprehensive dataset is not available
EUI distributions [166,95]. to avoid overparameterization [32,171]. Consequently, schedule
In order to facilitate comparison between probabilistic predic- adjustment typically involves simplification, such as selecting from
tions and deterministic observations, Chong, Augenbroe, and Da a list of predefined discrete schedules that best fit the measured
[144] proposed using the coverage width-based criterion (CWC). data [157,40,107,102]. As data in the built environment becomes
Likewise, the continuous rank probability score (CRPS) was pro- more available and accessible, developing scalable calibration algo-
posed to measure the distance between the probabilistic predic- rithms that can consider multiple data sources might prove impor-
tions and their corresponding observations [83]. Both the CWC tant in future research.
and the CRPS are the only metrics that consider both correctness
and informativeness of the probabilistic predictions. For detailed 7.2. Calibrating urban-scale models
explanation and formulation of the CWC and the CRPS, the reader
is referred to [144,167] respectively. The result of this review indicates that approximately 15% of
the papers are at the urban-scale. Of the 15%, most are located in
6.3. Validation using out-of-sample data the U.S. (54%) and Europe (34%), with none in a tropical climate.
Since the urban context (inter-building effects and urban microcli-
63% of the studies reviewed did not evaluate the calibrated mate) is an important aspect that should be considered in UBEMs
model on an out-of-sample test dataset. The remaining 37% advo- [172,173], it would be interesting to evaluate the performance of
cated the use of an out-of-sample test dataset to avoid bias in the the UBEM calibration methodologies in the tropics and cities out-
evaluation process. For instance, out-of-sample test buildings side of the U.S. and Europe.
have been used to evaluate the robustness and homogeneity of This review also found that UBEMs are typically calibrated
urban-scale archetype predictive performance [88,95,89]. In con- using monthly or annual data (Fig. 9) and rely on expert knowledge
15
and parameter reduction techniques (Fig. 5). This finding is consistent with previous studies, which found that the UBEM generation process typically relies on assumptions because of limited data quality and accessibility [174,103]. Consequently, the credibility of UBEMs is often questionable because of the widespread use of default or reference values and because UBEM calibration remains a significantly overparameterized problem. However, as discussed in Section 7.3, predictive accuracy is not synonymous with model credibility. The need for parsimonious models was also recently asserted in a review of UBEM use cases [175].

With IoT and the proliferation of sensors in the built environment, wide-ranging data streams at increasing spatial and temporal scales may become more easily accessible in the future. Having access to large amounts of data entails other challenges, such as ensuring data quality and consistency [176,174] and selecting only the information needed for energy modeling [177]. However, it also creates an opportunity for future research that investigates the minimum level of complexity and data required to achieve a UBEM that is commensurate with its intended purpose. Data convergence across multiple spatial and temporal scales is needed to support such work. Additionally, an interdisciplinary team of modelers, urban planners, policymakers, and decision-makers more generally is necessary to address these challenges.

7.3. Credibility or absolute predictive accuracy

It is apparent from this review that predictive accuracy is widely used to evaluate BES models. However, a model with low absolute predictive accuracy might still be adequate for its intended use. For example, a relative comparison between different design options only requires relative accuracy, which is typically easier to achieve than absolute predictive accuracy.

Likewise, a BES model can exhibit a good fit to observed data yet fail to accurately represent building systems or sub-systems, owing to the large number of modeling parameters and uncertainties. Chong and Menberg [32] demonstrated that a low CV(RMSE) and NMBE are not indicative of good estimates of the true values of the calibration parameters.

The Cambridge English Dictionary defines credibility as “can be believed or trusted”. From a modeling standpoint, credibility would not require the model to be accurate or to have high fidelity. For example, Yin, Kiliccote, and Piette [139] evaluated an EnergyPlus model’s prediction of dynamic response by field-testing a set of demand response control strategies. As pointed out by Rykiel [178], the crux of the issue is therefore determining (1) whether a model is acceptable for its intended purpose, and (2) how confident we should be in the model’s inferences about the actual building system. Similar concepts of fit-for-purpose modeling have also been discussed in BES, emphasizing the need to consider the aim of the simulation when selecting among modeling approaches [179–181,144,182].

Likewise, the choice of model and calibration approach should not be decoupled from the intended purpose of the simulation. Simple models are more transparent and require less data for parameter estimation and calibration, but they could increase model bias or inadequacy. In contrast, complex models are designed to better represent actual physical systems but tend to be more data- and computationally intensive. The challenge, then, is to abstract a reasonable simplification of reality that meets the simulation objectives given the available data. Consequently, it is imperative that the purpose of the simulation and the corresponding performance criteria be specified before any calibration is carried out. However, current calibration studies rely solely on measures of accuracy such as CV(RMSE) and NMBE to determine whether a model is “calibrated”. There is currently no guidance on how the credibility of BES models for various applications can be qualified. The relationship between model complexity, simulation objectives, and data informativeness is also poorly understood. Further research on this topic is therefore recommended.

7.4. Reproducible research in BES

In this review, we found that BES simulations and existing calibration approaches are difficult to reproduce from the publications alone. This is because of the complexity of BES models and the lack of clarity in reporting (a) calibration parameters, observed inputs, and observed outputs, and (b) assumptions made during data pre- and post-processing.

BES models and the associated code and data should be made openly available to improve the quality of scientific research, reduce duplicated efforts, and facilitate collaborations [183]. Furthermore, reproducibility will become increasingly difficult as data sources multiply and calibration methodologies become more complex in the effort to bridge the gap between simulation and reality. Without access to the code and the data, it would be almost impossible to uphold the fundamentals of scientific research, including transparency, rigor, and independent verification [184,183].

While full reproducibility requires complete openness and familiarity with open-source toolkits, it is still valuable to open parts of the code and/or data. We therefore recommend an incremental approach to encourage reproducible research in BES. As a start, publications should include a checklist to ensure clear reporting of the context and processes involved in the calibration (Table 7). Next, all code should be published. The code only needs to be available; it does not need to be structured or clean. Even poorly written code reveals a great deal about the calibration approach. Additionally, a subset of the data or synthetic data can be used where data privacy is a concern.

Table 7
Checklist for improving the reproducibility of publications that involve the calibration of building energy simulation models.

Checklist to enhance reproducibility
General information
  - Aim of the simulation
  - Building location (longitude and latitude)
  - Building typology
  - Weather file
  - Total, conditioned, and unconditioned floor area
  - Simulation engine
Measured data
  - Observed output(s) and data source
  - Observed input(s) and data source
  - Pre-processing
Calibration method
  - Calibration parameters and their corresponding ranges
  - Calibration approach (automated or manual) and the algorithm, if an automated approach was used
  - Analytical tools and techniques (see Section 4.3)
  - Calibration sequence, if a multi-stage sequential approach is involved
Results and conclusion
  - Post-processing
  - Recommendation

The technical challenges impeding reproducible publications can be summarized as follows:
  - poor documentation of the dependencies needed for the code to run;
  - imprecise documentation on how to install and run the associated code;
  - the lack of a robust way to run exact versions of all the software involved [185].

Additionally, the choice of either a permissive or a copyleft license (i.e., legal terms) should be considered [186]. Docker is a popular platform that can provide the infrastructure to facilitate reproducible
research in BES [187,188]. Specifically, Docker images and Dockerfiles are Docker concepts that help resolve the technical challenges to reproducibility mentioned at the start of this paragraph [185]. To facilitate reproducibility, we created a GitHub repository (see Data availability) that demonstrates a Docker-based approach for reproducing BES research.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
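The reporting recommendations of Section 7.4 can also be captured in machine-readable form. The Python sketch below uses a hypothetical format; it is not the authors' repository or any published schema. It records:
  - a subset of the Table 7 checklist items;
  - the interpreter and package versions used, via the standard-library `importlib.metadata`;
  - the CV(RMSE) and NMBE of Eqs. (3) and (4), checked against the monthly ASHRAE Guideline 14 limits listed in Table 6.

All field names, the placeholder data, and the choice p = 1 are illustrative assumptions.

```python
import json
import math
import platform
from dataclasses import asdict, dataclass, field
from importlib import metadata


def cv_rmse(measured: list[float], simulated: list[float], p: int) -> float:
    """CV(RMSE) in percent, as in Eq. (3); p = number of adjustable parameters."""
    n = len(measured)
    m_bar = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / m_bar


def nmbe(measured: list[float], simulated: list[float], p: int) -> float:
    """NMBE in percent, as in Eq. (4); positive values mean the model under-predicts."""
    n = len(measured)
    m_bar = sum(measured) / n
    return 100.0 * sum(m - s for m, s in zip(measured, simulated)) / ((n - p) * m_bar)


def meets_monthly_g14(cv: float, bias: float) -> bool:
    """Monthly ASHRAE Guideline 14 limits from Table 6: |NMBE| <= 5 %, CV(RMSE) <= 15 %."""
    return abs(bias) <= 5.0 and cv <= 15.0


def software_versions(packages: list[str]) -> dict[str, str]:
    """Record interpreter, OS and package versions used for the calibration run."""
    env = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = "not installed"
    return env


@dataclass
class CalibrationReport:
    # Hypothetical field names mirroring a subset of the Table 7 checklist.
    aim: str
    weather_file: str
    simulation_engine: str
    observed_outputs: list[str]
    parameter_ranges: dict[str, tuple[float, float]]  # name -> (lower, upper)
    approach: str  # "automated" or "manual"
    metrics: dict[str, float] = field(default_factory=dict)
    meets_criteria: bool = False
    environment: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # Tuples are serialized as JSON arrays.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder monthly values (kWh), not taken from any study.
measured = [1200.0, 1100.0, 950.0, 1010.0]
simulated = [1150.0, 1120.0, 990.0, 980.0]
cv = cv_rmse(measured, simulated, p=1)
bias = nmbe(measured, simulated, p=1)

report = CalibrationReport(
    aim="retrofit analysis",
    weather_file="site_weather.epw",  # hypothetical file name
    simulation_engine="EnergyPlus",
    observed_outputs=["monthly electricity (utility bills)"],
    parameter_ranges={"infiltration_ach": (0.1, 1.0)},  # hypothetical parameter
    approach="automated",
    metrics={"CV(RMSE)": cv, "NMBE": bias},
    meets_criteria=meets_monthly_g14(cv, bias),
    environment=software_versions(["numpy"]),
)
print(report.to_json())
```

Storing such a record next to the model files keeps the checklist items and software versions with the results. Pinning exact versions inside a Docker image, as discussed in Section 7.4, remains the more robust way to rerun the same software.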
8. Conclusion

Calibration remains a challenging task because there are no clear guidelines or best practices for calibration procedures. These cover aspects such as model inputs and outputs, calibration methods, calibration performance evaluation, and simulation reproducibility. As a result, BES calibration has remained highly subjective, perhaps even elusive, and almost impossible to reproduce. This study therefore contributes to existing knowledge of BES calibration by providing a coherent and detailed summary of the calibration methodology, data requirements, performance evaluation criteria, and the current state of knowledge.

The findings indicate a significant increase in the use of automated calibration approaches. Among the automated approaches, optimization and Bayesian calibration were the most popular, and global sensitivity analysis is often applied within them. In contrast, the dominant techniques in manual approaches include detailed audits, expert knowledge, and/or evidence-based procedures. High-resolution data is prevalent in both automated and manual approaches, possibly owing to increasing sensing capabilities and data availability in the built environment.

BES models are usually calibrated against one or two observed outputs. The two most commonly used data sources for BES calibration were monthly electricity consumption and hourly indoor dry-bulb temperature. Monthly electricity data often stem from utility bills. They are frequently used to calibrate the building envelope’s thermophysical parameters, the infiltration rate, various internal gain densities, and indoor setpoint temperatures. Hourly indoor dry-bulb temperature measurements taken during free-floating periods (non-operating hours when the indoor temperature is allowed to float) are often used to calibrate the thermophysical parameters of the building envelope and the infiltration rate.

The review indicates a lack of reproducibility due to the absence of clarity in reporting the modeling and data assumptions, calibration parameters, observed inputs, and observed outputs. This study therefore proposed an incremental approach to encourage reproducibility in BES research, along with a fully reproducible example on GitHub (see Data availability).

Taken together, the present study lays the groundwork on which future calibration studies can build. While a significant body of work is clearly available, the precise mechanism of BES calibration and the evaluation of model credibility remain to be elucidated. With increasing data availability, incorporating multiple data sources within automated calibration algorithms would also be an exciting direction for future work. We also believe that a culture of reproducibility will significantly aid efforts to establish a standardized calibration methodology.

Data availability

The research compendium for this article can be found at https://github.com/ideas-lab-nus/calibrating-building-simulation-review, hosted at GitHub.

The simple example of reproducible building energy simulation (Section 7.4) can be found at https://github.com/ideas-lab-nus/reproducing-building-simulation, hosted at GitHub.

Acknowledgements

This research is supported by the National University of Singapore, Singapore under its start-up grant (Project No. R-296-000-190-133). It is also supported by the Republic of Singapore’s National Research Foundation through a grant to the Berkeley Education Alliance for Research in Singapore (BEARS) for the Singapore-Berkeley Building Efficiency and Sustainability in the Tropics (SinBerBEST) Program. BEARS has been established by the University of California, Berkeley as a center for intellectual excellence in research and education in Singapore.

References

[1] J.L. Hensen, R. Lamberts, Building performance simulation for design and operation, second ed., Routledge, 2019.
[2] P. De Wilde, The gap between predicted and measured energy performance of buildings: A framework for investigation, Autom. Constr. 41 (2014) 40–49, https://doi.org/10.1016/j.autcon.2014.02.009.
[3] C. Turner, M. Frankel, et al., Energy performance of LEED for new construction buildings, New Build. Inst. (2008) 1–42.
[4] E. Mantesi, C.J. Hopfe, M.J. Cook, J. Glass, P. Strachan, The modelling gap: Quantifying the discrepancy in the representation of thermal mass in building simulation, Build. Environ. 131 (2018) 74–98, https://doi.org/10.1016/j.buildenv.2017.12.017.
[5] A.C. Menezes, A. Cripps, D. Bouchlaghem, R. Buswell, Predicted vs. actual energy performance of non-domestic buildings: Using post-occupancy evaluation data to reduce the performance gap, Appl. Energy 97 (2012) 355–364, https://doi.org/10.1016/j.apenergy.2011.11.075.
[6] S. De Wit, G. Augenbroe, Analysis of uncertainty in building design evaluations and its implications, Energy Build. 34 (9) (2002) 951–958, https://doi.org/10.1016/S0378-7788(02)00070-1.
[7] H. Yoshino, T. Hong, N. Nord, IEA EBC Annex 53: Total energy use in buildings – analysis and evaluation methods, Energy Build. 152 (2017) 124–136, https://doi.org/10.1016/j.enbuild.2017.07.038.
[8] T.G. Trucano, L.P. Swiler, T. Igusa, W.L. Oberkampf, M. Pilch, Calibration, validation, and sensitivity analysis: What’s what, Reliab. Eng. Syst. Saf. 91 (10–11) (2006) 1331–1357, https://doi.org/10.1016/j.ress.2005.11.031.
[9] N. Oreskes, K. Shrader-Frechette, K. Belitz, Verification, validation, and confirmation of numerical models in the earth sciences, Science 263 (5147) (1994) 641–646, https://doi.org/10.1126/science.263.5147.641.
[10] L.F. Konikow, J.D. Bredehoeft, Ground-water models cannot be validated, Adv. Water Resour. 15 (1) (1992) 75–83, https://doi.org/10.1016/0309-1708(92)90033-X.
[11] American Institute of Aeronautics and Astronautics, AIAA guide for the verification and validation of computational fluid dynamics simulations, American Institute of Aeronautics and Astronautics, 1998, https://doi.org/10.2514/4.472855.001.
[12] T.A. Reddy, Literature review on calibration of building energy simulation programs: uses, problems, procedures, uncertainty, and tools, ASHRAE Trans. 112 (2006) 226.
[13] D. Coakley, P. Raftery, M. Keane, A review of methods to match building energy simulation models to measured data, Renew. Sustain. Energy Rev. 37 (2014) 123–141, https://doi.org/10.1016/j.rser.2014.05.007.
[14] E. Fabrizio, V. Monetti, Methodologies and advancements in the calibration of building energy models, Energies 8 (4) (2015) 2548–2574, https://doi.org/10.3390/en8042548.
[15] US Department of Energy (DOE) Office of Energy Efficiency & Renewable Energy, EnergyPlus, https://www.energy.gov/eere/buildings/downloads/energyplus-0.
[16] IBPSA-USA, Building Energy Software Tools (BEST) directory, https://www.buildingenergysoftwaretools.com/software-listing?keywords=EnergyPlus.
[17] S.A. Klein, TRNSYS 18: A transient simulation program, https://sel.me.wisc.edu/trnsys.
[18] DOE-2, https://www.doe2.com.
[19] X. Li, J. Wen, Review of building energy modeling for control and operation, Renew. Sustain. Energy Rev. 37 (2014) 517–537, https://doi.org/10.1016/j.rser.2014.05.056.
[20] Energy performance of buildings – Calculation of energy use for space heating and cooling, Standard, International Organization for Standardization, Geneva, CH (2008).
[21] Energy performance of buildings – Energy needs for heating and cooling, internal temperatures and sensible and latent heat loads – Part 1: Calculation
procedures, Standard, International Organization for Standardization, Geneva, CH (2017).
[22] B. Iooss, S.D. Veiga, A. Janon, G. Pujol, with contributions from Baptiste Broto, K. Boumhaout, T. Delage, R.E. Amri, J. Fruth, L. Gilquin, J. Guillaume, M.I. Idrissi, L. Le Gratiet, P. Lemaitre, A. Marrel, A. Meynaoui, B.L. Nelson, F. Monari, R. Oomen, O. Rakovec, B. Ramos, O. Roustant, E. Song, J. Staum, R. Sueur, T. Touati, F. Weber, sensitivity: Global Sensitivity Analysis of Model Outputs, R package version 1.24.0 (2021), https://CRAN.R-project.org/package=sensitivity.
[23] J. Herman, W. Usher, SALib: an open-source Python library for sensitivity analysis, J. Open Source Software 2 (9) (2017) 97, https://doi.org/10.21105/joss.00097.
[24] M. Wetter, et al., GenOpt – a generic optimization program, in: Seventh International IBPSA Conference, Rio de Janeiro, 2001, pp. 601–608.
[25] Y. Zhang, JEPlus – A parametric tool for EnergyPlus and TRNSYS, http://www.jeplus.org/wiki/doku.php.
[26] F.-A. Fortin, F.-M. De Rainville, M.-A.G. Gardner, M. Parizeau, C. Gagné, DEAP: Evolutionary algorithms made easy, J. Mach. Learn. Res. 13 (1) (2012) 2171–2175.
[27] J. Bossek, ecr: Evolutionary Computation in R, R package version 2.1.0 (2017), https://CRAN.R-project.org/package=ecr.
[28] M.J. Bayarri, J.O. Berger, R. Paulo, J. Sacks, J.A. Cafeo, J. Cavendish, C.-H. Lin, J. Tu, A framework for validation of computer models, Technometrics 49 (2) (2007) 138–154, https://doi.org/10.1198/004017007000000092.
[29] M.C. Kennedy, A. O’Hagan, Bayesian calibration of computer models, J. R. Stat. Soc.: Ser. B (Statistical Methodology) 63 (3) (2001) 425–464, https://doi.org/10.1111/1467-9868.00294.
[30] D. Higdon, M. Kennedy, J.C. Cavendish, J.A. Cafeo, R.D. Ryne, Combining field data and computer simulations for calibration and prediction, SIAM J. Scientific Comput. 26 (2) (2004) 448–466, https://doi.org/10.1137/S1064827503426693.
[31] J. Palomo, R. Paulo, G. García-Donato, SAVE: an R package for the statistical analysis of computer models, J. Stat. Softw. 64 (13) (2015) 1–23, http://www.jstatsoft.org/v64/i13/.
[32] A. Chong, K. Menberg, Guidelines for the Bayesian calibration of building energy models, Energy Build. 174 (2018) 527–547, https://doi.org/10.1016/j.enbuild.2018.06.028.
[33] L. Raillon, S. Rouchier, S. Juricic, pySIP: an open-source tool for Bayesian inference and prediction of heat transfer in buildings, in: Congrès français de thermique, Nantes, 2019.
[34] C. Bandera, G. Ruiz, Towards a new generation of building envelope calibration, Energies 10 (12), https://doi.org/10.3390/en10122102.
[35] G. Ramos Ruiz, C. Fernández Bandera, Analysis of uncertainty indices used for building envelope calibration, Appl. Energy 185 (2017) 82–94, https://doi.org/10.1016/j.apenergy.2016.10.054.
[36] G. Ramos Ruiz, C. Fernández Bandera, T. Gómez-Acebo Temes, A. Sánchez-Ostiz Gutierrez, Genetic algorithm for building envelope calibration, Appl. Energy 168 (2016) 691–705, https://doi.org/10.1016/j.apenergy.2016.01.075.
[37] S. Zuhaib, M. Hajdukiewicz, J. Goggins, Application of a staged automated calibration methodology to a partially-retrofitted university building energy model, J. Build. Eng. 26, https://doi.org/10.1016/j.jobe.2019.100866.
[38] S. Martínez, P. Eguía, E. Granada, A. Moazami, M. Hamdy, A performance comparison of multi-objective optimization-based approaches for calibrating white-box building energy models, Energy Build. 216, https://doi.org/10.1016/j.enbuild.2020.109942.
[39] S. Martínez, E. Pérez, P. Eguía, A. Erkoreka, E. Granada, Model calibration and exergoeconomic optimization with NSGA-II applied to a residential cogeneration, Appl. Therm. Eng. 169, https://doi.org/10.1016/j.applthermaleng.2020.114916.
[40] S. Nagpal, J. Hanson, C. Reinhart, A framework for using calibrated campus-wide building energy models for continuous planning and greenhouse gas emissions reduction tracking, Appl. Energy 241 (2019) 82–97, https://doi.org/10.1016/j.apenergy.2019.03.010.
[41] S. Tian, S. Shao, B. Liu, Investigation on transient energy consumption of cold storages: Modeling and a case study, Energy 180 (2019) 1–9, https://doi.org/10.1016/j.energy.2019.04.217.
[42] J. Chen, X. Gao, Y. Hu, Z. Zeng, Y. Liu, A meta-model-based optimization approach for fast and reliable calibration of building energy models, Energy 188, https://doi.org/10.1016/j.energy.2019.116046.
[43] C. Andrade-Cabrera, D. Burke, W. Turner, D. Finn, Ensemble calibration of lumped parameter retrofit building models using particle swarm optimization, Energy Build. 155 (2017) 513–532, https://doi.org/10.1016/j.enbuild.2017.09.035.
[44] T. Yang, Y. Pan, J. Mao, Y. Wang, Z. Huang, An automated optimization method for calibrating building energy simulation models with measured data: Orientation and a case study, Appl. Energy 179 (2016) 1220–1231, https://doi.org/10.1016/j.apenergy.2016.07.084.
[47] Q. Zhang, Z. Tian, Z. Ma, G. Li, Y. Lu, J. Niu, Development of the heating load prediction model for the residential building of district heating based on model calibration, Energy 205, https://doi.org/10.1016/j.energy.2020.117949.
[48] S.-W. Ha, S.-H. Park, J.-Y. Eom, M.-S. Oh, G.-Y. Cho, E.-J. Kim, Parameter calibration for a TRNSYS BIPV model using in situ test data, Energies 13 (18), https://doi.org/10.3390/en13184935.
[49] G. Larochelle Martin, D. Monfet, H. Nouanegue, K. Lavigne, S. Sansregret, Energy calibration of HVAC sub-system model using sensitivity analysis and meta-heuristic optimization, Energy Build. 202, https://doi.org/10.1016/j.enbuild.2019.109382.
[50] M. Ferrara, C. Lisciandrello, A. Messina, M. Berta, Y. Zhang, E. Fabrizio, Optimizing the transition between design and operation of ZEBs: Lessons learnt from the Solar Decathlon China, SCUTxPoliTo prototype, Energy Build. 213 (2018), https://doi.org/10.1016/j.enbuild.2020.109824.
[51] A. Cacabelos, P. Eguía, L. Febrero, E. Granada, Development of a new multi-stage building energy model calibration methodology and validation in a public library, Energy Build. 146 (2017) 182–199, https://doi.org/10.1016/j.enbuild.2017.04.071.
[52] W. Li, Z. Tian, Y. Lu, F. Fu, Stepwise calibration for residential building thermal performance model using hourly heat consumption data, Energy Build. 181 (2018) 10–25, https://doi.org/10.1016/j.enbuild.2018.10.001.
[53] A. Abdelalim, W. O’Brien, Z. Shi, Data visualization and analysis of energy flow on a multi-zone building scale, Autom. Constr. 84 (2017) 258–273, https://doi.org/10.1016/j.autcon.2017.09.012.
[54] A. Ogando, N. Cid, M. Fernández, Energy modelling and automated calibrations of ancient building simulations: A case study of a school in the northwest of Spain, Energies 10 (6), https://doi.org/10.3390/en10060807.
[55] E. Carlon, M. Schwarz, A. Prada, L. Golicza, V. Verma, M. Baratieri, A. Gasparella, W. Haslinger, C. Schmidl, On-site monitoring and dynamic simulation of a low energy house heated by a pellet boiler, Energy Build. 116 (2016) 296–306, https://doi.org/10.1016/j.enbuild.2016.01.001.
[56] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197, https://doi.org/10.1109/4235.996017.
[57] J. Kennedy, R. Eberhart, Particle swarm optimization, in: Proceedings of ICNN’95 – International Conference on Neural Networks, vol. 4, IEEE, 1995, pp. 1942–1948, https://doi.org/10.1109/ICNN.1995.488968.
[58] R. Hooke, T.A. Jeeves, “Direct search” solution of numerical and statistical problems, J. ACM 8 (2) (1961) 212–229, https://doi.org/10.1145/321062.321069.
[59] ASHRAE, Guideline 14, Measurement of energy and demand savings, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Atlanta, Georgia.
[60] EVO, International performance measurement and verification protocol: Concepts and options for determining energy and water savings, volume 1, Efficiency Valuation Organization.
[61] US DOE FEMP, M&V guidelines: Measurement and verification for performance-based contracts, version 4.0, Energy Efficiency and Renewable Energy.
[62] S. Qiu, Z. Li, Z. Pang, W. Zhang, Z. Li, A quick auto-calibration approach based on normative energy models, Energy Build. 172 (2018) 35–46, https://doi.org/10.1016/j.enbuild.2018.04.053.
[63] G. Chaudhary, J. New, J. Sanyal, P. Im, Z. O’Neill, V. Garg, Evaluation of Autotune calibration against manual calibration of building energy models, Appl. Energy 182 (2016) 115–134, https://doi.org/10.1016/j.apenergy.2016.08.073.
[64] A. Garrett, J. New, Scalable tuning of building models to hourly data, Energy 84 (2015) 493–502, https://doi.org/10.1016/j.energy.2015.03.014.
[65] K. Sun, T. Hong, S. Taylor-Lange, M. Piette, A pattern-based automated approach to building energy model calibration, Appl. Energy 165 (2016) 214–224, https://doi.org/10.1016/j.apenergy.2015.12.026.
[66] Z. Yang, B. Becerik-Gerber, A model calibration framework for simultaneous multi-level building energy simulation, Appl. Energy 149 (2015) 415–431, https://doi.org/10.1016/j.apenergy.2015.03.048.
[67] H. Schreiber, F. Lanzerath, A. Bardow, Predicting performance of adsorption thermal energy storage: From experiments to validated dynamic models, Appl. Therm. Eng. 141 (2018) 548–557, https://doi.org/10.1016/j.applthermaleng.2018.05.094.
[68] L. Santos, A. Afshari, L. Norford, J. Mao, Evaluating approaches for district-wide energy model calibration considering the urban heat island effect, Appl. Energy 215 (2018) 31–40, https://doi.org/10.1016/j.apenergy.2018.01.089.
[69] A. Zekar, S. Khatib, Development and assessment of simplified building representations under the context of an urban energy model: Application to arid climate environment, Energy Build. 173 (2018) 461–469, https://doi.org/10.1016/j.enbuild.2018.04.030.
[70] W. Tian, Y. Heo, P. De Wilde, Z. Li, D. Yan, C.S. Park, X. Feng, G. Augenbroe, A review of uncertainty analysis in building energy assessment, Renew. Sustain. Energy Rev. 93 (2018) 285–301, https://doi.org/10.1016/j.rser.2018.05.029.
[45] F. Roberti, U. Oberegger, A. Gasparella, Calibrating historic building energy
models to hourly indoor air and surface temperatures: Methodology and case [71] C.J. Roy, W.L. Oberkampf, A comprehensive framework for verification,
study, Energy Build. 108 (2015) 236–243, https://doi.org/10.1016/j. validation, and uncertainty quantification in scientific computing, Comput.
enbuild.2015.09.010. Methods Appl. Mech. Eng. 200 (25–28) (2011) 2131–2144, https://doi.org/
[46] C. Andrade-Cabrera, W. Turner, D. Finn, Augmented ensemble calibration of 10.1016/j.cma.2011.03.016.
lumped-parameter building models, Build. Simul. 12 (2) (2019) 207–230, [72] A. Der Kiureghian, O. Ditlevsen, Aleatory or epistemic? Does it matter?, Struct
https://doi.org/10.1007/s12273-018-0473-5. Saf. 31 (2) (2009) 105–112, https://doi.org/10.1016/j.strusafe.2008.06.020.
[73] Y. Sun, Closing the building energy performance gap by improving our predictions, Ph.D. thesis, Georgia Institute of Technology (2014).
[74] G. Yun, K. Song, Development of an automatic calibration method of a VRF energy model for the design of energy efficient buildings, Energy Build. 135 (2017) 156–165, https://doi.org/10.1016/j.enbuild.2016.11.060.
[75] L. Harmer, G. Henze, Using calibrated energy models for building commissioning and load prediction, Energy Build. 92 (2015) 204–215, https://doi.org/10.1016/j.enbuild.2014.10.078.
[76] J. Cipriano, G. Mor, D. Chemisana, D. Pérez, G. Gamboa, X. Cipriano, Evaluation of a multi-stage guided search approach for the calibration of building energy simulation models, Energy Build. 87 (2015) 370–385, https://doi.org/10.1016/j.enbuild.2014.08.052.
[77] N. Sakiyama, L. Mazzaferro, J. Carlo, T. Bejat, H. Garrecht, Natural ventilation potential from weather analyses and building simulation, Energy Build., https://doi.org/10.1016/j.enbuild.2020.110596.
[78] M. Giuliani, G. Henze, A. Florita, Modelling and calibration of a high-mass historic building for reducing the prebound effect in energy assessment, Energy Build. 116 (2016) 434–448, https://doi.org/10.1016/j.enbuild.2016.01.034.
[79] S. Martínez, A. Erkoreka, P. Eguía, E. Granada, L. Febrero, Energy characterization of a PASLINK test cell with a gravel covered roof using a novel methodology: Sensitivity analysis and Bayesian calibration, J. Build. Eng. 22 (2019) 1–11, https://doi.org/10.1016/j.jobe.2018.11.010.
[80] K. Menberg, Y. Heo, R. Choudhary, Influence of error terms in Bayesian calibration of energy system models, J. Build. Performance Simul. 12 (1) (2019) 82–96, https://doi.org/10.1080/19401493.2018.1475506.
[81] A. Chong, K. Lam, M. Pozzi, J. Yang, Bayesian calibration of building energy models with large datasets, Energy Build. 154 (2017) 343–355, https://doi.org/10.1016/j.enbuild.2017.08.069.
[82] J. Yuan, V. Nian, B. Su, Q. Meng, A simultaneous calibration and parameter ranking method for building energy models, Appl. Energy 206 (2017) 657–666, https://doi.org/10.1016/j.apenergy.2017.08.220.
[83] Q. Li, G. Augenbroe, J. Brown, Assessment of linear emulators in lightweight Bayesian calibration of dynamic building energy models for parameter estimation and performance prediction, Energy Build. 124 (2016) 194–202, https://doi.org/10.1016/j.enbuild.2016.04.025.
[84] Y. Heo, D. Graziano, L. Guzowski, R. Muehleisen, Evaluation of calibration efficacy under different levels of uncertainty, J. Build. Performance Simul. 8 (3) (2015) 135–144, https://doi.org/10.1080/19401493.2014.896947.
[85] A. Chong, W. Xu, S. Chao, N.-T. Ngo, Continuous-time Bayesian calibration of energy models using BIM and energy data, Energy Build. 194 (2019) 177–190, https://doi.org/10.1016/j.enbuild.2019.04.017.
[86] S. Chen, D. Friedrich, Z. Yu, J. Yu, District heating network demand prediction using a physics-based energy model with a Bayesian approach for parameter calibration, Energies 12 (18), https://doi.org/10.3390/en12183408.
[87] G. Tardioli, A. Narayan, R. Kerrigan, M. Oates, J. O’Donnell, D. Finn, A methodology for calibration of building energy models at district scale using clustering and surrogate techniques, Energy Build. 226, https://doi.org/10.1016/j.enbuild.2020.110309.
[88] M. Kristensen, R. Hedegaard, S. Petersen, Hierarchical calibration of archetypes for urban building energy modeling, Energy Build. 175 (2018) 219–234, https://doi.org/10.1016/j.enbuild.2018.07.030.
[89] M. Kristensen, R. Hedegaard, S. Petersen, Long-term forecasting of hourly district heating loads in urban areas using hierarchical archetype modeling, Energy 201, https://doi.org/10.1016/j.energy.2020.117687.
[90] S. Rouchier, M. Jiménez, S. Castaño, Sequential Monte Carlo for on-line parameter estimation of a lumped building energy model, Energy Build. 187 (2019) 86–94, https://doi.org/10.1016/j.enbuild.2019.01.045.
[91] H. Lim, Z. Zhai, Comprehensive evaluation of the influence of meta-models on Bayesian calibration, Energy Build. 155 (2017) 66–75, https://doi.org/10.1016/j.enbuild.2017.09.009.
[92] R. Hedegaard, M. Kristensen, T. Pedersen, A. Brun, S. Petersen, Bottom-up modelling methodology for urban-scale analysis of residential space heating demand response, Appl. Energy 242 (2019) 181–204, https://doi.org/10.1016/j.apenergy.2019.03.063.
[93] Y.-J. Kim, C.-S. Park, Stepwise deterministic and stochastic calibration of an energy simulation model for an existing building, Energy Build. 133 (2016) 455–468, https://doi.org/10.1016/j.enbuild.2016.10.009.
[94] H. Lim, Z. Zhai, Influences of energy data on Bayesian calibration of building energy model, Appl. Energy 231 (2018) 686–698, https://doi.org/10.1016/j.apenergy.2018.09.156.
[95] J. Sokol, C. Cerezo Davila, C. Reinhart, Validation of a Bayesian-based method for defining residential archetypes in urban building energy models, Energy Build. 134 (2017) 11–24, https://doi.org/10.1016/j.enbuild.2016.10.050.
[96] W. Tian, S. Yang, Z. Li, S. Wei, W. Pan, Y. Liu, Identifying informative energy data in Bayesian calibration of building energy models, Energy Build. 119 (2016) 363–376, https://doi.org/10.1016/j.enbuild.2016.03.042.
[97] L. Lundström, J. Akander, Bayesian calibration with augmented stochastic state-space models of district-heated multifamily buildings, Energies 13 (1), https://doi.org/10.3390/en13010076.
[98] C. Zhu, W. Tian, B. Yin, Z. Li, J. Shi, Uncertainty calibration of building energy models by combining approximate Bayesian computation and machine learning algorithms, Appl. Energy 268, https://doi.org/10.1016/j.apenergy.2020.115025.
[99] P. Raftery, M. Keane, A. Costa, Calibrating whole building energy models: Detailed case study using hourly measured data, Energy Build. 43 (12) (2011) 3666–3679, https://doi.org/10.1016/j.enbuild.2011.09.039.
[100] J. Fernandez, L. del Portillo, I. Flores, A novel residential heating consumption characterisation approach at city level from available public data: Description and case study, Energy Build. 221, https://doi.org/10.1016/j.enbuild.2020.110082.
[101] C.-K. Wang, S. Tindemans, C. Miller, G. Agugiaro, J. Stoter, Bayesian calibration at the urban scale: a case study on a large residential heating demand application in Amsterdam, J. Build. Performance Simul. 13 (3) (2020) 347–361, https://doi.org/10.1080/19401493.2020.1729862.
[102] Y. Chen, Z. Deng, T. Hong, Automatic and rapid calibration of urban building energy models by learning from energy performance database, Appl. Energy 277, https://doi.org/10.1016/j.apenergy.2020.115584.
[103] C.F. Reinhart, C.C. Davila, Urban building energy modeling–a review of a nascent field, Build. Environ. 97 (2016) 196–202, https://doi.org/10.1016/j.buildenv.2015.12.001.
[104] I. Ballarini, S.P. Corgnati, V. Corrado, Use of reference buildings to assess the energy saving potentials of the residential building stock: The experience of TABULA project, Energy Policy 68 (2014) 273–284, https://doi.org/10.1016/j.enpol.2014.01.027.
[105] T. Loga, B. Stein, N. Diefenbach, TABULA building typologies in 20 European countries – making energy-related features of residential building stocks comparable, Energy Build. 132 (2016) 4–12, https://doi.org/10.1016/j.enbuild.2016.06.094.
[106] Z. Taylor, Y. Xie, C. Burleyson, N. Voisin, I. Kraucunas, A multi-scale calibration approach for process-oriented aggregated building energy demand models, Energy Build. 191 (2019) 82–94, https://doi.org/10.1016/j.enbuild.2019.02.018.
[107] A. Krayem, A. Al Bitar, A. Ahmad, G. Faour, J.-P. Gastellu-Etchegorry, I. Lakkis, J. Gerard, H. Zaraket, A. Yeretzian, S. Najem, Urban energy modeling and calibration of a coastal Mediterranean city: The case of Beirut, Energy Build. 199 (2019) 223–234, https://doi.org/10.1016/j.enbuild.2019.06.050.
[108] K. Beven, A manifesto for the equifinality thesis, J. Hydrol. 320 (1–2) (2006) 18–36, https://doi.org/10.1016/j.jhydrol.2005.07.007.
[109] G. Allesina, E. Mussatti, F. Ferrari, A. Muscio, A calibration methodology for building dynamic models based on data collected through survey and billings, Energy Build. 158 (2018) 406–416, https://doi.org/10.1016/j.enbuild.2017.09.089.
[110] Z. Mylona, M. Kolokotroni, S. Tassou, Frozen food retail: Measuring and modelling energy use and space environmental systems in an operational supermarket, Energy Build. 144 (2017) 129–143, https://doi.org/10.1016/j.enbuild.2017.03.049.
[111] J. Vesterberg, S. Andersson, T. Olofsson, Calibration of low-rise multifamily residential simulation models using regressed estimations of transmission losses, J. Build. Performance Simul. 9 (3) (2016) 304–315, https://doi.org/10.1080/19401493.2015.1067257.
[112] D. Guyot, F. Giraud, F. Simon, D. Corgier, C. Marvillet, B. Tremeac, Building energy model calibration: A detailed case study using sub-hourly measured data, Energy Build. 223 (2020) 110189, https://doi.org/10.1016/j.enbuild.2020.110189.
[113] R. Escandón, R. Suárez, J. Sendra, On the assessment of the energy performance and environmental behaviour of social housing stock for the adjustment between simulated and measured data: The case of mild winters in the Mediterranean climate of southern Europe, Energy Build. 152 (2017) 418–433, https://doi.org/10.1016/j.enbuild.2017.07.063.
[114] F. Ascione, N. Bianco, R. De Masi, F. De’Rossi, G. Vanoli, Energy retrofit of an educational building in the ancient center of Benevento. Feasibility study of energy savings and respect of the historical value, Energy Build. 95 (2015) 172–183, https://doi.org/10.1016/j.enbuild.2014.10.072.
[115] V. Monetti, E. Fabrizio, M. Filippi, Impact of low investment strategies for space heating control: Application of thermostatic radiators valves to an old residential building, Energy Build. 95 (2015) 202–210, https://doi.org/10.1016/j.enbuild.2015.01.001.
[116] G. Kazas, E. Fabrizio, M. Perino, Energy demand profile generation with detailed time resolution at an urban district scale: A reference building approach and case study, Appl. Energy 193 (2017) 243–262, https://doi.org/10.1016/j.apenergy.2017.01.095.
[117] D. Jermyn, R. Richman, A process for developing deep energy retrofit strategies for single-family housing typologies: Three Toronto case studies, Energy Build. 116 (2016) 522–534, https://doi.org/10.1016/j.enbuild.2016.01.022.
[118] H. Samuelson, A. Ghorayshi, C. Reinhart, Analysis of a simplified calibration procedure for 18 design-phase building energy models, J. Build. Performance Simul. 9 (1) (2016) 17–29, https://doi.org/10.1080/19401493.2014.988752.
[119] P. Beagon, F. Boland, M. Saffari, Closing the gap between simulation and measured energy use in home archetypes, Energy Build. 224, https://doi.org/10.1016/j.enbuild.2020.110244.
[120] M. Tokarik, R. Richman, Life cycle cost optimization of passive energy efficiency improvements in a Toronto house, Energy Build. 118 (2016) 160–169, https://doi.org/10.1016/j.enbuild.2016.02.015.
[121] Y. Ji, P. Xu, A bottom-up and procedural calibration method for building energy simulation models based on hourly electricity submetering data, Energy 93 (2015) 2337–2350, https://doi.org/10.1016/j.energy.2015.10.109.
[122] M. Royapoor, T. Roskilly, Building model calibration using energy and environmental data, Energy Build. 94 (2015) 109–120, https://doi.org/10.1016/j.enbuild.2015.02.050.
[123] N. Jain, E. Burman, S. Stamp, D. Mumovic, M. Davies, Cross-sectoral assessment of the performance gap using calibrated building energy performance simulation, Energy Build. 224, https://doi.org/10.1016/j.enbuild.2020.110271.
[124] A. O’Donovan, P. O’Sullivan, M. Murphy, Predicting air temperatures in a naturally ventilated nearly zero energy building: Calibration, validation, analysis and approaches, Appl. Energy 250 (2019) 991–1010, https://doi.org/10.1016/j.apenergy.2019.04.082.
[125] F. Pianosi, K. Beven, J. Freer, J.W. Hall, J. Rougier, D.B. Stephenson, T. Wagener, Sensitivity analysis of environmental models: A systematic review with practical workflow, Environ. Model. Software 79 (2016) 214–232, https://doi.org/10.1016/j.envsoft.2016.02.008.
[126] M.D. Morris, Factorial sampling plans for preliminary computational experiments, Technometrics 33 (2) (1991) 161–174.
[127] F. Campolongo, J. Cariboni, A. Saltelli, An effective screening design for sensitivity analysis of large models, Environ. Model. Software 22 (10) (2007) 1509–1518.
[128] K. Kim, J. Haberl, Development of a home energy audit methodology for determining energy and cost efficient measures using an easy-to-use simulation: Test results from single-family houses in Texas, USA, Build. Simul. 9 (6) (2016) 617–628, https://doi.org/10.1007/s12273-016-0299-y.
[129] W. Tian, A review of sensitivity analysis methods in building energy analysis, Renew. Sustain. Energy Rev. 20 (2013) 411–419, https://doi.org/10.1016/j.rser.2012.12.014.
[130] J. Robertson, B. Polly, J. Collis, Reduced-order modeling and simulated annealing optimization for efficient residential building utility bill calibration, Appl. Energy 148 (2015) 169–177, https://doi.org/10.1016/j.apenergy.2015.03.049.
[131] R. Enríquez, M. Jiménez, M. Heras, Towards non-intrusive thermal load monitoring of buildings: BES calibration, Appl. Energy 191 (2017) 44–54, https://doi.org/10.1016/j.apenergy.2017.01.050.
[132] F. Tüysüz, H. Sözer, Calibrating the building energy model with the short term monitored data: A case study of a large-scale residential building, Energy Build. 224, https://doi.org/10.1016/j.enbuild.2020.110207.
[133] K. Kim, J. Haberl, Development of methodology for calibrated simulation in single-family residential buildings using three-parameter change-point regression model, Energy Build. 99 (2015) 140–152, https://doi.org/10.1016/j.enbuild.2015.04.032.
[134] A. Elharidi, P. Tuohy, M. Teamah, A. Hanafy, Energy and indoor environmental performance of typical Egyptian offices: Survey, baseline model and uncertainties, Energy Build. 135 (2017) 367–384, https://doi.org/10.1016/j.enbuild.2016.11.011.
[135] B. Glasgo, C. Hendrickson, I.L. Azevedo, Assessing the value of information in residential building simulation: Comparing simulated and actual building loads at the circuit level, Appl. Energy 203 (2017) 348–363, https://doi.org/10.1016/j.apenergy.2017.05.164.
[136] I. Allard, T. Olofsson, G. Nair, Energy evaluation of residential buildings: Performance gap analysis incorporating uncertainties in the evaluation methods, Build. Simul. 11 (4) (2018) 725–737, https://doi.org/10.1007/s12273-018-0439-7.
[137] A. Mihai, R. Zmeureanu, Bottom-up evidence-based calibration of the HVAC air-side loop of a building energy model, J. Build. Performance Simul. 10 (1) (2017) 105–123, https://doi.org/10.1080/19401493.2016.1152302.
[138] F. Ascione, N. Bianco, T. Iovane, G. Mauro, D. Napolitano, A. Ruggiano, L. Viscido, A real industrial building: Modeling, calibration and Pareto optimization of energy retrofit, J. Build. Eng. 29, https://doi.org/10.1016/j.jobe.2020.101186.
[139] R. Yin, S. Kiliccote, M. Piette, Linking measurements and models in commercial buildings: A case study for model calibration and demand response strategy evaluation, Energy Build. 124 (2016) 222–235, https://doi.org/10.1016/j.enbuild.2015.10.042.
[140] N. Li, Z. Yang, B. Becerik-Gerber, C. Tang, N. Chen, Why is the reliability of building simulation limited as a tool for evaluating energy conservation measures?, Appl. Energy 159 (2015) 196–205, https://doi.org/10.1016/j.apenergy.2015.09.001.
[141] C. Aparicio-Fernández, J.-L. Vivancos, P. Cosar-Jorda, R. Buswell, Energy modelling and calibration of building simulations: A case study of a domestic building with natural ventilation, Energies 12 (17), https://doi.org/10.3390/en12173360.
[142] D. Yan, W. O’Brien, T. Hong, X. Feng, H.B. Gunay, F. Tahmasebi, A. Mahdavi, Occupant behavior modeling for building performance simulation: Current state and future challenges, Energy Build. 107 (2015) 264–278, https://doi.org/10.1016/j.enbuild.2015.08.032.
[143] Y.-S. Kim, M. Heidarinejad, M. Dahlhausen, J. Srebric, Building energy model calibration with schedules derived from electricity use data, Appl. Energy 190 (2017) 997–1007, https://doi.org/10.1016/j.apenergy.2016.12.167.
[144] A. Chong, G. Augenbroe, D. Yan, Occupancy data at different spatial resolutions: Building energy performance and model calibration, Appl. Energy 286 (2021) 116492, https://doi.org/10.1016/j.apenergy.2021.116492.
[145] G. Augenbroe, The role of simulation in performance-based building, in: J.L. Hensen, R. Lamberts (Eds.), Building performance simulation for design and operation, Routledge, 2019, Ch. 10, p. 343.
[146] C.A. Aumann, A methodology for developing simulation models of complex systems, Ecol. Model. 202 (3–4) (2007) 385–396, https://doi.org/10.1016/j.ecolmodel.2006.11.005.
[147] H.B. Gunay, W. O’Brien, I. Beausoleil-Morrison, Implementation and comparison of existing occupant behaviour models in EnergyPlus, J. Build. Performance Simul. 9 (6) (2016) 567–588, https://doi.org/10.1080/19401493.2015.1102969.
[148] S. Kanteh Sakiliba, N. Bolton, M. Sooriyabandara, The energy performance and techno-economic analysis of zero energy bill homes, Energy Build. 228, https://doi.org/10.1016/j.enbuild.2020.110426.
[149] A. Figueiredo, J. Kämpf, R. Vicente, R. Oliveira, T. Silva, Comparison between monitored and simulated data using evolutionary algorithms: Reducing the performance gap in dynamic building simulation, J. Build. Eng. 17 (2018) 96–106, https://doi.org/10.1016/j.jobe.2018.02.003.
[150] J. Lee, S. Yoo, J. Kim, D. Song, H. Jeong, Improvements to the customer baseline load (CBL) using standard energy consumption considering energy efficiency and demand response, Energy 144 (2018) 1052–1063, https://doi.org/10.1016/j.energy.2017.12.044.
[151] M. De Rosa, M. Brennenstuhl, C. Cabrera, U. Eicker, D. Finn, An iterative methodology for model complexity reduction in residential building simulation, Energies 12 (12), https://doi.org/10.3390/en12122448.
[152] C. Cornaro, S. Rossi, S. Cordiner, V. Mulone, L. Ramazzotti, Z. Rinaldi, Energy performance analysis of STILE house at the Solar Decathlon 2015: Lessons learned, J. Build. Eng. 13 (2017) 11–27, https://doi.org/10.1016/j.jobe.2017.06.015.
[153] E. Carlon, V. Verma, M. Schwarz, L. Golicza, A. Prada, M. Baratieri, W. Haslinger, C. Schmidl, Experimental validation of a thermodynamic boiler model under steady state and dynamic conditions, Appl. Energy 138 (2015) 505–516, https://doi.org/10.1016/j.apenergy.2014.10.031.
[154] M. Bhandari, S. Shrestha, J. New, Evaluation of weather datasets for building energy simulation, Energy Build. 49 (2012) 109–118, https://doi.org/10.1016/j.enbuild.2012.01.033.
[155] A. Persily, A. Musser, S.J. Emmerich, Modeled infiltration rate distributions for US housing, Indoor Air 20 (6) (2010) 473–485, https://doi.org/10.1111/j.1600-0668.2010.00669.x.
[156] D. Kim, S. Cox, H. Cho, P. Im, Model calibration of a variable refrigerant flow system with a dedicated outdoor air system: A case study, Energy Build. 158 (2018) 884–896, https://doi.org/10.1016/j.enbuild.2017.10.049.
[157] S. Nagpal, C. Mueller, A. Aijazi, C. Reinhart, A methodology for auto-calibrating urban building energy models using surrogate modeling techniques, J. Build. Performance Simul. 12 (1) (2019) 1–16, https://doi.org/10.1080/19401493.2018.1457722.
[158] M. Manfren, B. Nastasi, Parametric performance analysis and energy model calibration workflow integration – a scalable approach for buildings, Energies 13 (3), https://doi.org/10.3390/en13030621.
[159] S. Asadi, E. Mostavi, D. Boussaa, M. Indaganti, Building energy model calibration using automated optimization-based algorithm, Energy Build. 198 (2019) 106–114, https://doi.org/10.1016/j.enbuild.2019.06.001.
[160] D. Chakraborty, H. Elzarka, Performance testing of energy models: are we using the right statistical metrics?, J. Build. Performance Simul. 11 (4) (2018) 433–448, https://doi.org/10.1080/19401493.2017.1387607.
[161] G.R. Ruiz, C.F. Bandera, Validation of calibrated energy models: Common errors, Energies 10 (10) (2017) 1587, https://doi.org/10.3390/en10101587.
[162] T.A. Reddy, I. Maor, C. Panjapornpon, Calibrating detailed building energy simulation programs with measured data – Part I: General methodology (RP-1051), HVAC&R Res. 13 (2) (2007) 221–241, https://doi.org/10.1080/10789669.2007.10390952.
[163] J.K. Kissock, J.S. Haberl, D.E. Claridge, Inverse modeling toolkit: numerical algorithms, ASHRAE Trans. 109 (2003) 425.
[164] J.S. Haberl, A. Sreshthaputra, D.E. Claridge, J.K. Kissock, Inverse model toolkit: application and testing, ASHRAE Trans. 109 (2003) 435.
[165] EVO, Uncertainty assessment for IPMVP, International Performance Measurement and Verification Protocol, Efficiency Valuation Organization.
[166] C. Cerezo, J. Sokol, S. AlKhaled, C. Reinhart, A. Al-Mumin, A. Hajiah, Comparison of four building archetype characterization methods in urban building energy modeling (UBEM): A residential case study in Kuwait City, Energy Build. 154 (2017) 321–334, https://doi.org/10.1016/j.enbuild.2017.08.029.
[167] T. Gneiting, A.E. Raftery, Strictly proper scoring rules, prediction, and estimation, J. Am. Stat. Assoc. 102 (477) (2007) 359–378, https://doi.org/10.1198/016214506000001437.
[168] A.K. Persily, Airtightness of commercial and institutional buildings: blowing holes in the myth of tight buildings.
[169] F.G.N. Li, A. Smith, P. Biddulph, I.G. Hamilton, R. Lowe, A. Mavrogianni, E. Oikonomou, R. Raslan, S. Stamp, A. Stone, A. Summerfield, D. Veitch, V. Gori, T. Oreszczyn, Solid-wall U-values: heat flux measurements compared with standard assumptions, Build. Res. Inf. 43 (2) (2015) 238–252, https://doi.org/10.1080/09613218.2014.967977.
[170] V. Gori, V. Marincioni, P. Biddulph, C.A. Elwell, Inferring the thermal resistance and effective thermal mass distribution of a wall from in situ measurements to characterise heat transfer at both the interior and exterior surfaces, Energy Build. 135 (2017) 398–409, https://doi.org/10.1016/j.enbuild.2016.10.043.
[171] D.H. Yi, D.W. Kim, C.S. Park, Parameter identifiability in Bayesian inference for building energy models, Energy Build. 198 (2019) 318–328, https://doi.org/10.1016/j.enbuild.2019.06.012.
[172] T. Hong, Y. Chen, X. Luo, N. Luo, S.H. Lee, Ten questions on urban building energy modeling, Build. Environ. 168 (2020) 106508, https://doi.org/10.1016/j.buildenv.2019.106508.
[173] C. Miller, D. Thomas, J. Kämpf, A. Schlueter, Urban and building multiscale co-simulation: case study implementations on two university campuses, J. Build. Performance Simul. 11 (3) (2018) 309–321, https://doi.org/10.1080/19401493.2017.1354070.
[174] Y. Chen, T. Hong, X. Luo, B. Hooper, Development of city buildings dataset for urban building energy modeling, Energy Build. 183 (2019) 252–265, https://doi.org/10.1016/j.enbuild.2018.11.008.
[175] Y.Q. Ang, Z.M. Berzolla, C.F. Reinhart, From concept to application: A review of use cases in urban building energy modeling, Appl. Energy 279 (2020) 115738, https://doi.org/10.1016/j.apenergy.2020.115738.
[176] F. Noardo, L. Harrie, K.A. Ohori, F. Biljecki, C. Ellul, T. Krijnen, H. Eriksson, D. Guler, D. Hintz, M.A. Jadidi, M. Pla, S. Sanchez, V.-P. Soini, R. Stouffs, J. Tekavec, J. Stoter, Tools for BIM-GIS Integration (IFC Georeferencing and Conversions): Results from the GeoBIM Benchmark 2019, ISPRS Int. J. Geo-Inf. 9 (9) (2020) 502, https://doi.org/10.3390/ijgi9090502.
[177] F. Biljecki, J. Lim, J. Crawford, D. Moraru, H. Tauscher, A. Konde, K. Adouane, S. Lawrence, P. Janssen, R. Stouffs, Extending CityGML for IFC-sourced 3D city models, Autom. Constr. 121 (2021) 103440, https://doi.org/10.1016/j.autcon.2020.103440.
[178] E.J. Rykiel Jr, Testing ecological models: the meaning of validation, Ecol. Model. 90 (3) (1996) 229–244, https://doi.org/10.1016/0304-3800(95)00152-2.
[179] M. Trčka, J.L. Hensen, Overview of HVAC system simulation, Autom. Constr. 19 (2) (2010) 93–99, https://doi.org/10.1016/j.autcon.2009.11.019.
[180] I. Gaetani, P.-J. Hoes, J.L. Hensen, Occupant behavior in building energy simulation: Towards a fit-for-purpose modeling strategy, Energy Build. 121 (2016) 188–204, https://doi.org/10.1016/j.enbuild.2016.03.038.
[181] I. Gaetani, P.-J. Hoes, J.L. Hensen, A stepwise approach for assessing the appropriate occupant behaviour modelling in building performance simulation, J. Build. Performance Simul. 13 (3) (2020) 362–377, https://doi.org/10.1080/19401493.2020.1734660.
[182] S. Zhan, A. Chong, Data requirements and performance evaluation of model predictive control in buildings: A modeling perspective, Renew. Sustain. Energy Rev. (2021) 110835, https://doi.org/10.1016/j.rser.2021.110835.
[183] S. Pfenninger, J. DeCarolis, L. Hirth, S. Quoilin, I. Staffell, The importance of open data and software: Is energy research lagging behind?, Energy Policy 101 (2017) 211–215, https://doi.org/10.1016/j.enpol.2016.11.046.
[184] M. McNutt, Journals unite for reproducibility, Science (2014), https://doi.org/10.1126/science.aaa1724.
[185] C. Boettiger, An introduction to Docker for reproducible research, ACM SIGOPS Oper. Syst. Rev. 49 (1) (2015) 71–79, https://doi.org/10.1145/2723872.2723882.
[186] S. Pfenninger, L. Hirth, I. Schlecht, E. Schmid, F. Wiese, T. Brown, C. Davis, M. Gidden, H. Heinrichs, C. Heuberger, et al., Opening the black box of energy modelling: Strategies and lessons learned, Energy Strategy Rev. 19 (2018) 63–71, https://doi.org/10.1016/j.esr.2017.12.002.
[187] B.L. Ball, N. Long, K. Fleming, C. Balbach, P. Lopez, An open source analysis framework for large-scale building energy modeling, J. Build. Performance Simul. 13 (5) (2020) 487–500, https://doi.org/10.1080/19401493.2020.1778788.
[188] H. Jia, A. Chong, eplusr: A framework for integrating building energy simulation and data-driven analytics, Energy Build. (2021) 110757, https://doi.org/10.1016/j.enbuild.2021.110757.