Data-Driven Inverse Optimization for Marginal Offer Price Recovery in Electricity Markets
DOI: https://doi.org/10.1145/3575813.3597356
e-Energy '23: The 14th ACM International Conference on Future Energy Systems, Orlando, FL, USA, June 2023
This paper presents a data-driven inverse optimization (IO) approach to recover the marginal offer prices of generators in a wholesale energy market. By leveraging underlying market-clearing processes, we establish a closed-form relationship between the unknown parameters and the publicly available market-clearing results. Based on this relationship, we formulate the data-driven IO problem as a computationally feasible single-level optimization problem. The solution of the data-driven model is based on the gradient descent method, which provides an error bound on the optimal solution and a sub-linear convergence rate. We also rigorously prove the existence and uniqueness of the global optimum to the proposed data-driven IO problem and analyze its robustness in two possible noisy settings. The effectiveness of the proposed method is demonstrated through simulations in both an illustrative IEEE 14-bus system and a realistic NYISO 1814-bus system.
ACM Reference Format:
Zhirui Liang and Yury Dvorkin. 2023. Data-Driven Inverse Optimization for Marginal Offer Price Recovery in Electricity Markets. In The 14th ACM International Conference on Future Energy Systems (e-Energy '23), June 20--23, 2023, Orlando, FL, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3575813.3597356
1 INTRODUCTION
Since the 1980s, the US power sector has been transformed from a monopoly into a liberalized system to promote competition and increase efficiency [29]. This change has opened up opportunities for power producers to earn more profits, but it has also increased the risk of not being selected for dispatch if their offer prices exceed those of their competitors. Knowing the market, including the prices offered by competitors, allows power producers to maximize their profits and minimize their risk exposure [18]. In parallel, open-access policies (e.g., FERC Order 888) have authorized the release of some market data, including selected transmission network parameters and market-clearing results [23], which enables data mining and reverse-engineering of the unreleased market data. Building on previous work that recovers rival marginal offer prices from market-clearing results through inverse optimization [10, 15, 16, 41], this paper also addresses the price recovery problem in electricity markets, but with a focus on establishing performance guarantees for convergence, robustness, and uniqueness of the global optimum in inverse optimization problems.
1.1 Literature Review
An inverse optimization (IO) problem seeks to recover unknown parameters in the objective function [27] or constraints [13] of a forward optimization (FO) problem. The FO could be a linear program [3], a conic program [25], a mixed-integer program [2], a linearly constrained separable convex program [51], or, more generally, a variational inequality [8, 48].
The classic IO model assumes that there exists a single set of parameters that make the given solutions optimal. However, in practice, noise in observations can prevent the existence of parameters that make all solutions exactly optimal. This noise can come from various sources, such as (i) measurement errors during the data collection process, (ii) mismatches between the forward model and the actual underlying decision-making process, and (iii) bounded rationality (i.e., when the decision maker settles for sub-optimal results due to cognitive or computational limitations) [5, 32]. Applying the classic IO in noisy settings can lead to infeasibility or uninformative solutions, such as a trivial zero cost vector [14]. Therefore, the goal of IO in noisy settings is to minimize the fitting errors between the model and data so that the given solutions are “approximately” optimal [5, 14, 32, 48].
In deterministic settings, IO can precisely recover the parameters of the FO problem based on a single observation. However, in noisy settings, multiple observations are required to minimize the empirical loss of the IO problem. This type of IO, which can handle multiple observations, is also known as data-driven IO [30, 32]. Solving data-driven IO problems can become difficult because the number of variables grows as more training data are incorporated. To tackle this challenge, researchers have proposed using machine learning (ML) methods instead of conventional data-driven IO models [7, 9, 17, 47].
The objective of IO is similar to that of some ML techniques, especially neural networks, as both aim to estimate the unknown parameters of a model from available observations. However, the key difference is that the underlying model in IO corresponds to the actual FO problem and its parameters can be interpreted, while in ML, the unknown parameters (such as weights and biases in neural networks) often lack a straightforward explanation. To incorporate physical laws into ML models, researchers have developed physics-informed ML methods [26]. Essentially, if the FO problem with unknown parameters is transformed into a differentiable layer in a neural network, these unknown parameters can be learned through training (similar to other parameters in the network) [1, 4, 12, 19]. For instance, OptNet is a network architecture that integrates a quadratic optimization program as a layer within an end-to-end neural network [4]. In [9], OptNet is used to determine the unknown parameters in a demand response model. The unknown parameters can also be estimated with other ML methods. For example, [7, 17] reveal the parameters through an online learning process, while [47] presents a deep inverse optimization model that formulates the parameter recovery problem as a deep learning model. The fundamental assumption behind these ML models is that the FO model is differentiable with respect to the unknown parameters, which allows gradients to be computed and the model to be trained with (stochastic) gradient descent methods.
To compare IO and non-physics-informed ML, [24] evaluated their performance in imputing a convex objective function and found that IO is more suitable for problems with smaller training sets or highly complex objectives/constraints, while ML can handle larger training sets and is less sensitive to noisy training data. However, potential drawbacks of ML, such as its tendency to diverge in extreme situations, restrict its use in critical decision-making. Hence, to gain the trust of decision makers in using ML, formal performance guarantees need to be established for ML models [6].
IO has various applications across fields, including power systems and power markets. To maximize the profit of market participants in the day-ahead bidding process, IO has been used to recover the piece-wise linear cost functions of generators from historical market-clearing results in [15, 16, 41]. Similarly, [40] designs an IO problem based on market-clearing results, but the forward problem is an equilibrium model of an oligopolistic electricity market, aimed at determining whether the market structure matches the observed outcomes. While [15, 16, 40, 41] assume that all parameters except offer prices are known, [10] uses IO to recover market structures that are not disclosed to market participants, such as parameters of transmission lines. Alternatively, IO has been used to estimate the parameters of demand response in power systems in studies such as [28, 42, 43]. In [43], IO is recast as a bi-level program and solved heuristically. [42] deals with a non-convex IO model and develops a two-step heuristic approach to solve it. In [28], IO is transformed into a quadratically constrained quadratic program and solved through successive linear programming.
The aim of this paper is to develop a data-driven IO method for recovering the marginal offer prices of generators from historical market-clearing results. The US power systems and markets have the following unique properties that distinguish this IO problem from other application scenarios.
First, the structures and most parameters of power markets are transparent [35], and the market-clearing results, including schedules and prices, are promptly and accurately published [33, 34]. This allows for the observation of both primal variables (schedules) and some dual variables (prices) in the forward problem, giving our price recovery problem an advantage over other IO problems where only primal solutions can be observed.
Second, the market operation relies on the use of locational marginal prices (LMPs) which represent the incremental costs of providing the next megawatt of energy at different locations within the power system. However, the offer price of a generator is only reflected in the LMPs when it is the marginal generator at that location. Hence, a single observation is insufficient to recover the offer prices of all generators, requiring the use of data-driven IO to explore multiple scenarios where different generators are marginal.
Third, the offer price of a generator tends to be stable over time. In day-ahead markets, generators are paid according to LMPs instead of their offer prices. To secure market participation, they typically set their offer prices close to or slightly above their generation costs. However, the offer price of a generator may vary due to external factors such as fuel prices and weather, so the market observations are optimal solutions to the same forward model but with different cost vectors.
Finally, the noise in market-clearing results is sparse and traceable. According to [46], only a small percentage (less than 0.1%) of real-time intervals have required price corrections since 2009, indicating that random errors in published price data are uncommon. Another source of noise is the mismatch between the forward model studied in this paper (i.e., the DC optimal power flow model) and the actual market-clearing model, which accounts for more constraints and may involve ad-hoc adjustments by market operators. These sources of noise may affect the accuracy of IO results.
In summary, the transparent market structure and abundant market data make data-driven IO feasible. However, the noise in historical data and the fluctuating nature of offer prices pose challenges to the IO problem. We tackle these challenges by solving the data-driven IO problem with the gradient descent method, which makes effective use of the training data and remains robust in noisy settings. We also focus on establishing performance guarantees for the proposed approach.
1.2 Contributions
After reviewing the related work on IO in Section 2 and formulating the FO problem in Section 3, our contributions in the rest of the paper are:
- We propose a data-driven IO model for marginal offer price recovery in wholesale power markets. Our method differs from previous studies such as [16, 41] since it incorporates the unknown cost vector directly into the objective function, rather than relying solely on the primal or dual variables of the FO problem. This design results in a computationally manageable single-level optimization problem, compared to the bi-level programs in [5, 32].
- Unlike previous work that solves the data-driven IO using off-the-shelf solvers [10, 15, 16, 40, 41], we solve it using gradient descent and derive the corresponding error bound. Our method allows for proving the convergence to the global optimum with a sub-linear rate, whereas the method in [9] only guarantees convergence in the absence of inequality constraints in the FO problem. Furthermore, we rigorously prove the existence and uniqueness of the global optimum to the data-driven IO problem, which are generally missing in other studies.
- Unlike [10, 15, 16, 41], which accomplish offer price recovery using IO but neglect the impact of noise in observations, our approach considers potential sources of errors in market-clearing results and evaluates the robustness of the data-driven IO in two noisy settings. The validity and robustness of the proposed data-driven IO method are demonstrated through simulations in the IEEE 14-bus system and the 1814-bus NYISO transmission network.
While the proposed data-driven IO model focuses on the price recovery problem in day-ahead markets, it can also be adapted to applications in real-time and ancillary service markets by appropriately changing constraints (while preserving linearity) and training data.
1.3 Notations
The following notations are used in the paper: $I_n$ is an identity matrix of size n × n with ones on the main diagonal and zeros elsewhere, $J_{m,n}$ is an all-ones matrix of size m × n, $O_{m,n}$ is an all-zeros matrix of size m × n, $D(\cdot)$ creates a diagonal matrix from a vector, and $\Vert \cdot \Vert_p$ represents the ℓp-norm.
2 OVERVIEW OF INVERSE OPTIMIZATION
In this section, we describe a general linear optimization model as the FO problem, formulate the corresponding IO models in deterministic and noisy settings, extend the single-observation IO to the data-driven IO, and compare the benefits and drawbacks of various IO formulations.
2.1 Linear FO Problem
Let $x \in \mathbb {R}^n$, $w \in \mathbb {R}^n$, $A \in \mathbb {R}^{m \times n}$, and $b \in \mathbb {R}^m$. We define our linear forward optimization (FO) problem as:
$$\mathrm{FO}(w):\quad \min_{x}\ w^\top x \tag{1a}$$
$$\text{s.t.}\quad A x \ge b \qquad (\xi) \tag{1b}$$
where $\xi \in \mathbb {R}^m$ is the vector of dual variables associated with Ax ≥ b. Let $\mathcal {X}(w)$ be the set of feasible solutions to FO(w) and $\mathcal {X}^{OPT}(w)$ be the set of optimal solutions to FO(w). We assume that $\mathcal {X}(w)$ and $\mathcal {X}^{OPT}(w)$ are non-empty.
2.2 Deterministic IO Formulation
Let $x^0 \in \mathbb {R}^n$ denote an observed solution to FO(w) in (1). In a deterministic setting, there exists a cost vector w such that $x^0$ is an optimal solution to FO(w). Therefore, given A, b and $x^0$, the goal is to find w such that $x^0 \in \mathcal {X}^{OPT}(w)$. Note that multiple w may achieve this goal [27]. For example, if $x^0 \in \mathcal {X}^{OPT}(w)$, then $x^0 \in \mathcal {X}^{OPT}(\alpha w)$ also holds for any scalar $\alpha > 0$, and $w = 0$ trivially makes every feasible solution optimal. Such trivial and rescaled solutions can be excluded by normalization or regularization, e.g., by requiring ‖w‖ = 1. Further, if FO(w) is a linear program, its inverse problem IO($x^0$) is also a linear program [3].
One way to evaluate the optimality of the estimated w is via the optimality conditions of FO(w), which $x^0$ satisfies exactly once a correct w is found. Therefore, the objective function of the IO problem is trivial and can be set to 0, i.e., IO reduces to a feasibility problem. The primal-dual (PD) optimality conditions of FO(w) can be formulated as [3]:
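$$\mathrm{IO\text{-}PD}(x^0):\quad \min_{w,\,\xi}\ 0 \tag{2a}$$
$$\text{s.t.}\quad \Vert w \Vert = 1 \tag{2b}$$
$$A x^0 \ge b \tag{2c}$$
$$A^\top \xi = w \tag{2d}$$
$$\xi \ge 0 \tag{2e}$$
$$w^\top x^0 = b^\top \xi \tag{2f}$$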
where (2b) normalizes w, ruling out the trivial solution w = 0 and positive rescalings of w, (2c) embodies primal feasibility, (2d) and (2e) denote dual feasibility, and (2f) represents strong duality. Note that (2c) can be omitted since it involves only the data A, b and $x^0$, not the decision variables w and ξ.
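As a concrete check of these conditions, the sketch below tests whether a candidate pair (w, ξ) certifies an observed $x^0$ as optimal for FO(w). It assumes the standard primal-dual layout $\Vert w \Vert_1 = 1$, $Ax^0 \ge b$, $A^\top \xi = w$, $\xi \ge 0$ and $w^\top x^0 = b^\top \xi$ (the ℓ1 choice for the normalization is an assumption), and the helper name `io_pd_report` and the toy data are hypothetical. Exact rational arithmetic from the `fractions` module avoids floating-point tolerances.

```python
from fractions import Fraction as F


def io_pd_report(A, b, x0, w, xi):
    """Check the primal-dual optimality conditions of min w^T x s.t. Ax >= b at x0.

    Assumed layout (illustrative): ||w||_1 = 1, A x0 >= b, A^T xi = w, xi >= 0,
    and w^T x0 = b^T xi. Exact arithmetic is used, so no tolerances are needed.
    """
    Ax0 = [sum(a * x for a, x in zip(row, x0)) for row in A]
    ATxi = [sum(A[j][k] * xi[j] for j in range(len(A))) for k in range(len(w))]
    return {
        "normalization": sum(abs(wk) for wk in w) == 1,
        "primal feasibility": all(lhs >= bj for lhs, bj in zip(Ax0, b)),
        "stationarity": ATxi == list(w),
        "dual feasibility": all(v >= 0 for v in xi),
        "strong duality": sum(wk * xk for wk, xk in zip(w, x0))
        == sum(bj * v for bj, v in zip(b, xi)),
    }


# Toy FO: min w^T x  s.t.  x1 >= 0, x2 >= 0, x1 + x2 >= 1  (rows of A, entries of b)
A = [[1, 0], [0, 1], [1, 1]]
b = [0, 0, 1]
x0 = [1, 0]                       # observed vertex
w = [F(1, 3), F(2, 3)]            # candidate cost vector with ||w||_1 = 1
xi = [0, F(1, 3), F(1, 3)]        # zero multiplier on the slack row x1 >= 0

report = io_pd_report(A, b, x0, w, xi)
print(all(report.values()))       # True
```

Swapping the candidate to w = (2/3, 1/3) breaks stationarity, consistent with $x^0 = (1, 0)$ no longer being optimal: the vertex (0, 1) then has a lower cost.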
Alternatively, the optimality conditions of FO(w) can also be formulated as the Karush-Kuhn-Tucker (KKT) conditions [27]:
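$$\mathrm{IO\text{-}KKT}(x^0):\quad \min_{w,\,\xi}\ 0 \quad \text{s.t. (2b)--(2e)} \tag{3a}$$
$$D(\xi)\,(A x^0 - b) = O_{m,1} \tag{3b}$$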
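where (3b) is complementary slackness. Note that (2d) is referred to as the stationarity condition among the KKT conditions. The IO-PD and IO-KKT formulations in (2) and (3) are equivalent, because for a linear program with a primal-feasible $x^0$ and a dual-feasible ξ, strong duality (2f) holds if and only if complementary slackness (3b) holds.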
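The equivalence between strong duality and complementary slackness rests on a one-line identity. A short derivation, assuming complementary slackness is imposed row by row and writing $a_j^\top$ for the $j$-th row of $A$:

```latex
% Assume stationarity (2d) holds, i.e. A^\top \xi = w. Then the duality gap at x^0 factorizes:
w^\top x^0 - b^\top \xi
  = (A^\top \xi)^\top x^0 - b^\top \xi
  = \xi^\top (A x^0 - b)
  = \sum_{j=1}^{m} \xi_j \, (a_j^\top x^0 - b_j).
% Under (2c) and (2e) every summand is non-negative, so the gap is zero (strong duality)
% exactly when every summand is zero (complementary slackness).
```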
2.3 Generalized IO (GIO) in Noisy Settings
In noisy settings, it is assumed that the observed solution $x^0$ is feasible but not necessarily optimal for FO(w), i.e., $x^0 \in \mathcal {X}(w)$. Thus, the optimality conditions in (2) or (3) may not be met exactly, and the objective of IO becomes finding a vector w for which $x^0$ is an “ϵ-optimal” solution, i.e., the choice of w enables $x^0$ to approximately satisfy the optimality conditions of FO(w). This IO problem is referred to as generalized IO (GIO) in [14], and two GIO models are reported in the literature.
Based on the IO-PD formulation, GIO can be formulated by adding a slack variable $\epsilon_{pd}$ to relax the constraints of primal feasibility (2c) and strong duality (2f) [8, 14, 40], as given by:
(4a)
(4b)
(4c)
It is proved in [14] that (4b) can be omitted since it is automatically satisfied by the optimal solution to GIO-PD(x0) without (4b).
Alternatively, GIO can also be formulated based on the IO-KKT formulation by relaxing the stationarity condition (2d) and complementary slackness (3b) with slack variables $\epsilon_{stat}$ and $\epsilon_{comp}$ [27, 43], respectively:
(5a)
(5b)
(5c)
If the observed solution is not feasible, i.e., $x^0 \notin \mathcal {X}(w)$, slack variables $\epsilon_{prim}$ and $\epsilon_{dual}$ can be added to the primal feasibility (2c) and dual feasibility (2e) constraints, respectively, instead of forcing them to hold.
Both forms of GIO are typically straightforward to solve. For example, [14] derives the closed-form solutions to GIO-PD under different forms of loss functions, while [27] proves that GIO-KKT in (5) is a finite-dimensional convex optimization problem which can be solved efficiently with off-the-shelf solvers.
2.4 Nominal IO (NIO) in Noisy Settings
The study in [5] shows that GIO-PD and GIO-KKT are statistically inconsistent, meaning that their results may converge to incorrect values even with a large amount of training data. In response, [22] introduces the nominal IO (NIO), in which the objective is to minimize the difference between the observed solution x0 and the optimal solution of FO(w).
(6)
(7a)
(7b)
(7c)
Similarly, an NIO based on the KKT conditions can be formulated. The distinction between the GIO and NIO formulations lies in the choice of loss functions that measure the discrepancy between predictions and observations. In the GIO formulation, the loss is measured by the slack required to make the observed solution satisfy the approximate optimality conditions under the estimated parameters. In the NIO formulation, the loss is the distance between the observed solution and the optimal solution to FO under the estimated parameters. Moreover, the NIO formulation is statistically consistent [5] and more robust to noise [48], but it is a bi-level program, which is NP-hard in general [5], whereas the GIO formulations are relatively easy to solve.
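The difference between the two loss families can be seen on a toy problem. The sketch below is a minimal illustration rather than either formulation in full: it assumes a bounded FO whose vertices are listed explicitly, so FO(w) can be solved by enumeration. The GIO-style loss is the duality gap; for a fixed w, LP strong duality implies that the smallest gap achievable by a dual-feasible ξ equals $w^\top x^0$ minus the optimal value of FO(w). The NIO-style loss is taken here as the ℓ1 distance to the FO optimum (the choice of norm is an assumption). All names and data are hypothetical.

```python
from fractions import Fraction as F


def fo_argmin(w, vertices):
    """Optimal solution of FO(w), found by enumerating the vertices of a bounded LP."""
    return min(vertices, key=lambda v: sum(wk * vk for wk, vk in zip(w, v)))


def gio_loss(w, x0, vertices):
    """Duality-gap (GIO-style) loss: w^T x0 minus the optimal value of FO(w)."""
    x_opt = fo_argmin(w, vertices)
    return sum(wk * (a - b) for wk, a, b in zip(w, x0, x_opt))


def nio_loss(w, x0, vertices):
    """Distance (NIO-style) loss: l1 distance between x0 and the optimum of FO(w)."""
    x_opt = fo_argmin(w, vertices)
    return sum(abs(a - b) for a, b in zip(x0, x_opt))


# Hypothetical region: x1, x2 >= 0, x1 + x2 >= 1, x1 <= 2, x2 <= 2
vertices = [(1, 0), (2, 0), (2, 2), (0, 2), (0, 1)]
w = (1, 2)
x0 = (F(11, 10), F(1, 10))          # noisy observation: feasible but sub-optimal

print(gio_loss(w, x0, vertices))    # 3/10
print(nio_loss(w, x0, vertices))    # 1/5
```

Both losses vanish only when $x^0$ coincides with an optimal vertex, but they weigh deviations differently: the GIO loss scales with w, while the NIO loss is measured in the space of decisions.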
2.5 Data-driven IO Formulation
The IO formulations presented in (2)–(7) are based on a single observation x0, but they can be extended to the data-driven form to accommodate multiple observations. For instance, given N observed solutions to FO $\lbrace x^0_i\rbrace _{i \in \mathcal {I}}$ and the corresponding parameters $\lbrace A_i,b_i\rbrace _{i \in \mathcal {I}}$, the NIO in (7) can be adapted as follows:
(8)
3 FORWARD PROBLEM FORMULATION
In this section, we present the standard US power market-clearing procedure, formulate the day-ahead market-clearing model, and derive the expression for locational marginal prices (LMPs) based on the Lagrangian and KKT conditions of the market-clearing model.
3.1 Energy Market and Its Settlement Process
In the US, power systems and power markets are managed by designated independent system operators (ISOs). ISOs ensure the safe and reliable operation of power systems and administer markets for various products such as capacity, energy, and ancillary services [44]. Fig. 1 illustrates the typical two-settlement process in energy markets, including the day-ahead market (DAM) and real-time market (RTM). The clearing of the DAM is based on security-constrained unit commitment (SCUC) and security-constrained economic dispatch (SCED), which determine the least-cost generator schedules and hourly LMPs for the next day based on day-ahead generation offers, load bids, and forecasts of load and renewable energy source (RES) generation. The clearing of the RTM is based on real-time commitment (RTC) and real-time dispatch (RTD). Unit commitment is predominantly determined in the DAM, which makes the DAM crucial for power suppliers. However, actual schedules may diverge from the DA schedule due to changes in operating conditions, additional RT generation offers, and deviations in actual load. This paper focuses on recovering the DA offer prices of generators from the DA market-clearing results.
3.2 Day-Ahead Market-Clearing Model
The actual SCUC model employed by ISOs is complex and involves multiple solution steps, as described in [37], and it is simplified to a representative SCUC model in some papers [21]. Following similar simplifications, we formulate the following UC model for a system with n generators, m nodes and l lines:
(9a)
(9b)
(9c)
(9d)
(9e)
(9f)
where (9a) minimizes the system operating cost using a linear cost function, (9b) ensures the power balance in the system, (9d) and (9c) limit the output of generators to their technical bounds $x^{\min}$ and $x^{\max}$, and (9f) and (9e) limit the power flow on each line to the maximum transmission capacity $f^{\max}$. Other notations used in (9) are explained in Table 1.
Model (9) is a mixed-integer linear program (MILP) due to the presence of binary variables u. Although non-convex, it can be solved by modern solvers such as CPLEX or Gurobi. To obtain the dual variables for marginal price calculation, we convert (9) into an equivalent linear program (LP) by replacing the binary variables with continuous variables $u\in \mathbb {R}^n$ and fixing $u = u^*$, where $u^*$ is the optimal commitment obtained by solving (9) with a MILP solver. This approach follows [36], where the authors show that the optimal solution of the MILP coincides with that of the LP when $u = u^*$ is enforced. The resulting LP is the following DC optimal power flow (DCOPF) model:
(10a)
(10b)
(10c)
(10d)
(10e)
(10f)
Greek letters in parentheses in (10b)–(10f) denote the Lagrange multipliers (dual variables) of the respective constraints. Specifically, λ denotes the energy price at the reference node of the power system. The KKT optimality conditions of (10) are:
(11a)
(11b)
(11c)
(11d)
(11e)
(11f)
The Lagrangian function of (10) is:
(12)
3.3 LMP Derivation from the DCOPF Model
The use of LMPs is the cornerstone of market operation [46]. The marginal energy price, represented by the dual variable λ in (10b), is the LMP at all nodes when the transmission capacity is unlimited. It reflects the cost of serving the next increment of load in the most efficient way. Based on (11a), we can express λ as:
(13)
(14)
(15)
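In the special case where no line limit binds, λ can be illustrated with a merit-order dispatch at a single bus. The sketch below is not the DCOPF (10) itself: it assumes a fixed commitment $u^*$ (as in the LP obtained from (9)), no network constraints, and a marginal unit that ends strictly between its output limits; the function name and numbers are hypothetical.

```python
def merit_order_dispatch(c, x_min, x_max, u, load):
    """Single-bus economic dispatch with fixed commitment u and no line limits.

    Committed units start at their minimum output and the remaining load is filled in
    ascending order of offer price. Returns (dispatch, total cost, lam), where lam is
    the offer price of the last unit that was moved, i.e. the marginal unit.
    """
    lo = [uk * m for uk, m in zip(u, x_min)]   # fixing u = u* turns bounds into u*x_min..u*x_max
    hi = [uk * m for uk, m in zip(u, x_max)]
    residual = load - sum(lo)
    if residual < 0 or residual > sum(hi) - sum(lo):
        raise ValueError("load lies outside the committed capacity range")
    x, lam = list(lo), None
    for k in sorted(range(len(c)), key=c.__getitem__):
        step = min(hi[k] - lo[k], residual)
        if step > 0:
            x[k] += step
            residual -= step
            lam = c[k]                          # last unit moved sets the price
        if residual == 0:
            break
    cost = sum(ck * xk for ck, xk in zip(c, x))
    return x, cost, lam


c, x_min, x_max, u = [20, 35, 50], [10, 10, 10], [100, 80, 60], [1, 1, 0]
x, cost, lam = merit_order_dispatch(c, x_min, x_max, u, load=150)
print(x, cost, lam)                                                  # [100, 50, 0] 3750 35
print(merit_order_dispatch(c, x_min, x_max, u, load=151)[1] - cost)  # 35
```

Raising the load from 150 to 151 increases the total cost by exactly λ = 35, i.e., the cost of serving the next increment of load; unit 1 is the free (marginal) unit whose offer price is revealed, while unit 0 sits at its upper limit.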
Table 1: Notation.

Notation | Meaning |
---|---|
$c \in \mathbb {R}^n$ | offer prices of generators |
$x \in \mathbb {R}^n$ | output power of generators |
$u \in \mathbb {R}^n$ | commitment status of generators |
$e \in \mathbb {R}^m$ | net load (demand minus renewable injections) |
$x^{\max } \in \mathbb {R}^n$ | upper bound of x |
$x^{\min } \in \mathbb {R}^n$ | lower bound of x |
$\Phi \in \mathbb {R}^{l \times m}$ | power transfer distribution factor (PTDF) matrix |
$S \in \mathbb {R}^{m \times n}$ | indicator of generators’ connection to nodes |
$f^{\max }\in \mathbb {R}^l$ | thermal capacity of transmission lines |
4 DATA-DRIVEN IO FORMULATION
Assume that multiple observations of schedules $\lbrace x^0_i\rbrace _{i \in \mathcal {I}}$ and prices $\lbrace \lambda ^0_i,\omega ^0_i\rbrace _{i \in \mathcal {I}}$ are given in the market-clearing results, where $\mathcal {I}$ denotes the set of training data indexed by i. Using the data-driven IO formulation in (8), the price recovery problem can be formulated based on the primal variable x:
(16)
(17)
4.1 Data-Driven IO in Deterministic Settings
We start from the single-observation case in a deterministic setting, where the observed solutions λ0, ω0 and x0 are assumed to be the optimal solutions of (10) and therefore satisfy the KKT conditions in (11). From (11a), we know:
(18)
(19)
(20)
(21)
Then, we extend (21) to the multi-observation case in a deterministic setting. Recall that the offer price of a generator may vary at different times due to external factors. In this case, while the observed solutions still satisfy the KKT conditions in (11), calculating c0 based on different sets of λ0 and ω0 would yield different offer prices for the same generator. To address this, we propose the following data-driven IO formulation. Given multiple observed solutions $\lbrace \lambda ^0_i,\omega ^0_i\rbrace _{i \in \mathcal {I}}$ where $\mathcal {I}$ is the set of training data indexed by i, the objective of data-driven IO is:
(22)
Note that the data-driven IO in (22) is a single-level problem, so it has lower computational complexity than the bi-level data-driven IO in (16) and (17). This advantage stems from the closed-form relationship between the unknown parameter $c^0$ and the observed solutions $\lambda^0$ and $\omega^0$ specified in (21). Although this relationship holds exactly in the deterministic setting, it does not hold in some noisy settings, as discussed in Section 4.4.
Since only the offer prices of free generators (i.e., generators operating strictly between their minimum and maximum output limits) can be recovered from (21), $c_i^0$ is a sparse vector. The zero elements of $c_i^0$ do not provide any additional information, so they are eliminated from the loss function in (22) as follows:
(23)
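A minimal sketch of how one sparse observation $c_i^0$ could be assembled, restricted to the uncongested case (no binding line limits). There, all line-flow multipliers are zero and stationarity gives $c_k = \lambda$ for every free generator; the general expression (21) also involves ω and is not reproduced here. Helper names and data are hypothetical, and non-free units are simply omitted rather than stored as zeros.

```python
def free_generators(x, x_min, x_max, u):
    """Indices of committed units operating strictly between their output limits."""
    return [k for k in range(len(x)) if u[k] == 1 and x_min[k] < x[k] < x_max[k]]


def recovered_prices_uncongested(lam, x, x_min, x_max, u):
    """Sparse price observation for one interval when no line limit binds.

    With all line-flow multipliers equal to zero, stationarity reduces to
    c_k = lambda for every free unit; non-free units carry no information.
    """
    return {k: lam for k in free_generators(x, x_min, x_max, u)}


x_min, x_max, u = [10, 10, 10], [100, 80, 60], [1, 1, 0]
print(recovered_prices_uncongested(35, [100, 50, 0], x_min, x_max, u))  # {1: 35}
print(recovered_prices_uncongested(20, [90, 10, 0], x_min, x_max, u))   # {0: 20}
```

Each interval reveals at most the price of the unit that happens to be marginal, which is why observations with different marginal units are needed to recover more than one offer price.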
We can use the gradient descent (GD) method to iteratively update the values of $\hat{c}$. The update rule for $\hat{c}_k$ is:
(24)
Algorithm 1 outlines the procedure for solving the proposed data-driven IO model with GD. To improve computational performance, stochastic gradient descent (SGD) can be used when dealing with large training data sets or when price recovery becomes an online learning problem.
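The GD loop can be sketched as follows, assuming, as an illustration only, that the loss in (23) has the form $l(c)=\sum_{i}\sum_{k\in\mathcal{F}_i}|c_k-c^0_{i,k}|^p$, where $\mathcal{F}_i$ is the (hypothetical) set of free generators in observation i. Each observation is stored as a dictionary mapping a generator index to its recovered price, so the eliminated zero entries never enter the gradient; for p = 1 the sign function serves as a subgradient. Function names and data are hypothetical. The averaged iterate is returned as well because standard GD guarantees for convex Lipschitz losses are stated for it.

```python
def sign(t):
    """Sign of t as -1, 0 or 1 (a subgradient of |t|)."""
    return (t > 0) - (t < 0)


def loss_subgradient(c, observations, p):
    """(Sub)gradient of the assumed loss l(c) = sum_i sum_{k free in i} |c_k - c0_ik|^p."""
    g = [0.0] * len(c)
    for obs in observations:                 # obs: {generator index: recovered price}
        for k, target in obs.items():
            r = c[k] - target
            contrib = p * abs(r) ** (p - 1) * sign(r) if p > 1 else sign(r)
            g[k] += contrib
    return g


def gradient_descent(c_init, observations, p, eta, T):
    """Fixed-step GD; returns the last iterate and the average of iterates 1..T."""
    c = list(c_init)
    avg = [0.0] * len(c)
    for _ in range(T):
        avg = [a + ck / T for a, ck in zip(avg, c)]
        g = loss_subgradient(c, observations, p)
        c = [ck - eta * gk for ck, gk in zip(c, g)]
    return c, avg


# Three hypothetical intervals: unit 0 is free in all, unit 1 in one, unit 2 in none.
observations = [{0: 30.0, 1: 50.0}, {0: 32.0}, {0: 34.0}]
c_last, c_avg = gradient_descent([0.0, 0.0, 40.0], observations, p=2, eta=0.05, T=200)
print([round(v, 3) for v in c_last])         # [32.0, 50.0, 40.0]
```

With p = 2, unit 0 converges to the mean of its three recovered prices and unit 1 to its single observation, while unit 2, never free, keeps its initial value, mirroring the remark on non-marginal generators in Section 4.3.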
4.2 Convergence of Data-Driven IO Based on Gradient Descent (GD)
In this subsection, we prove that the loss function in (23) converges to its optimal value with a sub-linear rate using the gradient descent (GD) method. The established convergence theorem for GD on convex, Lipschitz continuous functions can be found in Appendix A.1. We first discuss the convexity and Lipschitz continuity of $\left\Vert x \right\Vert _p^p$, which are essential to prove the convergence of the proposed data-driven IO formulation.
Lemma 4.1 (Convexity and Lipschitz continuity of $\left\Vert x \right\Vert _p^p$) Let $f(x) = \left\Vert x \right\Vert _p^p = \sum \nolimits _{i=1}^n |x_i|^p$. Assume that p ≥ 1 and $|x_i| \in [0,\bar{x}]$ for all i ∈ [1, n]. Then f(x) is convex and Lipschitz continuous with respect to the ℓp-norm, with Lipschitz constant $p \bar{x}^{p-1} n^{(p-1)/p}$.
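The Lipschitz constant in Lemma 4.1 can be sanity-checked numerically (a check, not a proof): the sketch below samples random pairs in the box $|x_i| \le \bar{x}$ and compares the largest observed ratio $|f(x)-f(y)|/\Vert x-y\Vert_p$ with $p\bar{x}^{p-1}n^{(p-1)/p}$. The sample size, seed and function names are arbitrary choices for illustration.

```python
import random


def lipschitz_bound(p, n, x_bar):
    """Lipschitz constant of f(x) = ||x||_p^p w.r.t. the l_p-norm on |x_i| <= x_bar."""
    return p * x_bar ** (p - 1) * n ** ((p - 1) / p)


def empirical_ratio(p, n, x_bar, trials=2000, seed=0):
    """Largest |f(x) - f(y)| / ||x - y||_p over random pairs drawn from the box."""
    rng = random.Random(seed)

    def f(v):
        return sum(abs(t) ** p for t in v)

    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-x_bar, x_bar) for _ in range(n)]
        y = [rng.uniform(-x_bar, x_bar) for _ in range(n)]
        dist = sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)
        if dist > 0:
            worst = max(worst, abs(f(x) - f(y)) / dist)
    return worst


for p in (1, 1.5, 2, 3):
    ratio = empirical_ratio(p, n=5, x_bar=50.0)
    bound = lipschitz_bound(p, n=5, x_bar=50.0)
    print(p, ratio <= bound * (1 + 1e-9))   # small slack absorbs floating-point rounding
```

For p = 1 the bound equals 1 and can be attained, which is why a tiny relative slack is used in the comparison; for p > 1 the bound is only approached when all coordinates sit at the edge of the box.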
The proof of Lemma 4.1 can be found in Appendix B.1. Based on Theorem A.1 and Lemma 4.1, we obtain the following convergence theorem for the data-driven IO in Section 4.1:
Theorem 4.2 (Convergence of data-driven IO) Consider the loss function l(c) in (23), where $c \in \mathcal {C} \subseteq \mathbb {R}^n$ and $\sup_{c \in \mathcal {C}} \Vert c \Vert_p \le B$. Assume that the ℓp-norm in l(c) satisfies p ≥ 1 and $|c_k| \in [0,\bar{c}]$ for all k ∈ [1, n]. Let c* be an optimal solution of $\min _{c \in \mathcal {C}} l(c)$ and let $\hat{c}$ be the output of applying the GD algorithm to l(c). For any ϵ > 0, to achieve $l(\hat{c}) - l(c^*) \le \epsilon$, it is sufficient to perform GD for T iterations with learning rate η, where $T \ge \frac{n^2 p^2 \bar{c}^{2p}}{\epsilon ^2}$ and $\eta = (p \bar{c}^{p-2} n^{(p-2)/p} \sqrt {T})^{-1}$.
Proof. Since $|c_k| \le \bar{c}$ for all k, we have $B = (\sum \nolimits _{k=1}^n \bar{c}^p) ^{1/p} = \bar{c} n^{1/p}$. Based on Lemma 4.1, l(c) is convex and ρ-Lipschitz continuous with Lipschitz constant $\rho = p \bar{c}^{p-1} n^{(p-1)/p}$. Therefore, $B^2 \rho ^2 = \bar{c}^2 n^{2/p} \cdot p^2 \bar{c}^{2p-2} n^{2(p-1)/p} = n^2 p^2 \bar{c}^{2p}$. According to Theorem A.1, we know that:
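$$l(\hat{c}) - l(c^*) \le \frac{B\rho}{\sqrt{T}} = \frac{n p \bar{c}^{\,p}}{\sqrt{T}} \tag{25}$$
The right-hand side is at most ϵ whenever $T \ge n^2 p^2 \bar{c}^{2p}/\epsilon^2$.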
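The constants in Theorem 4.2 can be evaluated directly. The sketch below, with a hypothetical function name, computes $B$, $\rho$, the smallest admissible $T$, the step size $\eta = B/(\rho\sqrt{T})$ (which simplifies to the expression in Theorem 4.2), and the resulting bound $B\rho/\sqrt{T}$.

```python
import math


def gd_schedule(n, p, c_bar, eps):
    """T, step size and error bound suggested by Theorem 4.2 (illustrative sketch)."""
    B = c_bar * n ** (1 / p)                          # sup of ||c||_p over the box |c_k| <= c_bar
    rho = p * c_bar ** (p - 1) * n ** ((p - 1) / p)   # Lipschitz constant from Lemma 4.1
    T = math.ceil((B * rho / eps) ** 2)               # T >= n^2 p^2 c_bar^(2p) / eps^2
    eta = B / (rho * math.sqrt(T))                    # = (p c_bar^(p-2) n^((p-2)/p) sqrt(T))^(-1)
    bound = B * rho / math.sqrt(T)                    # = n p c_bar^p / sqrt(T), cf. (25)
    return T, eta, bound


print(gd_schedule(4, 1, 50.0, 10.0))   # (400, 10.0, 10.0)
```

Because T grows with $n^2$ and $1/\epsilon^2$, tight tolerances on large systems are expensive, which is one motivation for the SGD variant mentioned above.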
Specifically, if the ℓ1-norm is used in l(c), i.e., when p = 1, the error bound in (25) becomes $l(\hat{c}) - l(c^*) \le n \bar{c} /\sqrt {T}$. If the loss function is strongly convex, for example when the ℓ2-norm is used (p = 2), GD can converge faster [39]. To impose additional limits on the value of $\hat{c}_k$, inequality constraints can be added to the unconstrained optimization model in (23), and the resulting problem can then be solved using projected gradient descent. However, for the convergence theorem to hold, the feasible set of $\hat{c}_k$ defined by these inequality constraints must be both closed and convex [49].
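For box constraints $\underline{c} \le \hat{c} \le \overline{c}$, a closed convex set whose Euclidean projection is elementwise clipping, a projected GD step might look as follows; the function names, bounds and step values are hypothetical.

```python
def project_box(c, lower, upper):
    """Euclidean projection onto the box lower <= c <= upper (elementwise clipping)."""
    return [min(max(ck, lo), hi) for ck, lo, hi in zip(c, lower, upper)]


def projected_gd_step(c, grad, eta, lower, upper):
    """Take a plain GD step, then project the result back onto the feasible box."""
    stepped = [ck - eta * gk for ck, gk in zip(c, grad)]
    return project_box(stepped, lower, upper)


# Hypothetical price floor/cap: [5, 25] for unit 0 and [5, 100] for unit 1.
print(projected_gd_step([10.0, 45.0], [-40.0, 10.0], 0.5, [5.0, 5.0], [25.0, 100.0]))  # [25.0, 40.0]
```

The unconstrained step would move unit 0 to 30, outside its cap, so the projection clips it to 25; unit 1 stays inside its box and is unaffected.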
4.3 Existence and Uniqueness of Global Optimal Solution to Data-Driven IO
This subsection establishes the existence and uniqueness of the global optimal solution to the data-driven IO formulation in (23). The standard theorems on the existence and uniqueness of optimal solutions to convex problems can be found in Appendix A.2. First, we prove the strict convexity of $\left\Vert x \right\Vert _p^p$, which is essential for proving the uniqueness of the global optimum of the proposed data-driven IO formulation.
Lemma 4.3 (Strict convexity of $\left\Vert x \right\Vert _p^p$) Following the notations in Lemma 4.1, if p > 1 and xi ≥ 0 for ∀i ∈ [1, n], then $f(x) = \left\Vert x \right\Vert _p^p$ is strictly convex.
The proof of Lemma 4.3 can be found in Appendix B.2. Based on Theorems A.2–A.3 and Lemma 4.3, we obtain the following theorem for the data-driven IO formulation in (23):
Theorem 4.4 (Existence and uniqueness of global optimum to data-driven IO) Following the notations in Theorem 4.2, assume that the ℓp-norm in l(c) satisfies p > 1 and $c_k \in [\underline{c},\overline{c}]$ for all k ∈ [1, n], where $\underline{c} > 0$. Then $\min _{c \in \mathcal {C}} l(c)$ has a unique global optimal solution c*, and $\hat{c} \rightarrow c^{*}$ as the number of iterations T → ∞.
Proof. According to Theorem A.3 and Lemma 4.3, l(c) is strictly convex and thus has at most one optimal solution in the convex set $\mathcal {C}$. Meanwhile, since the feasible region of c is closed and bounded and l(c) is continuous on $\mathcal {C}$ (Lipschitz continuity implies uniform continuity), $\min _{c \in \mathcal {C}} l(c)$ has at least one global optimum according to Theorem A.2. In summary, $\min _{c \in \mathcal {C}} l(c)$ has a unique global optimal solution c*. Given the error bound in Theorem 4.2, $l(\hat{c}) - l(c^{*}) \rightarrow 0$ as T → ∞. Since c* is the unique minimizer of the continuous function l(c) over the compact set $\mathcal {C}$, this implies $\hat{c} \rightarrow c^{*}$ as T → ∞.□
Inspecting Theorems 4.2 and 4.4, we notice two differences in the assumptions on l(c): Theorem 4.2 assumes p ≥ 1 and $0\le |c_k|\le \bar{c}$ for all k, while Theorem 4.4 assumes p > 1 and $\underline{c} \le c_k\le \overline{c}$ for all k. Variable $c_k$ represents the offer price of generator k, which is typically at or above its generation cost and is capped by market power regulations. Hence, the assumption $\underline{c} \le c_k\le \overline{c}$ is realistic. We remark that in some exceptional cases, under heavy production credits, renewable generators can submit negative offer prices; however, this practice is expected to gradually phase out.
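A one-generator toy case illustrates why Theorem 4.4 requires p > 1: with p = 1 the loss can be flat between observations, so the minimizer need not be unique, while p = 2 has a single minimizer. The scalar loss form and the two recovered prices below are illustrative assumptions.

```python
def scalar_loss(c, targets, p):
    """l(c) = sum_i |c - c0_i|^p for a single generator's price c."""
    return sum(abs(c - t) ** p for t in targets)


targets = [30, 34]   # two hypothetical recovered prices for one unit
print([scalar_loss(c, targets, 1) for c in (31, 32, 33)])   # [4, 4, 4]
print([scalar_loss(c, targets, 2) for c in (31, 32, 33)])   # [10, 8, 10]
```

Every price between 30 and 34 minimizes the p = 1 loss, whereas the p = 2 loss is strictly convex and is minimized only at 32.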
Theorems 4.2 and 4.4 only apply to generators that have been marginal at least once in the training data set. The offer prices of non-marginal generators are eliminated from the loss function l(c) in (23), so the estimates of these prices remain at their initial values rather than converging to an optimal value.
4.4 Robustness of Data-Driven IO in Noisy Settings
4.4.1 Noisy Setting I: Random Errors in LMPs. The market-clearing results published by ISOs may not reflect the exact optimal solutions due to incorrect inputs or calculation errors. However, these results are generally considered accurate and are accepted by market participants owing to the price validation procedure conducted after the market closes [46]. Since (21) does not depend on $x^0$, the data-driven IO formulation is not affected by errors in the observations of the primal variable x. However, errors in LMPs corrupt the expression of ω in (15) and therefore the expression of $c^0$ in (21). This kind of noise is similar to the estimation error in statistical ML, which arises because the model is trained on a finite data set that only partially reflects the true data distribution. The GIO and NIO formulations in Section 2 can handle random errors in training data. However, because such errors are sparse, we retain the original form in (23) instead of adapting it to a GIO or NIO formulation. According to the GD update rule in (24), the impact of a single training data point on the final result is small when the training set is large. Moreover, the ℓ1-norm is robust to outliers. In summary, the impact of random errors can be reduced by using the ℓ1-norm in the loss function or by increasing the amount of training data. We explore these methods further in Section 5.1.
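For a single generator that is free in every interval, and assuming the loss reduces to $\sum_i |c - c^0_i|^p$, a minimizer is a median for p = 1 and the mean for p = 2. The sketch below, with hypothetical data, contrasts how one corrupted LMP-derived price moves each estimate and how additional clean data dilutes its effect.

```python
from statistics import mean, median

clean = [30.0, 31.0, 32.0, 33.0, 34.0]   # hypothetical recovered prices for one unit
noisy = clean + [300.0]                  # one corrupted LMP observation

# The l2 (p = 2) minimiser is the sample mean; an l1 (p = 1) minimiser is the median.
print(mean(clean), median(clean))        # 32.0 32.0
print(mean(noisy), median(noisy))        # mean jumps to about 76.7, median only to 32.5

# More clean data also dilutes the outlier's effect on the mean.
more = clean * 20 + [300.0]
print(round(mean(more), 2))              # 34.65
```

The single outlier shifts the ℓ2 estimate by more than 44, while the ℓ1 estimate moves by 0.5; with twenty times more clean data, the ℓ2 estimate recovers to within about 2.7 of the clean value.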
4.4.2 Noisy Setting II: Mismatch Between DCOPF and Actual Market-Clearing Model. As mentioned in Section 3.2, the actual day-ahead market-clearing is based on the SCUC model instead of the DCOPF model employed in this paper. This mismatch is the second source of noise in the price recovery problem and is comparable to the approximation error in statistical ML (i.e., the difference between the best model within the chosen model class and the optimal model within all model classes). To some extent, the proposed data-driven IO formulation is robust to this noise, as some additional constraints in SCUC do not compromise the validity of (21) even though it is derived based on DCOPF. For instance, adding the flexibility reserve requirements to (10) results in the following DCOPF model:
(26a)
(26b)
(26c)
(26d)
(26e)
where r+ and r− are the upward and downward flexible capacities provided by generators, and R+ and R− are the upward and downward reserve requirements. In this case, the KKT condition in (11a) and the LMP definition in (14) still hold, so the expressions of λ in (13) and ω in (15) remain unchanged. This means that c0 can still be calculated from (21), and the objective function in (23) can still be used to recover the cost vector c. The only change is the number of free generators, i.e., generators operating strictly between their minimum and maximum output limits. Because the power limits in (26d) and (26e) are more restrictive than (10c) and (10d), fewer generators are free and fewer offer prices can be recovered from a single observation. However, these additional constraints also create more diverse scenarios in which different generators are marginal, potentially increasing the total number of offer prices recoverable from historical data. Note that some additional constraints may render (21) invalid, such as the ramping limits of generators:
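The effect of reserves on the set of free generators can be sketched as follows. The sketch assumes that (26d) and (26e) take the common form $x + r^+ \le \overline{x}$ and $x - r^- \ge \underline{x}$; that form, the function `free_generators`, the tolerance, and the dispatch values are our illustrative assumptions.

```python
from typing import List, Optional


def free_generators(x: List[float], x_min: List[float], x_max: List[float],
                    r_up: Optional[List[float]] = None,
                    r_dn: Optional[List[float]] = None,
                    tol: float = 1e-6) -> List[int]:
    """Indices of generators strictly inside their (reserve-adjusted) output limits.

    Assumed reserve-adjusted limits: x + r_up <= x_max and x - r_dn >= x_min.
    """
    n = len(x)
    r_up = [0.0] * n if r_up is None else r_up
    r_dn = [0.0] * n if r_dn is None else r_dn
    free = []
    for k in range(n):
        upper_slack = x_max[k] - (x[k] + r_up[k])
        lower_slack = (x[k] - r_dn[k]) - x_min[k]
        if upper_slack > tol and lower_slack > tol:
            free.append(k)
    return free


x = [40.0, 100.0, 70.0]                  # hypothetical dispatch (MW)
x_min, x_max = [0.0] * 3, [100.0] * 3
without_reserve = free_generators(x, x_min, x_max)
with_reserve = free_generators(x, x_min, x_max, r_up=[0.0, 0.0, 30.0], r_dn=[0.0] * 3)
print(without_reserve, with_reserve)     # [0, 2] [0]
```

Holding 30 MW of upward reserve on the third unit makes its upper limit binding, so it drops out of the free set for this observation.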
(27)
(28)
The model mismatch problem is a common challenge for all IO formulations and cannot be fully resolved without using a more accurate FO model. We will study the impact of this mismatch on IO results using numerical experiments in Section 5.2 and discuss other possible solutions in Section 6.
5 CASE STUDY
In this section, we demonstrate the effectiveness of the proposed data-driven IO based on the IEEE 14-bus system and the NYISO 1814-bus system. Simulations were conducted using Python v3.8 and the Gurobi solver on a standard PC with an Intel i9 processor and 16 GB of RAM. In the IEEE 14-bus system, the data-driven IO model was solved within one minute. For the NYISO system, the proposed model was solved in less than ten minutes in all instances.
5.1 Illustrative Example: IEEE 14-bus system
We first evaluate the performance of the data-driven IO in the IEEE 14-bus system (details in Appendix C.1). We assume that each generator sets a piecewise linear price baseline with five equally divided blocks derived from its quadratic generation cost, as shown in Fig. 2. The actual offer prices fluctuate around the baseline with deviations following a normal distribution N(μ, σ²), where μ = 0 and σ = 2 (unit: $/MWh) are assumed in this case.
5.1.1 Performance in Deterministic Settings. To recover as many offer prices as possible, we create 200 training data points and verify that each of the five generators is marginal at least once in each block. The initial values of the five blocks are set to 10, 20, 30, 40, and 50, respectively. Fig. 3 shows the evolution of four estimated offer prices over the iterations and compares the convergence obtained with the ℓ1- and ℓ2-norms. In all four cases, the estimated prices converge to the true values, with faster convergence observed for the ℓ2-norm due to its strong convexity.
5.1.2 Performance with Random Errors in LMPs. To evaluate the performance of data-driven IO in the first noisy setting, we add random errors following a normal distribution N(μ, σ²) to the original LMPs. Four noise settings are considered with different error sizes and frequencies. In the small-error case, we assume μ = 50 and σ = 5, while in the large-error case, we assume μ = 100 and σ = 10 (unit: $/MWh). According to [46], the actual frequency of random errors in LMPs is below 0.1%, but we consider more challenging cases with 1% and 5% error frequencies. We use data-driven IO to recover the offer price of G3, whose true value is 25.445 $/MWh; the estimated offer prices and the corresponding relative errors are shown in Table 2. The data-driven IO is robust to random errors when using the ℓ1-norm and performs acceptably with the ℓ2-norm when the noise is small and sparse.
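Fig. 2 is not reproduced here, so the following sketch shows one plausible way to build such a baseline. It assumes that each block price is the secant slope of the quadratic cost over that block, which for a block $[a, b]$ equals $c_1 + c_2(a + b)$ because $c_0$ cancels, and then adds the Gaussian deviation with σ = 2 described above. The secant-slope rule, the function names, and the cost coefficients in the usage example are hypothetical and are not taken from Table 3.

```python
import random
from typing import List


def block_baseline(c1: float, c2: float, x_min: float, x_max: float,
                   n_blocks: int = 5) -> List[float]:
    """Block prices as secant slopes of the quadratic cost c0 + c1*x + c2*x^2.

    Over a block [a, b]: (C(b) - C(a)) / (b - a) = c1 + c2 * (a + b).
    """
    width = (x_max - x_min) / n_blocks
    prices = []
    for j in range(n_blocks):
        a = x_min + j * width
        prices.append(c1 + c2 * (2 * a + width))  # a + b with b = a + width
    return prices


def sample_offers(baseline: List[float], sigma: float,
                  rng: random.Random) -> List[float]:
    """Offer prices = baseline + N(0, sigma^2) deviation, drawn per block."""
    return [price + rng.gauss(0.0, sigma) for price in baseline]


# Hypothetical cost coefficients (not from Table 3)
baseline = block_baseline(c1=10.0, c2=0.2, x_min=0.0, x_max=100.0)
offers = sample_offers(baseline, sigma=2.0, rng=random.Random(42))
print([round(b, 2) for b in baseline])  # [14.0, 22.0, 30.0, 38.0, 46.0]
```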
Noisy settings | Using ℓ1-norm | Using ℓ2-norm |
---|---|---|
1% small errors | $\hat{c} =25.56$, $\epsilon =0.45\%$ | $\hat{c} =25.69$, $\epsilon =0.96\%$ |
1% large errors | $\hat{c} =25.56$, $\epsilon =0.45\%$ | $\hat{c} =26.19$, $\epsilon =2.93\%$ |
5% small errors | $\hat{c} =25.56$, $\epsilon =0.45\%$ | $\hat{c} =26.67$, $\epsilon =4.81\%$ |
5% large errors | $\hat{c} =25.56$, $\epsilon =0.45\%$ | $\hat{c} =29.17$, $\epsilon =14.64\%$ |
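The relative errors in Table 2 are consistent with $\epsilon = |\hat{c} - c|/c$ evaluated at the true G3 price c = 25.445. The short check below uses only the tabulated estimates; the helper name `relative_error_pct` is ours.

```python
TRUE_PRICE = 25.445  # $/MWh, true offer price of G3


def relative_error_pct(estimate: float, truth: float = TRUE_PRICE) -> float:
    """Relative error |c_hat - c| / c, expressed in percent."""
    return abs(estimate - truth) / truth * 100.0


# Estimates copied from Table 2 (the l1 estimate is identical in all four settings)
table2 = {
    "any setting, l1": 25.56,
    "1% small, l2": 25.69,
    "1% large, l2": 26.19,
    "5% small, l2": 26.67,
    "5% large, l2": 29.17,
}
for setting, est in table2.items():
    print(f"{setting}: {relative_error_pct(est):.2f}%")
```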
5.2 Numerical Experiments on NYISO System
In this numerical experiment, we study the NYISO system with 1814 buses, 2207 lines, 362 generators and 33 wind farms (details in Appendix C.2). The structure and parameters of this NYISO system are mined and estimated from publicly available data sources [35]. The NYISO publishes daily LMPs at [34] and other market-clearing results every three months at [33]. We use this information to generate hourly market-clearing results for three months (February, April, and August 2018), yielding a total of 2136 training data points. Note that in this case we assume that the baseline price of each generator is equally divided into ten price blocks and that the upper and lower bounds of each block are known. If the block-setting rules are unpublished, the bounds of each block must first be estimated from historical data; the offer price of each block can then be recovered based on these estimated bounds.
5.2.1 Performance in Deterministic Settings. As mentioned in Section 4.1, a training data point is valid for the price recovery of a generator if the generator is committed and marginal in that scenario. Fig. 4 summarizes the number of valid training data points for all generators in the system. As shown in Fig. 4, most generators have valid training data at only a few steps of their offer prices, so only a portion of the prices can be recovered. The results indicate that 44.03% of all offer prices can be recovered from the valid training data, while 81.21% of these recovered prices are based on fewer than five training data points. Meanwhile, at least one price block can be recovered for 85.35% of the generators. Note that there is a theoretical upper limit on the proportion of recoverable prices, because an offer price can only be recovered from the LMPs if it affected the LMP values. Therefore, other approaches, such as NIO, face the same limitation with regard to the recovery rate.
In total, 53 generators (out of 362) are never marginal in the 2136 data points we analyzed, so their offer prices cannot be estimated from the training data. This result does not indicate a failure of the data-driven IO for large systems, as these non-marginal generators typically do not compete with others. For example, due to its relatively high generation cost, generator No. 14 is never committed during these three months. Furthermore, incorporating more training data can reduce the number of unrecoverable generators: as market operators post more market-clearing data, the portion of recovered offer prices will increase, which is also important for dealing with noisy data.
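Assuming one training data point per hour, the total number of data points follows directly from the number of days in the three selected months of 2018:

```python
import calendar

months = (2, 4, 8)  # February, April, August
# calendar.monthrange(year, month) returns (weekday of day 1, number of days)
hours = sum(calendar.monthrange(2018, m)[1] * 24 for m in months)
print(hours)  # 2136 = (28 + 30 + 31) * 24
```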
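The three recovery statistics above can be computed from a matrix of valid-data counts, with one row per generator and one column per price block. The sketch below shows the bookkeeping on a hypothetical 3 × 3 count matrix; the function name `recovery_stats` and the threshold argument `few` are ours, and the data behind Fig. 4 are not reproduced.

```python
from typing import List, Tuple


def recovery_stats(counts: List[List[int]], few: int = 5) -> Tuple[float, float, float]:
    """Summaries of a valid-data count matrix counts[g][b] (generator g, block b).

    Returns (share of recoverable prices,
             share of recovered prices backed by fewer than `few` points,
             share of generators with at least one recoverable block).
    """
    cells = [c for row in counts for c in row]
    recovered = [c for c in cells if c > 0]
    share_recovered = len(recovered) / len(cells)
    share_thin = sum(1 for c in recovered if c < few) / len(recovered) if recovered else 0.0
    share_gens = sum(1 for row in counts if any(c > 0 for c in row)) / len(counts)
    return share_recovered, share_thin, share_gens


counts = [[0, 3, 7], [0, 0, 0], [1, 0, 2]]  # hypothetical: 3 generators x 3 blocks
share_recovered, share_thin, share_gens = recovery_stats(counts)
print(f"{share_recovered:.2%} {share_thin:.2%} {share_gens:.2%}")  # 44.44% 75.00% 66.67%
```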
5.2.2 Performance with Model Mismatch. To assess the performance of the data-driven IO approach in the presence of model mismatch, we replicate the SCUC model used by NYISO in its day-ahead scheduling process, as outlined in [37]. This SCUC model, detailed in [31], includes additional constraints such as generator ramping limits, which may reduce the accuracy of recovered prices when those constraints are binding. Fig. 5 shows the binding frequency of two sets of constraints, namely the power limits in (26d) and (26e) and the ramping limits in (27) and (28). It also compares the results across three months: February (low demand), April (medium demand), and August (high demand). As depicted in Fig. 5 (b), cases in which only the ramping limits are binding are rare. As a result, the frequency of both sets of constraints being binding is approximately the same as the frequency of only the power limits being binding.
Fig. 5 conveys an important message: the ramping limits of free generators (whose power limits are not binding) are typically not binding either. As noted above, the ramping limits only affect the recovered prices when they are binding. Hence, the impact of ramping limits is largely eliminated from the valid training data set, which includes only the data of free generators.
With this model mismatch, we are able to recover 36.28% of all offer prices with an average relative error of 3.47%. Fig. 6 shows the errors in the recovered prices at the seventh block, where 171 generators are marginal. The errors are compared in the deterministic setting and two noisy settings: noisy setting I adds Gaussian noise with a mean of 50 $/MWh and a standard deviation of 5 $/MWh to 1% of the LMP data, and noisy setting II involves the model mismatch discussed above. The deterministic setting features small, random errors with an average close to zero, while in noisy setting I, sparse noise in the training data causes the recovered prices of some generators to deviate significantly from their true values. Noisy setting II features higher errors than the other two cases; however, they remain at an acceptable level relative to the offer prices. Note that although the errors in Fig. 6 (c) are mostly positive, this is not a general conclusion, since the errors in this noisy setting depend on the values of the dual multipliers of binding constraints in the SCUC model, which can change over time.
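The quantity behind this observation is a conditional frequency: how often ramping limits bind for generator-hours whose power limits do not. A minimal sketch of that bookkeeping follows. Each record is assumed to be a pair of flags (power limit binding, ramping limit binding) for one generator-hour; the record format, the function name `binding_profile`, and the toy counts are hypothetical, and this is not necessarily how Fig. 5 was aggregated.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def binding_profile(records: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """records: (power_limit_binding, ramp_limit_binding) per generator-hour."""
    counts = Counter(records)
    total = sum(counts.values())
    free = counts[(False, False)] + counts[(False, True)]  # power limits not binding
    return {
        "power_only": counts[(True, False)] / total,
        "ramp_only": counts[(False, True)] / total,
        "both": counts[(True, True)] / total,
        "ramp_given_free": counts[(False, True)] / free if free else 0.0,
    }


# Hypothetical generator-hour records
records = ([(True, False)] * 30 + [(True, True)] * 5
           + [(False, True)] * 1 + [(False, False)] * 64)
profile = binding_profile(records)
print(profile)
```

A small `ramp_given_free` value is what allows the valid training set, which contains only free generators, to remain largely unaffected by ramping limits.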
6 CONCLUSION AND FUTURE WORK
This paper presents a data-driven IO approach for recovering marginal offer prices of generators from day-ahead market-clearing results. We formulate the data-driven IO problem as a single-level optimization model, distinct from the bi-level models that are commonly used in the literature. This data-driven IO problem can be efficiently solved using the gradient descent method. Furthermore, we prove that the recovered offer prices converge to a unique global optimum at a sub-linear rate. Numerical experiments on the IEEE 14-bus system and NYISO 1814-bus system validate the efficacy of the proposed approach in both deterministic and noisy settings. The 14-bus test case shows that using the ℓ2-norm in the loss function results in faster convergence, while the ℓ1-norm offers robustness against sparse noise in the training data. In the 1814-bus test case, the data-driven IO recovers approximately 44% of all offer prices using 2136 training data points. The average relative error is 0.88% in the deterministic setting and 3.47% when considering the mismatch between the DCOPF model and the actual market-clearing process.
A notable limitation of our data-driven IO model is that it assumes complete knowledge of all system parameters except the offer prices of generators, which may not be achievable in some real-world power systems. Nevertheless, the presence of wholesale markets and stringent regulatory requirements increases the likelihood of market transparency. Furthermore, while ISO manuals (such as [37]) provide insight into market-clearing principles, they do not guarantee complete visibility of the constraints used in the market-clearing process, including out-of-optimization interventions conducted by system operators. As part of our ongoing research, we plan to tackle this issue by implementing a machine learning approach, such as a physics-informed neural network, that can incorporate known constraints while also learning to identify previously unknown constraints from training data. We expect this to significantly reduce the inaccuracies in the recovered prices resulting from model mismatches.
The proposed method for price recovery can lead to increased profits for certain market participants through strategic bidding, but it must not undermine the fairness of power markets. In the event that the price recovery method becomes widespread, the market operator must implement effective measures to prevent market power abuse. Therefore, our future study will concentrate on determining the most efficient policy for market power mitigation and investigating the optimal bidding strategies for market participants. In summary, the game between market participants and market operators will become more intricate and engaging due to the recovery of offer prices.
Furthermore, as depicted in Fig. 1, the energy market comprises a day-ahead market (DAM) and a real-time market (RTM). This paper only studies the price recovery problem in DAM, while the problem is more challenging in RTM due to the spontaneous actions of market participants and the ad-hoc manipulations of market operators. Uncovering the behavior patterns of market participants in RTM holds potential for both market participants seeking to increase their profits and market operators responsible for preventing market power abuse. In the next step, we intend to expand our data-driven IO approach to RTM and ancillary service markets to leverage all available market data.
ACKNOWLEDGMENTS
The authors express their gratitude to Prof. Daniel Bienstock and Dr. Robert Mieth for their assistance in developing the digital twin of the NYISO system, data collection and visualization.
REFERENCES
- Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and J Zico Kolter. 2019. Differentiable convex optimization layers. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). NIPS, Vancouver, Canada, 1–13.
- Sara Ahmadian, Umang Bhaskar, Laura Sanità, and Chaitanya Swamy. 2018. Algorithms for inverse optimization problems. In 26th Annual European Symposium on Algorithms (ESA 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Helsinki, Finland, 1–14.
- Ravindra K. Ahuja and James B. Orlin. 2001. Inverse optimization. Operations Research 49, 5 (2001), 771–783.
- Brandon Amos and J. Zico Kolter. 2017. OptNet: Differentiable optimization as a layer in neural networks. In 34th International Conference on Machine Learning (ICML 2017). PMLR, Sydney, Australia, 136–145.
- Anil Aswani, Zuo-Jun Shen, and Auyon Siddiq. 2018. Inverse optimization with noisy data. Operations Research 66, 3 (2018), 870–892.
- Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, et al. 2021. Filling gaps in trustworthy development of AI. Science 374, 6573 (2021), 1327–1329.
- Andreas Bärmann, Sebastian Pokutta, and Oskar Schneider. 2017. Emulating the expert: Inverse optimization through online learning. In 34th International Conference on Machine Learning (ICML 2017). PMLR, Sydney, Australia, 400–410.
- Dimitris Bertsimas, Vishal Gupta, and Ioannis Ch Paschalidis. 2015. Data-driven estimation in equilibrium using inverse optimization. Mathematical Programming 153, 2 (2015), 595–633.
- Yuexin Bian, Ningkun Zheng, Yang Zheng, Bolun Xu, and Yuanyuan Shi. 2022. Demand response model identification and behavior forecast with OptNet: a gradient-based approach. In 13th ACM International Conference on Future Energy Systems (ACM e-Energy 2022). ACM, virtual, USA, 418–429.
- John R. Birge, Ali Hortaçsu, and J. Michael Pavlin. 2017. Inverse optimization for the recovery of market structure from market outcomes: An application to the MISO electricity market. Operations Research 65, 4 (2017), 837–855.
- Stephen Boyd and Lieven Vandenberghe. 2004. Convex optimization. Cambridge University Press, New York, USA.
- Andrew Butler and Roy H Kwon. 2022. Efficient differentiable quadratic programming layers: an ADMM approach. Computational Optimization and Applications (2022), 1–28.
- Timothy CY Chan and Neal Kaw. 2020. Inverse optimization for the recovery of constraint parameters. European Journal of Operational Research 282, 2 (2020), 415–427.
- Timothy CY Chan, Taewoo Lee, and Daria Terekhov. 2019. Inverse optimization: Closed-form solutions, geometry, and goodness of fit. Management Science 65, 3 (2019), 1115–1135.
- Ruidi Chen, Ioannis Ch. Paschalidis, and Michael C. Caramanis. 2017. Strategic equilibrium bidding for electricity suppliers in a day-ahead market using inverse optimization. In 56th IEEE Conference on Decision and Control (CDC 2017). IEEE, Melbourne, Australia, 220–225.
- Ruidi Chen, Ioannis Ch. Paschalidis, Michael C. Caramanis, and Panagiotis Andrianesis. 2019. Learning from past bids to participate strategically in day-ahead electricity markets. IEEE Transactions on Smart Grid 10, 5 (2019), 5794–5806.
- Chaosheng Dong, Yiran Chen, and Bo Zeng. 2018. Generalized inverse optimization through online learning. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). NIPS, Montréal, Canada, 1–10.
- B. Espen Eckbo. 2009. Bidding strategies and takeover premiums: A review. Journal of Corporate Finance 15, 1 (2009), 149–178.
- Aaron Ferber, Bryan Wilder, Bistra Dilkina, and Milind Tambe. 2020. MIPaaL: Mixed integer program as a layer. In 34th AAAI Conference on Artificial Intelligence (AAAI 2020). AAAI Press, New York, USA, 1504–1511.
- Illinois Center for a Smarter Electric Grid (ICSEG). 2022. IEEE 14-Bus System. University of Illinois at Urbana-Champaign. Retrieved Dec. 29, 2022 from https://icseg.iti.illinois.edu/ieee-14-bus-system/
- Yong Fu and Mohammad Shahidehpour. 2007. Fast SCUC for large-scale power systems. IEEE Transactions on Power Systems 22, 4 (2007), 2144–2151.
- Kimia Ghobadi, Taewoo Lee, Houra Mahmoudzadeh, and Daria Terekhov. 2018. Robust inverse optimization. Operations Research Letters 46, 3 (2018), 339–344.
- Udi Helman. 2006. Market power monitoring and mitigation in the US wholesale power markets. Energy 31, 6-7 (2006), 877–904.
- Elaheh H. Iraj and Daria Terekhov. 2021. Comparing inverse optimization and machine learning methods for imputing a convex objective function. arXiv preprint arXiv:2102.10742 (2021), 1–17.
- Garud Iyengar and Wanmo Kang. 2005. Inverse conic programming with applications. Operations Research Letters 33, 3 (2005), 319–330.
- George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. Physics-informed machine learning. Nature Reviews Physics 3, 6 (2021), 422–440.
- Arezou Keshavarz, Yang Wang, and Stephen Boyd. 2011. Imputing a convex objective function. In 2011 IEEE international symposium on intelligent control (ISIC 2011). IEEE, Denver, USA, 613–619.
- András Kovács. 2021. Inverse optimization approach to the identification of electricity consumer models. Central European Journal of Operations Research 29, 2 (2021), 521–537.
- Gong Li, Jing Shi, and Xiuli Qu. 2011. Modeling methods for GenCo bidding strategy optimization in the liberalized electricity spot market–A state-of-the-art review. Energy 36, 8 (2011), 4686–4700.
- Houra Mahmoudzadeh and Kimia Ghobadi. 2022. Learning from Good and Bad Decisions: A Data-driven Inverse Optimization Approach. arXiv preprint arXiv:2207.02894 (2022), 1–36.
- Robert Mieth, Yury Dvorkin, and Miguel A Ortega-Vazquez. 2022. Risk-Aware Dimensioning and Procurement of Contingency Reserve. IEEE Transactions on Power Systems (2022), 1–13.
- Peyman Mohajerin Esfahani, Soroosh Shafieezadeh-Abadeh, Grani A Hanasusanto, and Daniel Kuhn. 2018. Data-driven inverse optimization with imperfect information. Mathematical Programming 167, 1 (2018), 191–234.
- NYISO. 2022. NYISO market clearing results. NYISO. Retrieved Dec. 27, 2022 from http://mis.nyiso.com/public/P-27list.htm
- NYISO. 2022. NYISO price data. NYISO. Retrieved Dec. 27, 2022 from https://www.nyiso.com/energy-market-operational-data
- NYISO. 2022. Reliability Needs Assessment Report. NYISO. Retrieved Dec. 27, 2022 from https://www.nyiso.com/documents/20142/2248793/2022-RNA-Report.pdf
- Richard P. O'Neill, Paul M. Sotkiewicz, Benjamin F. Hobbs, Michael H. Rothkopf, and William R. Stewart Jr. 2005. Efficient market-clearing prices in markets with nonconvexities. European Journal of Operational Research 164, 1 (2005), 269–285.
- NYISO Energy Market Operations. 2022. Day-Ahead Scheduling Manual. NYISO. Retrieved Jan. 2, 2023 from https://www.nyiso.com/documents/20142/2923301/dayahd_schd_mnl.pdf/0024bc71-4dd9-fa80-a816-f9f3e26ea53a
- Remigijus Paulavičius and Julius Žilinskas. 2006. Analysis of different norms and corresponding Lipschitz constants for global optimization. Technological and Economic Development of Economy 12, 4 (2006), 301–306.
- Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. 2011. Making gradient descent optimal for strongly convex stochastic optimization. arXiv preprint arXiv:1109.5647 (2011), 1–21.
- Simon Risanger, Stein-Erik Fleten, and Steven A. Gabriel. 2020. Inverse equilibrium analysis of oligopolistic electricity markets. IEEE Transactions on Power Systems 35, 6 (2020), 4159–4166.
- Carlos Ruiz, Antonio J. Conejo, and Dimitris J. Bertsimas. 2013. Revealing rival marginal offer prices via inverse optimization. IEEE Transactions on Power Systems 28, 3 (2013), 3056–3064.
- Javier Saez-Gallego and Juan M. Morales. 2017. Short-term forecasting of price-responsive loads using inverse optimization. IEEE Transactions on Smart Grid 9, 5 (2017), 4805–4814.
- Javier Saez-Gallego, Juan M. Morales, Marco Zugno, and Henrik Madsen. 2016. A data-driven bidding model for a cluster of price-responsive consumers of electricity. IEEE Transactions on Power Systems 31, 6 (2016), 5001–5011.
- NYISO Stakeholder Services. 2021. Guide 01 Market Participants User's Guide. NYISO. Retrieved Jan. 2, 2023 from https://www.nyiso.com/documents/20142/3625950/mpug.pdf
- Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding machine learning: From theory to algorithms. Cambridge University Press, New York, USA.
- Mathangi Srinivasan Kumar. 2022. Locational Based Marginal Pricing. NYISO. Retrieved Jan. 2, 2023 from https://www.nyiso.com/documents/20142/3037451/3-LMBP.pdf/f7682e03-e921-eaab-09bf-690524b5ade6
- Yingcong Tan, Andrew Delong, and Daria Terekhov. 2019. Deep inverse optimization. In 16th International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR 2019). Springer, Thessaloniki, Greece, 540–556.
- Jérôme Thai and Alexandre M. Bayen. 2018. Imputing a variational inequality function or a convex objective function: A robust approach. J. Math. Anal. Appl. 457, 2 (2018), 1675–1695.
- Trung Vu and Raviv Raich. 2022. On asymptotic linear convergence of projected gradient descent for constrained least squares. IEEE Transactions on Signal Processing 70 (2022), 4061–4076.
- xxx. 2023. Data supplement: NYISO 1814-bus system. Retrieved Feb. 06, 2023 from https://github.com/xxx/data_supplement_NYISO
- Jianzhong Zhang and Chengxian Xu. 2010. Inverse optimization for linearly constrained convex separable programming problems. European Journal of Operational Research 200, 3 (2010), 671–679.
A ESTABLISHED THEOREMS
A.1 Theorem for the Convergence of GD
Theorem A.1 (Convergence of GD for convex and Lipschitz continuous functions) Let f be a convex, ρ-Lipschitz function and let $w^* \in {\rm {arg}} \min _{w \in \mathcal {W}} f(w)$, where $\Vert w \Vert \le B$ for all $w \in \mathcal {W}$. If the GD algorithm is initialized at $w^{(1)} = 0$ and applied to f for T iterations with a learning rate of $\eta = \sqrt {\frac{B^2}{\rho ^2 T}}$, then the averaged output vector $\hat{w} = \frac{1}{T}\sum_{t=1}^{T} w^{(t)}$ satisfies:
$$f(\hat{w}) - f(w^*) \le \frac{B\rho}{\sqrt{T}} \qquad (29)$$
Since this $\mathcal {O}(1/\sqrt T)$ convergence rate of GD is well-established in the literature, e.g., in [45], we omit the proof of Theorem A.1.
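The bound can be checked numerically on a one-dimensional toy problem. The sketch below applies subgradient descent with averaged iterates to $f(w) = |w - 3|$, which is convex and 1-Lipschitz with minimizer $w^* = 3$, using B = 5 ≥ |w*|, which is the condition the standard proof relies on. The objective, B, and the horizons T are our illustrative choices.

```python
import math
from typing import Callable


def gd_averaged(subgrad: Callable[[float], float], eta: float, T: int,
                w1: float = 0.0) -> float:
    """T steps of (sub)gradient descent from w1; returns the averaged iterate."""
    w, total = w1, 0.0
    for _ in range(T):
        total += w             # accumulate w^(1), ..., w^(T)
        w -= eta * subgrad(w)  # w^(t+1) = w^(t) - eta * v_t
    return total / T


def f(w: float) -> float:
    return abs(w - 3.0)        # convex, 1-Lipschitz, minimized at w* = 3


def subgrad(w: float) -> float:
    return 0.0 if w == 3.0 else math.copysign(1.0, w - 3.0)


B, rho = 5.0, 1.0
results = {}
for T in (10, 100, 1000):
    eta = math.sqrt(B ** 2 / (rho ** 2 * T))   # learning rate from Theorem A.1
    w_hat = gd_averaged(subgrad, eta, T)
    results[T] = (f(w_hat) - f(3.0), B * rho / math.sqrt(T))  # (gap, bound)
    print(T, results[T])
```

In each case the optimality gap of the averaged iterate stays below $B\rho/\sqrt{T}$, consistent with (29).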
A.2 Theorems for Existence and Uniqueness of Optimal Solutions
Theorem A.2 (Existence of global optimum) If the objective function is continuous and the feasible region is nonempty, closed, and bounded, then a global optimum exists.
Theorem A.3 (Uniqueness of optimal solutions to strictly convex functions) If the objective function is strictly convex and the feasible region is convex, then there exists at most one optimal solution.
We omit the proofs of Theorems A.2 and A.3, since they are rigorously proved in [11].
B PROOFS
B.1 Proof of Lemma 4.2
We first clarify the definition of Lipschitz continuity and prove three lemmas relevant to the proof of Lemma 4.2.
Definition B.1 (ρ-Lipschitz functions) Let $\mathcal {W} \subseteq \mathbb {R}^n$ be a convex set. A function $f:\mathbb {R}^n \rightarrow \mathbb {R}$ is ρ-Lipschitz over $\mathcal {W}$ if there exists a constant ρ > 0 such that for all $w_1, w_2 \in \mathcal {W}$, we have |f(w1) − f(w2)| ≤ ρ‖w1 − w2‖p.
Note that ‖ · ‖p in the definition of Lipschitz continuity is usually the ℓ2-norm, but other norms are also applicable [38]. The statements "f is ρ-Lipschitz" and "f is Lipschitz continuous with Lipschitz constant ρ" are equivalent.
Lemma B.2 (Lipschitz continuity of norms) Every norm on $\mathbb {R}^n$ is 1-Lipschitz with respect to itself.
Let $f(x) = \left\Vert x \right\Vert _p = (\sum \nolimits _{i=1}^n |x_i|^p)^{1/p}$ be the ℓp-norm of x. Based on the triangle inequality of norms (i.e., $ \Vert x+y\Vert _p \le \Vert x \Vert _p + \Vert y \Vert _p,\ \forall x,y \in \mathbb {R}^n$), we know that ‖x‖p = ‖x − y + y‖p ≤ ‖x − y‖p + ‖y‖p. Therefore, ‖x‖p − ‖y‖p ≤ ‖x − y‖p. Similarly we have ‖y‖p = ‖y − x + x‖p ≤ ‖y − x‖p + ‖x‖p = ‖x − y‖p + ‖x‖p. Therefore, ‖y‖p − ‖x‖p ≤ ‖x − y‖p. In summary, we have |‖x‖p − ‖y‖p| ≤ ‖x − y‖p, i.e., f is 1-Lipschitz with respect to the ℓp-norm.□
Lemma B.3 (Lipschitz continuity of differentiable functions) An everywhere differentiable function $f: \mathbb {R} \rightarrow \mathbb {R}$ is Lipschitz continuous if it has a bounded first-order derivative, and the corresponding Lipschitz constant is $\sup |f^{\prime }(x)|$.
Let x, y ∈ ℝ with x < y. Since f is differentiable, it is continuous on [x, y] and differentiable on (x, y), so by the mean value theorem there exists ξ ∈ (x, y) such that f(x) − f(y) = f′(ξ)(x − y). Taking absolute values yields |f(x) − f(y)| = |f′(ξ)||x − y| ≤ ρ|x − y|, where ρ = sup |f′| is finite by assumption. The same argument applies on any interval, with the supremum taken over that interval.□
Lemma B.4 (Lipschitz continuity of compositions) The composition of a ρ1-Lipschitz function and a ρ2-Lipschitz function is a ρ1ρ2-Lipschitz function.
Assume $g: \mathbb {R}^n \rightarrow \mathbb {R}$ is ρ1-Lipschitz continuous and $h: \mathbb {R} \rightarrow \mathbb {R}$ is ρ2-Lipschitz continuous. Let f(x) = h(g(x)). For $\forall a, b \in \mathbb {R}^n$, we have |f(a) − f(b)| = |h(g(a)) − h(g(b))| ≤ ρ2|g(a) − g(b)| ≤ ρ2ρ1‖a − b‖. Therefore, f is ρ1ρ2-Lipschitz continuous.□
Proof of Lemma 4.2: For convexity: According to [11], $|x|^p$ is convex on $\mathbb {R}$ for ∀p ≥ 1, and a non-negative weighted sum of convex functions is convex. Therefore, f is convex since it is a sum of convex functions.
For Lipschitz continuity: Let $g:\mathbb {R}^n \rightarrow \mathbb {R}$ be the ℓp-norm and $h:\mathbb {R} \rightarrow \mathbb {R}$ be the $p^\text{th}$ power function, so that f(x) = h(g(x)). According to Lemma B.2, g is 1-Lipschitz continuous. Since $|x_i| \in [0,\bar{x}]$, $g(x) = (\sum \nolimits _{i=1}^n |x_i|^p)^{1/p}$ is bounded by 0 and $(\sum \nolimits _{i=1}^n \bar{x}^p)^{1/p} = \bar{x} n^{1/p}$. Based on Lemma B.3, $h(y) = y^p$ is Lipschitz continuous over the interval $[0, \bar{x} n^{1/p}]$ for ∀p ≥ 1, with Lipschitz constant $\sup_y |h'(y)| = p (\bar{x} n^{1/p})^{p-1} = p \bar{x}^{p-1} n^{(p-1)/p}$. Therefore, based on Lemma B.4, f is ρ-Lipschitz continuous with $\rho = p \bar{x}^{p-1} n^{(p-1)/p}$. $ \square$
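The constant derived above can be sanity-checked by sampling random pairs in the box $|x_i| \le \bar{x}$ and comparing the largest observed ratio $|f(a) - f(b)| / \Vert a - b \Vert_p$ with ρ. The box size, dimension, and sample count below are hypothetical, and sampling can only fail to refute the bound, not prove it.

```python
import random
from typing import List


def lp_power(x: List[float], p: float) -> float:
    """f(x) = sum_i |x_i|^p, the function analysed in Lemma 4.2."""
    return sum(abs(v) ** p for v in x)


def lp_norm(x: List[float], p: float) -> float:
    return lp_power(x, p) ** (1.0 / p)


def lipschitz_const(p: float, x_bar: float, n: int) -> float:
    """rho = p * x_bar^(p-1) * n^((p-1)/p), as derived in the proof of Lemma 4.2."""
    return p * x_bar ** (p - 1) * n ** ((p - 1) / p)


def worst_ratio(p: float, x_bar: float, n: int, trials: int,
                rng: random.Random) -> float:
    """Largest observed |f(a) - f(b)| / ||a - b||_p over random pairs in the box."""
    worst = 0.0
    for _ in range(trials):
        a = [rng.uniform(-x_bar, x_bar) for _ in range(n)]
        b = [rng.uniform(-x_bar, x_bar) for _ in range(n)]
        dist = lp_norm([u - v for u, v in zip(a, b)], p)
        if dist > 0.0:
            worst = max(worst, abs(lp_power(a, p) - lp_power(b, p)) / dist)
    return worst


rng = random.Random(7)
x_bar, n = 50.0, 4   # hypothetical price cap and number of generators
checks = {p: (worst_ratio(p, x_bar, n, 2000, rng), lipschitz_const(p, x_bar, n))
          for p in (1.0, 1.5, 2.0, 3.0)}
for p, (ratio, rho) in checks.items():
    print(f"p={p}: worst observed ratio {ratio:.3f} <= rho {rho:.3f}")
```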
B.2 Proof of Lemma 4.6
We first recall the definition of strict convexity and its first- and second-order conditions:
Definition B.5 (Strictly convex functions) Let $\mathcal {W} \subseteq \mathbb {R}^n$ be a convex set. A function $f:\mathbb {R}^n \rightarrow \mathbb {R}$ is strictly convex over $\mathcal {W}$ if f(kw1 + (1 − k)w2) < kf(w1) + (1 − k)f(w2) for all $w_1, w_2 \in \mathcal {W}$ with $w_1 \neq w_2$ and all k ∈ (0, 1). If f is differentiable, strict convexity is equivalent to the first-order condition f(w2) > f(w1) + ∇f(w1)T(w2 − w1) for all $w_1, w_2 \in \mathcal {W}$ with $w_1 \neq w_2$. If f is twice differentiable, a sufficient (but not necessary) second-order condition is ∇2f(w)≻0 for $\forall w \in \mathcal {W}$.
Proof of Lemma 4.6: Since $f(x) = \sum \nolimits _{i=1}^n |x_i|^p=\sum \nolimits _{i=1}^n x_i^p$ when xi ≥ 0 for ∀i, we have $\partial f/\partial {x_i} = p x_i^{p-1}$ and $\partial ^2 f/\partial {x_i}^2 = p (p-1) x_i^{p-2}> 0$ for ∀xi > 0 and ∀p > 1. Since $\partial^2 f/\partial x_i \partial x_j = 0$ for all i ≠ j, ∇2f(x) is a diagonal matrix whose diagonal elements $\partial^2 f/\partial x_i^2$ are its eigenvalues. Therefore, ∇2f(x)≻0 since all of its eigenvalues are positive, which means that the sufficient second-order condition for strict convexity is satisfied. $ \square$
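The role of the condition p > 1 can be seen numerically. For positive vectors, the sketch below evaluates the Hessian diagonal from the proof and the convexity gap $k f(x) + (1-k) f(y) - f(kx + (1-k)y)$ at k = 0.5. For p = 1.5 both are strictly positive, while for p = 1 the gap vanishes because f is linear on the positive orthant, which is why the uniqueness result requires p > 1. The sampled price range is hypothetical.

```python
import random
from typing import List


def f(x: List[float], p: float) -> float:
    return sum(abs(v) ** p for v in x)


def hessian_diag(x: List[float], p: float) -> List[float]:
    """Diagonal of the Hessian of sum_i x_i^p for x_i > 0 (off-diagonals are zero)."""
    return [p * (p - 1) * v ** (p - 2) for v in x]


def convexity_gap(x: List[float], y: List[float], p: float, k: float = 0.5) -> float:
    """k f(x) + (1-k) f(y) - f(k x + (1-k) y); positive when f is strictly convex and x != y."""
    z = [k * a + (1 - k) * b for a, b in zip(x, y)]
    return k * f(x, p) + (1 - k) * f(y, p) - f(z, p)


rng = random.Random(1)
x = [rng.uniform(10.0, 50.0) for _ in range(5)]   # hypothetical positive prices
y = [rng.uniform(10.0, 50.0) for _ in range(5)]

print(min(hessian_diag(x, 1.5)) > 0, convexity_gap(x, y, 1.5) > 0)  # True True
print(abs(convexity_gap(x, y, 1.0)) < 1e-9)  # True: f is linear here when p = 1
```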
C DATA FOR CASE STUDY
C.1 Data of IEEE 14-bus system
The IEEE 14-bus system is depicted in Fig. 7 [20], and the parameters of the five generators are listed in Table 3. The power generation cost of each generator follows the quadratic function $c_0 + c_1 x + c_2 x^2$, where x is the output power of the generator.
No. | xmax /xmin (MW) | c0 ($) | c1 ($/MWh) | c2 ($/MWh2) |
---|---|---|---|---|
G1 | 100/0 | 2 | 0.05 | 0.002 |
G2 | 100/0 | 5 | 0.10 | 0.003 |
G3 | 100/0 | 8 | 0.15 | 0.004 |
G4 | 100/0 | 12 | 0.20 | 0.005 |
G5 | 100/0 | 15 | 0.30 | 0.006 |
C.2 Data of NYISO 1814-bus system
The NYISO system, consisting of 1814 buses (black dots), 2207 lines, 362 generators, and 33 wind farms (blue dots), is shown in Fig. 8. Colors of the transmission lines reflect power flows, with red denoting heavy flow and green indicating light flow. Specific parameters of this system can be found in [50]. Fig. 9 shows the heat-map of LMPs at 4 p.m. on Aug. 28, 2018, which is a moment with particularly heavy load and high LMPs.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
e-Energy '23, June 20–23, 2023, Orlando, FL, USA
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0032-3/23/06…$15.00.
DOI: https://doi.org/10.1145/3575813.3597356