
Data-Driven Inverse Optimization for Marginal Offer Price Recovery in Electricity Markets

Zhirui Liang, Department of Electrical and Computer Engineering, Johns Hopkins University, United States of America, zliang31@jh.edu
Yury Dvorkin, Ralph O'Connor Sustainable Energy Institute; Department of Electrical and Computer Engineering; Department of Civil and System Engineering, Johns Hopkins University, USA, ydvorki1@jhu.edu

This paper presents a data-driven inverse optimization (IO) approach to recover the marginal offer prices of generators in a wholesale energy market. By leveraging underlying market-clearing processes, we establish a closed-form relationship between the unknown parameters and the publicly available market-clearing results. Based on this relationship, we formulate the data-driven IO problem as a computationally feasible single-level optimization problem. The solution of the data-driven model is based on the gradient descent method, which provides an error bound on the optimal solution and a sub-linear convergence rate. We also rigorously prove the existence and uniqueness of the global optimum to the proposed data-driven IO problem and analyze its robustness in two possible noisy settings. The effectiveness of the proposed method is demonstrated through simulations in both an illustrative IEEE 14-bus system and a realistic NYISO 1814-bus system.

CCS Concepts: • Computing methodologies → Supervised learning; • Hardware → Smart grid;

Keywords: data-driven inverse optimization (IO), power market, DC optimal power flow (DCOPF), gradient descent (GD)

ACM Reference Format:
Zhirui Liang and Yury Dvorkin. 2023. Data-Driven Inverse Optimization for Marginal Offer Price Recovery in Electricity Markets. In The 14th ACM International Conference on Future Energy Systems (e-Energy '23), June 20--23, 2023, Orlando, FL, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3575813.3597356

1 INTRODUCTION

The US power sector has undergone a transformation from a monopoly to a liberalized system since the 1980s, in order to promote competition and increase efficiency [29]. This change has opened up opportunities for power producers to make more profits, but has also increased the risk of not being selected for dispatch if their offer price surpasses that of their competitors. Knowing the market, including the prices offered by competitors, allows power producers to maximize their profits and minimize their risk exposure [18]. In parallel, open-access policies (e.g., FERC Order 888) have authorized the release of some market data, including selected transmission network parameters and market-clearing results [23], which allows for data mining and reverse-engineering of the unreleased market data. Building on previous work that has examined the recovery of rival marginal offer prices from the market-clearing results through inverse optimization [10, 15, 16, 41], this paper also explores the price recovery problem in electricity markets but with a focus on designing performance guarantees for convergence, robustness, and uniqueness of the global optimum in inverse optimization problems.

1.1 Literature Review

An inverse optimization (IO) problem seeks to recover unknown parameters in the objective function [27] or constraints [13] of a forward optimization (FO) problem. The FO could be a linear program [3], a conic program [25], a mixed-integer program [2], a linearly constrained separable convex program [51], or more generally, a variational inequality function [8, 48].

The classic IO model assumes that there exists a single set of parameters that make the given solutions optimal. However, in reality, noise in observations can prevent the existence of parameters that make all solutions strictly optimal. This noise can come from various sources, such as (i) measurement errors during the data collection process, (ii) mismatches between the forward model and the actual underlying decision-making process, and (iii) bounded rationality (i.e., when the decision maker settles for sub-optimal results due to cognitive or computational limitations) [5, 32]. Applying the classic IO in noisy settings can lead to infeasibility or uninformative solutions, such as a trivial zero cost vector [14]. Therefore, the goal of IO in noisy settings is to minimize the fitting errors between the model and data to make the given solutions “approximately” optimal [5, 14, 32, 48].

In deterministic settings, IO can precisely recover the parameters of the FO problem based on a single observation. However, in noisy settings, multiple observations are required to minimize the empirical loss of the IO problem. This type of IO, which can handle multiple observations, is also known as data-driven IO [30, 32]. Solving data-driven IO problems can become difficult with an increase in variables as more training data is incorporated. To tackle this challenge, researchers have proposed to use machine learning (ML) methods instead of conventional data-driven IO models [7, 9, 17, 47].

The objective of IO is similar to that of some ML techniques, especially neural networks, as both aim to estimate the unknown parameters of a model based on available observations. However, the key difference is that the underlying model in IO corresponds to the actual FO problem and its parameters can be interpreted, while in ML, the unknown parameters (such as weights and biases in neural networks) often lack a straightforward explanation. To incorporate physical laws into ML models, researchers have developed physics-informed ML methods [26]. Essentially, if the FO problem with unknown parameters is transformed into a differentiable layer in the neural network, these unknown parameters can be learned through training (similar to other parameters in the network) [1, 4, 12, 19]. For instance, OptNet is a network architecture that integrates a quadratic optimization program as a layer within an end-to-end neural network [4]. In [9], OptNet is used to determine the unknown parameters in a demand response model. The estimation of unknown parameters can also be achieved through other ML methods. For example, [7, 17] reveal the parameters through an online learning process, while [47] presents a deep inverse optimization model that formulates the parameter recovery problem as a deep learning model. The fundamental assumption behind these ML models is that the FO model is differentiable with respect to the unknown parameters, which allows for the computation of gradients and training based on (stochastic) gradient descent methods.

To compare IO and non-physics-informed ML, [24] evaluated their performance in imputing a convex objective function and found that IO is more suitable for problems with smaller training sets or highly complex objectives/constraints, while ML can handle larger training sets and is less sensitive to noisy training data. However, potential drawbacks of ML, such as its tendency to diverge in extreme situations, restrict its use in critical decision-making. Hence, to gain the trust of decision makers in using ML, formal performance guarantees need to be established for ML models [6].

IO has various applications across fields, including in power systems and power markets. To maximize the profit of market participants in the day-ahead bidding process, IO has been used to recover the piece-wise linear cost function of generators based on historical market-clearing results in [15, 16, 41]. Similarly, [40] also designs an IO problem based on market-clearing results, but the forward problem is an equilibrium model in an oligopolistic electricity market, aimed at determining if the market structure matches the observed outcomes. While [15, 16, 40, 41] assume that all parameters except for offer prices are known, [10] uses IO to recover market structures that are not disclosed to market participants, such as parameters of transmission lines. Alternatively, IO has been used to estimate the parameters of demand response in power systems in studies such as [28, 42, 43]. In [43], IO is recast as a bi-level program and solved heuristically. [42] deals with a non-convex IO model and develops a two-step heuristic approach to solve it. In [28], IO is transformed into a quadratically constrained quadratic program and solved through successive linear programming.

The aim of this paper is to develop a data-driven IO method for recovering the marginal offer prices of generators from historical market-clearing results. US power systems and markets have the following unique properties that distinguish this IO problem from other application scenarios.

First, the structures and most parameters of power markets are transparent [35], and the market-clearing results, including schedules and prices, are promptly and accurately published [33, 34]. This allows for the observation of both primal variables (schedules) and some dual variables (prices) in the forward problem, giving our price recovery problem an advantage over other IO problems where only primal solutions can be observed.

Second, the market operation relies on the use of locational marginal prices (LMPs) which represent the incremental costs of providing the next megawatt of energy at different locations within the power system. However, the offer price of a generator is only reflected in the LMPs when it is the marginal generator at that location. Hence, a single observation is insufficient to recover the offer prices of all generators, requiring the use of data-driven IO to explore multiple scenarios where different generators are marginal.

Third, the offer price of a generator tends to be stable over time. In day-ahead markets, generators are paid according to LMPs instead of their offer prices. To secure market participation, they typically set their offer prices close to or slightly above their generation costs. However, the offer price of a generator may vary due to external factors like fuel prices and weather, so the observed market outcomes are optimal solutions to the same forward model but with different cost vectors.

Finally, the noise in market-clearing results is sparse and traceable. According to [46], only a small percentage (less than 0.1%) of real-time intervals have required price corrections since 2009, indicating that random errors in published price data are uncommon. Another source of noise is the mismatch between the forward model studied in this paper (i.e., the DC optimal power flow model) and the actual market-clearing model, which takes into account more constraints and may involve ad-hoc manipulations by market operators. Both sources of noise may affect the accuracy of the IO results.

In summary, the transparent market structure and abundant market data make the data-driven IO feasible. However, the noise in historical data and the fluctuating nature of offer prices pose challenges to the IO problem. We plan to tackle these challenges by solving the data-driven IO problem using the gradient descent method to effectively utilize the training data and ensure robustness in noisy settings. Our focus is also on establishing performance guarantees for the proposed approach.

1.2 Contributions

After reviewing the related work on IO in Section 2 and formulating the FO problem in Section 3, our contributions in the rest of the paper are:

  • We propose a data-driven IO model for marginal offer price recovery in wholesale power markets. Our method differs from previous studies such as [16, 41] since it incorporates the unknown cost vector directly into the objective function, rather than relying solely on the primal or dual variables of the FO problem. This design results in a computationally manageable single-level optimization problem, compared to the bi-level programs in [5, 32].
  • Unlike previous work that solves the data-driven IO using off-the-shelf solvers [10, 15, 16, 40, 41], we solve it using gradient descent and derive the corresponding error bound. Our method allows for proving the convergence to the global optimum with a sub-linear rate, whereas the method in [9] only guarantees convergence in the absence of inequality constraints in the FO problem. Furthermore, we rigorously prove the existence and uniqueness of the global optimum to the data-driven IO problem, which are generally missing in other studies.
  • Different from [10, 15, 16, 41] which accomplish the offer price recovery using IO but neglect the impact of noise in observations, our approach considers potential sources of errors in market-clearing results and evaluates robustness of the data-driven IO in two noisy settings. The validity and robustness of the proposed data-driven IO method are demonstrated through simulations in the IEEE 14-bus system and the 1814-bus NYISO transmission network.

While the proposed data-driven IO model focuses on the price recovery problem in day-ahead markets, it can also be adapted to applications in real-time and ancillary service markets by appropriately changing constraints (while preserving linearity) and training data.

1.3 Notations

The following notations are used in the paper: In is an identity matrix of size n × n with ones on the main diagonal and zeros elsewhere, Jm, n is an all-ones matrix of size m × n, Om, n is an all-zeros matrix of size m × n, D(·) creates a diagonal matrix from a vector, and ‖ · ‖p represents the ℓp-norm.
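These conventions map directly onto standard NumPy primitives; the minimal sketch below (illustrative only, with arbitrary example sizes and values) shows the correspondence.

```python
import numpy as np

n, m = 3, 2
I_n = np.eye(n)                            # identity matrix I_n
J_mn = np.ones((m, n))                     # all-ones matrix J_{m,n}
O_mn = np.zeros((m, n))                    # all-zeros matrix O_{m,n}
D_v = np.diag(np.array([1.0, 2.0, 3.0]))   # D(.) maps a vector to a diagonal matrix
norm_2 = np.linalg.norm(np.array([3.0, 4.0]), ord=2)  # l_p-norm, here p = 2
```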

2 OVERVIEW OF INVERSE OPTIMIZATION

In this section, we describe a general linear optimization model as the FO problem, formulate the corresponding IO models in deterministic and noisy settings, extend the single-observation IO to the data-driven IO, and compare the benefits and drawbacks of various IO formulations.

2.1 Linear FO Problem

Let $x \in \mathbb {R}^n$, $w \in \mathbb {R}^n$, $A \in \mathbb {R}^{m \times n}$, and $b \in \mathbb {R}^m$. We define our linear forward optimization (FO) problem as:

\begin{align} {\bf FO}(w): &\min _{x}\ w^T x \end{align}
(1a)
\begin{align} &\text{s.t. } \ (\xi): Ax \ge b, \end{align}
(1b)

where $\xi \in \mathbb {R}^m$ is the vector of dual variables associated with $Ax \ge b$. Let $\mathcal {X}(w)$ be the set of feasible solutions to FO(w) and $\mathcal {X}^{OPT}(w)$ be the set of optimal solutions to FO(w). We assume that $\mathcal {X}(w)$ and $\mathcal {X}^{OPT}(w)$ are non-empty.
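As a concrete illustration, FO(w) can be solved with an off-the-shelf LP solver. The sketch below uses SciPy's linprog on hypothetical toy data (the matrix A, vector b, and cost w are assumptions, not from the paper); since linprog expects constraints of the form A_ub x ≤ b_ub, both sides of Ax ≥ b are negated.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance of FO(w): min w^T x  s.t.  A x >= b  (data is hypothetical).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 0.0, 4.0])    # encodes x >= 0 and x_1 + x_2 >= 4
w = np.array([2.0, 1.0])

# linprog expects A_ub x <= b_ub, so negate both sides of A x >= b.
res = linprog(w, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2, method="highs")
x_opt = res.x                    # an element of X^OPT(w)
# HiGHS reports non-positive marginals y for -A x <= -b; flipping the sign
# gives the dual vector xi >= 0 satisfying A^T xi = w.
xi = -res.ineqlin.marginals
```

At optimality, strong duality w^T x_opt = b^T ξ holds, which is exactly the condition exploited by the IO formulations in the rest of this section.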

2.2 Deterministic IO Formulation

Let $x^0 \in \mathbb {R}^n$ denote an observed solution to FO(w) in (1). In a deterministic setting, there exists a cost vector w such that x0 is an optimal solution to FO(w). Therefore, given A, b and x0, the goal is to find w such that $x^0 \in \mathcal {X}^{opt}(w)$. Note that there may exist multiple w that can achieve this goal [27]. For example, if $x^0 \in \mathcal {X}^{opt}(w)$, then $x^0 \in \mathcal {X}^{opt}(G(w))$ also holds, where G(·) is any convex increasing function. The uniqueness of w can be guaranteed by normalization and regularization, e.g., by requiring ‖w‖ = 1. Further, if FO(w) is a linear program, its inverse problem IO(x0) is also a linear program [3].

One way to evaluate the optimality of the estimated w is via the optimality conditions of FO(w), which are satisfied exactly by x0 once the optimal w is found. The loss function of the IO problem is therefore trivial and can be set to 0. The primal-dual (PD) optimality conditions of FO(w) can be formulated as [3]:

\begin{align} {\bf IO-PD}(x^0): \min _{w,\xi } &\ 0 \end{align}
(2a)
\begin{align} \text{s.t. } \ & \left\Vert w \right\Vert = 1 \end{align}
(2b)
\begin{align} & Ax^0 \ge b \end{align}
(2c)
\begin{align} & A^T \xi = w \end{align}
(2d)
\begin{align} & \xi \ge O \end{align}
(2e)
\begin{align} & w^T x^0 = b^T \xi, \end{align}
(2f)

where (2b) ensures a unique optimal solution, (2c) embodies primal feasibility, (2d) and (2e) denote dual feasibility, and (2f) represents strong duality. Note that (2c) can be omitted since it involves only the known data A, b, and x0 and thus does not constrain any decision variables.

Alternatively, the optimality conditions of FO(w) can also be formulated as the Karush-Kuhn-Tucker (KKT) conditions [27]:

\begin{align} {\bf IO-KKT}(x^0): \min _{w,x,\xi } &\ 0 \end{align}
(3a)
\begin{align} \text{s.t. } \ & \text{(2b)--(2e)} \nonumber \\ & D(\xi) (b-Ax^0)=O, \end{align}
(3b)

where (3b) is complementary slackness. Note that (2d) is referred to as the stationarity condition among the KKT conditions. The IO-PD and IO-KKT formulations in (2) and (3) are equivalent, i.e.,

\begin{equation*} x^0 \in \mathcal {X}^{opt}(w) \Leftrightarrow \text{(2b)--(2f) hold} \Leftrightarrow \text{(2b)--(2e) and (3b) hold}, \nonumber \end{equation*}
and their feasibility is guaranteed [14].
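These optimality conditions are easy to check mechanically. The snippet below verifies (2c)–(2f) and (3b) on a hypothetical toy instance with a hand-supplied dual certificate; the normalization (2b) is skipped, since it only rescales w.

```python
import numpy as np

# Hypothetical toy instance: x0 = (0, 4) is optimal for w = (2, 1) over A x >= b.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, 0.0, 4.0])
w = np.array([2.0, 1.0])
x0 = np.array([0.0, 4.0])
xi = np.array([1.0, 0.0, 1.0])   # a dual certificate for x0

assert np.all(A @ x0 >= b - 1e-9)                   # (2c) primal feasibility
assert np.allclose(A.T @ xi, w)                     # (2d) stationarity
assert np.all(xi >= 0)                              # (2e) dual feasibility
assert np.isclose(w @ x0, b @ xi)                   # (2f) strong duality
assert np.allclose(np.diag(xi) @ (b - A @ x0), 0)   # (3b) complementary slackness
```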

2.3 Generalized IO (GIO) in Noisy Settings

In noisy settings, it is assumed that the observed solution x0 is feasible but not necessarily optimal in FO(w), i.e., $x^0 \in \mathcal {X}(w)$. Thus, the optimality conditions in (2) or (3) may not be exactly met, and the objective of IO becomes finding the vector w such that x0 is an “ϵ -optimal” solution, i.e., the choice of w enables x0 to approximately satisfy the optimality conditions of FO(w). This IO problem is referred to as generalized IO (GIO) in [14] and there are two GIO models reported in the literature.

Based on the IO-PD formulation, GIO can be formulated by adding a slack variable ϵpd to relax the constraints of primal feasibility (2c) and strong duality (2f) [8, 14, 40], as given by:

\begin{align} {\bf GIO-PD}(x^0): \min _{w,\xi,\epsilon _{pd}} &\ \left\Vert \epsilon _{pd} \right\Vert _p \end{align}
(4a)
\begin{align} \text{s.t. } \ &\text{(2b), (2d) and (2e)} \nonumber \\ & A(x^0 +\epsilon _{pd}) \ge b \end{align}
(4b)
\begin{align} & w^T (x^0 +\epsilon _{pd}) = b^T \xi. \end{align}
(4c)

It is proved in [14] that (4b) can be omitted since it is automatically satisfied by the optimal solution to GIO-PD(x0) without (4b).

Alternatively, GIO can also be formulated based on the IO-KKT formulation by relaxing the stationarity conditions in (2d) and complementary slackness in (3b) with slack variables ϵstat and ϵcomp [27, 43], respectively:

\begin{align} {\bf GIO-KKT}(x^0): \min _{w,\xi,\epsilon _{stat},\epsilon _{comp}} &\ \left\Vert \epsilon _{stat}\right\Vert _p + \left\Vert \epsilon _{comp}\right\Vert _p \end{align}
(5a)
\begin{align} \text{s.t. } \ & \text{(2b) and (2e)} \nonumber \\ & w - A^T \xi = \epsilon _{stat} \end{align}
(5b)
\begin{align} & D(\xi) (b-Ax^0)= \epsilon _{comp}. \end{align}
(5c)

If the observed solution is not feasible, i.e., $x^0 \notin \mathcal {X}(w)$, we can add slack variables ϵprim and ϵdual to the primal feasibility (2c) and the dual feasibility (2e), respectively, instead of forcing them to hold.

Both forms of GIO are typically straightforward to solve. For example, [14] derives the closed-form solutions to GIO-PD under different forms of loss functions, while [27] proves that GIO-KKT in (5) is a finite-dimensional convex optimization problem which can be solved efficiently with off-the-shelf solvers.

2.4 Nominal IO (NIO) in Noisy Settings

The study in [5] shows that GIO-PD and GIO-KKT are statistically inconsistent, meaning that their results may converge to incorrect values even with a large amount of training data. In response, [22] introduces the nominal IO (NIO), in which the objective is to minimize the difference between the observed solution x0 and the optimal solution of FO(w):

\begin{align} {\bf NIO}(x^0): \min _{w,x} \ \left\Vert x^0 - {\rm {arg}} \min _{x} \ {\bf FO}(w)\right\Vert _p, \end{align}
(6)
Note that (6) is a bi-level program because, in general, there is no closed-form solution to $\min _{x} {\bf FO}(w)$. To reduce (6) to a single-level program, $\min _{x} {\bf FO}(w)$ must be replaced with either the PD constraints in (2) or the KKT constraints in (3). For example, the NIO formulation based on the PD constraints is given by:
\begin{align} {\bf NIO-PD}(x^0): \min _{w,x,\xi } &\ \left\Vert x^0-x \right\Vert _p \end{align}
(7a)
\begin{align} \text{s.t. } \ &\text{(2b), (2d) and (2e)} \nonumber \\ & Ax \ge b \end{align}
(7b)
\begin{align} & w^T x = b^T \xi. \end{align}
(7c)

Similarly, the NIO based on KKT constraints can be formulated. The distinction between the GIO and NIO formulations lies in the choice of loss functions that measure the discrepancy between predictions and observations. In the GIO formulation, the loss is measured by the slackness required to make the observed solution satisfy the approximate optimality conditions under the estimated parameters. In the NIO formulation, the loss is the distance between the observed solution and the optimal solution to FO under the estimated parameters. Moreover, the NIO formulation is statistically consistent [5] and more robust to noise [48], but it is a bi-level program and thus NP-hard [5], while the GIO formulations are easier to solve.

2.5 Data-driven IO Formulation

The IO formulations presented in (2)–(7) are based on a single observation x0, but they can be extended to the data-driven form to accommodate multiple observations. For instance, given N observed solutions $\lbrace x^0_i\rbrace _{i \in \mathcal {I}}$ to the FO problem and the corresponding parameters $\lbrace A_i,b_i\rbrace _{i \in \mathcal {I}}$, the NIO in (7) can be adapted as follows:

\begin{align} \min _{w,\lbrace {x_i}\rbrace _{i \in \mathcal {I}}} \ \sum \nolimits _{i \in \mathcal {I}} \left\Vert x_i^0 - {\rm {arg}} \min _{x_i} \ {\bf FO}(w,A_i,b_i)\right\Vert _p, \end{align}
(8)
which is a more computationally demanding bi-level problem compared to (7). This is because the number of variables in (8) is approximately N times larger than that in (7).

3 FORWARD PROBLEM FORMULATION

In this section, we present the standard US power market-clearing procedure, formulate the day-ahead market-clearing model, and derive the expression for locational marginal prices (LMPs) based on the Lagrangian and KKT conditions of the market-clearing model.

3.1 Energy Market and Its Settlement Process

In the US, the operation of power systems and power markets is managed by designated independent system operators (ISOs). ISOs ensure the safe and reliable operation of power systems and regulate markets for various products such as capacity, energy, and ancillary services [44]. Fig. 1 illustrates the typical two-settlement process in energy markets, comprising the day-ahead market (DAM) and the real-time market (RTM). The clearing of the DAM is based on security-constrained unit commitment (SCUC) and security-constrained economic dispatch (SCED), which determine the least-cost generator schedules and hourly LMPs for the next day based on day-ahead generation offers, load bids, and forecasts of load and renewable energy source (RES) generation. The clearing of the RTM is based on real-time commitment (RTC) and real-time dispatch (RTD). The commitment of units is predominantly determined in the DAM, making it a crucial stage for power suppliers. However, the actual schedule may diverge from the DAM schedule due to changes in operating conditions, additional RT generation offers, and variations in actual load. This paper focuses on recovering the DA offer prices of generators from the DA market-clearing results.

Figure 1
Figure 1: Two-settlement process in US energy markets.

3.2 Day-Ahead Market-Clearing Model

The actual SCUC model employed by ISOs is complex and involves multiple solution steps, as described in [37], and it is simplified to a representative SCUC model in some papers [21]. Following similar simplifications, we formulate the following UC model for a system with n generators, m nodes and l lines:

\begin{align} {\bf UC}: & \min _{u,x}\ c^T x \end{align}
(9a)
\begin{align} \text{s.t. } \ & J_{n,1}^T x= J_{m,1}^T e \end{align}
(9b)
\begin{align} & x \le u x^{\max } \end{align}
(9c)
\begin{align} & u x^{\min } \le x \end{align}
(9d)
\begin{align} & \Phi (S x -e) \le f^{\max } \end{align}
(9e)
\begin{align} & - f^{\max } \le \Phi (S x -e), \end{align}
(9f)

where (9a) minimizes the system operation cost using a linear cost function, (9b) ensures the power balance in the system, (9c) and (9d) limit the output of each generator to its technical bounds xmax  and xmin , and (9e) and (9f) limit the power flow on each line to the maximum transmission capacity fmax . Other notations used in (9) are explained in Table 1.

Model (9) is a mixed-integer linear program (MILP) due to the presence of binary variables u. Although non-convex, it can be solved by modern solvers such as CPLEX or Gurobi. To obtain the dual variables for marginal price calculation, we need to convert (9) into an equivalent convex linear program (LP) by relaxing the binary variables to real variables $u\in \mathbb {R}^n$ and fixing u = u*, where u* is the optimal commitment decision obtained by solving (9) with a MILP solver. This approach follows the results in [36], where the authors show that the optimal solution of the MILP coincides with the optimal solution of the LP when u = u* is enforced in the LP. The resulting LP is the following DC optimal power flow (DCOPF) model:

\begin{align} {\bf DCOPF}: & \min _{x}\ c^T x \end{align}
(10a)
\begin{align} \text{s.t. } \ &(\lambda): J_{n,1}^T x= J_{m,1}^T e \end{align}
(10b)
\begin{align} &(\alpha): x \le u^*x^{\max } \end{align}
(10c)
\begin{align} &(\beta): u^*x^{\min } \le x \end{align}
(10d)
\begin{align} &(\mu): \Phi (S x -e) \le f^{\max } \end{align}
(10e)
\begin{align} &(\nu): - f^{\max } \le \Phi (S x -e). \end{align}
(10f)

Greek letters in parentheses in (10b)–(10f) denote Lagrangian multipliers (dual variables) of the respective constraints. Specifically, λ denotes the energy price at the reference node in the power system. The KKT optimality conditions of (10) are:

\begin{align} & \text{(10b)--(10f)} \nonumber \\ &c - \lambda J_{n,1} +\alpha -\beta +(\Phi S)^T \mu - (\Phi S)^T \nu = O_{n,1} \end{align}
(11a)
\begin{align} & D(\alpha) (x-u^*x^{\max })=O_{n,1} \end{align}
(11b)
\begin{align} & D(\beta) (u^*x^{\min }-x)=O_{n,1} \end{align}
(11c)
\begin{align} & D(\mu) (\Phi S x - \Phi e-f^{\max })=O_{l,1} \end{align}
(11d)
\begin{align} & D(\nu) (-\Phi S x + \Phi e-f^{\max })=O_{l,1} \end{align}
(11e)
\begin{align} & \alpha \ge O_{n,1},\ \beta \ge O_{n,1},\ \mu \ge O_{l,1},\ \nu \ge O_{l,1}. \end{align}
(11f)

The Lagrangian function of (10) is:

\begin{align} \mathcal {L}(x,\lambda,\alpha,\beta,\mu,\nu) &=c^T x+ \lambda (J_{m,1}^T e- J_{n,1}^T x) + \alpha ^T(x-u^*x^{\max })\nonumber \\ & + \beta ^T(u^*x^{\min }-x) + \mu ^T(\Phi S x - \Phi e-f^{\max }) \nonumber \\ & + \nu ^T(-\Phi S x + \Phi e-f^{\max }). \end{align}
(12)
We assume that market participants have limited information about the actual market-clearing model used by the ISOs, i.e., they are only aware of the DCOPF model in its simplest form. Therefore, the DCOPF model in (10) will be used as the FO model in Section 4. If some market participants have more knowledge about the actual (ground-truth) market-clearing model, that is, knowledge exceeding the DCOPF model in (10), the IO model proposed in Section 4 could be extended to accommodate an FO problem with more constraints (e.g., a linearized ACOPF).

3.3 LMP Derivation from the DCOPF Model

The use of LMPs is the cornerstone of market operation [46]. The marginal energy price, represented by the dual variable λ in (10b), is the LMP at all nodes when the transmission capacity is unlimited. It reflects the cost of serving the next increment of load in the most efficient way. Based on (11a), we can express λ as:

\begin{equation} \lambda = J_{1,n} \big [c + \alpha -\beta +(\Phi S)^T \mu - (\Phi S)^T \nu \big ] \end{equation}
(13)
However, LMPs usually vary between different locations since the transmission constraints obstruct the flow of the least-cost electricity to every part of the system, resulting in congestion costs. The vector of LMPs (ω) can be derived based on (12) as:
\begin{equation} {\bf LMPs}: \omega \equiv \frac{\partial \mathcal {L}}{\partial e} = J_{m,1} \lambda - \Phi ^T \mu + \Phi ^T \nu, \end{equation}
(14)
where λ is the LMP at the reference node (so λ is a component of vector ω), and $\Phi ^T (\nu -\mu)$ is the congestion cost. Another expression of LMPs can be derived by substituting (13) into (14), as:
\begin{align} \omega & = J_{m,n} \big [c + \alpha -\beta +(\Phi S)^T (\mu - \nu) \big ] + \Phi ^T (\nu - \mu) \nonumber \\ & = J_{m,n} (c + \alpha - \beta)+ (J_{m,n} S^T - I_m)\Phi ^T (\mu - \nu) \end{align}
(15)
In power markets, while the offer prices of generators (elements in cost vector c) are confidential, the market-clearing results including the optimal schedules (x) and prices (λ and ω) are disclosed. Given the relationships between the unknown cost vector c and the publicly available data ω and λ provided by (13) and (15), it is possible to recover c from the historical values of ω and λ using inverse optimization, provided that the schedules (i.e., e, u, x) and the system parameters (i.e., xmax , xmin , fmax , Φ, S) are known. Note that a single observation is not enough to recover all the offer prices, as the offer price of a generator is reflected in LMPs only when the generator is marginal. Hence, data-driven IO has to be used and its performance in noisy settings should be evaluated.
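The pricing relationships above can be reproduced numerically. The sketch below solves a hypothetical two-node, one-line DCOPF with SciPy's linprog, reads λ, μ, and ν off the solver's dual values, and forms the LMP vector via (14); all network data are illustrative assumptions, not from the paper's case studies.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-node system: cheap generator at node 1, expensive generator
# at node 2, one 60 MW line, 100 MW of load at node 2, node 2 as reference.
c = np.array([10.0, 30.0])             # offer prices (unknown to the IO model)
e = np.array([0.0, 100.0])             # net load at each node
S = np.eye(2)                          # generator-to-node incidence
Phi = np.array([[1.0, 0.0]])           # PTDF matrix
fmax = np.array([60.0])
xmax = np.array([150.0, 150.0])

A_eq = np.ones((1, 2))                 # (10b): total generation = total load
b_eq = np.array([e.sum()])
A_ub = np.vstack([Phi @ S, -Phi @ S])  # (10e)-(10f): line flow limits
b_ub = np.concatenate([fmax + Phi @ e, fmax - Phi @ e])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, xm) for xm in xmax], method="highs")

lam = res.eqlin.marginals[0]           # lambda: price at the reference node
mu = -res.ineqlin.marginals[:1]        # duals of (10e), sign-flipped to be >= 0
nu = -res.ineqlin.marginals[1:]        # duals of (10f)
omega = lam * np.ones(2) - Phi.T @ mu + Phi.T @ nu   # LMP vector via (14)
```

With the line congested, the cheap generator sets the price at node 1 while the expensive one is marginal at the reference node, so ω = (10, 30) and λ = 30.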

Table 1: Notations in models (9) and (10)
Notation Meaning
$c \in \mathbb {R}^n$ offer prices of generators
$x \in \mathbb {R}^n$ output power of generators
$u \in \mathbb {R}^n$ commitment status of generators
$e \in \mathbb {R}^m$ net load (demand minus renewable injections)
$x^{\max } \in \mathbb {R}^n$ upper bound of x
$x^{\min } \in \mathbb {R}^n$ lower bound of x
$\Phi \in \mathbb {R}^{l \times m}$ power transfer distribution factor (PTDF) matrix
$S \in \mathbb {R}^{m \times n}$ indicator of generators’ connection to nodes
$f^{\max }\in \mathbb {R}^l$ thermal capacity of transmission lines

4 DATA-DRIVEN IO FORMULATION

Assume that multiple observations of schedules $\lbrace x^0_i\rbrace _{i \in \mathcal {I}}$ and prices $\lbrace \lambda ^0_i,\omega ^0_i\rbrace _{i \in \mathcal {I}}$ are given in the market-clearing results, where $\mathcal {I}$ denotes the set of training data indexed by i. Using the data-driven IO formulation in (8), the price recovery problem can be formulated based on primal variable x:

\begin{align} \min _{c,\lbrace {x_i}\rbrace _{i \in \mathcal {I}}} \ \sum \nolimits _{i \in \mathcal {I}} \left\Vert x_i^0 - {\rm {arg}} \min _{x_i} \ {\bf DCOPF}(c)\right\Vert _p, \end{align}
(16)
or based on dual variable λ :
\begin{align} \min _{c,\lbrace {\lambda _i}\rbrace _{i \in \mathcal {I}}} \ \sum \nolimits _{i \in \mathcal {I}} \left\Vert \lambda _i^0 - {\rm {arg}} \min _{\lambda _i} \ {\bf DCOPF}(c)\right\Vert _p. \end{align}
(17)
To bypass the complexity of bi-level problems, in this section we propose a data-driven IO formulation that incorporates the unknown cost vector c into the objective function. We design the corresponding solution method based on gradient descent and prove its convergence to the global optimum. Additionally, we analyze the robustness of the data-driven IO in two different noisy settings, including (i) random errors in historical LMPs and (ii) mismatch between DCOPF and the actual market-clearing model.

4.1 Data-Driven IO in Deterministic Settings

We start from the single-observation case in a deterministic setting, where the observed solutions λ0, ω0 and x0 are assumed to be the optimal solutions of (10) and therefore satisfy the KKT conditions in (11). From (11a), we know:

\begin{equation} c = \lambda J_{n,1} -\alpha +\beta + S^T \Phi ^T(\nu -\mu). \end{equation}
(18)
From (14), we know:
\begin{equation} \Phi ^T (\nu -\mu) = \omega - J_{m,1} \lambda. \end{equation}
(19)
Substituting (19) into (18) returns:
\begin{equation} c = \lambda (J_{n,1} - S^T J_{m,1}) + S^T \omega -\alpha +\beta, \end{equation}
(20)
which is a closed-form relationship between the unknown parameters c and the observed solutions λ0, ω0 and x0. Thus, (20) can be used to compute c if the true values of α and β are known. Normally, the values of dual multipliers α, β, μ, and ν are not given directly in the market-clearing results. Nonetheless, the binding status of constraints (10c)–(10f) can be determined based on the published power system schedules; that is, if a constraint is non-binding, its corresponding dual multiplier is zero. Therefore, by only considering free generators whose capacity constraints (10c) and (10d) are not binding (i.e., whose α and β are zero), the offer prices of these generators can be calculated directly by:
\begin{equation} c^0 = \lambda ^0 (J_{n,1} - S^T J_{m,1}) + S^T \omega ^0. \end{equation}
(21)
For generators with binding power constraints, their offer prices cannot be calculated by (21), so their corresponding values in vector c0 are set to zero. By considering more scenarios with different free generators, the offer prices of more generators can be recovered. Note that although the generation schedule x0 is not explicitly included in (21), it plays a crucial role in identifying the free generators. It is also worth noting that no information about the power transmission system, including the PTDF matrix Φ and the thermal capacity of transmission lines fmax, is required in (21), which means that we can recover the offer prices of generators without assuming full knowledge of system parameters. If certain system parameters are unknown, we can recover them by applying another IO model after recovering the offer prices c using the approach in this paper.
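To make (21) concrete, the single-observation computation reduces to a few lines of matrix algebra. The sketch below is illustrative only: we assume λ is the scalar system-wide price, ω is the bus-level LMP vector of length m, and S is the m-by-n bus-to-generator mapping matrix; the function name, shapes, and masking convention are our own, not from the paper.

```python
import numpy as np

def recover_offer_prices(lmbda, omega, S, free_mask):
    """Closed-form price recovery of Eq. (21):
        c0 = lambda * (J_{n,1} - S^T J_{m,1}) + S^T omega,
    where lambda is the scalar system price, omega the bus-level LMP
    vector (length m), and S the m-by-n bus-to-generator mapping.
    Entries for generators with binding power limits are set to zero,
    since (21) is only valid for free generators.
    """
    m, n = S.shape
    J_n = np.ones(n)                     # all-ones vector J_{n,1}
    J_m = np.ones(m)                     # all-ones vector J_{m,1}
    c0 = lmbda * (J_n - S.T @ J_m) + S.T @ omega
    return np.where(free_mask, c0, 0.0)  # keep only free generators
```

In this sketch, a generator located at a single bus has a 1 in the corresponding entry of Sᵀ J_m, so the λ term cancels and its recovered price reduces to the LMP component at its bus.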

Then, we extend (21) to the multi-observation case in a deterministic setting. Recall that the offer price of a generator may vary at different times due to external factors. In this case, while the observed solutions still satisfy the KKT conditions in (11), calculating c0 based on different sets of λ0 and ω0 would yield different offer prices for the same generator. To address this, we propose the following data-driven IO formulation. Given multiple observed solutions $\lbrace \lambda ^0_i,\omega ^0_i\rbrace _{i \in \mathcal {I}}$ where $\mathcal {I}$ is the set of training data indexed by i, the objective of data-driven IO is:

\begin{equation} \min _{\hat{c}} \ l(\hat{c})= \sum \nolimits _{i \in \mathcal {I}} \Vert c_{i}^{0} - \hat{c} \Vert _p^p. \end{equation}
(22)
where $c_{i}^{0}$ is calculated by substituting $\lambda ^0_i$ and $\omega ^0_i$ into (21). The objective of this IO is defined using the $p^\text{th}$-power of the ℓp-norm, which is different from the ℓp-norm used in other IO formulations discussed in Section 2. This choice will aid in proving the convergence and uniqueness of the global optimum in the following subsections by enforcing strict convexity.

Note that the data-driven IO in (22) is a single-level problem, so it has lower computational complexity compared to the bi-level data-driven IO in (16) and (17). This advantage comes from the availability of a closed-form relationship between the unknown parameter c0 and the observed solutions λ0 and ω0 as specified in (21). Although this relationship is insightful in the deterministic setting, it does not hold in some noisy settings, which will be discussed in Section 4.4.

Since only the offer prices of free generators can be recovered from (21), $c_i^0$ is a sparse vector. The zero elements in vector $c_i^0$ do not provide any additional information, so they should be eliminated from the loss function in (22) as:

\begin{equation} \min _{\hat{c}_k} \ l(\hat{c}_k)= \sum \nolimits _{i \in \mathcal {I}_k} \Vert c_{i,k}^{0} - \hat{c}_k \Vert _p^p, \end{equation}
(23)
where $\hat{c}_k$ is the recovered price of generator k, and $\lbrace c_{i,k}^{0}\rbrace _{i \in \mathcal {I}_k}$ is the valid (i.e., non-zero) training data for generator k. In this case, the zero elements in vector $c_i^0$ will not contribute to the loss function.

We can use the gradient descent (GD) method to iteratively update the values of $\hat{c}$. The update rule for $\hat{c}_k$ is:

\begin{align} \hat{c}_{k}^{j+1} = \hat{c}_{k}^{j} - \frac{\eta }{N_k} \sum \nolimits _{i=1}^{N_k} \frac{\partial \Vert c_{i,k}^0 - \hat{c}_{k}^{j} \Vert _p^p}{\partial \hat{c}_{k}^{j}},\ \forall k \end{align}
(24)
where $\hat{c}_{k}^{j}$ is the estimated offer price of generator k at the jth iteration, η is the learning rate, and Nk is the number of valid training data points for generator k. Specifically, if the ℓ2-norm is used in the loss function, i.e., p = 2, there is a closed-form solution for $\hat{c}_k$: the mean of all the training data $\lbrace c_{i,k}^{0}\rbrace _{i \in \mathcal {I}_k}$.
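The per-generator update (24) can be sketched as follows, with illustrative data and a hypothetical learning rate; for p = 2 the iteration converges to the mean of the valid observations, and for p = 1 it settles near their median.

```python
import numpy as np

def gd_recover_price(c_obs, p=2, eta=0.1, iters=500, c_init=0.0):
    """Gradient descent on the per-generator loss of Eq. (23),
    l(c_k) = sum_i |c0_{i,k} - c_k|^p, using the update rule (24).
    c_obs holds the valid (non-zero) observations for one generator.
    """
    c_obs = np.asarray(c_obs, dtype=float)
    c_hat = c_init
    for _ in range(iters):
        diff = c_hat - c_obs
        # d/dc |c - c0|^p = p * |c - c0|^(p-1) * sign(c - c0)
        grad = np.mean(p * np.abs(diff) ** (p - 1) * np.sign(diff))
        c_hat -= eta * grad
    return c_hat
```

With p = 2 the result matches the closed-form mean noted above; with p = 1 the estimate oscillates around the sample median within one step size η, which is why a small η (or a decaying schedule) is needed in that case.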

Algorithm 1 outlines the procedure for solving the proposed data-driven IO model based on GD. To enhance computational performance, stochastic gradient descent (SGD) can be used when dealing with large training data sets or when the price recovery task becomes an online learning problem.

Algorithm 1

4.2 Convergence of Data-Driven IO Based on Gradient Descent (GD)

In this subsection, we prove that the loss function in (23) converges to its optimal value with a sub-linear rate using the gradient descent (GD) method. The established convergence theorem for GD on convex, Lipschitz continuous functions can be found in Appendix A.1. We first discuss the convexity and Lipschitz continuity of $\left\Vert x \right\Vert _p^p$, which are essential to prove the convergence of the proposed data-driven IO formulation.

Lemma 4.1 (Convexity and Lipschitz continuity of $\left\Vert x \right\Vert _p^p$) Let $f(x) = \left\Vert x \right\Vert _p^p = \sum \nolimits _{i=1}^n |x_i|^p$. Assuming that p ≥ 1 and $|x_i| \in [0,\bar{x}]$ for ∀i ∈ [1, n], then f(x) is convex and Lipschitz continuous with respect to the ℓp-norm, and the Lipschitz constant is $p \bar{x}^{p-1} n^{(p-1)/p}$.

The proof of Lemma 4.1 can be found in Appendix B.1. Based on Theorem A.1 and Lemma 4.1, we can propose the convergence theorem for the data-driven IO in Section 4.1:

Theorem 4.2 (Convergence of data-driven IO) Consider the loss function l(c) in (23), where $c \in \mathcal {C} \subseteq \mathbb {R}^n$ and $\Vert \mathcal {C}\Vert \le B$. Assume that the ℓp-norm in l(c) satisfies p ≥ 1 and $|c_k| \in [0,\bar{c}]$ for ∀k ∈ [1, n]. Let c* be an optimal solution of $\min _{c \in \mathcal {C}} l(c)$ and let $\hat{c}$ be an output of applying the GD algorithm on l(c). For ∀ϵ > 0, to achieve $l(\hat{c}) - l(c^*) \le \epsilon$, it is sufficient to perform GD for T iterations with a learning rate of η, where $T \ge \frac{n^2 p^2 \bar{c}^{2p}}{\epsilon ^2}$ and $\eta = (p \bar{c}^{p-2} n^{(p-2)/p} \sqrt {T})^{-1}$.

Since $|c_k| \le \bar{c}$ for ∀k, we have $B = (\sum \nolimits _{i=1}^n \bar{c}^p) ^{1/p} = \bar{c} n^{1/p}$. Based on Lemma 4.1, l(c) is convex and ρ-Lipschitz continuous with Lipschitz constant $\rho = p \bar{c}^{p-1} n^{(p-1)/p}$. Therefore, $B^2 \rho ^2 = n^2 p^2 \bar{c}^{2p}$. According to Theorem A.1, we know that:

\begin{equation} l(\hat{c}) - l(c^{*}) \le \frac{B \rho }{\sqrt {T}} = \frac{n p \bar{c}^{p}}{\sqrt {T}}. \end{equation}
(25)
Therefore, $T \ge \frac{n^2 p^2 \bar{c}^{2p}}{\epsilon ^2}$ ensures $l(\hat{c}) - l(c^{*}) \le n p \bar{c}^{p} /\sqrt {T} \le \epsilon$.□

Specifically, if the ℓ1-norm is used in l(c), i.e., when p = 1, the error bound in (25) becomes $l(\hat{c}) - l(c^{*}) \le n \bar{c} /\sqrt {T}$. If the loss function is strongly convex, for example when the ℓ2-norm is used, a faster convergence rate can be achieved [39]. To satisfy additional limits on the value of $\hat{c}_k$, inequality constraints can be introduced into the unconstrained optimization model in (23). The resulting problem can then be solved using projected gradient descent. However, for the convergence theorem to hold, it is crucial that the feasible set of $\hat{c}_k$, as defined by the inequality constraints, be both closed and convex [49].
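As an illustration, the iteration budget and learning rate prescribed by Theorem 4.2 can be computed directly from n, p, c̄, and ϵ. The sketch below writes the exponents out explicitly; the parameter values in the example are illustrative, not from the paper.

```python
import math

def gd_schedule(n, p, c_bar, eps):
    """Iteration budget and learning rate from Theorem 4.2:
        T   >= n^2 * p^2 * c_bar^(2p) / eps^2
        eta  = 1 / (p * c_bar^(p-2) * n^((p-2)/p) * sqrt(T))
    """
    T = math.ceil((n * p * c_bar ** p / eps) ** 2)
    eta = 1.0 / (p * c_bar ** (p - 2) * n ** ((p - 2) / p) * math.sqrt(T))
    return T, eta
```

For example, with p = 1, a single price (n = 1) capped at c̄ = 50 $/MWh, and a target of ϵ = 0.5, the bound requires T = 10,000 iterations, consistent with the sub-linear $1/\sqrt{T}$ rate.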

4.3 Existence and Uniqueness of Global Optimal Solution to Data-Driven IO

This subsection focuses on demonstrating the existence and uniqueness of the global optimal solution to the data-driven IO formulation in (23). The established theorems for the existence and uniqueness of optimal solutions to convex problems can be found in Appendix A.2. First we prove the strict convexity of $\left\Vert x \right\Vert _p^p$, which is essential to prove the uniqueness of global optimum to the proposed data-driven IO formulation.

Lemma 4.3 (Strict convexity of $\left\Vert x \right\Vert _p^p$) Following the notations in Lemma 4.1, if p > 1 and xi ≥ 0 for ∀i ∈ [1, n], then $f(x) = \left\Vert x \right\Vert _p^p$ is strictly convex.

The proof of Lemma 4.3 can be found in Appendix B.2. Based on Theorems A.2 and A.3 and Lemma 4.3, we can propose the following theorem for the data-driven IO formulation in (23):

Theorem 4.4 (Existence and uniqueness of global optimum to data-driven IO) Following the notations in Theorem 4.2, assuming the ℓp-norm in l(c) satisfies p > 1 and $c_k \in [\underline{c},\overline{c}]$ for ∀k ∈ [1, n] where $\underline{c} > 0$, then $\min _{c \in \mathcal {C}} l(c)$ has a unique global optimal solution c*, and $\hat{c} \rightarrow c^{*}$ as the number of iterations T → ∞.

According to Theorem A.3 and Lemma 4.3, l(c) is strictly convex and thus has at most one optimal solution in the convex set $\mathcal {C}$. Meanwhile, since the feasible region of c is closed and bounded and l(c) is continuous on $\mathcal {C}$ (Lipschitz continuity is stronger than uniform continuity), $\min _{c \in \mathcal {C}} l(c)$ has at least one global optimum according to Theorem A.2. In summary, $\min _{c \in \mathcal {C}} l(c)$ has a unique global optimal solution c*. Given the error bound in Theorem 4.2, $l(\hat{c}) - l(c^{*}) \rightarrow 0$ as T → ∞. Considering the uniqueness of c*, it follows that $\hat{c} \rightarrow c^{*}$ as T → ∞.□

Inspecting Theorems 4.2 and 4.4, we notice two differences in the assumptions on l(c): Theorem 4.2 assumes p ≥ 1 and $0\le |c_k|\le \bar{c}$ for ∀k, while Theorem 4.4 assumes p > 1 and $\underline{c} \le c_k\le \overline{c}$ for ∀k. Variable ck represents the offer price of generator k, which is greater than its generation cost and capped due to market power regulations. Hence, the assumption $\underline{c} \le c_k\le \overline{c}$ is realistic. We remark that in some exceptional cases, under heavy production credits, renewable generators can submit negative offer prices; however, this practice is expected to be gradually phased out.

Theorems 4.2 and 4.4 only apply to generators that have been marginal at least once in the training data set. The offer prices of non-marginal generators are eliminated from the loss function l(c) in (23), so the estimations of these prices remain unchanged from their initial values, rather than converging to an optimal value.

4.4 Robustness of Data-Driven IO in Noisy Settings

4.4.1 Noisy Setting I: Random Errors in LMPs. The market-clearing results published by ISOs may not reflect the exact optimal solutions due to incorrect inputs or calculation errors. However, these results are generally believed to be accurate and are accepted by market participants due to the price validation procedure conducted after the market is closed [46]. According to (21), the data-driven IO formulation will not be affected by errors in the observations of primal variable x due to its independence from x0. However, errors in LMPs can compromise the expression of ω in (15) and therefore the expression of c0 in (21). This kind of noise in data-driven IO is similar to the estimation error in statistical ML, which arises because the ML model is trained on a finite data set that only partially reflects the true distribution of the data. The GIO and NIO formulations in Section 2 can handle random errors in training data. However, we choose to retain the original form in (23) instead of adapting it to a GIO or NIO formulation due to the sparsity of errors. According to the parameter update rule of GD in (24), the impact of a single training data point on the final result is small when the training set is large. Meanwhile, the ℓ1-norm is robust to outliers. In summary, we can reduce the impact of random errors by using the ℓ1-norm in the loss function or by increasing the amount of training data. We will explore these methods further in Section 5.1.
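The robustness argument can be illustrated with a toy experiment: for the loss in (23), the ℓ1 minimizer is the sample median and the ℓ2 minimizer is the sample mean, so a few large errors in the recovered c0 values shift the ℓ2 estimate much further. The data below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_price = 25.445

# 198 accurate observations of c0 plus two corrupted ones (1% large errors)
clean = true_price + rng.normal(0.0, 0.1, size=198)
noisy = np.concatenate([clean, [75.0, 80.0]])

l2_estimate = noisy.mean()      # minimizer of sum_i (c0_i - c)^2
l1_estimate = np.median(noisy)  # minimizer of sum_i |c0_i - c|

# The two outliers pull the l2 (mean) estimate roughly 0.5 $/MWh away
# from the true price, while the l1 (median) estimate barely moves.
```

Increasing the amount of training data dilutes the outliers' weight in the mean, which is the second mitigation discussed above.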

4.4.2 Noisy Setting II: Mismatch Between DCOPF and Actual Market-Clearing Model. As mentioned in Section 3.2, the actual day-ahead market-clearing is based on the SCUC model instead of the DCOPF model employed in this paper. This mismatch is the second source of noise in the price recovery problem and is comparable to the approximation error in statistical ML (i.e., the difference between the best model within the chosen model class and the optimal model within all model classes). To some extent, the proposed data-driven IO formulation is robust to this noise, as some additional constraints in SCUC do not compromise the validity of (21) even though it is derived based on DCOPF. For instance, adding the flexibility reserve requirements to (10) results in the following DCOPF model:

\begin{align} & \min _{x,r^{+},r^{-}}\ c^T x \end{align}
(26a)
\begin{align} \text{s.t. } \ & \text{(10b), (10e) and (10f)} \nonumber \\ &(\theta ^{+}): J_{n,1}^T r^{+} \ge R^{+} \end{align}
(26b)
\begin{align} &(\theta ^{-}): J_{n,1}^T r^{-} \ge R^{-} \end{align}
(26c)
\begin{align} &(\alpha): x+r^{+} \le u^*x^{\max } \end{align}
(26d)
\begin{align} &(\beta): x-r^{-} \ge u^*x^{\min }. \end{align}
(26e)

where r+ and r are the upward and downward flexible capacities provided by generators, and R+ and R are the upward and downward reserve requirements. In this case, the KKT condition in (11a) and the LMP definition in (14) still hold, so the expression of λ in (13) and the expression of ω in (15) remain unchanged. This means that c0 can still be calculated from (21), and the objective function in (23) can still be used to recover the cost vector c. The only change is the number of free generators, i.e., the number of generators operating strictly between their minimum and maximum output limits. The power limits in (26d) and (26e) are more restrictive than (10c) and (10d), resulting in fewer free generators and fewer recoverable offer prices from a single observation. However, these additional constraints also create more diverse scenarios in which different generators are marginal, thus potentially increasing the total number of recoverable offer prices from historical data. Note that some additional constraints may render (21) invalid, such as the ramping limits of generators:

\begin{align} & (\varphi _t^{+}): x_{t} +r_{t}^{+} - x_{t-1} +r_{t-1}^{-}\le R^{up} u_{t-1} + v_{t} x^{\min }, \ \forall t\in \mathcal {T} \end{align}
(27)
\begin{align} & (\varphi _t^{-}): x_{t-1}+r_{t-1}^{+} - x_{t} +r_{t}^{-} \le R^{down} u_{t} + w_{t} x^{\min }, \ \forall t\in \mathcal {T} \end{align}
(28)
where Rup and Rdown are the ramp-up and ramp-down rates of generators, and vt and wt are binary variables denoting the start-up and shut-down decisions at time t. In this case, the dual multipliers $\varphi ^{+}_t$ and $\varphi ^{-}_t$ enter the expression of c, and their influence on the IO results is determined by how frequently (27) and (28) are binding.

The model mismatch problem is a common challenge for all IO formulations and cannot be fully resolved without using a more accurate FO model. We will study the impact of this mismatch on IO results using numerical experiments in Section 5.2 and discuss other possible solutions in Section 6.

5 CASE STUDY

In this section we demonstrate the effectiveness of the proposed data-driven IO based on the IEEE 14-bus system and the NYISO 1814-bus system. Simulations were conducted using Python v3.8 and the Gurobi solver on a standard PC with an Intel i9 processor and 16 GB of RAM. In the IEEE 14-bus system, the data-driven IO model was solved in one minute. With regard to the NYISO system, the solution of the proposed model was obtained in all instances in less than ten minutes.

5.1 Illustrative Example: IEEE 14-bus system

We first evaluate the performance of the data-driven IO in the IEEE 14-bus system (details in Appendix C.1). We assume that each generator sets a piece-wise linear price baseline with five equally divided blocks based on the quadratic power generation cost, as shown in Fig. 2. The actual offer prices fluctuate around the baseline with deviations following a normal distribution N(μ, σ2), where μ = 0 and σ = 2 (unit: $/MWh) are assumed in this case.

Figure 2
Figure 2: Power generation costs (dashed lines) and price baselines (solid lines) of five generators.
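One plausible way to construct such a block-wise baseline from a quadratic cost is to average the marginal cost over each output block. The construction and the coefficient values below are our own illustrative assumptions, not the paper's generator data.

```python
import numpy as np

def price_blocks(a, b, x_max, n_blocks=5):
    """Piece-wise linear price baseline from a quadratic cost
    C(x) = a*x^2 + b*x: each block price is the marginal cost
    2a*x + b averaged over one of n_blocks equal output segments."""
    edges = np.linspace(0.0, x_max, n_blocks + 1)
    # the average of 2a*x + b over [x1, x2] is a*(x1 + x2) + b
    return a * (edges[:-1] + edges[1:]) + b
```

For instance, a = 0.01, b = 10 and x_max = 100 give block prices 10.2, 10.6, 11.0, 11.4 and 11.8 $/MWh; the per-block deviations drawn from N(0, σ²) would then be added on top of this baseline.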

5.1.1 Performance in Deterministic Settings. To recover as many offer prices as possible, we create 200 training data points and validate that each of the five generators is marginal at least once at each block. The initial values of the five blocks are set to 10, 20, 30, 40, and 50, respectively. Fig. 3 shows the variations of four estimated offer prices at each iteration and compares the convergence rates based on the ℓ1 and ℓ2-norms. In all four cases, the estimated prices converge to the true values, with faster convergence observed when using the ℓ2-norm due to its strong convexity.

Figure 3
Figure 3: Convergence of estimated offer prices based on ℓ1 and ℓ2-norms.

5.1.2 Performance with Random Errors in LMPs. To evaluate the performance of data-driven IO in the first noisy setting, we add random errors following a normal distribution N(μ, σ2) to the original LMPs. Four noise settings are considered with varying error sizes and frequencies. In the small-error case, we assume μ = 50 and σ = 5, while in the large-error case, we assume μ = 100 and σ = 10 (unit: $/MWh). According to [46], the actual frequency of random errors in LMPs is less than 0.1%, but we consider more challenging cases with 1% and 5% error frequencies. We use data-driven IO to recover the offer price of G3 whose true value is 25.445 $/MWh, and the estimated offer prices and the corresponding errors are shown in Table 2. The data-driven IO shows robustness to random errors with ℓ1-norm and acceptable performance with ℓ2-norm when noise is small and sparse.

Table 2: Estimated offer prices ($\hat{c}$) and corresponding errors (ϵ) in different noisy settings (true value c* = 25.445 $/MWh)
Noisy settings Using ℓ1-norm Using ℓ2-norm
1% small errors $\hat{c} =25.56$, $\epsilon =0.45\%$ $\hat{c} =25.69$, $\epsilon =0.96\%$
1% large errors $\hat{c} =25.56$, $\epsilon =0.45\%$ $\hat{c} =26.19$, $\epsilon =2.93\%$
5% small errors $\hat{c} =25.56$, $\epsilon =0.45\%$ $\hat{c} =26.67$, $\epsilon =4.81\%$
5% large errors $\hat{c} =25.56$, $\epsilon =0.45\%$ $\hat{c} =29.17$, $\epsilon =14.64\%$

5.2 Numerical Experiments on NYISO System

In this numerical experiment, we study the NYISO system with 1814 buses, 2207 lines, 362 generators and 33 wind farms (details in Appendix C.2). The structure and parameters of this NYISO system are mined and estimated from publicly available data sources [35]. The NYISO publishes daily LMPs at [34] and other market-clearing results every three months at [33]. We use this information to generate hourly market-clearing results over a three-month period (February, April, and August 2018), yielding a total of 2136 training data points. Note that in this case we assume that the baseline price of each generator is equally divided into ten price blocks, and the upper and lower bounds of each block are known. However, if the block setting rules are unpublished, we need to estimate the bounds of each block first based on historical data. Subsequently, we can recover the offer price of each block based on the estimated bounds of the blocks.

5.2.1 Performance in Deterministic Settings. As previously mentioned in Section 4.1, a training data point is valid for the price recovery of a generator if the generator is committed and marginal in that scenario. Fig. 4 summarizes the number of valid training data points for all generators in the system. As shown in Fig. 4, most generators have valid training data at only a few steps of their offer curves, so only a portion of the prices can be recovered. The results indicate that 44.03% of all offer prices can be recovered from the valid training data, while 81.21% of these recovered prices are based on fewer than 5 training data points. Meanwhile, for 85.35% of the generators, at least one price block can be recovered. Note that there is a theoretical upper limit on the proportion of recoverable prices because an offer price can only be recovered from the LMPs if it affected the values of the LMPs. Therefore, other approaches, such as NIO, face the same limitation with regard to the recovery rate.

In total, 53 generators (out of 362) are never found to be marginal in the 2163 data points we analyzed, so their offer prices cannot be estimated from the training data. This result does not indicate a failure of the data-driven IO for large systems, as these non-marginal generators do not typically compete with others. For example, due to a relatively high generation cost, generator No. 14 is never committed during these three months. Furthermore, incorporating more training data can decrease the number of unrecoverable generators. As more market-clearing data is posted by market operators, the portion of recoverable offer prices will increase, which is also important for dealing with noisy data.

Figure 4
Figure 4: Number of valid training data points for 362 generators.

5.2.2 Performance with Model Mismatch. To assess the performance of the data-driven IO approach in the presence of model mismatches, we replicate the SCUC model used by NYISO in its day-ahead scheduling process, as outlined in [37]. This SCUC model, as detailed in [31], includes additional constraints such as ramping limits for generators, which may challenge the accuracy of recovered prices when those constraints are binding. Fig. 5 shows the binding frequency of two sets of constraints, including the power limits in (26d) and (26e) and the ramping limits in (27) and (28). It also compares the results across three months, namely February (with low demand), April (with medium demand), and August (with high demand). As depicted in Fig. 5 (b), the occurrence of only the ramping limits being binding is rare. As a result, the frequency of both sets of constraints being binding is approximately the same as the frequency of only the power limits being binding.

Fig. 5 conveys a crucial message: the ramping limits of free generators (whose power limits are not binding) are typically not binding either. As explained in Section 4.4.2, the ramping limits only affect the recovered prices when they are binding. Hence, the impact of ramping limits is mostly eliminated from the valid training data set, which only includes the data of free generators.

With this model mismatch, we are able to recover 36.28% of all offer prices with an average relative error of 3.47%. Fig. 6 shows the errors in recovered prices at the seventh block, where 171 generators are marginal. The errors are compared across the deterministic setting and two noisy settings: noisy setting I adds Gaussian noise with a mean of 50 $/MWh and a standard deviation of 5 $/MWh to 1% of the LMP data, and noisy setting II involves the model mismatch discussed above. The deterministic setting features small and random errors with an average close to zero, while in noisy setting I, sparse noise in the training data causes the recovered prices of some generators to deviate significantly from their true values. Noisy setting II features higher errors than the other two cases; however, they are still at an acceptable level relative to the offer prices. Note that although the errors in Fig. 6 (c) tend to be mostly positive, this is not a universal conclusion, since the errors in this noisy setting depend on the values of the dual multipliers of binding constraints in the SCUC model, which can change over time.

Figure 5
Figure 5: Binding frequency of different sets of constraints.
Figure 6
Figure 6: Errors in recovered prices in deterministic and two noisy settings.

6 CONCLUSION AND FUTURE WORK

This paper presents a data-driven IO approach for recovering marginal offer prices of generators from day-ahead market-clearing results. We formulate the data-driven IO problem as a single-level optimization model, distinct from the bi-level models commonly used in the literature. This data-driven IO problem can be efficiently solved using the gradient descent method. Furthermore, we prove that the recovered offer prices converge to a unique global optimum at a sub-linear rate. Numerical experiments on the IEEE 14-bus system and the NYISO 1814-bus system validate the efficacy of the proposed approach in both deterministic and noisy settings. The 14-bus test case shows that the use of the ℓ2-norm in the loss function results in faster convergence, while the ℓ1-norm offers robustness against sparse noise in the training data. In the 1814-bus test case, the data-driven IO recovers approximately 45% of all offer prices using 2163 training data points. The average relative error is 0.88% in the deterministic setting and 3.47% when considering the mismatch between the DCOPF model and the actual market-clearing process.

A notable limitation of our data-driven IO model is that it assumes complete knowledge of all system parameters except offer prices of generators, which may not be achievable in some real-world power systems. Nevertheless, the presence of wholesale markets and stringent regulatory requirements enhance the likelihood of market transparency. Furthermore, while ISO manuals (such as [37]) provide insights into the market-clearing principles, they do not guarantee complete visibility of the constraints used in the market-clearing process, including out-of-optimization interventions conducted by system operators. As part of our ongoing research, we plan to tackle this issue by implementing a machine learning approach, such as a physics-informed neural network, that can incorporate known constraints while also learning to identify previously unknown constraints from training data. This will significantly reduce the inaccuracies in the recovered prices resulting from model mismatches.

The proposed method for price recovery can lead to increased profits for certain market participants through strategic bidding, but it must not undermine the fairness of power markets. In the event that the price recovery method becomes widespread, the market operator must implement effective measures to prevent market power abuse. Therefore, our future study will concentrate on determining the most efficient policy for market power mitigation and investigating the optimal bidding strategies for market participants. In summary, the game between market participants and market operators will become more intricate and engaging due to the recovery of offer prices.

Furthermore, as depicted in Fig. 1, the energy market comprises a day-ahead market (DAM) and a real-time market (RTM). This paper only studies the price recovery problem in DAM, while the problem is more challenging in RTM due to the spontaneous actions of market participants and the ad-hoc manipulations of market operators. Uncovering the behavior patterns of market participants in RTM holds potential for both market participants seeking to increase their profits and market operators responsible for preventing market power abuse. In the next step, we intend to expand our data-driven IO approach to RTM and ancillary service markets to leverage all available market data.

ACKNOWLEDGMENTS

The authors express their gratitude to Prof. Daniel Bienstock and Dr. Robert Mieth for their assistance in developing the digital twin of the NYISO system, data collection and visualization.

REFERENCES

  • Akshay Agrawal, Brandon Amos, Shane Barratt, Stephen Boyd, Steven Diamond, and J Zico Kolter. 2019. Differentiable convex optimization layers. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). NIPS, Vancouver, Canada, 1–13.
  • Sara Ahmadian, Umang Bhaskar, Laura Sanità, and Chaitanya Swamy. 2018. Algorithms for inverse optimization problems. In 26th Annual European Symposium on Algorithms (ESA 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Helsinki, Finland, 1–14.
  • Ravindra K. Ahuja and James B. Orlin. 2001. Inverse optimization. Operations Research 49, 5 (2001), 771–783.
  • Brandon Amos and J. Zico Kolter. 2017. Optnet: Differentiable optimization as a layer in neural networks. In 34th International Conference on Machine Learning (ICML 2017). PMLR, Sydney, Australia, 136–145.
  • Anil Aswani, Zuo-Jun Shen, and Auyon Siddiq. 2018. Inverse optimization with noisy data. Operations Research 66, 3 (2018), 870–892.
  • Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, et al. 2021. Filling gaps in trustworthy development of AI. Science 374, 6573 (2021), 1327–1329.
  • Andreas Bärmann, Sebastian Pokutta, and Oskar Schneider. 2017. Emulating the expert: Inverse optimization through online learning. In 34th International Conference on Machine Learning (ICML 2017). PMLR, Sydney, Australia, 400–410.
  • Dimitris Bertsimas, Vishal Gupta, and Ioannis Ch Paschalidis. 2015. Data-driven estimation in equilibrium using inverse optimization. Mathematical Programming 153, 2 (2015), 595–633.
  • Yuexin Bian, Ningkun Zheng, Yang Zheng, Bolun Xu, and Yuanyuan Shi. 2022. Demand response model identification and behavior forecast with OptNet: a gradient-based approach. In 30th ACM International Conference on Future Energy Systems (ACM e-Energy 2022). ACM, virtual, USA, 418–429.
  • John R. Birge, Ali Hortaçsu, and J. Michael Pavlin. 2017. Inverse optimization for the recovery of market structure from market outcomes: An application to the MISO electricity market. Operations Research 65, 4 (2017), 837–855.
  • Stephen Boyd and Lieven Vandenberghe. 2004. Convex optimization. Cambridge university press, New York, USA.
  • Andrew Butler and Roy H Kwon. 2022. Efficient differentiable quadratic programming layers: an ADMM approach. Computational Optimization and Applications (2022), 1–28.
  • Timothy CY Chan and Neal Kaw. 2020. Inverse optimization for the recovery of constraint parameters. European Journal of Operational Research 282, 2 (2020), 415–427.
  • Timothy CY Chan, Taewoo Lee, and Daria Terekhov. 2019. Inverse optimization: Closed-form solutions, geometry, and goodness of fit. Management Science 65, 3 (2019), 1115–1135.
  • Ruidi Chen, Ioannis Ch. Paschalidis, and Michael C. Caramanis. 2017. Strategic equilibrium bidding for electricity suppliers in a day-ahead market using inverse optimization. In 56th IEEE Conference on Decision and Control (CDC 2017). IEEE, Melbourne, Australia, 220–225.
  • Ruidi Chen, Ioannis Ch. Paschalidis, Michael C. Caramanis, and Panagiotis Andrianesis. 2019. Learning from past bids to participate strategically in day-ahead electricity markets. IEEE Transactions on Smart Grid 10, 5 (2019), 5794–5806.
  • Chaosheng Dong, Yiran Chen, and Bo Zeng. 2018. Generalized inverse optimization through online learning. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). NIPS, Montréal, Canada, 1–10.
  • B. Espen Eckbo. 2009. Bidding strategies and takeover premiums: A review. Journal of Corporate Finance 15, 1 (2009), 149–178.
  • Aaron Ferber, Bryan Wilder, Bistra Dilkina, and Milind Tambe. 2020. Mipaal: Mixed integer program as a layer. In 34th AAAI Conference on Artificial Intelligence (AAAI 2020). Springer, Thessaloniki, Greece, 1504–1511.
  • Illinois Center for a Smarter Electric Grid (ICSEG). 2022. IEEE 14-Bus System. University of Illinois at Urbana-Champaign. Retrieved Dec. 29, 2022 from https://icseg.iti.illinois.edu/ieee-14-bus-system/
  • Yong Fu and Mohammad Shahidehpour. 2007. Fast SCUC for large-scale power systems. IEEE Transactions on power systems 22, 4 (2007), 2144–2151.
  • Kimia Ghobadi, Taewoo Lee, Houra Mahmoudzadeh, and Daria Terekhov. 2018. Robust inverse optimization. Operations Research Letters 46, 3 (2018), 339–344.
  • Udi Helman. 2006. Market power monitoring and mitigation in the US wholesale power markets. Energy 31, 6-7 (2006), 877–904.
  • Elaheh H. Iraj and Daria Terekhov. 2021. Comparing inverse optimization and machine learning methods for imputing a convex objective function. arXiv preprint arXiv:2102.10742 (2021), 1–17.
  • Garud Iyengar and Wanmo Kang. 2005. Inverse conic programming with applications. Operations Research Letters 33, 3 (2005), 319–330.
  • George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. 2021. Physics-informed machine learning. Nature Reviews Physics 3, 6 (2021), 422–440.
  • Arezou Keshavarz, Yang Wang, and Stephen Boyd. 2011. Imputing a convex objective function. In 2011 IEEE international symposium on intelligent control (ISIC 2011). IEEE, Denver, USA, 613–619.
  • András Kovács. 2021. Inverse optimization approach to the identification of electricity consumer models. Central European Journal of Operations Research 29, 2 (2021), 521–537.
  • Gong Li, Jing Shi, and Xiuli Qu. 2011. Modeling methods for GenCo bidding strategy optimization in the liberalized electricity spot market–A state-of-the-art review. Energy 36, 8 (2011), 4686–4700.
  • Houra Mahmoudzadeh and Kimia Ghobadi. 2022. Learning from Good and Bad Decisions: A Data-driven Inverse Optimization Approach. arXiv preprint arXiv:2207.02894 (2022), 1–36.
  • Robert Mieth, Yury Dvorkin, and Miguel A Ortega-Vazquez. 2022. Risk-Aware Dimensioning and Procurement of Contingency Reserve. IEEE Transactions on Power Systems (2022), 1–13.
  • Peyman Mohajerin Esfahani, Soroosh Shafieezadeh-Abadeh, Grani A Hanasusanto, and Daniel Kuhn. 2018. Data-driven inverse optimization with imperfect information. Mathematical Programming 167, 1 (2018), 191–234.
  • NYISO. 2022. NYISO market clearing results. NYISO. Retrieved Dec. 27, 2022 from http://mis.nyiso.com/public/P-27list.htm
  • NYISO. 2022. NYISO price data. NYISO. Retrieved Dec. 27, 2022 from https://www.nyiso.com/energy-market-operational-data
  • NYISO. 2022. Reliability Needs Assessment Report. NYISO. Retrieved Dec. 27, 2022 from https://www.nyiso.com/documents/20142/2248793/2022-RNA-Report.pdf
  • Richard P. O'Neill, Paul M. Sotkiewicz, Benjamin F. Hobbs, Michael H. Rothkopf, and William R. Stewart Jr. 2005. Efficient market-clearing prices in markets with nonconvexities. European Journal of Operational Research 164, 1 (2005), 269–285.
  • NYISO Energy Market Operations. 2022. Day-Ahead Scheduling Manual. NYISO. Retrieved Jan. 2, 2023 from https://www.nyiso.com/documents/20142/2923301/dayahd_schd_mnl.pdf/0024bc71-4dd9-fa80-a816-f9f3e26ea53a
  • Remigijus Paulavičius and Julius Žilinskas. 2006. Analysis of different norms and corresponding Lipschitz constants for global optimization. Technological and Economic Development of Economy 12, 4 (2006), 301–306.
  • Alexander Rakhlin, Ohad Shamir, and Karthik Sridharan. 2011. Making gradient descent optimal for strongly convex stochastic optimization. arXiv preprint arXiv:1109.5647 (2011), 1–21.
  • Simon Risanger, Stein-Erik Fleten, and Steven A. Gabriel. 2020. Inverse equilibrium analysis of oligopolistic electricity markets. IEEE Transactions on Power Systems 35, 6 (2020), 4159–4166.
  • Carlos Ruiz, Antonio J. Conejo, and Dimitris J. Bertsimas. 2013. Revealing rival marginal offer prices via inverse optimization. IEEE Transactions on Power Systems 28, 3 (2013), 3056–3064.
  • Javier Saez-Gallego and Juan M. Morales. 2017. Short-term forecasting of price-responsive loads using inverse optimization. IEEE Transactions on Smart Grid 9, 5 (2017), 4805–4814.
  • Javier Saez-Gallego, Juan M. Morales, Marco Zugno, and Henrik Madsen. 2016. A data-driven bidding model for a cluster of price-responsive consumers of electricity. IEEE Transactions on Power Systems 31, 6 (2016), 5001–5011.
  • NYISO Stakeholder Services. 2021. Guide 01 Market Participants User's Guide. NYISO. Retrieved Jan. 2, 2023 from https://www.nyiso.com/documents/20142/3625950/mpug.pdf
  • Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding machine learning: From theory to algorithms. Cambridge University Press, New York, USA.
  • Mathangi Srinivasan Kumar. 2022. Locational Based Marginal Pricing. NYISO. Retrieved Jan. 2, 2023 from https://www.nyiso.com/documents/20142/3037451/3-LMBP.pdf/f7682e03-e921-eaab-09bf-690524b5ade6
  • Yingcong Tan, Andrew Delong, and Daria Terekhov. 2019. Deep inverse optimization. In 16th International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR 2019). Springer, Thessaloniki, Greece, 540–556.
  • Jérôme Thai and Alexandre M. Bayen. 2018. Imputing a variational inequality function or a convex objective function: A robust approach. J. Math. Anal. Appl. 457, 2 (2018), 1675–1695.
  • Trung Vu and Raviv Raich. 2022. On asymptotic linear convergence of projected gradient descent for constrained least squares. IEEE Transactions on Signal Processing 70 (2022), 4061–4076.
  • xxx. 2023. Data supplement: NYISO 1814-bus system. Retrieved Feb. 06, 2023 from https://github.com/xxx/data_supplement_NYISO
  • Jianzhong Zhang and Chengxian Xu. 2010. Inverse optimization for linearly constrained convex separable programming problems. European Journal of Operational Research 200, 3 (2010), 671–679.

A ESTABLISHED THEOREMS

A.1 Theorem for the Convergence of GD

Theorem A.1 (Convergence of GD for convex and Lipschitz continuous functions) Let f be a convex, ρ -Lipschitz function and let $w^* \in {\rm {arg}} \min _{w \in \mathcal {W}} f(w)$, where $\Vert w\Vert \le B$ for $\forall w \in \mathcal {W}$. If the GD algorithm is applied on f for T iterations with a learning rate of $\eta = \sqrt {\frac{B^2}{\rho ^2 T}}$, then the output vector $\hat{w}$ (the average of the T iterates) satisfies:

\begin{equation} f(\hat{w}) - f(w^*) \le \frac{B \rho }{\sqrt {T}}. \end{equation} (29)

Since this $\mathcal {O}(1/\sqrt T)$ convergence rate of GD is well-established in the literature, e.g., in [45], we omit the proof of Theorem A.1.
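As a sanity check (not part of the paper's IO model), Theorem A.1 can be illustrated numerically. The sketch below runs projected subgradient descent on the convex, $\sqrt{n}$-Lipschitz function $f(w) = \Vert w \Vert_1$ over the ℓ2-ball of radius B, with the learning rate from the theorem, and verifies that the averaged iterate meets the $B\rho /\sqrt{T}$ bound; the starting point and problem size are arbitrary choices for illustration.

```python
import numpy as np

def f(w):
    return np.abs(w).sum()            # convex; sqrt(n)-Lipschitz w.r.t. the l2-norm

n, B, T = 2, 1.0, 10_000
rho = np.sqrt(n)                      # Lipschitz constant of the l1-norm (l2 geometry)
eta = np.sqrt(B**2 / (rho**2 * T))    # learning rate prescribed by Theorem A.1

w = np.array([0.6, -0.8])             # feasible start on the boundary (||w||_2 = B)
avg = np.zeros(n)
for _ in range(T):
    avg += w / T                      # accumulate the average of the iterates
    w = w - eta * np.sign(w)          # subgradient step for the l1-norm
    norm = np.linalg.norm(w)
    if norm > B:
        w *= B / norm                 # project back onto the l2-ball of radius B

gap = f(avg) - 0.0                    # the minimizer is w* = 0, so f(w*) = 0
bound = B * rho / np.sqrt(T)
assert gap <= bound                   # the O(1/sqrt(T)) guarantee of Eq. (29)
```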

A.2 Theorems for Existence and Uniqueness of Optimal Solutions

Theorem A.2 (Existence of global optimum) If the objective function is continuous and the feasible region is nonempty, closed, and bounded, then there exists a global optimum.

Theorem A.3 (Uniqueness of optimal solutions to strictly convex functions) If the objective function is strictly convex and the feasible region is convex, then there exists at most one optimal solution.

We omit the proofs of Theorems A.2 and A.3 in this paper since they are rigorously proved in [11].

B PROOFS

B.1 Proof of Lemma 4.2

We first clarify the definition of Lipschitz continuity and prove three lemmas relevant to the proof of Lemma 4.2.

Definition B.1 (ρ -Lipschitz functions) Let $\mathcal {W} \subseteq \mathbb {R}^n$ be a convex set. A function $f:\mathbb {R}^n \rightarrow \mathbb {R}$ is ρ -Lipschitz over $\mathcal {W}$ if there exists a constant ρ > 0 such that for $\forall w_1, w_2 \in \mathcal {W}$, we have $|f(w_1) - f(w_2)| \le \rho \Vert w_1 - w_2 \Vert _p$.

Note that ‖ · ‖p in the definition of Lipschitz continuity is usually the ℓ2-norm, but other norms are also applicable [38]. The statements "f is ρ -Lipschitz" and "f is Lipschitz continuous with Lipschitz constant ρ " are equivalent.

Lemma B.2 (Lipschitz continuity of norms) Every norm on $\mathbb {R}^n$ is 1-Lipschitz with respect to the same kind of norm.

Proof. Let $f(x) = \left\Vert x \right\Vert _p = (\sum \nolimits _{i=1}^n |x_i|^p)^{1/p}$ be the ℓp-norm of x. Based on the triangle inequality of norms (i.e., $ \Vert x+y\Vert _p \le \Vert x \Vert _p + \Vert y \Vert _p,\ \forall x,y \in \mathbb {R}^n$), we know that $\Vert x\Vert _p = \Vert x-y+y\Vert _p \le \Vert x-y\Vert _p + \Vert y\Vert _p$, and therefore $\Vert x\Vert _p - \Vert y\Vert _p \le \Vert x-y\Vert _p$. Similarly, we have $\Vert y\Vert _p = \Vert y-x+x\Vert _p \le \Vert y-x\Vert _p + \Vert x\Vert _p = \Vert x-y\Vert _p + \Vert x\Vert _p$, and therefore $\Vert y\Vert _p - \Vert x\Vert _p \le \Vert x-y\Vert _p$. In summary, we have $|\Vert x\Vert _p - \Vert y\Vert _p| \le \Vert x-y\Vert _p$, i.e., f is 1-Lipschitz with respect to the ℓp-norm.□
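Lemma B.2 can be spot-checked numerically. The sketch below (an illustrative check, not part of the paper) samples random vector pairs and verifies the reverse triangle inequality $|\Vert x\Vert _p - \Vert y\Vert _p| \le \Vert x-y\Vert _p$ for several values of p:

```python
import numpy as np

# Empirical check of Lemma B.2: every p-norm is 1-Lipschitz w.r.t. itself,
# i.e., | ||x||_p - ||y||_p | <= ||x - y||_p (the reverse triangle inequality).
rng = np.random.default_rng(0)
violations = 0
for p in (1, 2, 3, np.inf):
    for _ in range(1000):
        x, y = rng.normal(size=5), rng.normal(size=5)
        lhs = abs(np.linalg.norm(x, p) - np.linalg.norm(y, p))
        rhs = np.linalg.norm(x - y, p)
        if lhs > rhs + 1e-12:         # small tolerance for floating-point error
            violations += 1
assert violations == 0
```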

Lemma B.3 (Lipschitz continuity of differentiable functions) An everywhere differentiable function $f: \mathbb {R} \rightarrow \mathbb {R}$ is Lipschitz continuous if it has bounded first-order derivative, and the corresponding Lipschitz constant is $\sup |f^{\prime }(x)|$.

Proof. Let f be a continuous function on the closed interval [a, b] and differentiable on the open interval (a, b). For any x, y ∈ [a, b], the mean value theorem guarantees a ξ between x and y such that f(x) − f(y) = f′(ξ)(x − y). Taking absolute values yields |f(x) − f(y)| ≤ ρ|x − y|, where $\rho = \sup _{x \in (a,b)} |f^{\prime }(x)|$.□

Lemma B.4 (Lipschitz continuity of compositions) The composition of a ρ1-Lipschitz function and a ρ2-Lipschitz function is a ρ1ρ2-Lipschitz function.

Proof. Assume $g: \mathbb {R}^n \rightarrow \mathbb {R}$ is ρ1-Lipschitz continuous and $h: \mathbb {R} \rightarrow \mathbb {R}$ is ρ2-Lipschitz continuous, and let f(x) = h(g(x)). For $\forall a, b \in \mathbb {R}^n$, we have $|f(a) - f(b)| = |h(g(a)) - h(g(b))| \le \rho _2 |g(a) - g(b)| \le \rho _2 \rho _1 \Vert a - b \Vert$. Therefore, f is ρ1ρ2-Lipschitz continuous.□

Proof of Lemma 4.2: For convexity: According to [11], $|x|^p$ is convex on $\mathbb {R}$ for ∀p ≥ 1, and the non-negative weighted sum of convex functions is convex. Therefore, f is convex since it is the sum of convex functions.

For Lipschitz continuity: Let $g:\mathbb {R}^n \rightarrow \mathbb {R}$ be the ℓp-norm and $h:\mathbb {R} \rightarrow \mathbb {R}$ be the $p^\text{th}$ power function, so that f(x) = h(g(x)). According to Lemma B.2, g is 1-Lipschitz continuous. Since $|x_i| \in [0,\bar{x}]$, we know that $g(x) = (\sum \nolimits _{i=1}^n |x_i|^p)^{1/p}$ is bounded by 0 and $(\sum \nolimits _{i=1}^n \bar{x}^p)^{1/p} = \bar{x} n^{1/p}$. Meanwhile, based on Lemma B.3, $h(x) = x^p$ is Lipschitz continuous over the interval $[0, \bar{x} n^{1/p}]$ for ∀p ≥ 1, with Lipschitz constant $p (\bar{x} n^{1/p})^{p-1} = p \bar{x}^{p-1} n^{(p-1)/p}$. Therefore, based on Lemma B.4, we know that f is ρ -Lipschitz continuous, where $\rho = p \bar{x}^{p-1} n^{(p-1)/p}$. $ \square$
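The Lipschitz constant derived in this proof can be verified empirically. The sketch below (an illustrative check with arbitrarily chosen n, p, and $\bar{x}$) samples point pairs with $|x_i| \le \bar{x}$ and confirms that the ratio $|f(x) - f(y)| / \Vert x - y \Vert_p$ never exceeds $\rho = p \bar{x}^{p-1} n^{(p-1)/p}$:

```python
import numpy as np

# Empirical check of Lemma 4.2 for f(x) = sum_i |x_i|^p with |x_i| <= xbar:
# the Lipschitz constant is rho = p * xbar**(p-1) * n**((p-1)/p).
rng = np.random.default_rng(1)
n, p, xbar = 4, 3, 2.0
rho = p * xbar**(p - 1) * n**((p - 1) / p)

def f(x):
    return np.sum(np.abs(x) ** p)

max_ratio = 0.0
for _ in range(2000):
    x = rng.uniform(-xbar, xbar, size=n)
    y = rng.uniform(-xbar, xbar, size=n)
    ratio = abs(f(x) - f(y)) / np.linalg.norm(x - y, p)
    max_ratio = max(max_ratio, ratio)

assert max_ratio <= rho   # the observed Lipschitz ratio respects the bound
```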

B.2 Proof of Lemma 4.6

We first recall the definition of strict convexity and its first and second-order conditions:

Definition B.5 (Strictly convex functions) Let $\mathcal {W} \subseteq \mathbb {R}^n$ be a convex set. If a function $f:\mathbb {R}^n \rightarrow \mathbb {R}$ satisfies f(kw1 + (1 − k)w2) < kf(w1) + (1 − k)f(w2) for $\forall w_1, w_2 \in \mathcal {W}$ and k ∈ (0, 1), it is strictly convex. The first-order condition for strict convexity is f(w2) > f(w1) + ∇f(w1)T(w2w1) for $\forall w_1, w_2 \in \mathcal {W}$, and the second-order condition is ∇2f(w)≻0 for $\forall w \in \mathcal {W}$.

Proof of Lemma 4.6: Since $f(x) = \sum \nolimits _{i=1}^n |x_i|^p=\sum \nolimits _{i=1}^n x_i^p$ when xi ≥ 0 for ∀i, we have $\partial f/\partial {x_i} = p x_i^{p-1}$ and $\partial ^2 f/\partial {x_i}^2 = p (p-1) x_i^{p-2}> 0$ for ∀xi > 0 and ∀p > 1. Since $\partial ^2 f/\partial x_i \partial x_j = 0$ for $\forall i \ne j$, ∇2f(x) is a diagonal matrix whose diagonal elements $\partial ^2 f/\partial x_i^2$, ∀i are the corresponding eigenvalues of ∇2f(x). Therefore, ∇2f(x)≻0 since all of its eigenvalues are positive, which means that the second-order condition for strict convexity is satisfied. $ \square$
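The Hessian structure used in this proof can be checked numerically. The sketch below (illustrative, with an arbitrarily chosen strictly positive point) builds the diagonal Hessian of $f(x) = \sum _i x_i^p$ and confirms it is positive definite:

```python
import numpy as np

# Hessian check for Lemma 4.6: for f(x) = sum_i x_i^p with x_i > 0 and p > 1,
# the Hessian is diagonal with entries p*(p-1)*x_i**(p-2), all strictly positive.
p = 3
x = np.array([0.5, 1.0, 2.0, 3.0])       # a strictly positive point
H = np.diag(p * (p - 1) * x ** (p - 2))  # diagonal Hessian of f at x
eigvals = np.linalg.eigvalsh(H)          # eigenvalues = the diagonal entries
is_pd = bool(np.all(eigvals > 0))        # positive definite => strictly convex
assert is_pd
```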

C DATA FOR CASE STUDY

C.1 Data of IEEE 14-bus system

The IEEE 14-bus system is depicted in Fig. 7 [20] and the parameters of the five generators are listed in Table 3. The power generation cost of each generator follows a quadratic function $c_0 + c_1 x + c_2 x^2$, where x is the output power of the generator.

Figure 7
Figure 7: IEEE 14-bus system [20].

Table 3: Parameters of generators in IEEE 14-bus system
No. xmax /xmin  (MW) c0 ($) c1 ($/MWh) c2 ($/MWh²)
G1 100/0 2 0.05 0.002
G2 100/0 5 0.10 0.003
G3 100/0 8 0.15 0.004
G4 100/0 12 0.20 0.005
G5 100/0 15 0.30 0.006
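Given the Table 3 coefficients, the total cost and the marginal cost (the derivative $c_1 + 2 c_2 x$, which is the quantity the IO approach recovers) can be evaluated as in the following illustrative sketch:

```python
# Quadratic generation cost c0 + c1*x + c2*x^2 for the Table 3 generators;
# the marginal cost is its derivative, c1 + 2*c2*x.
gens = {
    "G1": (2.0, 0.05, 0.002),
    "G2": (5.0, 0.10, 0.003),
    "G3": (8.0, 0.15, 0.004),
    "G4": (12.0, 0.20, 0.005),
    "G5": (15.0, 0.30, 0.006),
}

def cost(name, x):
    c0, c1, c2 = gens[name]
    return c0 + c1 * x + c2 * x ** 2

def marginal_cost(name, x):
    _, c1, c2 = gens[name]
    return c1 + 2 * c2 * x

g1_at_50 = cost("G1", 50)              # 2 + 0.05*50 + 0.002*50^2 = 9.5 ($)
g1_mc_at_50 = marginal_cost("G1", 50)  # 0.05 + 2*0.002*50 = 0.25 ($/MWh)
```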

C.2 Data of NYISO 1814-bus system

The NYISO system, consisting of 1814 buses (black dots), 2207 lines, 362 generators, and 33 wind farms (blue dots), is shown in Fig. 8. Colors of the transmission lines reflect power flows, with red denoting heavy flow and green indicating light flow. Specific parameters of this system can be found in [50]. Fig. 9 shows the heat-map of LMPs at 4 p.m. on Aug. 28, 2018, which is a moment with particularly heavy load and high LMPs.

Figure 8
Figure 8: NYISO 1814-bus system.
Figure 9
Figure 9: LMP heat-map of NYISO system at 4 p.m. on Aug. 28, 2018 (Unit of LMPs: $/MWh).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

e-Energy '23, June 20–23, 2023, Orlando, FL, USA

© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0032-3/23/06…$15.00.
DOI: https://doi.org/10.1145/3575813.3597356