Learning to Price Homogeneous Data

Keran Chen
UW-Madison
kchen429@wisc.edu

Joon Suk Huh
UW-Madison
jhuh23@wisc.edu

Kirthevasan Kandasamy
UW-Madison
kandasamy@cs.wisc.edu

Abstract

We study a data pricing problem, where a seller has access to $N$ homogeneous data points (e.g. drawn i.i.d. from some distribution). There are $m$ types of buyers in the market, where buyers of the same type $i$ have the same valuation curve $v_{i}:[N]\rightarrow[0,1]$ , where $v_{i}(n)$ is the value for having $n$ data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for $T$ rounds so as to learn the revenue-optimal pricing curve $p:[N]\rightarrow[0,1]$ . To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve $p_{t}$ . A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using the improved discretization schemes previously developed, we are able to achieve $\widetilde{\mathcal{O}}(m\sqrt{T})$ regret in the stochastic setting and $\widetilde{\mathcal{O}}(m^{\nicefrac{{3}}{{2}}}\sqrt{T})$ regret in the adversarial setting.

1 Introduction

Due to the rise in popularity of machine learning, there is an increased demand for data. However, not all users of data have the wherewithal to collect data on their own, and have to rely on data marketplaces to acquire the data they need. For example, a materials data platform (e.g. [17]) may have collected vast amounts of data from various proprietary sources. Materials scientists in smaller organizations and academia, who do not have large experimental apparatuses, may wish to purchase this data to aid in their research. Similarly, small businesses may wish to purchase customer data for advertising and product recommendations [5, 4], while small technology companies may wish to purchase data about cloud operations to optimize their computing infrastructure [3, 2].

Model. Motivated by the emergence of such data marketplaces, we study the following online data pricing problem. A seller has access to $N$ homogeneous data points, (e.g. drawn i.i.d. from some distribution). He wishes to sell the data to a sequence of distinct buyers over $T$ rounds, and intends to achieve large revenue. There are $m$ types of buyers in the data marketplace, with all buyers in type $i$ having the same valuation curve $v_{i}:[N]\rightarrow[0,1]$ for the data, where $v_{i}(n)$ represents the buyer’s value for having $n$ points. As data is homogeneous, we can treat an agent’s value as a function of the amount of data $n$ (we will illustrate this in the sequel). Valuation curves are monotone non-decreasing, as more data is better. At each round $t$ , the seller chooses a price curve $p_{t}:[N]\rightarrow[0,1]$ , where $p_{t}(n)$ is the price to the buyer for purchasing $n$ data points. Then a buyer with type $i_{t}$ arrives and purchases an amount of data that maximizes her utility (value minus price), provided that she can achieve non-negative utility. A buyer will reveal her type to the seller only if she makes a purchase, and only after she makes the purchase. The seller has knowledge of valuation curves of the $m$ types, but does not know the distribution $q$ over types (stochastic setting), or the buyer sequence (adversarial setting). Moreover, he cannot practice non-anonymous (discriminatory) pricing, as he needs to choose the pricing curve $p_{t}$ without knowledge of the buyer’s type on that round.

While there is extensive research on revenue-optimal pricing and learning to price, data marketplaces merit special attention, both due to their recent emergence and the unique characteristics of data. Typically the number of data $N$ (number of goods) is very large, but data usually satisfies additional properties such as smoothness (an agent’s value does not increase significantly with a small amount of additional data) and diminishing returns (additional data is more valuable when a buyer has less data). To illustrate further, note that two steps are essential to develop an effective online learning solution for data pricing. (1) First, we need to solve the planning problem, i.e. find a revenue-optimal pricing curve when the type distribution $q$ is known. (2) Second, when $q$ is unknown, we need to combine the algorithm in step (1) with estimates for $q$ to maximize long-term revenue.

Methods in the existing literature fall short in both steps. (1) When the type distribution $q$ is known, the data pricing problem resembles an ordered item pricing problem, which is known to be NP-hard [12, 24]. Hence prior work has aimed at approximating the optimal pricing curves via discretization schemes. Unfortunately, existing discretization schemes have poor, often exponential, dependence on the approximation parameter $\epsilon$ . However, achieving sublinear regret in online learning requires choosing $\epsilon$ that vanishes with longer time horizons, i.e. $\epsilon\rightarrow 0$ as $T\rightarrow\infty$ . Therefore, directly using existing discretization schemes in an online setting leads to poor statistical and computational properties of the associated online algorithm. This requires us to leverage the above properties of data to design discretization schemes with better dependence on $\epsilon$ . (2) While there is prior work on learning optimal prices [32, 26, 21], these techniques either fall short of addressing the complexities in our setting, or fail to account for the properties of data, and hence do not scale gracefully when the amount of data $N$ is very large. Moreover, in our online learning setup, the seller faces a trade-off between setting high prices to maximize instantaneous revenue versus setting low prices so as to guarantee a purchase, which results in the buyer revealing their type, which in turn can be helpful in future rounds. Prior work has studied this asymmetric feedback model only in single-item markets [22, 46] which is significantly simpler, and only in the stochastic setting.

1.1 Summary of our contributions

Our contributions in this work are threefold: (1) First, in §3, we develop discretization schemes for revenue-optimal data pricing under a variety of assumptions, which we will use later in our online learning schemes. (2) In §4, we study learning a revenue-optimal price in a stochastic setting, where the customer types on each round are drawn from a fixed but unknown distribution $q$ . (3) Finally, in §5, we study online learning when the buyer types are chosen by an oblivious adversary.

1. Discretization (approximation) schemes for revenue-optimal data pricing. Assuming only monotonicity, we show that there is a discretization of size $\widetilde{\mathcal{O}}((N/\epsilon)^{m})$ which is an $\mathcal{O}(\epsilon)$ additive approximation to any pricing curve. When compared to prior work [13, 24], our discretization scheme has smaller dependence on $\epsilon^{-1}$ when the number of types $m$ is small (see Table 1). This will be useful, both statistically and computationally, when we study the online setting, as we need to choose $\epsilon\rightarrow 0$ as $T\rightarrow\infty$ to achieve sublinear regret. The resulting discretization is still quite large in real-world data marketplaces, where $N$ may be very large. Hence, we also study two other assumptions. First, when valuations are smooth, satisfying an $L$-Lipschitz-like condition, we construct a discretization of size $\widetilde{\mathcal{O}}\left((L/\epsilon^{2})^{m}\right)$, which has no dependence on $N$. Next, under a diminishing returns condition, we construct a discretization of size $\mathcal{O}\left(J^{m}\epsilon^{-3m}\log^{m}N\right)$, which has only polylog dependence on $N$.

Key algorithmic insights. We first show that when there are only $m$ types, for any price function $p:[N]\rightarrow[0,1]$ , there exists an “m-step” price function $p^{\prime}$ whose expected revenue is at least as much as that of $p$ on any type distribution $q$ . An $m$ -step function is a non-decreasing function where $p(n+1)$ and $p(n)$ differ at most $m$ times. This allows us to focus on $m$ -step functions, significantly narrowing the space of pricing functions when $m\ll N$ . We then consider discretizations of the data space $[N]$ and valuations $[0,1]$ and apply this insight to construct discretizations of pricing curves.

Algorithm	Assumptions	Size of discretization	Reference
Hartline and Koltun [24]	–	$\widetilde{\mathcal{O}}(2^{N}\epsilon^{-N})$	–
Chawla et al. [13]	M	$N^{\mathcal{O}\left(\epsilon^{-2}\log\epsilon^{-1}\right)}$	–
Algorithm 1 (ours)	M, F	$\widetilde{\mathcal{O}}(N^{m}\epsilon^{-m})$	Theorem 3.1
Algorithm 5 (ours)	M, F, S	$\widetilde{\mathcal{O}}\left(L^{m}\epsilon^{-2m}\right)$	Theorem 3.2
Algorithm 2 (ours)	M, F, D	$\widetilde{\mathcal{O}}\left(J^{m}\epsilon^{-3m}\log^{m}N\right)$	Theorem 3.3

Table 1: Comparison of discretization (approximation) schemes of prior work and our methods under various assumptions. All methods achieve an $\mathcal{O}(\epsilon)$ additive approximation to any pricing curve. Here, M means Monotonicity, F means that there are a Finite ($m$) number of types, S means that the valuation curves satisfy an $L$-Lipschitz-like Smoothness condition (Assumption 1), and D means that they satisfy a Diminishing returns condition (Assumption 2). The $\widetilde{\mathcal{O}}$ notation suppresses log dependencies when there is already a polynomial dependence on a parameter. Prior work has exponential dependence on either $N$ or $\epsilon^{-1}$. We wish to do better since (i) typically, the number of data points $N$ is very large and (ii) we need $\epsilon\rightarrow 0$ as $T\rightarrow\infty$ to achieve sublinear regret.

2. Learning to price in the stochastic setting. Next, we turn to the online learning problem described in the beginning in a stochastic setting. On each round, our algorithm computes an upper confidence bound (UCB) [8, 37] on the revenue for each price curve in the discretization previously developed; we then choose the price curve with the highest UCB. There are two challenges in realizing this scheme: First, naively maintaining UCBs for each price leads to large confidence intervals, and hence large regret as the size of the discretization is still quite large; instead, we construct confidence intervals on estimates of the type distribution, and translate them to UCBs for the revenue. Second, due to the asymmetric nature of the feedback, the construction and analysis of these confidence intervals is delicate, and requires novel ideas. As summarized in Table 2, this algorithm achieves a $\widetilde{\mathcal{O}}(m\sqrt{T})$ bound on the regret for any discretization scheme, including those from prior work. In the stochastic setting, the key advantage of our discretization schemes is computational.

3. Learning to price in the adversarial setting. Next, we study learning in an adversarial setting. Our algorithm builds on the Follow-the-Perturbed-Leader (FTPL) algorithm [30], but is adapted to account for the fact that feedback may be missing on some rounds. For this, we use the information we have about the valuation curves to keep track of which customers would not have made a purchase given a price curve. If a purchase is made and we observe feedback, we use the usual FTPL update, but if not, we reward each pricing curve with the sum of revenue of all types that would not purchase in that current round. Table 2 shows the regret and time complexity of this learning method when paired with various discretization schemes. In the adversarial setting, our discretization schemes offer both computational and statistical advantages when compared to prior work.

Setting	Assumptions	Regret bound	Complexity per iteration	Reference
Stochastic	M, F	$\widetilde{\mathcal{O}}\left(m\sqrt{T}\right)$	$\widetilde{\mathcal{O}}\left(\left(\frac{N}{m}\right)^{m}T^{\nicefrac{m}{2}}\right)$
	M, F, S		$\widetilde{\mathcal{O}}\left(\left(LT\right)^{m}\right)$	Theorem 4.1
	M, F, D		$\widetilde{\mathcal{O}}\left(J^{m}T^{\nicefrac{3m}{2}}\right)$
Adversarial	M, F	$\widetilde{\mathcal{O}}\left(m^{\nicefrac{3}{2}}\sqrt{T}\right)$	$\widetilde{\mathcal{O}}\left(\left(\frac{N}{m}\right)^{m}T^{\nicefrac{m}{2}}\right)$
	M, F, S		$\widetilde{\mathcal{O}}\left(\left(LT\right)^{m}\right)$	Theorem 5.1
	M, F, D		$\widetilde{\mathcal{O}}\left(J^{m}T^{\nicefrac{3m}{2}}\right)$

Discretization method	Assumptions	Complexity per iteration	Regret (Adversarial)
Hartline and Koltun [24]	F	$\widetilde{\mathcal{O}}(2^{N}\epsilon^{-N})$	$\widetilde{\mathcal{O}}(m\sqrt{TN})$
Chawla et al. [13]	M, F	$N^{\mathcal{O}\left(\epsilon^{-2}\log\epsilon^{-1}\right)}$	$\widetilde{\mathcal{O}}\left(mT^{\nicefrac{{3}}{{4}}}\right)$

Table 2: Comparison of regret and time complexity of our online learning methods when paired with our discretization schemes and schemes from prior work. See Table 1 for a description of the assumptions. All methods, including [24, 13], achieve $\mathcal{O}(m\sqrt{T})$ regret in the stochastic setting.
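The complexity column of Table 2 is consistent with substituting $\epsilon=T^{-1/2}$ into the discretization sizes of Table 1. This choice of $\epsilon$ is inferred from the two tables rather than stated in this section, so the following is only a sanity check (log factors hidden by $\widetilde{\mathcal{O}}$, and the first line is up to the $m^{-m}$ factor from the sharper count in Theorem 3.1):

```latex
\begin{align*}
\text{(M, F)}    &: \quad N^{m}\epsilon^{-m}  \;=\; N^{m}\,T^{m/2},\\
\text{(M, F, S)} &: \quad L^{m}\epsilon^{-2m} \;=\; (LT)^{m},\\
\text{(M, F, D)} &: \quad J^{m}\epsilon^{-3m}\log^{m}N \;=\; J^{m}\,T^{3m/2}\log^{m}N .
\end{align*}
```

Under the same assumed choice, an $\mathcal{O}(\epsilon)$ per-round approximation error accumulates to $\mathcal{O}(\epsilon T)=\mathcal{O}(\sqrt{T})$ over $T$ rounds, which does not exceed the $\sqrt{T}$ rate in the regret column.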

1.2 Related work

Dynamic pricing. The online posted-price mechanism, also known as dynamic pricing, is a central research area in algorithmic market design [32, 18]. In the most classical setting [32], the seller sets a price for an item in each round, and a buyer purchases the item only if their valuation exceeds the posted price. While several extensions of this setting have been explored for both parametric [31, 19, 11, 27, 28, 45] and non-parametric [10, 43, 16, 38, 39] demands, most focus on single-parameter demands, i.e., selling a single item to buyers. Our data pricing problem is multi-parameter, as demands are parameterized by multiple outcomes, i.e. the number of data points.

Bayesian unit-demand pricing problem. Formally, our data pricing problem is a variant of the Bayesian Unit-demand Pricing Problem (BUPP) [12]. BUPP addresses the problem of (offline) revenue maximization over a known distribution of unit-demand buyers, meaning they want to buy at most one item from the inventory. In BUPP, a seller has $N$ distinct items to sell to a unit-demand buyer whose valuations are $v=(v_{1},\dots,v_{N})$, where $v_{i}$ is the value of the $i$th item. Given prices $\{p_{i}\}_{i\in[N]}$, the unit-demand buyer purchases a single item $i\in[N]$ that maximizes their utility $v_{i}-p_{i}$. Assuming the valuation profile $v$ follows a known distribution $D$, the goal of BUPP is to find the best prices $\{p_{i}\}_{i\in[N]}$ that maximize the seller’s expected revenue.

Our data pricing problem is a variant of BUPP in two ways: (1) We study the sequential setting where type distributions are unknown, while valuation profiles for each type are known, and (2) We assume monotonic values $v_{1}\leq\dots\leq v_{N}$ , which is natural in data pricing. Unfortunately, BUPP is a computationally intractable problem, as is ours. BUPP is known to be NP-hard even when $D$ is a product distribution [15]. Moreover, even assuming that values are monotonic (i.e., $v_{1}\leq\dots\leq v_{N}$ ), the problem remains (strongly) NP-hard [13]. Therefore, we aim to provide a reasonably efficient no-regret algorithm for our problem, especially when the number of types $m$ is a fixed constant.

The previous works most relevant to our paper are Hartline and Koltun [24] and Chawla et al. [13], which study offline revenue maximization for unit-demand buyers. Buyers in our problem are also unit-demand, as each amount of data points can be seen as an individual item. Revenue maximization for unit-demand buyers is known to be computationally intractable [23], even with ordered (monotonic) buyer values [13], leading these works to focus on approximation algorithms. Hartline and Koltun [24] proposed an approximation algorithm with near-linear runtime in the number of buyers, given a fixed number of items. Chawla et al. [13] introduced a polynomial-time approximation scheme (PTAS) for unit-demand buyers with monotonic values. In this work, we extend the framework to the online setting with partial feedback, which has more practical implications.

Market design for data-sharing. In recent years, there has been a plethora of work devoted to algorithmic market design for data sharing [6, 7, 29, 42]. These works provide ingenious solutions to challenges unique to the data market, such as free replicability and the difficulty of valuation due to the combinatorial nature of data. Except for Agarwal et al. [6], the above-cited solutions are inherently offline or single-shot. While we focus on a simplified yet relevant setting where data comes from a single source, resulting in monotonic valuations, in this work, we tackle the problem in a sequential, dynamic setting, which has practical importance. In contrast to our approach, Agarwal et al. [6] considered the price to be a constant (i.e., a scalar rather than a price vector) to address the inherent computational intractability of multi-dimensional pricing. Instead, we maintain the price as a vector (i.e., a price function) but focus on cases where the valuation function satisfies natural properties such as monotonicity, smoothness, and diminishing returns.

2 Problem setting, assumptions, and challenges

A seller has $N$ homogeneous data points. There are $m$ types of buyers who wish to purchase this data. A buyer of type $i\in[m]$ has a valuation curve $v_{i}:[N]\rightarrow[0,1]$ , where $v_{i}(n)$ is her value for $n$ data points. We will assume $v_{i}(n)$ is non-decreasing as more data is valuable, and further that $v_{i}(0)=0$ .

Example 1.

To motivate this model, consider a seller with $N$ ordered data points $\{x_{1},\dots,x_{N}\}$, drawn i.i.d. from a distribution $D$. If a buyer purchases $n$ points, she receives the first $n$ points, $X_{n}=\{x_{1},\dots,x_{n}\}$. Her ex-post value $\widetilde{v}_{i}(X_{n})$ may represent the accuracy of her ML model trained with $X_{n}$. However, as the buyer has not seen the data before the purchase, she does not know which specific points she will receive, and hence her (ex-ante) value $v_{i}(n)=\mathbb{E}_{X_{n}}[\widetilde{v}_{i}(X_{n})]$ is the expected model accuracy when $n$ i.i.d. points are drawn from $D$. The different types could be buyers who use the data for different tasks or models. For instance, with ImageNet [20], which has $N\approx 1.4$ million data points, different types of buyers could perform different learning tasks such as object detection, identification, and segmentation, and/or train different models such as AlexNet [35], ResNet [25], and GoogLeNet [41]. Both empirically and theoretically, for many learning tasks, $v_{i}(n)$ is non-decreasing, and satisfies additional characteristics such as smoothness and/or diminishing returns.

Pricing curves, buyer utility, and buyer purchase model. Let $p:[N]\rightarrow[0,1]$ be a pricing curve chosen by the seller. Let $\mathcal{P}\stackrel{\Delta}{=}\{p:[N]\rightarrow[0,1]:\;p(0)=0\}$ denote the set of all pricing curves. If a buyer purchases $n$ points, her utility is $u_{i}(n)=v_{i}(n)-p(n)$. If a buyer can achieve non-negative utility, i.e. $v_{i}(n)\geq p(n)$ for some $n\in[N]$, she will purchase an amount of data to maximize her utility. To fully specify the buyer’s purchase model, we will assume that when multiple values of $n$ maximize her utility, she will choose the largest such $n$. Formally, for a given pricing curve $p$, a buyer of type $i$ will purchase $n_{i,p}$ points where,

$$n_{i,p}\stackrel{\Delta}{=}\begin{cases}\;0&\text{if $v_{i}(n)<p(n)$ for all $n\in[N]$,}\\ \;\max\big\{\mathop{\mathrm{argmax}}_{n\in[N]}\left(v_{i}(n)-p(n)\right)\big\}&\text{otherwise}.\end{cases}\tag{1}$$
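The purchase rule (1) can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `purchase_amount` and the list encoding (index $n$ holds the value or price of $n$ points, with index 0 equal to 0) are assumptions, and the "no purchase" test is taken over $n=1,\dots,N$.

```python
from typing import Sequence


def purchase_amount(v: Sequence[float], p: Sequence[float]) -> int:
    """Return n_{i,p}: the amount a buyer with valuation v buys under prices p.

    Assumed encoding: v[n] and p[n] are the value and price of n points,
    for n = 0..N, with v[0] = p[0] = 0.
    """
    amounts = range(1, len(v))
    best = max(v[n] - p[n] for n in amounts)
    if best < 0:
        return 0  # every positive amount has negative utility: no purchase
    # Ties are broken in favour of the largest utility-maximizing amount.
    return max(n for n in amounts if v[n] - p[n] == best)


v = [0.0, 0.25, 0.5, 0.625, 0.75]   # non-decreasing valuation curve
p = [0.0, 0.25, 0.25, 0.5, 0.5]     # posted price curve
print(purchase_amount(v, p))                  # 4 (utility 0.25 at n = 2 and n = 4)
print(purchase_amount(v, [0, 1, 1, 1, 1]))    # 0 (all utilities negative)
```

In the first call the utilities for $n=1,\dots,4$ are $0, 0.25, 0.125, 0.25$, so the tie between $n=2$ and $n=4$ is resolved towards the larger amount.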

Optimal revenue. It follows that the revenue from a buyer of type $i$ is $p(n_{i,p})$. Let $q=(q_{1},\dots,q_{m})$ be the distribution of the buyers. Under this distribution $q$, we define the expected revenue $\mathrm{rev}(p)$ for a price curve $p$, the optimal price $p^{\textrm{\tiny OPT}}$, and the optimal revenue $\mathrm{OPT}$ as follows:

$$\mathrm{rev}(p)\stackrel{\Delta}{=}\sum_{i=1}^{m}q_{i}\cdot p(n_{i,p}),\qquad p^{\textrm{\tiny OPT}}\stackrel{\Delta}{=}\mathop{\mathrm{argmax}}_{p\in\mathcal{P}}\mathrm{rev}(p),\qquad\mathrm{OPT}\stackrel{\Delta}{=}\mathrm{rev}(p^{\textrm{\tiny OPT}}).\tag{2}$$
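As a small worked instance of $\mathrm{rev}(p)$ in (2), the sketch below evaluates two price curves against two buyer types. The helper names, the list encoding (index $n$ holds the value or price of $n$ points), and the toy numbers are all illustrative assumptions.

```python
def purchase_amount(v, p):
    """Largest utility-maximizing n in 1..N, or 0 if all utilities are negative."""
    amounts = range(1, len(v))
    best = max(v[n] - p[n] for n in amounts)
    if best < 0:
        return 0
    return max(n for n in amounts if v[n] - p[n] == best)


def revenue(p, valuations, q):
    """Expected revenue rev(p) = sum_i q_i * p(n_{i,p})."""
    return sum(q_i * p[purchase_amount(v_i, p)] for v_i, q_i in zip(valuations, q))


valuations = [
    [0.0, 0.25, 0.5, 0.5],   # type 1: value saturates after 2 points
    [0.0, 0.5, 0.75, 1.0],   # type 2: keeps gaining from more data
]
q = [0.5, 0.5]               # type distribution

p_flat = [0.0, 0.25, 0.5, 0.5]     # both types buy 3 points for 0.5
p_tiered = [0.0, 0.25, 0.5, 0.75]  # type 1 buys 2 for 0.5, type 2 buys 3 for 0.75

print(revenue(p_flat, valuations, q))    # 0.5
print(revenue(p_tiered, valuations, q))  # 0.625
```

The tiered curve earns more because it charges the high-value type for the third point without pricing the low-value type out of the market.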

We have omitted the dependence on $q$ in $\mathrm{rev}$, $p^{\textrm{\tiny OPT}}$, and $\mathrm{OPT}$. There is no closed-form solution for the optimal pricing curve, even when $q$ is known. Therefore, in §3, we explore discretization methods to approximate $p^{\textrm{\tiny OPT}}$, which will then be used in §4 and §5 to develop online learning algorithms. Unfortunately, the size of this discretization can be very large in $N$ and $m$ without further assumptions. Therefore, we also consider two additional conditions that data commonly satisfies.

Our first such assumption states that buyer valuation curves satisfy a Lipschitz-like smoothness condition with Lipschitz constant $L/N$ . We use $L/N$ instead of $L$ since the number of data has a range $[0,N]$ , while the valuations only have a range $[0,1]$ . This condition states that a buyer’s valuation does not change significantly if she only purchases a few additional points.

Assumption 1 (Smoothness, S).

For all types $i\in[m]$ and all $n,n^{\prime}\in[N]$, we have $\;v_{i}(n+n^{\prime})-v_{i}(n)\leq\frac{L}{N}n^{\prime}$.

Our second condition is based on the fact that data typically exhibits diminishing returns [34, 33]. This means that an additional data point is more valuable when there is less data, i.e. $v_{i}(n+1)-v_{i}(n)$ is decreasing with $n$ . We will in fact make a stronger assumption, and justify it below.

Assumption 2 (Diminishing returns, D).

There exists some $J>0$ such that, for all types $i\in[m]$ , and for all $n\in[N]$ , we have $v_{i}(n+1)-v_{i}(n)\leq\frac{J}{n}$ .

Assumption 2 quantifies the rate of decrease of diminishing returns. Following Example 1, the valuation (accuracy) curves for many learning problems take the form $v_{i}(n)=\alpha-\beta n^{-\gamma}$ ; for instance, for binary classification in a VC class $\mathcal{H}$ , $\alpha$ may be the best accuracy in $\mathcal{H}$ , $\beta\in\mathcal{O}({\sqrt{d_{\mathcal{H}}}})$ where $d_{\mathcal{H}}$ is the VC dimension, and $\gamma=1/2$ [40]; similarly, for nonparametric regression of a twice differentiable function, $\alpha$ and $\beta$ are constants while $\gamma=2/5$ [44]. In such cases, Assumption 2 is satisfied with $J=\beta\gamma$ . Note that neither assumption subsumes the other: a non-concave Lipschitz function will not satisfy Assumption 2, while a suitable $L$ for a function which satisfies Assumption 2 may need to be very large for Assumption 1 to hold for small $n$ .
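The claim that $J=\beta\gamma$ suffices for curves of the form $v_{i}(n)=\alpha-\beta n^{-\gamma}$ can be checked with a one-line integral bound. The sketch below assumes $\beta\geq 0$, $\gamma>0$, and $n\geq 1$:

```latex
\begin{align*}
v_{i}(n+1)-v_{i}(n)
  &= \beta\left(n^{-\gamma}-(n+1)^{-\gamma}\right)
   = \beta\gamma\int_{n}^{n+1}x^{-\gamma-1}\,dx \\
  &\leq \beta\gamma\,n^{-\gamma-1}
   \;\leq\; \frac{\beta\gamma}{n}.
\end{align*}
```

The first inequality uses that $x^{-\gamma-1}$ is decreasing on $[n,n+1]$, and the second uses $n^{-\gamma}\leq 1$ for $n\geq 1$.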

2.1 Learning to price in online settings

In this work, we will also study how a seller may learn to maximize revenue. In our learning problem, the seller knows the valuation curves $\{v_{i}\}_{i}$ of each type, but either does not know the distribution of types (stochastic setting) or faces a buyer sequence that follows no distribution at all (adversarial setting).

Setup. The seller repeats the data market for $T$ rounds. At the beginning of each round, he chooses some price curve $p_{t}\in\mathcal{P}$ . After the seller has chosen $p_{t}$ , a new buyer of type $i_{t}\in[m]$ appears and purchases $n_{t}=n_{i_{t},p_{t}}$ amount of data (see (1)). The buyer is aware of her own valuation curve. If she makes a purchase, that is if $n_{t}>0$ , she pays $p_{t}(n_{t})$ to the seller and reveals her type $i_{t}$ . Otherwise, the buyer will make no payment and not reveal her type.

We have assumed that a priori, the seller is aware of the buyer valuation curves $\{v_{i}\}_{i\in[m]}$, and that buyers are aware of their own valuation curves. In Example 1, a seller can profile how different machine learning models perform with different amounts of data and publish them ahead of time. The buyers can also gauge their value from these curves, even though they do not have access to the data. Next, we have also assumed that buyers will reveal their type after the purchase. In modern machine-learning-as-a-service platforms [1, 17, 4], buyers directly run their jobs in the seller’s computing platform, so the seller can observe the buyer’s job type directly. Even if this is not the case, sellers can elicit this information via questionnaires and reviews from customers who have made a purchase [22].

Challenges. Despite these assumptions, the learning problem remains challenging for two main reasons. First, the space of price curves is vast: discretizing the valuations in $[0,1]$ into $K$ bins, still leaves $\mathcal{O}(K^{N})$ possible price curves, which is both statistically and computationally intractable, especially for large $N$ . Second, in addition to the exploration-exploitation trade-off usually encountered in sequential decision-making, the seller faces a tension between high instantaneous revenue and information acquisition: setting high prices can yield high immediate revenue if a purchase occurs, but it also increases the risk of no purchase, resulting in no revenue and crucially no feedback about the buyer type which could help him in future rounds. This trade-off was recently studied for single-item markets in a stochastic setting [22, 46], but is more complex in our multi-item problem. Moreover, to our knowledge, no existing work addresses this asymmetric feedback model in an adversarial setting, even for single-item markets. Next, we describe the buyer arrival model and define the regret for the learning problem in both stochastic and adversarial settings.

Stochastic setting. Here, there is some fixed but unknown distribution of types $q$ . On each round, a buyer of type $i_{t}\sim q$ is drawn independently. The optimal expected revenue $\mathrm{OPT}$ under type distribution $q$ is as defined in (2). The regret $R_{T}$ is as defined below. We wish to design algorithms which have small expected regret $\mathbb{E}[R_{T}]$ , where the expectation accounts for both the sampling of types $i_{t}\sim q$ and any randomness in the algorithm. We have,

$$R_{T}\;\stackrel{\Delta}{=}\;T\cdot\mathrm{OPT}\,-\,\sum_{t=1}^{T}p_{t}(n_{t})\;=\;T\cdot\mathrm{OPT}\,-\,\sum_{t=1}^{T}p_{t}(n_{i_{t},p_{t}}).\tag{3}$$

Adversarial setting. Here, the types on each round $\{i_{t}\}_{t=1}^{T}$ are chosen arbitrarily, possibly by an oblivious adversary, ahead of time. The type on round $t$ is revealed to the seller only at the end of the round, and only if there is a purchase. In the adversarial setting, we define our regret $R_{T}$ with respect to the single best price in $\mathcal{P}$ in hindsight. We wish to design algorithms with small expected regret $\mathbb{E}[R_{T}]$ , where the expectation is with respect to any randomness in the algorithm. We have,

$$R_{T}\;\stackrel{\Delta}{=}\;\max_{p\in\mathcal{P}}\sum_{t=1}^{T}p(n_{i_{t},p})\,-\,\sum_{t=1}^{T}p_{t}(n_{i_{t},p_{t}}).\tag{4}$$

3 Efficient discretization of price curves with small errors

We first study the revenue maximization problem in the offline setting, where the seller knows both the valuation curves $v_{i},i\in[m]$ , and the type distribution $q$ . Our goal is to design a discretization so as to achieve revenue within a gap of $\mathcal{O}(\epsilon)$ from $\mathrm{OPT}$ . Before discussing our discretization algorithms, we first show that the optimal pricing curve is “simple” when there are at most $m$ types.

Lemma 3.1.

Assume there are $m$ types with non-decreasing value curves $\{v_{i}\}_{i\in[m]}$ . For any non-decreasing price curve $p$ , there exists an “ $m$ -step” price curve $\bar{p}$ that yields expected revenue at least that of $p$ with respect to any distribution over the $m$ types. Here, $m$ -step refers to non-decreasing functions $f:[N]\rightarrow[0,1]$ where $f(n+1)-f(n)>0$ in at most $m$ points (i.e., at most $m$ jumps).
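To make the statement of Lemma 3.1 concrete, the sketch below flattens a non-decreasing price curve so that it only jumps right after an amount that some type actually buys. This is one natural construction chosen for illustration and is not necessarily the one used in the proof in Appendix A.1; the names `to_m_step` and `num_jumps`, the list encoding (index $n$ holds the value or price of $n$ points), and the toy numbers are assumptions.

```python
def purchase_amount(v, p):
    """Largest utility-maximizing n in 1..N, or 0 if all utilities are negative."""
    amounts = range(1, len(v))
    best = max(v[n] - p[n] for n in amounts)
    if best < 0:
        return 0
    return max(n for n in amounts if v[n] - p[n] == best)


def to_m_step(p, valuations):
    """Illustrative flattening of a non-decreasing curve p (with p[0] = 0).

    Each amount n is charged the price of the smallest purchased amount >= n;
    amounts beyond the largest purchased amount keep that amount's price.
    """
    N = len(p) - 1
    bought = sorted({purchase_amount(v, p) for v in valuations} - {0}) or [N]
    p_bar = [0.0]
    for n in range(1, N + 1):
        tier = next((s for s in bought if s >= n), bought[-1])
        p_bar.append(p[tier])
    return p_bar


def num_jumps(p):
    """Number of n with p(n+1) - p(n) > 0."""
    return sum(1 for n in range(len(p) - 1) if p[n + 1] > p[n])


valuations = [
    [0.0, 0.25, 0.5, 0.5, 0.5],    # type 1
    [0.0, 0.25, 0.5, 0.75, 1.0],   # type 2
]
p = [0.0, 0.125, 0.25, 0.5, 0.75]  # 4 jumps; type 1 buys 2, type 2 buys 4
p_bar = to_m_step(p, valuations)   # [0.0, 0.25, 0.25, 0.75, 0.75], 2 jumps

for v in valuations:
    # Each type pays at least as much under p_bar as under p.
    print(p[purchase_amount(v, p)], p_bar[purchase_amount(v, p_bar)])
```

In this example both types buy the same amounts and pay the same prices under `p_bar` as under `p`, while the number of jumps drops from 4 to $m=2$.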

Lemma 3.1, proven in Appendix A.1, will be an important tool in all three discretization algorithms of this section. It will allow us to reduce the space of pricing curves as we only need to focus on $m$ -step price curves. Next, we present our first discretization procedure in Algorithm 1, which only assumes the monotonicity of the valuation curves.

Discretization scheme under monotonic valuations. Our discretization procedure, outlined in Algorithm 1, adapts the method in Hartline and Koltun [24] using Lemma 3.1. For this, we will first construct a discretization $W$ of the valuation space as follows. Let $Z_{i}=\epsilon(1+\epsilon)^{i}$, $i=0,1,\dots,\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil$, be the powers of $(1+\epsilon)$ on the price space $\left[\epsilon,1\right]$. For each $i$, we let $W_{i}$ be a uniform discretization of the interval $\left[Z_{i-1},Z_{i+1}\right)$ with gap $Z_{i-1}\cdot\frac{\epsilon}{m}$. Finally, let $W$ be the union of all such $W_{i}$. According to Lemma 3.1, for every non-decreasing price function in $\mathcal{P}$ there is an $m$-step function with at least as much revenue. We set $\overline{\mathcal{P}}$ to be all choices of non-decreasing $m$-step functions that take values in $W$. We have the following theorem about Algorithm 1, which we prove in Appendix A.2.

Algorithm 1 Price discretization scheme under monotonicity

Given: Approximation parameter $\epsilon>0$.

Let $W$ be a discretization of the valuation space $[0,1]$ defined as follows,

$$Z_{i}\stackrel{\Delta}{=}\epsilon(1+\epsilon)^{i},\quad\forall\;i\in\left\{0,1,\dots,\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil\right\},$$

$$W_{i}\stackrel{\Delta}{=}\left\{Z_{i-1}+Z_{i-1}\cdot\frac{\epsilon k}{m};\;\;\forall\,k\in\{1,2,\dots,\left\lceil(2+\epsilon)m\right\rceil\}\right\},\quad W\stackrel{\Delta}{=}\bigcup_{i=1}^{\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil}W_{i}.$$

Set $\overline{\mathcal{P}}$ to be the class of all “$m$-step” functions mapping $[N]\rightarrow W$.
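The valuation grid $W$ of Algorithm 1 can be generated directly from its definition. The sketch below is illustrative: the function name `valuation_grid` is an assumption, and it follows the formulas as printed, which means a few grid points can land slightly above 1; the sketch keeps them rather than guessing how they should be handled.

```python
import math


def valuation_grid(eps: float, m: int) -> list:
    """Build W = union of W_i, i = 1..ceil(log_{1+eps}(1/eps)), as in Algorithm 1."""
    top = math.ceil(math.log(1 / eps) / math.log(1 + eps))
    Z = [eps * (1 + eps) ** i for i in range(top + 1)]   # geometric anchors Z_0..Z_top
    K = math.ceil((2 + eps) * m)                         # grid points per W_i
    W = set()
    for i in range(1, top + 1):
        base = Z[i - 1]
        for k in range(1, K + 1):
            W.add(base + base * eps * k / m)             # uniform gap Z_{i-1} * eps / m
    return sorted(W)


W = valuation_grid(eps=0.5, m=2)
print(len(W), W[0], W[-1])
```

The grid has at most $\lceil(2+\epsilon)m\rceil\cdot\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\rceil$ points, since consecutive $W_{i}$ overlap and duplicates are merged.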

Theorem 3.1.

Consider the discretization $\overline{\mathcal{P}}$ as constructed in Algorithm 1. For any type distribution, there exists $p\in\overline{\mathcal{P}}$ such that $\mathrm{rev}(p)\geq\mathrm{OPT}-\mathcal{O}(\epsilon)$. Moreover, we have $|\overline{\mathcal{P}}|\leq\left(\frac{e(N-1)}{m}\right)^{m}\left(e\lceil(2+\epsilon)\rceil\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil\right)^{m}\in\widetilde{\mathcal{O}}\left(\left(\frac{N}{\epsilon}\right)^{m}\right)$.

Discretization scheme for smooth monotonic valuations. Due to space constraints, we present our algorithm under Assumption 1 (Algorithm 5) in Appendix A.3. We have the following theorem about Algorithm 5.

Theorem 3.2.

Consider the discretization $\overline{\mathcal{P}}$ as constructed in Algorithm 5. Under Assumption 1, for any type distribution, there exists $p\in\overline{\mathcal{P}}$ such that $\mathrm{rev}(p)\geq\mathrm{OPT}-\mathcal{O}(\epsilon)$. Moreover, $|\overline{\mathcal{P}}|\in\mathcal{O}\left(\log^{m}_{1+\epsilon}\left(1/\epsilon\right)\cdot\left(L/\epsilon\right)^{m}\right)\in\widetilde{\mathcal{O}}\left(\left(\frac{L}{\epsilon^{2}}\right)^{m}\right)$.

Algorithm 2 Price discretization scheme for monotonic valuations under diminishing returns

Given: Diminishing returns constant $J$, approximation parameter $\epsilon$.

Let $W\stackrel{\Delta}{=}\bigcup_{i=2}^{\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil}W_{i}$, where the $W_{i}$s are the same as in Algorithm 1.

Let $N_{\textbf{D}}$ be the discretization of the interval $[0,N]$ defined as follows,

$\displaystyle Y_{i}\stackrel{\Delta}{=}\left\lfloor\frac{2Jm}{\epsilon^{2}}(1+\epsilon^{2})^{i}\right\rfloor,\quad i=0,1,\dots,\left\lceil\log_{1+\epsilon^{2}}\left(\frac{N\epsilon^{2}}{2Jm}\right)\right\rceil,$

$\displaystyle Q_{i}\stackrel{\Delta}{=}\left\{\left\lfloor Y_{i}+Y_{i}\cdot\frac{\epsilon^{2}k}{2Jm}\right\rfloor,\ k=0,1,\dots,\left\lfloor 2Jm\right\rfloor\right\},\quad Q\stackrel{\Delta}{=}\bigcup_{i=1}^{\left\lceil\log_{1+\epsilon^{2}}\left(\frac{N\epsilon^{2}}{2Jm}\right)\right\rceil}Q_{i},$

$\displaystyle N_{\textbf{D}}\stackrel{\Delta}{=}\left\{1,2,\dots,\left\lfloor\frac{2Jm}{\epsilon^{2}}\right\rfloor\right\}\cup Q.$

The discretized price set $\overline{\mathcal{P}}$ is the class of all “$m$-step” price curves in the function space $N_{\textbf{D}}\to W$.

Discretization scheme for monotone valuations under diminishing returns. Finally, we study discretization schemes under the diminishing returns condition. Our procedure, outlined in Algorithm 2, proceeds as follows. We use the same discretization $W$ of the valuation space from Algorithm 1. Next, we will discretize the data space $[N]$. To exploit the structure in the diminishing returns condition, we will need to do so more densely when $n$ is small. For this, let $Y_{i}=\frac{2Jm}{\epsilon^{2}}(1+\epsilon^{2})^{i}$, $i=0,\dots,\lceil\log_{1+\epsilon^{2}}\frac{N\epsilon^{2}}{2Jm}\rceil$, be the powers of $(1+\epsilon^{2})$ on the data space $\left[\frac{2Jm}{\epsilon^{2}},N\right]$. For each $i$, the set $Q_{i}$ further partitions the interval $[Y_{i},Y_{i+1})$ uniformly with gap $Y_{i}\cdot\frac{\epsilon^{2}}{2Jm}$. For $n$ smaller than $\frac{2Jm}{\epsilon^{2}}$, we keep every value of $n$, as the valuations may change rapidly when $n$ is small. Let $N_{\textbf{D}}$ be the union of $\left\{1,2,\dots,\left\lfloor\frac{2Jm}{\epsilon^{2}}\right\rfloor\right\}$ and all the sets $Q_{i}$. Therefore, $N_{\textbf{D}}$ has a size of at most $\frac{2Jm}{\epsilon^{2}}+2Jm\lceil\log_{1+\epsilon^{2}}\frac{N\epsilon^{2}}{2Jm}\rceil$. We have the following theorem about Algorithm 2, which we prove in Appendix A.5.

Theorem 3.3.

Consider the discretization $\overline{\mathcal{P}}$ as constructed in Algorithm 2. Under Assumption 2, for any type distribution, there exists $p\in\overline{\mathcal{P}}$ such that $\mathrm{rev}(p)\geq\mathrm{OPT}-\mathcal{O}(\epsilon)$ . Moreover,

\displaystyle|\overline{\mathcal{P}}|\in\mathcal{O}\left(\left(\frac{J}{\epsilon^{2}}\right)^{m}\log^{m}\left(\frac{N\epsilon^{2}}{Jm}\right)\cdot\left(\log^{m}_{1+\epsilon}1/\epsilon\right)\right)\in\widetilde{\mathcal{O}}\left(\left(\frac{J}{\epsilon^{3}}\right)^{m}\right).
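The data-space grid $N_{\textbf{D}}$ of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the name `build_data_grid` is hypothetical, the sketch also includes level $i=0$ so that the interval $[Y_{0},Y_{1})$ is covered, and it discards grid points above $N$.

```python
import math

def build_data_grid(N: int, J: float, m: int, eps: float) -> list[int]:
    """Sketch of N_D: every quantity below 2Jm/eps^2, then a geometric grid up to N."""
    base = 2.0 * J * m / eps**2                       # threshold 2Jm / eps^2
    dense = set(range(1, min(N, math.floor(base)) + 1))
    if base >= N:
        return sorted(dense)                          # nothing left to coarsen
    levels = math.ceil(math.log(N / base) / math.log(1.0 + eps**2))
    k_max = math.floor(2.0 * J * m)
    Q = set()
    for i in range(levels + 1):                       # assumption: start at i = 0
        Y = math.floor(base * (1.0 + eps**2) ** i)    # Y_i
        for k in range(k_max + 1):
            n = math.floor(Y + Y * eps**2 * k / (2.0 * J * m))
            if n <= N:                                # assumption: clip at N
                Q.add(n)
    return sorted(dense | Q)

# Example: a million data points collapse to a few hundred candidate quantities.
grid = build_data_grid(10**6, 1.0, 2, 0.5)
print(len(grid))
```

The grid grows only logarithmically in $N$, which is why the bound in Theorem 3.3 depends on $N$ only through a logarithmic factor.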

Proof outline. By Lemma 3.1, we may assume the optimal price curve $p^{\star}=\left\{(n^{\star}_{i},p^{\star}_{i})\right\}_{i=1}^{m}$ is an $m$-step function, where $p^{\star}_{i}$ denotes the value of $p^{\star}$ on step $i$. We generate an $m$-step price curve $p=\left\{(n_{i},p_{i})\right\}_{i=1}^{m}$ on the space $N_{\textbf{D}}\to W$ such that $n_{i}$ is obtained by rounding down $n^{\star}_{i}$ to the closest value in $N_{\textbf{D}}$, and $p_{i}\geq p^{\star}_{i}/(1+\epsilon)$. We then show that if a buyer purchases at step $i$ under price $p^{\star}$, she will not purchase at step $j<i$ under the new price $p$. Therefore, the revenue from this buyer is at least $p_{i}\geq p^{\star}_{i}/(1+\epsilon)=p^{\star}_{i}-\mathcal{O}(\epsilon)$, which ensures that $\mathrm{rev}(p)\geq\mathrm{OPT}-\mathcal{O}(\epsilon)$.

4 Online learning in the stochastic setting

We now study the online learning problem outlined in §2.1 in the stochastic setting. Our algorithm, outlined in Algorithm 3, is based on the classical upper confidence bound (UCB) algorithm for stochastic bandits [8, 37]. It takes a discretization $\overline{\mathcal{P}}$ of the pricing curves as input, and on each round chooses a $p_{t}\in\overline{\mathcal{P}}$ which has the largest UCB on the revenue.

The key technical challenge in realizing this scheme is in the construction of the UCB. As $\overline{\mathcal{P}}$ is large, naively constructing our UCBs over prices in $\overline{\mathcal{P}}$ will lead to a $\log|\overline{\mathcal{P}}|$ term in the UCB (say, when applying a union bound), and hence the regret. Instead, we will maintain UCBs for the type distribution, which will only have a $\log(m)$ term, and translate them to UCBs for the revenue. However, as we will see below, the analysis when constructing the UCB this way is nontrivial since we observe the types only if they make a purchase. In particular, our UCB depends on the number of times a buyer could have purchased at a given round, which is a random quantity that depends on the algorithm itself. We will first outline how we construct the UCBs.

Construction of UCB. We will now show how to construct the upper confidence bound $\widehat{\mathrm{rev}}_{t}$ at the end of round $t$ , which will be used in computing $p_{t+1}$ . For $\tau\leq t$ , let $S_{\tau}$ , defined below in (5), be the set of types who would have purchased in round $\tau$ at price $p_{\tau}$ had they appeared in that round. Then, for any type $i\in[m]$ , we define $T_{i,t}$ to be the number of times that type $i$ appears in set $S_{\tau}$ for $\tau\in\{1,\dots,t\}$ . That is, $T_{i,t}$ measures the number of times a buyer of type $i$ would have purchased during the first $t$ rounds. We have,

\displaystyle S_{\tau}\stackrel{\Delta}{=}\big\{i\in[m]:\exists\,n\in[N],\ v_{i}(n)-p_{\tau}(n)\geq 0\big\},\qquad T_{i,t}\stackrel{\Delta}{=}\sum_{\tau=1}^{t}\mathbbm{I}(i\in S_{\tau}).

(5)

Note that as we use the $0$ price function on round 1, i.e. $p_{1}(\cdot)=0$, every type belongs to $S_{1}$, and hence $T_{i,t}>0$ for all $t\geq 1$. Next, we estimate $q_{i}$ via the fraction of the rounds $\tau\in\{1,\dots,t\}$ with $i\in S_{\tau}$ in which a buyer of type $i$ appeared. We define this quantity, $\overline{q}_{i,t}$, below in (6). Via a standard application of Hoeffding’s inequality, we can show that $\left|q_{i}-\overline{q}_{i,t}\right|\leq\sqrt{(\log T)/T_{i,t}}$ with high probability. Using this, we can construct an upper confidence bound $\widehat{q}_{i,t}$ as follows,

\displaystyle\overline{q}_{i,t}\stackrel{\Delta}{=}\frac{1}{T_{i,t}}\sum_{\tau=1}^{t}\mathbbm{I}(i\in S_{\tau},i_{\tau}=i),\qquad\widehat{q}_{i,t}\stackrel{\Delta}{=}\overline{q}_{i,t}+\sqrt{\frac{\log T}{T_{i,t}}}.

(6)
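The estimates in (5) and (6) can be computed from the interaction history alone. The following Python sketch illustrates this under a few assumptions that are not part of the paper: types are 0-indexed, the history is stored as pairs of a set $S_{\tau}$ and the revealed type (or `None` when no purchase occurred), and the name `type_ucb` is hypothetical.

```python
import math

def type_ucb(history, m: int, T: int) -> list[float]:
    """Upper confidence bounds on q_i following (5) and (6).

    history: list of (S_tau, revealed) pairs, where S_tau is the set of types that
    would purchase at p_tau and revealed is the buyer's type, or None if the buyer
    did not purchase (in which case the type stays hidden).
    """
    counts = [0] * m   # T_{i,t}: rounds in which type i would have purchased
    hits = [0] * m     # rounds in which a type-i buyer arrived and purchased
    for S, revealed in history:
        for i in S:
            counts[i] += 1
        if revealed is not None:
            hits[revealed] += 1
    # counts[i] > 0 is assumed: round 1 uses the zero price, so S_1 contains every type.
    return [hits[i] / counts[i] + math.sqrt(math.log(T) / counts[i]) for i in range(m)]

# Example with m = 2 types over 3 rounds; the second round had no purchase.
history = [({0, 1}, 0), ({0}, None), ({0, 1}, 1)]
print(type_ucb(history, m=2, T=3))
```

Note that the confidence width for type $i$ shrinks with $T_{i,t}$, not with $t$: a type that is priced out in most rounds keeps a wide interval.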

Algorithm 3 Online data pricing in the stochastic setting.

Given: time horizon $T$, discretization $\overline{\mathcal{P}}$ of price curves.

Set $p_{1}$ to be the zero function. # Give data away for free on round 1.

A buyer of type $i_{1}\sim q$ arrives and purchases $N$ data points at price 0.

for $t=2,\dots,T$

Compute the UCB $\widehat{\mathrm{rev}}_{t-1}(p)$ on the revenue of $p$ for each $p\in\overline{\mathcal{P}}$. # See (5), (6), and (7).

Set $p_{t}=\mathop{\mathrm{argmax}}_{p\in\overline{\mathcal{P}}}\widehat{\mathrm{rev}}_{t-1}(p)$.

A buyer of type $i_{t}\sim q$ arrives, purchases $n_{i_{t},p_{t}}$ points, and pays $p_{t}(n_{i_{t},p_{t}})$.

end for

We now translate the UCBs on $q$ to the UCBs on the revenue. Recall from (1) that a buyer of type $i$ will purchase $n_{i,p}$ points at price $p$ and the revenue from this buyer will be $p(n_{i,p})$ . Note that as the seller has access to the valuation curves, he can compute $n_{i,p}$ for any $i$ and price curve $p$ . Since $\mathrm{rev}(p)=\mathbb{E}_{i\sim q}[p(n_{i,p})]$ , we have the following natural UCB for $\mathrm{rev}(p)$ on round $t$ :

\displaystyle\widehat{\mathrm{rev}}_{t}(p)\;\stackrel{\Delta}{=}\;\sum_{i=1}^{m}\widehat{q}_{i,t}\cdot p(n_{i,p}).

(7)
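Given upper confidence bounds on the type distribution, the revenue UCB in (7) and the price selection step can be sketched as below. The sketch makes several assumptions for illustration: curves are Python lists indexed by $n-1$ for $n=1,\dots,N$, a buyer picks the utility-maximizing quantity with ties broken towards the larger $n$ and does not purchase when every utility is negative, and the names `purchase`, `revenue_ucb`, and `select_price` are hypothetical.

```python
def purchase(v, p):
    """Return (n, payment) for valuation curve v and price curve p (lists over n = 1..N)."""
    best_n, best_u = 0, None
    for n in range(1, len(v) + 1):
        u = v[n - 1] - p[n - 1]
        if best_u is None or u >= best_u:   # ">=" breaks ties towards larger n
            best_n, best_u = n, u
    if best_u < 0:                          # assumed outside option: no purchase
        return 0, 0.0
    return best_n, p[best_n - 1]

def revenue_ucb(q_ucb, valuations, p):
    """Equation (7): sum over types of q_hat_i times the payment p(n_{i,p})."""
    return sum(q * purchase(v, p)[1] for q, v in zip(q_ucb, valuations))

def select_price(q_ucb, valuations, candidates):
    """Pick the candidate price curve with the largest revenue UCB."""
    return max(candidates, key=lambda p: revenue_ucb(q_ucb, valuations, p))

valuations = [[0.25, 0.5], [0.5, 1.0]]        # two types, N = 2
candidates = [[0.125, 0.25], [0.5, 0.75]]     # two candidate price curves
print(select_price([0.5, 0.75], valuations, candidates))
```

The loop over candidates is where the computational cost depends on the size of the discretization, while the statistical uncertainty enters only through the $m$ values $\widehat{q}_{i,t}$.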

This completes the description of our construction. The following theorem bounds the regret for Algorithm 3 when paired with any of the discretization schemes in §3. While the computational complexity of our method depends on $|\overline{\mathcal{P}}|$, the regret has no such dependence because of the above construction of the UCB. The proof is given in Appendix C.

Theorem 4.1.

Suppose in Algorithm 3 we use a discretization $\overline{\mathcal{P}}$ which is an $\mathcal{O}(1/\sqrt{T})$ additive approximation to any price curve. Then, the regret of Algorithm 3 satisfies $\mathbb{E}[R_{T}]\in\widetilde{\mathcal{O}}(m\sqrt{T})$.

Proof challenges. When bounding the regret, we first observe that the subsets $S\subset[m]$ induce a partitioning of the price curves, where $p$ belongs to the partition of $S$ if all types in $S$ would make a purchase at price $p$, and all types in $S^{c}$ would not make a purchase at price $p$. With this insight, we can view the action of a seller as not just choosing a price curve, but also choosing a set $S_{t}\subset[m]$. That is, $S_{t}$ can be viewed as a super-arm in a combinatorial semi-bandit problem [36].

5 Online learning in the adversarial setting

We now study the adversarial setting. Similar to the stochastic setting, our algorithm will use a discretization of the price curves from §3. We will control regret by bounding both the discretization error and the algorithm’s regret relative to the best pricing curve in the discretization.

Before proceeding, let us first contextualize our feedback model against prior work. If the buyers do not reveal their types, this becomes an adversarial bandit problem with $|\overline{\mathcal{P}}|$ arms (pricing curves) [32]. Using an algorithm such as EXP-3 [9] results in large $\widetilde{\mathcal{O}}(T^{\nicefrac{1}{2}}|\overline{\mathcal{P}}|^{\nicefrac{1}{2}})$ regret, which is not ideal due to $|\overline{\mathcal{P}}|$’s exponential dependence on $m$. Conversely, if buyers reveal their types regardless of purchase, this is equivalent to full information feedback, where algorithms such as Hedge or Follow-the-perturbed-leader (FTPL) [30] yield $\mathcal{O}(T^{\nicefrac{1}{2}}\log^{\nicefrac{1}{2}}|\overline{\mathcal{P}}|)$ regret, translating to $\widetilde{\mathcal{O}}((mT)^{\nicefrac{1}{2}})$ with our discretization schemes in §3. In our intermediate regime, where the type is revealed only upon purchase, we aim for a middle ground. We show that our algorithm, outlined in Algorithm 4, achieves $\widetilde{\mathcal{O}}(m^{\nicefrac{3}{2}}T^{\nicefrac{1}{2}})$ regret, which is worse than full information, but still depends polynomially on $m$.

Our algorithm takes a discretization $\overline{\mathcal{P}}$ and a perturbation parameter $\theta$ as input. First, it samples a random perturbation $\theta_{p}$ from an exponential distribution with pdf $\theta e^{-\theta x}$ for each pricing curve $p$ in $\overline{\mathcal{P}}$. It maintains rewards $\{r_{t}(p)\}_{t,p}$ for each round $t$ and price curve $p$. On each round $t$, it chooses the price curve that maximizes the perturbed cumulative reward of the previous rounds, $\sum_{\tau=1}^{t-1}r_{\tau}(p)+\theta_{p}$.

This scheme is similar to FTPL, but the key difference is in how we design the rewards $\{r_{t}(p)\}_{t,p}$ . To describe this, let $S_{t}$ , defined exactly as in (5), be the set of agents who would have purchased in round $t$ at price $p_{t}$ . At the end of the round, if there was a purchase, for all prices $p\in\overline{\mathcal{P}}$ , we set the reward to be $r_{t}(p)=p(n_{i_{t},p})$ , i.e. the payment we would have received from the buyer at that round, had the price been $p$ (see (1)). If there was no purchase, we know that $i_{t}\notin S_{t}$ , in which case we set $r_{t}(p)=\sum_{i\in S_{t}^{c}}p(n_{i,p})$ . In this case, $r_{t}(p)$ is an upper bound on $p(n_{i_{t},p})$ , and this upper bound is tight around prices similar to the chosen price $p_{t}$ ; in fact, $r_{t}(p_{t})=0$ if there was no purchase. Intuitively, $r_{t}(p)$ deals with the uncertainty of not knowing the type on round $t$ by providing a large reward (as we are taking the sum) to prices that could have resulted in a purchase, which encourages exploration of such prices in future rounds. This intuition will help us bound the regret.

Algorithm 4 Online data pricing in the adversarial setting.

Given: time horizon $T$, discretization $\overline{\mathcal{P}}$, perturbation parameter $\theta$.

For each $p\in\overline{\mathcal{P}}$, sample $\theta_{p}$ from an exponential distribution with pdf $\theta e^{-\theta x}$.

for $t=1,\dots,T$

Set price curve for the current round: $p_{t}=\mathop{\mathrm{argmax}}_{p\in\overline{\mathcal{P}}}\sum_{\tau=1}^{t-1}r_{\tau}(p)+\theta_{p}$.

A buyer of type $i_{t}$ arrives, purchases $n_{i_{t},p_{t}}$ points, and pays $p_{t}(n_{i_{t},p_{t}})$.

if $n_{i_{t},p_{t}}>0$ then Set $r_{t}(p)=p(n_{i_{t},p})$ for all $p\in\overline{\mathcal{P}}$. # If there was a purchase

else Set $r_{t}(p)=\sum_{i\in S_{t}^{c}}p(n_{i,p})$ for all $p\in\overline{\mathcal{P}}$. # See (5) for $S_{t}$

end if

end for
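The loop of Algorithm 4 and its two-case reward can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the buyer model in `purchase` (ties towards the larger quantity, no purchase when every utility is negative), the list-based representation of curves, and all function names are assumptions made for the example.

```python
import random

def purchase(v, p):
    """Return (n, payment); ties go to larger n, no purchase if all utilities < 0."""
    best_n, best_u = 0, None
    for n in range(1, len(v) + 1):
        u = v[n - 1] - p[n - 1]
        if best_u is None or u >= best_u:
            best_n, best_u = n, u
    return (0, 0.0) if best_u < 0 else (best_n, p[best_n - 1])

def round_rewards(candidates, valuations, p_t, revealed_type):
    """Reward r_t(p) for every candidate p, following the two cases of Algorithm 4."""
    if revealed_type is not None:          # purchase: the type is revealed
        v = valuations[revealed_type]
        return [purchase(v, p)[1] for p in candidates]
    # No purchase: the buyer's type lies in the complement of S_t.
    non_buyers = [v for v in valuations if purchase(v, p_t)[0] == 0]
    return [sum(purchase(v, p)[1] for v in non_buyers) for p in candidates]

def ftpl(candidates, valuations, type_sequence, theta, seed=0):
    """Run the perturbed-leader loop on a fixed sequence of buyer types."""
    rng = random.Random(seed)
    noise = [rng.expovariate(theta) for _ in candidates]   # pdf theta * exp(-theta x)
    totals = [0.0] * len(candidates)
    revenue = 0.0
    for i_t in type_sequence:
        idx = max(range(len(candidates)), key=lambda j: totals[j] + noise[j])
        p_t = candidates[idx]
        n, pay = purchase(valuations[i_t], p_t)
        revenue += pay
        revealed = i_t if n > 0 else None  # the seller sees the type only on purchase
        rewards = round_rewards(candidates, valuations, p_t, revealed)
        totals = [a + b for a, b in zip(totals, rewards)]
    return revenue

valuations = [[0.25, 0.5], [0.5, 1.0]]
candidates = [[0.125, 0.25], [0.5, 0.75]]
print(ftpl(candidates, valuations, [0, 1, 1, 0], theta=0.5))
```

In the no-purchase case, the reward of the chosen curve is zero while curves that some non-buying type would have accepted receive a positive reward, which is the exploration effect described above.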

Theorem 5.1 provides a bound on the regret for Algorithm 4. Its proof is given in Appendix B. Combining this with the size of $\overline{\mathcal{P}}$ under the various assumptions in §3, we obtain $\widetilde{\mathcal{O}}(m^{\nicefrac{{3}}{{2}}}\sqrt{T})$ regret.

Theorem 5.1.

Suppose in Algorithm 4 we use a discretization $\overline{\mathcal{P}}$ which is an $\mathcal{O}(1/\sqrt{T})$ additive approximation to any price curve. Let $R_{T}$ be as defined in (4). Then, for Algorithm 4, we have $\mathbb{E}[R_{T}]\in\mathcal{O}\left(m^{2}\theta T+\theta^{-1}\left(1+\log\left|\overline{\mathcal{P}}\right|\right)\right)$. Setting $\theta=\sqrt{\frac{1+\log\left|\overline{\mathcal{P}}\right|}{m^{2}T}}$, we have $\mathbb{E}[R_{T}]\in\mathcal{O}\big(m\sqrt{T\log\left|\overline{\mathcal{P}}\right|}\big)$.
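As a quick check of the tuning in Theorem 5.1, the two terms of the bound can be balanced directly. The shorthand $\Lambda$ is introduced here for illustration only, and the last line assumes, as the size bounds in §3 with $\epsilon=1/\sqrt{T}$ suggest, that $\log|\overline{\mathcal{P}}|\in\widetilde{\mathcal{O}}(m)$.

```latex
\begin{align*}
\Lambda &\stackrel{\Delta}{=} 1+\log\left|\overline{\mathcal{P}}\right|,
  \qquad \theta=\sqrt{\frac{\Lambda}{m^{2}T}},\\
m^{2}\theta T+\theta^{-1}\Lambda
  &= m\sqrt{T\Lambda}+m\sqrt{T\Lambda}
   = 2m\sqrt{T\Lambda},\\
\log\left|\overline{\mathcal{P}}\right|\in\widetilde{\mathcal{O}}(m)
  &\implies m\sqrt{T\Lambda}\in\widetilde{\mathcal{O}}\big(m^{3/2}\sqrt{T}\big).
\end{align*}
```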

6 Conclusion

We designed revenue-optimal learning algorithms for pricing data. First, we leveraged properties like smoothness and diminishing returns to create novel discretization schemes for approximating any pricing curve. These schemes were then used in our learning algorithms to improve their statistical and computational properties. Our algorithms build on classical methods like UCB and FTPL but required significant adaptations to handle the vast space of pricing curves and the asymmetric feedback. An interesting future direction would be to relax the assumption that the seller knows the valuation curves $\{v_{i}\}_{i\in[m]}$.

References

[1] AWS Forecast. https://aws.amazon.com/forecast/. Accessed: 2024-05-12.
[2] AWS Data Hub. https://aws.amazon.com/blogs/big-data/tag/datahub/. Accessed: 2024-05-11.
[3] Azure Data Share. https://azure.microsoft.com/en-us/products/data-share. Accessed: 2024-05-10.
[4] Delta Sharing. https://docs.databricks.com/en/data-sharing/index.html. Accessed: 2024-05-11.
[5] Ads Data Hub. https://developers.google.com/ads-data-hub/guides/intro. Accessed: 2022-05-10.
Agarwal et al. [2019] A. Agarwal, M. Dahleh, and T. Sarkar. A marketplace for data: An algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 701–726, 2019.
Agarwal et al. [2020] A. Agarwal, M. Dahleh, T. Horel, and M. Rui. Towards data auctions with externalities. arXiv preprint arXiv:2003.08345, 2020.
Auer [2002] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
Auer et al. [2002] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM journal on computing, 32(1):48–77, 2002.
Besbes and Zeevi [2009] O. Besbes and A. Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
Besbes and Zeevi [2015] O. Besbes and A. Zeevi. On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Science, 61(4):723–739, 2015.
Chawla et al. [2007] S. Chawla, J. D. Hartline, and R. Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 243–251, 2007.
Chawla et al. [2022] S. Chawla, R. Rezvan, Y. Teng, and C. Tzamos. Pricing ordered items. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 722–735, 2022.
Chen et al. [2016] W. Chen, W. Hu, F. Li, J. Li, Y. Liu, and P. Lu. Combinatorial multi-armed bandit with general reward functions. Advances in Neural Information Processing Systems, 29, 2016.
Chen et al. [2014] X. Chen, I. Diakonikolas, D. Paparas, X. Sun, and M. Yannakakis. The complexity of optimal multidimensional pricing. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1319–1328. SIAM, 2014.
Cheung et al. [2017] W. C. Cheung, D. Simchi-Levi, and H. Wang. Dynamic pricing and demand learning with limited price experimentation. Operations Research, 65(6):1722–1731, 2017.
Citrine Informatics [2024] Citrine Informatics. Citrine Informatics – Accelerating Materials Innovation. URL: https://citrine.io/, 2024. Accessed: March 9, 2024.
Den Boer [2015] A. V. Den Boer. Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1–18, 2015.
den Boer and Zwart [2014] A. V. den Boer and B. Zwart. Simultaneously learning and optimizing using controlled variance pricing. Management Science, 60(3):770–783, 2014.
Deng et al. [2009] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
Dudík et al. [2020] M. Dudík, N. Haghtalab, H. Luo, R. E. Schapire, V. Syrgkanis, and J. W. Vaughan. Oracle-efficient online learning and auction design. Journal of the ACM (JACM), 67(5):1–57, 2020.
Guo et al. [2023] W. Guo, N. Haghtalab, K. Kandasamy, and E. Vitercik. Leveraging reviews: Learning to price with buyer and seller uncertainty. In Proceedings of the 24th ACM Conference on Economics and Computation, pages 816–816, 2023.
Guruswami et al. [2005] V. Guruswami, J. D. Hartline, A. R. Karlin, D. Kempe, C. Kenyon, and F. McSherry. On profit-maximizing envy-free pricing. In SODA, volume 5, pages 1164–1173, 2005.
Hartline and Koltun [2005] J. D. Hartline and V. Koltun. Near-optimal pricing in near-linear time. In Proceedings of the 9th International Conference on Algorithms and Data Structures, WADS’05, page 422–431, Berlin, Heidelberg, 2005. Springer-Verlag. ISBN 3540281010. doi: 10.1007/11534273_37. URL https://doi.org/10.1007/11534273_37.
He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
Jagadeesan et al. [2021] M. Jagadeesan, A. Wei, Y. Wang, M. Jordan, and J. Steinhardt. Learning equilibria in matching markets from bandit feedback. Advances in Neural Information Processing Systems, 34:3323–3335, 2021.
Javanmard [2017] A. Javanmard. Perishability of data: dynamic pricing under varying-coefficient models. The Journal of Machine Learning Research, 18(1):1714–1744, 2017.
Javanmard and Nazerzadeh [2019] A. Javanmard and H. Nazerzadeh. Dynamic pricing in high-dimensions. The Journal of Machine Learning Research, 20(1):315–363, 2019.
Jia et al. [2019] R. Jia, D. Dao, B. Wang, F. A. Hubis, N. Hynes, N. M. Gürel, B. Li, C. Zhang, D. Song, and C. J. Spanos. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1167–1176. PMLR, 2019.
Kalai and Vempala [2005] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
Keskin and Zeevi [2014] N. B. Keskin and A. Zeevi. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62(5):1142–1167, 2014.
Kleinberg and Leighton [2003] R. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594–605. IEEE, 2003.
Krause and Guestrin [2011] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology (TIST), 2(4):1–20, 2011.
Krause et al. [2008] A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(12), 2008.
Krizhevsky et al. [2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
Kveton et al. [2015] B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535–543. PMLR, 2015.
Lai and Robbins [1985] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1):4–22, 1985.
Misra et al. [2019] K. Misra, E. M. Schwartz, and J. Abernethy. Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Science, 38(2):226–252, 2019.
Perakis and Singhvi [2023] G. Perakis and D. Singhvi. Dynamic pricing with unknown nonparametric demand and limited price changes. Operations Research, 2023.
Shalev-Shwartz and Ben-David [2014] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
Szegedy et al. [2015] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
Wang et al. [2020] T. Wang, J. Rausch, C. Zhang, R. Jia, and D. Song. A principled approach to data valuation for federated learning. Federated Learning: Privacy and Incentive, pages 153–167, 2020.
Wang et al. [2021] Y. Wang, B. Chen, and D. Simchi-Levi. Multimodal dynamic pricing. Management Science, 67(10):6136–6152, 2021.
Wasserman [2006] L. Wasserman. All of nonparametric statistics. Springer Science & Business Media, 2006.
Xu and Wang [2021] J. Xu and Y.-X. Wang. Logarithmic regret in feature-based dynamic pricing. Advances in Neural Information Processing Systems, 34:13898–13910, 2021.
Zhao and Chen [2019] H. Zhao and W. Chen. Stochastic one-sided full-information bandit. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 150–166. Springer, 2019.

Appendix A Omitted Details from Section 3

A.1 Proof of Lemma 3.1


Proof of Lemma 3.1.

Fix a price curve $p$. Let $n_{i,p}$ be the amount of data that a buyer of type $i$ purchases at price curve $p$, that is,

\displaystyle n_{i,p}\stackrel{\Delta}{=}\max\left\{\underset{n\in\left[N\right]}{\mathop{\mathrm{argmax}}}(v_{i}(n)-p(n))\right\}.

For $\{n_{i,p}\}_{i\in[m]}$ , let $\pi:[m]\rightarrow[m]$ be a permutation such that $n_{\pi(1),p}\leq n_{\pi(2),p}\leq\cdots\leq n_{\pi(m),p}$ . Let $n_{(i)}\stackrel{{\scriptstyle\Delta}}{{=}}n_{\pi(i),p}$ . Then, define a function $\bar{p}:[N]\rightarrow[0,1]$ as follows,

\displaystyle\bar{p}(n)\stackrel{\Delta}{=}\begin{cases}p\left(n_{(1)}\right),&n\leq n_{(1)},\\ p\left(n_{(2)}\right),&n_{(1)}<n\leq n_{(2)},\\ &\vdots\\ p\left(n_{(m-1)}\right),&n_{(m-2)}<n\leq n_{(m-1)},\\ p\left(n_{(m)}\right),&n_{(m-1)}<n\leq N,\end{cases}

so that $\bar{p}$ has at most $m$ steps. Then, $\bar{p}$ has the following properties,

$\displaystyle\bar{p}(n)=p(n),\text{ when }n\in\left\{n_{(1)},n_{(2)},\dots,n_{(m)}\right\},$
$\displaystyle\bar{p}(n)\leq p(n),\text{ when }n\in[N]\setminus\left\{n_{(1)},n_{(2)},\dots,n_{(m)}\right\}.$

We next prove that for any $i\in[m]$ , after changing the price function from $p$ to $\bar{p}$ , the type $i$ buyer either purchases at $(n_{i,p},p(n_{i,p}))$ or at $(N,p(n_{(m)}))$ .

For any type $i$ and any amount of data $n\leq n_{(m)}$, there exists $k$ such that $n_{(k-1)}<n\leq n_{(k)}$ (with $n_{(0)}=0$). We then have

$\displaystyle v_{i}(n)-\bar{p}(n)$	$\displaystyle\leq v_{i}\left(n_{(k)}\right)-\bar{p}\left(n_{(k)}\right)$	(as $v_{i}$ is non-decreasing and $\bar{p}$ is a step function.)
	$\displaystyle=v_{i}\left(n_{(k)}\right)-p\left(n_{(k)}\right)$	(as $\bar{p}\left(n_{(k)}\right)=p\left(n_{(k)}\right)$ )
	$\displaystyle\leq v_{i}(n_{i,p})-p(n_{i,p})$	(as $n_{i,p}$ maximizes the buyer’s utility.)
	$\displaystyle=v_{i}(n_{i,p})-\bar{p}(n_{i,p}).$	(as $\bar{p}(n_{i,p})=p(n_{i,p})$ )

As shown in the above, type $i$ still prefers purchasing $n_{i,p}$ data over all $n\leq n_{(m)}$ under price $\bar{p}$ .

For $n\in\left\{n_{(m)}+1,\dots,N\right\}$ , by the monotonicity of value curves, we have

\displaystyle N=\max\left\{\underset{n\in\left\{n_{(m)}+1,\dots,N\right\}}{\arg\max}\left(v_{i}(n)-\bar{p}(n)\right)\right\}.

Therefore, for any $i\in[m]$, type $i$ either purchases at $(n_{i,p},p(n_{i,p}))$, or purchases at $(N,\bar{p}(N))=(N,p(n_{(m)}))$ under price $\bar{p}$. In either case, type $i$ contributes no less revenue under $\bar{p}$ than under $p$. It then follows that, for any type distribution $q$,

\displaystyle\mathrm{rev}(\bar{p})\geq\mathrm{rev}(p).

∎
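The construction in this proof can be checked numerically on small instances. The Python sketch below builds $\bar{p}$ from $p$ and compares revenues; it assumes non-decreasing valuation and price curves stored as lists indexed by $n-1$, follows the displayed definition of $n_{i,p}$ (every type buys its utility-maximizing quantity, ties towards the larger $n$), and uses hypothetical function names.

```python
import random

def choice(v, p):
    """n_{i,p}: utility-maximizing quantity in 1..N, ties broken towards the larger n."""
    return max(range(1, len(v) + 1), key=lambda n: (v[n - 1] - p[n - 1], n))

def flatten(p, valuations):
    """Build the at-most-m-step curve p_bar from the proof of Lemma 3.1."""
    cuts = sorted({choice(v, p) for v in valuations})   # n_(1) <= ... <= n_(m)
    p_bar, k = [], 0
    for n in range(1, len(p) + 1):
        while k < len(cuts) - 1 and n > cuts[k]:
            k += 1                                       # move to the step containing n
        p_bar.append(p[cuts[k] - 1])                     # price of that step's right end
    return p_bar

def revenue(p, valuations, q):
    """Expected payment under type distribution q."""
    return sum(qi * p[choice(v, p) - 1] for qi, v in zip(q, valuations))

# Random instance: N = 12 quantities, m = 3 types, non-decreasing curves in [0, 1].
rng = random.Random(1)
valuations = [sorted(rng.random() for _ in range(12)) for _ in range(3)]
p = sorted(rng.random() for _ in range(12))
p_bar = flatten(p, valuations)
print(revenue(p, valuations, [1 / 3] * 3), revenue(p_bar, valuations, [1 / 3] * 3))
```

On every such instance, the flattened curve takes at most $m$ distinct values and earns at least the revenue of the original curve, matching the statement of the lemma.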

A.2 Proof of Theorem 3.1

In this subsection, we prove Theorem 3.1 by decomposing it into three technical lemmas (Lemmas A.1, A.2, and A.3). In Lemmas A.1 and A.2, we prove the approximation guarantee of our discretization scheme, and in Lemma A.3, we provide an upper bound on the size of the discretization.

Lemma A.1.

For any type distribution, there exists a pricing function $\widetilde{p}:[N]\rightarrow[\epsilon,1]$ such that

\displaystyle\mathrm{rev}(\widetilde{p})\geq\mathrm{OPT}-\epsilon.

Proof of Lemma A.1.

Consider the optimal pricing function $p^{\star}:[N]\rightarrow[0,1]$ , i.e., $\mathrm{OPT}=\mathrm{rev}(p^{\star})$ . Consider price curve $\widetilde{p}:[N]\rightarrow[\epsilon,1]$ where $\widetilde{p}(n)=\max\left(\epsilon,p^{\star}(n)\right)$ .

Let $J\stackrel{\Delta}{=}\left\{n\in[N]:\widetilde{p}(n)=p^{\star}(n)\right\}$ be the set of data quantities whose prices under $\widetilde{p}$ are the same as those under $p^{\star}$. Any buyer type who would have purchased $n\in J$ amount of data under $p^{\star}$ will purchase the same amount of data under $\widetilde{p}$. On the other hand, for buyer types who would have purchased $n\notin J$ amount of data under $p^{\star}$, since $\widetilde{p}(n)=\epsilon>p^{\star}(n)$ for $n\notin J$, the expected revenue contribution from such buyers under $p^{\star}$ is at most $\epsilon$. Hence, regardless of whether they purchase under $\widetilde{p}$, we have $\mathrm{rev}(\widetilde{p})\geq\mathrm{OPT}-\epsilon$. ∎

Lemma A.2.

For any $\widetilde{p}\in\left[\epsilon,1\right]^{N}$ there exists $p^{\prime}\in\overline{\mathcal{P}}$ such that $\mathrm{rev}(p^{\prime})\geq\mathrm{rev}(\widetilde{p})/(1+\epsilon)$ , for any type distribution $q$ .

Proof of Lemma A.2.

For $m$ buyer types, by Lemma 3.1, there exists a non-decreasing step function $\bar{p}\in[\epsilon,1]^{N}$ with at most $m$ steps, whose expected revenue is at least $\mathrm{rev}(\widetilde{p})$. Assume $\bar{p}$ has $k$ steps, $k\leq m$. To simplify the notation, for $1\leq j\leq k$, let $\bar{p}_{j}$ denote the price of $\bar{p}$ on the $j$th step. That is,

\displaystyle\bar{p}(n)=\begin{cases}\bar{p}_{1},&n\in(0,i_{1}]\cap\mathbb{Z},\\ \bar{p}_{2},&n\in(i_{1},i_{2}]\cap\mathbb{Z},\\ &\vdots\\ \bar{p}_{k},&n\in(i_{k-1},N]\cap\mathbb{Z},\end{cases}

where $i_{1},\dots,i_{k-1}\in[N]$ are the discontinuities of $\bar{p}$.

Recall the definitions of $Z$ and $W$ as stated in Algorithm 1,

$\displaystyle Z_{i}\stackrel{\Delta}{=}\epsilon(1+\epsilon)^{i},\quad i\in\left\{0,1,\dots,\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil\right\},\quad Z\stackrel{\Delta}{=}\left\{Z_{i}\right\}_{i},$

$\displaystyle W_{i}\stackrel{\Delta}{=}\left\{Z_{i-1}+Z_{i-1}\cdot\frac{\epsilon k}{m}:\ k\in\left\{1,2,\dots,\left\lceil(2+\epsilon)m\right\rceil\right\}\right\},\quad W\stackrel{\Delta}{=}\bigcup_{i=1}^{\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil}W_{i}.$

Let $i_{k}=N$ and for each $j\in[k]$, let $Z_{i_{j}}$ be the price obtained by rounding $\bar{p}_{j}$ down to the nearest value in $Z$. By the construction of $Z$ and $W$ above, $W_{i_{j}}$ is a partition of the interval $(Z_{i_{j}-1},Z_{i_{j}+1})$. Let $w_{j}$ be the price obtained by rounding $\bar{p}_{j}$ down to the nearest value in $W_{i_{j}}$. Set $d_{j}\stackrel{\Delta}{=}\frac{\epsilon}{m}\cdot Z_{i_{j}-1}$ and consider the $k$-step function $p$ whose price at the $j$th step (denoted $p_{j}$) is $w_{j}-(j-1)d_{j}\in W_{i_{j}}$, that is,

\displaystyle p(n)=\begin{cases}p_{1}=w_{1},&\text{for}\ n\in(0,i_{1}]\cap\mathbb{Z},\\ p_{2}=w_{2}-d_{2},&\text{for}\ n\in(i_{1},i_{2}]\cap\mathbb{Z},\\ &\vdots\\ p_{k}=w_{k}-(k-1)d_{k},&\text{for}\ n\in(i_{k-1},N]\cap\mathbb{Z}.\end{cases}

By the tie-breaking rule and the monotonicity of valuation curves, buyers only purchase among $0,i_{1},i_{2},\dots,i_{k}$ data points under $p$ and $\bar{p}$.

Subclaim. The price curves $p$ and $\bar{p}$ satisfy the following

\displaystyle\mathrm{rev}(p)\geq\mathrm{rev}(\bar{p})/(1+\epsilon),

(8)

with respect to any type distribution.

Proof of the Subclaim. We prove the above subclaim in two steps.

Step 1: No buyer who prefers to purchase $i_{j}$ data under $\bar{p}$ would prefer $i_{j^{\prime}}$ data for some $j^{\prime}<j$ under $p$ (i.e., a quantity with a lower price). This is because, when going from price $\bar{p}$ to $p$, the increase in the buyer’s utility for $i_{j}$ data is $\bar{p}_{j}-p_{j}$, which is at least the increase $\bar{p}_{j^{\prime}}-p_{j^{\prime}}$ for $i_{j^{\prime}}$ data. Formally, this can be seen as follows: For any $j^{\prime}<j$ we have,

\displaystyle\bar{p}_{j}-p_{j}\geq w_{j}-p_{j}=(j-1)d_{j},

as $\bar{p}_{j}\geq w_{j}$ and $p_{j}=w_{j}-(j-1)d_{j}$ . Moreover,

\displaystyle\bar{p}_{j^{\prime}}<w_{j^{\prime}}+d_{j^{\prime}}\implies\bar{p}_{j^{\prime}}-p_{j^{\prime}}<w_{j^{\prime}}+d_{j^{\prime}}-p_{j^{\prime}}=j^{\prime}d_{j^{\prime}}.

(9)

The inequality $\bar{p}_{j^{\prime}}<w_{j^{\prime}}+d_{j^{\prime}}$ holds because $w_{j^{\prime}}$ is the result of rounding down $\bar{p}_{j^{\prime}}$ to the nearest value in $W_{i_{j^{\prime}}}$, whose points are spaced $d_{j^{\prime}}$ apart.

By the construction of the sets $Z$ and $W$, and since $\bar{p}$ is non-decreasing, we have $d_{j}\geq d_{j^{\prime}}$. Together with $j^{\prime}\leq j-1$, this implies $(j-1)d_{j}\geq j^{\prime}d_{j^{\prime}}$. Then, by combining the above inequalities, we obtain

\displaystyle\bar{p}_{j}-p_{j}\geq(j-1)d_{j}\geq j^{\prime}d_{j^{\prime}}\geq\bar{p}_{j^{\prime}}-p_{j^{\prime}}.

(10)

Consider a buyer with value curve $v$ who prefers to purchase at $i_{j}$ under price $\bar{p}$ , then it must be

\displaystyle v(i_{j})-\bar{p}_{j}>v(i_{j^{\prime}})-\bar{p}_{j^{\prime}}.

(11)

Then, by combining (10) and (11), we have

\displaystyle v(i_{j})-{p}_{j}>v(i_{j^{\prime}})-{p}_{j^{\prime}},

therefore the buyer would not purchase at $i_{j^{\prime}}<i_{j}$ under $p$ .
Step 2: Next, we claim that $p_{j}\geq\bar{p}_{j}/(1+\epsilon)$ for every step $j\in[k]$. Since $Z_{i_{j}}$ is obtained by rounding $\bar{p}_{j}$ down to the nearest value in $Z$, we have

\displaystyle\bar{p}_{j}\geq Z_{i_{j}}=Z_{i_{j}-1}+\epsilon Z_{i_{j}-1}=Z_{i_{j}-1}+md_{j}.

(12)

By (9) and the above, we have

\displaystyle p_{j}\geq\bar{p}_{j}-jd_{j}\geq Z_{i_{j}-1}+(m-j)d_{j}\geq Z_{i_% {j}-1},

where the first inequality is by (9), the second is by (12), and the third is because $j\leq m$.

Then, it follows that

\displaystyle\bar{p}_{j}-p_{j}\leq j\cdot d_{j}=\epsilon\cdot\frac{j}{m}\cdot Z_{i_{j}-1}\leq\epsilon\cdot Z_{i_{j}-1}\leq\epsilon\cdot p_{j}\implies p_{j}\geq\bar{p}_{j}/(1+\epsilon).

So far we have proved that $p_{j}\geq\bar{p}_{j}/(1+\epsilon)$ and that no type wants to switch to a smaller amount of data under $p$. If a type purchases at $\bar{p}_{i}$ under $\bar{p}$ and at $p_{k}$ under $p$ for some $k\geq i$, then $p_{k}\geq p_{i}\geq\bar{p}_{i}/(1+\epsilon)$. Therefore, we have

\displaystyle\mathrm{rev}(p)\geq\mathrm{rev}(\bar{p})/(1+\epsilon).

Since the construction of price $p$ is not relevant to type distribution, the above holds for any type distribution $q$ , which proves the subclaim. ∎

Note that $p$ constructed in the above subclaim is not necessarily non-decreasing, as a larger amount of data suffers a larger price reduction when going from $\bar{p}$ to $p$. In this case, we can directly construct a non-decreasing price curve $p^{\prime}\in\overline{\mathcal{P}}$ from $p$ such that

\displaystyle\mathrm{rev}(p^{\prime})\geq\mathrm{rev}(\bar{p})/(1+\epsilon).

Let $S\stackrel{{\scriptstyle\Delta}}{{=}}\left\{i\in[k]:\exists j<i,\text{ s.t. }p_{j}>p_{i}\right\}$. If $S$ is empty, then $p$ is non-decreasing and we set $p^{\prime}=p$. If $S$ is not empty, we define $p^{\prime}$ as follows. Let $p^{\prime}$ be a $k$-step function with the same jump points $i_{1},\dots,i_{k}$ as $p$, and let $p^{\prime}_{i}$ be the value of $p^{\prime}$ on the $i$-th step. For $i\notin S$, let $p^{\prime}_{i}=p_{i}$; for $i\in S$, let $p^{\prime}_{i}=\max_{j\notin S,j<i}p_{j}$. By construction, $p^{\prime}$ is non-decreasing. Moreover, $p^{\prime}=p$ on the set $S^{c}$ and $p^{\prime}>p$ on the set $S$.
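The construction of $p^{\prime}$ from $p$ can be sketched in a few lines of Python. This is only an illustration: the function name `monotonize` and the example prices are hypothetical, and the step prices are stored in a plain list indexed from 0 rather than from 1.

```python
from itertools import accumulate

def monotonize(prices):
    """Build p' from the step prices p following the set S defined above.

    S contains every step i that has an earlier step j with p_j > p_i.
    Steps outside S keep their price; steps in S are raised to the largest
    earlier price among the steps outside S.
    """
    n = len(prices)
    in_S = [any(prices[j] > prices[i] for j in range(i)) for i in range(n)]
    p_prime = []
    for i in range(n):
        if in_S[i]:
            # The first step is never in S, so this max is over a non-empty set.
            p_prime.append(max(prices[j] for j in range(i) if not in_S[j]))
        else:
            p_prime.append(prices[i])
    return p_prime

# Hypothetical step prices that dip at the third and fourth steps.
p = [0.2, 0.5, 0.4, 0.45, 0.7]
p_prime = monotonize(p)
running_max = list(accumulate(p, max))
print(p_prime)
```

On this example the result coincides with the running maximum of $p$, which is one convenient way to think about the construction.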

Next, we claim that $\bar{p}_{j}-p^{\prime}_{j}$ is non-decreasing for all $j\in[k]$ . Both $(\bar{p}_{j}-p_{j})_{j\in[k]}$ and $\bar{p}$ are non-decreasing with respect to $j$ by the previous results. Hence,

$\displaystyle\bar{p}_{j}-p^{\prime}_{j}<\bar{p}_{j}-p_{j}\leq\bar{p}_{j+1}-p_{j+1}=\bar{p}_{j+1}-p^{\prime}_{j+1},$	$\displaystyle\quad\text{ if }j\in S,j+1\notin S,$
$\displaystyle\bar{p}_{j}-p^{\prime}_{j}=\bar{p}_{j}-p_{j}\leq\bar{p}_{j+1}-p_{j+1}=\bar{p}_{j+1}-p^{\prime}_{j+1},$	$\displaystyle\quad\text{ if }j\notin S,j+1\notin S,$
$\displaystyle\bar{p}_{j}-p^{\prime}_{j}=\bar{p}_{j}-p^{\prime}_{j+1}\leq\bar{p% }_{j+1}-p^{\prime}_{j+1},$	$\displaystyle\quad\text{ if }j\notin S,j+1\in S,$	(as $p^{\prime}_{j+1}=p^{\prime}_{j}$ )
$\displaystyle\bar{p}_{j}-p^{\prime}_{j}=\bar{p}_{j}-p^{\prime}_{j+1}\leq\bar{p% }_{j+1}-p^{\prime}_{j+1},$	$\displaystyle\quad\text{ if }j\in S,j+1\in S.$	(as $p^{\prime}_{j+1}=p^{\prime}_{j}$ )

Therefore, any type that prefers to purchase at $j$ th step under $\bar{p}$ would not prefer purchasing at any step $j^{\prime}<j$ under $p^{\prime}$ , and since $p^{\prime}_{j}\geq p_{j}\geq\bar{p}_{j}/(1+\epsilon)$ , we have

\displaystyle\mathrm{rev}(p^{\prime})\geq\mathrm{rev}(\bar{p})/(1+\epsilon)% \geq\mathrm{rev}(\widetilde{p})/(1+\epsilon).

∎

Lemma A.3.

When $N>m$, $\left|\overline{\mathcal{P}}\right|\leq\left(\frac{eN}{m}\right)^{m}\left(e\lceil(2+\epsilon)\rceil\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil\right)^{m}$.

Proof of Lemma A.3.

For any integer $i\leq m$, the number of non-decreasing $i$-step price functions is $\binom{N-1}{i}\binom{\left|W\right|}{i}$, hence we have

	$\displaystyle\left|\overline{\mathcal{P}}\right|$	$\displaystyle=\sum_{i=1}^{m}\binom{N-1}{i}\binom{\left|W\right|}{i}$
		$\displaystyle\leq\left(\sum_{i=1}^{m}\binom{N-1}{i}\right)\left(\sum_{i=1}^{m}\binom{\left|W\right|}{i}\right)$
		$\displaystyle\leq\left(\sum_{i=0}^{m}\binom{N-1}{i}\right)\left(\sum_{i=0}^{m}\binom{\left|W\right|}{i}\right)$
		$\displaystyle\leq\left(\frac{e(N-1)}{m}\right)^{m}\left(\frac{e\left|W\right|}{m}\right)^{m}$
		$\displaystyle\leq\left(\frac{e(N-1)}{m}\right)^{m}\left(e\lceil(2+\epsilon)\rceil\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil\right)^{m}$

In the last inequality, we use the fact that $\left|W\right|\leq\lceil(2+\epsilon)m\rceil\left\lceil\log_{1+\epsilon}\frac{1% }{\epsilon}\right\rceil$ . ∎
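The third inequality in the chain above relies on the standard bound $\sum_{i=0}^{m}\binom{n}{i}\leq\left(\frac{en}{m}\right)^{m}$ for $n\geq m\geq 1$. A quick numerical sanity check, with hypothetical helper names and arbitrary test values, might look as follows.

```python
import math

def binomial_prefix_sum(n, m):
    """Sum of C(n, i) for i = 0, ..., m."""
    return sum(math.comb(n, i) for i in range(m + 1))

def exponential_bound(n, m):
    """The upper bound (e * n / m) ** m used in the proof."""
    return (math.e * n / m) ** m

# Arbitrary (n, m) pairs with n >= m >= 1, chosen only for illustration.
checks = [
    (n, m, binomial_prefix_sum(n, m), exponential_bound(n, m))
    for n in (5, 20, 100)
    for m in (1, 3, 5)
]
for n, m, exact, bound in checks:
    print(f"n={n:3d} m={m}: sum={exact} <= bound={bound:.1f}")
```

This does not replace the proof; it only confirms the inequality on a handful of small instances.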

Finally, Theorem 3.1 follows directly from the above lemmas. See 3.1

Proof of Theorem 3.1.

Combining Lemma A.1 and Lemma A.2 together, we conclude that there exists price curve $p^{\prime}\in\overline{\mathcal{P}}$ such that

\displaystyle\mathrm{rev}(p^{\prime})\geq\frac{\mathrm{rev}(\tilde{p})}{1+% \epsilon}\geq\frac{\mathrm{OPT}-\epsilon}{1+\epsilon}\geq\mathrm{OPT}-\frac{2% \epsilon}{1+\epsilon}=\mathrm{OPT}-\mathcal{O}(\epsilon).

The size of $\overline{\mathcal{P}}$ follows from Lemma A.3. ∎
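The last inequality in the display above can be spelled out in one line. The sketch below assumes $\mathrm{OPT}\leq 1$, which holds because price curves take values in $[0,1]$.

```latex
\frac{\mathrm{OPT}-\epsilon}{1+\epsilon}
  = \mathrm{OPT}-\frac{\epsilon\,(\mathrm{OPT}+1)}{1+\epsilon}
  \geq \mathrm{OPT}-\frac{2\epsilon}{1+\epsilon},
  \qquad\text{since } \mathrm{OPT}\leq 1 .
```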

A.3 Price discretization scheme for smooth monotonic valuations

Algorithm 5 Price discretization scheme for smooth monotonic valuations

Given: Smoothness constant $L$, approximation parameter $\epsilon>0$.

Let $W$ be the discretization of the valuation space $[0,1]$ given in Algorithm 1.

Let $N_{\textbf{S}}$ be the following discretization of the interval $[0,N]$:

\displaystyle\delta\stackrel{{\scriptstyle\Delta}}{{=}}\left\lfloor\frac{\epsilon N}{mL}\right\rfloor,\qquad N_{\textbf{S}}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{\delta k:\ k\in\left[\left\lceil\frac{N}{\delta}\right\rceil\right]\right\}.

Set $\overline{\mathcal{P}}$ to be the class of all “$m$-step” functions mapping $N_{\textbf{S}}\to W$.

A.4 Proof of Theorem 3.2

Discretization scheme for smooth monotonic valuations. We study discretization schemes to approximate monotone valuations under the smoothness condition in Assumption 1. Our procedure is outlined in Algorithm 5. The discretization $W$ of the valuation space follows Algorithm 1. Additionally, we uniformly split the data space into multiples of $\left\lfloor\frac{\epsilon N}{mL}\right\rfloor$ , denoting them as the set $N_{\textbf{S}}$ . We then set the discretization $\overline{\mathcal{P}}$ to be the class of all “ $m$ -step” price curves on the function space $N_{\textbf{S}}\to W$ . The following theorem, proven in Appendix A.4, outlines the main properties of this discretization scheme: the size of the discretization has no dependence on the number of data $N$ .
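To make the claim about the grid size concrete, the following sketch builds the grid of multiples of $\left\lfloor\frac{\epsilon N}{mL}\right\rfloor$ for two values of $N$. The function name `smooth_grid`, the parameter values, and the choice to clip the last grid point to $N$ are assumptions made for illustration only.

```python
import math

def smooth_grid(N, m, L, eps):
    """Return (delta, grid) where grid = {delta * k : k = 1..ceil(N / delta)}.

    The last point is clipped to N so that the grid stays inside [N];
    this clipping is an assumption of the sketch.
    """
    delta = math.floor(eps * N / (m * L))
    if delta < 1:
        raise ValueError("eps * N / (m * L) must be at least 1 for a non-trivial grid")
    num_points = math.ceil(N / delta)
    return delta, [min(k * delta, N) for k in range(1, num_points + 1)]

# Hypothetical parameters: m = 4 types, smoothness L = 2, eps = 0.25.
delta_small, grid_small = smooth_grid(N=10_000, m=4, L=2, eps=0.25)
delta_large, grid_large = smooth_grid(N=100_000, m=4, L=2, eps=0.25)
print(delta_small, len(grid_small))
print(delta_large, len(grid_large))
```

With these values both grids have roughly $mL/\epsilon=32$ points even though $N$ differs by a factor of ten.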

See 3.2

Proof of Theorem 3.2.

By Lemma 3.1, there is a revenue-optimal price curve $p^{\star}:[N]\rightarrow[0,1]$ which is a $k$-step function for some $k\in[m]$, and $p^{\star}$ can be compactly represented as the following set of tuples:

\displaystyle\left\{(n^{\star}_{1},p^{\star}_{1}),(n^{\star}_{2},p^{\star}_{2}% ),\dots,(n^{\star}_{k},p^{\star}_{k})\right\},

where $n^{\star}_{1},\dots,n^{\star}_{k}$ denote the locations of jumps and $p^{\star}_{i}$ denote the value of $p^{\star}$ on step $i\in[k]$ (i.e. $p^{\star}(n)=p^{\star}_{i}$ for $n\in(n^{\star}_{i-1},n^{\star}_{i}]$ ).

Let $\bar{\epsilon}:=\frac{\epsilon}{m}$. Next, we generate a price ${p}^{\prime}$ using Algorithm 6, which ensures that the price curve $p$ generated in the following step (13) is non-decreasing. We show that each round of Algorithm 6 incurs a revenue loss of at most $\bar{\epsilon}$. If ${p}^{\prime}_{i}\geq{p}^{\prime}_{i-1}+\bar{\epsilon}$, nothing changes, so the expected revenue is unaffected. Otherwise, we merge the price of step $i$ with that of step $i-1$ by setting ${p}^{\prime}_{j}\stackrel{{\scriptstyle\Delta}}{{=}}{p}^{\prime}_{j}-\left({p}^{\prime}_{i}-{p}^{\prime}_{i-1}\right)$ for $j=i,\dots,k$. During this process, buyers either purchase at the same step or switch to a higher step. Since ${p}^{\prime}_{i}-{p}^{\prime}_{i-1}<\bar{\epsilon}$, the revenue loss from each type is at most $\bar{\epsilon}$, and hence the revenue loss in each round is at most $\bar{\epsilon}$. As there are at most $k\leq m$ rounds, the total loss in expected revenue is at most $m\bar{\epsilon}=\epsilon$. We conclude that $\mathrm{rev}({p}^{\prime})$ is within a gap of $\epsilon$ from $\mathrm{OPT}$, i.e., $\mathrm{rev}({p}^{\prime})\geq\mathrm{OPT}-\epsilon$.

Algorithm 6

Input: Optimal price curve $p^{\star}$

Let ${p}^{\prime}=p^{\star}$

for $i=2,\dots,k$ do

    if ${p}^{\prime}_{i}<{p}^{\prime}_{i-1}+\bar{\epsilon}$ then

        for $j=i,\dots,k$ do

            ${p}^{\prime}_{j}={p}^{\prime}_{j}-\left({p}^{\prime}_{i}-{p}^{\prime}_{i-1}\right)$

        end for

    end if

end for

Output: Price curve ${p}^{\prime}$
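A minimal Python sketch of Algorithm 6 is given below. It assumes that the gap ${p}^{\prime}_{i}-{p}^{\prime}_{i-1}$ is computed once before the inner loop runs, so that every step $j\geq i$ is shifted by the same amount. The function name `merge_close_steps` and the example values are hypothetical.

```python
def merge_close_steps(prices, eps_bar):
    """Merge consecutive steps whose prices differ by less than eps_bar.

    `prices` holds the step values p*_1, ..., p*_k (0-indexed here).
    Whenever step i is closer than eps_bar to step i-1, steps i..k are all
    shifted down by the gap, so step i ends up at the price of step i-1.
    """
    merged = list(prices)
    k = len(merged)
    for i in range(1, k):
        # Assumption: the gap is fixed before the update, as the proof intends.
        gap = merged[i] - merged[i - 1]
        if gap < eps_bar:
            for j in range(i, k):
                merged[j] -= gap
    return merged

# Hypothetical step prices (dyadic values keep the arithmetic exact).
p_star = [0.25, 0.3125, 0.75, 0.78125]
p_merged = merge_close_steps(p_star, eps_bar=0.125)
num_steps_after_merge = len(set(p_merged))
print(p_merged, num_steps_after_merge)
```

In this example two pairs of steps are merged, so the four-step curve becomes a two-step curve whose remaining jump is larger than $\bar{\epsilon}$.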

After combining some steps in Algorithm 6, assume that ${p}^{\prime}$ is a $\bar{k}$-step function ($\bar{k}\leq k$) represented by

\displaystyle\left\{({n}^{\prime}_{1},{p}^{\prime}_{1}),({n}^{\prime}_{2},{p}^% {\prime}_{2}),\dots,({n}^{\prime}_{\bar{k}},{p}^{\prime}_{\bar{k}})\right\}.

Then, we define a new price curve $p\in\overline{\mathcal{P}}$ as follows: let $\delta:=\left\lfloor\frac{\bar{\epsilon}N}{L}\right\rfloor$ , then $p$ is a $\bar{k}$ -step function represented by

\displaystyle\left\{(n_{1},p_{1}),(n_{2},p_{2}),\dots,(n_{\bar{k}},p_{\bar{k}}% )\right\},

where

\displaystyle n_{i}\stackrel{{\scriptstyle\Delta}}{{=}}\left\lfloor\frac{{n}^{% \prime}_{i}}{\delta}\right\rfloor\delta,\quad p_{i}\stackrel{{\scriptstyle% \Delta}}{{=}}{p}^{\prime}_{i}-i\bar{\epsilon}.

(13)
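The rounding in (13) can be illustrated as follows. The helper name `round_to_grid` and all numerical values are hypothetical; steps are 0-indexed in the code, so step $i$ in the text corresponds to index $i-1$.

```python
import math

def round_to_grid(jumps, prices, N, L, eps_bar):
    """Apply (13): round each jump down to a multiple of delta and
    lower the price of the i-th step by i * eps_bar."""
    delta = math.floor(eps_bar * N / L)
    new_jumps = [(n // delta) * delta for n in jumps]
    new_prices = [p - (i + 1) * eps_bar for i, p in enumerate(prices)]
    return delta, new_jumps, new_prices

# Hypothetical 3-step curve p' with N = 1600, L = 2 and eps_bar = 1/16.
delta, n_rounded, p_rounded = round_to_grid(
    jumps=[130, 975, 1600], prices=[0.25, 0.5, 1.0], N=1600, L=2, eps_bar=0.0625
)
print(delta, n_rounded, p_rounded)
```

Each jump moves down by less than $\delta$, which is what the smoothness argument in the following calculation exploits.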

First, we show that no buyer who purchases at step $i$ under ${p}^{\prime}$ would purchase at step $j<i$ under $p$. Let the buyer’s valuation be $v$. We begin by proving that the buyer’s utility is non-negative at $n_{i}$:

$\displaystyle v(n_{i})-p_{i}$	$\displaystyle\geq v({n}^{\prime}_{i})-\delta\cdot\frac{L}{N}-p_{i}$	(by $L/N$ -Smoothness of $v$ .)
	$\displaystyle=v({n}^{\prime}_{i})-\delta\cdot\frac{L}{N}-{p}^{\prime}_{i}+i% \bar{\epsilon}$
	$\displaystyle\geq v({n}^{\prime}_{i})-\bar{\epsilon}-{p}^{\prime}_{i}+i\bar{\epsilon}$	(as $\delta\cdot\frac{L}{N}\leq\frac{L}{N}\cdot\frac{\bar{\epsilon}N}{L}=\bar{\epsilon}$ .)
	$\displaystyle=v({n}^{\prime}_{i})-{p}^{\prime}_{i}+(i-1)\bar{\epsilon}$
	$\displaystyle\geq v({n}^{\prime}_{i})-{p}^{\prime}_{i}$
	$\displaystyle\geq 0.$

Then, we prove that the buyer’s utility at $n_{i}$ is larger than that of $n_{j}$ for $j<i$ , therefore, the buyer would not prefer buying at step $j<i$ under price $p$ .

$\displaystyle v(n_{i})-p_{i}-(v(n_{j})-p_{j})$	$\displaystyle\geq v({n}^{\prime}_{i})-\delta\cdot\frac{L}{N}-v({n}^{\prime}_{j% })-(p_{i}-p_{j})$	(by $L/N$ -Smoothness of $v$ .)
	$\displaystyle=v({n}^{\prime}_{i})-\delta\cdot\frac{L}{N}-v({n}^{\prime}_{j})-(% {p}^{\prime}_{i}-{p}^{\prime}_{j}-(i-j)\bar{\epsilon})$
	$\displaystyle\geq v({n}^{\prime}_{i})-\bar{\epsilon}-v({n}^{\prime}_{j})-({p}^% {\prime}_{i}-{p}^{\prime}_{j}-(i-j)\bar{\epsilon})$	(as $\delta\cdot\frac{L}{N}\leq\frac{L}{N}\cdot\frac{\bar{\epsilon}N}{L}=\bar{\epsilon}$ )
	$\displaystyle=(v({n}^{\prime}_{i})-{p}^{\prime}_{i})-(v({n}^{\prime}_{j})-{p}^% {\prime}_{j})+(i-j-1)\bar{\epsilon}$
	$\displaystyle\geq(v({n}^{\prime}_{i})-{p}^{\prime}_{i})-(v({n}^{\prime}_{j})-{% p}^{\prime}_{j})$	(as $i>j$ )
	$\displaystyle\geq 0.$	(as the buyer prefers ${n}^{\prime}_{i}$ to ${n}^{\prime}_{j}$ under ${p}^{\prime}$.)

Finally, fix the type distribution $(q_{1},\dots,q_{m})$ , then we have

$\displaystyle\mathrm{rev}({p}^{\prime})-\mathrm{rev}(p)$	$\displaystyle\leq\sum_{h=1}^{m}q_{h}\left(\sum_{i=1}^{k}({p}^{\prime}_{i}-p_{i})\cdot\mathbbm{I}(\text{Type $h$ purchases at }{p}^{\prime}_{i}\text{ under price }{p}^{\prime})\right)$
	$\displaystyle\leq m\bar{\epsilon}$
	$\displaystyle=\epsilon.$	(as $\epsilon=m\bar{\epsilon}$ .)

Hence, $\mathrm{rev}(p)$ is within a gap of $2\epsilon$ from $\mathrm{OPT}$ .

We then apply Theorem 3.1 to price $p$ . Therefore, it is enough to consider price functions from the set $N_{\textbf{S}}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{k\delta:k=1,\dots,% \left\lceil\frac{N}{\delta}\right\rceil\right\}\subseteq[N]$ to $W$ to approximate the revenue within $\mathcal{O}(\epsilon)$ gap. Moreover, this discretization is of the size $\left\lceil\frac{N}{\delta}\right\rceil^{|W|}\in\mathcal{O}\left(\left(\log_{1% +\epsilon}\left(\frac{1}{\epsilon}\right)\right)^{m}\left(\frac{L}{\epsilon}% \right)^{m}\right)$ as $\left\lceil\frac{N}{\delta}\right\rceil\in\mathcal{O}\left(\frac{Lm}{\epsilon}\right)$ . ∎

A.5 Proof of Theorem 3.3

See 3.3

Proof of Theorem 3.3.

For each $i=0,1,\dots,\left\lceil\log_{1+\epsilon^{2}}\left(\frac{N\epsilon^{2}}{2Jm}% \right)\right\rceil$ , let $Y_{i}\stackrel{{\scriptstyle\Delta}}{{=}}\left\lfloor\frac{2Jm}{\epsilon^{2}}(% 1+\epsilon^{2})^{i}\right\rfloor$ , and $Q_{i}$ be the set $\left\{\left\lfloor Y_{i}+\frac{Y_{i}\epsilon^{2}}{2Jm}k\right\rfloor:k=1,% \dots,\left\lfloor 2Jm\right\rfloor\right\}$ , i.e., $Q_{i}$ splits the interval $[Y_{i},Y_{i+1}]$ equally into $2mJ$ parts.

The union of the sets $Q_{i}$ and the set $\left\{1,2,\dots,\left\lfloor\frac{2Jm}{\epsilon^{2}}\right\rfloor\right\}$ forms a grid on $[0,N]$, denoted by $N_{\textbf{D}}$. There are at most $\frac{2Jm}{\epsilon^{2}}+2Jm\log_{1+\epsilon^{2}}\left(\frac{N\epsilon^{2}}{2Jm}\right)$ grid points in total.
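The grid $N_{\textbf{D}}$ is dense for small amounts of data and geometrically spaced afterwards. The sketch below builds it for hypothetical parameter values. The function name, the decision to discard points above $N$, and the use of floating-point logarithms are assumptions of the illustration.

```python
import math

def diminishing_returns_grid(N, m, J, eps):
    """Build the grid N_D: every integer up to 2Jm/eps^2, followed by the
    sets Q_i, each splitting a geometric interval into about 2Jm parts."""
    base = 2 * J * m / eps**2
    dense = set(range(1, min(N, math.floor(base)) + 1))
    i_max = math.ceil(math.log(N / base, 1 + eps**2)) if N > base else -1
    geometric = set()
    for i in range(i_max + 1):
        Y_i = math.floor(base * (1 + eps**2) ** i)
        for k in range(1, math.floor(2 * J * m) + 1):
            point = math.floor(Y_i + Y_i * eps**2 * k / (2 * J * m))
            if point <= N:  # assumption: points beyond N are discarded
                geometric.add(point)
    return sorted(dense | geometric), i_max

# Hypothetical values: one million data points, m = 2 types, J = 1, eps = 0.5.
grid, i_max = diminishing_returns_grid(N=10**6, m=2, J=1, eps=0.5)
size_cap = 16 + 4 * (i_max + 1)
print(len(grid), size_cap)
```

For these values the grid has at most a few hundred points, compared with the $10^{6}$ candidate jump locations in $[N]$.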

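By Lemma 3.1, there is a revenue-optimal price curve $p^{\star}:[N]\rightarrow[0,1]$ which is a $k$-step function for some $k\in[m]$, and which can be represented by the following set of tuples:

\displaystyle\left\{(n^{\star}_{1},p^{\star}_{1}),(n^{\star}_{2},p^{\star}_{2}),\dots,(n^{\star}_{k},p^{\star}_{k})\right\},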

where ${n}^{\star}_{1},\dots,{n}^{\star}_{k}$ denote the locations of jumps and $p^{\star}_{i}$ denote the value of $p^{\star}$ on step $i\in[k]$ (i.e. $p^{\star}(n)=p^{\star}_{i}$ for $n\in(n^{\star}_{i-1},n^{\star}_{i}]$ ).

Then, define a new $k$ -step price curve $p$ via

\displaystyle\left\{(n_{1},p_{1}),(n_{2},p_{2}),\dots,(n_{k},p_{k})\right\},

where $n_{i}$ is given by

\displaystyle n_{i}

\displaystyle\leftarrow\text{round down }n^{\star}_{i}\text{ to the closest % grid in }N_{\textbf{D}}.

Then we define $p_{i}$ below. If $p^{\star}_{i}<\epsilon(1+\epsilon)$ , let $p_{i}=\epsilon(1+\epsilon)$ ; otherwise, let $Z_{n^{\star}_{i}}$ be the price obtained by rounding $p^{\star}_{i}$ down to the nearest value in $Z$ . By constructions of $Z$ and $W$ above, $W_{n^{\star}_{i}}$ is a partition of interval $(Z_{n^{\star}_{i}-1},Z_{n^{\star}_{i}+1})$ . Let $w_{i}$ be the price obtained by rounding $p^{\star}_{i}$ down to the nearest value in $W_{n^{\star}_{i}}$ . Set $d_{i}\stackrel{{\scriptstyle\Delta}}{{=}}\frac{\epsilon}{m}\cdot Z_{n^{\star}_% {i}-1}$ . Then define $p_{i}\stackrel{{\scriptstyle\Delta}}{{=}}w_{i}-i\cdot d_{i}\in W_{n^{\star}_{i}}$ .

First, we prove that, for $i$ satisfying $p^{\star}_{i}>\epsilon(1+\epsilon)$, if a buyer purchases at $n_{i}$ under price $p^{\star}$, she will not purchase at $n_{j},\,j<i$ under the new price $p$. We prove this property separately for the cases $n_{i}\leq\frac{2Jm}{\epsilon^{2}}$ and $n_{i}>\frac{2Jm}{\epsilon^{2}}$.

(i) When $n_{i}>\frac{2Jm}{\epsilon^{2}}$ .

The buyer’s utility at $n_{i}$ under price $p$ is,

\displaystyle v(n_{i})-p_{i}=v(n^{\star}_{i})-p^{\star}_{i}+\left(p^{\star}_{i}-p_{i}-\left(v(n^{\star}_{i})-v(n_{i})\right)\right).

(14)

Let $\delta_{i}\stackrel{{\scriptstyle\Delta}}{{=}}v(n^{\star}_{i})-v(n_{i})$ . Then $\delta_{i}$ is upper bounded by,

	$\displaystyle\delta_{i}$	$\displaystyle=\sum_{h=n_{i}}^{n^{\star}_{i}-1}v(h+1)-v(h)\leq\sum_{h=n_{i}}^{n% ^{\star}_{i}-1}\frac{J}{h}\leq\frac{J}{n_{i}}(n^{\star}_{i}-n_{i})$
		$\displaystyle\leq\frac{J}{n_{i}}\cdot\left(n_{i}\cdot\frac{\epsilon^{2}}{2mJ}+% 1\right)=\frac{\epsilon^{2}}{2m}+\frac{J}{n_{i}}\leq\frac{\epsilon^{2}}{2m}+% \frac{\epsilon^{2}}{2m}=\frac{\epsilon^{2}}{m},$		(15)

where the third inequality is due to Lemma A.4.

By the construction of $p$ , we have

\displaystyle p^{\star}_{i}-p_{i}=Z_{n_{i}-1}\cdot\frac{\epsilon i}{m}\geq% \frac{\epsilon^{2}i}{m}\geq\frac{\epsilon^{2}}{m}\geq\delta_{i}.

(16)

Therefore, by (14), $v(n_{i})-p_{i}\geq v(n^{\star}_{i})-p^{\star}_{i}\geq 0$, so the buyer’s utility at $n_{i}$ under price $p$ is non-negative.

Next, we claim that $v(n_{i})-p_{i}-\left(v(n_{j})-p_{j}\right)\geq 0$ . To prove this, for any $j<i$ , let $\delta_{j}\stackrel{{\scriptstyle\Delta}}{{=}}v(n^{\star}_{j})-v(n_{j})$ , then we have

	$\displaystyle v(n_{i})-p_{i}-\left(v(n_{j})-p_{j}\right)$
	$\displaystyle\hskip 72.26999pt={v(n^{\star}_{i})-p^{\star}_{i}-(v(n^{\star}_{j% })-p^{\star}_{j})}+(p^{\star}_{i}-p_{i}-\delta_{i})-(p^{\star}_{j}-p_{j}-% \delta_{j})$

Here ${v(n^{\star}_{i})-p^{\star}_{i}-(v(n^{\star}_{j})-p^{\star}_{j})}\geq 0$ because the buyer prefers $n^{\star}_{i}$ over $n^{\star}_{j}$ under price $p^{\star}$. Since $\delta_{j}\geq 0$, we can bound $\delta_{i}-\delta_{j}$ as follows:

\displaystyle\delta_{i}-\delta_{j}\leq\delta_{i}\leq\frac{\epsilon^{2}}{m}.

(17)

By the construction of $p_{i}$ , we have,

$\displaystyle p^{\star}_{i}-p_{i}-(p^{\star}_{j}-p_{j})$	$\displaystyle=Z_{n_{i}-1}\cdot\frac{\epsilon i}{m}-Z_{n_{j}-1}\cdot\frac{% \epsilon j}{m}$
	$\displaystyle\geq Z_{n_{j}-1}\cdot\left(\frac{\epsilon i}{m}-\frac{\epsilon j}% {m}\right)$	(as $Z_{n_{i}-1}\geq Z_{n_{j}-1}$ )
	$\displaystyle\geq Z_{n_{j}-1}\cdot\left(\frac{\epsilon}{m}\right)$	(as $i>j$ )
	$\displaystyle\geq\frac{\epsilon^{2}}{m}.$	(18)

Therefore, combining (17) and (18) together, we have

\displaystyle v(n_{i})-p_{i}-\left(v(n_{j})-p_{j}\right)\geq{v(n^{\star}_{i})-% p^{\star}_{i}-(v(n^{\star}_{j})-p^{\star}_{j})}\geq 0.

We conclude that under price $p$ , the buyer prefers $n_{i}$ over $n_{j}$ , for any $j<i$ .

(ii) When $n_{i}\leq\frac{2Jm}{\epsilon^{2}}$ .

In this case, $n_{i}=n^{\star}_{i}$, and for any $j<i$, we still have $n_{j}=n^{\star}_{j}$. First, we prove that the buyer’s utility at $n_{i}$ under $p$ is non-negative:

	$\displaystyle v(n_{i})-p_{i}$	$\displaystyle=v(n^{\star}_{i})-p_{i}$
		$\displaystyle=v(n^{\star}_{i})-p^{\star}_{i}+(p^{\star}_{i}-p_{i})$
		$\displaystyle\geq v(n^{\star}_{i})-p^{\star}_{i}$
		$\displaystyle\geq 0.$

Then, we show that the buyer prefers $n_{i}$ over $n_{j}$ under $p$ :

	$\displaystyle v(n_{i})-p_{i}-\left(v(n_{j})-p_{j}\right)$	$\displaystyle={v(n^{\star}_{i})-p^{\star}_{i}-(v(n^{\star}_{j})-p^{\star}_{j})% }+(p^{\star}_{i}-p_{i}-\delta_{i})-(p^{\star}_{j}-p_{j}-\delta_{j})$
		$\displaystyle={v(n^{\star}_{i})-p^{\star}_{i}-(v(n^{\star}_{j})-p^{\star}_{j})% }+(p^{\star}_{i}-p_{i})-(p^{\star}_{j}-p_{j})$
		$\displaystyle\geq{v(n^{\star}_{i})-p^{\star}_{i}-(v(n^{\star}_{j})-p^{\star}_{% j})}$
		$\displaystyle\geq 0,$

where the first inequality is due to (18), and the second is because the buyer prefers $n^{\star}_{i}$ over $n^{\star}_{j}$ under $p^{\star}$ .

So far we have completed the proof that for $i$ satisfying $p^{\star}_{i}>\epsilon(1+\epsilon)$ , if a buyer purchases at $n_{i}$ under price $p^{\star}$ , she will not purchase at $n_{j},\,j<i$ under new price $p$ .

Then, similar to Step 2 in the proof of Lemma A.2, we have $p\geq\frac{p^{\star}}{1+\epsilon}$ pointwise. We then conclude the proof by observing

\displaystyle\mathrm{rev}(p)\geq\frac{\mathrm{rev}(p^{\star})-\mathcal{O}(% \epsilon)}{1+\epsilon}=\mathrm{OPT}-\mathcal{O}(\epsilon).

∎

Lemma A.4.

When $n_{i}>\frac{2Jm}{\epsilon^{2}}$, we have $n^{\star}_{i}-n_{i}\leq n_{i}\cdot\frac{\epsilon^{2}}{2Jm}+1$.

Proof of Lemma A.4.

By the construction of discretization set, $n_{i}$ must have the following form,

\left\lfloor Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}k^{\prime}}{2% Jm}\right\rfloor,\text{ where }Y_{i^{\prime}}=\left\lfloor\frac{2Jm}{\epsilon^% {2}}(1+\epsilon^{2})^{i^{\prime}}\right\rfloor\text{ for some }i^{\prime},k^{% \prime}\in\mathbb{Z}.

Since $n_{i}$ is obtained by rounding down $n^{\star}_{i}$ to the nearest grid point in $N_{\textbf{D}}$, $n^{\star}_{i}$ satisfies the following inequality,

\displaystyle n_{i}\leq n^{\star}_{i}\leq Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}(k^{\prime}+1)}{2Jm}.

Therefore, we have

	$\displaystyle n^{\star}_{i}-n_{i}$	$\displaystyle\leq Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}(k^{% \prime}+1)}{2Jm}-n_{i}$
		$\displaystyle=Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}(k^{\prime}+% 1)}{2Jm}-\left\lfloor Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}k^{% \prime}}{2Jm}\right\rfloor$
		$\displaystyle\leq Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}(k^{% \prime}+1)}{2Jm}-\left(Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}k^{% \prime}}{2Jm}\right)+1$
		$\displaystyle=Y_{i^{\prime}}\cdot\frac{\epsilon^{2}}{2Jm}+1$
		$\displaystyle\leq n_{i}\cdot\frac{\epsilon^{2}}{2Jm}+1.$

The last inequality holds because $Y_{i^{\prime}}$ is an integer, so that

\displaystyle n_{i}=\left\lfloor Y_{i^{\prime}}+Y_{i^{\prime}}\cdot\frac{\epsilon^{2}k^{\prime}}{2Jm}\right\rfloor\geq Y_{i^{\prime}},\text{ for }k^{\prime}\geq 0.

∎

Appendix B Proof of Theorem 5.1

See 5.1

Proof of Theorem 5.1.

Recall that the regret $R_{T}$ for the adversarial setting is

	$\displaystyle R_{T}$	$\displaystyle\;\stackrel{{\scriptstyle\Delta}}{{=}}\;\max_{p\in\mathcal{P}}% \sum_{t=1}^{T}r(i_{t},p)\,-\,\sum_{t=1}^{T}r(i_{t},p_{t})$
		$\displaystyle=\underbrace{\;\;\max_{p\in\mathcal{P}}\sum_{t=1}^{T}r(i_{t},p)\,% -\,\max_{p\in\overline{\mathcal{P}}}\sum_{t=1}^{T}r(i_{t},p)}_{\text{Loss of % revenue due to discretization}}\,+\,\underbrace{\max_{p\in\overline{\mathcal{P% }}}\sum_{t=1}^{T}r(i_{t},p)\,-\,\sum_{t=1}^{T}r(i_{t},p_{t}).}_{\text{$\;% \stackrel{{\scriptstyle\Delta}}{{=}}\;\overline{R}_{T}$ (discretization regret% )}}$		(19)

We decompose $R_{T}$ into two regrets. The first term is the sacrifice of revenue on discretization. The second term is the algorithm regret when competing against the optimal price within the discretization set $\overline{\mathcal{P}}$ .

According to Theorem 3.1, our discretization scheme approaches optimal revenue within a gap of $\frac{2\epsilon}{1+\epsilon}$ :

\displaystyle\max_{p\in\mathcal{P}}\sum_{t=1}^{T}r(i_{t},p)\,-\,\max_{p\in% \overline{\mathcal{P}}}\sum_{t=1}^{T}r(i_{t},p)\leq\frac{2\epsilon T}{1+% \epsilon}<2\epsilon T.

(20)

Therefore, the first term can be bounded by $2\epsilon T$ .

According to Theorem B.1, the second term discretization regret is upper bounded by

\displaystyle\mathbb{E}[\overline{R}_{T}]\leq 3m\sqrt{T\log\left|\overline{% \mathcal{P}}\right|}.

(21)

Combining (20) and (21) together, we have,

\displaystyle\mathbb{E}[R_{T}]\leq 2\epsilon T+3m\sqrt{T\log\left|\overline{% \mathcal{P}}\right|}=\mathcal{O}\left(m\sqrt{T\log\left|\overline{\mathcal{P}}% \right|}\right).

(as $\epsilon=\frac{1}{\sqrt{T}}$)

Plugging in the size of the discretization set from Section 3, we have

\displaystyle\mathbb{E}[R_{T}]=\widetilde{\mathcal{O}}\left(m^{\nicefrac{{3}}{% {2}}}\sqrt{T}\right).

∎

Theorem B.1.

The discretization regret $\overline{R}_{T}$ defined in (19) has upper bound $\mathcal{O}\left(m\sqrt{T\log\left|\overline{\mathcal{P}}\right|}\right)$ .

Proof of Theorem B.1.

We first claim that $r_{t}(p_{t})=r(i_{t},p_{t})$ for all $t$. If the buyer makes a purchase at round $t$, $r_{t}(p_{t})=r(i_{t},p_{t})$ holds by definition. If the buyer does not purchase at price $p_{t}$ on round $t$, then $r(i_{t},p_{t})=0$. Since $S_{t}^{c}$ contains all the types that would not make a purchase at $p_{t}$, we have $r(i,p_{t})=0,\,\forall i\in S_{t}^{c}$, and

\displaystyle r(i_{t},p_{t})=\sum_{i\in S_{t}^{c}}r(i,p_{t})=r_{t}(p_{t})=0.

Therefore, $r_{t}(p_{t})=r(i_{t},p_{t})$ holds for every round $t\in[T]$ . Denote $p^{\star}$ as,

\displaystyle p^{\star}=\underset{p\in\overline{\mathcal{P}}}{\mathop{\mathrm{% argmax}}}\ \sum_{t=1}^{T}r(i_{t},p).

Then, we decompose the regret as follows,

$\displaystyle\mathbb{E}[\overline{R}_{T}]$	$\displaystyle=\sum_{t=1}^{T}r(i_{t},p^{\star})-\mathbb{E}\left[\sum_{t=1}^{T}r(i_{t},p_{t})\right]$
	$\displaystyle=\sum_{t=1}^{T}r(i_{t},p^{\star})-\mathbb{E}\left[\sum_{t=1}^{T}r% _{t}(p_{t})\right]$
	$\displaystyle=\mathbb{E}\left[\sum_{t=1}^{T}\left(r(i_{t},p^{\star})-r_{t}(p^{% \star})\right)\right]+\mathbb{E}\left[\sum_{t=1}^{T}r_{t}(p^{\star})-\sum_{t=1% }^{T}r_{t}(p_{t+1})\right]+\mathbb{E}\left[\sum_{t=1}^{T}r_{t}(p_{t+1})-r_{t}(% p_{t})\right].$	(22)

We bound three terms in (22) separately.

The first term. For any price $p$ and any round $t$ , we have $r_{t}(p)\geq r(i_{t},p)$ by definition. Hence,

\displaystyle\sum_{t=1}^{T}\left(r(i_{t},p^{\star})-r_{t}(p^{\star})\right)% \leq 0.

(23)

The second term. Since $p^{\star}\in\overline{\mathcal{P}}$, we can apply Lemma B.1 to $p^{\star}$ and obtain

\displaystyle\sum_{t=1}^{T}r_{t}(p^{\star})-\sum_{t=1}^{T}r_{t}(p_{t+1})\leq% \theta_{p_{1}}-\theta_{p^{\star}}.

Note that both $\theta_{p_{1}}$ and $\theta_{p^{\star}}$ are drawn i.i.d. from an exponential distribution, so

\displaystyle\mathbb{E}[\theta_{p_{1}}]\leq\mathbb{E}\left[\underset{p\in% \overline{\mathcal{P}}}{\max}\ \theta_{p}\right]\leq\frac{1+\log\left|% \overline{\mathcal{P}}\right|}{\theta},

\displaystyle\mathbb{E}[\theta_{p^{\star}}]\leq\mathbb{E}\left[\underset{p\in% \overline{\mathcal{P}}}{\max}\ \theta_{p}\right]\leq\frac{1+\log\left|% \overline{\mathcal{P}}\right|}{\theta}.

We have

\displaystyle\mathbb{E}\left[\sum_{t=1}^{T}r_{t}(p^{\star})-\sum_{t=1}^{T}r_{t% }(p_{t+1})\right]\leq\mathbb{E}\big{[}\theta_{p_{1}}-\theta_{p^{\star}}\big{]}% \leq\frac{1+\log\left|\overline{\mathcal{P}}\right|}{\theta}.

(24)

The third term. Note that for any price $p\in\overline{\mathcal{P}}$ and any round $t$ , $r_{t}(p)\leq m$ . Therefore we have,

\displaystyle\mathbb{E}\left[r_{t}(p_{t+1})-r_{t}(p_{t})\right]=\mathbb{P}% \left(p_{t+1}\neq p_{t}\right)\mathbb{E}\left[r_{t}(p_{t+1})-r_{t}(p_{t})\mid p% _{t+1}\neq p_{t}\right]\leq m\cdot\mathbb{P}\left(p_{t+1}\neq p_{t}\right).

The price curve on round $t$ is $p_{t}$; by the price update rule,

\displaystyle p_{t}=\underset{p\in\overline{\mathcal{P}}}{\mathop{\mathrm{% argmax}}}\sum_{\tau=1}^{t-1}r_{\tau}(p)+\theta_{p},

which is equivalent to,

\displaystyle\theta_{p_{t}}\geq\theta_{p}+\sum_{\tau=1}^{t-1}r_{\tau}(p)-\sum_% {\tau=1}^{t-1}r_{\tau}(p_{t}),\,\forall p\in\overline{\mathcal{P}}.

For all ${p}^{\prime}\in\overline{\mathcal{P}}$ , let $c_{t-1,{p}^{\prime}}$ denote

\displaystyle\underset{p\in\overline{\mathcal{P}}}{\max}\left(\theta_{p}+\sum_% {\tau=1}^{t-1}r_{\tau}(p)-\sum_{\tau=1}^{t-1}r_{\tau}({p}^{\prime})\right)% \triangleq c_{t-1,{p}^{\prime}},

(25)

then $p_{t}={p}^{\prime}$ is equivalent to

\displaystyle\theta_{{p}^{\prime}}\geq c_{t-1,{p}^{\prime}}.

(26)

Subclaim. If $\theta_{p_{t}}$ also satisfies the following condition (27),

\displaystyle\theta_{p_{t}}\geq\theta_{p}+\sum_{\tau=1}^{t-1}r_{\tau}(p)-\sum_% {\tau=1}^{t-1}r_{\tau}(p_{t})+m,\,\forall p\in\overline{\mathcal{P}},

(27)

then $p_{t+1}=p_{t}$ .

Proof of the Subclaim. If (27) holds for all $p\in\overline{\mathcal{P}}$ ,

	$\displaystyle\theta_{p_{t}}$	$\displaystyle\geq\theta_{p}+\sum_{\tau=1}^{t-1}r_{\tau}(p)-\sum_{\tau=1}^{t-1}% r_{\tau}(p_{t})+m$
		$\displaystyle\geq\theta_{p}+\sum_{\tau=1}^{t}r_{\tau}(p)-\sum_{\tau=1}^{t}r_{% \tau}(p_{t}).$		(because $\forall p\in\overline{\mathcal{P}}$ , $r_{t}(p)\in[0,m]$ )

Hence,

\displaystyle p_{t}=\underset{p\in\overline{\mathcal{P}}}{\mathop{\mathrm{% argmax}}}\sum_{\tau=1}^{t}r_{\tau}(p)+\theta_{p}=p_{t+1}.

∎

Therefore, (27) is a sufficient condition for $p_{t+1}=p_{t}$ . We then bound the probability of $p_{t+1}=p_{t}$ by computing the probability of (27) happening.

$\displaystyle\mathbb{P}\left(p_{t}=p_{t+1}\right)$	$\displaystyle=\sum_{p\in\overline{\mathcal{P}}}\mathbb{P}\left(p_{t}=p\right)% \mathbb{P}(p_{t+1}=p\mid p_{t}=p)$
	$\displaystyle=\sum_{p\in\overline{\mathcal{P}}}\mathbb{P}\left(p_{t}=p\right)% \mathbb{P}\left(p_{t+1}=p\mid\theta_{p}\geq c_{t-1,{p}}\right)$	( by (26))
	$\displaystyle\geq\sum_{p\in\overline{\mathcal{P}}}\mathbb{P}\left(p_{t}=p% \right)\mathbb{P}\left(\theta_{p}\geq c_{t-1,{p}}+m\mid\theta_{p}\geq c_{t-1,{% p}}\right)$
	$\displaystyle\geq\sum_{p\in\overline{\mathcal{P}}}\mathbb{P}\left(p_{t}=p% \right)e^{-m\theta}$
	$\displaystyle=e^{-m\theta}$
	$\displaystyle\geq 1-m\theta$

Therefore, $\mathbb{P}\left(p_{t}\neq p_{t+1}\right)\leq m\theta$ . Hence, the third term can be bounded as

\displaystyle\mathbb{E}\big{[}r_{t}(p_{t+1})-r_{t}(p_{t})\big{]}\leq m^{2}% \theta\implies\sum_{t=1}^{T}\mathbb{E}\big{[}r_{t}(p_{t+1})-r_{t}(p_{t})\big{]% }\leq m^{2}\theta T.

(28)
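The step from $\mathbb{P}\left(\theta_{p}\geq c_{t-1,p}+m\mid\theta_{p}\geq c_{t-1,p}\right)$ to $e^{-m\theta}$ uses the memoryless property of the exponential distribution. The following check evaluates the conditional probability in closed form for a few thresholds. The helper names and the numerical values are hypothetical, and `rate` plays the role of $\theta$.

```python
import math

def exp_survival(x, rate):
    """P(theta >= x) for theta ~ Exponential(rate); equals 1 for x <= 0."""
    return math.exp(-rate * max(x, 0.0))

def conditional_stay_probability(c, m, rate):
    """P(theta >= c + m | theta >= c), computed from the survival function."""
    return exp_survival(c + m, rate) / exp_survival(c, rate)

rate, m = 0.01, 5
lower_bound = math.exp(-rate * m)
values = [conditional_stay_probability(c, m, rate) for c in (-3.0, 0.0, 2.5, 40.0)]
for v in values:
    print(f"{v:.6f} >= {lower_bound:.6f} >= {1 - rate * m:.6f}")
```

For non-negative thresholds the conditional probability equals $e^{-m\theta}$ exactly, and for negative thresholds it is larger, which is why the proof states an inequality.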

Set $\theta=\sqrt{\frac{\log\left|\overline{\mathcal{P}}\right|}{m^{2}T}}$ . Combining the upper bounds for three terms (23), (24) and (28) together, we have

\displaystyle\mathbb{E}[\overline{R}_{T}]

\displaystyle\leq\frac{1+\log\left|\overline{\mathcal{P}}\right|}{\theta}+m^{2% }\theta T\in\mathcal{O}\left(m\sqrt{T\log\left|\overline{\mathcal{P}}\right|}% \right).
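For completeness, substituting $\theta=\sqrt{\frac{\log\left|\overline{\mathcal{P}}\right|}{m^{2}T}}$ gives the following. The final inequality assumes $\log\left|\overline{\mathcal{P}}\right|\geq 1$, which is an assumption of this sketch, and it recovers the constant $3$ used in (21).

```latex
\frac{1+\log|\overline{\mathcal{P}}|}{\theta}+m^{2}\theta T
  = m\sqrt{\frac{T}{\log|\overline{\mathcal{P}}|}}
    + m\sqrt{T\log|\overline{\mathcal{P}}|}
    + m\sqrt{T\log|\overline{\mathcal{P}}|}
  \leq 3m\sqrt{T\log|\overline{\mathcal{P}}|}.
```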

Plugging in the size of the discretization set (Theorem 3.1), we have,

\displaystyle\mathbb{E}[\overline{R}_{T}]\in\widetilde{\mathcal{O}}\left(m^{\nicefrac{{3}}{{2}}}\sqrt{T}\right).

∎

Lemma B.1.

For any $p\in\overline{\mathcal{P}}$ ,

\displaystyle\sum_{t=1}^{T}r_{t}(p_{t+1})+\theta_{p_{1}}\geq\sum_{t=1}^{T}r_{t% }(p)+\theta_{p}.

(29)

Proof of Lemma B.1.

We prove this by induction. For $T=0$ , the inequality $\theta_{p_{1}}\geq\theta_{p}$ holds by definition $p_{1}=\underset{p\in\overline{\mathcal{P}}}{\mathop{\mathrm{argmax}}}\ \theta_% {p}$ . Assume that the inequality holds for some $T$ . Then for any $p\in\overline{\mathcal{P}}$ ,

	$\displaystyle\sum_{t=1}^{T+1}r_{t}(p_{t+1})+\theta_{p_{1}}$	$\displaystyle=\sum_{t=1}^{T}r_{t}(p_{t+1})+\theta_{p_{1}}+r_{T+1}(p_{T+2})$
		$\displaystyle\geq\sum_{t=1}^{T}r_{t}(p_{T+2})+\theta_{p_{T+2}}+r_{T+1}(p_{T+2})$
		$\displaystyle=\sum_{t=1}^{T+1}r_{t}(p_{T+2})+\theta_{p_{T+2}}$
		$\displaystyle\geq\sum_{t=1}^{T+1}r_{t}(p)+\theta_{p}.$

Here the first inequality is by the induction hypothesis applied to $p_{T+2}$, and the second inequality is by

\displaystyle p_{T+2}=\underset{p\in\overline{\mathcal{P}}}{\arg\max}\sum_{t=1% }^{T+1}r_{t}(p)+\theta_{p}.

By induction, the inequality (29) holds for any $T\geq 0$. ∎
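Lemma B.1 is a "be-the-leader" statement and can be checked numerically on synthetic data. In the sketch below, the rewards are random numbers in $[0,m]$ standing in for $r_{t}(p)$, the price curves are replaced by integer indices, and all names are hypothetical. It is not the paper's algorithm, only the perturbed-leader rule $p_{t}=\mathop{\mathrm{argmax}}_{p}\sum_{\tau<t}r_{\tau}(p)+\theta_{p}$ applied to made-up rewards.

```python
import random

def ftpl_leaders(rewards, perturbation):
    """Return the leaders p_1, ..., p_{T+1} (as indices).

    rewards[t][p] stands in for r_{t+1}(p); the perturbation theta_p is drawn
    once and added to the cumulative reward of every candidate p.
    """
    num_prices = len(perturbation)
    cumulative = [0.0] * num_prices
    leaders = []
    for t in range(len(rewards) + 1):
        leaders.append(max(range(num_prices),
                           key=lambda p: cumulative[p] + perturbation[p]))
        if t < len(rewards):
            for p in range(num_prices):
                cumulative[p] += rewards[t][p]
    return leaders

rng = random.Random(0)
num_prices, T, m, rate = 6, 50, 3, 0.5
perturbation = [rng.expovariate(rate) for _ in range(num_prices)]
rewards = [[rng.uniform(0, m) for _ in range(num_prices)] for _ in range(T)]
leaders = ftpl_leaders(rewards, perturbation)

# Left-hand side of (29): play p_{t+1} on round t, plus theta_{p_1}.
lhs = sum(rewards[t][leaders[t + 1]] for t in range(T)) + perturbation[leaders[0]]
# Right-hand side of (29), maximized over every fixed candidate p.
rhs = max(sum(rewards[t][p] for t in range(T)) + perturbation[p]
          for p in range(num_prices))
print(lhs >= rhs)
```

The comparison holds for any reward sequence, so changing the seed or the reward distribution should not affect the outcome beyond floating-point rounding.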

Appendix C Proof of Theorem 4.1

In this section, we prove Theorem 4.1, our regret upper bound for Algorithm 3. We prove the theorem by first decomposing the regret into two parts: the regret with respect to the best price in a discretized set (called “discretization regret”) and the residual error due to discretization. The residual error is controlled by the approximation guarantees developed in Section 3. The key lemma in this appendix is Lemma C.1, which controls the discretization regret. We prove Lemma C.1 using a technique adapted from Chen et al. [14].

See 4.1

Proof of Theorem 4.1.

For the sake of simplicity, we define $r(i,p)$ as the revenue under type $i$ and price $p$, i.e., $r(i,p)\,\stackrel{{\scriptstyle\Delta}}{{=}}\,p(n_{i,p})$. Therefore, on every round, we have $r(i_{t},p_{t})=p_{t}(n_{i_{t},p_{t}})$.

Recall that the regret $R_{T}$ is

$\displaystyle R_{T}$	$\displaystyle\;\stackrel{{\scriptstyle\Delta}}{{=}}\;\;T\cdot\mathrm{OPT}\,-\,% \sum_{t=1}^{T}p_{t}(n_{i_{t},p_{t}})$
	$\displaystyle\;=\;\;T\cdot\mathrm{OPT}\,-\,\sum_{t=1}^{T}r(i_{t},p_{t})$
	$\displaystyle=\underbrace{\;\;T\cdot\mathrm{OPT}\,-\,T\cdot\underset{p\in% \overline{\mathcal{P}}}{\max}\,\mathrm{rev}(p)}_{\text{Loss of revenue due to % discretization}}\,+\,\underbrace{T\cdot\underset{p\in\overline{\mathcal{P}}}{% \max}\,\mathrm{rev}(p)\,-\,\sum_{t=1}^{T}r(i_{t},p_{t}).}_{\text{$\;\stackrel{% {\scriptstyle\Delta}}{{=}}\;\overline{R}_{T}$ (discretization regret)}}$	(30)

We decompose $R_{T}$ into two parts. The first term is the sacrifice of revenue on discretization. The second term is the algorithm regret when competing against the optimal price within the discretization set $\overline{\mathcal{P}}$ .

According to Theorem 3.1, our discretization scheme approaches $\mathrm{OPT}$ within a gap of $\frac{2\epsilon}{1+\epsilon}$ ,

\displaystyle\mathrm{OPT}\,-\,\underset{p\in\overline{\mathcal{P}}}{\max}\,% \mathrm{rev}(p)\leq\frac{2\epsilon}{1+\epsilon}\leq 2\epsilon.

Therefore, the first term can be bounded as,

\displaystyle T\cdot\mathrm{OPT}\,-\,T\cdot\underset{p\in\overline{\mathcal{P}% }}{\max}\,\mathrm{rev}(p)\leq 2\epsilon T.

(31)

By Lemma C.1, the second term, discretization regret, is upper bounded by

\displaystyle\mathbb{E}[\overline{R}_{T}]\leq 93m\sqrt{T\log T}

(32)

Combining (31) and (32) together, we have,

\displaystyle\mathbb{E}[R_{T}]\leq 2\epsilon T+93m\sqrt{T\log T}=\widetilde{\mathcal{O}}(m\sqrt{T})\quad\left(\text{as }\epsilon=\frac{1}{\sqrt{T}}\right).

∎
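To make the final combination concrete, the following minimal sketch evaluates the right-hand side $2\epsilon T+93m\sqrt{T\log T}$ with $\epsilon=1/\sqrt{T}$ . The function name `regret_bound` and the sample values of $T$ and $m$ are illustrative assumptions, not part of the paper.

```python
import math

def regret_bound(T: int, m: int) -> float:
    """Evaluate 2*eps*T + 93*m*sqrt(T log T) with eps = 1/sqrt(T)."""
    eps = 1.0 / math.sqrt(T)
    discretization_loss = 2.0 * eps * T                       # equals 2*sqrt(T)
    learning_regret = 93.0 * m * math.sqrt(T * math.log(T))   # bound (32)
    return discretization_loss + learning_regret

# Hypothetical horizons and number of types, chosen only for illustration.
for T in (10**3, 10**4, 10**5):
    bound = regret_bound(T, m=5)
    print(T, round(bound, 1), round(bound / T, 4))
```

With this choice of $\epsilon$ the discretization loss is $2\sqrt{T}$ , which is dominated by the second term, so the bound divided by $T$ shrinks as $T$ grows. Because of the large constant, the bound may exceed $T$ for small horizons and is then vacuous.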

Lemma C.1.

The discretization regret $\overline{R}_{T}$ defined in (30) is at most $\widetilde{\mathcal{O}}(m\sqrt{T})$ .

Proof of Lemma C.1.

The expected discretization regret $\mathbb{E}[\overline{R}_{T}]$ can be written as follows, where $p^{\star}\in\mathop{\mathrm{argmax}}_{p\in\overline{\mathcal{P}}}\mathrm{rev}(p)$ denotes the best price curve in the discretized set:

$\displaystyle\mathbb{E}[\overline{R}_{T}]\,$	$\displaystyle=\mathbb{E}\left[\,T\cdot\underset{p\in\overline{\mathcal{P}}}{\max}\,\mathrm{rev}(p)\,-\,\sum_{t=1}^{T}r(i_{t},p_{t})\right]$
	$\displaystyle=\mathbb{E}\left[\sum_{t=1}^{T}\left(r(i_{t},p^{\star})\,-\,r(i_{t},p_{t})\right)\right]$
	$\displaystyle=\sum_{t=1}^{T}\mathbb{E}\left[r(i_{t},p^{\star})\,-\,r(i_{t},p_{t})\right]$
	$\displaystyle=\sum_{t=1}^{T}\mathbb{E}\left[\mathrm{rev}(p^{\star})\,-\,% \mathrm{rev}(p_{t})\right]$
	$\displaystyle=\sum_{t=1}^{T}\mathbb{E}\left[\left(\mathrm{rev}(p^{\star})\,-\,% \mathrm{rev}(p_{t})\right)\cdot\mathbbm{I}(A_{t})\right]\,+\,\sum_{t=1}^{T}% \mathbb{E}\left[\left(\mathrm{rev}(p^{\star})\,-\,\mathrm{rev}(p_{t})\right)% \cdot\mathbbm{I}(A_{t}^{c})\right]$
	$\displaystyle\stackrel{{\scriptstyle\Delta}}{{=}}\sum_{t=1}^{T}\mathbb{E}\left% [\delta_{p_{t}}\cdot\mathbbm{I}(A_{t})\right]\,+\,\sum_{t=1}^{T}\mathbb{E}% \left[\delta_{p_{t}}\cdot\mathbbm{I}(A_{t}^{c})\right].$	(33)

Here $\delta_{p_{t}}\stackrel{{\scriptstyle\Delta}}{{=}}\mathrm{rev}(p^{\star})-\mathrm{rev}(p_{t})$ , and the two sums in (33) split $\mathbb{E}[\overline{R}_{T}]$ according to whether the good event $A_{t}$ holds. For any round $t$ , we define the good event $A_{t}$ as follows,

\displaystyle\forall i\in\left[m\right],\quad q_{i}\leq\widehat{q}_{i,t}\leq q% _{i}+2\sqrt{\frac{\log T}{T_{i,t}}}.

Define $\overline{q}_{i,t}\stackrel{{\scriptstyle\Delta}}{{=}}\frac{\sum_{\tau=1}^{t}\mathbbm{I}(i\in S_{\tau},i_{\tau}=i)}{T_{i,t}}=\frac{\sum_{\tau=1}^{t}\mathbbm{I}(i\in S_{\tau})\cdot\mathbbm{I}(i_{\tau}=i)}{\sum_{\tau=1}^{t}\mathbbm{I}(i\in S_{\tau})}$ . Note that $\mathbbm{I}(i_{\tau}=i)$ is a Bernoulli random variable with distribution $\text{Ber}(q_{i})$ , and that one can only observe $\mathbbm{I}(i_{\tau}=i)$ when $i\in S_{\tau}$ . Let $\overline{x}_{i,j}$ denote the mean of the first $j$ i.i.d. observations of $\mathbbm{I}(i_{\tau}=i)$ . Then, we have

	$\displaystyle\mathbb{P}\left(\left|\overline{q}_{i,t}-q_{i}\right|>\sqrt{\frac{\log T}{T_{i,t}}}\right)$	$\displaystyle=\sum_{j=0}^{t}\mathbb{P}\left(\left|\overline{q}_{i,t}-q_{i}\right|>\sqrt{\frac{\log T}{T_{i,t}}},\ \ T_{i,t}=j\right)$
		$\displaystyle\leq\sum_{j=0}^{t}\mathbb{P}\left(\left|\overline{x}_{i,j}-q_{i}\right|>\sqrt{\frac{\log T}{j}}\right)$
		$\displaystyle\leq\sum_{j=0}^{t}2\exp(-2\log T)$
		$\displaystyle\leq\frac{2}{T}.$

Here, the first inequality holds because the event $\left\{\left|\overline{q}_{i,t}-q_{i}\right|>\sqrt{\frac{\log T}{T_{i,t}}},\ \ T_{i,t}=j\right\}$ implies $\left\{\left|\overline{x}_{i,j}-q_{i}\right|>\sqrt{\frac{\log T}{j}}\right\},$ and the second inequality follows from Hoeffding’s inequality.
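The Hoeffding step can be checked numerically. The sketch below evaluates the two-sided Hoeffding tail $2\exp(-2j\varepsilon^{2})$ at deviation $\varepsilon=\sqrt{\log T/j}$ and sums it over the possible counts. The helper names are hypothetical, and the sum starts at $j=1$ as an assumption of this sketch, since the threshold is infinite when no observation has been made.

```python
import math

def hoeffding_tail(j: int, T: int) -> float:
    """Two-sided Hoeffding bound for the mean of j samples in [0, 1]
    at deviation sqrt(log T / j); this simplifies to 2 / T**2."""
    deviation = math.sqrt(math.log(T) / j)
    return 2.0 * math.exp(-2.0 * j * deviation ** 2)

def union_over_counts(t: int, T: int) -> float:
    """Sum the tail bound over every possible value j of T_{i,t}, j = 1..t."""
    return sum(hoeffding_tail(j, T) for j in range(1, t + 1))

T = 50  # hypothetical horizon
print(hoeffding_tail(7, T), union_over_counts(T, T), 2.0 / T)
```

Each term equals $2/T^{2}$ regardless of $j$ , so summing at most $T$ of them gives at most $2/T$ .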

We then bound the second term in (33). Since $\delta_{p_{t}}\leq 1$ , we have

	$\displaystyle\sum_{t=1}^{T}\mathbb{E}\left[\delta_{p_{t}}\mathbbm{I}(A_{t}^{c})\right]$	$\displaystyle\leq\sum_{t=1}^{T}\mathbb{E}\left[\mathbbm{I}(A_{t}^{c})\right]$
		$\displaystyle\leq\sum_{t=1}^{T}\sum_{i=1}^{m}\mathbb{P}\left(\left|\overline{q}_{i,t}-q_{i}\right|>\sqrt{\frac{\log T}{T_{i,t}}}\right)$
		$\displaystyle\leq\sum_{t=1}^{T}\sum_{i=1}^{m}\frac{2}{T}$
		$\displaystyle\leq 2m.$

Define the event $\mathcal{H}_{t}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{0<\delta_{p_{t}}<2\sum_{i\in S_{t}}\sqrt{\frac{\log T}{T_{i,t-1}}}\right\}$ . By Lemma C.3, we know that

\displaystyle\mathbbm{I}(A_{t-1},\,\delta_{p_{t}}>0)\implies\mathbbm{I}\left(0<\delta_{p_{t}}<\sum_{i\in S_{t}}2\sqrt{\frac{\log T}{T_{i,t-1}}}\right)=\mathbbm{I}(\mathcal{H}_{t}).

It remains to prove the upper bound for $\sum_{t=1}^{T}\mathbb{E}\left[\delta_{p_{t}}\mathbbm{I}(A_{t})\right]$ .

For $t\in\{1,\dots,T\}$ and $k\in\mathbb{Z}_{+}$ , let

\displaystyle m_{k,t}\stackrel{{\scriptstyle\Delta}}{{=}}\begin{cases}\alpha_{% k}\left(\frac{m}{\delta_{p_{t}}}\right)^{2}\log T,&\delta_{p_{t}}>0,\\ +\infty,&\delta_{p_{t}}=0,\end{cases}

and

\displaystyle A_{k,t}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{i\in S_{t}:T_{% i,t-1}\leq m_{k,t}\right\}.

Then, we define an event

\displaystyle\mathcal{G}_{k,t}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{\left% |A_{k,t}\right|\geq\beta_{k}m\right\},

which means “in the $t$ -th round, at least $\beta_{k}m$ types in $S_{t}$ have been observed at most $m_{k,t}$ times”.

Then, by Lemma C.5, we have

\displaystyle\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}\leq% \sum_{k=1}^{\infty}\sum_{t=1}^{T}\mathbbm{I}\left(\mathcal{G}_{k,t},\delta_{p_% {t}}>0\right)\cdot\delta_{p_{t}}.

For $i\in[m],k\in\mathbb{Z}_{+},t\in[T]$ , define an event

\displaystyle\mathcal{G}_{i,k,t}\stackrel{{\scriptstyle\Delta}}{{=}}\mathcal{G% }_{k,t}\cap\left\{i\in S_{t},\,T_{i,t-1}\leq m_{k,t}\right\}.

Then by the definitions of $\mathcal{G}_{k,t}$ and $\mathcal{G}_{i,k,t}$ we have

\displaystyle\mathbbm{I}\left(\mathcal{G}_{k,t},\,\delta_{p_{t}}>0\right)\leq% \frac{1}{\beta_{k}m}\sum_{i\in E_{\mathrm{B}}}\mathbbm{I}\left(\mathcal{G}_{i,% k,t},\,\delta_{p_{t}}>0\right).

Therefore,

\displaystyle\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}\leq% \sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\mathbbm{I}\left(% \mathcal{G}_{i,k,t},\,\delta_{p_{t}}>0\right)\cdot\frac{\delta_{p_{t}}}{\beta_% {k}m}.

For any price function $p$ , define $\delta_{p}\stackrel{{\scriptstyle\Delta}}{{=}}\mathrm{rev}(p^{\star})-\mathrm{rev}(p)$ . If $\delta_{p}>0$ , we call $p$ a “bad” price. Let $E_{\mathrm{B}}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{i\in[m]:\text{type }i\text{ would make a purchase under at least one bad price}\right\}$ .

For each type $i\in E_{\mathrm{B}}$ , suppose type $i$ would make a purchase under $N_{i}$ bad prices $p_{i,1}^{\mathrm{B}},p_{i,2}^{\mathrm{B}},\ldots,p_{i,N_{i}}^{\mathrm{B}}$ . Let $\delta_{i,l}\stackrel{{\scriptstyle\Delta}}{{=}}\delta_{p_{i,l}^{\mathrm{B}}}\left(l\in\left[N_{i}\right]\right)$ . Without loss of generality, we assume $\delta_{i,1}\geq\delta_{i,2}\geq\cdots\geq\delta_{i,N_{i}}$ . Let $\delta_{i,\min}\stackrel{{\scriptstyle\Delta}}{{=}}\delta_{i,N_{i}}$ . For convenience, we also define $\delta_{i,0}=+\infty$ , so that $\alpha_{k}\left(\frac{2m}{\delta_{i,0}}\right)^{2}=0$ . Then, we have

		$\displaystyle\sum_{t=1}^{T}\mathbbm{I}\left(\mathcal{H}_{t}\right)\delta_{p_{t}}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}% \mathbbm{I}\left(\mathcal{G}_{i,k,t},\,\delta_{p_{t}}>0\right)\frac{\delta_{p_% {t}}}{\beta_{k}m}$
	$\displaystyle=$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% l=1}^{N_{i}}\mathbbm{I}\left(\mathcal{G}_{i,k,t},\,p_{t}=p_{i,l}^{\mathrm{B}}% \right)\frac{\delta_{p_{t}}}{\beta_{k}m}$
	$\displaystyle=$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% l=1}^{N_{i}}\mathbbm{I}\left(\mathcal{G}_{i,k,t},\,p_{t}=p_{i,l}^{\mathrm{B}}% \right)\frac{\delta_{i,l}}{\beta_{k}m}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% l=1}^{N_{i}}\mathbbm{I}\left(T_{i,t-1}\leq m_{k,t},\,p_{t}=p_{i,l}^{\mathrm{B}% }\right)\frac{\delta_{i,l}}{\beta_{k}m}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{l=1}^{N_{i}}\mathbbm{I}\left(T_{i,t-1}\leq\alpha_{k}\left(\frac{2m}{\delta_{i,l}}\right)^{2}\log T,\,p_{t}=p_{i,l}^{\mathrm{B}}\right)\frac{\delta_{i,l}}{\beta_{k}m}$
	$\displaystyle=$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% l=1}^{N_{i}}\sum_{j=1}^{l}\mathbbm{I}\left(\alpha_{k}\left(\frac{2m}{\delta_{i% ,j-1}}\right)^{2}\log T<T_{i,t-1}\leq\alpha_{k}\left(\frac{2m}{\delta_{i,j}}% \right)^{2}\log T,\,p_{t}=p_{i,l}^{\mathrm{B}}\right)\frac{\delta_{i,l}}{\beta% _{k}m}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% l=1}^{N_{i}}\sum_{j=1}^{l}\mathbbm{I}\left(\alpha_{k}\left(\frac{2m}{\delta_{i% ,j-1}}\right)^{2}\log T<T_{i,t-1}\leq\alpha_{k}\left(\frac{2m}{\delta_{i,j}}% \right)^{2}\log T,\,p_{t}=p_{i,l}^{\mathrm{B}}\right)\frac{\delta_{i,j}}{\beta% _{k}m}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% l=1}^{N_{i}}\sum_{j=1}^{N_{i}}\mathbbm{I}\left(\alpha_{k}\left(\frac{2m}{% \delta_{i,j-1}}\right)^{2}\log T<T_{i,t-1}\leq\alpha_{k}\left(\frac{2m}{\delta% _{i,j}}\right)^{2}\log T,\,p_{t}=p_{i,l}^{\mathrm{B}}\right)\frac{\delta_{i,j}% }{\beta_{k}m}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{t=1}^{T}\sum_{% j=1}^{N_{i}}\mathbbm{I}\left(\alpha_{k}\left(\frac{2m}{\delta_{i,j-1}}\right)^% {2}\log T<T_{i,t-1}\leq\alpha_{k}\left(\frac{2m}{\delta_{i,j}}\right)^{2}\log T% ,\,i\in S_{t}\right)\frac{\delta_{i,j}}{\beta_{k}m}$
	$\displaystyle\leq$	$\displaystyle\sum_{i\in E_{\mathrm{B}}}\sum_{k=1}^{\infty}\sum_{j=1}^{N_{i}}% \left(\alpha_{k}\left(\frac{2m}{\delta_{i,j}}\right)^{2}\log T-\alpha_{k}\left% (\frac{2m}{\delta_{i,j-1}}\right)^{2}\log T\right)\frac{\delta_{i,j}}{\beta_{k% }m}$
	$\displaystyle=$	$\displaystyle 4m\left(\sum_{k=1}^{\infty}\frac{\alpha_{k}}{\beta_{k}}\right)% \log T\cdot\sum_{i\in E_{\mathrm{B}}}\sum_{j=1}^{N_{i}}\left(\frac{1}{\delta_{% i,j}^{2}}-\frac{1}{\delta_{i,j-1}^{2}}\right)\delta_{i,j}$
	$\displaystyle\leq$	$\displaystyle 1068m\log T\cdot\sum_{i\in E_{\mathrm{B}}}\sum_{j=1}^{N_{i}}% \left(\frac{1}{\delta_{i,j}^{2}}-\frac{1}{\delta_{i,j-1}^{2}}\right)\delta_{i,% j},$

where the last inequality is due to Lemma C.4. Finally, for each $i\in E_{\mathrm{B}}$ we have

	$\displaystyle\sum_{j=1}^{N_{i}}\left(\frac{1}{\delta_{i,j}^{2}}-\frac{1}{% \delta_{i,j-1}^{2}}\right)\delta_{i,j}$	$\displaystyle=\frac{1}{\delta_{i,N_{i}}}+\sum_{j=1}^{N_{i}-1}\frac{1}{\delta_{% i,j}^{2}}\left(\delta_{i,j}-\delta_{i,j+1}\right)$
		$\displaystyle\leq\frac{1}{\delta_{i,N_{i}}}+\int_{\delta_{i,N_{i}}}^{\delta_{i% ,1}}\frac{1}{x^{2}}\mathrm{d}x$
		$\displaystyle=\frac{2}{\delta_{i,N_{i}}}-\frac{1}{\delta_{i,1}}$
		$\displaystyle\leq\frac{2}{\delta_{i,\min}}.$
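As a sanity check on the summation-by-parts identity and the integral comparison above, the following sketch evaluates both sides for one hypothetical, non-increasing sequence of gaps. The function names and the gap values are assumptions made for illustration only.

```python
def weighted_gap_sum(deltas):
    """sum_j (1/d_j^2 - 1/d_{j-1}^2) * d_j with the convention 1/d_0^2 = 0."""
    total, prev_inv_sq = 0.0, 0.0
    for d in deltas:
        inv_sq = 1.0 / d ** 2
        total += (inv_sq - prev_inv_sq) * d
        prev_inv_sq = inv_sq
    return total

def summation_by_parts_form(deltas):
    """1/d_N + sum_{j<N} (d_j - d_{j+1}) / d_j^2, the rearranged right-hand side."""
    tail = sum((deltas[j] - deltas[j + 1]) / deltas[j] ** 2
               for j in range(len(deltas) - 1))
    return 1.0 / deltas[-1] + tail

gaps = [0.9, 0.5, 0.2, 0.05]  # hypothetical gaps, sorted in non-increasing order
print(weighted_gap_sum(gaps), summation_by_parts_form(gaps),
      2.0 / gaps[-1] - 1.0 / gaps[0])
```

The two forms agree up to floating-point error. The integral bound holds because $1/\delta_{i,j}^{2}\leq 1/x^{2}$ for every $x\in[\delta_{i,j+1},\delta_{i,j}]$ , so each summand is at most the integral of $1/x^{2}$ over that interval.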

It follows that

\displaystyle\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}\leq 1% 068m\log T\cdot\sum_{i\in E_{\mathrm{B}}}\frac{2}{\delta_{i,\min}}=m\sum_{i\in E% _{\mathrm{B}}}\frac{2136}{\delta_{i,\min}}\log T

(34)

So far, the distribution-dependent regret bound is proven. To prove the distribution-independent bound, we decompose $\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}$ into two parts:

	$\displaystyle\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}$	$\displaystyle=\sum_{t=1}^{T}\mathbbm{I}\left(\mathcal{H}_{t},\,\delta_{p_{t}}% \leq\epsilon\right)\cdot\delta_{p_{t}}+\sum_{t=1}^{T}\mathbbm{I}\left(\mathcal% {H}_{t},\,\delta_{p_{t}}>\epsilon\right)\cdot\delta_{p_{t}}$
		$\displaystyle\leq\epsilon T+\sum_{t=1}^{T}\mathbbm{I}\left(\mathcal{H}_{t},% \delta_{p_{t}}>\epsilon\right)\cdot\delta_{p_{t}},$

where $\epsilon>0$ is a constant to be determined. The second term can be bounded in the same way as in the proof of the distribution-dependent regret bound, except that we only consider the case $\delta_{p_{t}}>\epsilon$ . (For each type $i\in E_{\mathrm{B}}$ , suppose type $i$ would make a purchase under $N_{i}$ bad prices $p_{i,1}^{\mathrm{B}},p_{i,2}^{\mathrm{B}},\ldots,p_{i,N_{i}}^{\mathrm{B}}$ . Let $\delta_{i,l}\stackrel{{\scriptstyle\Delta}}{{=}}\delta_{p_{i,l}^{\mathrm{B}}}\left(l\in\left[N_{i}\right]\right)$ , and assume that $\delta_{i,1}\geq\delta_{i,2}\geq\ldots\geq\delta_{i,N_{i}}\geq\epsilon$ . Also let $\delta_{i,\min}\stackrel{{\scriptstyle\Delta}}{{=}}\delta_{i,N_{i}}$ .) Thus, we can replace (34) by

\displaystyle\sum_{t=1}^{T}\mathbbm{I}\left(\mathcal{H}_{t},\delta_{p_{t}}>% \epsilon\right)\cdot\delta_{p_{t}}\leq m\cdot\sum_{i\in E_{\mathrm{B}},\delta_% {i,\min}>\epsilon}\frac{2136}{\delta_{i,\min}}\log T\leq\frac{2136m^{2}}{% \epsilon}\log T.

It follows that

\displaystyle\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}\leq\epsilon\,T+\ \frac{2136m^{2}}{\epsilon}\log T.

Finally, letting $\epsilon=\sqrt{\frac{2136m^{2}\log T}{T}}$ , we get

\displaystyle\sum_{t=1}^{T}\mathbbm{I}(\mathcal{H}_{t})\cdot\delta_{p_{t}}\leq 2\sqrt{2136m^{2}T\log T}\leq 93\sqrt{m^{2}T\log T}.

∎
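The choice of $\epsilon$ at the end of the proof of Lemma C.1 balances the two terms $\epsilon T$ and $\frac{2136m^{2}}{\epsilon}\log T$ . The following sketch checks this balance numerically. The function names `tradeoff` and `best_eps` and the sample values of $m$ and $T$ are hypothetical.

```python
import math

C = 2136.0  # constant from (34)

def tradeoff(eps: float, m: int, T: int) -> float:
    """eps*T + C*m^2*log(T)/eps, the bound before eps is tuned."""
    return eps * T + C * m ** 2 * math.log(T) / eps

def best_eps(m: int, T: int) -> float:
    """The choice eps = sqrt(C*m^2*log(T)/T), which equalizes the two terms."""
    return math.sqrt(C * m ** 2 * math.log(T) / T)

m, T = 3, 10_000  # hypothetical problem size
eps_star = best_eps(m, T)
print(tradeoff(eps_star, m, T), 93.0 * m * math.sqrt(T * math.log(T)))
```

At the balancing point both terms equal $\sqrt{2136m^{2}T\log T}$ , and since $2\sqrt{2136}\approx 92.43$ , the sum is at most $93m\sqrt{T\log T}$ .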

Lemma C.2.

Under the good event $A_{t}$ , for any price function $p$ , let $S_{p}$ denote the set of types who would purchase at price $p$ . Then we have

\displaystyle\forall t\in[T],\quad\mathrm{rev}(p)\leq\widehat{\mathrm{rev}}_{t% }(p)\leq\mathrm{rev}(p)+\sum_{i\in S_{p}}2\sqrt{\frac{\log T}{T_{i,t}}}.

Proof of Lemma C.2.

When $A_{t}$ happens,

\displaystyle q_{i}\leq\widehat{q}_{i,t}\leq q_{i}+2\sqrt{\frac{\log T}{T_{i,t% }}},

for all $i\in[m]$ .

Therefore, we have

\displaystyle\widehat{\mathrm{rev}}_{t}(p)=\sum_{i=1}^{m}\widehat{q}_{i,t}% \cdot r(i,p)\geq\sum_{i=1}^{m}q_{i}\cdot r(i,p)=\mathrm{rev}(p)

and

\displaystyle\widehat{\mathrm{rev}}_{t}(p)=\sum_{i=1}^{m}\widehat{q}_{i,t}% \cdot r(i,p)\leq\sum_{i=1}^{m}\left(q_{i}+2\sqrt{\frac{\log T}{T_{i,t}}}\right% )\cdot r(i,p)\leq\mathrm{rev}(p)+\sum_{i\in S_{p}}2\sqrt{\frac{\log T}{T_{i,t}% }}.

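The last inequality uses $r(i,p)\leq 1$ for $i\in S_{p}$ and $r(i,p)=0$ for $i\notin S_{p}$ , since a type that does not purchase contributes no revenue. ∎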
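The sandwich in Lemma C.2 can be illustrated on a small example. The sketch below picks an optimistic estimate $\widehat{q}_{i,t}$ anywhere inside the interval guaranteed by $A_{t}$ and checks both inequalities. All numerical values, and the rule used to pick $\widehat{q}_{i,t}$ , are hypothetical and are not taken from Algorithm 3.

```python
import math

def rev(q, r):
    """Expected revenue sum_i q_i * r(i, p) for a fixed price curve p."""
    return sum(qi * ri for qi, ri in zip(q, r))

T = 1000
q = [0.5, 0.3, 0.2]        # hypothetical true type distribution
r = [0.4, 0.0, 0.7]        # hypothetical r(i, p); the middle type does not purchase
counts = [40, 10, 25]      # hypothetical observation counts T_{i,t}

# Width of the confidence interval allowed by the good event A_t.
width = [2.0 * math.sqrt(math.log(T) / c) for c in counts]
# Any estimate in [q_i, q_i + width_i] satisfies A_t; the midpoint is one such choice.
q_hat = [qi + 0.5 * w for qi, w in zip(q, width)]

S_p = [i for i, ri in enumerate(r) if ri > 0]   # types that purchase at p
slack = sum(width[i] for i in S_p)
print(rev(q, r), rev(q_hat, r), rev(q, r) + slack)
```

Only the purchasing types contribute to the slack, which is why the sum in the lemma runs over $S_{p}$ rather than over all $m$ types.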

Lemma C.3.

For each $t\in[T]$ , under good event $A_{t-1}$ , the following inequality holds,

\displaystyle\delta_{p_{t}}\stackrel{{\scriptstyle\Delta}}{{=}}\mathrm{rev}(p^% {\star})-\mathrm{rev}(p_{t})\leq 2\sum_{i\in S_{t}}\sqrt{\frac{\log T}{T_{i,t-% 1}}}.

Proof of Lemma C.3.

When $A_{t-1}$ happens, by Lemma C.2,

\displaystyle\mathrm{rev}(p^{\star})\leq\widehat{\mathrm{rev}}_{t-1}(p^{\star}),

\displaystyle\mathrm{rev}(p_{t})\geq\widehat{\mathrm{rev}}_{t-1}(p_{t})-2\sum_% {i\in S_{t}}\sqrt{\frac{\log T}{T_{i,t-1}}}.

It then follows that,

\displaystyle\delta_{p_{t}}=\mathrm{rev}(p^{\star})-\mathrm{rev}(p_{t})\leq\widehat{\mathrm{rev}}_{t-1}(p^{\star})-\left(\widehat{\mathrm{rev}}_{t-1}(p_{t})-2\sum_{i\in S_{t}}\sqrt{\frac{\log T}{T_{i,t-1}}}\right).

Since $p_{t}=\mathop{\mathrm{argmax}}_{p\in\overline{\mathcal{P}}}\widehat{\mathrm{rev}}_{t-1}(p)$ , we have

\displaystyle\widehat{\mathrm{rev}}_{t-1}(p_{t})\geq\widehat{\mathrm{rev}}_{t-1}(p^{\star}).

Hence $\widehat{\mathrm{rev}}_{t-1}(p^{\star})-\widehat{\mathrm{rev}}_{t-1}(p_{t})\leq 0$ , and substituting this into the previous display yields the claimed bound.

∎

Lemma C.4 (Theorem 4 of Kveton et al. [36]).

We can choose $\left\{\alpha_{k}\right\}_{k\geq 0}$ and $\left\{\beta_{k}\right\}_{k\geq 0}$ , which satisfy the following properties: $\left\{\alpha_{k}\right\}_{k\geq 0}$ and $\left\{\beta_{k}\right\}_{k\geq 0}$ are positive and

\displaystyle\alpha_{1}>\alpha_{2}>\ldots\ \text{ and }\ 1=\beta_{0}>\beta_{1}% >\beta_{2}>\ldots,

such that $\lim_{k\rightarrow\infty}\alpha_{k}=\lim_{k\rightarrow\infty}\beta_{k}=0$ . Moreover,

\displaystyle\sqrt{6}\sum_{k=1}^{\infty}\frac{\beta_{k-1}-\beta_{k}}{\sqrt{% \alpha_{k}}}\leq 1,\text{ and }\sum_{k=1}^{\infty}\frac{\alpha_{k}}{\beta_{k}}% <267.

Lemma C.5.

On round $t$ , if event $\mathcal{H}_{t}$ happens, then at least one event $\mathcal{G}_{k,t},\,k\in\mathbb{Z}_{+}$ happens, where

\displaystyle\mathcal{G}_{k,t}\stackrel{{\scriptstyle\Delta}}{{=}}\left\{\left% |A_{k,t}\right|\geq\beta_{k}m\right\},\quad\text{where }A_{k,t}\stackrel{{% \scriptstyle\Delta}}{{=}}\left\{i\in S_{t}:T_{i,t-1}\leq m_{k,t}\right\},

and $m_{k,t}=\alpha_{k}\left(\frac{m}{\delta_{p_{t}}}\right)^{2}\log T$ when $\delta_{p_{t}}>0$ and $+\infty$ otherwise.

Proof of Lemma C.5.

Assume that $\mathcal{H}_{t}$ happens and that none of $\mathcal{G}_{1,t},\mathcal{G}_{2,t},\ldots$ happens. Then $\left|A_{k,t}\right|<\beta_{k}m$ for all $k\in\mathbb{Z}_{+}$ . Let $A_{0,t}=S_{t}$ and $\bar{A}_{k,t}=S_{t}\backslash A_{k,t}$ for $k\in\mathbb{Z}_{+}\cup\{0\}$ . Thus $\bar{A}_{k-1,t}\subseteq\bar{A}_{k,t}$ for all $k\in\mathbb{Z}_{+}$ . Note that $\lim_{k\rightarrow\infty}m_{k,t}=0$ . Thus there exists $N\in\mathbb{Z}_{+}$ such that $\bar{A}_{k,t}=S_{t}$ for all $k\geq N$ , and then we have $S_{t}=\bigcup_{k=1}^{\infty}\left(\bar{A}_{k,t}\backslash\bar{A}_{k-1,t}\right)$ . Finally, note that for all $i\in\bar{A}_{k,t}$ , we have $T_{i,t-1}>m_{k,t}$ . Therefore

	$\displaystyle\sum_{i\in S_{t}}\frac{1}{\sqrt{T_{i,t-1}}}$	$\displaystyle=\sum_{k=1}^{\infty}\sum_{i\in\bar{A}_{k,t}\backslash\bar{A}_{k-1% ,t}}\frac{1}{\sqrt{T_{i,t-1}}}\leq\sum_{k=1}^{\infty}\sum_{i\in\bar{A}_{k,t}% \backslash\bar{A}_{k-1,t}}\frac{1}{\sqrt{m_{k,t}}}$
		$\displaystyle=\sum_{k=1}^{\infty}\frac{\left|\bar{A}_{k,t}\backslash\bar{A}_{k-1,t}\right|}{\sqrt{m_{k,t}}}=\sum_{k=1}^{\infty}\frac{\left|A_{k-1,t}\backslash A_{k,t}\right|}{\sqrt{m_{k,t}}}=\sum_{k=1}^{\infty}\frac{\left|A_{k-1,t}\right|-\left|A_{k,t}\right|}{\sqrt{m_{k,t}}}$
		$\displaystyle=\frac{\left|S_{t}\right|}{\sqrt{m_{1,t}}}+\sum_{k=1}^{\infty}\left|A_{k,t}\right|\left(\frac{1}{\sqrt{m_{k+1,t}}}-\frac{1}{\sqrt{m_{k,t}}}\right)$
		$\displaystyle<\frac{m}{\sqrt{m_{1,t}}}+\sum_{k=1}^{\infty}\beta_{k}m\left(% \frac{1}{\sqrt{m_{k+1,t}}}-\frac{1}{\sqrt{m_{k,t}}}\right)$
		$\displaystyle=\sum_{k=1}^{\infty}\frac{\left(\beta_{k-1}-\beta_{k}\right)m}{% \sqrt{m_{k,t}}}.$

Under event $\mathcal{H}_{t}$ , we have

	$\displaystyle\delta_{p_{t}}$	$\displaystyle\leq\sum_{i\in S_{t}}2\sqrt{\frac{\log T}{T_{i,t-1}}}=2\sqrt{\log T% }\cdot\sum_{i\in S_{t}}\frac{1}{\sqrt{T_{i,t-1}}}$
		$\displaystyle<2\sqrt{\log T}\cdot\sum_{k=1}^{\infty}\frac{\left(\beta_{k-1}-% \beta_{k}\right)m}{\sqrt{m_{k,t}}}=2\sum_{k=1}^{\infty}\frac{\beta_{k-1}-\beta% _{k}}{\sqrt{\alpha_{k}}}\cdot\delta_{p_{t}}\leq\delta_{p_{t}},$

where the last inequality is due to Lemma C.4. We reach a contradiction here, hence the lemma follows. ∎

Appendix D Miscellaneous

D.1 Notations

The following table contains the notations used in this paper.

Notation	Meaning
$N$	The total amount of data.
$n\in[N]$	A number of data points.
$m$	The number of types.
$p:[N]\rightarrow[0,1]$	A price curve.
$\overline{\mathcal{P}}$	A set of discretized price curves.
$v_{i}:[N]\rightarrow[0,1]$	The valuation curve for type $i\in[m]$ .
$\mathcal{V}=\left\{v_{i}:i\in[m]\right\}$	The set of all valuation curves.
$n_{i,p}$	The amount of data type $i\in[m]$ purchases at price curve $p$ .
$r(i,p)=p(n_{i,p})$	The revenue from type $i\in[m]$ under price curve $p$ .
$q=(q_{1},q_{2},\dots,q_{m})$	The type distribution.
$\mathrm{rev}(p)$	The expected revenue under price $p$ .
$i_{t}\in[m]$	The type of buyer on round $t\in[T].$
$p_{t}:[N]\rightarrow[0,1]$	The price curve on round $t\in[T]$ .
$S_{t}$	The set of types that would make a purchase at price $p_{t}$ .
$S_{p}$	The set of types that would make a purchase at price $p$ .
$T_{i,t}\stackrel{{\scriptstyle\Delta}}{{=}}\sum_{\tau=1}^{t}\mathbbm{I}(i\in S% _{\tau})$	The number of times that type $i$ appears in set $S_{\tau}$ for $\tau\in\{1,\dots,t\}$ .
$\mathcal{P}=\{p:[N]\rightarrow[0,1]\;:\;p(0)=0\}$	The set of all pricing curves.
$L$	Smoothness constant of valuation curves.
$J$	Diminishing return constant of valuation curves.

Table 3: Table of notations.

	$\displaystyle\left|\overline{\mathcal{P}}\right|$	$\displaystyle=\sum_{i=1}^{m}\binom{N-1}{i}\binom{\left|W\right|}{i}$
		$\displaystyle\leq\left(\sum_{i=1}^{m}\binom{N-1}{i}\right)\left(\sum_{i=1}^{m}\binom{\left|W\right|}{i}\right)$
		$\displaystyle\leq\left(\sum_{i=0}^{m}\binom{N-1}{i}\right)\left(\sum_{i=0}^{m}\binom{\left|W\right|}{i}\right)$
		$\displaystyle\leq\left(\frac{e(N-1)}{m}\right)^{m}\left(\frac{e\left|W\right|}{m}\right)^{m}$
		$\displaystyle\leq\left(\frac{e(N-1)}{m}\right)^{m}\left(e\lceil(2+\epsilon)\rceil\left\lceil\log_{1+\epsilon}\frac{1}{\epsilon}\right\rceil\right)^{m}$
