Learning to Price Homogeneous Data
Abstract
We study a data pricing problem, where a seller has access to $N$ homogeneous data points (e.g. drawn i.i.d. from some distribution). There are $m$ types of buyers in the market, where buyers of the same type $i$ have the same valuation curve $v_i : [N] \to [0,1]$, where $v_i(n)$ is the value for having $n$ data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for $T$ rounds so as to learn the revenue-optimal pricing curve $p : [N] \to [0,1]$. To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns, which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve $p_t$. A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Using the improved discretization schemes previously developed, we are able to achieve sublinear regret in both the stochastic and adversarial settings.
1 Introduction
Due to the rise in popularity of machine learning, there is an increased demand for data. However, not all users of data have the wherewithal to collect data on their own, and have to rely on data marketplaces to acquire the data they need. For example, a materials data platform (e.g. [17]), may have collected vast amounts of data from various proprietary sources. Materials scientists in smaller organizations and academia, who do not have large experimental apparatuses, may wish to purchase this data to aid in their research. Similarly, small businesses may wish to purchase customer data for advertising and product recommendations [5, 4], while small technology companies may wish to purchase data about cloud operations to optimize their computing infrastructure [3, 2].
Model. Motivated by the emergence of such data marketplaces, we study the following online data pricing problem. A seller has access to $N$ homogeneous data points (e.g. drawn i.i.d. from some distribution). He wishes to sell the data to a sequence of distinct buyers over $T$ rounds, and intends to achieve large revenue. There are $m$ types of buyers in the data marketplace, with all buyers in type $i \in [m]$ having the same valuation curve $v_i : [N] \to [0,1]$ for the data, where $v_i(n)$ represents the buyer's value for having $n$ points. As data is homogeneous, we can treat an agent's value as a function of the amount of data (we will illustrate this in the sequel). Valuation curves are monotone non-decreasing, as more data is better. At each round $t$, the seller chooses a price curve $p_t : [N] \to [0,1]$, where $p_t(n)$ is the price to the buyer for purchasing $n$ data points. Then a buyer with type $i_t$ arrives and purchases an amount of data that maximizes her utility (value minus price), provided that she can achieve non-negative utility. A buyer will reveal her type to the seller only if she makes a purchase, and only after she makes the purchase. The seller has knowledge of the valuation curves of the $m$ types, but does not know the distribution over types (stochastic setting), or the buyer sequence (adversarial setting). Moreover, he cannot practice non-anonymous (discriminatory) pricing, as he needs to choose the pricing curve without knowledge of the buyer's type on that round.
While there is extensive research on revenue-optimal pricing and learning to price, data marketplaces merit special attention, both due to their recent emergence and the unique characteristics of data. Typically the number of data (number of goods) is very large, but data usually satisfies additional properties such as smoothness (an agent’s value does not increase significantly with a small amount of additional data) and diminishing returns (additional data is more valuable when a buyer has less data). To illustrate further, note that two steps are essential to develop an effective online learning solution for data pricing. (1) First, we need to solve the planning problem, i.e. find a revenue-optimal pricing curve when the type distribution is known. (2) Second, when is unknown, we need to combine the algorithm in step (1) with estimates for to maximize long-term revenue.
Methods in the existing literature fall short in both steps. (1) When the type distribution is known, the data pricing problem resembles an ordered item pricing problem, which is known to be NP-hard [12, 24]. Hence prior work has aimed at approximating the optimal pricing curves via discretization schemes. Unfortunately, existing discretization schemes have poor, often exponential, dependence on the approximation parameter $\epsilon$. However, achieving sublinear regret in online learning requires choosing $\epsilon$ that vanishes with longer time horizons, i.e. $\epsilon \to 0$ as $T \to \infty$. Therefore, directly using existing discretization schemes in an online setting leads to poor statistical and computational properties of the associated online algorithm. This requires us to leverage the above properties of data to design discretization schemes with better dependence on $\epsilon$. (2) While there is prior work on learning optimal prices [32, 26, 21], these techniques either fall short of addressing the complexities in our setting, or fail to account for the properties of data, and hence do not scale gracefully when the amount of data is very large. Moreover, in our online learning setup, the seller faces a trade-off between setting high prices to maximize instantaneous revenue and setting low prices to guarantee a purchase, which results in the buyer revealing her type, which in turn can be helpful in future rounds. Prior work has studied this asymmetric feedback model only in single-item markets [22, 46], which are significantly simpler, and only in the stochastic setting.
1.1 Summary of our contributions
Our contributions in this work are threefold: (1) First, in §3, we develop discretization schemes for revenue-optimal data pricing under a variety of assumptions, which we will use later in our online learning schemes. (2) In §4, we study learning a revenue-optimal price in a stochastic setting, where the customer types on each round are drawn from a fixed but unknown distribution . (3) Finally, in §5, we study online learning when the buyer types are chosen by an oblivious adversary.
1. Discretization (approximation) schemes for revenue-optimal data pricing. Assuming only monotonicity, we construct a discretization which is an $\epsilon$-additive approximation to any pricing curve (Theorem 3.1). When compared to prior work [13, 24], the size of our discretization has smaller dependence on $\epsilon$ when the number of types $m$ is small (see Table 1). This will be useful, both statistically and computationally, when we study the online setting, as we need to choose $\epsilon \to 0$ as $T \to \infty$ to achieve sublinear regret. The size of this discretization is still quite large in real-world data marketplaces, where $N$ may be very large. Hence, we also study two other assumptions. First, when valuations are smooth, satisfying an $L$-Lipschitz-like condition, we construct a discretization whose size has no dependence on $N$ (Theorem 3.2). Next, under a diminishing returns condition, we construct a discretization whose size has only polylogarithmic dependence on $N$ (Theorem 3.3).
Key algorithmic insights. We first show that when there are only $m$ types, for any price function $p$, there exists an “$m$-step” price function $p'$ whose expected revenue is at least as much as that of $p$ on any type distribution $q$. An $m$-step function is a non-decreasing function $p'$ where $p'(n)$ and $p'(n+1)$ differ at most $m$ times. This allows us to focus on $m$-step functions, significantly narrowing the space of pricing functions when $m \ll N$. We then consider discretizations of the data space and valuations and apply this insight to construct discretizations of pricing curves.
Algorithm | Assumptions | Size of discretization | Reference
Hartline and Koltun [24] | – | | –
Chawla et al. [13] | M | | –
Algorithm 1 (ours) | M, F | | Theorem 3.1
Algorithm 5 (ours) | M, F, S | | Theorem 3.2
Algorithm 2 (ours) | M, F, D | | Theorem 3.3
2. Learning to price in the stochastic setting. Next, we turn to the online learning problem described in the beginning in a stochastic setting. On each round, our algorithm computes an upper confidence bound (UCB) [8, 37] on the revenue for each price curve in the discretization previously developed; we then choose the price curve with the highest UCB. There are two challenges in realizing this scheme: First, naively maintaining UCBs for each price leads to large confidence intervals, and hence large regret as the size of the discretization is still quite large; instead, we construct confidence intervals on estimates of the type distribution, and translate them to UCBs for the revenue. Second, due to the asymmetric nature of the feedback, the construction and analysis of these confidence intervals is delicate, and requires novel ideas. As summarized in Table 2, this algorithm achieves a bound on the regret for any discretization scheme, including those from prior work. In the stochastic setting, the key advantage of our discretization schemes is computational.
3. Learning to price in the adversarial setting. Next, we study learning in an adversarial setting. Our algorithm builds on Follow-the-Perturbed-Leader (FTPL) [30], but is adapted to account for the fact that feedback may be missing on some rounds. For this, we use the information we have about the valuation curves to keep track of which customers would not have made a purchase given a price curve. If a purchase is made and we observe feedback, we use the usual FTPL update; if not, we reward each pricing curve with the sum of its revenues from all types that would not have purchased in the current round. Table 2 shows the regret and time complexity of this learning method when paired with various discretization schemes. In the adversarial setting, our discretization schemes offer both computational and statistical advantages when compared to prior work.
1.2 Related work
Dynamic pricing. The online posted-price mechanism, also known as dynamic pricing, is a central research area in algorithmic market design [32, 18]. In the most classical setting [32], the seller sets a price for an item in each round, and a buyer purchases the item only if their valuation exceeds the posted price. While several extensions of this setting have been explored for both parametric [31, 19, 11, 27, 28, 45] and non-parametric [10, 43, 16, 38, 39] demands, most focus on single-parameter demands, i.e., selling a single item to buyers. Our data pricing problem is multi-parameter, as demands are parameterized by multiple outcomes, i.e. the number of data points.
Bayesian unit-demand pricing problem. Formally, our data pricing problem is a variant of the Bayesian Unit-demand Pricing Problem (BUPP) [12]. BUPP addresses the problem of (offline) revenue maximization over a known distribution of unit-demand buyers, meaning they want to buy at most one item from the inventory. In BUPP, a seller has distinct items to sell to a unit-demand buyer whose valuations are , where is the value of the th item. Given prices , the unit-demand buyer purchases a single item that maximizes their utility: . Assuming the valuation profile follows a known distribution , the goal of BUPP is to find the best prices that maximize the seller’s expected revenue.
Our data pricing problem is a variant of BUPP in two ways: (1) We study the sequential setting where type distributions are unknown, while valuation profiles for each type are known, and (2) We assume monotonic values , which is natural in data pricing. Unfortunately, BUPP is a computationally intractable problem, as is ours. BUPP is known to be NP-hard even when is a product distribution [15]. Moreover, even assuming that values are monotonic (i.e., ), the problem remains (strongly) NP-hard [13]. Therefore, we aim to provide a reasonably efficient no-regret algorithm for our problem, especially when the number of types is a fixed constant.
The previous works most relevant to our paper are Hartline and Koltun [24] and Chawla et al. [13], which study offline revenue maximization for unit-demand buyers. Buyers in our problem are also unit-demand, as each amount of data points can be seen as an individual item. Revenue maximization for unit-demand buyers is known to be computationally intractable [23], even with ordered (monotonic) buyer values [13], leading these works to focus on approximation algorithms. Hartline and Koltun [24] proposed an approximation algorithm with near-linear runtime in the number of buyers, given a fixed number of items. Chawla et al. [13] introduced a polynomial-time approximation scheme (PTAS) for unit-demand buyers with monotonic values. In this work, we extend the framework to the online setting with partial feedback, which has more practical implications.
Market design for data-sharing. In recent years, there has been a plethora of work devoted to algorithmic market design for data sharing [6, 7, 29, 42]. These works provide ingenious solutions to challenges unique to the data market, such as free replicability and the difficulty of valuation due to the combinatorial nature of data. Except for Agarwal et al. [6], the above-cited solutions are inherently offline or single-shot. While we focus on a simplified yet relevant setting where data comes from a single source, resulting in monotonic valuations, in this work, we tackle the problem in a sequential, dynamic setting, which has practical importance. In contrast to our approach, Agarwal et al. [6] considered the price to be a constant (i.e., a scalar rather than a price vector) to address the inherent computational intractability of multi-dimensional pricing. Instead, we maintain the price as a vector (i.e., a price function) but focus on cases where the valuation function satisfies natural properties such as monotonicity, smoothness, and diminishing returns.
2 Problem setting, assumptions, and challenges
A seller has homogeneous data points. There are types of buyers who wish to purchase this data. A buyer of type has a valuation curve , where is her value for data points. We will assume is non-decreasing as more data is valuable, and further that .
Example 1.
To motivate this model, consider a seller with $N$ ordered data points $x_1, \dots, x_N$, drawn i.i.d. from a distribution $\mathcal{D}$. If a buyer purchases $n$ points, she receives the first $n$ points, $x_1, \dots, x_n$. Her ex-post value may represent the accuracy of her ML model trained with $x_1, \dots, x_n$. However, as the buyer has not seen the data before the purchase, she does not know which specific points she will receive, and hence her (ex-ante) value $v_i(n)$ is the expected model accuracy when $n$ i.i.d. points are drawn from $\mathcal{D}$. The different types could be buyers who use the data for different tasks or models. For instance, with ImageNet's 1.4 million data points [20], different types of buyers could perform different learning tasks such as object detection, identification, and segmentation, and/or train different models such as AlexNet [35], ResNet [25], and GoogLeNet [41]. Both empirically and theoretically, for many learning tasks, $v_i$ is non-decreasing, and satisfies additional characteristics such as smoothness and/or diminishing returns.
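Pricing curves, buyer utility, and buyer purchase model. Let $p : [N] \to [0,1]$ be a pricing curve chosen by the seller, where $p(n)$ is the price of $n$ points. Let $\mathcal{P}$ denote the set of all pricing curves. If a buyer of type $i$, with valuation curve $v_i$, purchases $n$ points, her utility is $v_i(n) - p(n)$. If a buyer can achieve non-negative utility, i.e. $v_i(n) - p(n) \ge 0$ for some $n \in [N]$, she will purchase an amount of data to maximize her utility. To fully specify the buyer's purchase model, we will assume that when there are multiple $n$ which maximize her utility, she will choose the largest such $n$. Formally, for a given pricing curve $p$, a buyer of type $i$ will purchase $n_i(p)$ points where,
$$ n_i(p) = \begin{cases} \max\Big\{ \arg\max_{n \in [N]} \big( v_i(n) - p(n) \big) \Big\} & \text{if } \max_{n \in [N]} \big( v_i(n) - p(n) \big) \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{1} $$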
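Optimal revenue. It follows that the revenue from a buyer of type $i$ is $p(n_i(p))$, where $n_i(p)$ is the purchase amount in (1), with the convention $p(0) = 0$ when no purchase is made. Let $q = (q_1, \dots, q_m)$ be the distribution of the buyers over the $m$ types. Under this distribution $q$, we define the expected revenue $\mathrm{rev}(p)$ for a price curve $p$, the optimal price $p^\star$, and the optimal revenue $\mathrm{OPT}$ as follows:
$$ \mathrm{rev}(p) = \sum_{i=1}^{m} q_i \, p\big(n_i(p)\big), \qquad p^\star \in \arg\max_{p \in \mathcal{P}} \mathrm{rev}(p), \qquad \mathrm{OPT} = \mathrm{rev}(p^\star). \tag{2} $$
We have omitted the dependence on $q$ in $\mathrm{rev}$, $p^\star$, and $\mathrm{OPT}$. There is no closed-form solution for the optimal pricing curve, even when $q$ is known. Therefore, in §3, we explore discretization methods to approximate $p^\star$, which will then be used in §4 and §5 to develop online learning algorithms. Unfortunately, the size of this discretization can be very large without further assumptions. Therefore, we also consider two additional conditions that are commonly satisfied by data.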
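The purchase rule in (1) and the expected revenue in (2) can be sketched in a few lines of Python. This is only an illustrative sketch: curves are stored as lists indexed by $n-1$, and the names `purchase_amount`, `revenue`, and `expected_revenue`, as well as the toy numbers, are hypothetical and not taken from the paper.

```python
def purchase_amount(v, p):
    """Amount bought under rule (1): the largest utility maximizer,
    or 0 if every amount gives negative utility."""
    best_n, best_u = 0, None
    for n in range(1, len(p) + 1):
        u = v[n - 1] - p[n - 1]            # utility of buying n points
        # '>=' on ties keeps the largest maximizing n
        if u >= 0 and (best_u is None or u >= best_u):
            best_n, best_u = n, u
    return best_n


def revenue(v, p):
    """Payment collected from a buyer with valuation curve v at price curve p."""
    n = purchase_amount(v, p)
    return p[n - 1] if n > 0 else 0.0


def expected_revenue(valuations, q, p):
    """Expected revenue as in (2): sum over types of q_i * p(n_i(p))."""
    return sum(q_i * revenue(v, p) for v, q_i in zip(valuations, q))


# Toy instance (assumed numbers): N = 3 data points, m = 2 types.
valuations = [[0.25, 0.5, 0.5], [0.5, 0.75, 1.0]]
q = [0.5, 0.5]
p = [0.25, 0.5, 0.75]
print(expected_revenue(valuations, q, p))
```

In this toy instance the first type is indifferent between one and two points and, by the tie-breaking rule, buys two; the second type is indifferent among all three amounts and buys three.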
Our first such assumption states that buyer valuation curves satisfy a Lipschitz-like smoothness condition with Lipschitz constant $L$. We use $L/N$ instead of $L$ since the number of data has a range $[N]$, while the valuations only have a range $[0,1]$. This condition states that a buyer's valuation does not change significantly if she only purchases a few additional points.
Assumption 1 (Smoothness, S).
For all $i \in [m]$ and $n, n' \in [N]$, we have $|v_i(n) - v_i(n')| \le \frac{L}{N} |n - n'|$.
Our second condition is based on the fact that data typically exhibits diminishing returns [34, 33]. This means that an additional data point is more valuable when there is less data, i.e. $v_i(n+1) - v_i(n)$ is decreasing with $n$. We will in fact make a stronger assumption, and justify it below.
Assumption 2 (Diminishing returns, D).
There exists some such that, for all types , and for all , we have .
Assumption 2 quantifies the rate of decrease of diminishing returns. Following Example 1, the valuation (accuracy) curves for many learning problems take the form ; for instance, for binary classification in a VC class , may be the best accuracy in , where is the VC dimension, and [40]; similarly, for nonparametric regression of a twice differentiable function, and are constants while [44]. In such cases, Assumption 2 is satisfied with . Note that neither assumption subsumes the other: a non-concave Lipschitz function will not satisfy Assumption 2, while a suitable for a function which satisfies Assumption 2 may need to be very large for Assumption 1 to hold for small .
2.1 Learning to price in online settings
In this work, we will also study how a seller may learn to maximize revenue. In our learning problem, the seller is aware of the valuation curves of each type, but does not know the distribution of types (stochastic setting) or there may be no such distribution (adversarial setting).
Setup. The seller repeats the data market for $T$ rounds. At the beginning of each round $t$, he chooses some price curve $p_t \in \mathcal{P}$. After the seller has chosen $p_t$, a new buyer of type $i_t$ appears and purchases the amount of data $n_{i_t}(p_t)$ given by (1). The buyer is aware of her own valuation curve. If she makes a purchase, that is if $n_{i_t}(p_t) > 0$, she pays $p_t(n_{i_t}(p_t))$ to the seller and reveals her type $i_t$. Otherwise, the buyer will make no payment and not reveal her type.
We have assumed that a priori, the seller is aware of the buyer valuation curves, and that buyers are aware of their own valuation curves. In Example 1, a seller can profile how different machine learning models perform with different amounts of data and publish these profiles ahead of time. The buyers can also gauge their value from these curves, even though they do not have access to the data. Next, we have also assumed that buyers will reveal their type after the purchase. In modern machine-learning-as-a-service platforms [1, 17, 4], buyers directly run their jobs on the seller's computing platform, so the seller can observe the buyer's job type directly. Even if this is not the case, sellers can elicit this information via questionnaires and reviews from customers who have made a purchase [22].
Challenges. Despite these assumptions, the learning problem remains challenging for two main reasons. First, the space of price curves is vast: discretizing the valuations in into bins, still leaves possible price curves, which is both statistically and computationally intractable, especially for large . Second, in addition to the exploration-exploitation trade-off usually encountered in sequential decision-making, the seller faces a tension between high instantaneous revenue and information acquisition: setting high prices can yield high immediate revenue if a purchase occurs, but it also increases the risk of no purchase, resulting in no revenue and crucially no feedback about the buyer type which could help him in future rounds. This trade-off was recently studied for single-item markets in a stochastic setting [22, 46], but is more complex in our multi-item problem. Moreover, to our knowledge, no existing work addresses this asymmetric feedback model in an adversarial setting, even for single-item markets. Next, we describe the buyer arrival model and define the regret for the learning problem in both stochastic and adversarial settings.
Stochastic setting. Here, there is some fixed but unknown distribution of types $q$. On each round, a buyer of type $i_t \sim q$ is drawn independently. The optimal expected revenue $\mathrm{OPT}$ under type distribution $q$ is as defined in (2). The regret $R_T$ is as defined below. We wish to design algorithms which have small expected regret $\mathbb{E}[R_T]$, where the expectation accounts for both the sampling of types and any randomness in the algorithm. We have,
$$ R_T = T \cdot \mathrm{OPT} - \sum_{t=1}^{T} p_t\big(n_{i_t}(p_t)\big). \tag{3} $$
Adversarial setting. Here, the types $i_1, \dots, i_T$ on each round are chosen arbitrarily, possibly by an oblivious adversary, ahead of time. The type $i_t$ on round $t$ is revealed to the seller only at the end of the round, and only if there is a purchase. In the adversarial setting, we define our regret with respect to the single best price in $\mathcal{P}$ in hindsight. We wish to design algorithms with small expected regret $\mathbb{E}[R_T]$, where the expectation is with respect to any randomness in the algorithm. We have,
$$ R_T = \max_{p \in \mathcal{P}} \sum_{t=1}^{T} p\big(n_{i_t}(p)\big) - \sum_{t=1}^{T} p_t\big(n_{i_t}(p_t)\big). \tag{4} $$
3 Efficient discretization of price curves with small errors
We first study the revenue maximization problem in the offline setting, where the seller knows both the valuation curves , and the type distribution . Our goal is to design a discretization so as to achieve revenue within a gap of from . Before discussing our discretization algorithms, we first show that the optimal pricing curve is “simple” when there are at most types.
Lemma 3.1.
Assume there are $m$ types with non-decreasing value curves $v_1, \dots, v_m$. For any non-decreasing price curve $p$, there exists an “$m$-step” price curve $p'$ that yields expected revenue at least that of $p$ with respect to any distribution over the types. Here, $m$-step refers to non-decreasing functions $p'$ where $p'(n) \neq p'(n+1)$ in at most $m$ points (i.e., at most $m$ jumps).
Lemma 3.1, proven in Appendix A.1, will be an important tool in all three discretization algorithms of this section. It will allow us to reduce the space of pricing curves as we only need to focus on -step price curves. Next, we present our first discretization procedure in Algorithm 1, which only assumes the monotonicity of the valuation curves.
Discretization scheme under monotonic valuations. Our discretization procedure, outlined in Algorithm 1, adapts the method in Hartline and Koltun [24] using Lemma 3.1. For this, we will first construct a discretization of the valuation space as follows. Let , be the powers of on price space . For each , we let be a uniform discretization of the interval with gap . Finally, let be the union of all such . According to Lemma 3.1, for every price function there is an $m$-step function with at least the same revenue. We set to be all choices of non-decreasing $m$-step functions that take value in . We have the following theorem about Algorithm 1, which we prove in Appendix A.2.
Theorem 3.1.
Consider the discretization as constructed in Algorithm 1. For any type distribution, there exists such that . Moreover, we have .
Discretization scheme for smooth monotonic valuations. Due to space constraints, we present our algorithm under Assumption 1 (Algorithm 5) in Appendix A.3. We have the following theorem about Algorithm 5.
Theorem 3.2.
Discretization scheme for monotone valuations under diminishing returns. Finally, we study discretization schemes under the diminishing returns condition. Our procedure, outlined in Algorithm 2, proceeds as follows. We use the same discretization of the valuation space from Algorithm 1. Next, we will discretize the data space . To exploit the structure in the diminishing returns condition, we will need to do so more densely when is small. For this, let , be the powers of on data space . For each , the set further partitions the interval uniformly with gap . For smaller than , we do not discretize it as the valuations may change rapidly when is small. Let be the union of and all the sets . Therefore, has a size of at most . We have the following theorem about Algorithm 2, which we prove in Appendix A.5.
Theorem 3.3.
Proof outline. By Lemma 3.1, we may assume the optimal price curve is an -step function, where denote the value of on step . We generate an -step price curve on space such that is obtained by rounding down to the closest value in , and . We then show that if a buyer purchases at step under price , she will not purchase at step under new price . Therefore, the revenue from this buyer is at least , which ensures that .
4 Online learning in the stochastic setting
We now study the online learning problem outlined in §2.1 in the stochastic setting. Our algorithm, outlined in Algorithm 3, is based on the classical upper confidence bound (UCB) algorithm for stochastic bandits [8, 37]. It takes a discretization of the pricing curves as input, and on each round chooses a price curve in the discretization which has the largest UCB on the revenue.
The key technical challenge in realizing this scheme is in the construction of the UCB. As is large, naively constructing our UCBs over prices in will lead to a term in the UCB (say, when applying a union bound), and hence the regret. Instead, we will maintain UCBs for the type distribution, which will only have a term, and translate them to UCBs for the revenue. However, as we will see below, the analysis when constructing the UCB this way is nontrivial since we observe the types only if they make a purchase. In particular, our UCB depends on the number of times a buyer could have purchased at a given round, which is a random quantity that depends on the algorithm itself. We will first outline how we construct the UCBs.
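Construction of UCB. We will now show how to construct the upper confidence bound at the end of round $t$, which will be used in computing $p_{t+1}$. For $s \le t$, let $S_s$, defined below in (5), be the set of types who would have purchased in round $s$ at price $p_s$ had they appeared in that round. Then, for any type $i \in [m]$, we define $T_{t,i}$ to be the number of times that type $i$ appears in the sets $S_s$ for $s \le t$. That is, $T_{t,i}$ measures the number of times a buyer of type $i$ would have purchased during the first $t$ rounds. Writing $n_i(p)$ for the purchase amount in (1), we have,
$$ S_s = \big\{ i \in [m] : n_i(p_s) > 0 \big\}, \qquad T_{t,i} = \sum_{s=1}^{t} \mathbb{1}\big[ i \in S_s \big]. \tag{5} $$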
Note that as we use the price function on round 1, i.e. , we have for all . Next, we estimate via the fraction of times that type has appeared in the past rounds, provided that for . We have defined this quantity, below in (6). Via a standard application of Hoeffding’s inequality, we can show that with high probability. Using this, we can construct an upper confidence bound as follows,
(6) |
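We now translate the UCBs on the type distribution $q$ to UCBs on the revenue. Recall from (1) that a buyer of type $i$ will purchase $n_i(p)$ points at price $p$ and the revenue from this buyer will be $p(n_i(p))$. Note that as the seller has access to the valuation curves, he can compute $n_i(p)$ for any type $i$ and price curve $p$. Since $q_i \le \bar{q}_{t,i}$ with high probability, where $\bar{q}_{t,i}$ denotes the upper confidence bound from (6), we have the following natural UCB for the expected revenue $\mathrm{rev}(p)$ on round $t$:
$$ \overline{\mathrm{rev}}_t(p) = \sum_{i=1}^{m} \bar{q}_{t,i} \, p\big(n_i(p)\big). \tag{7} $$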
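The construction above can be sketched as follows. The counting of rounds in which a type would have purchased, and the translation from type-level bounds to a revenue bound, follow the text; the specific confidence width (a Hoeffding bound with a union bound over types at level `delta`) and all names are assumptions made for illustration, not the constants used in the paper.

```python
import math


def purchase_amount(v, p):
    """Largest utility-maximizing amount, or 0 if all utilities are negative."""
    best_n, best_u = 0, None
    for n in range(1, len(p) + 1):
        u = v[n - 1] - p[n - 1]
        if u >= 0 and (best_u is None or u >= best_u):
            best_n, best_u = n, u
    return best_n


def ucb_revenue(valuations, history, p, delta=0.05):
    """Revenue UCB for price curve p.

    history: list of (posted_curve, observed_type) pairs, where observed_type
    is None on rounds without a purchase.
    """
    m = len(valuations)
    total = 0.0
    for i, v in enumerate(valuations):
        # Rounds in which type i would have bought had she appeared.
        relevant = [obs for p_s, obs in history if purchase_amount(v, p_s) > 0]
        count = len(relevant)
        if count == 0:
            q_bar = 1.0                                  # no information yet
        else:
            q_hat = sum(obs == i for obs in relevant) / count
            # Assumed Hoeffding-style width; constants are illustrative.
            width = math.sqrt(math.log(2 * m / delta) / (2 * count))
            q_bar = min(1.0, q_hat + width)
        n = purchase_amount(v, p)
        total += q_bar * (p[n - 1] if n > 0 else 0.0)
    return total


# Toy usage (assumed numbers): four rounds at a zero price, so every type buys.
valuations = [[0.5, 1.0], [0.25, 0.5]]
zero = [0.0, 0.0]
history = [(zero, 0), (zero, 1), (zero, 0), (zero, 0)]
print(ucb_revenue(valuations, history, [0.25, 0.5]))
```

Only $m$ confidence intervals are maintained, one per type, and every price curve in the discretization reuses them; this is what avoids a union bound over the whole discretization.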
This completes the description of our construction. The following theorem bounds the regret for Algorithm 3 when paired with any of the discretization schemes in §3. While the computational complexity of our method depends on , there is no dependence on the regret because of the above construction of the UCB. The proof is given in Appendix C.
Theorem 4.1.
Proof challenges. When bounding the regret, we first observe that the subsets induce a partitioning of the price curves, where belongs to the partition of , if all types in would make a purchase at price , and all types in would not make a purchase at price . With this insight, we can view the action of a seller as not just choosing a price curve, but also choosing a set . That is, can be viewed as a super-arm in a combinatorial semi-bandit problem [36].
5 Online learning in the adversarial setting
We now study the adversarial setting. Similar to the stochastic setting, our algorithm will use a discretization of the price curves from §3. We will control regret by bounding both the discretization error and the algorithm’s regret relative to the best pricing curve in the discretization.
Before proceeding, let us first contextualize our feedback model against prior work. If the buyers do not reveal their types, this becomes an adversarial bandit problem with arms (pricing curves) [32]. Using an algorithm such as EXP-3 [9] results in large regret, which is not ideal due to ’s exponential dependence in . Conversely, if buyers reveal their types regardless of purchase, this is equivalent to full information feedback, where algorithms such as Hedge or Follow-the-perturbed-leader (FTPL) [30] yield regret, translating to with our discretization schemes in §3. In our intermediate regime, where feedback is only revealed upon purchase, we aim for a middle ground. We show our algorithm, outlined in Algorithm 4, achieves regret, which is worse than full information, but still depends polynomially on .
Our algorithm takes a discretization and a perturbation parameter as input. First, it samples a random perturbation from an exponential distribution with pdf for each pricing curve in . It maintains rewards for each round and price curve . On each round, it chooses the price curve that maximizes the perturbed cumulative reward .
This scheme is similar to FTPL, but the key difference is in how we design the rewards $r_t(p)$. To describe this, let $S_t$, defined exactly as in (5), be the set of agents who would have purchased in round $t$ at price $p_t$. At the end of the round, if there was a purchase, for all prices $p$ in the discretization, we set the reward to be $r_t(p) = p(n_{i_t}(p))$, i.e. the payment we would have received from the buyer at that round, had the price been $p$ (see (1)). If there was no purchase, we know that $i_t \notin S_t$, in which case we set $r_t(p) = \sum_{i \notin S_t} p(n_i(p))$. In this case, $r_t(p)$ is an upper bound on $p(n_{i_t}(p))$, and this upper bound is tight around prices similar to the chosen price $p_t$; in fact, $r_t(p_t) = p_t(n_{i_t}(p_t)) = 0$ if there was no purchase. Intuitively, $r_t$ deals with the uncertainty of not knowing the type on round $t$ by providing a large reward (as we are taking the sum) to prices that could have resulted in a purchase, which encourages exploration of such prices in future rounds. This intuition will help us bound the regret.
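The reward design described above can be sketched in Python as follows. The split between rounds with and without a purchase follows the text; the exponential rate `eta`, the fixed seed, and all function names are assumptions made for illustration, and the small set of candidate curves stands in for a discretization from §3.

```python
import random


def purchase_amount(v, p):
    """Largest utility-maximizing amount, or 0 if all utilities are negative."""
    best_n, best_u = 0, None
    for n in range(1, len(p) + 1):
        u = v[n - 1] - p[n - 1]
        if u >= 0 and (best_u is None or u >= best_u):
            best_n, best_u = n, u
    return best_n


def revenue(v, p):
    n = purchase_amount(v, p)
    return p[n - 1] if n > 0 else 0.0


def round_rewards(valuations, curves, p_t, observed_type):
    """Reward assigned to every candidate curve at the end of a round."""
    if observed_type is not None:
        # Purchase observed: usual full-information reward for the revealed type.
        v = valuations[observed_type]
        return [revenue(v, p) for p in curves]
    # No purchase: sum the revenue over all types that would not buy at p_t.
    non_buyers = [v for v in valuations if purchase_amount(v, p_t) == 0]
    return [sum(revenue(v, p) for v in non_buyers) for p in curves]


def ftpl(valuations, curves, types, eta=1.0, seed=0):
    """Run the perturbed-leader loop on a fixed type sequence; return revenue."""
    rng = random.Random(seed)
    noise = [rng.expovariate(eta) for _ in curves]   # one perturbation per curve
    cumulative = [0.0] * len(curves)
    earned = 0.0
    for i_t in types:
        k = max(range(len(curves)), key=lambda j: cumulative[j] + noise[j])
        p_t = curves[k]
        earned += revenue(valuations[i_t], p_t)
        bought = purchase_amount(valuations[i_t], p_t) > 0
        observed = i_t if bought else None           # type revealed only on purchase
        rewards = round_rewards(valuations, curves, p_t, observed)
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    return earned


# Toy usage (assumed numbers): two types, two candidate curves.
valuations = [[0.5, 1.0], [0.25, 0.5]]
curves = [[0.25, 0.5], [0.5, 1.0]]
print(round_rewards(valuations, curves, curves[1], None))   # [0.5, 0.0]
print(ftpl(valuations, curves, [0, 1, 1, 0, 1]))
```

In the first printed line, the posted curve receives reward zero on a no-purchase round, while the cheaper curve, which the absent type would have accepted, is rewarded; this is the exploration incentive described in the text.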
6 Conclusion
We designed revenue-optimal learning algorithms for pricing data. First, we leveraged properties like smoothness and diminishing returns to create novel discretization schemes for approximating any pricing curve. These schemes were then used in our learning algorithms to improve their statistical and computational properties. Our algorithms build on classical methods like UCB and FTPL but required significant adaptations to handle the vast space of pricing curves and the asymmetric feedback. An interesting future direction would be to relax the assumption that the seller knows the valuation curves .
References
- [1] AWS Forecast. https://aws.amazon.com/forecast/. Accessed: 2024-05-12.
- [2] AWS Data Hub. https://aws.amazon.com/blogs/big-data/tag/datahub/. Accessed: 2024-05-11.
- [3] Azure Data Share. https://azure.microsoft.com/en-us/products/data-share. Accessed: 2024-05-10.
- [4] Delta Sharing. https://docs.databricks.com/en/data-sharing/index.html. Accessed: 2024-05-11.
- [5] Ads Data Hub. https://developers.google.com/ads-data-hub/guides/intro. Accessed: 2022-05-10.
- Agarwal et al. [2019] A. Agarwal, M. Dahleh, and T. Sarkar. A marketplace for data: An algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 701–726, 2019.
- Agarwal et al. [2020] A. Agarwal, M. Dahleh, T. Horel, and M. Rui. Towards data auctions with externalities. arXiv preprint arXiv:2003.08345, 2020.
- Auer [2002] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397–422, 2002.
- Auer et al. [2002] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM journal on computing, 32(1):48–77, 2002.
- Besbes and Zeevi [2009] O. Besbes and A. Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
- Besbes and Zeevi [2015] O. Besbes and A. Zeevi. On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Science, 61(4):723–739, 2015.
- Chawla et al. [2007] S. Chawla, J. D. Hartline, and R. Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 243–251, 2007.
- Chawla et al. [2022] S. Chawla, R. Rezvan, Y. Teng, and C. Tzamos. Pricing ordered items. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 722–735, 2022.
- Chen et al. [2016] W. Chen, W. Hu, F. Li, J. Li, Y. Liu, and P. Lu. Combinatorial multi-armed bandit with general reward functions. Advances in Neural Information Processing Systems, 29, 2016.
- Chen et al. [2014] X. Chen, I. Diakonikolas, D. Paparas, X. Sun, and M. Yannakakis. The complexity of optimal multidimensional pricing. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1319–1328. SIAM, 2014.
- Cheung et al. [2017] W. C. Cheung, D. Simchi-Levi, and H. Wang. Dynamic pricing and demand learning with limited price experimentation. Operations Research, 65(6):1722–1731, 2017.
- Citrine Informatics [2024] Citrine Informatics. Citrine Informatics – Accelerating Materials Innovation. URL: https://citrine.io/, 2024. Accessed: March 9, 2024.
- Den Boer [2015] A. V. Den Boer. Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1–18, 2015.
- den Boer and Zwart [2014] A. V. den Boer and B. Zwart. Simultaneously learning and optimizing using controlled variance pricing. Management Science, 60(3):770–783, 2014.
- Deng et al. [2009] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
- Dudík et al. [2020] M. Dudík, N. Haghtalab, H. Luo, R. E. Schapire, V. Syrgkanis, and J. W. Vaughan. Oracle-efficient online learning and auction design. Journal of the ACM (JACM), 67(5):1–57, 2020.
- Guo et al. [2023] W. Guo, N. Haghtalab, K. Kandasamy, and E. Vitercik. Leveraging reviews: Learning to price with buyer and seller uncertainty. In Proceedings of the 24th ACM Conference on Economics and Computation, pages 816–816, 2023.
- Guruswami et al. [2005] V. Guruswami, J. D. Hartline, A. R. Karlin, D. Kempe, C. Kenyon, and F. McSherry. On profit-maximizing envy-free pricing. In SODA, volume 5, pages 1164–1173, 2005.
- Hartline and Koltun [2005] J. D. Hartline and V. Koltun. Near-optimal pricing in near-linear time. In Proceedings of the 9th International Conference on Algorithms and Data Structures, WADS’05, page 422–431, Berlin, Heidelberg, 2005. Springer-Verlag. ISBN 3540281010. doi: 10.1007/11534273_37. URL https://doi.org/10.1007/11534273_37.
- He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- Jagadeesan et al. [2021] M. Jagadeesan, A. Wei, Y. Wang, M. Jordan, and J. Steinhardt. Learning equilibria in matching markets from bandit feedback. Advances in Neural Information Processing Systems, 34:3323–3335, 2021.
- Javanmard [2017] A. Javanmard. Perishability of data: dynamic pricing under varying-coefficient models. The Journal of Machine Learning Research, 18(1):1714–1744, 2017.
- Javanmard and Nazerzadeh [2019] A. Javanmard and H. Nazerzadeh. Dynamic pricing in high-dimensions. The Journal of Machine Learning Research, 20(1):315–363, 2019.
- Jia et al. [2019] R. Jia, D. Dao, B. Wang, F. A. Hubis, N. Hynes, N. M. Gürel, B. Li, C. Zhang, D. Song, and C. J. Spanos. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1167–1176. PMLR, 2019.
- Kalai and Vempala [2005] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
- Keskin and Zeevi [2014] N. B. Keskin and A. Zeevi. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62(5):1142–1167, 2014.
- Kleinberg and Leighton [2003] R. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594–605. IEEE, 2003.
- Krause and Guestrin [2011] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology (TIST), 2(4):1–20, 2011.
- Krause et al. [2008] A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(12), 2008.
- Krizhevsky et al. [2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
- Kveton et al. [2015] B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535–543. PMLR, 2015.
- Lai and Robbins [1985] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1):4–22, 1985.
- Misra et al. [2019] K. Misra, E. M. Schwartz, and J. Abernethy. Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Science, 38(2):226–252, 2019.
- Perakis and Singhvi [2023] G. Perakis and D. Singhvi. Dynamic pricing with unknown nonparametric demand and limited price changes. Operations Research, 2023.
- Shalev-Shwartz and Ben-David [2014] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
- Szegedy et al. [2015] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
- Wang et al. [2020] T. Wang, J. Rausch, C. Zhang, R. Jia, and D. Song. A principled approach to data valuation for federated learning. Federated Learning: Privacy and Incentive, pages 153–167, 2020.
- Wang et al. [2021] Y. Wang, B. Chen, and D. Simchi-Levi. Multimodal dynamic pricing. Management Science, 67(10):6136–6152, 2021.
- Wasserman [2006] L. Wasserman. All of nonparametric statistics. Springer Science & Business Media, 2006.
- Xu and Wang [2021] J. Xu and Y.-X. Wang. Logarithmic regret in feature-based dynamic pricing. Advances in Neural Information Processing Systems, 34:13898–13910, 2021.
- Zhao and Chen [2019] H. Zhao and W. Chen. Stochastic one-sided full-information bandit. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 150–166. Springer, 2019.
Appendix A Omitted Details from Section 3
A.1 Proof of Lemma 3.1
See 3.1
Proof of Lemma 3.1.
Fix a price curve . Let be the amount of data type purchases at price curve , that is
For , let be a permutation such that . Let . Then, define a function as follows,
so that has at most steps. Then, has the following properties,
We next prove that for any , after changing the price function from to , the type buyer either purchases at or at .
For any type and any amount of data , there exists such that (let ), we then have
(as is non-decreasing and is a step function.) | ||||
(as ) | ||||
(as maximizes the buyer’s utility.) | ||||
(as ) |
As shown above, type still prefers purchasing data over all under price .
For , by the monotonicity of value curves, we have
Therefore, for any , type either purchases at , or purchases at under price . In either case, type contributes no less revenue under than . It then follows that, for any type distribution ,
∎
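The construction in this proof can be illustrated with a short Python sketch. It rests on assumptions that the garbled text does not fully confirm: curves are integer lists indexed by the amount of data, the original price curve is non-decreasing with price 0 at amount 0, ties are broken toward the larger amount, and amounts above the largest purchased amount are priced at the top price. The helper names are hypothetical.

```python
from typing import List, Sequence


def purchase(v: Sequence[int], p: Sequence[int]) -> int:
    """Utility-maximizing amount for valuation v under price p (0 = no purchase)."""
    best_n, best_u = 0, 0
    for n in range(1, len(p)):
        u = v[n] - p[n]
        if u >= best_u:  # assumption: ties go to the larger amount
            best_n, best_u = n, u
    return best_n


def step_curve(valuations: List[Sequence[int]], p: Sequence[int]) -> List[int]:
    """Step price curve whose jumps sit at the amounts purchased under p."""
    N = len(p) - 1
    bought = sorted({purchase(v, p) for v in valuations} - {0})
    q = [0] * (N + 1)
    for n in range(1, N + 1):
        # Charge the price of the next purchased amount at or above n;
        # above the largest purchased amount, charge the top price p[N].
        nxt = next((b for b in bought if b >= n), N)
        q[n] = p[nxt]
    return q


def total_revenue(valuations, weights, p) -> int:
    """Weighted revenue of price curve p over the buyer types."""
    return sum(w * p[purchase(v, p)] for v, w in zip(valuations, weights))


valuations = [[0, 2, 3, 3, 3], [0, 1, 4, 6, 9]]  # two buyer types, N = 4
p = [0, 1, 2, 4, 7]                               # original non-decreasing curve
q = step_curve(valuations, p)                     # at most one step per type
```

Here both types keep their purchased amounts when the curve is replaced by its step version, so the revenue is unchanged.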
A.2 Proof of Theorem 3.1
In this subsection, we prove Theorem 3.1 by decomposing it into three technical lemmas (Lemmas A.1, A.2 and A.3). In Lemmas A.1 and A.2, we prove the approximation guarantee of our discretization scheme and, in Lemma A.3, we provide an upper bound on the size of the discretization.
Lemma A.1.
For any type distribution, there exists a pricing function such that
Proof of Lemma A.1.
Consider the optimal pricing function , i.e., . Consider price curve where .
Let be the set of data quantities whose prices under are the same as those under . Any buyer type who would have purchased amount of data under will purchase the same amount of data under . On the other hand, for buyer types who would have purchased amount of data under , since for , the expected revenue contribution from such buyers under is at most ; hence, whether or not they purchase under , we have . ∎
Lemma A.2.
For any there exists such that , for any type distribution .
Proof of Lemma A.2.
For buyer types, by Lemma 3.1, there exists a non-decreasing step function with at most steps, whose expected revenue is at least . Assume has steps, . To simplify the notation, for , let denote the price on th step. That is,
where are the discontinuities of .
Recall the definitions of and as stated in Algorithm 1,
Let and for each , let be the price obtained by rounding down to the nearest value in . By the construction of and above, is a partition of the interval . Let be the price obtained by rounding down to the nearest value in . Set and consider the -step function defined by whose price at the th step (denoted ) is , that is
By the tie-breaking rule and the monotonicity of valuation curves, buyers only purchase among number of data under and .
Subclaim. Then, and satisfy the following
(8) |
with respect to any type distribution.
Proof of the Subclaim. We prove the above subclaim in two steps.
Step 1: No buyer who prefers to purchase data under would prefer data for some under (i.e., one with a lower price). This is because, when going from price to , the increase in the buyer’s utility for data is , which is higher than the increase for data. Formally, this can be seen as follows: For any we have,
as and . Moreover,
(9) |
The inequality holds because is the result of rounding down to the nearest value in .
By the construction of the sets and , we have which implies . Then, by combining the above inequalities, we obtain
(10) |
Consider a buyer with value curve who prefers to purchase at under price ; then it must hold that
(11) |
Then, by combining (10) and (11), we have
therefore the buyer would not purchase at under .
Step 2: Next, we claim that for all step . Since is obtained by rounding down to the nearest value in , we have
(12) |
By (9) and the above, we have
where the first inequality is by (9), the second is by (12), and the third is because .
Then, it follows that
So far we have proved and no type wants to change their preference to a smaller amount of data under . If one type purchases at under and under for , then . Therefore, we have
Since the construction of price does not depend on the type distribution, the above holds for any type distribution , which proves the subclaim. ∎
Note that constructed in the above subclaim is not necessarily non-decreasing, as a larger amount of data suffers a larger price reduction when going from to . In this case, we can directly construct a non-decreasing price curve from such that
Let . If is empty, this implies that is non-decreasing, hence we set . If is not empty, we define as follows: Let be a -step function with the same jump points as . Let be the value of on the th step. Then, for , let ; and for , let . By construction, is non-decreasing. Moreover, on set and on set .
Next, we claim that is non-decreasing for all . Both and are non-decreasing with respect to by the previous results. Hence,
(as ) | ||||
(as ) |
Therefore, any type that prefers to purchase at th step under would not prefer purchasing at any step under , and since , we have
∎
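The basic operation used throughout this proof, rounding each step price down to the nearest value in a finite grid, can be sketched as follows. This is only an illustration of the rounding primitive: the uniform grid below is a stand-in chosen for the example, not the sets constructed in Algorithm 1, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def round_down(x: float, grid: Sequence[float]) -> float:
    """Largest grid value that is <= x; grid must be sorted in ascending order."""
    i = bisect_right(grid, x)
    if i == 0:
        raise ValueError("x lies below the smallest grid value")
    return grid[i - 1]


def round_steps(step_prices: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Round every step price of a step curve down to the grid."""
    return [round_down(x, grid) for x in step_prices]


grid = [0.0, 0.25, 0.5, 0.75, 1.0]  # illustrative uniform grid (an assumption)
rounded = round_steps([0.3, 0.5, 0.9], grid)
```

Rounding down never raises a price and keeps a non-decreasing sequence of step prices non-decreasing; the monotonicity repair discussed in the proof is needed only because of the further price reduction applied after rounding.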
Lemma A.3.
When , .
Proof of Lemma A.3.
For any integer , the number of non-decreasing -step price functions is , hence we have
In the last inequality, we use the fact that . ∎
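As a sanity check on this kind of counting argument, the sketch below counts non-decreasing price curves under one possible formalization, which is an assumption and may differ from the expression used in the lemma: a curve assigns one of M grid prices to each of N data amounts, and a "k-step" curve is a non-decreasing one with exactly k distinct price levels. Under that definition the count is C(M, k) · C(N − 1, k − 1), which the code confirms by brute force on a small instance.

```python
from itertools import product
from math import comb


def count_k_step(N: int, M: int, k: int) -> int:
    """Closed form: choose k price levels, then k - 1 jump positions among N - 1 gaps."""
    return comb(M, k) * comb(N - 1, k - 1)


def brute_force(N: int, M: int, k: int) -> int:
    """Enumerate all curves and keep the non-decreasing ones with exactly k levels."""
    total = 0
    for curve in product(range(M), repeat=N):
        monotone = all(a <= b for a, b in zip(curve, curve[1:]))
        if monotone and len(set(curve)) == k:
            total += 1
    return total


closed = count_k_step(4, 3, 2)
enumerated = brute_force(4, 3, 2)
all_monotone = sum(count_k_step(4, 3, k) for k in range(1, 4))
```

Summing over k recovers the total number of non-decreasing curves, C(N + M − 1, N), which shows how restricting the number of steps shrinks the class.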
A.3 Price discretization scheme for smooth monotonic valuations
Discretization scheme for smooth monotonic valuations. We study discretization schemes to approximate monotone valuations under the smoothness condition in Assumption 1. Our procedure is outlined in Algorithm 5. The discretization of the valuation space follows Algorithm 1. Additionally, we uniformly split the data space into multiples of , denoting them as the set . We then set the discretization to be the class of all “-step” price curves on the function space . The following theorem, proven in Appendix A.4, outlines the main properties of this discretization scheme: the size of the discretization has no dependence on the number of data .
A.4 Proof of Theorem 3.2
See 3.2
Proof of Theorem 3.2.
By Lemma 3.1, there is a revenue-optimal price curve which is a -step function, for some , where can be compactly represented as the following set of tuples:
where denote the locations of jumps and denote the value of on step (i.e. for ).
Let . Next, we generate a price using Algorithm 6, which ensures that the price curve generated in the following step (13) is non-decreasing. We demonstrate that in each round of Algorithm 6, we incur a revenue loss of at most . If , everything remains the same and thus does not affect the expected revenue. If not, we combine the price of step with step , let for . During this process, buyers either make purchases at the same step, or switch to purchase at a higher step. Note that , so the revenue loss of each type is at most . This implies that the revenue loss in each round is at most . As there are rounds, we lose expected revenue of at most . We conclude that is within a gap of from , i.e., .
After combining some steps in Algorithm 6, assume that is a -step function () represented by
Then, we define a new price curve as follows: let , then is a -step function represented by
where
(13) |
First, we show that no buyer who purchases at step under would purchase at step under . Let the buyer’s valuation be . First, we prove that the buyer’s utility is non-negative at :
(by -Smoothness of .) | ||||
(as .) | ||||
Then, we prove that the buyer’s utility at is larger than that of for , therefore, the buyer would not prefer buying at step under price .
(by -Smoothness of .) | ||||
(as ) | ||||
(as ) | ||||
(as the buyer prefers than under .) |
Finally, fix the type distribution , then we have
(as .) |
Hence, is within a gap of from .
We then apply Theorem 3.1 to price . Therefore, it is enough to consider price functions from the set to approximate the revenue within a gap. Moreover, this discretization has size , as . ∎
A.5 Proof of Theorem 3.3
See 3.3
Proof of Theorem 3.3.
For each , let , and be the set , i.e., splits the interval equally into parts.
The union of s and the set form a set of grids on , denoted by . There are at most grids in total.
By Lemma 3.1, there is a revenue-optimal price curve which is a -step function, for some , where can be compactly represented as the following set of tuples:
where denote the locations of jumps and denote the value of on step (i.e. for ).
Then, define a new -step price curve via
where is given by
Then we define below. If , let ; otherwise, let be the price obtained by rounding down to the nearest value in . By the construction of and above, is a partition of the interval . Let be the price obtained by rounding down to the nearest value in . Set . Then define .
First, we prove for satisfying , if a buyer purchases at under price , she will not purchase at under new price . We prove this property separately when and .
(i) When .
The buyer’s utility at under price is,
(14) |
Let . Then is upper bounded by,
(15) |
where the third inequality is due to Lemma A.4.
By the construction of , we have
(16) |
Therefore, by (14), , buyer’s utility at under price is non-negative.
Next, we claim that . To prove this, for any , let , then we have
where because the buyer prefers over under price . Recall that we have ; then we bound as follows,
(17) |
By the construction of , we have,
(as ) | ||||
(as ) | ||||
(18) |
Therefore, combining (17) and (18) together, we have
We conclude that under price , the buyer prefers over , for any .
(ii) When .
In this case, , and for any , we still have . First, we prove the buyer’s utility at under is non-negative:
Then, we show that the buyer prefers over under :
where the first inequality is due to (18), and the second is because the buyer prefers over under .
So far we have completed the proof that for satisfying , if a buyer purchases at under price , she will not purchase at under new price .
Then, similar to Step 2 in the proof of Lemma A.2, we have pointwise. We then conclude the proof by observing
∎
Lemma A.4.
When , we have .
Proof of Lemma A.4.
By the construction of discretization set, must have the following form,
Since is obtained by rounding down to the nearest grid in , satisfies the following inequality,
Therefore, we have
where, in the last inequality, since is an integer, and we have
∎
Appendix B Proof of Theorem 5.1
See 5.1
Proof of Theorem 5.1.
Recall that the regret for the adversarial setting is
(19) |
We decompose into two terms. The first term is the revenue lost to discretization. The second term is the regret of the algorithm when competing against the optimal price within the discretization set .
According to Theorem 3.1, our discretization scheme approaches optimal revenue within a gap of :
(20) |
Therefore, the first term can be bounded by .
Theorem B.1.
The discretization regret defined in (19) has upper bound .
Proof of Theorem B.1.
We first claim that all . If the buyer makes a purchase at round , holds by definition. But if the buyer does not purchase at a price on round , . Since contains all the types that would not make a purchase at , we have , and
Therefore, holds for every round . Denote as,
Then, we decompose the regret as follows,
(22) |
We bound the three terms in (22) separately.
The first term. For any price and any round , we have by definition. Hence,
(23) |
The second term. Since , we apply Lemma B.1 to ,
Note that both and are drawn i.i.d. from an exponential distribution,
We have
(24) |
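For intuition about the exponential perturbations appearing in this term, the following sketch shows the textbook form of a follow-the-perturbed-leader choice: add fresh exponential noise to each candidate's cumulative reward and pick the maximizer. This is an assumption-laden illustration; the algorithm analyzed here may perturb a different quantity (for example, per-type rather than per-price statistics), and `ftpl_choose` and its `scale` parameter are hypothetical names.

```python
import random
from typing import Sequence


def ftpl_choose(cum_rewards: Sequence[float], scale: float, rng: random.Random) -> int:
    """Index of the candidate maximizing cumulative reward plus exponential noise.

    Each perturbation is drawn i.i.d. from an exponential distribution with mean `scale`.
    """
    perturbed = [r + rng.expovariate(1.0 / scale) for r in cum_rewards]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


rng = random.Random(0)
# With a tiny noise scale the perturbed leader coincides with the true leader.
leader = ftpl_choose([0.0, 100.0, 1.0], scale=1e-6, rng=rng)
# With a larger scale the choice is randomized but always a valid index.
explored = ftpl_choose([0.0, 100.0, 1.0], scale=50.0, rng=rng)
```

A larger `scale` increases exploration at the cost of a larger perturbation term in the regret, which is the trade-off balanced in the analysis.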
The third term. Note that for any price and any round , . Therefore we have,
The price curve on round is , then by the price update rule,
which is equivalent to,
For all , let denote
(25) |
then is equivalent to
(26) |
Subclaim. If also satisfies the following condition (27),
(27) |
then .
Lemma B.1.
For any ,
(29) |
Appendix C Proof of Theorem 4.1
In this section, we prove Theorem 4.1, our regret upper bound for Algorithm 3. We prove the theorem by first decomposing the regret into two parts: regret with respect to the best price in a discretized set (called “discretization regret”) and the residual error due to discretization. The residual error is controlled by the approximation guarantees developed in Section 3. The key lemma in this appendix is Lemma C.1, which controls the discretization regret. We prove Lemma C.1 using a technique adapted from Chen et al. [14].
See 4.1
Proof of Theorem 4.1.
For simplicity, we define as the revenue under type and price , i.e., . Therefore, on every round, we have .
Recall that the regret is
(30) |
We decompose into two parts. The first term is the revenue lost to discretization. The second term is the regret of the algorithm when competing against the optimal price within the discretization set .
Lemma C.1.
The discretization regret defined in (30) is at most .
Proof of Lemma C.1.
The discretization regret
(33) |
We can further decompose into and , where for any round , we define the good event as follows,
Define . Note that is a random variable that follows a Bernoulli distribution , and one can only observe when . Let denote the mean of the first i.i.d. observations of . Then, we have
where, in the first inequality, the event indicates , and the second inequality follows from Hoeffding’s inequality.
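The Hoeffding step can be made concrete with a small sketch. It uses the standard two-sided Hoeffding bound for i.i.d. samples in [0, 1]; the confidence radius chosen in the algorithm itself may use different constants, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hoeffding_radius(s: int, delta: float) -> float:
    """Half-width r such that P(|mean_s - q| >= r) <= delta for s samples in [0, 1].

    Solves 2 * exp(-2 * s * r**2) = delta for r.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * s))


def confidence_interval(observations: Sequence[int], delta: float) -> Tuple[float, float]:
    """Confidence interval for a Bernoulli mean, clipped to [0, 1]."""
    s = len(observations)
    mean = sum(observations) / s
    r = hoeffding_radius(s, delta)
    return max(0.0, mean - r), min(1.0, mean + r)


lo, hi = confidence_interval([1, 0, 1, 1], delta=0.1)
```

The radius shrinks at rate 1/sqrt(s), so types that are observed rarely keep wide intervals, which is what drives optimistic exploration in UCB-style algorithms.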
We then bound the second term in (33)
For and , let
and
Then, we define an event
which means “In the -th round, at least types in have been observed at most times”.
Then, by Lemma C.5, we have
For , define an event
Then by the definitions of and we have
Therefore,
For any price function , define . If , we call it a “bad” price. Let .
For each type , suppose is contained in bad prices . Let . Without loss of generality, we assume . Let . For convenience, we also define , i.e., . Then, we have
It follows that
(34) |
So far, the distribution-dependent regret bound is proven. To prove the distribution-independent bound, we decompose into two parts:
where is a constant to be determined. The second term can be bounded in the same way as in the proof of the distribution-dependent regret bound, except that we only consider the case . (For each type , suppose is contained in bad prices . Let satisfy . Also let .) Thus, we can replace (34) by
It follows that
Finally, letting , we get
∎
Lemma C.2.
Under good event , for any price function , let denote the set of types who would purchase at price , then we have
Lemma C.3.
For each , under good event , the following inequality holds,
Lemma C.4 (Theorem 4 of Kveton et al. [36]).
We can choose and , which satisfy the following properties: and are positive and
such that . Moreover,
Lemma C.5.
On round , if event happens, then at least one event happens, where
and when and otherwise.
Proof of Lemma C.5.
Assume that happens and that none of happens. Then for all . Let and for . Thus for all . Note that . Thus there exists such that for all , and then we have . Finally, note that for all , we have . Therefore
Under event , we have
where the last inequality is due to Lemma C.4. We reach a contradiction here, hence the lemma follows. ∎
Appendix D Miscellaneous
D.1 Notations
The following table contains the notations used in this paper.
Notation | Meaning |
---|---|
The total amount of data. | |
The number of data. | |
The number of types. | |
A price curve. | |
A set of discretized price curves. | |
The valuation curve for type . | |
The set of all valuation curves. | |
The amount of data type purchases at price curve . | |
The revenue from type under price curve . | |
The type distribution. | |
The expected revenue under price . | |
The type of buyer on round | |
The price curve on round . | |
The set of types that would make a purchase at price . | |
The set of types that would make a purchase at price . | |
The number of times that type appears in set for . | |
The set of all pricing curves. | |
Smoothness constant of valuation curves. | |
Diminishing return constant of valuation curves. |