Having presented our algorithms and their theoretical analysis, in this section we evaluate our proposal experimentally in practical scenarios. Using real datasets, we first provide a proof-of-concept experiment that highlights the motivation for finding rank-regret representatives. We then turn our attention to evaluating the performance of the different algorithms under various settings.
8.1 Experimental Setup
Datasets. We used real datasets in the experiments. All values were normalized into the range \([0,1]\) and discretized to a granularity of 0.01.
• BlueNile dataset\(^3\): Blue Nile (BN) is the largest online diamond retailer in the world. We collected its catalog, which contained 116,300 diamonds at the time of our collection. We considered the scalar attributes Carat, Depth, LengthWidthRatio, Table, and Price. For all attributes except Price, higher values were preferred. The value of a diamond is sensitive to these measurements: small changes in the scores may mean a lot in terms of the quality of the jewel. For example, while the listed diamonds at Blue Nile range from 0.23 to 20.97 carat, minor changes in carat strongly affect the price. We compared two similar diamonds, one weighing 0.5 carat and the other 0.53 carat; even though all other measures were similar, the second diamond was 30% more expensive than the first. The same holds for Depth, LengthWidthRatio, and Table. This phenomenon, that slight changes in the scores may dramatically affect the value (and the rank) of the items, highlights the motivation of rank-regret.
• US Department of Transportation flight dataset\(^4\): The US Department of Transportation (DoT) database is widely used by third-party websites to identify the on-time performance of flights, routes, airports, and airlines. After removing the records with missing values, the dataset contains 457,892 records for all flights conducted by the 14 US carriers in the last months of 2017. We consider the scalar attributes Dep-Delay, Taxi-Out, Actual-elapsed-time, Arrival-Delay, Air-time, and Distance in our experiments.
As mentioned, the BN and DoT datasets are 5D and 6D in their entirety. We generated \(d\)-dimensional versions (where \(d \in [2, 5]\) for BN and \(d \in [2, 6]\) for DoT) of each dataset by including the first \(d\) attributes in the order mentioned earlier; the sketch below illustrates this preparation.
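To make the data preparation concrete, the following is a minimal sketch of the steps above: min-max normalization to \([0,1]\), discretization to a 0.01 grid, and projection onto the first \(d\) attributes. The function name `prepare` and the choice of min-max scaling are our own assumptions; the article does not specify the exact normalization scheme.

```python
import numpy as np

def prepare(data, d, granularity=0.01):
    """Normalize each attribute to [0, 1], discretize to the given
    granularity, and keep the first d attributes (a sketch; the exact
    normalization used in the article is not specified)."""
    X = np.asarray(data, dtype=float)[:, :d]        # first d attributes
    # Attributes where lower is preferred (e.g., Price) are assumed to
    # have been complemented beforehand so that higher is always better.
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero
    X = (X - mins) / span                           # min-max scaling to [0, 1]
    return np.round(X / granularity) * granularity  # snap to the 0.01 grid
```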
Algorithms Evaluated. We will evaluate all the algorithms proposed in this article under different settings. Specifically, we will present two sets of experiments, for the 2D case and for the multi-dimensional (MD) case where \(d \ge 3\), involving the following algorithms:
• net-extreme-skyline (2D, MD): Proposed in Section 3, this algorithm uses sampling to construct an \(\epsilon\)-net. It further shrinks the size of the \(\epsilon\)-net by removing the dominated items. Following Theorem 4 and Lemma 3, using a sample size of \(O(\frac{n}{k} \log n)\), the algorithm returns a \(k\)-representative set with probability at least \(1-1/n^2\).
• exact (2D): The exact 2D algorithm works based on Theorem 7 and finds an optimal \(k\)-representative by constructing an envelope chain as a sequence of line segments in \(\tilde{O}(nk)\) time.
• shallow-cutting (2D): The shallow-cutting algorithm, proposed in Section 5.3, is a 2D regret-approximation algorithm that finds a \((2 + \delta)k\)-representative of size at most \({\it OPT}\), where \(\delta > 0\) can be an arbitrarily small constant, in \(O(n\log n)\) time. The value of \(\delta\) was set to 1, i.e., the regret-approximation ratio was 3.
• k-set (2D, MD): The \(k\)-set algorithm is a size-approximation algorithm that provides a \(k\)-representative of size at most \({\it OPT}\cdot O(\log {\it OPT})\) by first enumerating the \(k\)-sets and then modeling the problem as an instance of hitting set. As our experiments will show, this algorithm performs well when \(k\) is small.
• rand-k-set (2D, MD): Due to the high complexity of enumerating the \(k\)-sets exactly, the randomized algorithm in Section 6 serves as a practical alternative for enumerating them (see the sketch after this list). Algorithm rand-\(k\)-set is the same as the \(k\)-set algorithm, except that it uses the randomized algorithm for enumerating the \(k\)-sets. The distribution \(\mathcal{D}\) was set to the uniform distribution for rand-\(k\)-set, meaning that we aimed to capture all weight vectors instead of biasing toward particular ones. The parameter \(\delta\) for rand-\(k\)-set was set to 0.01.
• space-partitioning (2D, MD): The space-partitioning algorithm works based on the rank sum lemma proposed in Section 7.1. As the experiments will show, this algorithm performs well as long as \(k\) is not excessively small.
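To illustrate how the \(k\)-set-based algorithms fit together, below is a minimal sketch of a rand-\(k\)-set-style pipeline: weight vectors are drawn uniformly from the simplex (a Dirichlet draw with all-ones parameters), the top-\(k\) set of each vector is collected, and a greedy \(O(\log)\)-approximate hitting set is computed over them. This is our own simplified illustration; the actual randomized enumeration algorithm of Section 6 and its sample-size bound (governed by \(\delta\)) differ, and the sample count `m` here is a hypothetical stand-in.

```python
import numpy as np

def uniform_weights(d, m, rng):
    """Draw m weight vectors uniformly from the (d-1)-simplex."""
    return rng.dirichlet(np.ones(d), size=m)

def top_k_set(X, w, k):
    """Indices of the k highest-scoring points under weight vector w."""
    return frozenset(np.argsort(-(X @ w))[:k])

def greedy_hitting_set(sets):
    """Greedy hitting set: repeatedly pick the element that hits the
    most not-yet-hit sets (the classic O(log)-approximation)."""
    chosen, remaining = [], list(sets)
    while remaining:
        counts = {}
        for s in remaining:
            for e in s:
                counts[e] = counts.get(e, 0) + 1
        best = max(counts, key=counts.get)       # element in most unhit sets
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen

def rand_k_set_sketch(X, k, m=1000, seed=0):
    """End-to-end sketch: sample weights, collect their k-sets, and hit
    them with few points; the chosen points form the representative."""
    rng = np.random.default_rng(seed)
    k_sets = {top_k_set(X, w, k) for w in uniform_weights(X.shape[1], m, rng)}
    return greedy_hitting_set(k_sets)
```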
We preceded each of the above methods with a preprocessing step to shrink the input \(P\) of Problem 1. As defined in Section 3, an object \(o\) dominates another object \(o^{\prime}\) if \(o[i] \ge o^{\prime}[i]\) for all \(i \in [1, d]\). The \(k\)-skyband of \(P\) includes every object \(o \in P\) that is dominated by at most \(k-1\) other objects in \(P\). The \(k\)-set of any weight vector must be fully contained in the \(k\)-skyband [53]. Therefore, instead of \(P\) itself, we can solve Problem 1 on its \(k\)-skyband, which is usually much smaller. For any fixed dimensionality \(d\), the \(k\)-skyband can be found in \(\tilde{O}(n)\) time [58].
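As an illustration of this filtering step, the following is a straightforward quadratic-time \(k\)-skyband filter; it is a sketch only, not the \(\tilde{O}(n)\) algorithm of [58]. Requiring strict improvement on at least one attribute is our reading of the dominance definition (it prevents an object from dominating itself).

```python
import numpy as np

def k_skyband(X, k):
    """Keep every object dominated by at most k-1 others (naive
    O(n^2 d) scan; [58] gives an O~(n) algorithm for fixed d)."""
    keep = []
    for i in range(len(X)):
        # o' dominates o if o'[j] >= o[j] on every attribute; we require
        # strict improvement on at least one coordinate so an object (or
        # an exact duplicate) does not count as its own dominator.
        ge = np.all(X >= X[i], axis=1)
        gt = np.any(X > X[i], axis=1)
        if np.count_nonzero(ge & gt) <= k - 1:
            keep.append(i)
    return X[keep]
```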
Evaluation measurements. We evaluate the algorithms using three measures: (i) time, (ii) representative size, and (iii) rank-regret. Time evaluates the efficiency of an algorithm, while representative size and rank-regret evaluate how effective the algorithm is at finding good and compact representatives.
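Rank-regret can be estimated empirically by sampling weight vectors: under each vector, we take the best (smallest) rank achieved by any item of the representative, and report the worst such rank over all sampled vectors. The sketch below follows this scheme; sampling is our own simplification for illustration, whereas an exact evaluation would have to account for all linear functions.

```python
import numpy as np

def empirical_rank_regret(X, rep_idx, weights):
    """Worst case, over the sampled weight vectors, of the best rank
    achieved by the representative set rep_idx; rank 1 = highest score."""
    rep_idx = np.asarray(list(rep_idx))
    worst = 0
    for w in weights:
        order = np.argsort(-(X @ w))             # descending by score
        ranks = np.empty(len(X), dtype=int)
        ranks[order] = np.arange(1, len(X) + 1)  # rank of every point
        worst = max(worst, int(ranks[rep_idx].min()))
    return worst
```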
Default values. In every experiment, we vary one parameter while fixing the other parameters to the following default values: \(k=8\) , \(d=4\) , and \(n=116,\!300\) for BN and \(457,\!892\) for DoT.
8.2 Performance Evaluation
Having provided the proof of concept, we proceed to evaluate the performance of our algorithms under different settings.
Number of extreme points. As mentioned in Section 1, the 1-rank-regret representative of a dataset comprises the points on the boundary of the convex hull, i.e., the extreme points, which are guaranteed to contain the top choice of any linear ranking function. However, the number of extreme points can be very large. Table 2 shows the number of extreme points in the BN and DoT datasets at various dimensionalities. As will become evident in the upcoming experiments, the number of extreme points is usually several times the size of the rank-regret representative we find. This further strengthens our motivation and supports the necessity of developing efficient algorithms for discovering (small) rank-regret representatives.
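Counts like those in Table 2 can be reproduced in principle by counting convex-hull vertices: for any linear ranking function the top choice is a hull vertex, and with nonnegative weights the relevant extreme points form a subset of all hull vertices. A minimal sketch using SciPy's Qhull wrapper follows; the article does not specify how Table 2 was computed, so counting all hull vertices here is our own illustrative assumption.

```python
import numpy as np
from scipy.spatial import ConvexHull

def count_extreme_points(X):
    """Number of vertices of the convex hull of X; the extreme points
    relevant to nonnegative linear rankings are a subset of these."""
    return len(ConvexHull(np.asarray(X, dtype=float)).vertices)
```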
2D, varying \(k\). Henceforth, we follow the paradigm explained in Section 8.1: in every experiment, we study the impact of one parameter while fixing the other parameters to their default values.

The value of \(k\) greatly impacts the ability to reduce the size of the rank-regret representative. For example, when \(k=1\), all the items on the boundary of the convex hull appear in the representative. As the value of \(k\) increases, we gain the freedom to have more choices for every ranking function, and hence more opportunity to find items that cover large portions of the ranking functions. Besides the size of the representative, the running time of the algorithms may also depend on the choice of \(k\). Therefore, as the first 2D experiment, we vary the value of \(k\) while fixing the other parameters to their default values. The results are provided in Figures 17 to 22.
Figures 17 and 18 show the time taken by each algorithm to find a representative. First, one can observe that the \(k\)-set algorithm was significantly slower than all the other algorithms and that its running time rapidly increased with \(k\). The reason for the algorithm's poor running time is that it requires enumerating all the \(k\)-sets before solving a hitting set problem; its running time therefore depends heavily on the number of \(k\)-sets, which in turn depends on the value of \(k\). As observed in the experiments on both the BN and DoT datasets, increasing \(k\) resulted in a significant increase in the number of \(k\)-sets, causing the poor running time of the algorithm. Even though the remaining algorithms did not have running times as bad as \(k\)-set, rand-\(k\)-set was still slower than the rest, and it got slower as \(k\) increased. The main reason why rand-\(k\)-set outperformed \(k\)-set in running time is that, compared to the graph enumeration approach for finding the \(k\)-sets, rand-\(k\)-set uses a more efficient randomized algorithm for the same purpose. We also note that, at least in theory, rand-\(k\)-set may miss the \(k\)-sets of certain weight vectors, resulting in a potentially smaller number of \(k\)-sets in some cases. Among the other algorithms, the exact 2D algorithm, even though initially fast, took noticeably more time than the others as \(k\) increased. Net-extreme-skyline (labeled nes in the legend) had a stable, though not the fastest, running time across different values of \(k\). The shallow-cutting and space-partitioning algorithms (labeled sc and sp in the legend) had similar running times, and both were significantly faster than all the other algorithms across all values of \(k\). We note that shallow-cutting is specifically designed for 2D, while space-partitioning works for arbitrary dimensionalities.
Figures 19 and 20 show the size of the output (representative set) found by each algorithm, while Figures 21 and 22 show the rank-regret of the output. Note that the exact algorithm is guaranteed to return the optimal (i.e., minimum-size) set, while the output of the other algorithms is approximate. Among the approximation algorithms, net-extreme-skyline consistently returned the largest sets, but its output always satisfied the rank-regret of \(k\). The outputs of all the other algorithms were very close to optimal, a strong indication that they are effective at finding compact sets. The \(k\)-set and space-partitioning algorithms always guarantee the rank-regret of \(k\); rand-\(k\)-set and net-extreme-skyline ensure the guarantee with very high probability; and shallow-cutting was parameterized for a 3-approximate assurance on rank-regret. By comparing the exact algorithm with all the other algorithms in Figures 21 and 22, one can notice that, interestingly, the output of every algorithm except shallow-cutting satisfied the rank-regret of \(k\). In fact, nearly the same is true for shallow-cutting, whose rank-regret was always bounded by \(k\) except in a single case (BN, \(k = 64\)).
2D, varying the dataset size (\(n\)). Rank-regret representatives are compact representatives that are intended to be significantly smaller than the dataset. The connection to \(\epsilon\)-nets (Section 3) provides an upper bound on the size of the representative set that, however, needs to be a \(1/k\) fraction of the original dataset. Therefore, our earlier results in Figures 19 and 20 suggest that traditional sampling approaches for finding an \(\epsilon\)-net are not necessarily effective in practical scenarios; our objective is to find the minimal set that satisfies the rank-regret constraint. Recall that Section 3 also gave a lower bound of \(n/k\) on the size of any \(\epsilon\)-net in the worst case. Despite this negative result, Figures 19 and 20 indicate that this lower bound can be excessively pessimistic on real data. To further demonstrate these phenomena, in the next experiment we varied the dataset size while observing each algorithm's running time, rank-regret, and output size.
The results are provided in Figures 23 to 28. For each dataset, we controlled \(n\) by randomly selecting 20%, 40%, 60%, 80%, and 100% of the data. First, as shown in Figures 23 and 24, the running time of the algorithms was stable as \(n\) increased. Among the algorithms, \(k\)-set had the longest running time and shallow-cutting had the shortest. Figures 25 and 26 show the output size for the BN and DoT datasets, while Figures 27 and 28 show the rank-regret of the results obtained by the different algorithms. Similarly to the previous experiments, the output of net-extreme-skyline had the maximum size, while the others were close to the optimum (the output size of the exact algorithm). Furthermore, all algorithms returned representatives achieving a rank-regret of \(k\), except shallow-cutting, which guaranteed a 3-approximation. These observations imply that all algorithms except shallow-cutting found near-optimal solutions. A perhaps more important observation is that, even though the theoretical lower bound from the \(\epsilon\)-net interpretation suggests that the output size should be only a constant factor smaller than the dataset (recall that \(k = 8\), the default value, here), in practice this number may be only a handful and hardly increased with \(n\).
MD, varying \(k\). After evaluating the 2D solutions, we now turn our attention to MD, where \(d\ge 3\). In the upcoming experiments, we study the impact of varying the value of \(k\) on the performance of the different MD algorithms. Figures 29 to 34 show the results across different settings for the BN and DoT datasets.

First, looking at the running time of the algorithms in Figures 29 and 30, net-extreme-skyline was the fastest across different cases, while \(k\)-set did not scale well with \(k\). An interesting observation, however, is that while the running time of \(k\)-set and rand-\(k\)-set monotonically increased with \(k\), that of space-partitioning actually decreased as \(k\) went up. The reason for the increase in the running time of \(k\)-set and rand-\(k\)-set is that (assuming \(k < n/2\)) the number of \(k\)-sets escalates as \(k\) increases, forcing both algorithms to spend more time enumerating the \(k\)-sets and solving the hitting set problem. For larger \(k\), however, the space-partitioning algorithm finds more opportunities to prune the search space, simply because it essentially looks for common elements in larger sets, namely the top-\(k\) results (which are supersets of the results for smaller \(k\)). Together, these observations indicate that (rand-)\(k\)-set and space-partitioning are complementary algorithms for finding rank-regret representatives in different settings.
Next, we studied the output size (Figures 31 and 32) and rank-regret (Figures 33 and 34) for the BN and DoT datasets. Recall that, in theory, the space-partitioning and \(k\)-set algorithms guarantee the rank-regret of \(k\), while rand-\(k\)-set and net-extreme-skyline guarantee the same with very high probability. In all settings across the two datasets, however, every algorithm managed to find representatives satisfying the rank-regret of \(k\). The net-extreme-skyline algorithm, in spite of being fast, failed to find compact representatives, especially as \(k\) increased. The rand-\(k\)-set and \(k\)-set algorithms generated the smallest outputs, and their representative sizes decreased as \(k\) increased. In particular, for all settings with \(k > 10\) in both datasets, the output size was always less than 10, fully echoing the motivation of rank-regret representatives.
MD, varying the number \(d\) of dimensions. In this experiment, we evaluate the different MD algorithms for different values of \(d\). The results are provided in Figures 35 to 40.

Let us first look at the running time of the algorithms across different settings (Figures 35 and 36). The \(k\)-set algorithm failed to scale beyond four dimensions, because the exact (graph-traversal) algorithm in Section 5.6 for enumerating the \(k\)-sets did not finish within the time budget (20,000 seconds). In contrast, the rand-\(k\)-set algorithm (being efficient at finding the \(k\)-sets) scaled much better with respect to \(d\). The time performance of the space-partitioning algorithm worsened as \(d\) became larger, due to the curse of dimensionality, i.e., the significant enlargement of the search space. Net-extreme-skyline had the best time performance but, as discussed next, it failed to find compact representatives. Figures 37 and 38 show the output size, and Figures 39 and 40 show the rank-regret of the generated output for the BN and DoT datasets. Similarly to the previous experiments, all algorithms were able to ensure the rank-regret of \(k=8\) across different settings. Evidently, the representative sets of net-extreme-skyline were fairly large, especially as \(d\) increased, while (rand-)\(k\)-set managed to secure small representatives in all cases.
MD, varying the dataset size \(n\). Finally, we conclude our experiments by studying the impact of the dataset size on the performance of the different algorithms. To do so, similarly to the corresponding 2D experiments, we selected 20% to 100% of the BN and DoT datasets for the default values of \(k = 8\) and \(d = 4\). The results are provided in Figures 41 to 46.

Looking at Figures 41 and 42, one can see that all algorithms had a stable running time across all values of \(n\). Also, as Figures 45 and 46 show, the output of every algorithm satisfied the rank-regret requirement (\(k=8\)) in all settings, consistent with the previous experiments. As shown in Figures 43 and 44, the output of net-extreme-skyline was the largest, while (rand-)\(k\)-set found representatives of size around 10 in all scenarios.