On the Complexity of Package Recommendation Problems
Ting Deng
School of Computer Science and Engineering, Beihang University, Beijing, China
dengting@act.buaa.edu.cn
Wenfei Fan
Lab. for Foundations of Computer Science, School of Informatics, University of Edinburgh, UK
wenfei@inf.ed.ac.uk
Floris Geerts
Dept. of Mathematics & Computer Science, University of Antwerp, Antwerp, Belgium
floris.geerts@ua.ac.be
ABSTRACT
Recommendation systems aim to recommend items that are
likely to be of interest to users. This paper investigates
several issues fundamental to such systems.
(1) We model recommendation systems for packages of
items. We use queries to specify multi-criteria for item selections and express compatibility constraints on items in a
package, and use functions to compute the cost and usefulness of items to a user.
(2) We study recommendations of points of interest, to
suggest top-k packages. We also investigate recommendations of top-k items, as a special case. In addition, when
sensible suggestions cannot be found, we propose query relaxation recommendations to help users revise their selection
criteria, or adjustment recommendations to guide vendors to
modify their item collections.
(3) We identify several problems, to decide whether a set
of packages makes a top-k recommendation, whether a rating bound is maximum for selecting top-k packages, whether
we can relax the selection query to find packages that users
want, and whether we can update a bounded number of
items such that the users’ requirements can be satisfied. We
also study function problems for computing top-k packages,
and counting problems to find how many packages meet the
user’s criteria.
(4) We establish the upper and lower bounds of these problems, all matching, for combined and data complexity. These
results reveal the impact of variable sizes of packages, the
presence of compatibility constraints, as well as a variety of
query languages for specifying selection criteria and compatibility constraints, on the analyses of these problems.
1. INTRODUCTION
Recommendation systems, a.k.a. recommender systems,
recommendation engines or platforms, aim to identify and
suggest information items or social elements that are likely
to be of interest to users. Traditional recommendation systems are to select top-k items from a collection of items, e.g.,
books, music, news, Web sites and research papers [3], which
satisfy certain criteria identified for a user, and are ranked
by ratings with a utility function. More recently recommendation systems are often used to find top-k packages, i.e.,
sets of items, such as travel plans [34], teams of players [22]
and various course combinations [19, 26, 27]. The items in
a package are required not only to meet multi-criteria for
selecting individual items, but also to satisfy compatibility
constraints defined on all the items in a package taken together, such as team formation [22] and course prerequisites
[26]. Packages may have variable sizes subject to a cost budget, and are ranked by overall ratings of their items [34].
Recommendation systems are increasingly becoming an
integral part of Web services [34], Web search [4], social networks [4], education software [27] and commerce services [3].
A number of systems have been developed for recommending items or packages, known as points of interest (POI) [34]
(see [3, 4] for surveys). These systems use relational queries
to specify selection criteria and compatibility constraints [2,
7, 19, 27, 34]. There has also been work on the complexity
of computing POI recommendations [22, 26, 27, 34].
However, to understand central issues associated with
recommendation systems, there is much more to be done.
(1) The previous complexity results were developed for
individual applications with specific selection criteria and
compatibility constraints. They may not carry over to other
settings. This highlights the need for studying recommendation problems in a uniform model. (2) In most cases only
lower bounds were given (NP-hard by e.g., [26, 27]). Worse
still, among the few upper bounds claimed, some are not
quite correct. It is necessary to set the record straight by
establishing matching upper and lower bounds. (3) No previous work has studied where high complexity arises. Is it
from variable sizes of packages, compatibility constraints or
from complex selection criteria? The need for understanding
this is evident when developing practical recommendation
systems. (4) In practice one often gets no sensible recommendations. When this happens, a system should be able to
come up with recommendations for the users to revise selection criteria, or for vendors to adjust their item collections.
However, no matter how important these issues are, no
previous work has studied recommendations beyond POI.
Categories and Subject Descriptors: H.3.5 [Information Systems]: Online Information Services – Web-based
services; F.2.0 [Theory of Computation]: Analysis of Algorithms and Problem Complexity – General
Keywords: Recommendation problems, Complexity, Query relaxation.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PODS’12, May 21–23, 2012, Scottsdale, Arizona, USA.
Copyright 2012 ACM 978-1-4503-1248-6/12/05 ...$10.00.
Example 1.1: Consider a recommendation system for
travel plans, which maintains two relations specified by:
flight(f#, From, To, DT, DD, AT, AD, Pr),
POI(name, city, type, ticket, time).
Here a flight tuple specifies flight f# from From to To that
departs at time DT on date DD and arrives at time AT
on date AD, with airfare Pr. A POI tuple specifies a place
name to visit in the city, its ticket price, type (e.g., museum,
theater), and the amount of time needed for the visit.
(1) Recommendations of items. A user wants to find top-3
flights from edi to nyc with at most one stop, departing
on 1/1/2012, with lowest possible airfare and duration time.
This can be stated as item recommendation: (a) flights are
items; (b) the selection criteria are expressed as a union
Q1∪Q2 of conjunctive queries, where Q1 and Q2 select direct
and one-stop flights from edi to nyc leaving on 1/1/2012, respectively; and (c) the items selected are ranked by a utility
function f (): given an item s, f (s) is a real number computed from the airfare Pr and the duration Dur of s such
that the higher the Pr and Dur are, the lower the rating of s
is. Here Dur can be derived from DT, DD, AT and AD, and
f () may associate different weights with Pr and Dur.
(2) Recommendations of packages. A user is planning a 5-day
holiday, by taking a direct flight from edi to nyc departing
on 1/1/2012 and visiting as many places in nyc as possible.
In addition, she does not want to have more than 2 museums
in a package, which is a compatibility constraint [34]. Moreover, she wants the plans to have the lowest overall price.
This is an example of package recommendations: (a) the
selection criteria are expressed as the following conjunctive
query (CQ) Q, which finds pairs of flights and POI as items:
Q(f#, Pr, name, type, ticket, time) = ∃ DT, AT, AD, xTo
flight(f#, edi, xTo , DT, 1/1/2012, AT, AD, Pr) ∧
POI(name, xTo , type, ticket, time) ∧ xTo = nyc ;
(b) a package N consists of some items that have the same
f# (and hence Pr); (c) the rating of N , denoted by val(N ),
is a real number such that the higher the sum of the Pr and
ticket prices of the items in N is, the lower val(N ) is; (d)
the compatibility constraint can be expressed as a CQ query
Qc such that Qc (N ) = ∅ iff the requirement is satisfied (see
Section 2); and (e) the cost of N , denoted by cost(N ), is
the total time taken on visiting all POI in N . Note that
the number of items in N is not fixed: N may contain as
many POI as possible, as long as cost(N ) does not exceed the
total time allocated for sightseeing in 5 days. Putting these
together, the travel planning is to find top-k such packages
ranked by val(N ), for a constant k chosen by the user.
(3) Computational complexity. To develop a recommendation system, one naturally wants to know the complexity for
computing top-k packages or top-k items. The complexity
may depend on what query language we use to specify selection criteria and compatibility constraints. For instance,
in the package recommendation example given above, the
criteria and constraints are expressed as CQ queries. Suppose that the user can bear with indirect flights with an
unlimited number of stops. Then we need to express the
selection criteria in, e.g., DATALOG, which is more costly
to evaluate than the CQ queries. What is the complexity
of package recommendations when criteria and constraints
are expressed in various languages? Will the complexity be
lower if compatibility constraints are absent? Will it make
our lives easier if we fix the size of each package? To the best
of our knowledge, these questions have not been answered.
(4) Query relaxation recommendations. One may not get
any direct flight from edi to nyc. Nevertheless, if we relax
the CQ Q given above by, e.g., allowing To to be a city
within 15 miles of nyc, then direct flights are available, e.g.,
from edi to ewr. This suggests that we help the user revise
her selection criteria by recommending query relaxations.
(5) Adjustment recommendations. The collection of POI in
the system may consist of museums only, of which users may
not want to see too many, as indicated by the compatibility
constraint Qc above. This motivates us to study adjustment recommendations, by recommending that the vendor of the
system include, e.g., theaters, in their POI collection. ✷
These highlight the need for a full treatment of recommendation problems, to study them in a generic model, establish
their matching upper and lower bounds, and identify where
the complexity arises. Moreover, analogous to POI recommendations, query relaxation recommendations and adjustment recommendations should logically be part of a practical
system, and hence, deserve to be investigated.
A model for package recommendations. Following [2,
7, 19, 26, 27, 34] we consider a database D that includes
items in a recommendation system. We specify (a) multicriteria for selecting items as a relational query Q; (b) compatibility constraints on the items in a package N as another query Qc such that Qc (N, D) = ∅ iff N satisfies the
constraints; (c) a rating function val() from packages to real
numbers R such that val(N ) assesses the usefulness of a package N to a user; and (d) a cost budget C and a function
cost() from packages to R such that a package N is a “valid”
choice iff cost(N ) ≤ C. Given a constant k, package recommendation is to find top-k packages based on val() such that
each package consists of items selected by Q and satisfies
the constraints Qc . As shown in Example 1.1, packages may
have variable sizes: we want to maximize val(N ) as long as
cost(N ) does not exceed the budget C.
Traditional item recommendations are a special case. We
use a utility function f () that gives a rating in R to each
tuple in Q(D). For a given k, it is to find top-k items that
meet the criteria specified by Q, ranked by the function f ().
This yields a model for top-k package querying that subsumes previous models studied for, e.g., travel and course
recommendations. We study recommendation problems in
a generic setting when selection criteria and compatibility
constraints are expressed as queries, and when cost(), val()
and f () are only assumed to be computable in PTIME.
Recommendation problems. We identify several problems for POI recommendations. (a) Decision problems: Is a
set of packages a top-k recommendation? Is a constant B
the largest bound such that there exists a top-k recommendation in which each package is rated above B? (b) Function
problem: find a top-k recommendation if there exists one.
(c) Counting problem: how many valid packages are there
that have ratings above a bound B?
Beyond POI recommendations, we propose to study the
following features that future recommender systems could
support. (a) Query relaxation recommendations: Can we
find a “minimum” relaxation of the users’ selection criteria
Q to allow a top-k recommendation? (b) Adjustment recommendations: Can we update a bounded number of items
in D such that the users’ requirements can be satisfied? We
parameterize each of these problems with various query languages LQ in which selection criteria Q and compatibility
constraints Qc are expressed. We consider the following LQ ,
all with built-in predicates =, ≠, <, ≤, >, ≥:
• conjunctive queries (CQ),
• union of conjunctive queries (UCQ),
• positive existential FO queries (∃FO+ ),
• nonrecursive datalog queries (DATALOGnr ),
• first-order queries (FO), and
• datalog (DATALOG).
Complexity results. For each of these problems, we establish
its combined complexity and data complexity. We also study
special cases of package recommendations, such as when
compatibility constraints are absent, when packages have a
fixed size, and when both conditions are imposed (item recommendations). We provide their upper and lower bounds,
all matching, for all the query languages given above.
These results give a complete characterization of the complexity in this model, from decision problems to function and
counting problems. They tell us where complexity arises,
complementing previously stated results.
(a) Query languages dominate the complexity of recommendation problems, e.g., the problem for deciding the maximum bound for top-k package recommendations ranges
from D^p_2-complete for CQ, PSPACE-complete for FO and
DATALOGnr , to EXPTIME-complete for DATALOG.
(b) Variable package sizes do not make our lives harder when
combined complexity is concerned for all the languages given
above. Indeed, when packages may have variable sizes, all
these problems have the same combined complexity as their
counterparts when packages are restricted to be singleton
sets. In fact, variable sizes of packages have impact only
on data complexity, or when LQ is a simple language with
a PTIME complexity for its membership problem. These
clarify the impact of package sizes studied in, e.g., [34].
(c) The presence of compatibility constraints does not increase the complexity when the query language LQ is FO,
DATALOGnr or DATALOG. Indeed, for these languages, all
the problems for package recommendations and their counterparts for item recommendations have the same complexity. Moreover, these constraints do not complicate the data
complexity analyses. However, compatibility constraints increase combined complexity when LQ is contained in ∃FO+ .
(d) In the absence of compatibility constraints, the decision
problem for top-k package recommendations is DP-complete
and its function problem is FP^NP-complete when LQ is
CQ. They are coNP-hard and FP^NP-hard, respectively, even
when selection criteria are given by an identity query. These
give precise bounds for the problems studied in, e.g., [34].
These results are also of interest to the study of top-k query
answering, among other things. A variety of techniques are
used to prove the results, including a wide range of reductions, and constructive proofs with algorithms (e.g., for the
function problems). In particular, the proofs demonstrate
that the complexity of these problems for CQ, UCQ and
∃FO+ is inherent to top-k package querying itself, rather than
a consequence of the complexity of the query languages.
Related work. Traditional recommendation systems aim
to find, for each user, items that maximize the user’s utility
(see, e.g., [3] for a survey). Selection criteria are decided by
content-based, collaborative and hybrid approaches, which
consider preferences of each user in isolation, or preferences
of similar users [3]. The prior work has mostly focused on
how to choose appropriate utility functions, and how to extrapolate such functions when they are not defined on the
entire item space, by deriving unknown values from known
ones. Our model supports content-based, collaborative and
hybrid criteria in terms of various queries. We assume a
given utility function that is total, and focus on the computational complexity of recommendation problems.
Recently recommendation systems have been extended to
finding packages, which are presented to the user in a ranked
order based on some rating function [6, 22, 26, 27, 34]. A
number of algorithms have been developed for recommending packages of a fixed size [6, 22] or variable sizes [26, 27,
34]. Compatibility constraints [22, 26, 27, 34] and budget restrictions [34] on packages have also been studied. Instead of
considering domain-specific applications, we model recommendations of both items and packages (fixed size or polynomial size) by specifying general selection criteria and compatibility constraints as queries, and supporting aggregate
constraints defined with cost budgets and rating bounds.
Several decision problems for course package recommendations have been shown NP-hard [26, 27]. It was claimed
that problems of forming a team with compatibility constraints [22] and the problem of finding packages that satisfy
some budget restrictions (without compatibility constraints)
[34] are NP-complete. In contrast, we establish the precise
complexity of a variety of problems associated with POI recommendations (Table 1, Section 7). Moreover, we provide
the complexity of query relaxation and adjustment recommendations, which have not been studied by prior work.
There has also been a host of work on recommending items
and packages taken from views of the data [2, 7, 19, 23, 34].
Such views are expressed as relational queries, representing
preferences or points of interest [2, 7, 19]. Here recommendations often correspond to top-k query answers. Indeed,
top-k query answering retrieves the k items (tuples) from a
query result that are top-ranked by some scoring function
[16]. Such queries either simply select tuples, or join and
aggregate multiple inputs to find the top-k tuples, by possibly incorporating user preference information [19, 29]. A
number of top-k query evaluation algorithms have been developed (e.g., [12, 23, 28]; see [16] for a survey), as well as
algorithms for incremental computation of ranked query results [10, 14, 24] that retrieve the top-k query answers one at
a time. A central issue there concerns how to combine different ratings of the same item based on multiple criteria. Our
work also retrieves tuples from the result of a query. It differs from the previous work in the following. (1) In contrast
to top-k query answering, we are to find items and sets of
items (packages) provided that a utility or rating function is
given. (2) We focus on the complexity of recommendation
problems rather than the efficiency or optimization of query
evaluation. (3) Beyond recommendations of POI, we also
study query relaxation and adjustment recommendations.
Query relaxations have been studied in, e.g., [8, 13, 17,
18]. Several query generalization rules are introduced in [8],
assuming that query acceptance conditions are monotonic.
Heuristic query relaxation algorithms are developed in [13,
17]. The topic is also studied for top-k query answering [18].
We focus on the main idea of query relaxation recommendations, and borrow query generalization rules from [8]. We
consider acceptance conditions (i.e., rating functions, compatibility constraints and aggregate constraints) that are not
necessarily monotonic. Moreover, none of the previous work
supports queries beyond CQ, while we consider more powerful languages such as FO and DATALOG. In addition, the
prior work focuses on the design of efficient relaxation algorithms, but does not study computational complexity.
Organization. Section 2 introduces the model for package
recommendations. Section 3 formulates and studies fundamental problems in connection with POI recommendations,
followed by special cases in Section 4. Query relaxation recommendations are studied in Section 5, followed by adjustment recommendations in Section 6. Section 7 summarizes
the main results of the paper and identifies open issues.
2. MODELING RECOMMENDATIONS
We first specify recommendations of packages and items.
We then review query languages considered in this work.
Item collections. Following [2, 7, 19, 26, 27, 34], we assume a database D consisting of items for selection. The
database is specified with a relational schema R, with a collection of relation schemas (R1 , . . . , Rn ). Each schema Ri is
defined over a fixed set of attributes. For each attribute A
in Ri , its domain is specified in Ri , denoted as dom(Ri .A).
Package recommendations. As remarked earlier, in practice one often wants packages of items, e.g., combinations
of courses to be taken to satisfy the requirements for a degree [27], travel plans including multiple POI [34], and teams
of experts [22]. Package recommendation is to find top-k
packages such that the items in each package (a) meet the
selection criteria, (b) satisfy some compatibility constraints,
i.e., they have no conflicts, and moreover, (c) their ratings
and costs satisfy certain aggregate constraints. To specify
these, we extend the models proposed in [27, 34] as follows.
Selection criteria. We use a query Q in a query language
LQ to specify multi-criteria for selecting items from D. For
instance, as shown in Example 1.1, we use a query to specify
what flights and sites a user wants to find.
Compatibility constraints. To specify the compatibility constraints for a package N , we use a query Qc such that N
satisfies Qc iff Qc (N, D) = ∅. That is, Qc identifies inconsistencies among items in N . In Example 1.1, to assert “no
more than 2 museums” in a travel package N [34], we use
the following Qc that selects 3 distinct museums from N :
Qc () = ∃ f#, Pr, n1 , t1 , p1 , n2 , t2 , p2 , n3 , t3 , p3
RQ (f#, Pr, n1 , museum, p1 , t1 ) ∧
RQ (f#, Pr, n2 , museum, p2 , t2 ) ∧
RQ (f#, Pr, n3 , museum, p3 , t3 ) ∧
n1 ≠ n2 ∧ n1 ≠ n3 ∧ n2 ≠ n3 ,
where RQ denotes the schema of the query answer Q(D).
As another example, for a course package N , we use a query
Qc to assure that for each course in N , its prerequisites are
also included in N [26]. This query needs to access not only
courses in N but also the prerequisite relation stored in D.
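The "no more than 2 museums" constraint can be sketched in Python as follows. This is only an illustration under stated assumptions: items are tuples following the answer schema (f#, Pr, name, type, ticket, time) of Example 1.1, the function name qc_too_many_museums is hypothetical, and the sample tuples are made-up toy values. As with Qc, an empty result means that the package is compatible.

```python
from itertools import combinations

def qc_too_many_museums(package):
    """Mimic Qc: return witnesses of a violation, i.e., triples of distinct
    museum names that share the same flight and airfare.
    An empty result means that the package satisfies the constraint."""
    museums = [item for item in package if item[3] == "museum"]
    return {
        tuple(sorted((a[2], b[2], c[2])))
        for a, b, c in combinations(museums, 3)
        if a[0] == b[0] == c[0]               # same f#
        and a[1] == b[1] == c[1]              # same Pr
        and len({a[2], b[2], c[2]}) == 3      # n1 != n2, n1 != n3, n2 != n3
    }

# Toy packages (hypothetical data): (f#, Pr, name, type, ticket, time).
ok_package = {
    ("F1", 400, "Met", "museum", 25, 3),
    ("F1", 400, "MoMA", "museum", 25, 2),
    ("F1", 400, "Lyric", "theater", 80, 3),
}
bad_package = ok_package | {("F1", 400, "Guggenheim", "museum", 25, 2)}

print(qc_too_many_museums(ok_package))   # empty: at most 2 museums
print(qc_too_many_museums(bad_package))  # one witness triple
```

The check returns witnesses rather than a Boolean so that it reads like a query whose answer is tested for emptiness.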
To simplify the discussion, we assume that query Qc
for specifying compatibility constraints and query Q for
specifying selection criteria are in the same language LQ . If
a system supports compatibility constraints in LQ , there is
no reason for not supporting queries in the same language
for selecting items. We defer to future work the study in the
setting when Qc and Q are expressed in different languages.
Note that queries in various query languages are capable
of expressing compatibility constraints commonly found in
practice, including those studied in [19, 22, 26, 27, 34].
Aggregate constraints. To specify aggregate constraints, we
define a cost function and a rating function over packages,
following [34]: (1) cost(N ) computes a value in R as the cost
of package N ; and (2) val(N ) computes a value in R as the
overall rating of N . For instance, cost(N ) in Example 1.1 is
computed from the total time taken for visiting POI, while
val(N ) is defined in terms of airfare and total ticket prices.
We just assume that cost() and val() are PTIME computable aggregate functions, defined in terms of e.g., max,
min, sum, avg, as commonly found in practice.
We also assume a cost budget C, and specify an aggregate
constraint cost(N ) ≤ C. For instance, the cost budget C in
Example 1.1 is the total time allowed for visiting POI in 5
days, and the aggregate constraint cost(N ) ≤ C imposes a
bound on the number of POI in a package N .
Top-k package selections. For a database D, queries Q and
Qc in LQ , a natural number k ≥ 1, a cost budget C, and
functions cost() and val(), a top-k package selection is a set
N = {Ni | i ∈ [1, k]} of packages such that for each i ∈ [1, k],
(1) Ni ⊆ Q(D), i.e., its items meet the criteria given in Q;
(2) Qc (Ni , D) = ∅, i.e., the items in the package satisfy the
compatibility constraints specified by query Qc ;
(3) cost(Ni ) ≤ C, i.e., its cost does not exceed the budget;
(4) the number |Ni | of items in Ni is no larger than p(|D|),
where p is a predefined polynomial and |D| is the size of D;
indeed, it is not of much practical use to find a package with
exponentially many items; as will be seen in Section 4, we
shall also consider a constant bound Bp for |Ni |;
(5) for all packages N ′ ∉ N that satisfy conditions (1–4)
given above, val(N ′ ) ≤ val(Ni ), i.e., packages in N have the
k highest overall ratings among all feasible packages; and
(6) Ni ≠ Nj if i ≠ j, i.e., the packages are pairwise distinct.
Note that packages in N may have variable sizes. That
is, the number of items in each package is not bounded by
a constant. We just require that Ni satisfies the constraint
cost(Ni ) ≤ C and |Ni | does not exceed a polynomial in |D|.
Package recommendation is to find a top-k package selection for (Q, D, Qc , cost(), val(), C), if there exists one.
As shown in Example 1.1, users may want to find, e.g., a
top-k travel-plan selection with the minimum price.
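To make conditions (1)-(6) concrete, the following Python sketch computes a top-k package selection by exhaustive enumeration on a toy instance. It is not an algorithm from this paper: the function names, the POI tuples (name, type, ticket, time) and the particular val() and cost() are assumptions chosen for illustration, and the enumeration takes exponential time in general.

```python
from itertools import combinations

def valid_packages(answers, qc, cost, budget, max_size):
    """Yield every package N that is a subset of Q(D) and meets (1)-(4)."""
    pool = sorted(answers)
    for size in range(max_size + 1):
        for combo in combinations(pool, size):
            pkg = frozenset(combo)
            if not qc(pkg) and cost(pkg) <= budget:
                yield pkg

def top_k_packages(answers, qc, cost, val, budget, k, max_size):
    """Return k distinct valid packages with the highest ratings,
    or None if fewer than k valid packages exist."""
    ranked = sorted(valid_packages(answers, qc, cost, budget, max_size),
                    key=val, reverse=True)
    return ranked[:k] if len(ranked) >= k else None

# Hypothetical Q(D): (name, type, ticket price, visiting time in hours).
pois = {
    ("Met", "museum", 25, 3),
    ("MoMA", "museum", 30, 2),
    ("Guggenheim", "museum", 20, 2),
    ("Lyric", "theater", 80, 3),
}

def qc(pkg):
    """Non-empty iff the package holds more than 2 museums."""
    museums = [p for p in pkg if p[1] == "museum"]
    return museums if len(museums) > 2 else []

def cost(pkg):
    """Total visiting time of the package."""
    return sum(p[3] for p in pkg)

def val(pkg):
    """Assumed rating: prefer more POI, then lower total ticket price."""
    return 1000 * len(pkg) - sum(p[2] for p in pkg)

top2 = top_k_packages(pois, qc, cost, val, budget=8, k=2, max_size=len(pois))
for pkg in top2:
    print(sorted(p[0] for p in pkg), val(pkg))
```

Packages of different sizes compete here, and the budget rather than a fixed size limits how many POI a package may hold.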
Item recommendations. To rank items, we use a utility
function f () to measure the usefulness of items selected by
Q(D) to a user [3]. It is a PTIME-computable function that
takes a tuple s from Q(D) and returns a real number f (s) as
the rating of item s. The functions may incorporate users’
preference [29], and may be different for different users.
Given a constant k ≥ 1, a top-k selection for (Q, D, f ) is a
set S = {si | i ∈ [1, k]} such that (a) S ⊆ Q(D), i.e., items in
S satisfy the criteria specified by Q; (b) for all s ∈ Q(D)\S
and i ∈ [1, k], f (s) ≤ f (si ), i.e., items in S have the highest
ratings; and (c) si ≠ sj if i ≠ j, i.e., items in S are distinct.
Given D, Q, f and k, item recommendation is to find a
top-k selection for (Q, D, f ) if there exists one.
For instance, a top-3 item selection is described in Example 1.1, where items are flights and the utility function f ()
is defined in terms of the airfare and duration of each flight.
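A top-k item selection can be sketched with a heap-based selection over Q(D). The snippet below is a minimal illustration, assuming that Q(D) has already been evaluated into a set of tuples; the flight tuples (f#, Pr, Dur) and the weights in the utility function f are hypothetical.

```python
import heapq

def top_k_items(answers, f, k):
    """Return k distinct items of Q(D) with the highest ratings f(s),
    or None if Q(D) has fewer than k items."""
    answers = set(answers)
    if len(answers) < k:
        return None
    return heapq.nlargest(k, answers, key=f)

# Hypothetical Q(D): (f#, airfare Pr, duration Dur in hours).
flights = {
    ("BA1", 500, 8),
    ("UA2", 450, 11),
    ("DL3", 420, 10),
    ("AA4", 650, 8),
}

def f(s):
    """Assumed utility: the higher Pr and Dur are, the lower the rating."""
    return -(s[1] + 50 * s[2])

top3 = top_k_items(flights, f, 3)
print([s[0] for s in top3])
```

Once Q(D) is available, this selection runs in time polynomial in |Q(D)|, so the cost of item recommendation lies mainly in evaluating Q.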
The connection between item and package selections. Item
selections are a special case of package selections. Indeed,
a top-k selection S = {si | i ∈ [1, k]} for (Q, D, f ) is a top-k package
selection N for (Q, D, Qc , cost(), val(), C), where
N = {Ni | i ∈ [1, k]}, and for each i ∈ [1, k], (a) Ni = {si },
(b) Qc is a constant query that returns ∅ on any input, referred to as the empty query; (c) cost(Ni ) = |Ni | if Ni ≠ ∅, and
cost(∅) = ∞; that is, cost(Ni ) counts the number of items in
Ni if Ni ≠ ∅, and the empty set is not taken as a recommendation; (d) the cost budget C = 1, and hence, Ni consists of
a single item by cost(Ni ) ≤ C; and (e) val(Ni ) = f (si ).
In the sequel, we use top-k package selection specified in
terms of (Q, D, f ) to refer to a top-k selection S for (Q, D, f ),
i.e., a top-k package selection for (Q, D, Qc , cost(), val(), C)
in which Qc , cost(), val() and C are defined as above.
We say that compatibility constraints are absent if Qc is
the empty query; e.g., Qc is absent in item selections.
One might want to consider general PTIME compatibility
constraints Qc . As will be seen in Section 4, the complexity
when Qc is in PTIME remains the same as its counterpart
when Qc is absent for all the problems studied in this paper.
Query languages. We consider Q, Qc in a query language
LQ , ranging over the following (see e.g., [1] for details):
(a) conjunctive queries (CQ), built up from atomic formulas
with constants and variables, i.e., relation atoms in database
schema R and built-in predicates (=, ≠, <, ≤, >, ≥), by closing under conjunction ∧ and existential quantification ∃;
(b) union of conjunctive queries (UCQ) of the form Q1 ∪· · ·∪
Qr , where for each i ∈ [1, r], Qi is in CQ;
(c) positive existential FO queries (∃FO+ ), built from atomic
formulas by closing under ∧, disjunction ∨ and ∃;
(d) nonrecursive datalog queries (DATALOGnr ), defined as
a collection of rules of the form p(x̄) ← p1 (x̄1 ), . . . , pn (x̄n ),
where the head p is an IDB predicate and each pi is either
an atomic formula or an IDB predicate, such that its dependency graph is acyclic; the dependency graph of a DATALOG
query Q is a directed graph GQ = (V, E), where V includes
all the predicates of Q, and (p′ , p) is an edge in E iff p′ is a
predicate that appears in a rule with p as its head [9];
(e) first-order logic queries (FO) built from atomic formulas
using ∧, ∨, negation ¬, ∃ and universal quantification ∀; and
(f) datalog queries (DATALOG), defined as a collection of
rules p(x̄) ← p1 (x̄1 ), . . . , pn (x̄n ), for which the dependency
graph may possibly be cyclic, i.e., DATALOG is an extension
of DATALOGnr with an inflational fixpoint operator.
These languages specify both multi-criteria for item selections and compatibility constraints for package selections.
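The gap between UCQ and DATALOG can be illustrated with the flight example: "at most one stop" is a union of two conjunctive queries, whereas "any number of stops" needs recursion. The Python sketch below evaluates the recursive rules reach(x, y) ← flight(x, y) and reach(x, y) ← reach(x, z), flight(z, y) by naive fixpoint iteration. The binary flight relation, its tuples and the function names are simplifying assumptions, not part of the model above.

```python
def at_most_one_stop(flight, src, dst):
    """UCQ Q1 ∪ Q2: a direct flight, or two flights joined on one stop."""
    direct = (src, dst) in flight
    one_stop = any((src, z) in flight and (z, dst) in flight
                   for (_, z) in flight)
    return direct or one_stop

def reach(flight):
    """Naive fixpoint evaluation of the recursive datalog program:
    keep adding derived facts until no new fact appears."""
    facts = set(flight)
    while True:
        new = {(x, y)
               for (x, z) in facts
               for (z2, y) in flight
               if z == z2} - facts
        if not new:
            return facts
        facts |= new

# Hypothetical flight relation projected on (From, To).
flights = {("edi", "ams"), ("ams", "fra"), ("fra", "nyc")}

print(at_most_one_stop(flights, "edi", "nyc"))  # two stops are needed
print(("edi", "nyc") in reach(flights))
```

The loop terminates because each round adds at least one new pair and only finitely many pairs over the constants of the database exist.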
3. RECOMMENDATIONS OF POI’S
In this section we investigate POI recommendations. We
identify four problems for package recommendations (Section 3.1), and establish their complexity (Section 3.2).
3.1 Recommendation Problems
We investigate four problems, stated as follows, which are
fundamental to computing package recommendations. We
start with a decision problem for package selections. Consider a database D, queries Q and Qc in a query language
LQ , functions val() and cost(), a cost budget C, and a natural number k ≥ 1. Given a set N consisting of k packages, it is to decide whether N makes a top-k package selection. That is, each package Ni in N satisfies the selection
criteria Q, compatibility constraint Qc , and aggregate constraints cost(Ni ) ≤ C and val(Ni ) ≥ val(N ′ ) for all valid packages N ′ ∉ N . As
remarked earlier, we assume a predefined polynomial p such
that |Ni | ≤ p(|D|) (omitted from the problem statement below for simplicity). Intuitively, this problem is to decide
whether a set N of packages should be recommended.
RPP(LQ ): The recommendation problem (package).
INPUT: A database D, two queries Q and Qc in LQ , two functions cost() and val(),
natural numbers C and k ≥ 1, and a set N = {Ni | i ∈ [1, k]}.
QUESTION: Is N a top-k package selection for
(Q, D, Qc , cost(), val(), C)?
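A brute-force check for RPP on small instances can be sketched as follows. The code assumes that Q(D) is given as a set of already evaluated answers and that Qc is a Python function returning an empty collection on compatible packages; all names and the numeric toy instance are hypothetical. A "no" answer is witnessed either by an invalid or repeated package in N, or by a single valid package outside N that is rated higher than some member of N.

```python
from itertools import combinations

def is_valid(pkg, answers, qc, cost, budget, max_size):
    """Conditions (1)-(4) of a top-k package selection."""
    return (pkg <= answers and not qc(pkg)
            and cost(pkg) <= budget and len(pkg) <= max_size)

def is_top_k_selection(selection, answers, qc, cost, val, budget, max_size):
    answers = frozenset(answers)
    chosen = [frozenset(p) for p in selection]
    if len(set(chosen)) != len(chosen):          # packages must be distinct
        return False
    if not all(is_valid(p, answers, qc, cost, budget, max_size)
               for p in chosen):
        return False
    worst = min(val(p) for p in chosen)
    # Search for a counterexample: a valid package outside the selection
    # that is rated strictly higher than the lowest-rated chosen package.
    for size in range(max_size + 1):
        for combo in combinations(sorted(answers), size):
            other = frozenset(combo)
            if (other not in chosen
                    and is_valid(other, answers, qc, cost, budget, max_size)
                    and val(other) > worst):
                return False
    return True

# Toy instance (hypothetical): items are numbers, cost and val are sums,
# and compatibility constraints are absent (the empty query).
answers = {1, 2, 3, 4}
no_conflicts = lambda pkg: set()
total = sum

print(is_top_k_selection([{1, 4}, {2, 3}], answers, no_conflicts, total, total, 5, 4))
print(is_top_k_selection([{1, 4}, {4}], answers, no_conflicts, total, total, 5, 4))
```

In the second call the package {2, 3} has rating 5 and is not selected, so it serves as a counterexample to the selection {1, 4}, {4}.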
After all, recommendation systems have to compute top-k
packages, rather than expecting that candidate selections
are already in place. This highlights the need for studying
the function problem below, to compute top-k packages.
FRP(LQ ): The function rec. problem (packages).
INPUT: D, Q, Qc , cost(), val(), C, k as in RPP.
OUTPUT: A top-k package selection for (Q, D, Qc ,
cost(), val(), C) if it exists.
The next question concerns how to find a maximum rating
bound for computing top-k packages. We say that a constant B is a rating bound for (Q, D, Qc , cost(), val(), C, k) if
(a) there exists a top-k package selection N = {Ni | i ∈ [1, k]}
for (Q, D, Qc , cost(), val(), C) and moreover, (b) val(Ni ) ≥ B
for each i ∈ [1, k]. That is, B allows a top-k package selection. We say that B is the maximum bound for packages with
(Q, D, Qc , cost(), val(), C, k) if for all bounds B ′ , B ≥ B ′ . Obviously B is unique if it exists. Intuitively, when B is identified, we can capitalize on B to compute top-rated packages.
Furthermore, vendors could decide, e.g., price for certain
items on sale with such a bound, for risk assessment.
MBP(LQ ): The maximum bound problem (packages).
INPUT: D, Q, Qc , cost(), val(), C, k as in RPP,
and a natural number B.
QUESTION: Is B the maximum bound for packages
with (Q, D, Qc , cost(), val(), C, k)?
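Under our reading of the definition, the maximum bound coincides with the k-th highest rating among all valid packages, since every top-k package selection has that value as its lowest rating. The Python sketch below computes it by exhaustive enumeration on a toy instance; the function names and the numeric data are hypothetical, and Q(D) is assumed to be already evaluated.

```python
from itertools import combinations

def valid_ratings(answers, qc, cost, val, budget, max_size):
    """Ratings of all valid packages, highest first."""
    pool = sorted(answers)
    packages = (frozenset(c)
                for size in range(max_size + 1)
                for c in combinations(pool, size))
    return sorted((val(p) for p in packages
                   if not qc(p) and cost(p) <= budget),
                  reverse=True)

def maximum_bound(answers, qc, cost, val, budget, k, max_size):
    """The k-th highest rating, or None if no top-k selection exists."""
    ratings = valid_ratings(answers, qc, cost, val, budget, max_size)
    return ratings[k - 1] if len(ratings) >= k else None

def is_maximum_bound(B, answers, qc, cost, val, budget, k, max_size):
    """Decide MBP on the toy instance by comparing B with the computed bound."""
    return maximum_bound(answers, qc, cost, val, budget, k, max_size) == B

# Toy instance (hypothetical): cost and val are sums, no constraints.
answers = {1, 2, 3, 4}
no_conflicts = lambda pkg: set()
total = sum

print(maximum_bound(answers, no_conflicts, total, total, 5, 2, 4))
print(maximum_bound(answers, no_conflicts, total, total, 5, 3, 4))
```

With budget 5, the valid packages are rated 5, 5, 4, 4, 3, 3, 2, 1 and 0, so the bound drops from 5 to 4 when k grows from 2 to 3.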
A package N is valid for (Q, D, Qc , cost(), val(), C, B) if
(a) N ⊆ Q(D), (b) Qc (N, D) = ∅, (c) cost(N ) ≤ C, and (d)
val(N ) ≥ B, where |N | is bounded by a polynomial in |D|.
Given B, one naturally wants to know how many valid packages are out there, and hence, can be selected. This suggests
that we study the following counting problem.
CPP(LQ ): The counting problem (packages).
INPUT: D, Q, Qc , cost(), val(), C, B as in MBP.
OUTPUT: The number of packages that are valid for (Q, D, Qc , cost(), val(), C, B).
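Conditions (a) to (d) of validity translate directly into a counting loop. The following sketch assumes Q(D) is given as `items`, uses a predicate `compatible` in place of Qc(N, D) = ∅, caps package size with `max_size`, and counts nonempty packages only. The toy instance is invented for illustration.

```python
from itertools import combinations


def count_valid_packages(items, compatible, cost, val, budget, bound, max_size):
    """Count packages N with compatible(N), cost(N) <= budget and val(N) >= bound."""
    count = 0
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            package = frozenset(combo)
            if (compatible(package)
                    and cost(package) <= budget
                    and val(package) >= bound):
                count += 1
    return count


# Hypothetical instance: items are their own prices, the rating is the package size.
n_valid = count_valid_packages(
    items=[1, 2, 3, 4],
    compatible=lambda p: True,
    cost=sum,
    val=len,
    budget=5,
    bound=2,
    max_size=4,
)
print(n_valid)
```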
3.2 Deciding, Finding and Counting Top-k Packages
We now establish the complexity of RPP(LQ ), FRP(LQ ),
MBP(LQ ) and CPP(LQ ), including their (1) combined complexity, when the query Q, compatibility constraint Qc and
database D may vary, and (2) data complexity, when only D
varies, while Q and Qc are predefined and fixed. We study
these problems for all the query languages LQ of Section 2.
Deciding package selections. We start with RPP(LQ ).
The result below tells us that the combined complexity of
the problem is mostly determined by what query language
LQ we use to specify selection criteria and compatibility constraints. Indeed, it is Πp2 -complete when LQ is CQ, PSPACE-complete for DATALOGnr and FO, and it becomes EXPTIME-complete when LQ is DATALOG. The data complexity is
coNP-complete for all the languages considered.
Theorem 3.1: For RPP(LQ ), the combined complexity is
• Πp2 -complete when LQ is CQ, UCQ, or ∃FO+ ;
• PSPACE-complete when LQ is DATALOGnr or FO; and
• EXPTIME-complete when LQ is DATALOG.
The data complexity is coNP-complete for all the languages
presented in Section 2, i.e., when LQ is CQ, UCQ, ∃FO+ ,
DATALOGnr , FO or DATALOG.
✷
Proof sketch: (1) We show that RPP is Πp2 -hard for CQ by
reduction from the complement of the compatibility problem.
The latter is to decide whether there exists a valid package
N with val(N ) > B for some bound B. We show that the
compatibility problem is Σp2 -complete for CQ by reduction
from the ∃∗ ∀∗ 3DNF problem, which is Σp2 -complete [30].
For RPP(∃FO+ ), we give a Πp2 algorithm that first tests
whether a given set N of packages satisfies the criteria and
constraints, in DP; it then checks whether there exists no
package with a higher rating than some N ∈ N , in Πp2 .
(2) We show that RPP is PSPACE-hard for DATALOGnr by
reduction from Q3SAT (cf. [25]), and for FO by reduction
from the membership problem for FO (“given a query Q, a
database D and a tuple t, whether t ∈ Q(D)”) [32], which are
PSPACE-complete. We provide an NPSPACE (= PSPACE)
algorithm to check RPP for these two languages.
(3) For DATALOG, we show that RPP is EXPTIME-hard
by reduction from the membership problem for DATALOG,
which is EXPTIME-complete [32]. For the upper bound, we
give an EXPTIME algorithm to check RPP(DATALOG).
(4) For the data complexity, we first show that the compatibility problem is NP-complete when Q and Qc are fixed
CQ queries, by reduction from 3SAT, an NP-complete problem (cf. [25]). From this it follows that RPP(CQ) is already
coNP-hard. We also give a coNP algorithm for RPP when Q
and Qc are fixed queries in either FO or DATALOG.
✷
One might think that the absence of compatibility constraints Qc would make our lives easier. Indeed, RPP(CQ)
becomes DP-complete in the absence of Qc , as opposed to
Πp2 -complete in the presence of Qc . However, when LQ is
powerful enough to express FO or DATALOGnr queries, dropping Qc does not help: RPP(LQ ) in this case has the same
complexity as its counterpart when Qc is present.
Theorem 3.2: In the absence of Qc , RPP(LQ ) is
• DP-complete when LQ is CQ, UCQ, or ∃FO+ ;
• PSPACE-complete when LQ is DATALOGnr or FO; and
• EXPTIME-complete when LQ is DATALOG.
Its data complexity remains coNP-complete for all the query
languages given in Section 2.
✷
Proof sketch: (1) We show that RPP(CQ) is DP-hard by
reduction from SAT-UNSAT. The latter is to decide, given
a pair (ϕ1 , ϕ2 ) of 3SAT instances, whether ϕ1 is satisfiable
and ϕ2 is not satisfiable. It is DP-complete (cf. [25]).
In the absence of Qc , the algorithm given earlier for
RPP(∃FO+ ) is in DP, and hence so is RPP(∃FO+ ).
(2-4) The lower bound proofs (2-4) of Theorem 3.1 do not
use compatibility constraints and hence remain intact. The
upper bounds obviously carry over to this special case. ✷
Computing top-k packages. We give the complexity of
the function problem FRP(LQ ) as follows:
Theorem 3.3: For FRP(LQ ), the combined complexity is
• FP^{Σp2}-complete when LQ is CQ, UCQ or ∃FO+ ;
• FPSPACE(poly)-complete if LQ is DATALOGnr or FO;
• FEXPTIME(poly)-complete when LQ is DATALOG.
In the absence of compatibility constraints, its combined
complexity remains unchanged for DATALOGnr , FO and
DATALOG, but it is FPNP -complete for CQ, UCQ and ∃FO+ .
Its data complexity is FPNP -complete for all the languages,
in the presence or absence of compatibility constraints. ✷
Here FPNP is the class of all functions from strings to
strings that can be computed by a PTIME Turing machine
with an NP oracle (cf. [25]), and FP^{Σp2} is the class of all functions computable by a PTIME 2-alternating max-min Turing
machine [20]. By FPSPACE(poly) (resp. FEXPTIME(poly))
we mean the class of all functions associated with a two-argument predicate RL that satisfies the following conditions: (a) RL is polynomially balanced, i.e., there is a polynomial q such that for all strings x and y, if RL (x, y)
then |y| ≤ q(|x|), and (b) the decision problem “given x and
y, whether RL (x, y)” is in PSPACE (resp. EXPTIME) [21].
Given a string x, the function associated with RL is to find
a string y such that RL (x, y) if such a string exists.
These results tell us that it is nontrivial to find top-k
packages. Indeed, to express compatibility constraints on
travel plans given in [34], we need at least CQ; for course
combination constraints of [19, 26, 27], we need FO; and for
connectivity of flights we need DATALOG. These place FRP
in FP^{Σp2}, FPSPACE(poly) and FEXPTIME(poly), respectively.
It was claimed in several earlier papers that when k = 1,
it is NP-complete to find a top-1 package. Unfortunately, it
is not the case. Indeed, the proofs of Theorems 3.1, 3.2 and
3.3 tell us that when k = 1, the function problem FRP(LQ )
remains FP^{Σp2}-complete and the decision problem RPP(LQ )
is Πp2 -complete even when LQ is CQ, not to mention more
expressive LQ . Even when Q and Qc are both fixed, FRP is
FPNP -complete and RPP is coNP-complete when k = 1.
In the absence of compatibility constraints, only the analyses of the combined complexity of FRP for CQ, UCQ and
∃FO+ are simplified. This is consistent with Theorem 3.2.
Proof sketch: (1) We show that FRP(CQ) is FP^{Σp2}-hard by
reduction from the maximum Σp2 problem, which is FP^{Σp2}-complete [20]. The latter is to find, given a formula ϕ(X) =
∀Y ψ(X, Y ), the truth assignment µ_X^last of X that satisfies ϕ
and comes last in the lexicographical ordering if it exists,
where ψ is a 3SAT instance. We give an FP^{Σp2} algorithm for
FRP(∃FO+ ) to compute a top-k package selection if it exists.
(2) We show that FRP(LQ ) is FPSPACE(poly)-hard by reducing to it all functions computable by a PSPACE Turing machine in which the output on the working tape is
bounded by a polynomial, when LQ is DATALOGnr or FO;
similarly for FRP(DATALOG). For the upper bounds, we give
an algorithm in FPSPACE(poly) (resp. FEXPTIME(poly)) for
FRP(LQ ) when LQ is DATALOGnr or FO (resp. DATALOG).
(3) When Q and Qc are fixed, we show that FRP(CQ) in
the absence of Qc is FPNP -hard by reduction from MAX-WEIGHT SAT, which is FPNP -complete (cf. [25]). Given a
set C of clauses with weights, MAX-WEIGHT SAT is to find
a truth assignment that satisfies a set of clauses in C with
the most total weight. For the upper bound, we give an
FPNP algorithm for FRP(LQ ) when LQ is FO or DATALOG.
(4) When Qc is absent, FRP(CQ) is FPNP -hard by proof (3)
given above, and the algorithm for FRP(∃FO+ ) given in
proof (1) is now in FPNP . The proofs for DATALOGnr , FO
and DATALOG given in (2) still work in this special case, as
no Qc is used there when verifying the lower bounds.
✷
Deciding the maximum bound. We show that MBP(CQ)
is Dp2 -complete. Here Dp2 is the class of languages recognized
by oracle machines that make a query to a Σp2 oracle and
a query to a Πp2 oracle. That is, L is in Dp2 if there exist
languages L1 ∈ Σp2 and L2 ∈ Πp2 such that L = L1 ∩L2 [33],
analogous to how DP is defined with NP and coNP [25].
When LQ is FO, DATALOGnr or DATALOG, MBP(LQ ) and
RPP(LQ ) have the same complexity. Moreover, the absence
of Qc has the same impact on MBP(LQ ) as on RPP(LQ ).
Theorem 3.4: For MBP(LQ ), the combined complexity is
• Dp2 -complete when LQ is CQ, UCQ or ∃FO+ ;
• PSPACE-complete when LQ is DATALOGnr or FO; and
• EXPTIME-complete when LQ is DATALOG.
When compatibility constraints are absent, its combined
complexity remains unchanged for DATALOGnr , FO and
DATALOG, but it is DP-complete for CQ, UCQ and ∃FO+ .
Its data complexity is DP-complete for all the languages,
in the presence or absence of compatibility constraints. ✷
Proof sketch: (1) We show that MBP(CQ) is Dp2 -hard by
reduction from ∃∗ ∀∗ 3DNF–∀∗ ∃∗ 3CNF. Given a pair (ϕ1 , ϕ2 )
of ∃∗ ∀∗ 3DNF instances, the latter is to decide whether ϕ1 is
true and ϕ2 is false, and is Dp2 -complete [33]. We show that
MBP(∃FO+ ) is in Dp2 by giving an algorithm that makes a
query to a Σp2 oracle and a query to a Πp2 oracle.
(2) For DATALOGnr , FO and DATALOG, the proof for MBP
extends its counterpart in the proofs (2-3) of Theorem 3.1.
(3) When Q and Qc are fixed, we show that MBP(CQ) is
DP-hard by reduction from SAT-UNSAT, and give a DP algorithm for MBP(LQ ) when LQ is FO or DATALOG.
(4) In the absence of Qc , the lower bound proofs given in (3)
for CQ and in (2) for the others remain intact, since no Qc
is used there. The upper bounds given in (2-3) still hold. ✷
Counting valid packages. When it comes to the counting
problem CPP(LQ ), we provide its complexity as follows.
Theorem 3.5: For CPP(LQ ), the combined complexity is
• #·coNP-complete when LQ is CQ, UCQ or ∃FO+ ;
• #·PSPACE-complete when LQ is DATALOGnr or FO;
• #·EXPTIME-complete when LQ is DATALOG.
In the absence of compatibility constraints, its combined
complexity remains unchanged for DATALOGnr , FO and
DATALOG, but it is #·NP-complete for CQ, UCQ and ∃FO+ .
Its data complexity is #·P-complete for all the languages
in the presence or absence of compatibility constraints. ✷
Here we use the framework of predicate-based counting
classes introduced in [15]. For a complexity class C of decision problems, #·C is the class of all counting problems
associated with a predicate RL that satisfies the following
conditions: (a) RL is polynomially balanced (see its definition above); and (b) the decision problem “given x and y,
whether RL (x, y)” is in C. A counting problem is to compute
the cardinality of the set {y | RL (x, y)}, i.e., it is to find
how many y there are such that RL (x, y) is satisfied.
It is known that #·P = #P, #·NP ⊆ #NP = #·PNP =
#·coNP, but #·NP = #·coNP iff NP = coNP, where #P and
#NP are counting classes in the machine-based framework
of [31]. From these we know that the combined complexity
of CPP(CQ) is #NP-complete, and the data complexity of
CPP(LQ ) is #P-complete for all the languages considered.
Proof sketch: (1) We show that CPP(CQ) is #·coNP-hard
by reduction from #Π1 SAT. Given ϕ(X, Y ) = ∀X ψ(X, Y ),
where ψ is a 3DNF, #Π1 SAT is to count the number of truth
assignments of Y that satisfy ϕ, and is #·coNP-complete
[11]. The reduction is a 1-1 mapping from the solutions to
CPP(CQ) to the truth assignments for ϕ(X, Y ), and hence is
parsimonious. We also show that CPP(∃FO+ ) is in #·coNP.
(2) We show that CPP is #·PSPACE-hard for DATALOGnr
and FO by parsimonious reductions from #QBF, which is
#·PSPACE-complete [21]. Given ϕ = ∃X ∀y1 P2 y2 · · · Pn yn ψ,
where ψ is a 3SAT instance over X and {yi | i ∈ [1, n]}, and
Pi is ∀ or ∃, #QBF is to count the number of truth assignments of X that satisfy ϕ. For DATALOG, we verify that
CPP is #·EXPTIME-hard by parsimonious reduction from
all counting problems in #·EXPTIME. We also show that
CPP is in #·PSPACE (resp. #·EXPTIME) for DATALOGnr
and FO (resp. DATALOG) by the definition of #·C classes.
(3) When Q and Qc are fixed, we show that CPP(CQ) is
#·P-complete by parsimonious reduction from #SAT, which
is #·P-complete (cf. [25], by #P =#·P). Given an instance
ψ of 3SAT, #SAT is to count truth assignments that satisfy
ψ. We also show that CPP is in #·P for all the languages.
(4) When Qc is absent, we show that CPP(CQ) is #·NP-hard
by parsimonious reduction from #Σ1 SAT, which is #·NP-complete [11]. Given ϕ(X, Y ) = ∃X ψ(X, Y ), #Σ1 SAT is to
count truth assignments of Y that satisfy ϕ, where ψ is a
3DNF. We also show that CPP(∃FO+ ) is in #·NP. When
LQ is DATALOGnr , FO or DATALOG, the proofs of (2) given
above carry over (its lower bound proofs do not use Qc ). ✷
4. SPECIAL CASES OF POI RECOMMENDATIONS
The results of Section 3 tell us that RPP, FRP, MBP and
CPP have rather high complexity. In this section we revisit
these problems for special cases of package recommendations, to explore the impact of various parameters of these
problems on their complexity. We consider the settings when
packages are bounded by a constant instead of a polynomial,
when LQ is a language for which the membership problem
is in PTIME, and when compatibility constraints are simply
PTIME functions. We also study item recommendations, for
which each package has a single item, and compatibility constraints are absent. Our main conclusion of this section is
that the complexity of these problems is rather robust: these
restrictions simplify the analyses, but not much.
Packages with a fixed bound. One might be tempted to
think that fixing package size would simplify the analyses.
Below we study the impact of fixing package sizes on package
selections, in the presence of compatibility constraints Qc ,
by considering packages N such that |N | ≤ Bp , where Bp is
a predefined constant rather than a polynomial.
We show that fixing package sizes does not make our lives
easier when combined complexity is concerned. In contrast,
this does simplify the analyses of data complexity.
Corollary 4.1: For packages with a constant bound Bp ,
the combined complexity bounds of RPP, FRP, MBP and
CPP are the same as given in Theorems 3.1, 3.3, 3.4 and
3.5, respectively; and the data complexity is
• in PTIME for RPP,
• in FP for FRP,
• in PTIME for MBP, and
• in FP for CPP,
for all the languages of Section 2. The complexity remains
unchanged even when Bp is fixed to be 1.
✷
Proof sketch: (1) The lower bounds of RPP, FRP, MBP
and CPP in the presence of Qc hold here, since their proofs
(Theorems 3.1, 3.3, 3.4 and 3.5) use only top-1 package with
one item, and all the upper bounds carry over here. (2) For
fixed Q and Qc , we give algorithms in PTIME, FP, PTIME
and FP for RPP, FRP, MBP and CPP, respectively.
✷
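One way to see why a constant bound Bp helps for data complexity: with n = |Q(D)| items, the number of candidate packages of size at most Bp is the sum of the binomial coefficients C(n, i) for i ≤ Bp, which is at most (n + 1)^Bp and hence polynomial in n for fixed Bp. The short sketch below only illustrates this counting argument; it is not taken from the proof, and the function name is hypothetical.

```python
from math import comb


def candidate_count(n, max_size):
    """Number of subsets of an n-element set with at most max_size elements."""
    return sum(comb(n, size) for size in range(max_size + 1))


for n in (10, 100, 1000):
    # Constant bound 2 grows quadratically; an unbounded size would give 2**n.
    print(n, candidate_count(n, 2), (n + 1) ** 2)
```

For fixed Q and Qc, each candidate can be checked in PTIME, so exhaustive enumeration over this polynomial-size candidate space stays within PTIME (or FP for the function and counting variants).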
SP queries. In contrast, for queries that have a PTIME
complexity for their membership problem, variable package
sizes lead to higher complexity of RPP, FRP, MBP and CPP
than their counterparts for packages with a fixed bound.
To illustrate this, we consider SP queries, a simple fragment of CQ queries that support projection and selection
operators only. An SP query is of the form
Q(x̄) = ∃ȳ (R(x̄, ȳ) ∧ ψ(x̄, ȳ)),
where ψ is a conjunction of predicates =, ≠, <, ≤, > and ≥.
The result below holds for all query languages with a
PTIME membership problem, including but not limited to
SP. In fact the lower bounds remain intact even when the
selection criteria are specified by an identity query, when
|ȳ| = 0 and ψ is a tautology in an SP query.
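To make the PTIME membership of SP queries tangible, the sketch below evaluates a selection-projection query over a single relation in one pass. It is a simplification under stated assumptions: the relation is a list of dictionaries, every predicate compares an attribute with a constant (comparisons between attributes are omitted), and the relation and attribute names are invented.

```python
import operator

# Comparison predicates allowed in the selection condition.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def eval_sp(relation, output_attrs, predicates):
    """Evaluate a selection-projection query with set semantics.

    relation: list of dict rows; predicates: list of (attribute, op, constant).
    """
    result = set()
    for row in relation:
        if all(OPS[op](row[attr], const) for attr, op, const in predicates):
            result.add(tuple(row[a] for a in output_attrs))
    return result


# Hypothetical relation of flights.
flights = [
    {"fno": 1, "src": "edi", "price": 300},
    {"fno": 2, "src": "edi", "price": 500},
    {"fno": 3, "src": "lhr", "price": 200},
]
print(eval_sp(flights, ["fno"], [("src", "=", "edi"), ("price", "<=", 400)]))
print(len(eval_sp(flights, ["fno", "src", "price"], [])))  # identity query
```

The running time is linear in the number of rows times the number of predicates, and an empty predicate list with all attributes projected behaves as the identity query mentioned above.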
Corollary 4.2: For SP queries, the combined complexity
and data complexity are
• coNP-complete for RPP, but in PTIME for packages
with a fixed (constant) bound Bp ;
• FPNP -complete for FRP, but in FP for fixed Bp ;
• DP-complete for MBP, but in PTIME for fixed Bp ; and
• #·P-complete for CPP, but in FP for fixed Bp ,
when compatibility constraints are present or absent.
✷
Proof sketch: (1) For packages of variable sizes, the lower
bounds of RPP, FRP, MBP and CPP with fixed Q in CQ hold
for SP. Indeed, their proofs of Theorems 3.1, 3.3, 3.4 and 3.5
use an identity query as Q, which is in SP. For the upper
bounds, the algorithms given there for RPP, FRP, MBP and
CPP with a fixed Q apply to arbitrary SP queries.
(2) For packages with a constant bound, the algorithms for
fixed Q of Corollary 4.1 apply to SP queries, fixed or not. ✷
PTIME compatibility constraints. One might also think
that we would get lower complexity with PTIME compatibility constraints. That is, we simply treat compatibility
constraints as PTIME functions rather than queries in LQ .
In this setting, the complexity remains the same as its counterpart when Qc is absent, no better and no worse.
Corollary 4.3: With PTIME compatibility constraints Qc ,
the combined complexity and data complexity of RPP, FRP,
MBP and CPP remain the same as their counterparts in the
absence of Qc , as given in Theorems 3.2, 3.3, 3.4 and 3.5,
respectively, for all the languages of Section 2.
✷
Proof sketch: The lower bounds of RPP, FRP, MBP and
CPP in the absence of Qc obviously carry over to this setting,
since when Qc is empty (see Section 2), Qc is in PTIME. The
upper bound proofs for Theorems 3.2, 3.3, 3.4 and 3.5 in
the absence of Qc also remain intact here. Indeed, adding
an extra PTIME step for checking Qc (N, D) = ∅ does not
increase the complexity of the algorithms given there.
✷
Item recommendations. As remarked in Section 2, item
recommendations are a special case of package recommendations when (a) compatibility constraints Qc are absent, and
(b) each package consists of a single item, i.e., with a fixed
size 1. Given a database D, a query Q ∈ LQ , a utility function f () and a natural number k ≥ 1, a top-k item selection
is a top-k package selection specified in terms of (Q, D, f ).
When Qc is absent and packages have a fixed size 1, one
might expect that the recommendation analyses would become much simpler. Unfortunately, this is not the case.
Theorem 4.4: For items, RPP, FRP, MBP and CPP have
• the same combined complexity as their counterparts in
the absence of Qc (Theorems 3.2, 3.3, 3.4, 3.5), and
• the same data complexity as their counterparts for
packages with a constant bound (Corollary 4.1),
for all the query languages given in Section 2.
✷
Proof sketch: (1) Combined complexity. The upper
bounds of these problems in the absence of Qc (Theorems
3.2, 3.3, 3.4, 3.5) obviously remain intact here. The lower
bound proofs for RPP and CPP given there are still valid for
item recommendations, since they require only top-1 packages with a single item. For FRP and MBP, however, new
lower bound proofs are required for item recommendations.
More specifically, we show that FRP(CQ) is FPNP -hard by
reduction from MAX-WEIGHT SAT, and that MBP(CQ) is
DP-hard by reduction from SAT-UNSAT, for item recommendations. For other languages LQ , the proofs for FRP(LQ )
and MBP(LQ ) are given along the same lines as their counterparts for Theorems 3.3 and 3.4, respectively.
(2) Data complexity. The algorithms developed for Corollary 4.1 suffice for item selections when Q is fixed.
✷
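For a fixed query, item selection reduces to evaluating Q(D) and ranking its answers by utility, as the following sketch suggests. It assumes the answers of Q(D) are already available as an iterable of tuples and that `utility` plays the role of f(); the names and the sample data are hypothetical.

```python
from heapq import nlargest


def top_k_items(answers, utility, k):
    """Return k answers with the highest utility, or None if fewer than k exist."""
    distinct = set(answers)
    if len(distinct) < k:
        return None
    return nlargest(k, distinct, key=utility)


# Hypothetical answers of Q(D): (name, rating).
items = {("hotel", 7), ("hostel", 5), ("b&b", 9)}
print(top_k_items(items, lambda t: t[1], 2))
```

The ranking step is polynomial in |Q(D)|, so the cost of the combined-complexity setting comes from evaluating Q itself when Q is part of the input.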
Summary. From these results we find the following.
Variable sizes of packages. (1) For simple queries that have
a PTIME membership problem, such as SP, the problems
with variable package sizes have higher combined and data
complexity than their counterparts with a fixed (constant)
package size. This is in line with the claim of [34]. (2) In
contrast, for any query language that subsumes CQ, variable
sizes of packages have no impact on the combined complexity
of these problems. This is consistent with the observation
of [26]. (3) When it comes to the data complexity, however,
variable (polynomially bounded) package sizes make our lives harder:
RPP, FRP, MBP and CPP in this setting have a higher data
complexity than their counterparts with a fixed package size.
Compatibility constraints. (1) For CQ, UCQ and ∃FO+ , the
presence of Qc increases the combined complexity of the
analyses. (2) In contrast, for more powerful languages such
as DATALOGnr , FO and DATALOG, neither Qc nor variable
sizes make any difference. Indeed, RPP, FRP, MBP and CPP
have exactly the same combined complexity as their counterparts for item recommendations, in the presence or absence
of Qc . (3) For data complexity, the presence of Qc has no
impact. Indeed, when Qc is fixed, it is in PTIME to check
Qc (N, D) = ∅ for all LQ in which Qc is expressed; hence Qc
can be encoded in the cost() function, and no longer needs to
be treated separately. (4) To simplify the discussion we use
LQ to specify Qc . Nonetheless, all the complexity results
remain intact for any class C of Qc whose satisfiability problem has the same complexity as the membership problem
for LQ . In particular, when C is a class of PTIME functions,
the presence of Qc has no impact on the complexity.
The number k of packages. All the lower bounds of RPP,
FRP and MBP remain intact when k = 1 (k is irrelevant to
CPP), i.e., they carry over to top-1 package selections.
5. RECOMMENDATIONS OF QUERY RELAXATIONS
We next study query relaxation recommendations. In
practice a selection query Q often finds no sensible packages.
When this happens, the users naturally want the recommendation system to suggest how to revise their selection
criteria by relaxing the query Q. We are not aware of any
recommendation systems that support this functionality.
Below we first present query relaxations (Section 5.1). We
then identify two query relaxation recommendation problems, and establish their complexity bounds (Section 5.2).
5.1 Query Relaxations
Consider a query Q, in which a set X of variables (free or
bound) and a set E of constants are parameters that can be
modified, e.g., variables or constants indicating departure
time and date of flights. Following [8], we relax Q by replacing constants in E with variables, and replacing repeated
variables in X with distinct variables, as follows.
(1) For each constant c ∈ E, we associate a variable wc with
c. We denote the tuple consisting of all such variables as w.
(2) For each variable x ∈ X that appears at least twice in
atoms of Q, we introduce a new variable ux and substitute
ux for one of the occurrences of x. For instance, an equijoin Q1 (v , y)∧Q2 (y,v ′ ) is converted to Q1 (v , y)∧Q2 (uy ,v ′ ),
a Cartesian product. This is repeated until no variable has
multiple occurrences. Let u be the tuple of all such variables.
We denote the domain of wc (resp. ux ) as dom(R.A) if c
(resp. x) appears in Q as an A-attribute value in relation R.
To prevent relaxations that are too general, we constrain
variables in w̄ and ū with certain ranges, by means of techniques developed for query relaxations [8, 18] and preference
queries [29]. To simplify the discussion, we assume that
for each attribute A in a relation R, a distance function
distR.A (a, b) is defined. Intuitively, if distR.A (a, b) is within
a bound, then b is close enough to a, and we can relax Q by
replacing a with its “neighbor” b. For instance, DB can be
generalized to CS if dist(DB, CS) is small enough [8]. We
denote by Γ the set of all such distance functions.
Given Γ, we define a relaxed query QΓ of Q(x̄) as:
QΓ (x̄) = ∃w̄ ∃ū (Q′ (x̄, w̄, ū) ∧ ψw (w̄) ∧ ψu (ū)),
where Q′ is obtained from Q by substituting wc for constant
c, and ux for a repeated occurrence of x. Here ψw (w̄) is
a conjunction of predicates of either (a) distR.A (wc , c) ≤ d,
where the domain of wc is dom(R.A), and d is a constant, or
(b) wc = c, i.e., the constant c is unchanged. Query ψw (w̄)
includes such a conjunct for each wc ∈ w̄; similarly for ψu (ū).
We define the level gap(γ) of relaxation of a predicate γ
in ψw (w̄) as follows: gap(γ) = d if γ is distR.A (wc , c) ≤ d, and
gap(γ) = 0 if γ is wc = c; similarly for a predicate in ψu (ū).
We define the level of relaxation of query QΓ , denoted by
gap(QΓ ), to be the sum Σ_{γ ∈ ψw (w̄) ∪ ψu (ū)} gap(γ).
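The level of relaxation is a plain sum over the predicates of the relaxed query, which the sketch below spells out. The tuple encoding of predicates, `("dist", variable, constant, d)` for a distance predicate and `("eq", variable, constant)` for an unchanged constant, is an assumption made for this illustration, as are the variable names.

```python
def gap(predicates):
    """Sum the relaxation levels of the predicates of a relaxed query.

    A ("dist", var, const, d) predicate contributes d;
    an ("eq", var, const) predicate contributes 0.
    """
    total = 0
    for pred in predicates:
        if pred[0] == "dist":
            total += pred[3]
        elif pred[0] != "eq":
            raise ValueError(f"unknown predicate kind: {pred[0]!r}")
    return total


# Two cities relaxed by 15 miles each, the date left unchanged.
relaxed_cities = [
    ("dist", "w_from", "edi", 15),
    ("dist", "w_to", "nyc", 15),
    ("eq", "w_date", "1/1/2012"),
]
print(gap(relaxed_cities))
```

Summing distances measured in different units (miles, days) is taken at face value here, as in the definition.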
Example 5.1: Recall query Q defined on flight and POI in
Example 1.1. The query finds no items, as there is no direct
flight from edi to nyc. Suppose that E has constants edi,
nyc, 1/1/2012 and X = {xTo }, and that the user accepts
a city within 15 miles of the original departure city (resp.
destination) as From (resp. To), where dist( ) measures the
distances between cities. Then we can relax Q as:
Q1(f#,Pr, nm, tp, tkt, tm) = ∃DT, AT, AD, uTo , wEdi , wNYC , wDD
flight(f#,wEdi , xTo ,DT, wDD , AT, AD, Pr) ∧ xTo = wNYC
∧ POI(nm, uTo , tp, tkt, tm) ∧ wDD = 1/1/2012
∧ dist(wNYC , nyc) ≤ 15 ∧ dist(wEdi , edi) ≤ 15 ∧ xTo = uTo .
The relaxed Q1 finds direct flights from edi to ewr, since
the distance between nyc and ewr is within 15 miles.
We can relax Q1 by allowing wDD to be within 3 days of
1/1/2012, where the distance function for dates is distd ():
Q2(f#,Pr, nm, tp, tkt, tm) = ∃DT, AT, AD, uTo , wEdi , wNYC , wDD
flight(f#, wEdi , xTo , DT, wDD , AT, AD, Pr) ∧ xTo = wNYC
∧ POI(nm, uTo , tp, tkt, tm) ∧ distd (wDD , 1/1/2012) ≤ 3
∧ dist(wNYC , nyc) ≤ 15 ∧ dist(wEdi , edi) ≤ 15 ∧ xTo = uTo .
Then Q2 may find more available direct flights than Q1 ,
with possibly cheaper airfare. One can further relax Q2 by
allowing uTo and xTo to match different cities nearby, i.e.,
we convert the equijoin to a Cartesian product.
✷
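The effect of relaxing the city constants can be reproduced on a toy flight table. The sketch below is a simplification: it ignores the join with POI and the date relaxation, encodes the relaxation as a single `slack` in miles applied to both cities, and uses invented flight tuples and an invented nyc to ewr distance of 9 miles (the example only states that it is within 15 miles).

```python
# Hypothetical city distances in miles; only the pair needed here is listed.
MILES = {frozenset({"nyc", "ewr"}): 9}


def dist(a, b):
    """Distance between two cities; unknown pairs count as infinitely far."""
    if a == b:
        return 0
    return MILES.get(frozenset({a, b}), float("inf"))


def direct_flights(flights, origin, destination, date, slack=0):
    """Flight numbers on `date` whose endpoints lie within `slack` miles of
    the requested cities; slack=0 is the original, unrelaxed selection."""
    return [
        f["fno"]
        for f in flights
        if dist(f["src"], origin) <= slack
        and dist(f["dst"], destination) <= slack
        and f["date"] == date
    ]


flights = [
    {"fno": "F1", "src": "edi", "dst": "ewr", "date": "1/1/2012"},
    {"fno": "F2", "src": "edi", "dst": "lhr", "date": "1/1/2012"},
]
print(direct_flights(flights, "edi", "nyc", "1/1/2012"))            # original query
print(direct_flights(flights, "edi", "nyc", "1/1/2012", slack=15))  # cities relaxed
```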
We consider simple query relaxation rules here just to illustrate the main idea of query relaxation recommendations,
and defer a full treatment of this issue to future work.
5.2 Query Relaxation Recommendations
We now study recommendation problems for query relaxations, for package selections and for item selections.
The query relaxation problem for packages. Consider
a database D, queries Q and Qc in LQ , functions cost()
and val(), a cost budget C, a rating bound B, and a natural number k ≥ 1. When there exists no top-k package selection for (Q, D, Qc , cost(), val(), C), we need to relax Q to
find more packages for the users. More specifically, let Γ
be a collection of distance functions, and X and E be sets
of variables and constants in Q, respectively, which are parameters that can be modified. We want to find a relaxed
query QΓ of Q such that there exists a set N of k valid packages for (QΓ , D, Qc , cost(), val(), C, B), i.e., for each N ∈ N ,
N ⊆ QΓ (D), Qc (N, D) = ∅, cost(N ) ≤ C, val(N ) ≥ B, and |N |
is bounded by a polynomial in |D|. Moreover, we want QΓ
to minimally differ from the original Q, stated as follows.
For a constant g, a relaxed query QΓ of Q is called
a relaxation of Q for (Q, D, Qc , cost(), val(), C, B, k, g) if
(a) there exists a set N of k distinct valid packages for
(QΓ , D, Qc , cost(), val(), C, B), and (b) gap(QΓ ) ≤ g.
QRPP(LQ ): The query relaxation rec. problem (packages)
INPUT:
A database D, a query Q ∈ LQ with sets
X and E identified, a query Qc ∈ LQ ,
two functions cost() and val(), natural
numbers C, B, g and k ≥ 1, and a collection Γ of distance functions.
QUESTION: Does there exist a relaxation QΓ of Q
for (Q, D, Qc , cost(), val(), C, B, k, g)?
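QRPP has a "guess a relaxation, then check for k valid packages" shape, which a naive search makes explicit. In the sketch below, each modifiable parameter comes with a finite list of admissible thresholds d (0 meaning unchanged), and the check for k valid packages is abstracted into a callable `has_k_valid_packages`. Both the finite threshold lists and the callable are assumptions of this illustration, not part of the problem statement.

```python
from itertools import product


def find_relaxation(thresholds, max_gap, has_k_valid_packages):
    """Search for a relaxation with total gap <= max_gap that admits k valid packages.

    thresholds: dict mapping each modifiable parameter to its admissible bounds.
    has_k_valid_packages: hypothetical oracle taking a dict parameter -> bound.
    """
    params = list(thresholds)
    for choice in product(*(thresholds[p] for p in params)):
        if sum(choice) > max_gap:
            continue  # violates gap(Q_Gamma) <= g
        relaxed = dict(zip(params, choice))
        if has_k_valid_packages(relaxed):
            return relaxed
    return None


# Hypothetical setting: packages exist only if the destination is relaxed by 15 miles.
thresholds = {"From": [0, 15], "To": [0, 15], "Date": [0, 3]}
oracle = lambda relaxed: relaxed["To"] >= 15
print(find_relaxation(thresholds, 15, oracle))
print(find_relaxation(thresholds, 10, oracle))
```

The number of candidate relaxations is exponential in the number of modifiable parameters, and each oracle call is itself a package-existence test.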
No matter how important, QRPP is nontrivial: it is Σp2 -complete for CQ, PSPACE-complete for DATALOGnr and FO,
and EXPTIME-complete for DATALOG. It is NP-complete
when selection criteria Q and compatibility constraints Qc
are both fixed. Fixing Qc alone reduces the combined complexity of QRPP(LQ ) when LQ is CQ, UCQ or ∃FO+ , but it
does not help when it comes to DATALOGnr , FO and DATALOG, or when the data complexity is concerned.
Theorem 5.1: For QRPP(LQ ), the combined complexity is
• Σp2 -complete when LQ is CQ, UCQ or ∃FO+ ;
• PSPACE-complete when LQ is DATALOGnr or FO; and
• EXPTIME-complete when LQ is DATALOG.
In the absence of compatibility constraints, its combined
complexity remains unchanged for DATALOGnr , FO and
DATALOG, and it is NP-complete for CQ, UCQ and ∃FO+ .
Its data complexity is NP-complete for all the languages,
in the presence or absence of compatibility constraints. ✷
Proof sketch: (1) We verify that QRPP(CQ) is Σp2 -hard
by reduction from the ∃∗ ∀∗ 3DNF problem, by using relaxed
queries. We show that QRPP(∃FO+ ) is in Σp2 by giving a
nondeterministic PTIME algorithm that calls an NP oracle.
(2) For DATALOGnr and FO, we show that QRPP is PSPACE-hard by reductions from Q3SAT and the membership problem for FO, respectively, by using relaxed queries. We give
a PSPACE algorithm to check QRPP. Along the same lines
we show that QRPP(DATALOG) is EXPTIME-complete.
(3) When Q is fixed, we show that QRPP(CQ) is already
NP-hard in the absence of Qc , by reduction from 3SAT. We
show the upper bound by giving an NP algorithm to check
QRPP(LQ ) for fixed queries Q and Qc in FO or DATALOG.
(4) In the absence of Qc , it has been shown that QRPP(CQ)
is NP-hard by the proof of (3) above, and the algorithm for
QRPP(∃FO+ ) given in (1) is now in NP. For DATALOGnr ,
FO and DATALOG, the proofs given in (2) above can be
applied here, since their lower bound proofs do not use Qc ,
and their upper bounds still hold in this special case.
✷
The query relaxation problem for items. We also
study a special case of QRPP, for item selections. Given
D, Q, Γ, B, k, and a utility function f (), QRPP for
items is to decide whether there exists a relaxation QΓ of Q
for (Q, D, Qc , cost(), val(), C, B, k, g), when Qc is empty, and
cost(), val() and C are derived from f () as given in Section 2.
Compared to its package counterpart, item selections simplify the data complexity analyses of query relaxation recommendations. However, it gets no better than QRPP in the
absence of Qc when the combined complexity is concerned.
Corollary 5.2: For all the query languages LQ given in
Section 2, QRPP(LQ ) for items (1) has the same combined
complexity as QRPP(LQ ) in the absence of compatibility
constraints; and (2) its data complexity is in PTIME.
✷
Proof sketch: For items, (1) we show that QRPP(CQ) is
NP-hard by reduction from 3SAT. To check QRPP(∃FO+ ) we
give an NP algorithm. For DATALOGnr , FO or DATALOG,
the lower bounds of Theorem 5.1 hold here since their proofs
use top-1 items only. The upper bounds of the combined
complexity also carry over. (2) We give a PTIME algorithm
to check QRPP for fixed Q in FO or DATALOG.
✷
Remarks. (1) All the lower bounds of this section remain
intact when k = 1, i.e., for top-1 package or item selections.
(2) The proofs of Theorem 5.1 and Corollary 5.2 also tell us
that for packages with a constant bound, QRPP(LQ ) has the
same combined complexity as its counterpart for packages
with variable sizes, and it has the same data complexity
as its counterpart for items. (3) In addition, when Qc is
a PTIME function, QRPP(LQ ) has the same combined and
data complexity as its counterpart in the absence of Qc .
These are consistent with Corollaries 4.1 and 4.3.
6. ADJUSTMENT RECOMMENDATIONS
We next study adjustment recommendations. In practice the collection D of items maintained by a recommendation system may fail to provide items that most users want.
When this happens, the vendors of the system would want
the system to recommend how to “minimally” modify D such
that users’ requests could be satisfied. Below we first present
adjustments to D (Section 6.1). We then study adjustment
recommendation problems (Section 6.2).
6.1 Adjustments to Item Collections
Consider a database D consisting of items provided by
a system, and a collection D′ of additional items. We use
∆(D, D′ ) to denote adjustments to D, which is a set consisting of (a) tuples to be deleted from D, and (b) tuples from
D′ to be inserted into D. We use D⊕∆(D, D′ ) to denote
the database obtained by modifying D with ∆(D, D′ ).
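The operation D⊕∆(D, D′) can be pictured with sets of tuples. In the sketch below, an adjustment is represented as a pair of sets (tuples to delete, tuples to insert); this representation and the function name are assumptions of the illustration.

```python
def apply_adjustment(database, extra, deletions, insertions):
    """Return the adjusted database: delete tuples of D, then insert tuples of D'."""
    database, extra = set(database), set(extra)
    deletions, insertions = set(deletions), set(insertions)
    if not deletions <= database:
        raise ValueError("only tuples of D can be deleted")
    if not insertions <= extra:
        raise ValueError("only tuples of D' can be inserted")
    return (database - deletions) | insertions


# Hypothetical item identifiers.
adjusted = apply_adjustment({"t1", "t2"}, {"t3", "t4"}, {"t2"}, {"t3"})
print(sorted(adjusted))
```

Under this representation, the size of the adjustment is the number of deleted tuples plus the number of inserted tuples.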
Consider queries Q, Qc in LQ , functions cost() and val(),
a cost budget C, a rating bound B, and a natural number
k ≥ 1, such that there exists no top-k package selection for
(Q, D, Qc , cost(), val(), C). We want to find a set ∆(D, D′ )
of adjustments to D such that there exists a set N of k
valid packages for (Q, D⊕∆(D, D′ ), Qc , cost(), val(), C, B),
i.e., D⊕∆(D, D′ ) yields k packages N that are rated
above B, and satisfy the selection criteria Q, compatibility
constraints Qc as well as aggregate constraints cost(N ) ≤ C.
One naturally wants to find a “minimum” ∆(D, D′ ) to
adjust D. For a constant k′ ≥ 1, we call ∆(D, D′ ) a package adjustment for (Q, D, Qc , cost(), val(), C, B, k, k′ ) if (a)
|∆(D, D′ )| ≤ k′ , and (b) there exist k distinct valid packages
for (Q, D⊕∆(D, D′ ), Qc , cost(), val(), C, B).
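Conditions (a) and (b) can be checked by exhaustive search on small instances. The sketch below is an illustration under several assumptions, not an algorithm from this paper: a database is a set of tuples, Q is a Python function from a database to a set of items, Qc is a predicate that returns True when a package satisfies the compatibility constraints, "rated above B" is read as val(N ) ≥ B, and |∆(D, D′ )| counts deleted plus inserted tuples. All function names are hypothetical. The search is exponential in the size of the input.

```python
from itertools import chain, combinations


def subsets(elements, max_size):
    """Yield every subset of `elements` with at most `max_size` members, as tuples."""
    elements = sorted(elements)  # fixed order, so that the search is deterministic
    sizes = range(min(max_size, len(elements)) + 1)
    return chain.from_iterable(combinations(elements, r) for r in sizes)


def count_valid_packages(Q, Qc, cost, val, C, B, D):
    """Count non-empty N ⊆ Q(D) with Qc(N, D) true, cost(N) ≤ C and val(N) ≥ B."""
    candidates = Q(D)
    return sum(
        1
        for N in subsets(candidates, len(candidates))
        if N and Qc(N, D) and cost(N) <= C and val(N) >= B
    )


def find_package_adjustment(Q, Qc, cost, val, C, B, k, k_prime, D, D_prime):
    """Return (deletions, insertions) with at most k_prime changes that yield
    at least k valid packages, or None if no such adjustment exists."""
    changes = [("del", t) for t in D] + [("ins", t) for t in D_prime - D]
    for delta in subsets(changes, k_prime):
        deletions = {t for op, t in delta if op == "del"}
        insertions = {t for op, t in delta if op == "ins"}
        adjusted = (D - deletions) | insertions
        if count_valid_packages(Q, Qc, cost, val, C, B, adjusted) >= k:
            return deletions, insertions
    return None


# Toy instance: items are (name, price, rating) tuples.
D = {("a", 30, 3), ("b", 40, 4)}
D_prime = {("c", 20, 5)}
Q = lambda db: set(db)                       # select every item
Qc = lambda N, db: True                      # no compatibility constraints
cost = lambda N: sum(price for _, price, _ in N)
val = lambda N: sum(rating for _, _, rating in N)

result = find_package_adjustment(Q, Qc, cost, val, C=60, B=8, k=2, k_prime=1,
                                 D=D, D_prime=D_prime)
```

On this instance D alone yields no valid package, and the search returns the adjustment that inserts ("c", 20, 5), after which the packages {a, c} and {b, c} are both valid.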
6.2 Deciding Adjustment Recommendations
The notion of package adjustments suggests that we study the following problem.
The adjustment recommendation problem. Given
D, D′ , Q, Qc , cost(), val(), C, B, k and k′ , the adjustment
recommendation problem for packages, ARPP, is to decide whether there is a package adjustment ∆(D, D′ ) for
(Q, D, Qc , cost(), val(), C, B, k, k′ ). The analysis of this problem is no easier
than that of query relaxation recommendations. Indeed, ARPP(LQ ) has the same combined and data complexity as QRPP(LQ ), although their proofs are quite different.
Theorem 6.1: The combined complexity of ARPP(LQ ) is
• Σp2 -complete when LQ is CQ, UCQ or ∃FO+ ;
• PSPACE-complete when LQ is DATALOGnr or FO; and
• EXPTIME-complete when LQ is DATALOG.
In the absence of compatibility constraints, its combined
complexity remains unchanged for DATALOGnr , FO and
DATALOG, and it is NP-complete for CQ, UCQ and ∃FO+ .
Its data complexity is NP-complete for all the languages,
in the presence or absence of compatibility constraints. ✷
Proof sketch: (1) We show that ARPP(CQ) is Σp2 -hard by
reduction from the ∃∗ ∀∗ 3DNF problem. The reduction here
makes use of updates ∆(D, D′ ), and is different from the one
for QRPP(CQ). To verify the upper bound, we give an NP
algorithm that uses an NP oracle to check ARPP(∃FO+ ).
(2) For DATALOGnr and FO, we show that ARPP is PSPACE-hard by reductions from Q3SAT and the membership problem for FO, respectively, which again use ∆(D, D′ ) and are
different from the ones used earlier. We give a PSPACE algorithm to check ARPP for DATALOGnr and FO. Similarly
we show that ARPP(DATALOG) is EXPTIME-complete.
(3) When Q and Qc are fixed, we show that ARPP(CQ) is
NP-hard, even in the absence of Qc , by reduction from 3SAT. We also provide an NP algorithm to check ARPP for FO and DATALOG.
(4) When Qc is absent, ARPP(CQ) is NP-hard even when Q
is fixed by the proof of (3), and the algorithm for ∃FO+ given
in (1) is now in NP. For the languages considered in (2), their
lower bound proofs do not use Qc , and their upper bound
proofs cover this special case. Moreover, the algorithm developed in (3) and the lower bound of ARPP(CQ) verify that
the data complexity is NP-complete.
✷
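The NP upper bound in (3) is consistent with a guess-and-check procedure that guesses ∆(D, D′ ) together with k packages and then verifies the guess. The proof sketch does not spell out the algorithm, so the following Python sketch of the verification step is an assumption about its shape rather than the algorithm itself. It models databases as sets of tuples, Q as a function, and Qc as a predicate that returns True for compatible packages; the name verify_certificate is hypothetical.

```python
def verify_certificate(Q, Qc, cost, val, C, B, k, k_prime, D, D_prime,
                       deletions, insertions, packages):
    """Check a guessed adjustment and guessed packages; no search is performed."""
    # The adjustment must delete from D, insert from D′, and respect the bound k′.
    if not (deletions <= D and insertions <= D_prime):
        return False
    if len(deletions) + len(insertions) > k_prime:
        return False
    adjusted = (D - deletions) | insertions
    candidates = Q(adjusted)  # polynomial in |D| when Q is fixed
    # Packages must be pairwise distinct, and at least k of them are needed.
    distinct = {frozenset(N) for N in packages}
    if len(distinct) < k:
        return False
    # Every guessed package must be valid with respect to the adjusted database.
    return all(
        N and N <= candidates and Qc(N, adjusted)
        and cost(N) <= C and val(N) >= B
        for N in distinct
    )


# Toy instance: items are (name, price, rating) tuples.
a, b, c = ("a", 30, 3), ("b", 40, 4), ("c", 20, 5)
D, D_prime = {a, b}, {c}
Q = lambda db: set(db)
Qc = lambda N, db: True
cost = lambda N: sum(price for _, price, _ in N)
val = lambda N: sum(rating for _, _, rating in N)

ok = verify_certificate(Q, Qc, cost, val, 60, 8, 2, 1, D, D_prime,
                        set(), {c}, [{a, c}, {b, c}])
```

Each check runs in time polynomial in the size of D, D′ and the certificate whenever Q, Qc , cost() and val() can be evaluated in polynomial time, which is the case assumed here.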
The adjustment recommendation problem for items.
Given D, D′ , Q, B, k, k′ and a utility function f (), ARPP for
items is to decide whether there is an adjustment ∆(D, D′ )
for (Q, D, Qc , cost(), val(), C, B, k, k′ ), where Qc is empty, and
cost(), val(), C are derived from f () (see Section 2).
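For intuition, the item case can be sketched by brute force as follows. The sketch assumes that every package is a single item rated by f (), so that a top-k selection amounts to k distinct items of Q(D⊕∆(D, D′ )) with f () at least B; the derivation in Section 2 is not reproduced here, and the function name and the toy query are hypothetical. The inner check is a simple count, while the outer loop ranges over all adjustments of size at most k′ .

```python
from itertools import combinations


def find_item_adjustment(Q, f, B, k, k_prime, D, D_prime):
    """Return (deletions, insertions) of total size ≤ k_prime such that the
    adjusted database has at least k selected items with f(item) ≥ B, else None."""
    changes = ([("del", t) for t in sorted(D)]
               + [("ins", t) for t in sorted(D_prime - D)])
    for size in range(k_prime + 1):
        for delta in combinations(changes, size):
            deletions = {t for op, t in delta if op == "del"}
            insertions = {t for op, t in delta if op == "ins"}
            adjusted = (D - deletions) | insertions
            good_items = [t for t in Q(adjusted) if f(t) >= B]
            if len(good_items) >= k:
                return deletions, insertions
    return None


def Q(db):
    """Toy non-monotone query: hotels that are not listed as blocked."""
    blocked = {t[1] for t in db if t[0] == "blocked"}
    return {t for t in db if t[0] == "hotel" and t[1] not in blocked}


f = lambda item: item[2]  # utility of a hotel tuple is its rating

D = {("hotel", "h1", 5), ("hotel", "h2", 2), ("blocked", "h1")}
D_prime = {("hotel", "h3", 4)}
result = find_item_adjustment(Q, f, B=4, k=2, k_prime=2, D=D, D_prime=D_prime)
```

On this instance two changes are needed: deleting ("blocked", "h1") and inserting ("hotel", "h3", 4). With k′ = 1 no adjustment exists.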
One might expect that fixing package sizes in item selections would simplify the analyses of adjustment recommendations. Recall that all the problems we have studied so far
have a lower data complexity for item selections than their
counterparts for packages. For instance, the data complexity of QRPP for items is in PTIME while it is NP-complete
for packages; similarly for RPP, FRP, MBP and CPP. In
contrast, we show below that the data complexity of ARPP
for packages is robust: it remains intact for items. In other
words, fixing package sizes does not help here.
Corollary 6.2: For all the languages LQ given in Section 2,
ARPP in the absence of compatibility constraints and ARPP
for items have the same combined and data complexity. ✷
Proof sketch: All the lower bound proofs for ARPP(LQ ) in
the absence of Qc use top-k item selections, and ARPP(CQ)
is NP-hard for fixed Q. Hence for all LQ , ARPP without Qc
has the same combined complexity as ARPP for items, and
the data complexity of ARPP carries over to item selections.
Note that the proof that ARPP(CQ) is NP-hard for fixed Q
does not use k = 1; this is the only such case in the entire paper.
✷
Remarks. One can find the following from the proofs of
Theorem 6.1 and Corollary 6.2. (1) For packages with a
constant bound, ARPP(LQ ) has the same combined complexity as ARPP(LQ ) for packages with variable sizes, and
it has the same data complexity as ARPP(LQ ) for items. (2)
When Qc is in PTIME, ARPP(LQ ) has the same combined
and data complexity as ARPP(LQ ) in the absence of Qc .
7. CONCLUSIONS
We have studied a general model for recommendation systems, and investigated several fundamental problems in the
model, from the decision problems RPP and MBP to the function problem FRP and the counting problem CPP. Beyond POI recommendations, we have proposed and studied QRPP for query
relaxation recommendations, and ARPP for adjustment recommendations. We have also investigated special cases of
these problems, when compatibility constraints Qc are absent or in PTIME, when all packages are bounded by a constant Bp , and when Qc is absent and Bp is fixed to be 1
for item selections. We have provided a complete picture of
the lower and upper bounds of these problems, all matching,
for both their data complexity and combined complexity,
when LQ ranges over a variety of query languages. These
results tell us where the complexity of these problems arises.
The main complexity results are summarized in Table 1,
annotated with their corresponding theorems (the results
for SP (Corollary 4.2) are excluded). As remarked earlier,
(1) the data complexity is independent of query languages,
and remains unchanged whether or not compatibility constraints Qc are present. However, it varies depending on whether packages have
variable sizes or a constant bound, as shown in Table 1. (2)
The combined complexity bounds of these problems for CQ, UCQ and
∃FO+ vary depending on whether Qc is present, and on whether packages
have a constant bound. In contrast, the bounds for
FO, DATALOGnr and DATALOG are robust, regardless of the
presence of Qc and package sizes. (3) When Qc is a PTIME
function, these problems have the same complexity as their
counterparts in the absence of Qc . (4) Item selections do not
come with Qc and have a fixed package size (see Table 1).
The study of recommendation problems is still preliminary. First, we have only considered simple rules for query
relaxations and adjustment recommendations, to focus on
the main ideas. These issues deserve a full investigation.
Second, this work aims to study a general model that subsumes previous models developed for various applications,
and hence adopts generic functions cost(), val() and f ().
These need to be fine tuned by incorporating information
about users, collaborative filtering and specific aggregate
functions. Third, to simplify the discussion we assume that
selection criteria Q and compatibility constraints Qc are
expressed in the same language (albeit PTIME Qc ). It is
worth studying different languages for Q and Qc . Fourth,
the recommendation problems are mostly intractable. An
interesting topic is to identify practical and tractable cases.
Another issue to consider is group recommendations [5],
to a group of users instead of a single user.
Acknowledgments. Fan and Geerts are supported in part
by an IBM scalable data analytics award, and the RSE-NSFC Joint Project Scheme. Fan is also supported in part
by the 973 Program 2012CB316200 and NSFC 61133002 of
China. Deng is supported in part by 863 2012AA011203,
NSFC 61103031 and CPSF 2011M500208 of China.
| Problems | Languages | Combined complexity, with Qc | Combined complexity, without Qc | Data complexity, poly-bounded | Data complexity, constant bound |
|---|---|---|---|---|---|
| RPP | CQ, UCQ, ∃FO+ | Πp2-complete (§) | DP-complete (⋆,†) | coNP-complete (†) | PTIME (⋆) |
| RPP | DATALOGnr, FO | PSPACE-complete (§) | PSPACE-complete (⋆,†) | coNP-complete (†) | PTIME (⋆) |
| RPP | DATALOG | EXPTIME-complete (§) | EXPTIME-complete (⋆,†) | coNP-complete (†) | PTIME (⋆) |
| | | (Th. 3.1) | (Th. 3.2) | (Th. 3.1) | (Cor. 4.1) |
| FRP | CQ, UCQ, ∃FO+ | FP^{Σp2}-complete (§) | FP^NP-complete (⋆,†) | FP^NP-complete (†) | FP (⋆) |
| FRP | DATALOGnr, FO | FPSPACE(poly)-complete (§) | FPSPACE(poly)-complete (⋆,†) | FP^NP-complete (†) | FP (⋆) |
| FRP | DATALOG | FEXPTIME(poly)-complete (§) | FEXPTIME(poly)-complete (⋆,†) | FP^NP-complete (†) | FP (⋆) |
| | | (Th. 3.3) | (Th. 3.3) | (Th. 3.3) | (Cor. 4.1) |
| MBP | CQ, UCQ, ∃FO+ | Dp2-complete | DP-complete (⋆,†) | DP-complete (†) | PTIME (⋆) |
| MBP | DATALOGnr, FO | PSPACE-complete (§) | PSPACE-complete (⋆,†) | DP-complete (†) | PTIME (⋆) |
| MBP | DATALOG | EXPTIME-complete | EXPTIME-complete (⋆,†) | DP-complete (†) | PTIME (⋆) |
| | | (Th. 3.4) | (Th. 3.4) | (Th. 3.4) | (Cor. 4.1) |
| CPP | CQ, UCQ, ∃FO+ | #·coNP-complete (§) | #·NP-complete (⋆,†) | #·P-complete (†) | FP (⋆) |
| CPP | DATALOGnr, FO | #·PSPACE-complete (§) | #·PSPACE-complete (⋆,†) | #·P-complete (†) | FP (⋆) |
| CPP | DATALOG | #·EXPTIME-complete (§) | #·EXPTIME-complete (⋆,†) | #·P-complete (†) | FP (⋆) |
| | | (Th. 3.5) | (Th. 3.5) | (Th. 3.5) | (Cor. 4.1) |
| QRPP | CQ, UCQ, ∃FO+ | Σp2-complete (§) | NP-complete (⋆,†) | NP-complete (†) | PTIME (⋆) |
| QRPP | DATALOGnr, FO | PSPACE-complete (§) | PSPACE-complete (⋆,†) | NP-complete (†) | PTIME (⋆) |
| QRPP | DATALOG | EXPTIME-complete (§) | EXPTIME-complete (⋆,†) | NP-complete (†) | PTIME (⋆) |
| | | (Th. 5.1) | (Th. 5.1) | (Th. 5.1) | (Cor. 5.2) |
| ARPP | CQ, UCQ, ∃FO+ | Σp2-complete (§) | NP-complete (⋆,†) | NP-complete (†) | NP-complete (⋆) |
| ARPP | DATALOGnr, FO | PSPACE-complete (§) | PSPACE-complete (⋆,†) | NP-complete (†) | NP-complete (⋆) |
| ARPP | DATALOG | EXPTIME-complete (§) | EXPTIME-complete (⋆,†) | NP-complete (†) | NP-complete (⋆) |
| | | (Th. 6.1) | (Th. 6.1) | (Th. 6.1) | (Cor. 6.2) |

Table 1: Complexity results ((⋆): items (Th. 4.4); (§): constant bound (Cor. 4.1); (†): PTIME Qc (Cor. 4.3))

8. REFERENCES
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of
Databases. Addison-Wesley, 1995.
[2] G. Adomavicius and A. Tuzhilin. Multidimensional
recommender systems: A data warehousing approach. In
WELCOM, 2001.
[3] G. Adomavicius and A. Tuzhilin. Toward the next generation
of recommender systems: A survey of the state-of-the-art and
possible extensions. TKDE, 17(6):734–749, 2005.
[4] S. Amer-Yahia. Recommendation projects at Yahoo! IEEE
Data Eng. Bull., 34(2):69–77, 2011.
[5] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das, and C. Yu.
Group recommendation: Semantics and efficiency. PVLDB,
2(1):754–765, 2009.
[6] A. Angel, S. Chaudhuri, G. Das, and N. Koudas. Ranking
objects based on relationships and fixed associations. In
EDBT, 2009.
[7] A. Brodsky, S. Henshaw, and J. Whittle. CARD: A
decision-guidance framework and application for
recommending composite alternatives. In RecSys, 2008.
[8] S. Chaudhuri. Generalization and a framework for query
modification. In ICDE, 1990.
[9] S. Chaudhuri and M. Y. Vardi. On the equivalence of recursive
and nonrecursive Datalog programs. JCSS, 54(1):61–78, 1997.
[10] S. Cohen and Y. Sagiv. An incremental algorithm for
computing ranked full disjunctions. JCSS, 73(4):648–668, 2007.
[11] A. Durand, M. Hermann, and P. G. Kolaitis. Subtractive
reductions and complete problems for counting complexity
classes. TCS, 340(3):496–513, 2005.
[12] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation
algorithms for middleware. JCSS, 66(4):614–656, 2003.
[13] T. Gaasterland and J. Lobo. Qualifying answers according to
user needs and preferences. Fundam. Inform., 32(2):121–137,
1997.
[14] K. Golenberg, B. Kimelfeld, and Y. Sagiv. Optimizing and
parallelizing ranked enumeration. PVLDB, 4(11):1028–1039,
2011.
[15] L. A. Hemaspaandra and H. Vollmer. The satanic notations:
counting classes beyond #P and other definitional adventures.
SIGACT News, 26(1):2–13, 1995.
[16] I. F. Ilyas, G. Beskales, and M. A. Soliman. A survey of top-k
query processing techniques in relational database systems.
ACM Comput. Surv., 40(4):11:1–11:58, 2008.
[17] A. Kadlag, A. V. Wanjari, J. Freire, and J. R. Haritsa.
Supporting exploratory queries in databases. In DASFAA,
2004.
[18] N. Koudas, C. Li, A. K. H. Tung, and R. Vernica. Relaxing
join and selection queries. In VLDB, 2006.
[19] G. Koutrika, B. Bercovitz, and H. Garcia-Molina. FlexRecs:
expressing and combining flexible recommendations. In
SIGMOD, 2009.
[20] M. W. Krentel. Generalizations of OptP to the polynomial
hierarchy. TCS, 97(2):183–198, 1992.
[21] R. E. Ladner. Polynomial space counting problems. SIAM J.
Comput., 18(6):1087–1097, 1989.
[22] T. Lappas, K. Liu, and E. Terzi. Finding a team of experts in
social networks. In KDD, 2009.
[23] C. Li, M. A. Soliman, K. C.-C. Chang, and I. F. Ilyas.
RankSQL: supporting ranking queries in relational database
management systems. In VLDB, 2005.
[24] A. Natsev, Y.-C. Chang, J. R. Smith, C.-S. Li, and J. S.
Vitter. Supporting incremental join queries on ranked inputs.
In VLDB, 2001.
[25] C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
[26] A. G. Parameswaran, H. Garcia-Molina, and J. D. Ullman.
Evaluating, combining and generalizing recommendations with
prerequisites. In CIKM, 2010.
[27] A. G. Parameswaran, P. Venetis, and H. Garcia-Molina.
Recommendation systems with complex constraints: A course
recommendation perspective. TOIS, 29(4), 2011.
[28] K. Schnaitter and N. Polyzotis. Evaluating rank joins with
optimal cost. In PODS, 2008.
[29] K. Stefanidis, G. Koutrika, and E. Pitoura. A survey on
representation, composition and application of preferences in
database systems. TODS, 36(3), 2011.
[30] L. J. Stockmeyer. The polynomial-time hierarchy. TCS,
3(1):1–22, 1976.
[31] L. Valiant. The complexity of computing the permanent. TCS,
8(2):189–201, 1979.
[32] M. Y. Vardi. The complexity of relational query languages. In
STOC, 1982.
[33] M. Wooldridge and P. E. Dunne. On the computational
complexity of qualitative coalitional games. Artif. Intell.,
158(1):27–73, 2004.
[34] M. Xie, L. V. S. Lakshmanan, and P. T. Wood. Breaking out
of the box of recommendations: from items to packages. In
RecSys, 2010.