Determining the Currency of Data
Wenfei Fan, University of Edinburgh & Harbin Institute of Technology, wenfei@inf.ed.ac.uk
Floris Geerts, School of Informatics, University of Edinburgh, fgeerts@inf.ed.ac.uk
Jef Wijsen, Institut d’Informatique, Université de Mons, jef.wijsen@umons.ac.be
Abstract
Data in real-life databases become obsolete rapidly. One
often finds that multiple values of the same entity reside in
a database. While all of these values were once correct, most
of them may have become stale and inaccurate. Worse still,
the values often do not carry reliable timestamps. With this
comes the need for studying data currency, to identify the
current value of an entity in a database and to answer queries
with the current values, in the absence of timestamps.
This paper investigates the currency of data. (1) We propose a model that specifies partial currency orders in terms
of simple constraints. The model also allows us to express
what values are copied from other data sources, bearing currency orders in those sources, in terms of copy functions defined on correlated attributes. (2) We study fundamental
problems for data currency, to determine whether a specification is consistent, whether a value is more current than
another, and whether a query answer is certain no matter
how partial currency orders are completed. (3) Moreover,
we identify several problems associated with copy functions,
to decide whether a copy function imports sufficient current
data to answer a query, whether such a function copies redundant data, whether a copy function can be extended to
import necessary current data for a query while respecting
the constraints, and whether it suffices to copy data of a
bounded size. (4) We establish upper and lower bounds of
these problems, all matching, for combined complexity and
data complexity, and for a variety of query languages. We
also identify special cases that warrant lower complexity.
      FN      LN      address      salary  status
s1:   Mary    Smith   2 Small St   50k     single
s2:   Mary    Dupont  10 Elm Ave   50k     married
s3:   Mary    Dupont  6 Main St    80k     married
s4:   Bob     Luth    8 Cowan St   80k     married
s5:   Robert  Luth    8 Drum St    55k     married
(a) Relation Emp

      dname  mgrFN  mgrLN   mgrAddr      budget
t1:   R&D    Mary   Smith   2 Small St   6500k
t2:   R&D    Mary   Smith   2 Small St   7000k
t3:   R&D    Mary   Dupont  6 Main St    6000k
t4:   R&D    Ed     Luth    8 Cowan St   6000k
(b) Relation Dept

Figure 1: A company database
Categories and Subject Descriptors: H.2.3 [Information Systems]: Database Management – Languages; F.4.1
[Mathematical Logic and Formal Languages]: Mathematical Logic – Computational Logic
General Terms: Languages, Theory, Design.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
PODS’11, June 13–15, 2011, Athens, Greece.
Copyright 2011 ACM 978-1-4503-0660-7/11/06 ...$10.00.
1. Introduction
The quality of data in a real-life database quickly degenerates over time. Indeed, it is estimated that “2% of records
in a customer file become obsolete in one month” [15]. That
is, in a database of 500 000 customer records, 10 000 records
may go stale per month, 120 000 records per year, and within
two years about 50% of all the records may be obsolete. In
light of this, we often find that multiple values of the same
entity reside in a database, which were once correct, i.e.,
they were true values of the entity at some time. However,
most of them have become obsolete and inaccurate. As an
example from daily life, when one moves to a new address,
her bank may retain her old address, and worse still, her
credit card bills may still be sent to her old address for quite
some time (see, e.g., [22] for more examples). Stale data is
one of the central problems in data quality. It is known that
dirty data costs US businesses 600 billion USD each year [15],
and stale data accounts for a large part of the losses.
This highlights the need for studying the currency of data,
which aims to identify the current values of entities in a
database, and to answer queries with the current values.
The question of data currency would be trivial if all data
values carried valid timestamps. In practice, however, one
often finds that timestamps are unavailable or imprecise
[34]. Add to this the complication that data values are often
copied or imported from other sources [2, 12, 13], which may
not support a uniform scheme of timestamps. These make
it challenging to identify the current values.
Not all is lost. One can often deduce currency orders from
the semantics of the data. Moreover, data copied from other
sources inherit currency orders from those sources. Taken
together, these often provide sufficient current values of the
data to answer certain queries, as illustrated below.
Example 1.1: Consider two relations of a company shown
in Fig. 1. Each Emp tuple is an employee record with name,
address, salary and marital status. A Dept tuple specifies
the name, manager and budget of a department. Records in
these relations may be stale, and do not carry timestamps.
By entity identification techniques (see, e.g., [16]), we know
that tuples s1 , s2 and s3 refer to the same employee Mary,
but s4 and s5 represent different people distinct from Mary.
Consider the following queries posed on these relations.
(1) Query Q1 is to find Mary’s current salary. No timestamps are available for us to tell which of 50k or 80k is
more current. However, we may know that the salary of each
employee in the company does not decrease, as commonly
found in the real world. This yields currency orders s1 ≺salary s3
and s2 ≺salary s3 , i.e., s3 [salary] is more current than both
s1 [salary] and s2 [salary]. Hence the answer to Q1 is 80k.
(2) Query Q2 is to find Mary’s current last name. We can no
longer answer Q2 as above. Nonetheless, we may know the
following: (a) marital status can only change from single to
married and from married to divorced; but not from married
to single; and (b) Emp tuples with the most current marital
status also contain the most current last name. Therefore,
s1 ≺LN s2 and s1 ≺LN s3 , and the answer to Q2 is Dupont.
(3) Query Q3 is to find Mary’s current address. We may
know that Emp tuples with the most current status or salary
contain the most current address. Putting this and (1) above
together, we know that the answer to Q3 is “6 Main St”.
(4) Finally, query Q4 is to find the current budget of department R&D. Again no timestamps are available for us
to evaluate the query. However, we may know the following: (a) Dept tuples t1 and t2 have copied their mgrAddr
values from s1 [address] in Emp; similarly, t3 has copied from
s3 , and t4 from s4 ; and (b) in Dept, tuples with the most
current address also have the most current budget. Taken
together, these tell us that t1 ≺budget t3 and t2 ≺budget t3 . Observe that we do not know which budget in t3 or t4 is more
current. Nevertheless, in either case the most current budget is 6000k, and hence it is the answer to Q4 .
✷
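The reasoning of Example 1.1 can be sketched in a few lines of Python. This is an illustration only: the dictionary encoding of Mary's tuples and the helper names (`salary_order`, `last_name_order`, `current_values`) are assumptions made here and are not part of the model. The sketch derives currency orders from the two rules stated in the example and reports the values held by tuples that no other tuple is known to supersede. When that set is a singleton, the current value is the same however the partial order is completed, since the most current tuple of any completion is one of these unsuperseded tuples.

```python
# Mary's tuples from Fig. 1 (s1, s2, s3 refer to the same entity).
# Salaries are in thousands; this encoding is a hypothetical one.
EMP = {
    "s1": {"LN": "Smith", "address": "2 Small St", "salary": 50, "status": "single"},
    "s2": {"LN": "Dupont", "address": "10 Elm Ave", "salary": 50, "status": "married"},
    "s3": {"LN": "Dupont", "address": "6 Main St", "salary": 80, "status": "married"},
}


def salary_order(tuples):
    """Pairs (a, b) with a ≺salary b, assuming salaries never decrease."""
    return {(a, b) for a in tuples for b in tuples
            if tuples[a]["salary"] < tuples[b]["salary"]}


def last_name_order(tuples):
    """Pairs (a, b) with a ≺LN b: a 'single' tuple predates a 'married' one."""
    return {(a, b) for a in tuples for b in tuples
            if tuples[a]["status"] == "single" and tuples[b]["status"] == "married"}


def current_values(tuples, order, attr):
    """Values of attr held by tuples that no other tuple is known to supersede."""
    superseded = {a for a, _ in order}
    return {tuples[t][attr] for t in tuples if t not in superseded}


q1 = current_values(EMP, salary_order(EMP), "salary")    # Q1: current salary
q2 = current_values(EMP, last_name_order(EMP), "LN")     # Q2: current last name
# Q3 reuses the salary order: the most current salary implies the most current address.
q3 = current_values(EMP, salary_order(EMP), "address")
```

Under these assumptions, `q1`, `q2` and `q3` each contain a single value (80, Dupont and 6 Main St), matching the answers given in the example.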
These suggest that we give a full treatment of data currency, and answer the following questions. How should
we specify currency orders on data values in the absence
of timestamps but in the presence of copy relationships?
When currency orders are only partly available, can we decide whether an attribute value is more up-to-date than another? How can we answer a query with only current data
in a database? To answer a query, do we need to import
current data from another source, and if so, what should be copied? The
ability to answer these questions may provide guidance for
practitioners to decide, e.g., whether the answer to a query is
corrupted by stale data, or what copy functions are needed.
A model for data currency. To answer these questions,
we approach data currency based on the following.
(1) For each attribute A of a relation D, we assume an (implicit) currency order ≺A on its tuples such that for tuples
t1 and t2 in D that represent the same real-world entity,
t1 ≺A t2 indicates that t2 is more up-to-date than t1 in the
A attribute value. Here ≺A is not a total order since in practice, currency information is only partially available. Note
that for distinct attributes A and B, we may have t1 ≺A t2
and t2 ≺B t1, i.e., there may be no single tuple that is most
up-to-date in all attribute values.
(2) We express additional currency relationships as denial
constraints [3, 7], which are simple universally quantified FO
sentences that have been used to improve the consistency
of data. We show that the same class of constraints also
suffices to express currency semantics commonly found in
practice. For instance, all the currency relations we have
seen in Example 1.1 can be expressed as denial constraints.
(3) We define a copy relationship from relation Dj to
Dk in terms of a partial mapping, referred to as a copy
function. It specifies what attribute values in Dj have
been copied from Dk along with their currency orders in
Dk. It also ensures that correlated attributes are copied
together. As observed in [2, 12, 13], copy functions are
common in the real world, and can be automatically discovered.
Putting these together, we consider D = (D1, . . . , Dn), a
collection of relations such that (a) each Dj has currency
orders partially defined on its tuples for each attribute, indicating available currency information; (b) each Dj satisfies a
set Σj of denial constraints, which expresses currency orders
derived from the semantics of the data; and (c) for each pair
Dj, Dk of relations, there are possibly copy functions defined
on them, which import values from one to another.
We study consistent completions D^c_j of Dj, which extend
≺A in Dj to a total order on all tuples pertaining to the
same entity, such that D^c_j satisfies Σj and constraints imposed by the copy functions. One can construct from D^c_j
the current tuple for each entity w.r.t. ≺A, which contains
the entity’s most current A value for each attribute A. This
yields the current instance of D^c_j consisting of only the current tuples of the entities in Dj, from which currency orders
are removed. We evaluate a query Q on current instances of
relations in D, without worrying about currency orders. We
study certain current answers to Q in D, i.e., tuples that
are the answers to Q in all consistent completions of D.
Reasoning about data currency. We study fundamental problems for data currency. (a) The consistency problem is to determine, given denial constraints Σj imposed on
each Dj and copy functions between these relations, whether
there exist consistent completions of every Dj, i.e., whether
the specification makes sense. (b) The certain ordering problem is to decide whether a currency order is contained in all
consistent completions. (c) The deterministic current instance problem is to determine whether the current instance
of each relation remains unchanged for all consistent completions. The ability to answer these questions allows us to
determine whether an attribute value is certainly more current than another, and to identify the current value of an
entity. (d) The certain current query answering problem is
to decide whether a tuple t is a certain current answer to a
query Q, i.e., it is certainly computed using current data.
Currency preserving copy functions. It is natural to
ask what values should be copied from one data source to
another in order to answer a query. To characterize this intuition we introduce a notion of currency preservation. Consider data sources D = (D1 , . . . , Dp ) and D′ = (D1′ , . . . , Dq′ ),
each consisting of a collection of relations with denial constraints imposed on them. Consider copy functions ρ from
relations in D′ to those in D. For a query Q posed on D, we
say that ρ is currency preserving if no matter how we extend
ρ by copying from D′ more values of those entities in D, the
certain current answers to Q in D remain unchanged. In
other words, ρ has already imported data values needed for
computing certain current answers to Q.
We identify several problems associated with currency-preserving copy functions. (a) The currency preservation
problem is to determine, given Q, ρ, D, D′ and their denial
constraints, whether ρ is currency preserving for Q. Intuitively, we want to know whether we need to extend ρ in
order to answer Q. (b) The minimal copying problem is to
decide whether ρ is minimal among all currency-preserving
copy functions for Q, i.e., ρ copies the least amount of data.
This helps us inspect whether ρ copies unnecessary data.
(c) The existence problem is to determine whether ρ can be
extended to be currency preserving for Q. (d) Moreover, the
bounded copying problem is to decide whether there exists
such an extension that imports additional data of a bounded
size. Intuitively, we want to find currency-preserving copy
functions that import as few data values as possible.
Complexity results. We provide combined complexity and
data complexity of all the problems stated above. For the
combined complexity of the problems that involve queries,
we investigate the impact of various query languages,
including conjunctive queries (CQ), unions of conjunctive
queries (UCQ), positive existential FO (∃FO^+) and FO. We
establish upper and lower bounds of these problems, all
matching, ranging over O(1), NP, coNP, Δ^p_2, Π^p_2, Σ^p_2, Δ^p_3,
Π^p_3, Σ^p_3, Σ^p_4 and PSPACE. We find that most of the problems
are intractable. In light of this, we also identify special
practical cases with lower complexity, some in PTIME. We
also study the impact of denial constraints. For example, in
the absence of denial constraints, the certain current query
answering problem is in PTIME for SP queries (CQ queries
without “join”), but it becomes intractable when denial
constraints are present, even when the constraints are fixed.
This work is a first step towards a systematic study of data
currency in the absence of reliable timestamps but in the
presence of copy relationships. The results help practitioners specify data currency, analyze query answers and design
copy functions. We also provide a complete picture of complexity bounds for important problems associated with data
currency and copy functions, which are proved by using a
variety of reductions and by providing (PTIME) algorithms.
Related work. There has been a host of work on temporal
databases (see, e.g., [8, 30] for surveys). Temporal databases
provide support for valid time, transaction time, or both.
They assume the availability of timestamps, and refer to
“now” by means of current-time variables [9, 14]. Dynamic
and temporal integrity constraints allow one to restrict the set of
legal database evolutions. Our currency model differs from
temporal data models in several respects. We do not assume explicit timestamps. Nevertheless, if such timestamps
are present, they can be related to currency by means of denial constraints. Unlike temporal databases that timestamp
entire tuples, our model allows that different values within
the same tuple have distinct currencies. That is, the same
tuple can contain an up-to-date value for one attribute, and
an outdated value for another attribute.
Since currency orders are different from temporal orders used in temporal databases, our currency (denial) constraints differ from traditional temporal constraints. Currency constraints can sometimes be derived from temporal
constraints. For example, when salaries are constrained to
be non-decreasing, we can express that the highest salary
is the most current one. Also, our copy functions can require certain attributes to be copied together when these
attributes cannot change independently, as for example expressed by the dynamic functional dependencies in [33].
Closer to this work are [31, 24, 25, 20] on querying indefinite data. In [31], the evaluation of CQ queries is studied
on data that is linearly ordered but only provides a partial
order. The problem studied there is similar to (yet different from) certain current query answering. An extension
of conditional tables [19, 21] is proposed in [24] to incorporate indefinite temporal information, and in that setting,
the complexity bounds for FO query evaluation are provided
in [25]. The non-emptiness problem for datalog on
linear orders was recently investigated in [20]. However, none of these
considers copying data from external sources, or the analyses of certain ordering and currency-preserving copy functions. In addition, we answer queries using current instances
of relations, which are normal relations without (currency)
ordering. This semantics is quite different from its counterparts in previous work. We also consider denial constraints
and copy functions, which are not expressible in CQ or datalog studied in [31, 20]. In contrast to our work, [24, 25]
assume explicit timestamps, while we use denial constraints
to specify data currency. To encode denial constraints in extended conditional tables of [24, 25], an exponential blowup
is inevitable. For these reasons, the results of [31,
24, 25, 20] cannot carry over to our setting, and vice versa.
There has also been a large body of work on the temporal constraint satisfaction problem (TCSP), which is to
find a valuation of temporal variables that satisfies a set of
temporal constraints (see, e.g., [4, 29]). It differs from our
consistency problem in that it considers neither completions
of currency orders that satisfy denial constraints, nor copy
relationships. Hence the results for TCSP are not directly
applicable to our consistency problem, and vice versa.
Copy relationships between data sources have recently
been studied in [2, 12, 13]. The previous work has focused
on automatic discovery of copying dependencies and functions. Copy relationships are also related to data provenance, which studies propagation of annotations in data
transformations and updates (see [5, 6] for recent surveys
on data provenance). However, to the best of our knowledge, no previous work has studied currency-preserving copy
functions and their associated problems.
Denial constraints have proved useful in detecting data
inconsistencies and data repairing (see, e.g., [3, 7]). We
adopt the same class of constraints to specify the currency of
data, so that data currency and consistency could be treated
in a uniform logical framework. Denial constraints can also
be automatically discovered, along the same lines as data
dependency profiling (see, e.g., [17]).
The study of data currency is also related to research on
incomplete information (see [32] for a survey), when missing
data concerns data currency. In contrast to that line of work,
we investigate how to decide whether a value is more current
than another, and study the properties of copy functions.
We use denial constraints to specify data currency, which
are, as remarked earlier, more succinct than, e.g., C-tables
and V-tables for representing incomplete information [19,
21]. In addition, we evaluate queries using current instances,
a departure from the study of incomplete information.
Certain query answers have been studied in data integration and exchange. In data integration, for a query Q
posed on a global database DG, it is to find the certain answers to Q over all data sources that are consistent with
DG w.r.t. view definitions (see, e.g., [27]). In data exchange,
it is to find the certain answers to a query over all target
databases generated from data sources via schema mapping
(see [23]). In contrast, we consider certain answers to a
query over all completions of currency orders, which satisfy
denial constraints and constraints from copy functions. Certain current query answering is also different from consistent
query answering (see, e.g., [3, 7]), which is to find certain answers to a query over all repairs of a database and does not
distinguish between stale and current data in the repairs.
Finally, whereas it may be possible to model our setting as
a data exchange scenario with built-in constraints [11], our
complexity results do not follow gratuitously and a careful
analysis of the chase is required in this setting.
Organization. Section 2 presents the data currency model.
Section 3 states its related problems. Section 4 establishes
the complexity bounds of those problems. Section 5 introduces the notion of currency preservation and its fundamental problems, followed by their complexity analysis in Section 6. Section 7 summarizes the main results of the paper.
2. Data Currency
We introduce a model for specifying data currency. A
specification consists of (a) partial currency orders, (b) denial constraints, and (c) copy functions. We first present
these notions, and then study consistent completions of currency orders. Finally, we show how queries are answered on
current instances that are derived from these completions.
Data with partial currency orders. A relation schema
is specified as R = (EID, A1, . . . , An), where EID denotes entity id that identifies tuples pertaining to the same entity,
as introduced by Codd [10]. EID values can be obtained
using entity identification techniques (a.k.a. record linkage,
matching and data deduplication; see, e.g., [16]). A finite
instance D of R is referred to as a normal instance of R.
A temporal instance Dt of R is given as (D, ≺A1, . . . , ≺An),
where each ≺Ai is a strict partial order on D such that
for tuples t1 and t2 in D, t1 ≺Ai t2 implies t1[EID] = t2[EID].
We call ≺Ai the currency order for attribute Ai. Recall
that a strict partial order is irreflexive and transitive, and
therefore asymmetric. Intuitively, if t1 ≺Ai t2, then t1 and
t2 refer to the same entity, and t2 contains a more current
Ai-value for that entity than t1, i.e., t2 is more current than
t1 in attribute Ai. A currency order ≺Ai is empty when no
currency information is known for attribute Ai.
A completion of Dt is a temporal instance D^c_t = (D, ≺^c_A1,
. . . , ≺^c_An) of R, such that for each i ∈ [1, n], (1) ≺Ai ⊆ ≺^c_Ai,
and (2) for all t1, t2 ∈ D, t1 and t2 are comparable under ≺^c_Ai
iff t1[EID] = t2[EID]. The latter condition implies that ≺^c_Ai
induces a total order on tuples that refer to the same entity,
while tuples representing distinct entities are not comparable under ≺^c_Ai. We call ≺^c_Ai a completed currency order.
Denial constraints. We use denial constraints [3, 7] to
specify additional currency information derived from the semantics of data, which enriches ≺Ai. A denial constraint ϕ
for R is a universally quantified FO sentence of the form:
$\forall t_1, \ldots, t_k : R \; \Big( \bigwedge_{j \in [1,k]} (t_1[\mathrm{EID}] = t_j[\mathrm{EID}] \wedge \psi) \rightarrow t_u \prec_{A_i} t_v \Big)$,
where u, v ∈ [1, k], each tj is a tuple variable denoting a tuple of R, and ψ is a conjunction of predicates of the form
(1) tj ≺Al th, i.e., th is more current than tj in attribute
Al; (2) tj[Al] = th[Al] (resp. tj[Al] ≠ th[Al]), i.e., tj[Al] and
th[Al] are identical (resp. distinct) values; (3) tj[Al] = c
(resp. tj[Al] ≠ c), where c is a constant; and (4) possibly
other built-in predicates defined on particular domains.
The constraint is interpreted over completions D^c_t of temporal instances of R. We say that D^c_t satisfies ϕ, denoted
by D^c_t |= ϕ, if for all tuples t1, . . . , tk in D that have the same
EID value, whenever these tuples satisfy the predicates in ψ following the standard semantics of FO, tu ≺Ai tv holds in D^c_t. The use of
EID in ϕ enforces that ϕ is imposed on tuples that refer to
the same entity. We say that D^c_t satisfies a set Σ of denial
constraints, denoted by D^c_t |= Σ, if D^c_t |= ϕ for all ϕ ∈ Σ.
Example 2.1: Recall relations Emp and Dept given in
Fig. 1. Denial constraints on these relations include:
ϕ1 : ∀s, t : Emp (s[EID] = t[EID] ∧ s[salary] > t[salary]) → t ≺salary s
ϕ2 : ∀s, t : Emp (s[EID] = t[EID] ∧ s[status] = “married” ∧ t[status] = “single”) → t ≺LN s
ϕ3 : ∀s, t : Emp (s[EID] = t[EID] ∧ t ≺salary s) → t ≺address s
ϕ4 : ∀s, t : Dept (s[EID] = t[EID] ∧ t ≺mgrAddr s) → t ≺budget s
Here ϕ1 states that when Emp tuples s and t refer to the
same employee, if s[salary] > t[salary], then s is more current
than t in attribute salary. Note that ‘>’ denotes the built-in predicate “greater-than” in the numeric domain of salary,
whereas ≺salary is the currency order for salary. Constraint
ϕ2 asserts that if s[status] is married and t[status] is single,
then s is more current than t in LN. Constraint ϕ3 states
that if s is more current than t in salary, then s is also more
current than t in address; similarly for ϕ4.
✷
Copy functions. Consider two temporal instances D(t,1) =
(D1, ≺A1, . . . , ≺Ap) and D(t,2) = (D2, ≺B1, . . . , ≺Bq) of (possibly distinct) relation schemas R1 and R2, respectively.
A copy function ρ of signature $R_1[\vec{A}] \Leftarrow R_2[\vec{B}]$ is a partial
mapping from D1 to D2, where $\vec{A}$ = (A1, . . . , Al) and $\vec{B}$ =
(B1, . . . , Bl), denoting attributes in R1 and R2, respectively.
Here ρ is required to satisfy the copying condition: for each
tuple t in D1, if ρ(t) = s, then t[Ai] = s[Bi] for all i ∈ [1, l].
Intuitively, for tuples t ∈ D1 and s ∈ D2, ρ(t) = s indicates
that the values of the $\vec{A}$ attributes of t have been imported
from the $\vec{B}$ attributes of tuple s in D2. Here $\vec{A}$ specifies a
list of correlated attributes that should be copied together.
The copy function ρ is called ≺-compatible (w.r.t. the currency orders found in D(t,1) and D(t,2)) if for all t1, t2 ∈ D1,
for each i ∈ [1, l], if ρ(t1) = s1, ρ(t2) = s2, t1[EID] = t2[EID] and
s1[EID] = s2[EID], then s1 ≺Bi s2 implies t1 ≺Ai t2.
Intuitively, ≺-compatibility requires that copy functions
preserve currency orders. In other words, when attribute
values are imported from D2 to D1, the currency orders on
corresponding tuples defined in D(t,2) are inherited by D(t,1).
Example 2.2: Consider relations Emp and Dept shown in
Fig. 1. A copy function ρ of signature Dept[mgrAddr] ⇐
Emp[address], depicted in Fig. 1 by arrows, is given as follows: ρ(t1) = s1, ρ(t2) = s1, ρ(t3) = s3 and ρ(t4) = s4. That
is, the mgrAddr values of t1 and t2 have both been imported from s1[address], while t3[mgrAddr] and t4[mgrAddr]
are copied from s3[address] and s4[address], respectively. The
function satisfies the copying condition, since t1[mgrAddr] =
t2[mgrAddr] = s1[address], t3[mgrAddr] = s3[address], and
t4[mgrAddr] = s4[address].
Suppose that ≺A is empty for each attribute A in Emp or
Dept. Then copy function ρ is ≺-compatible w.r.t. these
temporal instances of Emp and Dept. In contrast, assume that partial currency orders s1 ≺address s3 on Emp and
t3 ≺mgrAddr t1 are given. Then ρ is not ≺-compatible. Indeed,
since s1, s3 pertain to the same person Mary, and t1, t3 to
the same department R&D, the relation s1 ≺address s3 should
carry over into t1 ≺mgrAddr t3, as ρ(t1) = s1 and ρ(t3) = s3.
Clearly, t3 ≺mgrAddr t1 and t1 ≺mgrAddr t3 are contradictory. ✷
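The two requirements on a copy function, the copying condition and ≺-compatibility, can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the dictionary encoding of the tuples, the entity identifiers `"mary"` and `"bob"`, and the function names are hypothetical, and each currency order is represented as a set of (older, newer) pairs for a single attribute pair. It applies the definitions literally to the data of Example 2.2.

```python
# Hypothetical encoding of the tuples involved in Example 2.2.
EMP = {
    "s1": {"EID": "mary", "address": "2 Small St"},
    "s3": {"EID": "mary", "address": "6 Main St"},
    "s4": {"EID": "bob", "address": "8 Cowan St"},
}
DEPT = {
    "t1": {"EID": "R&D", "mgrAddr": "2 Small St"},
    "t2": {"EID": "R&D", "mgrAddr": "2 Small St"},
    "t3": {"EID": "R&D", "mgrAddr": "6 Main St"},
    "t4": {"EID": "R&D", "mgrAddr": "8 Cowan St"},
}
RHO = {"t1": "s1", "t2": "s1", "t3": "s3", "t4": "s4"}


def satisfies_copying_condition(rho, target, source, attr_pairs):
    """Check t[A_i] == s[B_i] for every rho(t) = s and every pair (A_i, B_i)."""
    return all(target[t][a] == source[s][b]
               for t, s in rho.items() for a, b in attr_pairs)


def is_compatible(rho, target, source, target_order, source_order):
    """Check that s1 ≺ s2 in the source implies t1 ≺ t2 in the target.

    Orders are sets of (older, newer) pairs for one attribute pair (A_i, B_i).
    """
    for t1, s1 in rho.items():
        for t2, s2 in rho.items():
            same_entities = (target[t1]["EID"] == target[t2]["EID"]
                             and source[s1]["EID"] == source[s2]["EID"])
            if same_entities and (s1, s2) in source_order \
                    and (t1, t2) not in target_order:
                return False
    return True


copying_ok = satisfies_copying_condition(RHO, DEPT, EMP, [("mgrAddr", "address")])
# Empty currency orders: nothing to preserve, so rho is compatible.
empty_ok = is_compatible(RHO, DEPT, EMP, set(), set())
# s1 ≺address s3 in Emp but t3 ≺mgrAddr t1 in Dept: compatibility fails.
clash_ok = is_compatible(RHO, DEPT, EMP, {("t3", "t1")}, {("s1", "s3")})
```

With this encoding, `copying_ok` and `empty_ok` are true while `clash_ok` is false, in line with the two cases discussed in the example.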
Consistent completions of temporal orders. A specification S of data currency consists of (1) a collection of temporal instances D(t,i) of schema Ri for i ∈ [1, s], (2) a set Σi of
denial constraints imposed on each D(t,i) , and (3) a (possibly
empty) copy function ρ(i,j) that imports data from D(t,i) to
D(t,j) for i, j ∈ [1, s]. It specifies data values and entities (by
normal instances embedded in D(t,i) ), partial currency orders known for each relation (by D(t,i) ), additional currency
information derived from the semantics of the data (Σi ),
and data that has been copied from one source to another
(ρ(i,j) ). These D(t,i) ’s may denote different data sources,
i.e., they may not necessarily be in the same database.
A consistent completion Dc of S consists of temporal inc
stances D(t,i)
of Ri such that for all i, j ∈ [1, s],
D
Dt
Dtc
S
Dc
LST(Dc )
ρ̄
ρ̄e
Se
De
a normal instance of a relation schema R
a temporal instance of R with partial currency orders
a completion of partial currency orders in Dt
a specification of data currency
a consistent completion of a specification S
the current instance of Dc
a collection of copy functions in S
an extension of copy functions ρ̄
an extension of specification S by ρ̄e
an extension of temporal instances by ρ̄e
Table 1: A summary of notations
As another example, suppose that there is a copy function
ρ2 that imports budget attribute values of t1 and t3 from
the budget attributes of s′′1 and s′′3 in another source D2 ,
respectively, where s′′1 = t1 and s′′3 = t3 , but in D2 , s′′3 ≺budget
s′′1 . Then there is no consistent completion in this setting
either. Indeed, all completed currency orders of ≺budget in
Dept have to satisfy denial constraints ϕ1 , ϕ3 and ϕ4 , which
enforce t1 ≺budget t3 , but ρ2 is not ≺-compatible with this
currency order. This shows the interaction between denial
constraints and currency constraints of copy functions. ✷
Current instances. In a temporal instance Dt = (D, ≺A1 ,
. . . , ≺An ) of R, let E = {t[EID] | t ∈ D}, and for each entity
e ∈ E, let Ie = {t ∈ D | t[EID] = e}. That is, E contains all
EID values in D, and Ie is the set of tuples pertaining to the
entity whose EID is e.
In a completion Dtc of Dt , for each attribute A of R, the
current A value for entity e ∈ E is the value t[A], where t is
the greatest (i.e., most current) tuple in the totally ordered
set (Ie , ≺cA ). The current tuple for entity e ∈ E, denoted by
LST(e, Dtc ), is the tuple te such that for each attribute A of
R, te [A] is the current A value for entity e.
We use LST(Dtc ) to denote LST(e, Dtc ) | e ∈ E , referred
to as the current instance of Dtc . Observe that LST(Dtc ) is a
normal instance of R, carrying no currency
orders. For any
c
c
Dc ∈ Mod(S), we define LST(Dc ) = LST(D(t,i)
) | D(t,i)
∈
c
D , the set of all current instances.
c
1. D(t,i)
is a completion of D(t,i) ,
c
2. D(t,i)
|= Σi , and
3. ρ(i,j) is compatible w.r.t. the completed currency orc
c
ders found in D(t,i)
and D(t,j)
.
We use Mod(S) to denote the set of all consistent completions of S. We say that S is consistent if Mod(S) 6= ∅, i.e.,
there exists at least one consistent completion of S.
Intuitively, if D(t,i) = (Di , ≺A1 , . . . , ≺An ) is part of a specc
ification and D(t,i)
= (Di , ≺cA1 , . . . , ≺cAn ) is part of a consistent completion of that specification, then each ≺cAj extends
≺Aj to a completed currency order, and the completed orders satisfy the denial constraints Σi and the constraints imposed by copy functions. Observe that the copying condition
and ≺-compatibility impose constraints on consistent completions. This is particularly evident when a data source imports data from multiple sources, and when two data sources
copy from each other, directly or indirectly. In addition,
these constraints interact with denial constraints.
Example 2.4: Recall the completion
Dc0 of S0 from Exam
ple 2.3. Then LST(Dc0 ) = LST(Emp), LST(Dept) , where
LST(Emp) = {s3 , s4 , s5 }, and LST(Dept) = {t3 }. Note that
LST(Emp) and LST(Dept) are normal instances.
As another example, suppose that s4 and s5 refer to the
same person. Consider an extension of the currency orders
given in Dc0 by adding s4 ≺A s5 and s5 ≺B s4 , where A ranges
over FN, LN, address and status while B is salary. Then the
current tuple of this person is (Robert, Luth, 8 Drum St,
80k, married), in which the first four attributes are taken
from s5 while its salary attribute is taken from s4 .
✷
Example 2.3: Consider a specification S0 consisting of
Emp and Dept of Fig. 1, the denial constraints ϕ1 –ϕ4 given
in Example 2.1, and the copy ρ defined in Example 2.2.
Assume that no currency orders are known for Emp and
Dept initially. A consistent completion Dc0 of S0 defines (1)
s1 ≺A s2 ≺A s3 when A ranges over FN, LN, address, salary
and status for Emp tuples, and (2) t1 ≺B t2 ≺B t4 ≺B t3 when
B ranges over mgrFN, mgrLN, mgrAddr and budget for Dept
tuples (here we assume that dname is the EID attribute of
Dept). One can verify that Dc0 satisfies the denial constraints
and the constraints imposed by ρ, and hence, Dc0 ∈ Mod(S0 ).
Note that no currency order is defined between any of s1 , s2 ,
s3 and any of s4 , s5 , since they represent different entities.
Suppose that Dept also copies from a source D1 consisting of a single tuple s′1, which is the same as s1 except that s′1[address] = “5 Elm Ave”. It uses a copy function ρ1 that imports s′1[address] to t1[mgrAddr]. Then there exists no consistent completion in this setting since t1 may not import distinct values s′1[address] and s1[address] for t1[mgrAddr]. In other words, the constraints imposed by the copying conditions of ρ and ρ1 cannot be satisfied at the same time.
Evaluating queries with current values. Consider a query Q posed on normal instances of (R1, . . . , Rl), which does not refer to currency orders, where Ri is in specification S for i ∈ [1, l]. We say that a tuple t is a certain current answer to Q w.r.t. S if t is in
    ⋂_{Dc ∈ Mod(S)} Q(LST(Dc)).
That is, t is warranted to be an answer computed from the current values no matter how the partial currency orders in S are completed, as long as the denial constraints and constraints imposed by the copy functions of S are satisfied.
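The definition can be made concrete by a brute-force sketch that enumerates completions and intersects the query answers. It is exponential and purely illustrative. The encoding is an assumption: a single relation, partial orders given as sets of pairs (t, s) meaning t ≺ s, a query given as a Python function on the current instance, and the denial constraints and copy functions abstracted into a hypothetical predicate satisfies.

```python
from itertools import permutations, product


def completions(tuples, partial, attrs):
    """Yield every completion as a dict (eid, attr) -> ranking, least to most current."""
    entities = {}
    for tid, row in tuples.items():
        entities.setdefault(row["EID"], []).append(tid)
    keys, choices = [], []
    for eid, tids in entities.items():
        for attr in attrs:
            # Keep the total orders that extend the given partial order.
            valid = [p for p in permutations(tids)
                     if all(p.index(t) < p.index(s)
                            for (t, s) in partial.get(attr, set())
                            if t in p and s in p)]
            keys.append((eid, attr))
            choices.append(valid)
    for combo in product(*choices):
        yield dict(zip(keys, combo))


def lst(tuples, completion):
    """Current instance: one tuple per entity, built from the most current values."""
    current = {}
    for (eid, attr), ranking in completion.items():
        current.setdefault(eid, {"EID": eid})[attr] = tuples[ranking[-1]][attr]
    return list(current.values())


def certain_current_answers(tuples, partial, attrs, query, satisfies=lambda c: True):
    """Intersect query(LST(Dc)) over all completions accepted by `satisfies`."""
    answers = None
    for c in completions(tuples, partial, attrs):
        if not satisfies(c):
            continue
        result = set(query(lst(tuples, c)))
        answers = result if answers is None else answers & result
    return answers  # None signals that no consistent completion exists


# Toy data (hypothetical): t1 is known to precede t2 on LN only.
rows = {
    "t1": {"EID": "e1", "LN": "Smith", "salary": "50k"},
    "t2": {"EID": "e1", "LN": "Dupont", "salary": "80k"},
}
known = {"LN": {("t1", "t2")}}
last_names = certain_current_answers(
    rows, known, ["LN", "salary"], lambda inst: {(r["LN"],) for r in inst})
salaries = certain_current_answers(
    rows, known, ["LN", "salary"], lambda inst: {(r["salary"],) for r in inst})
```

In this toy instance the last name is a certain current answer, while no salary is, since the two completions disagree on it.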
Example 2.5: Recall queries Q1, Q2, Q3 and Q4 from Example 1.1, and specification S0 from Example 2.3. One can verify that answers to the queries given in Example 1.1 are certain current answers w.r.t. S0, i.e., the answers remain unchanged in LST(Dc) for all Dc ∈ Mod(S0).
✷
Dc ∈ Mod(S0), if D^c_Emp is the completion of the Emp instance in Dc, then LST(D^c_Emp) = {s3, s4, s5}.
✷
We summarize notations in Table 2, including those given in this section and notations to be introduced in Section 5.
3. Decision Problems for Data Currency
We study four problems associated with data currency.
The consistency of specifications. The first problem is to decide whether a given specification S makes sense, i.e., whether there exists any consistent completion of S. As shown in Example 2.3, there exist specifications S such that Mod(S) is empty, because of the interaction between denial constraints and copy functions, among other things.
CPS: The consistency problem for specifications.
INPUT: A specification S of data currency.
QUESTION: Is Mod(S) nonempty?
Certain currency orders. The next question studies whether a given currency order is contained in all consistent completions of a specification. Given two temporal instances D(t,1) = (D, ≺A1, . . . , ≺An) and D(t,2) = (D, ≺′A1, . . . , ≺′An) of the same schema R, we say that D(t,1) is contained in D(t,2), denoted by D(t,1) ⊆ D(t,2), if ≺Aj ⊆ ≺′Aj for all j ∈ [1, n].
Consider a specification S in which there is a temporal instance Dt = (D, ≺A1, . . . , ≺An) of schema R. A currency order for Dt is a temporal instance Ot = (D, ≺′A1, . . . , ≺′An) of R. Observe that Ot does not necessarily contain Dt.
COP: The certain ordering problem.
INPUT: A specification S in which Dt is a temporal instance, and a currency order Ot for Dt.
QUESTION: Is it the case that Ot ⊆ D^c_t for all Dc ∈ Mod(S)? Here D^c_t is the completion of Dt in Dc.
Example 3.1: Consider specification S0 of Example 2.3. We want to know whether s1 ≺salary s3 is assured by every completion Dc ∈ Mod(S0). To this end we construct a currency order Ot = (Emp, ≺FN, ≺LN, ≺address, ≺salary, ≺status), in which s1 ≺salary s3 is in ≺salary, but the partial orders for all other attributes are empty. One can verify that Ot is indeed a certain currency order, as assured by denial constraint ϕ1.
Similarly, one can define a currency order Ot′ to check whether t3 ≺mgrFN t4 is entailed by all Dc ∈ Mod(S0). One can readily verify that it is not the case. Indeed, there exists Dc1 ∈ Mod(S0), such that t4 ≺mgrFN t3 is given in Dc1.
✷
Query answering. Given a query Q, we want to know whether a tuple t is in Q(LST(Dc)) for all Dc ∈ Mod(S).
CCQA(LQ): The certain current query answering problem.
INPUT: A specification S, a tuple t and a query Q ∈ LQ.
QUESTION: Is t a certain current answer to Q w.r.t. S?
We study CCQA(LQ) when LQ ranges over the following query languages (see, e.g., [1] for the details):
• CQ, the class of conjunctive queries built up from relation atoms and equality (=), by closing under conjunction ∧ and existential quantification ∃;
• UCQ, unions of conjunctive queries of the form Q1 ∪ · · · ∪ Qk, where for each i ∈ [1, k], Qi is in CQ;
• ∃FO+, first-order logic (FO) queries built from atomic formulas, by closing under ∧, disjunction ∨ and ∃; and
• FO queries built from atomic formulas using ∧, ∨, negation ¬, ∃ and universal quantification ∀.
While different query languages have no impact on the data complexity of CCQA(LQ), as will be seen soon, they do make a difference when the combined complexity is concerned.
4. Reasoning about the Currency of Data
In this section we focus on CPS, COP, DCIP and CCQA. We establish the data complexity and combined complexity of these problems. For the data complexity, we fix denial constraints and queries (for CCQA), and study the complexity in terms of varying size of data sources and copy functions. For the combined complexity we also allow denial constraints and queries to vary (see, e.g., [1] for a detailed discussion of data and combined complexity).
The consistency of specifications. We start with CPS,
which is to decide, given a specification S consisting of partial currency orders, denial constraints and copy functions,
whether there exists any consistent completion in Mod(S).
The result below tells us the following. (1) The problem
is nontrivial: it is Σp2 -complete. It remains intractable when
denial constraints are fixed (data complexity). (2) Denial
constraints are a major factor that makes the problem hard.
Indeed, the complexity bounds are not affected even when
no copy functions are defined in S.
Certain current instances. Given a specification S of
data currency, one naturally wants to know whether every
consistent completion of S yields the same current instances.
We say that a specification S of data currency is deterministic for current instances if for all consistent completions
Dc1 , Dc2 ∈ Mod(S), LST(Dc1 ) = LST(Dc2 ). This definition naturally carries over to a particular relation schema R: specification S is said to be deterministic for current R instances if
for all consistent completions Dc1 , Dc2 ∈ Mod(S), the instance
of R in LST(Dc1 ) is equal to the instance of R in LST(Dc2 ).
Theorem 4.1: For CPS, (1) the combined complexity is
Σp2 -complete, and (2) the data complexity is NP-complete.
The upper bounds and lower bounds remain unchanged even
in the absence of copy functions.
✷
Proof sketch: (1) Lower bounds. For the combined complexity, we show that CPS is Σp2 -hard by reduction from the
∃∗ ∀∗ 3sat problem, which is Σp2 -complete (cf. [28]). Given a
sentence φ = ∃X∀Y ψ(X, Y ), we construct a specification S
consisting of a single temporal instance Dt of a binary relation schema and a set Γ of denial constraints, such that φ is
true iff Mod(S) ≠ ∅. We use Dt to encode truth assignments µX for X, and Γ to assure that µX satisfies ∀Y ψ(X, Y) if there exists a consistent completion of Dt. Here ∀Y ψ(X, Y) is encoded by leveraging the property ∀Y (⋀_{i∈[1,r]} Ci(X, Y)) = ⋀_{i∈[1,r]} ∀Y Ci(X, Y), for ψ(X, Y) = ⋀_{i∈[1,r]} Ci(X, Y).
DCIP:
The deterministic current instance problem
INPUT:
A specification S.
QUESTION: Is S deterministic for current instances?
Example 3.2: The specification S0 of Example 2.3 is
deterministic for current Emp instances. Indeed, for all
For the data complexity, we show that CPS is NP-hard
by reduction from the Betweenness problem, which is NP-complete (cf. [18]). Given two sets E and F = { (ei, ej, ek) | ei, ej, ek ∈ E }, the Betweenness problem is to decide whether there is a bijection π : E → { 1, . . . , |E| } such that for each (ei, ej, ek) ∈ F, either π(ei) < π(ej) < π(ek) or π(ek) < π(ej) <
π(ei ). Given E and F , we define a specification S with a
temporal instance Dt of a 4-ary schema, and a set of fixed
denial constraints. We show that there exists a solution to
the Betweenness problem iff Mod(S) is nonempty.
(2) Upper bounds. We provide an algorithm that, given a
specification S, guesses a completion Dc of total orders in
S, and then checks whether Dc ∈ Mod(S). The checking involves (a) denial constraints and (b) the copying condition
and ≺-compatibility of copy functions in S. Step (b) is in
PTIME. Step (a) is in PTIME if the denial constraints are
fixed, and it uses an NP-oracle otherwise. Hence CPS is in NP for data complexity and in Σp2 for combined complexity.
In the proofs for the lower bounds, no copy functions are
defined, and the relation schemas are fixed.
✷
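For readers unfamiliar with the Betweenness problem used in the data complexity lower bound, the following brute-force sketch states it operationally. It only illustrates the problem definition and is not the reduction itself; the function name betweenness is an assumption, and the exhaustive search over permutations is exponential.

```python
from itertools import permutations


def betweenness(elements, triples):
    """Return an ordering in which b lies between a and c for every (a, b, c), or None."""
    for perm in permutations(elements):
        pos = {e: i for i, e in enumerate(perm)}
        if all(pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
               for a, b, c in triples):
            return perm
    return None


solvable = betweenness(["a", "b", "c"], [("a", "b", "c")])
unsolvable = betweenness(["a", "b", "c"], [("a", "b", "c"), ("b", "a", "c")])
```

The second instance has no solution, since both b and a would have to be the middle element of the same three-element ordering.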
(a) For CQ, we verify it by reduction from the ∀∗ ∃∗ 3sat
problem, which is Πp2 -complete (cf. [28]). Given a sentence
φ = ∀X∃Y ψ(X, Y ), we define a CQ query Q, a fixed tuple t,
and a specification S consisting of five temporal instances of
fixed schemas. We use these temporal instances to encode
(i) disjunction and negation, which are not expressible in
CQ, (ii) truth assignments µX for X, with an instance DX ,
and (iii) relations for inspecting whether t is an answer to
Q. Query Q encodes ∃Y ψ(X, Y ) w.r.t. µX , such that φ is
true iff t is an answer to Q for each consistent completion of
DX , i.e., when µX ranges over all truth assignments for X.
(b) For FO, we show that CCQA is PSPACE-hard by reduction from Q3SAT, which is PSPACE-complete (cf. [28]).
Given an instance φ of Q3SAT, we define an FO query
Q, a fixed tuple t and a specification S with a single
temporal instance. Query Q encodes φ, and the relation
encodes Boolean values for which there is a single completion Dc0 in Mod(S). We show that φ is true iff t is in Q(Dc0 ).
For the data complexity, we show that CCQA is coNP-hard
even for CQ, by reduction from the complement of 3sat.
Given a propositional formula ψ, we define a fixed CQ query
Q, a fixed tuple t and a specification S consisting of two
temporal instances Dψ and D¬ψ of fixed relation schemas.
We use Dψ to encode (i) truth assignments µX for variables
X in ψ, and (ii) literals in ψ. We encode the negations of
clauses in ψ using D¬ψ , for which there is a unique consistent
completion. For each consistent completion of Dψ , i.e., each
µX for X, query Q returns t iff ψ is not satisfied by µX .
In the lower bound proofs, neither denial constraints nor
copy functions are defined, and all the schemas are fixed.
Upper bounds. We develop an algorithm that, given a query
Q, a tuple t and a specification S, returns “no” if there exists Dc ∈ Mod(S) such that t ∉ Q(LST(Dc)). The algorithm first guesses Dc, and then checks whether (a) Dc ∈ Mod(S) and (b) t ∉ Q(LST(Dc)). Step (b) is in coNP when Q is in ∃FO+, and is in PSPACE if Q is in FO. When Q is fixed, step (b) is in PTIME no matter whether Q is in CQ or FO. Putting these together with Theorem 4.1 (for step (a)), we conclude that the data complexity of CCQA is in coNP, and its combined complexity is in Πp2 for ∃FO+ and in PSPACE for FO.
✷
The certainty of currency orders. We next study COP
and DCIP. The certain currency ordering problem COP is to
determine, given a specification S and a currency order Ot ,
whether each t ≺A s in Ot is entailed by the partial currency
orders, denial constraints and copy functions in S. The deterministic current instance problem DCIP is to decide, given
S, whether the current instance of each temporal instance
of S is unchanged for all consistent completions of S. These
problems are, unfortunately, also beyond reach in practice.
Corollary 4.2: For both COP and DCIP, (1) the combined
complexity is Πp2 -complete, and (2) the data complexity is
coNP-complete. The complexity bounds remain unchanged
when no copy functions are present.
✷
Proof sketch: (1) Lower bounds. For both COP and DCIP,
the lower bounds are verified by reduction from the complement of CPS, for data complexity and combined complexity.
(2) Upper bounds. A non-deterministic algorithm is developed for each of COP and DCIP, which is in coNP (data
complexity) and Πp2 (combined complexity).
✷
Special cases. The results above tell us that it is nontrivial
to reason about data currency. In light of this, we look into
special cases of these problems with lower complexity. As
shown by Theorem 4.1 and Corollary 4.2, denial constraints
make the analyses of CPS, COP and DCIP intricate. Indeed,
these problems are intractable even when denial constraints
are fixed. Hence we consider specifications with no denial
constraints, but containing partial currency orders and copy
functions. The result below shows that the absence of denial
constraints indeed simplifies the analyses.
Query answering. The certain current query answering problem CCQA(LQ) is to determine, given a tuple t, a specification S and a query Q ∈ LQ, whether t ∈ Q(LST(Dc)) for all Dc ∈ Mod(S). The result below provides the data complexity of the problem, as well as its combined complexity
when LQ ranges over CQ, UCQ, ∃FO+ and FO. It tells us the
following. (1) Disjunctions in UCQ and ∃FO+ do not incur
extra complexity to CCQA. Indeed, CCQA has the same
complexity for CQ as for UCQ and ∃FO+ . (2) In contrast,
the presence of negation in FO complicates the analysis. (3)
Copy functions have no impact on the complexity bounds.
Theorem 4.4: In the absence of denial constraints, CPS,
COP and DCIP are in PTIME.
✷
Theorem 4.3: The combined complexity of CCQA(LQ ) is
• Πp2 -complete when LQ is CQ, UCQ or ∃FO+ , and
• PSPACE-complete when LQ is FO.
The data complexity is coNP-complete when LQ ∈ {CQ,
UCQ,∃FO+ , FO}. These complexity bounds are unchanged
in the absence of copy functions.
✷
Proof sketch: For CPS, we develop an algorithm that,
given a specification S with no denial constraints defined,
checks whether Mod(S) ≠ ∅. Let ρ̄ denote the collection of
copy functions in S. One can verify that in the absence of
denial constraints, S is consistent iff there exists no violation of the copying condition or ≺-compatibility of ρ̄ in any
temporal instance of S. Hence it suffices to detect violations in the instances of S, rather than in their completions.
Proof sketch: Lower bounds. For the combined complexity, we show the following: (a) CCQA is already Πp2 -hard for
CQ, and (b) it is PSPACE-hard for FO.
is coNP-hard (data complexity) and Πp2 -hard (combined
complexity) by reduction from the complement of CPS, for
which the complexity is established in Theorem 4.1. Given
a specification S, we construct a specification S′ , a tuple t
and an identity query Q, such that Mod(S) is empty iff t is in Q(LST(Dc)) for each Dc ∈ Mod(S′). The upper bounds of CCQA in this setting follow from Theorem 4.3.
✷
As shown in Example 2.2, it is not straightforward to check
≺-compatibility, especially when tuples are imported indirectly from other sources. Nonetheless, we show that this
can be done in O(|S|²) time, where |S| is the size of S.
For COP, we show that given a specification S without
denial constraints, to decide whether a currency order is
contained in all completions of S, it suffices to check temporal instances of S. This can be decided by a variation of the
algorithm for CPS, also in PTIME; similarly for DCIP.
✷
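To give some intuition for why these analyses become tractable, consider the simplest setting: a single attribute, no denial constraints and, as a further simplifying assumption made here, no copy functions. A partial order given as pairs (t, s), meaning t ≺ s, then has a completion iff it is acyclic, and a pair holds in every completion iff it lies in the transitive closure. The Python sketch below illustrates only this restricted case; the function names are hypothetical and it does not reproduce the algorithm of the proof, which also handles copy functions.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of pairs (t, s), read as t precedes s."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_consistent(pairs):
    """A completion exists iff the closure contains no pair (t, t), i.e., no cycle."""
    return all(a != b for (a, b) in transitive_closure(pairs))


def is_certain(pairs, t, s):
    """t precedes s in every completion iff (t, s) is entailed; vacuous if inconsistent."""
    if not is_consistent(pairs):
        return True
    return (t, s) in transitive_closure(pairs)


known = {("s1", "s2"), ("s2", "s3")}
```

Both functions run in time polynomial in the number of pairs, in contrast to the enumeration of completions that the general case appears to require.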
5. Currency Preservation in Data Copying
As we have seen earlier, copy functions tell us what data
values in a relation have been imported from other data
sources. Naturally we want to leverage the imported values
to improve query answers. This gives rise to the following
questions: do the copy functions import sufficient current
data values for answering a query Q? If not, how do we
extend the copy functions such that Q can be answered with
more up-to-date data values? To answer these questions we
introduce a notion of currency-preserving copy functions.
We consider a specification S of data currency consisting
of two collections of temporal instances (data sources) D =
(D1 , . . . , Dp ) and D′ = (D1′ , . . . , Dq′ ), with (1) a set Σi (resp.
Σ′j ) of denial constraints on Di for each i ∈ [1, p] (resp. Dj′
for j ∈ [1, q]), and (2) a collection ρ of copy functions ρ(j,i)
that imports tuples from Dj′ to Di , for i ∈ [1, p] and j ∈ [1, q],
i.e., all the functions of ρ import data from D′ to D.
In contrast, the absence of denial constraints does not
make our lives easier when it comes to CCQA. Indeed, in
the proof of Theorem 4.3, the lower bounds of CCQA are
verified using neither denial constraints nor copy functions.
Corollary 4.5: In the absence of denial constraints,
CCQA(LQ ) remains coNP-hard (data complexity) and
• Πp2 -hard (combined complexity) even for CQ, and
• PSPACE-hard (combined complexity) for FO.
✷
Theorem 4.3 tells us that the complexity of CCQA for
CQ is rather robust: adding disjunctions does not increase
the complexity. We next investigate the impact of removing
Cartesian product from CQ on the complexity of CCQA. We
consider SP queries, which are CQ queries of the form
    Q(x⃗) = ∃e, y⃗ (R(e, x⃗, y⃗) ∧ ψ),
where ψ is a conjunction of equality atoms and x⃗ and y⃗ are
disjoint sequences of variables in which no variable appears
twice. SP queries support projection and selection only. For
instance, Q1 – Q4 of Example 1.1 are SP queries. SP queries
in which ψ is a tautology are referred to as identity queries.
We show that for SP queries, denial constraints make a
difference. Without denial constraints, CCQA is in PTIME
for SP queries. In contrast, when denial constraints are imposed, CCQA is no easier for identity queries than for ∃FO+ .
Extensions.
To formalize currency preservation, we
first present the following notions. Assume that Di =
(D, ≺A1 , . . . , ≺An ) and Dj′ = (D′ , ≺B1 , . . . , ≺Bm ) are temporal instances of relation schemas Ri = (EID, A1 , . . . , An ) and
Rj′ = (EID, B1 , . . . , Bm ), respectively. Assume that n ≤ m.
An extension of Di is a temporal instance Die = (De , ≺eA1
, . . . , ≺eAn ) of Ri such that (1) D ⊆ De , (2) ≺Ah ⊆≺eAh for
all h ∈ [1, n], and (3) πEID (De ) = πEID (D). Intuitively, Die extends Di by adding new tuples for those entities that are
already in Di . It does not introduce new entities.
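The three conditions for an extension translate directly into a check. The sketch below assumes an illustrative encoding that is not part of the paper: an instance is a dict from tuple ids to rows with an "EID" field, and currency orders are dicts from attribute names to sets of pairs.

```python
def is_extension(d, d_ext, orders, orders_ext):
    """Check conditions (1)-(3) of an extension under the encoding described above."""
    # (1) D is contained in De: every original tuple is kept unchanged.
    tuples_kept = all(tid in d_ext and d_ext[tid] == row for tid, row in d.items())
    # (2) Each partial currency order is contained in its extended counterpart.
    orders_kept = all(pairs <= orders_ext.get(attr, set())
                      for attr, pairs in orders.items())
    # (3) The projections on EID coincide: no new entity is introduced.
    same_entities = ({r["EID"] for r in d.values()}
                     == {r["EID"] for r in d_ext.values()})
    return tuples_kept and orders_kept and same_entities


base = {"s1": {"EID": "e1", "LN": "Dupont"}}
good = dict(base, s2={"EID": "e1", "LN": "Smith"})   # new tuple, same entity
bad = dict(base, s3={"EID": "e2", "LN": "Luth"})     # introduces a new entity
```

Here good is an extension of base, while bad violates condition (3).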
Consider two copy functions: ρ(j,i) imports tuples from Dj′ to Di, and ρe(j,i) from Dj′ to Die, both of signature Ri[A⃗] ⇐ Rj′[B⃗], where A⃗ = (A1, . . . , An) and B⃗ is a sequence of n attributes in Rj′. We say that ρe(j,i) extends ρ(j,i) if
1. Die is an extension of Di ;
Corollary 4.6: For SP queries, CCQA(SP) is
• in PTIME in the absence of denial constraints, and
• coNP-complete (data complexity) and Πp2 -complete
(combined complexity) in the presence of denial constraints, even for identity queries.
✷
Proof sketch: (1) In the absence of denial constraints, one can verify that for any specification S and each SP query Q, ⋂_{Dc ∈ Mod(S)} Q(LST(Dc)) can be obtained by (i) evaluating Q on a representation repr(S) of { LST(Dc) | Dc ∈ Mod(S) }, where repr(S) compactly represents all possible latest values by special symbols; and (ii) removing all tuples in Q(repr(S)) that contain such special symbols. Capitalizing on this property, we develop an algorithm that takes as input an SP query Q, a tuple t and a specification S without denial constraints. It checks whether t is a certain current answer to Q w.r.t. S as follows: (a) check whether S is consistent, and return “no” if not; (b) compute the representation repr(S); (c) check whether t is in the “stripped” version of Q(repr(S)); and (d) return “yes” if so, and “no” otherwise. Step (c) is in PTIME for SP queries. By Theorem 4.4 and by leveraging the certain order computed by the algorithm for CPS, steps (a) and (b) are also in PTIME. Therefore, CCQA is in PTIME in this setting, for both combined complexity (when queries are not fixed) and data complexity.
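Steps (b) and (c) can be illustrated with a simplified sketch. It makes assumptions beyond the text: there are no copy functions, the specification is already known to be consistent, the latest value of an attribute is taken to be determined iff all tuples without a known successor agree on it, and an SP query is given as a selection predicate plus a list of projected attributes. The names Unknown, repr_instance and certain_sp_answers are hypothetical.

```python
class Unknown:
    """Special symbol: the latest value differs across completions."""


def repr_instance(tuples, partial, attrs):
    """Build one row per entity; undetermined attributes get a fresh Unknown."""
    entities = {}
    for tid, row in tuples.items():
        entities.setdefault(row["EID"], []).append(tid)
    rows = []
    for eid, tids in entities.items():
        row = {"EID": eid}
        for attr in attrs:
            pairs = partial.get(attr, set())
            # A tuple can be made most current iff nothing is known to follow it.
            maximal = [t for t in tids if not any(a == t for (a, _) in pairs)]
            values = {tuples[t][attr] for t in maximal}
            row[attr] = values.pop() if len(values) == 1 else Unknown()
        rows.append(row)
    return rows


def certain_sp_answers(tuples, partial, attrs, select, project):
    """Evaluate the SP query on repr(S), then strip answers with special symbols."""
    answers = set()
    for row in repr_instance(tuples, partial, attrs):
        if not select(row):
            continue
        out = tuple(row[a] for a in project)
        if not any(isinstance(v, Unknown) for v in out):
            answers.add(out)
    return answers


# Toy data (hypothetical): t1 is known to precede t2 on LN only.
rows = {
    "t1": {"EID": "e1", "LN": "Smith", "salary": "50k"},
    "t2": {"EID": "e1", "LN": "Dupont", "salary": "80k"},
}
known = {"LN": {("t1", "t2")}}
names = certain_sp_answers(rows, known, ["LN", "salary"], lambda r: True, ["LN"])
pay = certain_sp_answers(rows, known, ["LN", "salary"], lambda r: True, ["salary"])
```

Each Unknown instance is a distinct object, so an equality atom that compares an undetermined value with a constant or with another undetermined value fails, and the corresponding tuple is not reported as certain.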
(2) In the presence of denial constraints, we show that CCQA
2. for each tuple t in Di , if ρ(j,i) (t) is defined, then so is
ρe(j,i) (t) and moreover, ρe(j,i) (t) = ρ(j,i) (t);
3. for each tuple t in Die \Di , there exists a tuple s in Dj′
such that ρe(j,i) (t) = s.
We refer to Die as the extension of Di by ρe(j,i) .
Observe that Die is not allowed to expand arbitrarily. (a)
Each new tuple t in Die is copied from a tuple s in Dj′ . (b)
No new entity is introduced. Note that only copy functions
that cover all attributes but EID of Ri can be extended. This
assures that all the attributes of a new tuple are in place.
An extension ρe of ρ is a collection of copy functions ρe(j,i)
such that ρe ≠ ρ and moreover, for all i ∈ [1, p] and j ∈ [1, q],
either ρe(j,i) is an extension of ρ(j,i) , or ρe(j,i) = ρ(j,i) .
We denote the set of all extensions of ρ as Ext(ρ).
For each ρe in Ext(ρ), we denote as Se the extension of S
by ρe , which consists of the same D′ and denial constraints
as in S, but has copy functions ρe and De = (D1e , . . . , Dpe ),
where Die is an extension of Di by ρe(j,i) for all j ∈ [1, q].
Currency preservation. We are now ready to define currency preservation. Consider a collection ρ of copy functions
in a specification S. We say that ρ is currency preserving for a query Q w.r.t. S if (a) Mod(S) ≠ ∅, and moreover, (b) for all ρe ∈ Ext(ρ) such that Mod(Se) ≠ ∅, we have that
    ⋂_{Dc ∈ Mod(S)} Q(LST(Dc)) = ⋂_{Dce ∈ Mod(Se)} Q(LST(Dce)).
Intuitively, ρ is currency preserving if (1) ρ is meaningful; and (2) for each extension ρe of ρ that makes sense, the certain current answers to Q are not improved by ρe, i.e., no matter what additional tuples are imported for those entities in D, the certain current answers to Q remain unchanged.

        FN     LN       address      salary   status     phone
s′1:    Mary   Dupont   6 Main St    60k      married    6671975
s′2:    Mary   Dupont   6 Main St    80k      married    6671975
s′3:    Mary   Smith    2 Small St   80k      divorced   2962845
Figure 2: Relation Mgr

Q. The next problem is to decide whether ρ in S can be extended to be currency preserving for Q.
ECP(LQ): The existence problem.
INPUT: A query Q in LQ, and a consistent specification S with non-currency-preserving ρ.
QUESTION: Does there exist ρe in Ext(ρ) that is currency preserving for Q?
Bounded extension. We also want to know whether it suffices to extend ρ by copying additional data of a bounded size, and make it currency preserving.
BCP(LQ): The bounded copying problem.
INPUT: S, ρ and Q as in ECP, and a positive number k.
QUESTION: Does there exist ρe ∈ Ext(ρ) such that ρe is currency preserving for Q and |ρe| ≤ k + |ρ|?
Example 5.1: As shown in Fig. 2, relation Mgr collects
manager records. Consider a specification S1 consisting of
the following: (a) temporal instances Mgr of Fig. 2 and Emp
of Fig. 1, in which partial currency orders are ∅ for all attributes; (b) denial constraints ϕ1 –ϕ3 of Example 2.1 and
ϕ5 : ∀s, t : Mgr (s[EID] = t[EID] ∧ s[status] = “divorced” ∧
t[status] = “married”) → t ≺LN s ,
i.e., if s[status] is divorced and t[status] is married, then s is
more current than t in LN; and (c) a copy function ρ with signature Emp[A⃗] ⇐ Mgr[A⃗], where A⃗ is (FN, LN, address, salary, status), such that ρ(s3) = s′2, i.e., s3 of Emp is copied from s′2 of Mgr. Obviously S1 is consistent.
Recall query Q2 of Example 1.1, which is to find Mary’s
current last name. For Q2 , ρ is not currency preserving.
Indeed, there is an extension ρ1 of ρ by copying s′3 to Emp.
In all consistent completions of the extension Emp1 of Emp
by ρ1 , the answer to Q2 is Smith. However, the answer
to Q2 in all consistent completions of Emp is Dupont (see
Examples 1.1 and 2.5). In contrast, ρ1 is currency preserving
for Q2 . Indeed, copying more tuples from Mgr (i.e., tuple
s′1 ) to Emp does not change the answer to Q2 in Emp1 . ✷
6. Deciding Currency Preservation
We next study the decision problems in connection
with currency-preserving copy functions, namely, CPP(LQ ),
MCP(LQ ), ECP(LQ ) and BCP(LQ ) when LQ is CQ, UCQ,
∃FO+ or FO. We provide their combined complexity and data
complexity, and identify special cases with lower complexity.
Checking currency preservation. We first investigate
CPP(LQ ), the problem of deciding whether a collection of
copy functions in a given specification is currency preserving
for a query Q. We show that CPP is nontrivial. Indeed, its
combined complexity is already Πp3 -hard when Q is in CQ,
and it is PSPACE-complete when Q is in FO.
One might be tempted to think that fixing denial constraints would make our lives easier. Indeed, in practice
denial constraints are often predefined and fixed, and only
data, copy functions and query vary. Moreover, as shown in
Theorem 4.1 for the consistency problem, fixing denial constraints indeed helps there. Unfortunately, it does not simplify the analysis of the combined complexity when it comes
to CPP. Even when both query and denial constraints are
fixed, the problem is Πp2 -complete (data complexity).
Deciding currency preservation. There are several decision problems associated with currency-preserving copy
functions, which we shall investigate in the rest of the paper.
The first problem is to decide whether given copy functions
have imported necessary current data for answering a query.
CPP(LQ): The currency preservation problem.
INPUT: A query Q in LQ, and a specification S of data currency with copy functions ρ.
QUESTION: Is ρ currency preserving for Q?
Theorem 6.1: For CPP(LQ), the combined complexity is
• Πp3-complete when LQ is CQ, UCQ or ∃FO+, and
• PSPACE-complete when LQ is FO.
• Its data complexity is Πp2-complete when LQ ∈ {CQ, UCQ, ∃FO+, FO}.
The combined complexity bounds remain unchanged when denial constraints and copy functions are fixed.
✷
Minimal copying. Moreover, we want to know whether a
currency preserving ρ does not copy unnecessary or redundant data. To this end, we use |ρ| to denote the sum of the
sizes of data values copied by ρ. We say that ρ is minimal
for Q if there exists no collection ρ′ of copy functions such
that (1) ρ ∈ Ext(ρ′ ), (2) |ρ′ | < |ρ|, and (3) ρ′ is currency preserving for Q. That is, there exists no currency-preserving
ρ′ that imports less data from D′ to D than ρ.
MCP(LQ): The minimal copying problem.
INPUT: S and Q as in CPP, with a collection ρ of currency-preserving copy functions in S.
QUESTION: Is ρ minimal for Q?
Extending copy functions. Consider a consistent specification S in which ρ is not currency preserving for a query
Proof sketch: (1) Lower bounds. For the combined complexity, it suffices to show that CPP is already Πp3-hard for CQ, while for FO, it is PSPACE-hard. For the data complexity, we show that CPP is Πp2-hard with CQ queries.
The Πp3 lower bound is verified by reduction from the ∀∗∃∗∀∗3sat problem, which is Πp3-complete (cf. [28]). Given a sentence φ = ∀X∃Y∀Z ψ(X, Y, Z), we construct a query Q in CQ, and a specification S with (i) two data sources D and D′, (ii) a single copy function ρ and (iii) a set of denial constraints. Source D consists of four relations encoding Boolean values, disjunction, conjunction and negation, as well as a relation Db for testing certain current query answers. Source D′ has a single relation Db′ from which some tuples are copied to Db by ρ. Query Q encodes φ by leveraging the relations in D for Boolean operations, and the property of ∀Z ψ(X, Y, Z) explored in the proof of Theorem 4.1.
We show that φ is true iff ρ is currency preserving for Q.
When it comes to FO, we show that CPP is PSPACE-hard
by a straightforward reduction from Q3SAT.
In these proofs, both denial constraints and copy functions
are fixed, independent of ∀∗ ∃∗ ∀∗ 3sat or Q3SAT instances.
For the data complexity, we verify the Πp2 lower bound
for CQ by reduction from the ∀∗ ∃∗ 3sat problem. Given a
sentence φ = ∀X∃Y ψ(X, Y ), we define a query Q in CQ and a
specification S such that φ is true iff ρ is currency preserving
for Q w.r.t. S. Here Q is fixed, i.e., it is independent of φ.
(2) Upper bounds. We give an algorithm that takes a query
Q and a specification S as input, and checks whether the
copy functions ρ̄ in S are currency preserving for Q. It
invokes oracles to check whether S is consistent and whether
some guessed tuples are certain current answers to Q in Dc ∈
Mod(S) but are not answers to Q in some Dce ∈ Mod(Se ),
where Se is an extension of S by extending copy functions.
The oracles are in Πp2 or Σp2 when Q is in ∃FO+ , and in
PSPACE when Q is in FO. When Q is fixed, the oracles are
in NP or coNP. From these the upper bounds follow.
✷
ification S in which copy functions ρ̄ are not currency preserving for Q, whether we can extend ρ̄ to preserve currency.
The good news is that the answer to this question is affirmative: we can always extend ρ̄ and make them currency
preserving for Q. Hence the decision problem ECP is in O(1)
time, although it may take much longer time to explicitly
construct a currency preserving extension of ρ̄.
Proposition 6.3: ECP(LQ ) is decidable in O(1) time for
both the combined complexity and data complexity, when
LQ is CQ, UCQ, ∃FO+ or FO.
✷
Proof sketch: To show the existence of currency preserving extensions of ρ, we give an algorithm to construct one.
Let D′ be the data source in S from which tuples can be
copied. For each copy function ρ in ρ̄, if ρ can be extended,
the algorithm expands ρ by copying tuples one by one from
D′ , starting from the most current ones, as long as the
extended instances satisfy the denial constraints in S and
the constraints of the copy functions. It proceeds until no
more tuples from D′ can be copied while satisfying the constraints, and yields a currency-preserving extension of ρ̄. ✷
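The construction in this proof sketch is a greedy loop, which can be outlined as follows. The sketch is schematic: the check against denial constraints and copy-function constraints is abstracted into a hypothetical predicate is_consistent, and candidates are assumed to be listed from most to least current.

```python
def greedy_extension(copied, candidates, is_consistent):
    """Extend `copied` with tuples from `candidates` while `is_consistent` holds."""
    extended = list(copied)
    remaining = list(candidates)  # assumed ordered from most to least current
    changed = True
    while changed:                # repeat until no further tuple can be copied
        changed = False
        for cand in list(remaining):
            if is_consistent(extended + [cand]):
                extended.append(cand)
                remaining.remove(cand)
                changed = True
    return extended


# Toy predicate (hypothetical): only tuples of the known entity e1 may be copied.
only_e1 = lambda ts: all(eid == "e1" for eid, _ in ts)
ext = greedy_extension([("e1", "married")],
                       [("e1", "divorced"), ("e2", "single")],
                       only_e1)
```

The loop performs at most quadratically many consistency checks in the number of candidates, but each check may itself be expensive, which is consistent with the remark that constructing an extension can take much longer than deciding ECP.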
Bounded extensions. In contrast to ECP, when it comes
to deciding whether ρ̄ can be made currency-preserving by
copying data within a bounded size, the analysis becomes
far more intricate. Indeed, the result below tells us that
even for CQ, BCP is Σp4 -hard, and fixing denial constraints
and copy functions does not help. When both queries and
denial constraints are fixed, BCP is Σp3 -complete.
Minimal copying. We next study MCP(LQ ), which is
to decide whether currency-preserving copy functions import any data that is unnecessary or redundant for a query
Q. This problem is even harder than CPP: for CQ queries, its combined complexity is ∆p4-complete (P^{Σp3}) even if denial constraints are fixed, and its data complexity is ∆p3-complete. For FO queries, it is still PSPACE-complete.
Theorem 6.2: For MCP(LQ), the combined complexity is
• ∆p4-complete when LQ is CQ, UCQ or ∃FO+, and
• PSPACE-complete when LQ is FO.
• Its data complexity is ∆p3-complete when LQ ∈ {CQ, UCQ, ∃FO+, FO}.
The combined complexity bounds remain unchanged when denial constraints and copy functions are fixed.
✷
Theorem 6.4: For BCP(LQ), the combined complexity is
• Σp4-complete when LQ is CQ, UCQ or ∃FO+, and
• PSPACE-complete when LQ is FO.
• Its data complexity is Σp3-complete when LQ ∈ {CQ, UCQ, ∃FO+, FO}.
The combined complexity bounds remain unchanged when denial constraints and copy functions are fixed.
✷
Proof sketch: (1) Lower bounds. For FO, we show that
BCP is PSPACE-hard by a simple reduction from Q3SAT.
For CQ, we show that BCP is Σp4 -hard by reduction from
the ∃∗ ∀∗ ∃∗ ∀∗ 3sat problem, which is Σp4 -complete (cf. [28]).
Given a sentence φ = ∃W ∀X∃Y ∀Zψ(W, X, Y, Z), we define
a CQ query Q, a positive number k and a specification S
with two copy functions ρ̄. We show that φ is true iff there
exists an extension ρ̄e of ρ̄ such that |ρe | ≤ |ρ|+k and ρ̄e is
currency preserving for Q. We use temporal instances in S
to encode disjunction and negation, which Q leverages to
express φ. We also define a data source in S whose current
instance ranges over all truth assignments for W , and use
the data source to inspect possible extensions of ρ̄.
For the data complexity, we verify the $\Sigma_3^p$ lower bound for
CQ by reduction from the ∃∗∀∗∃∗3sat problem. This reduction is more involved than the one developed for the combined complexity. Given an ∃∗∀∗∃∗ sentence φ, we define a
query Q in CQ that is independent of φ but is nonetheless
able to encode the negations of the clauses in φ, by making
use of temporal instances in the specification we construct.
In all these proofs, denial constraints and copy functions
are independent of input sentences, i.e., they are fixed.
(2) Upper bounds. We develop an algorithm for BCP, which
first guesses an extension ρ̄e of ρ̄ that copies more data of a
bounded size, and then checks whether ρ̄e is currency preserving,
by invoking the algorithm for CPP. From Theorem 6.1 the upper
bounds for BCP follow immediately. ✷
Proof sketch (Theorem 6.2): (1) Lower bounds. For FO, the proofs are
similar to their counterparts of CPS. For CQ queries, we
show that MCP is $\Delta_4^p$-hard (combined complexity) by reduction
from the msa(∃∗∀∗∃∗3sat) problem, which is $\Delta_4^p$-complete [26].
The latter problem is to determine, given a
satisfiable sentence φ = ∃X∀Y∃Z ψ(X, Y, Z), whether in the
lexicographically maximum truth assignment µX for variables in X such that ∀Y∃Z ψ(µX(x1), . . . , µX(xn), Y, Z) evaluates to true, the last variable xn ∈ X has value 1, i.e.,
whether µX(xn) = 1. For the data complexity, we verify
that it is $\Delta_3^p$-hard for CQ by reduction from the msa(∃∗∀∗3sat) problem. These reductions need to encode the lexicographical successor function for variables in X, and are more involved than those given in the proof of Theorem 6.1.
(2) Upper bounds. We provide an algorithm that takes as
input a query Q and a specification S with currency preserving copy functions ρ̄. For each tuple t copied by ρ̄, it checks
whether removing t makes the copy functions not currency
preserving, by using the algorithm for CPP in the proof of
Theorem 6.1 as an oracle. Hence MCP is in $\Delta_4^p = P^{\Sigma_3^p}$
(combined complexity) and in $\Delta_3^p$ (data complexity).
✷
The feasibility of currency preservation. We now consider ECP(LQ ). It is to decide, given a query Q and a spec-
(2) Lower bounds. We show that when k is fixed, BCP is
already $\Delta_4^p$-hard (combined complexity) for CQ queries, by
reduction from the msa(∃∗∀∗∃∗3sat) problem. In addition,
we verify that it is $\Delta_3^p$-hard (data complexity) for CQ queries
by reduction from the msa(∃∗∀∗3sat) problem. For FO
queries, we show that it remains PSPACE-hard (combined
complexity) by reduction from Q3SAT.
✷
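To make the msa(∃∗∀∗∃∗3sat) problem used in these reductions more concrete, the following brute-force sketch computes the last bit of the lexicographically maximum assignment µX for which ∀Y∃Z ψ holds. It is exponential and meant for tiny instances only. The function names and the representation of ψ as a Python callable over 0/1 tuples are assumptions made for illustration; they do not come from the paper.

```python
from itertools import product

def holds_forall_exists(psi, x, n_y, n_z):
    """Check whether for all Y there exists Z with psi(x, Y, Z) true (exhaustive)."""
    return all(
        any(psi(x, y, z) for z in product((0, 1), repeat=n_z))
        for y in product((0, 1), repeat=n_y)
    )

def msa_last_bit(psi, n_x, n_y, n_z):
    """Return the value of the last X variable in the lexicographically
    maximum assignment to X satisfying forall-Y exists-Z psi,
    or None if no assignment to X works (phi unsatisfiable)."""
    # product((1, 0), ...) enumerates X assignments in descending
    # lexicographic order, so the first hit is the maximum one.
    for x in product((1, 0), repeat=n_x):
        if holds_forall_exists(psi, x, n_y, n_z):
            return x[-1]
    return None

# Hypothetical toy matrix: (x1 or y1 or z1) and (not x2 or not y1).
def example_psi(x, y, z):
    return bool((x[0] or y[0] or z[0]) and (not x[1] or not y[0]))
```

In the toy matrix, the assignment (1, 1) fails for y1 = 1, so the maximum satisfying assignment is (1, 0) and the answer to the msa question is "no".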
Special cases. Theorems 6.1, 6.2 and 6.4 motivate us to
explore special cases that simplify the analyses. In contrast
to Theorem 4.1 and Corollary 4.2, we have seen that fixing denial constraints does not make our lives easier when
it comes to CPP, MCP or BCP. However, when denial constraints are absent, these problems become tractable for SP
queries. This is consistent with Corollary 4.6.
7. Conclusions
We have proposed a model to specify the currency of data
in the absence of reliable timestamps but in the presence of
copy relationships. We have also introduced a notion of currency preservation to assess copy functions for query answering. We have identified eight fundamental problems associated with data currency and currency preservation (CPS,
COP, DCIP, CCQA(LQ ), CPP(LQ ), MCP(LQ ), ECP(LQ )
and BCP(LQ )). We have provided a complete picture of the
lower and upper bounds of these problems, all matching, for
their data complexity as well as combined complexity when
LQ ranges over a variety of query languages. These results
are not only of theoretical interest in their own right, but
may also help practitioners distinguish current values from
stale data, answer queries with current data, and design
proper copy functions to import data from external sources.
The main complexity results are summarized in Tables 2
and 3, annotated with their corresponding theorems.
The study of data currency is still preliminary. An open
issue concerns generalizations of copy functions. To simplify
the presentation we assume a single copy function from one
relation to another. Nonetheless we believe that all the results remain intact when multiple such functions coexist.
For currency-preserving copy functions, we assume that the
signatures “cover” all attributes (except EID) of the importing relation. It is nontrivial to relax this requirement, however, since otherwise unknown values need to be introduced
for attributes whose value is not provided by the extended
copy functions. A second issue is about practical use of
the study. As shown in Tables 2 and 3, most of the problems are intractable. To this end we expect to (a) identify practical PTIME cases in various applications, (b) develop efficient heuristic algorithms with certain performance
guarantees, and (c) conduct incremental analysis when data
or copy functions are updated, which is expected to result
in a lower complexity than its batch counterpart when the
area affected by the updates is small, as commonly found
in practice. A third issue concerns the interaction between
data consistency and data currency. There is an intimate
connection between these two central issues of data quality. Indeed, identifying the current value of an entity helps
resolve data inconsistencies, and conversely, repairing data
helps remove obsolete data. While these processes should
logically be unified, we are not aware of any previous work
on this topic. Finally, it is interesting to develop syntactic
characterizations of currency-preserving copy functions.
Theorem 6.5: When denial constraints are absent, for SP
queries both the combined complexity and the data complexity are in PTIME for CPP, MCP and BCP (when the
bound k on the size of additional data copied is fixed). ✷
Proof sketch: (1) CPP. We develop a PTIME algorithm
A that takes as input a specification S and an SP query
Q defined on instances of a relation schema R. It checks
whether the copy functions ρ̄ in S are currency preserving
for Q. The main idea behind A is as follows. Let E be the
set of all entities in Dt , where Dt is the temporal instance
of R in S. Let C be the set of certain current answers to
Q. By Corollary 4.6, C can be computed in PTIME. For
each e ∈ E, A inspects tuples t in D′ one by one, to find
whether it would “spoil” some answer s ∈ C produced by the
current tuple of e if t were imported, i.e., adding t would
make LST(e, Dc ) either empty or a distinct tuple that does
not produce s via Q. Here D′ is the data source in S from
which tuples are copied. When denial constraints are absent
and Q is in SP, this can be done in PTIME. The algorithm
then checks spoilers for all e ∈ E, to find whether there exists
s ∈ C spoiled by importing tuples for all entities that yield
s, or whether some spoilers introduce a new certain current
answer. This can be done in PTIME since the entities in E
are independent of each other. The algorithm returns “yes”
iff no such spoilers exist. It is in PTIME as argued above.
(2) MCP. Given A, an algorithm for MCP is immediate: for
each “reduced” ρ̄c that removes one imported tuple from ρ̄,
it checks whether ρ̄c is currency preserving for Q by using
A. The algorithm is obviously in PTIME.
(3) BCP. When the bound k is fixed, there are polynomially
many extensions ρ̄e of ρ̄ such that |ρ̄e| ≤ |ρ̄| + k. For each
such ρ̄e we check whether ρ̄e is currency preserving for Q by
invoking A. This can be done in PTIME.
✷
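Steps (2) and (3) of the proof sketch both reduce to repeated calls to a CPP checker, which can be sketched as follows. This is a minimal illustration under stated assumptions: a copy function is modelled as a set of copied tuples, `is_currency_preserving` is a hypothetical stand-in for algorithm A (which in the paper takes a specification S and a query Q), and minimality is tested against single-tuple removals only, as the proof sketch describes.

```python
from itertools import combinations

def is_minimal(copied, is_currency_preserving):
    """MCP sketch: the copy function is currency preserving and
    removing any single copied tuple destroys that property."""
    copied = frozenset(copied)
    if not is_currency_preserving(copied):
        return False
    return all(not is_currency_preserving(copied - {t}) for t in copied)

def bounded_extension(copied, candidates, k, is_currency_preserving):
    """BCP sketch for fixed k: try every extension that adds at most k
    candidate tuples; return the first currency-preserving one, else None."""
    copied = frozenset(copied)
    extra = [t for t in candidates if t not in copied]
    for size in range(k + 1):
        for added in combinations(extra, size):
            extended = copied | frozenset(added)
            if is_currency_preserving(extended):
                return extended
    return None

# Hypothetical toy oracle: preserving iff tuples "a" and "b" are both copied.
def toy_oracle(tuples):
    return {"a", "b"} <= set(tuples)
```

With the toy oracle, `is_minimal({"a", "b", "c"}, toy_oracle)` is false because dropping "c" keeps the function currency preserving, and `bounded_extension({"a"}, ["b", "c"], 1, toy_oracle)` finds the extension that adds "b". The number of oracle calls is linear for MCP and bounded by the number of subsets of size at most k for BCP.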
For BCP(LQ ), when the bound k is predefined and fixed,
the analysis also becomes simpler when LQ is CQ, UCQ or
∃FO+ . For FO, however, BCP remains PSPACE-complete.
Corollary 6.6: When k is fixed, BCP(LQ) is
• $\Delta_4^p$-complete (combined) for CQ, UCQ and ∃FO+,
• $\Delta_3^p$-complete (data) for CQ, UCQ and ∃FO+, but is
• PSPACE-complete for FO (combined complexity).
✷
Proof sketch: (1) Upper bounds. As remarked earlier,
when k is fixed there are polynomially many extensions
ρ̄e of ρ̄ such that |ρ̄e| ≤ |ρ̄| + k. For each such ρ̄e we check
whether ρ̄e is currency preserving, by invoking the algorithm
for checking CPP given in the proof of Theorem 6.1. Therefore,
BCP is in $P^{\Sigma_3^p} = \Delta_4^p$ for ∃FO+ (combined complexity).
Moreover, when queries and denial constraints are fixed, it is
in $P^{\Sigma_2^p} = \Delta_3^p$ (data complexity). For FO, the CPP checking
is in PSPACE, and hence so is the upper bound for BCP.
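The claim that a fixed k leaves only polynomially many extensions can be made explicit with a standard counting bound. The sketch below assumes, beyond what the proof sketch states, that every extension is obtained by adding at most k elements drawn from a pool of n candidate copy pairs, where n is polynomial in the size of the specification S:

```latex
\[
  \bigl|\{\bar\rho^{e} : |\bar\rho^{e}| \le |\bar\rho| + k\}\bigr|
  \;\le\; \sum_{i=0}^{k} \binom{n}{i}
  \;\le\; (n+1)^{k}.
\]
```

For fixed k the right-hand side is a polynomial in n, so the number of calls to the CPP algorithm is polynomial.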
Acknowledgments. Fan and Geerts are supported in part
by an IBM scalable data analytics for a smarter planet innovation award, and the RSE-NSFC Joint Project Scheme.
8. References
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of
Databases. Addison-Wesley, 1995.
Problem | Data complexity | Combined complexity | Special case: in the absence of denial constraints (combined and data complexity)
CPS | NP-complete (Th 4.1) | $\Sigma_2^p$-complete (Th 4.1) | PTIME (Th 4.4)
COP | coNP-complete (Cor 4.2) | $\Pi_2^p$-complete (Cor 4.2) | PTIME (Th 4.4)
DCIP | coNP-complete (Cor 4.2) | $\Pi_2^p$-complete (Cor 4.2) | PTIME (Th 4.4)
Table 2: Complexity of problems for reasoning about data currency (CPS, COP, DCIP)
Problem | Data complexity | Combined complexity (LQ is CQ, UCQ or ∃FO+) | Combined complexity (LQ is FO) | Special case: SP queries in the absence of denial constraints (combined & data)
CCQA(LQ) | coNP-complete (Th 4.3) | $\Pi_2^p$-complete (Th 4.3) | PSPACE-complete (Th 4.3) | PTIME (Cor 4.6)
CPP(LQ) | $\Pi_2^p$-complete (Th 6.1) | $\Pi_3^p$-complete (Th 6.1) | PSPACE-complete (Th 6.1) | PTIME (Th 6.5)
MCP(LQ) | $\Delta_3^p$-complete (Th 6.2) | $\Delta_4^p$-complete (Th 6.2) | PSPACE-complete (Th 6.2) | PTIME (Th 6.5)
ECP(LQ) | O(1) (Prop 6.3) | O(1) (Prop 6.3) | O(1) (Prop 6.3) | O(1) (Prop 6.3)
BCP(LQ) | $\Sigma_3^p$-complete (Th 6.4) | $\Sigma_4^p$-complete (Th 6.4) | PSPACE-complete (Th 6.4) | PTIME (Th 6.5)
Table 3: Complexity of problems for query answering and for determining currency preservation
[2] L. Berti-Equille, A. D. Sarma, X. Dong, A. Marian, and D. Srivastava. Sailing the information ocean with awareness of currents: Discovery and application of source dependence. In CIDR, 2009.
[3] L. Bertossi. Consistent query answering in databases. SIGMOD Rec., 35(2), 2006.
[4] M. Bodirsky and J. Kára. The complexity of temporal constraint satisfaction problems. JACM, 57(2), 2010.
[5] P. Buneman, J. Cheney, W. Tan, and S. Vansummeren. Curated databases. In PODS, 2008.
[6] J. Cheney, L. Chiticariu, and W. C. Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379–474, 2009.
[7] J. Chomicki. Consistent query answering: Five easy pieces. In ICDT, 2007.
[8] J. Chomicki and D. Toman. Time in database systems. In M. Fisher, D. Gabbay, and L. Vila, editors, Handbook of Temporal Reasoning in Artificial Intelligence. Elsevier, 2005.
[9] J. Clifford, C. E. Dyreson, T. Isakowitz, C. S. Jensen, and R. T. Snodgrass. On the semantics of “now” in databases. TODS, 22(2):171–214, 1997.
[10] E. F. Codd. Extending the database relational model to capture more meaning. TODS, 4(4):397–434, 1979.
[11] A. Deutsch, A. Nash, and J. B. Remmel. The chase revisited. In PODS, 2008.
[12] X. Dong, L. Berti-Equille, Y. Hu, and D. Srivastava. Global detection of complex copying relationships between sources. In VLDB, 2010.
[13] X. Dong, L. Berti-Equille, and D. Srivastava. Truth discovery and copying detection in a dynamic world. In VLDB, 2009.
[14] C. E. Dyreson, C. S. Jensen, and R. T. Snodgrass. Now in temporal databases. In L. Liu and M. T. Özsu, editors, Encyclopedia of Database Systems. Springer, 2009.
[15] W. W. Eckerson. Data quality and the bottom line: Achieving business success through a commitment to high quality data. Data Warehousing Institute, 2002.
[16] A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. TKDE, 19(1):1–16, 2007.
[17] W. Fan, F. Geerts, J. Li, and M. Xiong. Discovering conditional functional dependencies. TKDE, 23(4):683–698, 2011.
[18] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[19] G. Grahne. The Problem of Incomplete Information in Relational Databases. Springer, 1991.
[20] M. Grohe and G. Schwandtner. The complexity of datalog on linear orders. Logical Methods in Computer Science, 5(1), 2009.
[21] T. Imieliński and W. Lipski, Jr. Incomplete information in relational databases. JACM, 31(4), 1984.
[22] Knowledge Integrity. Two sides to data decay. DM Review, 2003.
[23] P. G. Kolaitis. Schema mappings, data exchange, and metadata management. In PODS, 2005.
[24] M. Koubarakis. Database models for infinite and indefinite temporal information. Inf. Syst., 19(2):141–173, 1994.
[25] M. Koubarakis. The complexity of query evaluation in indefinite temporal constraint databases. TCS, 171(1-2):25–60, 1997.
[26] M. W. Krentel. Generalizations of OptP to the polynomial hierarchy. TCS, 97(2):183–198, 1992.
[27] M. Lenzerini. Data integration: A theoretical perspective. In PODS, 2002.
[28] C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
[29] E. Schwalb and L. Vila. Temporal constraints: A survey. Constraints, 3(2-3):129–149, 1998.
[30] R. T. Snodgrass. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann, 1999.
[31] R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. JCSS, 54(1), 1997.
[32] R. van der Meyden. Logical approaches to incomplete information: A survey. In J. Chomicki and G. Saake, editors, Logics for Databases and Information Systems. Kluwer, 1998.
[33] V. Vianu. Dynamic functional dependencies and database aging. J. ACM, 34(1):28–59, 1987.
[34] H. Zhang, Y. Diao, and N. Immerman. Recognizing patterns in streams with imprecise timestamps. In VLDB, 2010.