Query Processing in Distributed Database
Query decomposition: takes a query expressed on global relations and performs partial optimization using centralized query optimization (QO) techniques. The output is a relational algebra tree (RAT) based on global relations.
Data localization: takes into account how the data has been distributed, replacing the global relations at the leaves of the RAT with their reconstruction algorithms.
Global optimization: uses statistical information to find a near-optimal execution plan. The output is an execution strategy based on fragments, with communication primitives added.
Local optimization: each local DBMS performs its own local optimization using centralized QO techniques.
Data Localization
In query processing (QP), the query is represented as a relational algebra tree (RAT) and transformation rules are used to restructure the tree into an equivalent form that improves processing. In distributed query processing (DQP), data distribution must also be considered: the global relations at the leaves of the tree are replaced by their reconstruction algorithms, i.e. the relational algebra operations that reconstruct the global relations from their fragments:
For horizontal fragmentation, the reconstruction algorithm is union; for vertical fragmentation, it is join.
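As a minimal sketch, assuming relations are held in memory as lists of Python dicts (not how a DBMS stores them), the two reconstruction algorithms can be written as a duplicate-eliminating union and a join on the relation's key. The function names and the sample Staff rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]  # one tuple: attribute name -> value


def reconstruct_horizontal(fragments: List[List[Row]]) -> List[Row]:
    """Union of horizontal fragments (duplicates removed, first occurrence kept)."""
    seen = set()
    result: List[Row] = []
    for fragment in fragments:
        for row in fragment:
            signature = tuple(sorted(row.items()))  # hashable form of the row
            if signature not in seen:
                seen.add(signature)
                result.append(row)
    return result


def reconstruct_vertical(fragments: List[List[Row]], key: str) -> List[Row]:
    """Join vertical fragments on the relation's key (every fragment holds the key)."""
    joined = {row[key]: dict(row) for row in fragments[0]}
    for fragment in fragments[1:]:
        by_key = {row[key]: row for row in fragment}
        # Keep only keys present on both sides, merging their attributes
        joined = {k: {**row, **by_key[k]} for k, row in joined.items() if k in by_key}
    return list(joined.values())


# Hypothetical sample data
staff_b003 = [{"staffNo": "SG37", "branchNo": "B003"}]
staff_other = [{"staffNo": "SL21", "branchNo": "B005"}]
staff_h = reconstruct_horizontal([staff_b003, staff_other])

pay = [{"staffNo": "SG37", "salary": 12000}, {"staffNo": "SL21", "salary": 30000}]
names = [{"staffNo": "SG37", "fName": "Ann"}, {"staffNo": "SL21", "fName": "John"}]
staff_v = reconstruct_vertical([pay, names], key="staffNo")
```

Union needs no join attribute because horizontal fragments share a schema; the vertical case relies on every fragment carrying the key, which is why vertical fragments normally include it.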
Reduction techniques are then used to generate a simpler, optimized query. The following types of fragmentation are considered: primary horizontal fragmentation, vertical fragmentation, and derived fragmentation.
For primary horizontal fragmentation: if a selection predicate contradicts the definition of a fragment, that fragment produces an empty intermediate relation and the operations on it can be eliminated. For a join, commute the join with the union, then examine each individual join to determine whether any are useless and can be eliminated from the result. A join is useless if the fragment predicates do not overlap.
Example fragments: P1, P2, P3
B1: $\sigma_{branchNo='B003'}(Branch)$
B2: $\sigma_{branchNo \neq 'B003'}(Branch)$
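The sketch below illustrates both reductions for simple predicates of the form `attr = value` or `attr != value`. It is not a general predicate-satisfiability test. The `Pred` type, the helper names, and the definitions given to P1, P2 and P3 are assumptions made for illustration; only B1 and B2 follow the Branch fragments listed above.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class Pred(NamedTuple):
    """A simple predicate: attr = value or attr != value."""
    attr: str
    op: str  # "=" or "!="
    value: str


def contradicts(p: Pred, q: Pred) -> bool:
    """True if no tuple can satisfy both p and q."""
    if p.attr != q.attr:
        return False  # different attributes: treated as compatible
    if p.op == "=" and q.op == "=":
        return p.value != q.value
    if {p.op, q.op} == {"=", "!="}:
        return p.value == q.value
    return False  # two != predicates: assumes the domain has more than two values


Fragments = Dict[str, List[Pred]]  # fragment name -> conjunction of predicates


def reduce_selection(fragments: Fragments, selection: Pred) -> List[str]:
    """Drop fragments whose definition contradicts the selection predicate."""
    return [name for name, preds in fragments.items()
            if not any(contradicts(p, selection) for p in preds)]


def reduce_join(left: Fragments, right: Fragments) -> List[Tuple[str, str]]:
    """After commuting join with union, keep only pairs whose predicates overlap.

    Assumes attribute names shared by both sides denote the join attribute.
    """
    return [(ln, rn) for (ln, lp), (rn, rp) in product(left.items(), right.items())
            if not any(contradicts(a, b) for a in lp for b in rp)]


# Hypothetical definitions for P1..P3; B1 and B2 as in the example fragments
P = {
    "P1": [Pred("branchNo", "=", "B003"), Pred("type", "=", "House")],
    "P2": [Pred("branchNo", "=", "B003"), Pred("type", "=", "Flat")],
    "P3": [Pred("branchNo", "!=", "B003")],
}
B = {"B1": [Pred("branchNo", "=", "B003")], "B2": [Pred("branchNo", "!=", "B003")]}
```

For example, a selection on `type = 'House'` keeps P1 and P3 only, and the join of P with B on branchNo reduces from six partial joins to three: (P1, B1), (P2, B1) and (P3, B2).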
For vertical fragmentation, reduction involves removing those vertical fragments that have no attributes in common with the projection attributes, apart from the key of the relation.
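Example query: SELECT fName, lName FROM Staff; where Staff is vertically fragmented as
S1: $\Pi_{staffNo, position, sex, DOB, salary}(Staff)$
S2: $\Pi_{staffNo, fName, lName, branchNo}(Staff)$
S1 shares only the key staffNo with the projection attributes, so it can be removed and the query evaluated on S2 alone.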
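A minimal sketch of this reduction, assuming each vertical fragment is described only by its attribute set; the function name is hypothetical.

```python
from typing import Dict, List, Set


def reduce_vertical(fragments: Dict[str, Set[str]], projection: Set[str], key: str) -> List[str]:
    """Keep vertical fragments that share a non-key attribute with the projection."""
    useful = [name for name, attrs in fragments.items() if (attrs - {key}) & projection]
    # If only the key is projected, any single fragment suffices (each holds the key)
    return useful or list(fragments)[:1]


staff_fragments = {
    "S1": {"staffNo", "position", "sex", "DOB", "salary"},
    "S2": {"staffNo", "fName", "lName", "branchNo"},
}
needed = reduce_vertical(staff_fragments, projection={"fName", "lName"}, key="staffNo")
```

Here `needed` is `["S2"]`, matching the reduction described above. The key is subtracted before testing the overlap because every fragment carries it, so sharing the key alone never makes a fragment necessary.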
For derived fragmentation, use the transformation rule that allows join and union to be commuted. Because the fragmentation of one relation is based on that of the other, some of the partial joins produced by commuting are known in advance to be redundant (empty) and can be eliminated.
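The sketch below assumes the standard definition of a derived fragment as a semijoin of the member relation (Staff) with one fragment of the owner relation (Branch). Because the owner fragments are disjoint, only the matching pairs (Bi, Si) can produce tuples. All names and data are hypothetical; derived fragments are keyed by the name of the owner fragment they come from.

```python
from typing import Dict, List

Row = Dict[str, object]


def semijoin(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    """R semijoin S on attr: the rows of r that have a match in s."""
    values = {row[attr] for row in s}
    return [row for row in r if row[attr] in values]


def join(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    """Nested-loop equijoin on attr."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def reduced_join(owner: Dict[str, List[Row]], member: Dict[str, List[Row]], attr: str) -> List[Row]:
    """Join only owner_i with member_i; cross pairs are empty when owners are disjoint."""
    result: List[Row] = []
    for name, owner_rows in owner.items():
        result.extend(join(owner_rows, member[name], attr))
    return result


# Hypothetical data: Branch fragmented on branchNo, Staff fragmented derived from it
branch = [{"branchNo": "B003", "city": "Glasgow"}, {"branchNo": "B005", "city": "London"}]
staff = [{"staffNo": "SG37", "branchNo": "B003"},
         {"staffNo": "SL21", "branchNo": "B005"},
         {"staffNo": "SG14", "branchNo": "B003"}]
B = {"B1": [b for b in branch if b["branchNo"] == "B003"],
     "B2": [b for b in branch if b["branchNo"] != "B003"]}
# Derived fragment for owner fragment Bi: Staff semijoin Bi
S = {name: semijoin(staff, rows, "branchNo") for name, rows in B.items()}
```

`reduced_join(B, S, "branchNo")` performs two partial joins instead of four and yields the same rows as joining the unfragmented relations; the skipped cross pairs (B1 with S2, B2 with S1) are empty.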
Global Optimization
The objective of this layer is to take the reduced query plan from the data localization layer and find a near-optimal execution strategy. In a distributed environment, the speed of the network has to be considered when comparing strategies. If the topology is known to be a WAN, all costs other than network (communication) costs could be ignored. A LAN is typically much faster than a WAN, but still slower than disk access.
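To make the WAN/LAN distinction concrete, here is a sketch under a simple assumed cost model (not given in the text above): each message costs a fixed startup time plus its size divided by the transmission rate. All sizes, rates, and local processing times are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Network:
    startup_s: float   # fixed cost of sending one message, in seconds (assumed)
    bits_per_s: float  # transmission rate (assumed)

    def transfer_time(self, n_bits: float) -> float:
        """Time to ship one message of n_bits: startup + size / rate."""
        return self.startup_s + n_bits / self.bits_per_s


def strategy_time(net: Network, messages_bits: List[float], local_s: float, wan: bool) -> float:
    """Estimated response time; on a WAN only communication cost is counted."""
    comm = sum(net.transfer_time(b) for b in messages_bits)
    return comm if wan else comm + local_s


# Hypothetical sizes: R is 10^7 bits, S is 10^5 bits, stored at different sites
R_BITS, S_BITS = 1e7, 1e5
wan_net = Network(startup_s=1.0, bits_per_s=1e4)
lan_net = Network(startup_s=0.001, bits_per_s=1e8)

# Strategy A ships R to S's site; strategy B ships S to R's site
# (hypothetical local processing times: A 0.5 s, B 2.0 s)
wan_a = strategy_time(wan_net, [R_BITS], local_s=0.5, wan=True)   # 1001.0
wan_b = strategy_time(wan_net, [S_BITS], local_s=2.0, wan=True)   # 11.0
lan_a = strategy_time(lan_net, [R_BITS], local_s=0.5, wan=False)
lan_b = strategy_time(lan_net, [S_BITS], local_s=2.0, wan=False)
```

On the slow network, shipping the smaller relation (B) wins by a wide margin and local costs are negligible. On the fast network, communication becomes cheap enough that the assumed local processing times decide the outcome and A wins, which is why local costs cannot be ignored there.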
Oracle does not support the type of fragmentation discussed previously, although the DBA can distribute data to achieve a similar effect. Thus, fragmentation transparency is not supported, although location transparency is. The topics discussed are: connectivity; global database names and database links; transactions; referential integrity; heterogeneous distributed databases; and distributed query optimization.
Database Links
Database links are used to build distributed databases. A link defines a communication path from one Oracle database to another (possibly non-Oracle) database and acts as a type of remote login to the remote database.
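```sql
-- Create a public database link to the remote database RENTALS.GLASGOW.NORTH.COM
CREATE PUBLIC DATABASE LINK RENTALS.GLASGOW.NORTH.COM;

-- Query a remote table through the link
SELECT * FROM Staff@RENTALS.GLASGOW.NORTH.COM;

-- Update a remote table through the link
UPDATE Staff@RENTALS.GLASGOW.NORTH.COM SET salary = salary*1.05;
```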
In a heterogeneous distributed database, one or more of the local DBMSs is not Oracle. Oracle Heterogeneous Services, together with a non-Oracle system-specific agent, can hide the distribution and heterogeneity. The non-Oracle system can be accessed through transparent gateways or generic connectivity.
(Figures: Transparent Gateways; Generic Connectivity.)
Distributed Query Optimization
A distributed query is decomposed by the local Oracle DBMS into a number of remote queries, which are sent to the remote DBMSs for execution. The remote DBMSs execute the queries and send the results back to the local node, which then performs any necessary post-processing and returns the results to the user. Only the necessary data is extracted from the remote tables, reducing the amount of data that must be transferred.
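The flow above can be sketched as follows. This is not Oracle's implementation; it is a hypothetical simulation in which each `RemoteSite` applies the selection and projection itself, so only qualifying rows and requested columns travel back, and the local node merges and sorts the partial results. The class, function, and sample rows are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


class RemoteSite:
    """Hypothetical stand-in for a remote DBMS holding part of a table."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self.rows = rows

    def execute(self, columns: List[str], predicate: Callable[[Row], bool]) -> List[Row]:
        # Selection and projection run remotely, so only needed data is shipped back
        return [{c: row[c] for c in columns} for row in self.rows if predicate(row)]


def run_distributed(sites: List[RemoteSite], columns: List[str],
                    predicate: Callable[[Row], bool], order_by: str) -> Tuple[List[Row], int]:
    """Send one remote query per site, then post-process (merge and sort) locally."""
    collected: List[Row] = []
    for site in sites:
        collected.extend(site.execute(columns, predicate))
    rows_shipped = len(collected)
    # order_by must be one of the projected columns
    return sorted(collected, key=lambda r: r[order_by]), rows_shipped


glasgow = RemoteSite("Glasgow", [
    {"staffNo": "SG37", "fName": "Ann", "salary": 12000},
    {"staffNo": "SG5", "fName": "Susan", "salary": 24000},
])
london = RemoteSite("London", [
    {"staffNo": "SL21", "fName": "John", "salary": 30000},
    {"staffNo": "SL41", "fName": "Julie", "salary": 9000},
])
result, shipped = run_distributed([glasgow, london], ["fName", "salary"],
                                  lambda r: r["salary"] > 20000, order_by="salary")
```

Only two of the four stored rows, and two of their three columns, are transferred; the global sort has to happen at the local node because each site sees only its own rows.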