Adina Institute of Science & Technology: Department of Computer Science & Engg. M.Tech CSE-II Sem Lab Manuals MCSE - 203
8 To Study of Web database, Multimedia database, Spatial database.
This is the highest level ER model in that it contains the least granular detail but establishes
the overall scope of what is to be included within the model set. The conceptual ER model
normally defines master reference data entities that are commonly used by the organization.
Developing an enterprise-wide conceptual ER model is useful to support documenting the data
architecture for an organization.
Relational algebra received little attention outside of pure mathematics until the publication
of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for
database query languages. The five primitive operators of Codd's
algebra are the selection, the projection, the Cartesian product (also called the cross product or
cross join), the set union, and the set difference.
An algebra is a formal structure consisting of sets and operations on those sets. Relational
algebra is a formal system for manipulating relations.
For the set operations on relations, both operands must have the same scheme, and the result
has that same scheme.
• R1 ∪ R2 (union) is the relation containing all tuples that appear in R1, R2, or both.
• R1 ∩ R2 (intersection) is the relation containing all tuples that appear in both R1 and R2.
• R1 - R2 (set difference) is the relation containing all tuples of R1 that do not appear in R2.
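As a minimal sketch, assuming each relation is modeled as a Python set of tuples over the same scheme, the three set operations map directly onto Python's set operators. The relation contents are made up for illustration.

```python
# Two hypothetical relations over the same scheme (Name, Status).
R1 = {("diamond", "up"), ("emerald", "up"), ("graphite", "down")}
R2 = {("graphite", "down"), ("ruby", "up")}

union = R1 | R2          # tuples that appear in R1, R2, or both
intersection = R1 & R2   # tuples that appear in both R1 and R2
difference = R1 - R2     # tuples of R1 that do not appear in R2

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Because both operands are sets of same-shaped tuples, each result is again a relation over that scheme.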
Selection
Selects tuples from a relation whose attributes meet the selection criteria, which is normally
expressed as a predicate.
R2 = select(R1,P)
That is, from R1 we create a new relation R2 containing those tuples from R1 that satisfy
(make true) the predicate P.
A predicate is a boolean expression whose operators are the logical connectives (and, or, not)
and arithmetic comparisons (LT, LE, GT, GE, EQ, NE), and whose operands are either domain
names or domain constants.
select(Workstation,Room=633) =
Projection
R2 = project(R1,D1,D2,...Dn)
That is, from the tuples in R1 we create a new relation R2 containing only the domains
D1,D2,..Dn.
project(Server,Name,Status) =
Name Status
==============
diamond up
emerald up
graphite down
ruby up
frito up
project(select(User,Status=UG),Name,Status) =
Name Status
==================
A. Cohn UG
J. Inka UG
R. Kemp UG
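The selection and projection operators can be sketched in a few lines of Python. This is an illustrative model only: relations are assumed to be lists of dictionaries, the predicate is an ordinary Python function, and the Login values and the "PG" row are invented so that the selection has something to filter out.

```python
def select(relation, predicate):
    """Return the tuples of `relation` that satisfy `predicate`."""
    return [row for row in relation if predicate(row)]


def project(relation, *domains):
    """Keep only the named domains, dropping duplicate result tuples."""
    result = []
    for row in relation:
        projected = {d: row[d] for d in domains}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical User relation (Login values and the PG row are assumptions).
User = [
    {"Login": "acohn", "Name": "A. Cohn", "Status": "UG"},
    {"Login": "jinka", "Name": "J. Inka", "Status": "UG"},
    {"Login": "rkemp", "Name": "R. Kemp", "Status": "UG"},
    {"Login": "psmith", "Name": "P. Smith", "Status": "PG"},
]

# Equivalent of project(select(User,Status=UG),Name,Status).
undergrads = project(select(User, lambda r: r["Status"] == "UG"), "Name", "Status")
for row in undergrads:
    print(row["Name"], row["Status"])
```

Nesting the calls mirrors the nested algebra expression: the selection runs first and its result feeds the projection.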
Join
R3 = join(R1,D1,R2,D2)
Given a domain from each relation, join considers all possible pairs of tuples from the two
relations, and if their values for the chosen domains are equal, it adds a tuple to the result
containing all the attributes of both tuples (discarding the duplicate domain D2).
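A simple way to picture this join is a nested loop over both relations. The sketch below assumes relations are lists of dictionaries and that, apart from the join domains, the two relations use distinct attribute names. The Workstation rows and the Server column of Workstation are hypothetical.

```python
def join(r1, d1, r2, d2):
    """Equi-join r1 and r2 on r1[d1] == r2[d2], discarding the duplicate domain d2."""
    result = []
    for t1 in r1:                      # consider every pair of tuples
        for t2 in r2:
            if t1[d1] == t2[d2]:       # values for the chosen domains are equal
                merged = dict(t1)
                for name, value in t2.items():
                    if name != d2:     # drop the duplicate join domain
                        merged[name] = value
                result.append(merged)
    return result


# Hypothetical relations for illustration.
Workstation = [
    {"Host": "coke", "Room": 633, "Server": "diamond"},
    {"Host": "pepsi", "Room": 634, "Server": "ruby"},
]
Server = [
    {"Name": "diamond", "Status": "up"},
    {"Name": "ruby", "Status": "up"},
    {"Name": "graphite", "Status": "down"},
]

joined = join(Workstation, "Server", Server, "Name")
for row in joined:
    print(row)
```

The server "graphite" matches no workstation, so it contributes no tuple to the result.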
Relational calculus
Relational calculus consists of two calculi, the tuple relational calculus and the domain
relational calculus, that are part of the relational model for databases and provide a
declarative way to specify database queries. This is in contrast to the relational algebra, which is
also part of the relational model but provides a more procedural way of specifying queries.
The relational algebra might suggest these steps to retrieve the phone numbers and names of
book stores that supply Some Sample Book:
• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key. Records should not depend on anything other
than a table's primary key (a compound key, if necessary). For example, consider a
customer's address in an accounting system. The address is needed by the Customers
table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections
tables. Instead of storing the customer's address as a separate entry in each of these
tables, store it in one place, either in the Customers table or in a separate Addresses
table.
Values in a record that are not part of that record's key do not belong in the table. In general,
any time the contents of a group of fields may apply to more than a single record in the table,
consider placing those fields in a separate table.
For example, in an Employee Recruitment table, a candidate's university name and address
may be included. But you need a complete list of universities for group mailings. If university
information is stored in the Candidates table, there is no way to list universities with no
current candidates. Create a separate Universities table and link it to the Candidates table with
a university code key.
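The university example can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column names, university names, and candidate row are invented, and an in-memory database stands in for a real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Universities (
        code    TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE Candidates (
        id              INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        university_code TEXT REFERENCES Universities(code)
    );
    INSERT INTO Universities VALUES ('U1', 'North University', '1 College Road');
    INSERT INTO Universities VALUES ('U2', 'South University', '2 Campus Lane');
    INSERT INTO Candidates VALUES (1, 'A. Candidate', 'U1');
""")

# Every university can be listed, including those with no current candidates.
all_universities = conn.execute(
    "SELECT name FROM Universities ORDER BY name"
).fetchall()

# Universities that currently have no candidates at all.
no_candidates = conn.execute("""
    SELECT u.name
    FROM Universities AS u
    LEFT JOIN Candidates AS c ON c.university_code = u.code
    WHERE c.id IS NULL
""").fetchall()

print(all_universities)
print(no_candidates)
```

With university data kept only inside the Candidates table, the second query would have nothing to return, because "South University" would not be stored anywhere.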
Generally, the query optimizer cannot be accessed directly by users: once queries are
submitted to the database server and parsed by the parser, they are passed to the query
optimizer, where optimization occurs. However, some database engines allow guiding the
query optimizer with hints.
Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node
encapsulates a single operation that is required to execute the query. The nodes are arranged
as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node
has zero or more child nodes—those are nodes whose output is fed as input to the parent
node. For example, a join node will have two child nodes, which represent the two join
operands, whereas a sort node would have a single child node (the input to be sorted). The
leaves of the tree are nodes which produce results by scanning the disk, for example by
performing an index scan or a sequential scan.
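The tree of plan nodes can be sketched with three small Python classes. The class names, the `rows()` method, and the in-memory "tables" are assumptions made for illustration; a real DBMS executor is considerably more elaborate.

```python
class SeqScan:
    """Leaf node: produces rows by scanning a table (here, an in-memory list)."""
    def __init__(self, table):
        self.table = table

    def rows(self):
        yield from self.table


class Sort:
    """Node with a single child: sorts its input on one column."""
    def __init__(self, child, key):
        self.child, self.key = child, key

    def rows(self):
        yield from sorted(self.child.rows(), key=lambda r: r[self.key])


class NestedLoopJoin:
    """Node with two children: joins its operands on equal column values."""
    def __init__(self, left, right, left_key, right_key):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key

    def rows(self):
        for l in self.left.rows():
            for r in self.right.rows():
                if l[self.left_key] == r[self.right_key]:
                    yield {**l, **r}


workstations = [{"host": "ws2", "server": "ruby"}, {"host": "ws1", "server": "diamond"}]
servers = [{"name": "diamond", "status": "up"}, {"name": "ruby", "status": "up"}]

# Rows flow from the scans at the leaves, through the join, up to the sort at the root.
plan = Sort(
    NestedLoopJoin(SeqScan(workstations), SeqScan(servers), "server", "name"),
    key="host",
)
result = list(plan.rows())
for row in result:
    print(row)
```

Each node only knows its children, so swapping the join algorithm or adding another operator changes one node without touching the rest of the tree.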
Join ordering
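The performance of a query plan is determined largely by the order in which the tables are
joined. Most query optimizers determine join order with a dynamic programming algorithm
pioneered by IBM's System R database project. The algorithm works in stages: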
1. First, all ways to access each relation in the query are computed. Every relation in the
query can be accessed via a sequential scan. If there is an index on a relation that can be
used to answer a predicate in the query, an index scan can also be used. For each
relation, the optimizer records the cheapest way to scan the relation, as well as the
cheapest way to scan the relation that produces records in a particular sorted order.
2. The optimizer then considers combining each pair of relations for which a join condition
exists. For each pair, the optimizer will consider the available join algorithms
implemented by the DBMS. It will preserve the cheapest way to join each pair of
relations, in addition to the cheapest way to join each pair of relations that produces its
output according to a particular sort order.
3. Then all three-relation query plans are computed, by joining each two-relation plan
produced by the previous phase with the remaining relations in the query.
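The staged search can be illustrated with a deliberately simplified sketch. Everything numeric here is an assumption: the row counts, the join selectivities, and the toy cost model (cost = rows read plus rows produced). The sketch builds only left-deep plans, skips cross products, and ignores sort orders and the choice among join algorithms that a real optimizer would also track.

```python
from itertools import combinations

# Hypothetical statistics: row counts and join-condition selectivities.
cardinality = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1}


def join_selectivity(joined, rel):
    """Combined selectivity of the join conditions linking rel to the joined set,
    or None when no join condition exists."""
    sel, connected = 1.0, False
    for other in joined:
        pair = frozenset({other, rel})
        if pair in selectivity:
            sel *= selectivity[pair]
            connected = True
    return sel if connected else None


def best_join_order(relations):
    # Stage 1: cheapest access path per relation (here simply a full scan).
    best = {frozenset({r}): (cardinality[r], cardinality[r], r) for r in relations}
    # Later stages: extend each smaller plan with one remaining relation.
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            for rel in sorted(subset):
                rest = subset - {rel}
                if rest not in best:
                    continue
                sel = join_selectivity(rest, rel)
                if sel is None:          # no join condition: skip cross products
                    continue
                cost, rows, plan = best[rest]
                out_rows = rows * cardinality[rel] * sel
                total = cost + cardinality[rel] + out_rows
                if subset not in best or total < best[subset][0]:
                    best[subset] = (total, out_rows, f"({plan} JOIN {rel})")
    return best[frozenset(relations)]


cost, rows, plan = best_join_order(["A", "B", "C"])
print(plan, cost, rows)
```

Only the cheapest plan per set of relations is kept, which is what stops the search from enumerating every possible ordering.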
A SQL query to a modern relational DBMS does more than just selections and joins. In
particular, SQL queries often nest several layers of SPJ blocks (Select-Project-Join), by means of
group by, exists, and not exists operators. In some cases such nested SQL queries can be
flattened into a select-project-join query, but not always.
Cost estimation
One of the hardest problems in query optimization is to accurately estimate the costs of
alternative query plans. Optimizers cost query plans using a mathematical model of query
execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing
through each edge in a query plan. Cardinality estimation in turn depends on estimates of the
selection factor of predicates in the query. Traditionally, database systems estimate
selectivities through fairly detailed statistics on the distribution of values in each column, such
as histograms.
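As a rough sketch of histogram-based estimation, the code below assumes an equi-width histogram on a hypothetical integer column `age` and assumes values are spread uniformly inside each bucket. The bucket boundaries and counts are invented.

```python
# Hypothetical histogram: (low, high, row_count), low inclusive, high exclusive.
histogram = [(0, 20, 100), (20, 40, 500), (40, 60, 300), (60, 80, 100)]
total_rows = sum(count for _, _, count in histogram)


def estimate_selectivity(lo, hi):
    """Estimate the fraction of rows with lo <= age < hi,
    assuming a uniform distribution inside each bucket."""
    matching = 0.0
    for low, high, count in histogram:
        overlap = max(0, min(hi, high) - max(lo, low))   # width of the overlap
        matching += count * overlap / (high - low)
    return matching / total_rows


sel = estimate_selectivity(30, 50)
estimated_rows = sel * total_rows      # cardinality estimate for the predicate
print(sel, estimated_rows)
```

The estimated row count is the kind of number an optimizer would feed into its cost model for the plan edges above this predicate.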
AIM: Study of Distributed databases.
A distributed database is a database, controlled by a distributed database management
system, in which storage devices are not all attached to a common processing unit such as
a CPU (the two together are sometimes called a distributed database system). It may be stored in
multiple computers, located in the same physical location; or may be dispersed over a network
of interconnected computers. Unlike parallel systems, in which the processors are tightly
coupled and constitute a single database system, a distributed database system consists of
loosely-coupled sites that share no physical components. System administrators can distribute
collections of data (e.g. in a database) across multiple physical locations. A distributed
database can reside on network servers on the Internet, on corporate intranets or extranets,
or on other company networks. Because they store data across multiple computers,
distributed databases can improve performance at end-user worksites by allowing
transactions to be processed on many machines, instead of being limited to one.
Architecture
A database user accesses the distributed database through:
Advantages
Disadvantages
• Complexity: DBAs may have to do extra work to ensure that the distributed nature of
the system is transparent. Extra work must also be done to maintain multiple disparate
systems instead of one big one. Extra database design work must also be done to
account for the disconnected nature of the database; for example, joins become
prohibitively expensive when performed across multiple systems.
• Economics: increased complexity and a more extensive infrastructure mean extra
labour costs.
• Security: remote database fragments must be secured, and because they are not centralized,
the remote sites must be secured as well. The infrastructure must also be secured (for
example, by encrypting the network links between remote sites).
• Difficult to maintain integrity: in a distributed database, enforcing integrity over a
network may require too much of the network's resources to be feasible.
• Inexperience: distributed databases are difficult to work with, and in such a young
field there is not much readily available experience in "proper" practice.
AIM: Study of Web databases, multimedia databases,
and spatial databases.
Web Databases
A web database is a database that can be queried and/or updated through the World Wide
Web (WWW). As web technologies have evolved, the WWW has become the preferred
medium for many applications, e.g., e-commerce and digital libraries. These applications use
information that is stored in huge databases and can only be retrieved by issuing direct
queries to the back-end databases. Database-driven web sites have their own interfaces and
access forms that create HTML pages on-the-fly. Web database technologies define the way
that these forms can connect to and retrieve data from database servers.
Database Connectivity:
Querying the database via direct SQL is one of the most common ways to access data from a
database. The need for a standard way to query different databases arises because the number of
existing database servers is very large and they differ in their query interfaces.
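In Python, one such standard interface is the DB-API 2.0 (PEP 249), which drivers for different database servers implement. The sketch below uses the built-in sqlite3 driver with an in-memory database. The `products` table and the `find_products` helper are hypothetical, and the `?` placeholder style is the one sqlite3 uses; other drivers may use a different placeholder style.

```python
import sqlite3  # other DB-API 2.0 drivers offer the same connect/cursor/execute pattern


def find_products(conn, max_price):
    """Run a parameterized SQL query through the standard cursor interface."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name, price FROM products WHERE price <= ? ORDER BY price",
        (max_price,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("pen", 1.5), ("book", 12.0), ("lamp", 30.0)],
)
conn.commit()

cheap = find_products(conn, 15)
print(cheap)
```

A web form handler would call something like `find_products` and render the returned rows into an HTML page.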
Multimedia database
Like the traditional databases, Multimedia databases should address the following
requirements:
• Integration: data items do not need to be duplicated for different program invocations.
• Data independence: the database and its management are separated from the application programs.
• Concurrency control: concurrent transactions are allowed.
• Persistence: data objects can be saved and re-used by different transactions and program invocations.
• Privacy: access and authorization control.
• Integrity control: database consistency is ensured between transactions.
• Recovery: failures of transactions should not affect the persistent data storage.
• Query support: multimedia data can be queried easily.
Application areas
Examples of multimedia database application areas:
Digital Libraries
News-on-Demand
Video-on-Demand
Music database
Geographic Information Systems (GIS)
Telemedicine
Spatial database
A spatial database is a database that is optimized to store and query data that represents
objects defined in a geometric space. Most spatial databases allow representing simple
geometric objects such as points, lines and polygons. Some spatial databases handle more
complex structures such as 3D objects, topological coverages, linear networks, and TINs.
These are typically called geometry or feature. The Open Geospatial Consortium created the
Simple Features specification and sets standards for adding spatial functionality to database
systems.
Features of spatial databases
Database systems use indexes to quickly look up values and the way that most databases index
data is not optimal for spatial queries. Instead, spatial databases use a spatial index to speed up
database operations.
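To show why a spatial index helps, here is a minimal sketch of a uniform grid index in Python. The grid is a stand-in chosen for brevity (production spatial databases commonly use structures such as R-trees), and the cell size, class name, and sample points are all assumptions.

```python
from collections import defaultdict

CELL = 10.0  # hypothetical cell size, in the same units as the coordinates


def cell_of(x, y):
    """Map a coordinate to the grid cell that contains it."""
    return (int(x // CELL), int(y // CELL))


class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)

    def insert(self, name, x, y):
        self.cells[cell_of(x, y)].append((name, x, y))

    def query_box(self, xmin, ymin, xmax, ymax):
        """Return names of points inside the box, visiting only overlapping cells."""
        cx0, cy0 = cell_of(xmin, ymin)
        cx1, cy1 = cell_of(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for name, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)


index = GridIndex()
index.insert("a", 1, 1)
index.insert("b", 12, 3)
index.insert("c", 55, 70)

nearby = index.query_box(0, 0, 15, 5)
print(nearby)
```

The query touches only the two cells that overlap the box, so point "c" is never examined. An ordinary index on x or y alone cannot prune in both dimensions this way.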
In addition to typical SQL queries such as SELECT statements, spatial databases can perform a
wide variety of spatial operations. The following operations and many more are specified by the
Open Geospatial Consortium standard:
• Spatial Measurements: Compute line length, polygon area, the distance between
geometries, etc.
• Spatial Functions: Modify existing features to create new ones, for example by providing
a buffer around them, intersecting features, etc.
• Spatial Predicates: Allow true/false queries about spatial relationships between
geometries. Examples include "do two polygons overlap" or "is there a residence located
within a mile of the area we are planning to build the landfill?" (see DE-9IM)
• Geometry Constructors: Create new geometries, usually by specifying the vertices
(points or nodes) which define the shape.
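The kinds of operations listed above can be illustrated in plain Python on planar coordinates. The function names below are illustrative and are not the names defined by the Open Geospatial Consortium standard. Geometries are assumed to be lists of (x, y) tuples, and the polygon is assumed not to self-intersect.

```python
import math


def line_length(points):
    """Spatial measurement: total length of a polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def polygon_area(vertices):
    """Spatial measurement: area by the shoelace formula (vertices given in order)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def within_distance(p, q, limit):
    """Spatial predicate: is point p within `limit` units of point q?"""
    return math.dist(p, q) <= limit


# Geometry construction: shapes are defined by listing their vertices.
road = [(0, 0), (3, 4), (3, 10)]
plot = [(0, 0), (4, 0), (4, 3), (0, 3)]

print(line_length(road))
print(polygon_area(plot))
print(within_distance((0, 0), (3, 4), 5))
```

A spatial database exposes operations of this kind inside SQL, so they can be combined with ordinary predicates and joins.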
AIM: Study of Object Oriented Databases.
Object oriented databases are also called Object Database Management Systems (ODBMS).
Object databases store objects rather than data such as integers, strings or real numbers.
Objects are used in object oriented languages such as Smalltalk, C++, Java, and others. Objects
basically consist of the following:
• Attributes - Attributes are data which defines the characteristics of an object. This data
may be simple such as integers, strings, and real numbers or it may be a reference to a
complex object.
• Methods - Methods define the behavior of an object and are what were formerly called
procedures or functions.
data as objects gives an advantage to those companies that are geared towards multimedia
presentation or organizations that utilize computer-aided design (CAD). Some object-oriented
databases are designed to work well with object-oriented programming languages such as
Delphi, Ruby, Python, Perl, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk; others
have their own programming languages. OODBMSs use exactly the same model as object-
oriented programming languages.
Technical features
Most object databases also offer some kind of query language, allowing objects to be found
using a declarative programming approach. It is in the area of object query languages, and the
integration of the query and navigational interfaces, that the biggest differences between
products are found. An attempt at standardization was made by the ODMG with the Object
Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular
implementation of a relational database). This is because an object can be retrieved directly
without a search, by following pointers.
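The difference between following a pointer and performing a join can be sketched with ordinary in-memory Python objects. No object database product is involved here, and the class names and data are invented for illustration.

```python
class Department:
    def __init__(self, name):
        self.name = name


class Employee:
    def __init__(self, name, department):
        self.name = name
        self.department = department   # direct object reference, not a foreign key


sales = Department("Sales")
alice = Employee("Alice", sales)

# Object style: navigate by following the reference.
dept_name_by_pointer = alice.department.name

# Relational style, for comparison: match a foreign key against another table.
departments = [{"id": 1, "name": "Sales"}]
employees = [{"name": "Alice", "dept_id": 1}]
dept_name_by_join = next(
    d["name"]
    for e in employees
    for d in departments
    if e["name"] == "Alice" and e["dept_id"] == d["id"]
)

print(dept_name_by_pointer, dept_name_by_join)
```

Both approaches give the same answer. The object version reaches it in a single attribute access, while the relational version has to search for the matching department row.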
Another area of variation between products is in the way that the schema of a database is
defined. A general characteristic, however, is that the programming language and the
database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are
responsible for its correct interpretation.
AIM: Study of Object Data Management Group and
OO languages.
The Object Data Management Group (ODMG) was conceived in the summer of 1991 at a
breakfast with object database vendors that was organized by Rick Cattell of Sun
Microsystems. In 1998, the ODMG changed its name from the Object Database Management
Group to the Object Data Management Group to reflect the expansion of its efforts to include
specifications for both object database
and object-relational mapping products. The primary goal of the ODMG was to put forward a
set of specifications that allowed a developer to write portable applications for object
database and object-relational mapping products. In order to do that, the data schema,
programming language bindings, and data manipulation and query languages needed to be
portable.
1. Encapsulation. Functions that are internal to a class can be marked as “private”. This
means that they’re hidden from any code outside the class, so their implementation can
be changed without bothering any code that uses the class. Conversely, the methods
that are marked “public” form a well-defined interface that should not be changed
without due consideration, because client code relies on it.
2. Inheritance. You can derive one class from another, and the new class automatically
contains all of the methods and data of the original class. This is useful when some
subset of your objects needs an additional capability, but you don’t want to give that
capability to all of the other objects.
3. Polymorphism. Polly who? It’s a Greek-derived term that means “many forms”. In OOP,
it means that sending the same message (in most OO languages, this means calling a
method by name) may evoke different responses depending on type. A derived class can
override an inherited method
with its own implementation, so that sending the same message (calling the same
function) on two different objects yields a different behavior depending on their types.
A second type of polymorphism is method overloading (often classed as “ad hoc
polymorphism”), which means that
a class provides different implementations for a method depending on the types of
parameters passed to it.
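The three ideas can be seen together in a short Python sketch. The class names are invented. Python also has no enforced "private" keyword: a leading underscore marks an attribute as private by convention only, which is an assumption of this example rather than something every OO language shares.

```python
import math


class Shape:
    def __init__(self, name):
        self._name = name              # private by convention (encapsulation)

    def describe(self):                # public interface that client code relies on
        return f"{self._name} with area {self.area():.2f}"

    def area(self):
        raise NotImplementedError


class Rectangle(Shape):                # inheritance: describe() comes from Shape
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def area(self):                    # override: same message, own implementation
        return self._width * self._height


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2


# Polymorphism: the same call yields different behavior depending on the type.
descriptions = [shape.describe() for shape in (Rectangle(2, 3), Circle(1))]
for text in descriptions:
    print(text)
```

Client code calls only `describe()`, so the way each shape stores its dimensions can change without affecting callers.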
AIM: Study of Data mining and data warehousing.
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an
interdisciplinary subfield of computer science, is the computational process of discovering
patterns in large data sets involving methods at the intersection of artificial intelligence,
machine learning, statistics, and database systems. The overall goal of the data mining process
is to extract information from a data set and transform it into an understandable structure for
further use. Aside from the raw analysis step, it involves database and data management
aspects, data pre-processing, model and inference considerations, interestingness metrics,
complexity considerations, post-processing of discovered structures, visualization, and online
updating.
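As one small illustration of discovering a pattern in a data set, the sketch below counts which pairs of items are frequently bought together. Frequent-pair counting is chosen here only as a simple example; the text above does not prescribe a technique. The transactions and the support threshold are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs", "milk"},
]
min_support = 3  # a pair must appear in at least this many baskets

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent_pairs)
```

The raw counts are the analysis step. Choosing the threshold and presenting the surviving pairs correspond to the interestingness and post-processing aspects mentioned above.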
The term is a buzzword, and is frequently misused to mean any form of large-scale data or
information processing (collection, extraction, warehousing, analysis, and statistics) but is also
generalized to any kind of computer decision support system, including artificial intelligence,
machine learning, and business intelligence.
Data warehouse
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a
system used for reporting and data analysis. Integrating data from one or more disparate
sources creates a central repository of data, a data warehouse (DW). Data warehouses store
current and historical data and are used for creating trending reports for senior management
reporting such as annual and quarterly comparisons.
Benefits of a data warehouse
A data warehouse maintains a copy of information from the source transaction
systems. This architectural complexity provides the opportunity to:
• Congregate data from multiple sources into a single database so a single query engine
can be used to present data.
• Mitigate the problem of database isolation level lock contention in transaction
processing systems caused by attempts to run large, long running, analysis queries in
transaction processing databases.
• Maintain data history, even if the source transaction systems do not.
• Integrate data from multiple source systems, enabling a central view across the
enterprise. This benefit is always valuable, but particularly so when the organization has
grown by merger.
• Improve data quality, by providing consistent codes and descriptions, flagging or even
fixing bad data.
• Present the organization's information consistently.
• Provide a single common data model for all data of interest regardless of the data's
source.
• Restructure the data so that it makes sense to the business users.
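Several of these benefits (congregating sources, consistent codes, flagging bad data) can be illustrated with a small sketch using Python's built-in sqlite3 module. The two source systems, their country encodings, and the table layout are all hypothetical.

```python
import sqlite3

# Two hypothetical source systems that encode the same country differently.
sales_system = [("C1", "US"), ("C2", "IN")]
support_system = [("C3", "United States"), ("C4", "India"), ("C5", "??")]

# Mapping to one consistent set of codes for the warehouse.
CODE_MAP = {"US": "US", "IN": "IN", "United States": "US", "India": "IN"}

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer (id TEXT, country TEXT, source TEXT, bad_data INTEGER)"
)
for source, rows in (("sales", sales_system), ("support", support_system)):
    for cust_id, raw in rows:
        code = CODE_MAP.get(raw)           # None marks a value that cannot be mapped
        warehouse.execute(
            "INSERT INTO customer VALUES (?, ?, ?, ?)",
            (cust_id, code, source, int(code is None)),
        )

# A single query engine now answers questions across both sources.
by_country = warehouse.execute(
    "SELECT country, COUNT(*) FROM customer WHERE bad_data = 0 "
    "GROUP BY country ORDER BY country"
).fetchall()
flagged = warehouse.execute(
    "SELECT id FROM customer WHERE bad_data = 1"
).fetchall()

print(by_country)
print(flagged)
```

The unmappable row is kept and flagged rather than dropped, so the warehouse still holds a complete copy of what the source systems contained.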