


Adina Institute of Science & Technology

Department of Computer Science & Engg.


M.Tech CSE-II Sem
Lab Manuals
MCSE - 203
INDEX

List of Practicals: MCSE 203

Sr. No.  Object                                                         Signature

1.   To study the ER model and the relational data model.
2.   To study relational algebra and relational calculus.
3.   To study normalization and normal forms.
4.   To study query processing.
5.   To study query optimization.
6.   To study distributed databases.
7.   To study object-oriented databases.
8.   To study web databases, multimedia databases, and spatial databases.
9.   To study the Object Data Management Group and OO languages.
10.  To study data mining and data warehousing.
AIM: Study of ER model and Relational data model
In software engineering, an entity–relationship model (ER model) is a data model for
describing the data or information aspects of a business domain or its process requirements,
in an abstract way that lends itself to ultimately being implemented in a database, such as a
relational database. The main components of ER models are entities (things) and the
relationships that can exist among them. Entity–relationship modeling was developed by Peter
Chen and published in a 1976 paper. However, variants of the idea existed previously and
have been devised subsequently, such as supertype and subtype data entities and
commonality relationships. The three-schema approach to software engineering uses three
levels of ER models that may be developed.

Conceptual data model

This is the highest level ER model in that it contains the least granular detail but establishes
the overall scope of what is to be included within the model set. The conceptual ER model
normally defines master reference data entities that are commonly used by the organization.
Developing an enterprise-wide conceptual ER model is useful to support documenting the data
architecture for an organization.

Logical data model

A logical ER model does not require a conceptual ER model, especially if the scope of the
logical ER model includes only the development of a distinct information system. The
logical ER model contains more detail than the conceptual ER model. In addition to
master data entities, operational and transactional data entities are now defined. The
details of each data entity are developed, and the relationships between these data
entities are established. The logical ER model is, however, developed independently of
the technology in which it will be implemented.
Physical data model
One or more physical ER models may be developed from each logical ER model. The
physical ER model is normally developed to be instantiated as a database. Therefore,
each physical ER model must contain enough detail to produce a database and each
physical ER model is technology dependent since each database management system is
somewhat different.
Relational Data Model
The relational data model is the primary data model, used widely around the world for
data storage and processing. This model is simple and has all the properties and capabilities
required to process data with storage efficiency.
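A minimal sketch of this idea in Python (the table and attribute names are invented for illustration): a relation is a schema plus a set of tuples, and every tuple conforms to the schema.

# A relation: a schema (ordered attribute names) plus a set of tuples.
# The student relation and its attributes are hypothetical examples.
schema = ("roll_no", "name", "branch")
students = {
    (1, "Asha", "CSE"),
    (2, "Ravi", "CSE"),
    (3, "Meena", "ECE"),
}

# Every tuple has the same degree (number of attributes) as the schema.
assert all(len(t) == len(schema) for t in students)

for roll_no, name, branch in sorted(students):
    print(roll_no, name, branch)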
AIM: Study of Relational algebra and Relational calculus.
In data management, relational algebra describes how data is naturally organized into sets of
data, aptly so, as data is the documentation of a real-life person, place, or thing and the events
or transactions between them at a point in time. Relational algebra, first described by E.F.
Codd while at IBM, is a family of algebras with a well-founded semantics used for modelling the
data stored in relational databases, and for defining queries on it.

Relational algebra received little attention outside of pure mathematics until the publication
of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for
database query languages. Five primitive operators of Codd's algebra are the selection, the
projection, the Cartesian product (also called the cross product or cross join), the set union,
and the set difference.

An algebra is a formal structure consisting of sets and operations on those sets. Relational
algebra is a formal system for manipulating relations.

• Operands of this algebra are relations.
• Operations of this algebra include the usual set operations (since relations are sets of
tuples), and special operations defined for relations:
o selection
o projection
o join

Set Operations on Relations

For the set operations on relations, both operands must have the same scheme, and the result
has that same scheme (a short Python sketch follows the list).

• R1 U R2 (union) is the relation containing all tuples that appear in R1, R2, or both.
• R1 n R2 (intersection) is the relation containing all tuples that appear in both R1 and R2.
• R1 - R2 (set difference) is the relation containing all tuples of R1 that do not appear in R2.
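Because relations with a common scheme are simply sets of tuples, Python's built-in set operators give a direct, if toy, model of these three operations (the data values below are made up):

# Two relations over the same scheme (Name, Room).
r1 = {("coke", 633), ("bass", 633)}
r2 = {("bass", 633), ("ruby", 628)}

print(r1 | r2)   # union: tuples in R1, R2, or both
print(r1 & r2)   # intersection: tuples in both R1 and R2
print(r1 - r2)   # set difference: tuples of R1 not in R2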

Selection

Selects tuples from a relation whose attributes meet the selection criteria, which is normally
expressed as a predicate.

R2 = select(R1,P)
That is, from R1 we create a new relation R2 containing those tuples from R1 that satisfy
(make true) the predicate P.

A predicate is a boolean expression whose operators are the logical connectives (and, or, not)
and arithmetic comparisons (LT, LE, GT, GE, EQ, NE), and whose operands are either domain
names or domain constants.

select(Workstation,Room=633) =

Name     Room  Mem    Proc  Monitor
====================================
coke     633   16384  SP4   color17
bass     633   8124   SP2   color19
bashful  633   8124   SP1   b/w

select(User,Status=UG and Idle<1:00) =

Login  Name     Status  Idle  Shell  Server
================================================
jli    J. Inka  UG      0:00  bsh    UG
Projection

Chooses a subset of the columns in a relation, and discards the rest.

R2 = project(R1,D1,D2,...Dn)

That is, from the tuples in R1 we create a new relation R2 containing only the domains
D1,D2,..Dn.
project(Server,Name,Status) =

Name Status
==============
diamond up
emerald up
graphite down
ruby up
frito up
project(select(User,Status=UG),Name,Status) =

Name     Status
==================
A. Cohn  UG
J. Inka  UG
R. Kemp  UG
Join

Combines attributes of two relations into one.

R3 = join(R1,D1,R2,D2)

Given a domain from each relation, join considers all possible pairs of tuples from the two
relations, and if their values for the chosen domains are equal, it adds a tuple to the result
containing all the attributes of both tuples (discarding the duplicate domain D2).
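The following sketch pulls the three special operations together in Python (the relations, attribute names, and data are hypothetical, and a real DBMS would use far more efficient algorithms):

# Relations represented as lists of dictionaries.
workstation = [
    {"name": "coke", "room": 633, "mem": 16384},
    {"name": "bass", "room": 633, "mem": 8124},
    {"name": "rose", "room": 628, "mem": 8124},
]
allocation = [
    {"room": 633, "dept": "CSE"},
    {"room": 628, "dept": "ECE"},
]

def select(r, pred):
    # Keep the tuples that satisfy the predicate P.
    return [t for t in r if pred(t)]

def project(r, *attrs):
    # Keep only the named domains, removing duplicate tuples.
    seen, out = set(), []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def join(r1, d1, r2, d2):
    # All pairs of tuples whose chosen domains are equal;
    # the duplicate domain D2 is discarded from the result.
    out = []
    for t1 in r1:
        for t2 in r2:
            if t1[d1] == t2[d2]:
                merged = dict(t1)
                merged.update({k: v for k, v in t2.items() if k != d2})
                out.append(merged)
    return out

print(select(workstation, lambda t: t["room"] == 633))
print(project(workstation, "name", "room"))
print(join(workstation, "room", allocation, "room"))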

Relational calculus

Relational calculus consists of two calculi, the tuple relational calculus and the domain
relational calculus, that are part of the relational model for databases and provide a
declarative way to specify database queries. This is in contrast to the relational algebra, which
is also part of the relational model but provides a more procedural way of specifying queries.

The relational algebra might suggest these steps to retrieve the phone numbers and names of
book stores that supply Some Sample Book:

1. Join book stores and titles over the BookstoreID.
2. Restrict the result of that join to tuples for the book Some Sample Book.
3. Project the result of that restriction over StoreName and StorePhone.
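In a calculus-style formulation, one states only the property that answer tuples must satisfy, not the steps. A Python comprehension over hypothetical sample data mimics this declarative reading of the same query:

# Hypothetical bookstore data.
bookstores = [
    {"BookstoreID": 1, "StoreName": "City Books", "StorePhone": "555-0101"},
    {"BookstoreID": 2, "StoreName": "Campus Corner", "StorePhone": "555-0102"},
]
titles = [
    {"BookstoreID": 1, "BookTitle": "Some Sample Book"},
    {"BookstoreID": 2, "BookTitle": "Another Title"},
]

# Describe the tuples wanted, not the procedure used to compute them.
answer = [
    (s["StoreName"], s["StorePhone"])              # projection
    for s in bookstores
    for t in titles
    if s["BookstoreID"] == t["BookstoreID"]        # join condition
    and t["BookTitle"] == "Some Sample Book"       # restriction
]
print(answer)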
AIM: Study of Normalization and Normal forms.
Normalization is the process of organizing data in a database. This includes creating tables and
establishing relationships between those tables according to rules designed both to protect
the data and to make the database more flexible by eliminating redundancy and inconsistent
dependency.
There are a few rules for database normalization. Each rule is called a "normal form." If the
first rule is observed, the database is said to be in "first normal form." If the first three rules
are observed, the database is considered to be in "third normal form." Although other levels of
normalization are possible, third normal form is considered the highest level necessary for
most applications.
As with many formal rules and specifications, real world scenarios do not always allow for
perfect compliance. In general, normalization requires additional tables and some customers
find this cumbersome. If you decide to violate one of the first three rules of normalization,
make sure that your application anticipates any problems that could occur, such as redundant
data and inconsistent dependencies.
The following descriptions include examples.

First Normal Form

• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.

Second Normal Form

• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key. Records should not depend on anything other
than a table's primary key (a compound key, if necessary). For example, consider a
customer's address in an accounting system. The address is needed by the Customers
table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections
tables. Instead of storing the customer's address as a separate entry in each of these
tables, store it in one place, either in the Customers table or in a separate Addresses
table.

Third Normal Form

• Eliminate fields that do not depend on the key.

Values in a record that are not part of that record's key do not belong in the table. In general,
any time the contents of a group of fields may apply to more than a single record in the table,
consider placing those fields in a separate table.
For example, in an Employee Recruitment table, a candidate's university name and address
may be included. But you need a complete list of universities for group mailings. If university
information is stored in the Candidates table, there is no way to list universities with no
current candidates. Create a separate Universities table and link it to the Candidates table with
a university code key.
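The split described above can be made concrete with a small sketch (all names and data are hypothetical): university details move to their own table, and candidates carry only a code key.

# Unnormalized: university details repeated in every candidate row.
candidates_flat = [
    {"name": "A. Cohn", "univ_name": "State Univ.", "univ_city": "Bhopal"},
    {"name": "J. Inka", "univ_name": "State Univ.", "univ_city": "Bhopal"},
]

# Normalized: university fields depend on the university key, not on
# the candidate key, so they get their own table.
universities = {"U1": {"univ_name": "State Univ.", "univ_city": "Bhopal"}}
candidates = [
    {"name": "A. Cohn", "univ_code": "U1"},
    {"name": "J. Inka", "univ_code": "U1"},
]

# Universities can now be listed even when they have no current candidates.
for code, u in universities.items():
    print(code, u["univ_name"], u["univ_city"])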

Other Normalization Forms

Boyce-Codd normal form (BCNF), fourth normal form, and fifth normal form also exist, but
they are rarely considered in practical design. Disregarding these rules may result in less-than-
perfect database design, but should not affect functionality.
AIM: Study of Query Processing.
Query processing is the set of activities involved in getting the result of a query expressed in a
high-level language. These activities include parsing the queries and translating them into
expressions that can be implemented at the physical level of the file system, optimizing the
query in its internal form to obtain a suitable execution strategy, and then actually executing
the query to get the result. The cost of processing a query is dominated by disk access. For a
given query, several possible processing strategies exist, especially when the query is complex.
The difference between a good strategy and a bad one may be several orders of magnitude, so
it is worthwhile for the system to spend some time selecting a good strategy for processing a
query.

Parsing and Translating

The query is translated into its internal form (a parse tree), which is then translated into an
expression of the relational algebra. The parser checks syntax and validates relations,
attributes, and access permissions.
Evaluation
The query execution engine takes a physical query plan (also known as an execution plan),
executes the plan, and returns the result.
Optimization
Find the "cheapest" execution plan for a query.
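A toy end-to-end sketch of these stages (the query "language" here accepts only one fixed form, SELECT col FROM table WHERE attr=value, and the table is invented for illustration):

# Toy query processor: parse -> internal form -> evaluate.
tables = {"users": [{"login": "jli", "status": "UG"},
                    {"login": "rkemp", "status": "PG"}]}

def parse(q):
    # Parsing: turn the query string into a tiny internal form.
    words = q.split()
    col, table = words[1], words[3]
    attr, value = words[5].split("=")
    return {"project": [col], "table": table, "where": (attr, value)}

def evaluate(tree):
    # Evaluation: apply the selection, then the projection.
    attr, value = tree["where"]
    rows = [r for r in tables[tree["table"]] if r[attr] == value]
    return [{c: r[c] for c in tree["project"]} for r in rows]

print(evaluate(parse("SELECT login FROM users WHERE status=UG")))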
AIM: Study of Query optimization
Query optimization is a function of many relational database management systems. The query
optimizer attempts to determine the most efficient way to execute a given query by
considering the possible query plans.

Generally, the query optimizer cannot be accessed directly by users: once queries are
submitted to the database server and parsed by the parser, they are passed to the query
optimizer, where optimization occurs. However, some database engines allow guiding the
query optimizer with hints.

Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node
encapsulates a single operation that is required to execute the query. The nodes are arranged
as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node
has zero or more child nodes—those are nodes whose output is fed as input to the parent
node. For example, a join node will have two child nodes, which represent the two join
operands, whereas a sort node would have a single child node (the input to be sorted). The
leaves of the tree are nodes which produce results by scanning the disk, for example by
performing an index scan or a sequential scan.
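Such a tree can be modelled directly in code (the node types, toy data, and plan below are illustrative only): leaves scan stored data, and each inner node consumes its children's output.

# A minimal plan-node tree; intermediate results flow bottom-up.
class Scan:                       # leaf: produces tuples from a "table"
    def __init__(self, rows): self.rows = rows
    def execute(self): return list(self.rows)

class Sort:                       # one child: the input to be sorted
    def __init__(self, child, key): self.child, self.key = child, key
    def execute(self): return sorted(self.child.execute(), key=self.key)

class Join:                       # two children: the two join operands
    def __init__(self, left, right, on):
        self.left, self.right, self.on = left, right, on
    def execute(self):
        return [l + r for l in self.left.execute()
                      for r in self.right.execute() if self.on(l, r)]

plan = Sort(
    Join(Scan([("bass", 633), ("rose", 628)]),
         Scan([(633, "CSE"), (628, "ECE")]),
         on=lambda l, r: l[1] == r[0]),
    key=lambda t: t[0],
)
print(plan.execute())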
Join ordering

The performance of a query plan is determined largely by the order in which the tables are
joined. A typical optimizer searches for a join order bottom-up, in stages (a simplified sketch
follows the list below):

1. First, all ways to access each relation in the query are computed. Every relation in the
query can be accessed via a sequential scan. If there is an index on a relation that can be
used to answer a predicate in the query, an index scan can also be used. For each
relation, the optimizer records the cheapest way to scan the relation, as well as the
cheapest way to scan the relation that produces records in a particular sorted order.
2. The optimizer then considers combining each pair of relations for which a join condition
exists. For each pair, the optimizer will consider the available join algorithms
implemented by the DBMS. It will preserve the cheapest way to join each pair of
relations, in addition to the cheapest way to join each pair of relations that produces its
output according to a particular sort order.
3. Then all three-relation query plans are computed, by joining each two-relation plan
produced by the previous phase with the remaining relations in the query.
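A highly simplified sketch of this bottom-up search over join orders (the cost model, selectivity, and cardinalities are invented; production optimizers are far more elaborate):

from itertools import combinations

# Hypothetical base-table cardinalities and a flat 10% join selectivity.
card = {"A": 1000, "B": 100, "C": 10}
SEL = 0.1

# Stage 1: cheapest access path per relation; here, a sequential scan
# whose cost and output size both equal the cardinality.
plans = {frozenset([r]): (c, c) for r, c in card.items()}  # (cost, size)

def join_cost(left, right):
    # Toy cost: cost of both inputs plus the estimated output size.
    lc, lsz = plans[left]
    rc, rsz = plans[right]
    out = lsz * rsz * SEL
    return lc + rc + out, out

# Stages 2..n: best plan for every larger subset, built from the best
# plans of its two sub-parts (as computed in the previous stage).
rels = list(card)
for size in range(2, len(rels) + 1):
    for subset in combinations(rels, size):
        s, best = frozenset(subset), None
        for k in range(1, size):
            for left in combinations(subset, k):
                l = frozenset(left)
                cost, out = join_cost(l, s - l)
                if best is None or cost < best[0]:
                    best = (cost, out)
        plans[s] = best

print(plans[frozenset(rels)])   # cheapest estimated (cost, result size)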

Query planning for nested SQL queries

A SQL query to a modern relational DBMS does more than just selections and joins. In
particular, SQL queries often nest several layers of SPJ blocks (Select-Project-Join), by means of
group by, exists, and not exists operators. In some cases such nested SQL queries can be
flattened into a select-project-join query, but not always.
Cost estimation
One of the hardest problems in query optimization is to accurately estimate the costs of
alternative query plans. Optimizers cost query plans using a mathematical model of query
execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing
through each edge in a query plan. Cardinality estimation in turn depends on estimates of the
selection factor of predicates in the query. Traditionally, database systems estimate
selectivities through fairly detailed statistics on the distribution of values in each column, such
as histograms.
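For instance, an equi-width histogram lets the optimizer estimate the fraction of rows satisfying a range predicate such as value < x (the bucket boundaries and counts below are made up):

# Histogram over one column: (low, high, row_count) per bucket.
buckets = [(0, 10, 400), (10, 20, 300), (20, 30, 200), (30, 40, 100)]
total = sum(c for _, _, c in buckets)

def selectivity_lt(x):
    # Estimated fraction of rows with value < x, assuming values are
    # uniformly distributed inside each bucket.
    rows = 0.0
    for low, high, count in buckets:
        if x >= high:
            rows += count                              # whole bucket
        elif x > low:
            rows += count * (x - low) / (high - low)   # partial bucket
    return rows / total

print(selectivity_lt(15))   # about (400 + 150) / 1000 = 0.55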
AIM: Study of Distributed databases.
A distributed database is a database in which the storage devices are not all attached to a
common processing unit such as the CPU; it is controlled by a distributed database
management system (together sometimes called a distributed database system). It may be
stored on multiple computers located in the same physical location, or it may be dispersed
over a network of interconnected computers. Unlike parallel systems, in which the processors
are tightly coupled and constitute a single database system, a distributed database system
consists of loosely coupled sites that share no physical components. System administrators can
distribute collections of data (e.g. in a database) across multiple physical locations. A
distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. Because they store data across multiple computers,
distributed databases can improve performance at end-user worksites by allowing
transactions to be processed on many machines, instead of being limited to one.

Architecture
A database user accesses the distributed database through:

Local applications: applications which do not require data from other sites.

Global applications: applications which do require data from other sites.

Advantages

• Management of distributed data with different levels of transparency, such as network
transparency, fragmentation transparency, replication transparency, etc.
• Increased reliability and availability
• Easier expansion
• Reflects organizational structure — database fragments potentially stored within the
departments they relate to
• Local autonomy or site autonomy — a department can control the data about itself (as
it is the most familiar with it)

Disadvantages

• Complexity — DBAs may have to do extra work to ensure that the distributed nature of
the system is transparent. Extra work must also be done to maintain multiple disparate
systems, instead of one big one. Extra database design work must also be done to
account for the disconnected nature of the database — for example, joins become
prohibitively expensive when performed across multiple systems.
• Economics — increased complexity and a more extensive infrastructure means extra
labour costs
• Security — remote database fragments must be secured, and they are not centralized so
the remote sites must be secured as well. The infrastructure must also be secured (for
example, by encrypting the network links between remote sites).
• Difficult to maintain integrity — in a distributed database, enforcing integrity over a
network may require too much of the network's resources to be feasible
• Inexperience — distributed databases are difficult to work with, and in such a young
field there is not much readily available experience in "proper" practice
AIM: Study of Web databases, Multimedia databases, and Spatial databases.
Web Databases

A web database is a database that can be queried and/or updated through the World Wide
Web (WWW). As web technologies have evolved, the WWW has turned out to be the
preferred medium for many applications, e.g., e-commerce and digital libraries. These
applications use information that is stored in huge databases and can only be retrieved by
issuing direct queries to the back-end databases. Database-driven web sites have their own
interfaces and access forms that create HTML pages on the fly. Web database technologies
define the way that these forms can connect to and retrieve data from database servers.
Database Connectivity:

Querying the database via direct SQL is one of the most common ways to access data from a
database. The need for a standard way to query different databases arises because the
number of existing database servers is very large and they differ in their query interfaces.
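As one concrete, illustrative way of issuing direct SQL from application code, Python's standard-library sqlite3 module follows the connect/execute/fetch pattern that web database layers build on. An in-memory database and a made-up books table keep the sketch self-contained; a web application would connect to its back-end server instead.

import sqlite3

conn = sqlite3.connect(":memory:")      # stand-in for a back-end database
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.execute("INSERT INTO books VALUES (?, ?)", ("Some Sample Book", 250.0))

# A parameterized query, as a web form handler might issue it.
rows = conn.execute(
    "SELECT title, price FROM books WHERE title = ?", ("Some Sample Book",)
).fetchall()
print(rows)
conn.close()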

Multimedia database

A multimedia database (MMDB) is a collection of related multimedia data. The multimedia
data include one or more primary media data types such as text, images, graphic objects
(including drawings, sketches and illustrations), animation sequences, audio and video.
A Multimedia Database Management System (MMDBMS) is a framework that manages
different types of data potentially represented in a wide diversity of formats on a wide array of
media sources. It provides support for multimedia data types, and facilitates the creation,
storage, access, querying and control of a multimedia database.

Requirements of Multimedia databases

Like traditional databases, multimedia databases should address the following
requirements:
Integration
Data items do not need to be duplicated for different program invocations
Data independence
Separate the database and the management from the application programs
Concurrency control
Allows concurrent transactions
Persistence
Data objects can be saved and re-used by different transactions and program invocations
Privacy
Access and authorization control
Integrity control
Ensures database consistency between transactions
Recovery
Failures of transactions should not affect the persistent data storage
Query support
Allows easy querying of multimedia data

Application areas
Examples of multimedia database application areas:
Digital Libraries
News-on-Demand
Video-on-Demand
Music database
Geographic Information Systems (GIS)
Telemedicine
Spatial database
A spatial database is a database that is optimized to store and query data that represents
objects defined in a geometric space. Most spatial databases allow representing simple
geometric objects such as points, lines and polygons. Some spatial databases handle more
complex structures such as 3D objects, topological coverages, linear networks, and TINs
(triangulated irregular networks). These are typically called geometries or features. The Open
Geospatial Consortium created the Simple Features specification and sets standards for
adding spatial functionality to database systems.
Features of spatial databases
Database systems use indexes to quickly look up values and the way that most databases index
data is not optimal for spatial queries. Instead, spatial databases use a spatial index to speed up
database operations.

In addition to typical SQL queries such as SELECT statements, spatial databases can perform a
wide variety of spatial operations. The following operations and many more are specified by
the Open Geospatial Consortium standard (a few are sketched in code after the list):

• Spatial Measurements: Computes line length, polygon area, the distance between
geometries, etc.
• Spatial Functions: Modify existing features to create new ones, for example by providing
a buffer around them, intersecting features, etc.
• Spatial Predicates: Allows true/false queries about spatial relationships between
geometries. Examples include "do two polygons overlap?" or "is there a residence located
within a mile of the area where we are planning to build the landfill?" (see DE-9IM)
• Geometry Constructors: Creates new geometries, usually by specifying the vertices
(points or nodes) which define the shape.
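A few of these operation classes can be sketched in plain Python (toy coordinates; real spatial databases implement them over indexed geometry types):

from math import hypot

# Spatial measurement: distance between two points.
def distance(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])

# Spatial measurement: polygon area via the shoelace formula.
def area(polygon):
    s = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

# Spatial predicate: do two axis-aligned bounding boxes overlap?
def bbox_overlap(a, b):   # boxes given as (min_x, min_y, max_x, max_y)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(distance((0, 0), (3, 4)))                    # 5.0
print(area(square))                                # 16.0
print(bbox_overlap((0, 0, 4, 4), (3, 3, 6, 6)))    # True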
AIM: Study of Object Oriented Databases.
Object oriented databases are also called Object Database Management Systems (ODBMS).
Object databases store objects rather than data such as integers, strings or real numbers.
Objects are used in object-oriented languages such as Smalltalk, C++, Java, and others. Objects
basically consist of the following (a short sketch follows the list):

• Attributes - Attributes are data which defines the characteristics of an object. This data
may be simple such as integers, strings, and real numbers or it may be a reference to a
complex object.
• Methods - Methods define the behavior of an object and are what were formerly called
procedures or functions.
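A small Python class (the names are hypothetical) makes the attribute/method distinction concrete; an object database would persist such objects directly instead of flattening them into rows.

class Account:
    # An object: attributes (data) plus methods (behavior).
    def __init__(self, owner, balance):
        self.owner = owner        # attribute: a simple string
        self.balance = balance    # attribute: a simple number

    def deposit(self, amount):    # method: defines the object's behavior
        self.balance += amount
        return self.balance

acct = Account("A. Cohn", 100.0)
acct.deposit(50.0)
print(acct.owner, acct.balance)   # an ODBMS could store acct as-is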

Object-oriented database management systems (OODBMSs) combine database capabilities
with object-oriented programming language capabilities. OODBMSs allow object-oriented
programmers to develop products, store them as objects, and replicate or modify existing
objects to make new objects within the OODBMS. Because the database is integrated with the
programming language, the programmer can maintain consistency within one environment, in
that both the OODBMS and the programming language will use the same model of
representation. Relational DBMS projects, by way of contrast, maintain a clearer division
between the database model and the application.

As the usage of web-based technology increases with the implementation of intranets and
extranets, companies have a vested interest in OODBMSs to display their complex data. Using
a DBMS that has been specifically designed to store data as objects gives an advantage to
those companies that are geared towards multimedia presentation or organizations that
utilize computer-aided design (CAD). Some object-oriented databases are designed to work
well with object-oriented programming languages such as Delphi, Ruby, Python, Perl, Java, C#,
Visual Basic .NET, C++, Objective-C and Smalltalk; others have their own programming
languages. OODBMSs use the same model as object-oriented programming languages.

Technical features
Most object databases also offer some kind of query language, allowing objects to be found
using a declarative programming approach. It is in the area of object query languages, and the
integration of the query and navigational interfaces, that the biggest differences between
products are found. An attempt at standardization was made by the ODMG with the Object
Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular
implementation of a relational database). This is because an object can be retrieved directly
without a search, by following pointers.

Another area of variation between products is in the way that the schema of a database is
defined. A general characteristic, however, is that the programming language and the
database schema use the same type definitions.

Multimedia applications are facilitated because the class methods associated with the data are
responsible for its correct interpretation.
AIM: Study of Object Data Management Group and OO languages.
The Object Data Management Group (ODMG) was conceived in the summer of 1991 at a
breakfast with object database vendors that was organized by Rick Cattell of Sun
Microsystems. In 1998, the ODMG changed its name from the Object Database Management
Group to reflect the expansion of its efforts to include specifications for both object database
and object-relational mapping products. The primary goal of the ODMG was to put forward a
set of specifications that allowed a developer to write portable applications for object
database and object-relational mapping products. In order to do that, the data schema,
programming language bindings, and data manipulation and query languages needed to be
portable.

Major components of the ODMG


• Object Model. This was based on the Object Management Group's Object Model. The
OMG core model was designed to be a common denominator for object request
brokers, object database systems, object programming languages, etc. The ODMG
designed a profile by adding components to the OMG core object model.
• Object Specification Languages. The ODMG Object Definition Language (ODL) was used
to define the object types that conform to the ODMG Object Model. The ODMG Object
Interchange Format (OIF) was used to dump and load the current state to or from a file
or set of files.
• Object Query Language (OQL). The ODMG OQL was a declarative (nonprocedural)
language for querying and updating. It used SQL as a basis, where possible, though OQL
supports more powerful object-oriented capabilities.
• C++ Language Binding. This defined a C++ binding of the ODMG ODL and a C++ Object
Manipulation Language (OML). The C++ ODL was expressed as a library that provides
classes and functions to implement the concepts defined in the ODMG Object Model.
The C++ OML syntax and semantics are those of standard C++ in the context of the
standard class library. The C++ binding also provided a mechanism to invoke OQL.
• Smalltalk Language Binding. This defined the mapping between the ODMG ODL and
Smalltalk, which was based on the OMG Smalltalk binding for the OMG Interface
Definition Language (IDL). The Smalltalk binding also provided a mechanism to invoke
OQL.
• Java Language Binding. This defined the binding between the ODMG ODL and the Java
programming language as defined by the Java 2 Platform. The Java binding also
provided a mechanism to invoke OQL.

OBJECT-ORIENTED LANGUAGES: Object-orientation can refer to a set of design principles, a
programming style, or features of programming languages that support that style. Consider
the purpose and history of the languages that support OOP. Early programming was limited to
talking about the set of nouns provided by a language: numbers, characters, channels, etc. Of
course, programmers built more abstract structures around this limited set of nouns, but the
code that described those abstractions was much more complex than talking about them in
English.
Object-oriented languages allow you to define types of objects (called classes) that are derived
from, or composed of, other types. In addition to this data component, the functions (also
called methods) that “belong” to the data are also grouped in the class. This has at least three
benefits:

1. Encapsulation. Functions that are internal to a class can be marked as “private”. This
means that they’re hidden from any code outside the class, so their implementation can
be changed without bothering any code that uses the class. Conversely, the methods
that are marked “public” form a well-defined interface that should not be changed
without due consideration, because client code relies on it.
2. Inheritance. You can derive one class from another, and the new class automatically
contains all of the methods and data of the original class. This is useful when some
subset of your objects needs an additional capability, but you don’t want to give that
capability to all of the other objects.
3. Polymorphism. Polly who? It's a Greek-derived term that means "many forms". In OOP,
it means that sending the same message (in most OO languages, this means calling a
method by name) may evoke different responses depending on type. A subclass can
override an inherited method with its own implementation, so that sending the same
message (calling the same function) on two different objects yields a different behavior
depending on their types. A second type of polymorphism is called "parametric
polymorphism", which means that a class provides different implementations for a
method depending on the types of parameters passed to it.
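The three benefits can be seen together in one short Python sketch (the class names are invented; note that Python marks "private" members by underscore convention rather than enforcement):

class Shape:
    def __init__(self, name):
        self._name = name          # encapsulation: private by convention

    def describe(self):            # public, stable interface
        return self._name + ": area " + str(self.area())

class Square(Shape):               # inheritance: reuses Shape's methods
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):                # polymorphism: overridden per type
        return self.side ** 2

class Circle(Shape):
    def __init__(self, r):
        super().__init__("circle")
        self.r = r

    def area(self):
        return 3.14159 * self.r ** 2

# The same message yields different behavior depending on the type.
for shape in (Square(3), Circle(1)):
    print(shape.describe())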
AIM: Study of Data mining and data warehousing.
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an
interdisciplinary subfield of computer science, is the computational process of discovering
patterns in large data sets involving methods at the intersection of artificial intelligence,
machine learning, statistics, and database systems. The overall goal of the data mining process
is to extract information from a data set and transform it into an understandable structure for
further use. Aside from the raw analysis step, it involves database and data management
aspects, data pre-processing, model and inference considerations, interestingness metrics,
complexity considerations, post-processing of discovered structures, visualization, and online
updating.

The term is a buzzword, and is frequently misused to mean any form of large-scale data or
information processing (collection, extraction, warehousing, analysis, and statistics), but it is
also generalized to any kind of computer decision support system, including artificial
intelligence, machine learning, and business intelligence.

Data warehouse
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a
system used for reporting and data analysis. Integrating data from one or more disparate
sources creates a central repository of data, a data warehouse (DW). Data warehouses store
current and historical data and are used for creating trending reports for senior management
reporting such as annual and quarterly comparisons.
Benefits of a data warehouse

A data warehouse maintains a copy of information from the source transaction systems. This
architectural complexity provides the opportunity to:

• Congregate data from multiple sources into a single database so a single query engine
can be used to present data.
• Mitigate the problem of database isolation level lock contention in transaction
processing systems caused by attempts to run large, long running, analysis queries in
transaction processing databases.
• Maintain data history, even if the source transaction systems do not.
• Integrate data from multiple source systems, enabling a central view across the
enterprise. This benefit is always valuable, but particularly so when the organization has
grown by merger.
• Improve data quality, by providing consistent codes and descriptions, flagging or even
fixing bad data.
• Present the organization's information consistently.
• Provide a single common data model for all data of interest regardless of the data's
source.
• Restructure the data so that it makes sense to the business users.
