Chapter Seven
7.1. Database Security
Database security: the mechanisms that protect the database against intentional or accidental threats. Database security encompasses hardware, software, people, and data.
Threat: any situation or event, whether intentional or accidental, that may adversely affect a system
and consequently the organization. A threat may be caused by a situation or event involving a person,
action, or circumstance that is likely to bring harm to an organization. The harm to an organization
may be:
Tangible: loss of hardware, software, or data.
Intangible: loss of credibility or client confidence.
Examples of threats:
Using another person's means of access
Unauthorized amendment, modification, or copying of data
Inadequate policies and procedures that allow a mix of confidential and normal output
Failure of security mechanisms, giving greater access than normal
Fire (electrical fault, lightning strike, arson), flood, bomb
Data corruption owing to power loss or surge
Illegal entry by a hacker
Viewing and disclosing unauthorized data
Program alteration
Electronic interference and radiation
Blackmail
Physical damage to equipment
Theft of data, programs, and equipment
Breaking or disconnection of cables
Staff shortages or strikes
Introduction of viruses
Inadequate staff training
An organization needs to identify the types of threat it may be subjected to and initiate appropriate
plans and countermeasures, bearing in mind the costs of implementing them.
7.1.1. Countermeasures
The types of countermeasure to threats on computer systems range from physical controls to
administrative procedures. The following are computer-based security controls for a multi-user
environment:
1. Authorization
The granting of a right or privilege that enables a subject to have legitimate access to a system or a
system’s object. Authorization controls can be built into the software, and govern not only what
system or object a specified user can access, but also what the user may do with it. Authorization
controls are sometimes referred to as access controls. The process of authorization involves
authentication of subjects (i.e. a user or program) requesting access to objects (i.e. a database table,
view, procedure, trigger, or any other object that can be created within the system).
2. Authentication
All users of the database have different access levels and permissions for different data objects; authentication is the process of checking whether a user actually holds the privileges for the access level being requested. In other words, authentication is the process of checking that users are who they claim to be.
Each user is given a unique identifier, which is used by the operating system to determine
who they are.
Associated with each identifier is a password, chosen by the user and known to the operating
system, which must be supplied to enable the operating system to authenticate who the user
claims to be.
Thus the system checks whether the user presenting a specific username and password is
genuinely authorized to use the resource.
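For example, in a DBMS that manages its own accounts, registering a user with an identifier and a password can look roughly like the following sketch (MySQL-style syntax; the user name and password are hypothetical):

    CREATE USER 'abebe'@'localhost' IDENTIFIED BY 'S3cret!Pass';
    -- At connection time the DBMS compares the identifier and password
    -- supplied by the client against this stored account before it grants
    -- any access to the system.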
3. Views
A view is the dynamic result of one or more relational operations on the base relations to produce
another relation. A view is a virtual relation that does not actually exist in the database, but is
produced upon request by a particular user. The view mechanism provides a powerful and flexible
security mechanism by hiding parts of the database from certain users. Using a view is more
restrictive than simply having certain privileges granted to a user on the base relation(s).
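As a sketch, suppose a base relation Staff(staffNo, name, position, salary) and users who must not see salaries (the table, column, and user names are hypothetical):

    -- A virtual relation that hides the salary attribute.
    CREATE VIEW StaffPublic AS
        SELECT staffNo, name, position
        FROM Staff;

    -- Grant access to the view only, not to the base relation.
    GRANT SELECT ON StaffPublic TO clerk;

Users granted access only to StaffPublic can query names and positions but have no way to reach the hidden salary column of the base relation.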
4. Backup and Recovery
Backup is the process of periodically taking a copy of the database and log file (and possibly
programs) on to offline storage media. A DBMS should provide backup facilities to assist with the
recovery of a database following failure.
Database recovery is the process of restoring the database to a correct state in the event of a failure.
Journaling is the process of keeping and maintaining a log file (or journal) of all changes made to
the database to enable recovery to be undertaken effectively in the event of a failure. The advantage
of journaling is that, in the event of a failure, the database can be recovered to its last known
consistent state using a backup copy of the database and the information contained in the log file. If
no journaling is enabled on a failed system, the only means of recovery is to restore the database
using the latest backup version of the database. However, without a log file, any changes made to the
database after the last backup will be lost.
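As an illustrative sketch only (SQL Server-style commands; the database name and file paths are hypothetical), a full backup combined with log backups allows the database to be rolled forward to its last consistent state:

    -- Periodic full backup of the database to offline storage.
    BACKUP DATABASE Sales TO DISK = 'E:\backup\sales_full.bak';

    -- Back up the transaction log (the journal of changes) more frequently.
    BACKUP LOG Sales TO DISK = 'E:\backup\sales_log.trn';

    -- After a failure: restore the full backup, then replay the log.
    RESTORE DATABASE Sales FROM DISK = 'E:\backup\sales_full.bak' WITH NORECOVERY;
    RESTORE LOG Sales FROM DISK = 'E:\backup\sales_log.trn' WITH RECOVERY;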
5. Integrity
Integrity of information refers to protecting information from being modified by unauthorized
parties. Information only has value if it is correct. Information that has been tampered with could
prove costly. For example, if you were sending an online money transfer for $100, but the
information was tampered with in such a way that you actually sent $10,000, it could prove to be very
costly for you.
Integrity also means ensuring the authenticity of information: that the information is not altered and that
its source is genuine. Imagine that you have a website and you sell products on that site. Now imagine
that an attacker can shop on your website and maliciously alter the prices of your products, so that they
can buy anything for whatever price they choose. That would be a failure of integrity, because your
information (in this case, the price of a product) has been altered and you did not authorize this alteration.
6. Encryption
The encoding of the data by a special algorithm that renders the data unreadable by any program
without the decryption key. If a database system holds particularly sensitive data, it may be deemed
necessary to encode it as a precaution against possible external threats or attempts to access it. The
DBMS can access data after decoding it, although there is a degradation in performance because of
the time taken to decode it. Encryption also protects data transmitted over communication lines. To
transmit data securely over insecure networks requires the use of a Cryptosystem.
Encryption is a method of protecting data from people who are not meant to see it. For example, when
you use your credit card on Amazon, your computer encrypts that information so that others cannot
steal your personal data as it is being transferred. Similarly, if you have a file on your computer that you
want to keep secret, you can encrypt it so that no one can open that file without the password.
Encryption is useful for everything from sending sensitive information to securing your email, keeping
your cloud storage safe, and even hiding your entire operating system.
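As a minimal sketch, some DBMSs expose encryption functions directly; MySQL, for example, provides AES_ENCRYPT and AES_DECRYPT (the table, column, and key below are hypothetical, and in practice the key itself must be stored securely rather than written into queries):

    -- Store the card number encrypted under a secret key.
    INSERT INTO Payment (custNo, cardNo)
    VALUES (101, AES_ENCRYPT('4111111111111111', 'my_secret_key'));

    -- Only a program that supplies the same key can decode the value.
    SELECT custNo, AES_DECRYPT(cardNo, 'my_secret_key') AS cardNo
    FROM Payment;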
7. RAID (Redundant Array of Independent Disks) Technology
The hardware that the DBMS is running on must be fault-tolerant, meaning that the DBMS should
continue to operate even if one of the hardware components fails. This suggests having redundant
components that can be seamlessly integrated into the working system whenever there are one or
more component failures. The main hardware components that should be fault-tolerant include disk
drives, disk controllers, CPU, power supplies, and cooling fans. Disk drives are the most vulnerable
components, with the shortest time between failures of any of the hardware components. RAID
works by having a large disk array comprising several independent disks that are organized to
improve reliability and, at the same time, increase performance. Performance is increased through
data striping: the data is segmented into equal-size partitions (the striping unit) which are
transparently distributed across multiple disks. For example, with four disks and a striping unit of
one block, block i of a file is written to disk (i mod 4), so a large request can be serviced by all four
disks in parallel.
7.1.2. Levels of Security Measures
Security measures can be implemented at several levels and for different components of the system.
These levels are:
1. Physical Level: concerned with physically securing the site containing the computer system. The
   backup systems should also be physically protected from access by anyone except authorized users.
2. Human Level: concerned with the authorization of database users to access the content at different
   levels and with different privileges.
3. Operating System: concerned with the strengths and weaknesses of operating system security on
   data files. A weakness may serve as a means of unauthorized access to the database. This also
   includes protection of data in primary and secondary memory from unauthorized access.
4. Database System: concerned with the data access limits enforced by the database system, such as
   passwords and isolated transactions.
Even though we can have different levels of security and authorization on data objects and users,
deciding who accesses which data is a policy matter rather than a technical one. These policies:
Should be known by the system: they should be encoded in the system.
Should be remembered: they should be saved somewhere (the catalogue).
Any database access request will have the following three major components:
1. Requested Operation: what kind of operation is requested by a specific query?
2. Requested Object: on which resource or data of the database is the operation to be applied?
3. Requesting User: who is the user requesting the operation on the specified object?
The database should be able to check all three components before processing any request.
The checking is performed by the security subsystem of the DBMS.
7.1.3. Forms of User Authorization
There are different forms of user authorization on the resources of the database. These forms are
privileges specifying which operations are allowed on a specific data object. User authorizations on the
data/extension include:
1. Read Authorization: the user with this privilege is allowed only to read the content of the data
object.
2. Insert Authorization: the user with this privilege is allowed only to insert new records or items
to the data object.
3. Update Authorization: users with this privilege are allowed to modify content of attributes but
are not authorized to delete the records.
4. Delete Authorization: users with this privilege are only allowed to delete a record and not
anything else.
Different users, depending on the power of the user, can have one or a combination of the above
forms of authorization on different data objects.
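For example, these forms map directly onto SQL privileges; a sketch with hypothetical table and user names:

    -- Read authorization only.
    GRANT SELECT ON Employee TO clerk;

    -- Read and update authorization, but no insert or delete.
    GRANT SELECT, UPDATE ON Employee TO payroll_officer;

    -- A combination of all four forms.
    GRANT SELECT, INSERT, UPDATE, DELETE ON Employee TO hr_manager;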
7.1.4. Role of DBA in Database Security
The database administrator is responsible for making the database as secure as possible. For this, the
DBA must hold more powerful privileges than any other user. The DBA provides database users with
the capabilities they need to access the content of the database. The major responsibilities of the DBA
in relation to the authorization of users are:
1. Account Creation: involves creating different accounts for different users as well as user
groups.
2. Security Level Assignment: involves assigning different users to different categories of access
   levels.
3. Privilege Grant: involves giving different levels of privileges for different users and user
groups.
4. Privilege Revocation: involves denying or canceling previously granted privileges for users due
to various reasons.
5. Account Deletion: involves deleting an existing account of a user or user group. It is similar to
   denying all privileges of those users on the database.
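A sketch of the corresponding SQL statements a DBA might issue (MySQL-style syntax; the account, table, and password names are hypothetical):

    CREATE USER sales_clerk IDENTIFIED BY 'Initial#Pass1';   -- account creation
    GRANT SELECT, INSERT ON Orders TO sales_clerk;           -- privilege grant
    REVOKE INSERT ON Orders FROM sales_clerk;                -- privilege revocation
    DROP USER sales_clerk;                                   -- account deletion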
7.2. Data Integrity
Data integrity refers to the accuracy and correctness of data in the database. Integrity constraints
provide a mechanism to maintain data consistency for operations like INSERT, UPDATE, and
DELETE. The different types of data integrity constraints are entity, NULL, domain, and
referential integrity.
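These constraint types can be declared when tables are created; a sketch with hypothetical tables and columns:

    CREATE TABLE Department (
        deptNo   INT PRIMARY KEY,            -- entity integrity: unique, non-NULL key
        deptName VARCHAR(40) NOT NULL        -- NULL integrity
    );

    CREATE TABLE Employee (
        empNo    INT PRIMARY KEY,                       -- entity integrity
        empName  VARCHAR(40) NOT NULL,                  -- NULL integrity
        salary   DECIMAL(10,2) CHECK (salary > 0),      -- domain integrity
        deptNo   INT REFERENCES Department (deptNo)     -- referential integrity
    );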
7.3. Data Security
Data security refers to the fact that only authorized users can access the data. Data security can be
enforced by passwords. If two separate users are accessing a particular data item at the same time, the
DBMS must not allow them to make conflicting changes.
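As a sketch of how conflicting changes are avoided, many DBMSs let a transaction lock the rows it is about to modify so that a concurrent user must wait (standard transaction syntax with the widely supported FOR UPDATE clause; the table and values are hypothetical):

    START TRANSACTION;
    -- Lock the account row; a second user's conflicting update now waits.
    SELECT balance FROM Account WHERE accNo = 1001 FOR UPDATE;
    UPDATE Account SET balance = balance - 100 WHERE accNo = 1001;
    COMMIT;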
7.4. Client-Server systems
A computing system that is composed of two logical parts: a server, which provides services, and a
client, which requests them. The two parts can run on separate machines on a network, allowing
users to access powerful server resources from their personal computers. Client-server systems are
not limited to traditional computers. An example is an automated teller machine (ATM) network.
Customers typically use ATMs as clients to interface to a server that manages all of the accounts for
a bank. This server may in turn work with servers of other banks (such as when withdrawing money
at a bank at which the user does not have an account). The ATMs provide a user interface and the
servers provide services, such as checking on account balances and transferring money between
accounts.
7.5. Distributed Database Systems
A distributed database is a collection of data distributed over the different computers of a computer
network; it is not a centralized database. In a distributed database system, the database is stored on
several computers. The computers in a distributed system communicate with each other through
various communication media, such as high-speed buses or telephone lines. A distributed database
system consists of a collection of sites, each of which maintains a local database system and also
participates in global transactions, in which different databases are integrated together.
Local Transaction: a transaction that accesses data only at a single site.
Global Transaction: a transaction that accesses data at several sites.
7.6. How Is Data Stored in a DDBMS?
There are several ways of storing a single relation in distributed database systems:
1. Replication: the system maintains multiple identical copies of the data, stored at different sites,
   for faster retrieval and fault tolerance. Duplicate copies of the tables can be kept on each system.
   With this option, propagating updates to every copy can become involved (of course, the copies
   of the tables can be kept read-only).
Advantage: Availability, Increased parallelism (if only reading)
Disadvantage: increased overhead of update
2. Fragmentation: Relation is partitioned into several fragments stored in distinct sites. The
partitioning could be vertical, horizontal or both.
A. Horizontal Fragmentation: systems can share the responsibility of storing information from a
   single table, with individual systems storing groups of rows. It is performed with the selection
   operation, and the whole content of the relation is reconstructed using the UNION operation.
B. Vertical Fragmentation: systems can share the responsibility of storing particular attributes of
   a table. Every fragment needs an attribute carrying the tuple number (or key). It is performed
   with the projection operation, and the whole content of the relation is reconstructed using a
   natural join on the tuple-number attribute (see the SQL sketch below).
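A sketch of both fragmentation styles and their reconstruction, assuming a hypothetical relation Employee(empNo, name, dept, salary):

    -- Horizontal fragmentation: rows are split across sites by selection.
    CREATE TABLE Employee_Addis AS SELECT * FROM Employee WHERE dept = 'Addis';
    CREATE TABLE Employee_Bahir AS SELECT * FROM Employee WHERE dept = 'Bahir';
    -- The whole relation is reconstructed with UNION.
    SELECT * FROM Employee_Addis UNION SELECT * FROM Employee_Bahir;

    -- Vertical fragmentation: columns are split by projection, and the key
    -- empNo (playing the role of the tuple number) is kept in every fragment.
    CREATE TABLE Employee_Personal AS SELECT empNo, name FROM Employee;
    CREATE TABLE Employee_Job      AS SELECT empNo, dept, salary FROM Employee;
    -- The whole relation is reconstructed with a natural join on empNo.
    SELECT * FROM Employee_Personal NATURAL JOIN Employee_Job;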
7.6.1. Homogeneous and Heterogeneous Distributed Databases
In a homogeneous distributed database:
All sites have identical software (DBMS).
All sites are aware of each other and agree to cooperate in processing user requests.
Each site surrenders part of its autonomy in terms of the right to change schemas or software.
The system appears to the user as a single system.
In a heterogeneous distributed database:
Different sites may use different schemas and software (DBMS).
Difference in schema is a major problem for query processing.
Difference in software is a major problem for transaction processing.
Sites may not be aware of each other and may provide only limited facilities for cooperation
in transaction processing.
Advantages of DDBMS
1. Many existing systems: you may have no choice. There may be many different existing systems,
   possibly of different kinds (Oracle, Informix, and others), that need to be used together.
2. Data sharing and distributed control: a user at one site may be able to access data that is available
   at another site. Each site can retain some degree of control over local data; there will be local as
   well as global database administrators.
3. Reliability and availability of data: if one site fails, the rest can continue operating as long as a
   transaction does not demand data from the failed site, or the data it needs is also replicated at
   other sites.
4. Speedup of query processing: if a query involves data from several sites, it may be possible to
   split the query into sub-queries that can be executed at several sites in parallel. A query can also
   be sent to the least heavily loaded site.
5. Expansion: In a distributed environment you can easily expand by adding more machines to
the network.
6. Avoiding a single point of failure: a DDB avoids a single point of failure because the database is
   located at different sites and accessed in a distributed manner. When one database fails, another
   database takes over the failed database's function.
Disadvantages of DDBMS
1. Software Development Cost: a DDBMS is difficult to install and is therefore costly.
2. Greater Potential for Bugs: parallel processing may endanger the correctness of algorithms.
3. Increased Processing Overhead: caused by the exchange of messages between sites and the
   coordination this communication requires.
4. Communication problems.
5. Increased Complexity and Data Inconsistency Problems: clients can read and modify closely
   related data stored in different database instances concurrently.
7.7. Data Warehousing and Data Mining
Data mining refers to extracting or “mining” knowledge from large amounts of data. Data mining is
the process of discovering interesting knowledge from large amounts of data stored in databases,
data warehouses, or other information repositories. There are many other terms carrying a similar or
slightly different meaning to data mining, such as Knowledge Mining from Databases, Knowledge
Extraction, Data/Pattern Analysis, Data Archaeology, and Data Dredging. Data mining is
appropriately named "knowledge mining."
7.7.1. The Major Components of Data Mining
Data Warehouse: This is one or a set of databases, spreadsheets, or other kinds of information
repositories. Data cleaning and Data integration techniques may be performed on the data.
Database: The database server is responsible for fetching the relevant data based on the user data-
mining request.
Knowledge Base: This can be used to guide the search or to evaluate the interestingness of the
resulting patterns. Such knowledge includes concept hierarchies, used to organize attributes or
attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be
used to assess a pattern's interestingness based on its unexpectedness, may also be included.
Data Mining Engine: This is essential to the data mining system and ideally consists of a set of
functional modules for tasks such as characterization, association, classification, cluster analysis,
evaluation, and deviation analysis.
Pattern Evaluation Modules: This component typically employs interestingness measures and
interacts with the data mining modules so as to focus the search toward interesting patterns. It may
use interestingness thresholds to filter out discovered patterns. Alternatively, this module may be
integrated with the mining module depending on the implementation of the data mining method
used.
Graphical User Interface: This module communicates between the user and the data mining
system, allowing the user to interact with the system by specifying the data mining query or task,
providing the information to help focus the search, and performing exploratory data mining based
on the intermediate data mining results.
[Figure: Architecture of a typical data mining system]
7.7.2. Data Mining Functionalities
Functionalities of data mining are used to specify the kinds of patterns to be found in data mining
tasks. They can be classified into two categories: descriptive and predictive. Descriptive mining
tasks characterize the general properties of data in the database, whereas predictive mining tasks
perform inference on the current data in order to make predictions. These functionalities are
classified as follows:
Characterization and discrimination
Association analysis
Classification and prediction
Cluster analysis
Outlier analysis
Evolution analysis
7.7.3. Data Warehouse
Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides
support for decision making. It is a database that stores information oriented to satisfy decision-
making requests. It is a database with some particular features concerning the data it contains and
its utilization.
Integrated: centralized, consolidated database that integrates data derived from the entire
organization. Consolidates data from multiple and diverse sources with diverse formats. Helps
managers to better understand the company’s operations.
Subject-Oriented: the data warehouse contains data organized by topic, e.g., sales, marketing,
finance, customer.
Time-variant: in contrast to the operational data that focus on current transactions, the
warehouse data represent the flow of data through time. A data warehouse contains data that
reflect what happened last week, last month, over the past five years, and so on; it is a snapshot
of the organization's data at different points in time.
Nonvolatile: once data enter the data warehouse, they are never changed, because the data
in the warehouse represent the company's entire history, not just its current operational data.
Data warehousing technology comprises a set of new concepts and tools that support knowledge
workers, such as executives, managers, and analysts, with information for decision making. The
fundamental reason for building a data warehouse is to improve the quality of
information in the organization.
Characteristics of Data in Data Warehouse: Data in the Data Warehouse is integrated from
various, heterogeneous operational systems like database systems, flat files, etc. Before the
integration, structural and semantic differences have to be reconciled, i.e., data have to be
“homogenized” according to a uniform data model. Furthermore, data values from operational
systems have to be cleaned in order to get correct data into the data warehouse. Since a data
warehouse is used for decision making, it is important that the data in the warehouse be correct.
However, because large volumes of data from multiple sources are involved, there is a high probability
of errors and anomalies in the data. The differences between a database and a data warehouse are:
The data found in a data warehouse are analyzed to discover previously unknown data
characteristics, relationships, dependencies, or trends.
A database and a data warehouse (DWH) both store data in the form of tables, views, and columns,
and SQL is used to query the data.
A DB contains current data, while a DWH stores historical data over a long time horizon.
Data in a DB are normalized, so queries require many joins; data in a DWH are kept in
denormalized form to make the structure simple and queries fast.
A DB is optimized for read/write operations, while a DWH is designed to handle aggregate queries
and read/retrieve operations.
A DB is referred to as an OLTP (online transaction processing) system, and a DWH is referred to
as an OLAP (online analytical processing) system.
Usually the DWH is separated from the frontend application, and data are retrieved in the form of
reports that run at scheduled times.
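To illustrate the contrast, an OLTP database typically answers short point queries on current data, while a data warehouse answers aggregate queries over a long history (both queries below use hypothetical tables):

    -- OLTP (database): read or change a single current record.
    SELECT balance FROM Account WHERE accNo = 1001;

    -- OLAP (data warehouse): aggregate sales over several years.
    SELECT region, EXTRACT(YEAR FROM saleDate) AS sale_year, SUM(amount) AS total_sales
    FROM SalesFact
    GROUP BY region, EXTRACT(YEAR FROM saleDate);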