
ADBMS: Super 25 Questions with Answers:

1. Enlist any four string functions in SQL.


ANS:

• Initcap(String) – converts the first character of each word in the string to upper case

• Upper(String) – converts the string to upper case

• Lower(String) – converts the string to lower case

• Length(String) – returns the number of characters in the string

• Instr(String, sub) – returns the position of the substring within the string

• Lpad(String, number, char) – returns the string left-padded with the specified character to the total length specified

• Rpad(String, number, char) – returns the string right-padded with the specified character to the total length specified

• Ltrim(String) – removes white space or other specified characters from the left end of the string

• Rtrim(String) – removes white space or other specified characters from the right end of the string

• Replace(String, search, replacement) – replaces all occurrences of a substring with another substring

• Substr(String, start, length) – extracts a substring from the string

• Translate(String, from, to) – replaces each occurrence of a character with the corresponding replacement character (illustrated in the query below)

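For illustration, a single query exercising several of these functions. This is a minimal sketch in Oracle-style SQL; DUAL is Oracle's one-row dummy table, used here only to evaluate the expressions:

-- Each expression's result is shown in the trailing comment.
SELECT INITCAP('hello world')        AS initcap_demo,   -- 'Hello World'
       UPPER('hello')                AS upper_demo,     -- 'HELLO'
       LOWER('HELLO')                AS lower_demo,     -- 'hello'
       LENGTH('hello')               AS length_demo,    -- 5
       INSTR('database', 'base')     AS instr_demo,     -- 5
       LPAD('42', 5, '0')            AS lpad_demo,      -- '00042'
       RTRIM('hello   ')             AS rtrim_demo,     -- 'hello'
       REPLACE('jack', 'j', 'bl')    AS replace_demo,   -- 'black'
       SUBSTR('database', 5, 4)      AS substr_demo,    -- 'base'
       TRANSLATE('abc', 'ab', 'xy')  AS translate_demo  -- 'xyc'
FROM dual;
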
2. State any two advantages of functions in PL/SQL.


ANS: Advantages of functions in PL/SQL:

• Work can be divided into smaller modules, which makes the code more manageable and also enhances its readability.

• It promotes re-usability.

• It is secure, as the code resides in the database and hides the internal database details from the user.

• It improves performance compared with running the same SQL queries multiple times. (A minimal example follows.)

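As a small illustration of these points, a minimal sketch of a stored function; the function name and logic are hypothetical, chosen only to show the syntax:

-- A reusable module stored in the database: callers see only the
-- function's name and parameters, not its internal logic.
CREATE OR REPLACE FUNCTION yearly_salary (p_monthly IN NUMBER)
RETURN NUMBER
IS
BEGIN
   RETURN p_monthly * 12;
END yearly_salary;
/

-- Re-used from any SQL statement:
SELECT yearly_salary(50000) AS annual FROM dual;
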
3. Describe Hadoop. Explain architecture of Hadoop.


ANS: Hadoop is an open-source software framework for storing data and running applications on clusters
of commodity hardware. It provides massive storage for any kind of data, enormous processing power and
the ability to handle virtually limitless concurrent tasks or jobs. It is used to manage data, store data, and
process data for various big data applications running on clustered systems. Hadoop provides the
following:
1) Ability to store and process huge amounts of any kind of data, quickly. With data volumes and varieties
constantly increasing, especially from social media and the Internet of Things (IoT), that's a key
consideration.

2) Computing power: Hadoop's distributed computing model processes big data fast. The more computing
nodes you use, the more processing power you have.

3) Fault tolerance: Data and application processing are protected against hardware failure. If a node goes
down, jobs are automatically redirected to other nodes to make sure the distributed computing does not
fail. Multiple copies of all data are stored automatically.

4) Flexibility: Unlike traditional relational databases, you don’t have to preprocess data before storing it.
You can store as much data as you want and decide how to use it later. That includes unstructured data like
text, images and videos.

5) Low cost: The open-source framework is free and uses commodity hardware to store large quantities of data.

6) Scalability: You can easily grow your system to handle more data simply by adding nodes. Little administration is required.

Hadoop has two major layers, namely:

1. Processing/Computation layer (MapReduce)

2. Storage layer (Hadoop Distributed File System)

MapReduce: Processing/Computation layer

MapReduce is a parallel programming model for writing distributed applications devised at Google for efficient
processing of large amounts of data (multi-terabyte data-sets), on large clusters (thousands of nodes) of
commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, which is an
Apache open-source framework.

Hadoop Distributed File System: Storage layer

The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed
file system that is designed to run on commodity hardware. It has many similarities with existing distributed file
systems. However, the differences from other distributed file systems are significant. It is highly fault-tolerant
and is designed to be deployed on low-cost hardware. It provides high throughput access to application data
and is suitable for applications having large datasets.

Apart from the two core components mentioned above, the Hadoop framework also includes two more modules:

1) Hadoop Common utilities – the Java libraries and utilities required by other Hadoop modules.

2) Hadoop YARN – a framework for job scheduling and cluster resource management.

4. Explain the use of R programming and also give various applications where R programming is used.
ANS: Use of R programming:
R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and performing data analysis.

Applications of R programming:

1. Banking

2. Finance

3. E-commerce

4. Social-Media

5. Healthcare

Some of the companies using R:

1. Facebook: Facebook uses R to analyze status updates and its social network graph.

2. Twitter: Twitter uses R to monitor user experience.

3. New York Times: The New York Times uses R for data visualization and interactive graphics.

4. Google: Google uses R to calculate the ROI of advertising campaigns.

5. Explain and draw data warehouse life cycle.


The data warehouse life cycle contains Project Planning, Requirement Gathering, Business Requirements, Design, ETL Development, Project Management & Deployment. The data warehouse life cycle is used to indicate the phases, and their relationships, through which the data warehouse system goes.

Project Planning: Contains requirement gathering & project management.

Requirement Gathering: It is done by the business analyst, onsite technical lead & client. The business analyst prepares the Business Requirements Specification (BRS) document. About 80% of requirement collection takes place at the client side. The business requirements document is then prepared from the gathered requirements.

Requirement Analysis: After collecting the requirements, requirement analysis is performed. This is a very tough task, as it affects every later decision. User requirement analysis falls into 4 categories:

- Data Driven

- User Driven

- Goal Driven

- Mixed Driven

Technical Architecture Track: After requirement gathering & requirement analysis, the technical architecture or project design takes place. This process involves translating the business requirements document into a high-level design that includes the various modules of the data warehouse project. This high-level design is prepared by the architects.

Data Track: The data track contains the data warehouse design & ETL development. Data warehouse design is the process of designing the database so that it fulfills user requirements. A data modeler is responsible for creating the Data Warehouse or Data Marts with different schemas, such as:

1) Star schema: The simplest warehouse schema; its diagram resembles a star (a minimal sketch follows this list).

2) Snowflake schema: An extension of the star schema that adds additional dimension tables; its diagram resembles a snowflake.
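
For concreteness, a minimal star-schema sketch in SQL. The table and column names (fact_sales, dim_date, dim_product) are hypothetical, chosen only to show the shape: one central fact table referencing the surrounding dimension tables.

-- Dimension tables hold the descriptive attributes used for analysis.
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    full_date  DATE,
    month      INTEGER,
    year       INTEGER
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         VARCHAR(100),
    category     VARCHAR(50)
);

-- The fact table stores measures plus a foreign key per dimension;
-- drawn with the dimensions around it, the diagram resembles a star.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date (date_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    quantity     INTEGER,
    amount       DECIMAL(10,2)
);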

ETL Development: Designing ETL applications to fulfill the specifications of the documents prepared in the analysis phase. ETL development includes ETL code review, peer review and ETL testing.

Business Intelligence Track: It contains BI design & BI development. The business logic is developed by the developers as per the requirements.

Deployment: It is the next phase after construction. The deployment phase is concerned with training, support and maintenance of the product. This phase is also known as the pilot phase or stabilization phase.

Project Management: The overall process of the data warehouse life cycle is managed by project management. It contains different activities such as: approving specifications, task allocation, managing issues, regular product demonstrations, regular product status updates and quality assurance.

Data Warehousing Development: A data warehouse is also known as an enterprise data warehouse. It is a system used for reporting and data analysis, and it is considered the core component of Business Intelligence.

6. Describe BI components framework.


The Major Components of Business Intelligence (BI)

The five primary components of BI include:

• OLAP (Online Analytical Processing): This component of BI allows executives to sort and select aggregates of data for strategic monitoring. With the help of specific software products, business owners can use the data to make adjustments to overall business processes (see the aggregation sketch after this list).

• Advanced Analytics or Corporate Performance Management (CPM): This set of tools allows business leaders to look at the statistics of certain products or services. For instance, a fast food chain may analyze the sale of certain items and make local, regional and national modifications to menu board offerings as a result. The data could also be used to predict in which markets a new product may have the best success.

• Real-time BI: Using software applications, a business can respond to real-time trends in email, messaging systems or even digital displays. Because it's all in real time, an entrepreneur can announce special offers that take advantage of what's going on at that moment.

• Data Warehousing: Data warehousing lets business leaders sift through subsets of data and examine interrelated components that can help drive business. Looking at sales data over several years can help improve product development or tailor seasonal offerings.

• Data Sources: This component of BI involves various forms of stored data. It's about taking the raw data and using software applications to create meaningful data sources that each division can use to positively impact business.

• A Business Intelligence framework is a framework that seamlessly connects the various elements of a business: organizational roles, KPIs (Key Performance Indicators), authorization, and visualization. This helps you implement Business Intelligence plans more easily and quickly.
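
As a small flavor of the aggregate views OLAP tools present, a sketch in standard SQL; the sales table and its region, product and amount columns are hypothetical:

-- ROLLUP produces per-(region, product) totals, per-region subtotals,
-- and a grand total in one pass: the kind of drill-down/roll-up view
-- that OLAP front ends expose interactively.
SELECT region, product, SUM(amount) AS total_sales
FROM sales
GROUP BY ROLLUP (region, product);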

7. Explain Concurrency Control Techniques.


ANS: There are different concurrency control techniques such as:

1) Lock based protocols

2) Two phase Locking protocols

3) Time stamp based protocols

1. Lock based protocol: To ensure serializability, it requires that the data items be accessed in a mutually exclusive manner, i.e. while one transaction is accessing a data item, no other transaction can modify that data item. The method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.

Locks: A lock is a data variable which is associated with a data item. Locks help synchronize access to the database items by concurrent transactions. All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock request is granted. There are different types of locks:

• Binary lock: A binary lock on a data item can be in one of two states: locked or unlocked.

• Shared Lock: A shared lock is also called a read-only lock. With shared locks, data items can be shared between transactions, because a shared lock never grants permission to update the data item. A shared lock is denoted by S.

• Exclusive Lock: With an exclusive lock, a data item can be both read and written. An exclusive lock cannot be held concurrently with any other lock on the same data item. It is denoted by X and is requested using the lock-X instruction.

2. Two phase locking protocol: Also known as 2PL. The two phase locking protocol requires that each transaction issue lock and unlock requests in two phases:

• Growing phase: A transaction may obtain locks but may not release any lock.

• Shrinking phase: A transaction may release locks but may not obtain any new locks.

• If lock conversion is allowed, then upgrading of locks from S(A) to X(A) happens in the growing phase, and downgrading of locks from X(A) to S(A) happens in the shrinking phase. The 2PL protocol does guarantee serializability; however, it does not ensure that deadlocks cannot occur. A sketch of a 2PL-style transaction follows.
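
For illustration, a minimal sketch of 2PL-style behavior in Oracle-flavored SQL. The accounts table and its columns are hypothetical; SELECT ... FOR UPDATE acquires an exclusive row lock (the lock-X idea above), and all locks are held until COMMIT, so acquisition and release fall into the two phases described:

-- Growing phase: acquire locks as data items are accessed.
SELECT balance FROM accounts WHERE acc_no = 101 FOR UPDATE;  -- lock row 101
SELECT balance FROM accounts WHERE acc_no = 202 FOR UPDATE;  -- lock row 202

UPDATE accounts SET balance = balance - 500 WHERE acc_no = 101;
UPDATE accounts SET balance = balance + 500 WHERE acc_no = 202;

-- Shrinking phase: COMMIT releases all locks at once; the transaction
-- requests no new locks after this point.
COMMIT;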

3. Time stamp based protocols: The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent transactions. This protocol ensures that all read and write operations are executed in timestamp order. The protocol uses the system time or a logical counter as the timestamp. The older transaction is always given priority in this method. This is the most commonly used concurrency protocol.

E.g.: Suppose transactions T1, T2 and T3 entered the system at times 0010, 0020 and 0030 respectively. Priority will then be given to transaction T1 first, then transaction T2, and lastly transaction T3.

8. Explain XML document schema


ANS:

• Databases have schemas, which are used to constrain what information can be stored in the database and to constrain the data types of the stored information. The first schema-definition language included as part of the XML standard was the Document Type Definition (DTD); its more recently defined replacement is XML Schema.

• Another XML schema-definition language called Relax NG is also in use. XML Schema defines a number of built-in types such as string, integer, decimal, date, and boolean. In addition, it allows user-defined types; these may be simple types with added restrictions, or complex types constructed using constructors such as complexType and sequence. The first thing to note is that schema definitions in XML Schema are themselves specified in XML syntax, using a variety of tags defined by XML Schema.

• To avoid conflicts with user-defined tags, we prefix the XML Schema tags with the namespace prefix "xs:"; this prefix is associated with the XML Schema namespace by the xmlns:xs specification in the root element: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

9. Explain two phase locking protocol with example.

10. Compare supervised and unsupervised machine learning. (Any four points)

11. Explain Mobile database with neat diagram

12. Compare SQL and NoSQL. (Any four points)

13. List any four features of Hadoop Cloudera combination

14. Compare between Structured and Unstructured Data

15. Compare between Parallel and Distributed Database (any six points).

16. Explain Array and Multiset types in SQL with example.

17. With the help of diagram describe following architectures.

a. HDFS

b. Hbase

18. CRUD operations in MongoDB (syntax and example)

19. Explain the types of NoSQL Databases

20. Explain the characteristics of Big data.

21. Consider the following input data for your MapReduce program:

Welcome to Hadoop Class
Hadoop is good
Hadoop is bad

Draw the MapReduce architecture and explain its phases.

22. Write a query using Aggregate methods.

23. State the use of NoSQL database system

24. Explain following terms of MongoDB.

(i) MongoDB Shell

(ii) MongoDB Client


25. Explain object and object identity. Write SQL query for the following table:

Class: Student

Attributes: Name, Age, GPA, Subject, Gender

Methods: Store, Print, Update
