Unit 2 Data Storage and Cloud Computing
[7Hrs]
Data Storage
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
● File Storage:
● In file storage, data is stored in files, the files are organized in folders, and
the folders are organized under a hierarchy of directories and subdirectories.
● To locate a file, all you or your computer system need is the path: from
directory to subdirectory to folder to file.
● If you need to store very large or unstructured data volumes, you should
consider block-based or object-based storage.
● Examples: a local hard drive, Google Drive, etc.
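The path-based lookup described above can be sketched in a few lines of Python. The path below is hypothetical and chosen only for illustration; `PurePosixPath` is used so that the example does not touch a real disk.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only to illustrate the hierarchy.
path = PurePosixPath("/home/alice/reports/2024/summary.txt")

# The directories walked, in order, from the root down to the file's folder.
directories = list(path.parent.parts)

# The final component is the file itself.
filename = path.name
```

Here `directories` holds every level of the hierarchy and `filename` holds the last component, which is all a file system needs to locate the file.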
Data Storage
● Block Storage:
● Block storage breaks a file into equally-sized chunks (or blocks) of data and
stores each block separately under a unique address.
● Rather than conforming to a rigid directory/subdirectory/folder structure,
blocks can be stored anywhere in the system.
● To access any file, the server's operating system uses the unique address to
pull the blocks back together into the file, which takes less time than
navigating through directories and file hierarchies to access a file.
● Examples of block storage: SAN, iSCSI, and local disks.
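A minimal sketch of the idea, assuming an in-memory dictionary stands in for the storage system: a file is split into fixed-size blocks, each block is stored under its own address, and the file is rebuilt from the ordered list of addresses. The names `write_blocks` and `read_blocks` and the tiny block size are illustrative assumptions, not part of any real block device API.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the blocks are easy to inspect

def write_blocks(data: bytes, store: dict, next_addr: int) -> list:
    """Split data into fixed-size blocks, store each under its own address,
    and return the ordered list of addresses that make up the file."""
    addresses = []
    for offset in range(0, len(data), BLOCK_SIZE):
        store[next_addr] = data[offset:offset + BLOCK_SIZE]
        addresses.append(next_addr)
        next_addr += 1
    return addresses

def read_blocks(addresses: list, store: dict) -> bytes:
    """Pull the blocks back together, in order, into the original file."""
    return b"".join(store[addr] for addr in addresses)

store = {}
addrs = write_blocks(b"hello block storage", store, next_addr=100)
restored = read_blocks(addrs, store)
```

Note that the last block may be shorter than `BLOCK_SIZE` when the file length is not a multiple of it; real systems typically pad the final block.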
Data Storage
● Object Storage:
● Object storage is used for unstructured media and web content such as email,
videos, image files, web pages, and sensor data produced by the Internet of
Things (IoT).
● An object is a simple, self-contained repository that includes the data,
metadata (descriptive information associated with the object), and a unique
identifier.
● This information enables an application to locate and access the object.
● Examples: storing objects like videos and photos on Facebook, songs on
Spotify, or files in online collaboration services such as Dropbox.
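The three parts of an object (data, metadata, unique identifier) can be sketched as follows. The classes `StoredObject` and `ObjectStore` are hypothetical and exist only for this illustration; real object stores expose their own APIs.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes      # the content itself
    metadata: dict   # descriptive information about the content
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))

class ObjectStore:
    """A flat namespace: objects are found by ID, not by directory path."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

store = ObjectStore()
photo_id = store.put(b"...jpeg bytes...", content_type="image/jpeg", owner="alice")
photo = store.get(photo_id)
```

The application keeps only `photo_id`; the metadata travels with the object rather than living in a separate directory structure.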
Different Storage Types
[Figure: File Storage, Block Storage, Object Storage]
Introduction to Enterprise Data Storage
● The amount of data a business can store depends on the type of storage
it uses.
Direct Attached Storage(DAS)
● Introduction of DAS
● Advantage of DAS
● Disadvantage of DAS
Direct Attached Storage(DAS)
● Introduction of DAS
● Direct-attached storage (DAS) is a type of storage that is attached directly to a
computer without going through a network.
● The storage might be connected internally or externally.
● Only the host computer can access the data directly.
● Most servers, desktops and laptops contain an internal hard disk drive (HDD)
or solid-state drive (SSD).
● Some computers also use external DAS devices.
Direct Attached Storage(DAS)
● Introduction of DAS
● In some cases, an enterprise server might connect directly to drives
that are shared by other servers.
● A direct-attached storage device is not networked.
● An external DAS device connects directly to a computer through an interface
such as Small Computer System Interface (SCSI), Serial Advanced Technology
Attachment (SATA), Serial-Attached SCSI (SAS), Fibre Channel (FC) or Internet
SCSI (iSCSI).
Direct Attached Storage(DAS)
● Advantage of DAS
● DAS can provide users with better performance than networked storage
because the server does not have to traverse a network to read and write
data, which is why many organizations turn to DAS for applications that
require high performance.
● DAS is also less complex than network-based storage systems, making it
easier to implement and maintain, and it is cheaper.
Direct Attached Storage(DAS)
● Disadvantage of DAS
Storage Area Network(SAN)
● Introduction of SAN
● Advantage of SAN
● Disadvantage of SAN
Storage Area Network(SAN)
● Introduction of SAN
● A Storage Area Network (SAN) is a specialized, high-speed network
that provides network access to storage devices.
● SANs are typically composed of hosts, switches, storage elements,
and storage devices that are interconnected using a variety of
technologies, topologies, and protocols.
Storage Area Network(SAN)
● Introduction of SAN
● Traditionally, only a limited number of storage devices could attach to
a server, limiting a network's storage capacity.
● But a SAN introduces networking flexibility enabling one server, or
many heterogeneous servers across multiple data centers, to share a
common storage utility.
Storage Area Network(SAN)
● Advantage of SAN
● Simplified storage administration
● Disk mirroring
● Low cost of storage management
● Instant and real-time information
● Ability to boot itself and expand the storage capacity
● Because a SAN is not directly attached to any particular server or network, it
can be shared by all servers
Storage Area Network(SAN)
● Disadvantage of SAN
● If client computers need intensive data transfer, SAN is not the right choice; SAN
is good for low data traffic
● More expensive
● Very hard to maintain
● Because all client computers share the same set of storage devices, sensitive data can
be leaked. It is preferable not to store confidential information on this network.
● Poor implementation results in a performance bottleneck
● Not affordable for small businesses
● Requires highly skilled technical staff
Network Attached Storage(NAS)
● Introduction of NAS
● Advantage of NAS
● Disadvantage of NAS
Network Attached Storage(NAS)
● Introduction of NAS
● A NAS device is a storage device connected to a network that allows
storage and retrieval of data from a central location for authorised network
users and varied clients.
● NAS is a centralized file server that allows multiple users to store and share
files over a TCP/IP network via Wi-Fi or an Ethernet cable.
● It is also commonly known as a NAS box, NAS unit, NAS server, or NAS head.
Network Attached Storage(NAS)
● Introduction of NAS
● Network protocols: The TCP/IP protocols, i.e. Transmission Control Protocol (TCP)
and Internet Protocol (IP), are used for data transfer, but the file-sharing
protocol can vary with the type of client (for example, NFS for Linux/UNIX
clients and SMB for Windows clients).
Network Attached Storage(NAS)
● Advantage of NAS
● Simple to operate; a dedicated IT professional is often not required
● Lower cost
● Easy data backup, so it’s always accessible when you need it
● Good at centralising data storage in a safe, reliable way
● Disadvantage of NAS
● Out-of-sync data
● Reliability and accessibility issues if storage goes down
Data Storage management
● Data storage management tools must rely on policies that govern the use
of storage devices.
● Data Storage management refers to the software and processes that
improve the performance of data storage resources.
● It may include network virtualization, replication, mirroring, security,
compression, deduplication, traffic analysis, process automation, storage
provisioning and memory management.
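Two of the techniques listed above, deduplication and compression, can be sketched together. This is a minimal illustration, assuming content-hash deduplication over whole chunks; the function names and the in-memory `pool` are hypothetical and do not describe any particular storage product.

```python
import hashlib
import zlib

def store_chunks(chunks, pool):
    """Store each distinct chunk once, keyed by its SHA-256 digest and
    compressed with zlib. Returns the digests needed to rebuild the input."""
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in pool:          # deduplication: skip chunks already held
            pool[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe

def load_chunks(recipe, pool):
    """Rebuild the original chunk sequence from the stored digests."""
    return [zlib.decompress(pool[digest]) for digest in recipe]

pool = {}
chunks = [b"A" * 1024, b"B" * 1024, b"A" * 1024]
recipe = store_chunks(chunks, pool)
restored = load_chunks(recipe, pool)
```

Three chunks go in, but only two are physically stored because the first and third are identical.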
Cloud File System
● Introduction
● Ghost File System
● Gluster File System
● Hadoop File System
● XtreemFS: A Distributed and Replicated File System
● Kosmos File System
● CloudFS
● Google File System (GFS)
Cloud File System :Introduction
FAT:
● FAT was designed for systems with very little RAM and small disks. It requires
far fewer system resources than other file systems, such as those used by UNIX.
NTFS:
● NTFS is more advanced and flexible than FAT.
● Its system areas are themselves files, so they can be customized, enlarged, or
moved as required. NTFS also has much more security built in.
● NTFS is not well suited to small disks.
Cloud File System :Gluster File System
● Users are no longer locked into legacy storage platforms, which are costly and
monolithic.
● GlusterFS gives users the ability to deploy scale-out, virtualized storage as a
centrally managed pool of storage.
● Attributes of GlusterFS include scalability and performance, high
availability, a global namespace, an elastic hash algorithm, an elastic volume
manager, the Gluster console manager, and a standards-based design.
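The general idea behind hash-based placement can be sketched as below. This is a simplified illustration and not GlusterFS's actual elastic hash algorithm: the hash function, the modulo step, and the brick names are all assumptions made for the example.

```python
import hashlib

def pick_brick(filename: str, bricks: list) -> str:
    """Choose a storage brick from the file name alone, so that any client
    can compute the location without asking a central metadata server."""
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(bricks)
    return bricks[index]

# Hypothetical brick names.
bricks = ["server1:/data", "server2:/data", "server3:/data"]
placement = {name: pick_brick(name, bricks) for name in ["a.txt", "b.txt", "c.txt"]}
```

Because the location is computed rather than looked up, every client that knows the brick list arrives at the same answer for the same file name.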
Cloud File System :Hadoop File System
Cloud File System :XtreemFS
Cloud File System :Kosmos File System
Cloud File System :CloudFS
Cloud Data Stores
● A Distributed Data Store is like a distributed database where users store information
on multiple nodes.
● These data stores are non-relational databases that search data quickly
across a large number of nodes.
● Examples of this kind of data store are Google’s BigTable, Amazon’s Dynamo and
Windows Azure Storage.
● Some distributed data stores use forward error correction techniques to recover
the original file when parts of that file are damaged or unavailable.
● Others download the file from a different mirror.
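Recovery of a damaged or unavailable part can be illustrated with a single XOR parity fragment, one of the simplest erasure codes. This is a sketch under stated assumptions: fragments are equal in length, only one fragment is lost, and the function names are hypothetical. Production systems often use stronger codes such as Reed-Solomon.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(fragments):
    """XOR all fragments together to produce one parity fragment."""
    parity = bytes(len(fragments[0]))
    for fragment in fragments:
        parity = xor_bytes(parity, fragment)
    return parity

def recover_missing(surviving, parity):
    """XOR the parity with every surviving fragment to rebuild the lost one."""
    missing = parity
    for fragment in surviving:
        missing = xor_bytes(missing, fragment)
    return missing

fragments = [b"node", b"data", b"lost"]  # equal-length fragments on three nodes
parity = make_parity(fragments)          # kept on a fourth node
recovered = recover_missing([fragments[0], fragments[1]], parity)
```

The third fragment is rebuilt from the other two plus the parity, without contacting the failed node.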
Cloud Data Stores:Types of Data Stores:BigTable
● BigTable maps two arbitrary string values (a row key and a column key) and a
timestamp to an associated arbitrary byte array.
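The mapping (row key, column key, timestamp) to byte array can be sketched with nested dictionaries. `TinyTable` is a hypothetical toy class for illustration only; it ignores distribution, sorting, and persistence, and the row and column names are made up.

```python
class TinyTable:
    """Toy model of the (row, column, timestamp) -> bytes mapping."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str, timestamp=None) -> bytes:
        versions = self._cells[(row, column)]
        if timestamp is None:
            timestamp = max(versions)  # default to the most recent version
        return versions[timestamp]

table = TinyTable()
table.put("com.example.www", "contents:html", 1, b"<html>v1</html>")
table.put("com.example.www", "contents:html", 2, b"<html>v2</html>")
latest = table.get("com.example.www", "contents:html")
first = table.get("com.example.www", "contents:html", timestamp=1)
```

Keeping the timestamp in the key lets several versions of the same cell coexist.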
[Figure: Bigtable architecture]
Using Grids for Data Storage
● What is a grid?
● Grid Storage for Grid Computing
● Grid Oriented Storage (GOS)
Using Grids for Data Storage
● What is a grid?
● Grid computing is a computing infrastructure that combines computer
resources spread over different geographical locations to achieve a
common goal.
Cloud Storage
● The following are some additional cloud storage attributes:
● Resource pooling and multi-tenancy: Multiple consumers can share a
single storage device. Storage resources are pooled, and
consumers can be assigned and unassigned resources according to
their needs.
● Scalable and elastic: Virtualized storage can be easily expanded as
needed.
● Accessible through standard protocols and interfaces, including HTTP, FTP,
XML, SOAP and REST.
● Service-based: Consumers need not invest up front, that is, there is no CAPEX
(Capital Expenditure); they pay only for usage, that is, OPEX (Operational
Expenditure).
● Pricing based on usage
● Shared and collaborative
● On-demand self-service
Data Management for Cloud Storage
● Introduction
● Cloud Data Management Interface (CDMI)
● Cloud Storage Requirements
Data Management for Cloud Storage
● Introduction
● Cloud storage should be able to incorporate new services as requirements
change over time.
● For cloud storage, the Storage Networking Industry Association (SNIA) has
published a standard document, the Storage Industry Resource Domain Model (SIRDM).
● The figure shows the SIRDM model, which uses CDMI standards.
● The SIRDM model adopts three kinds of metadata:
● storage system metadata, data system metadata and user metadata.
● Using this metadata, the cloud storage interface can offer services.
Data Management for Cloud Storage
● User metadata is used by the cloud to find the data objects and
containers.
● Storage system metadata is used by the cloud to offer basic storage
functions such as assignment, modification and access control.
● Data system metadata is used by the cloud to offer data as a service
based on user requirements and to control operations on that
data.
Data Management for Cloud Storage
● Cloud Data Management Interface (CDMI)
● The Cloud Data Management Interface (CDMI) is used to create, retrieve,
update and delete objects in a cloud.
● CDMI provides the following functions:
● Discovery of cloud storage offerings by clients
● Management of containers and the data in them
● Synchronization of metadata with containers and objects
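As a rough sketch of what a CDMI "create object" call might look like, the snippet below builds (but does not send) an HTTP PUT request. The server URL, container, and object names are hypothetical, and the content type, version header, and JSON fields reflect one reading of the SNIA CDMI specification; they should be checked against the specification version a given provider supports.

```python
import json
import urllib.request

def build_cdmi_create(base_url: str, container: str, name: str, text: str):
    """Build an HTTP PUT that would create a data object in a container.
    The request is only constructed here, never sent."""
    body = json.dumps({"mimetype": "text/plain", "metadata": {}, "value": text})
    return urllib.request.Request(
        url=f"{base_url}/{container}/{name}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": "1.1",  # assumed version string
        },
    )

# Hypothetical endpoint and names.
request = build_cdmi_create("https://cloud.example.com/cdmi", "reports", "q1.txt", "hello")
```

Retrieve, update, and delete would follow the same pattern with GET, PUT, and DELETE on the same URL.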
Data Management for Cloud Storage
● Cloud Data Management Interface (CDMI)
● CDMI is also used to manage containers, domains, security access
and billing information.
● The CDMI standard is also used as a protocol for accessing storage.
● CDMI defines how to manage data and also ways of storing and
retrieving it.
● ‘Data path’ means how data is stored and retrieved.
● ‘Control path’ means how data is managed.
● The CDMI standard supports both data path and control path interfaces.
Provisioning Cloud Storage
● By adopting the Cloud Data Management Interface (CDMI) standard,
service providers can implement a method for metering the storage
and data usage of consumers.
● This interface also helps providers bill IT
organizations based on their usage.
● An advantage of this interface is that IT organizations need not write or use
a different adapter for each service provider.
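Usage-based metering and billing can be illustrated with a small calculation. The record fields, consumer names, and rates below are hypothetical; CDMI itself does not prescribe a pricing model.

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    consumer: str
    stored_gb: float       # average storage held over the billing period
    transferred_gb: float  # data moved in or out over the billing period

# Hypothetical rates, in currency units per GB.
RATE_STORAGE = 0.02
RATE_TRANSFER = 0.05

def bill(record: UsageRecord) -> float:
    """Charge for metered storage and transfer, rounded to two decimals."""
    amount = record.stored_gb * RATE_STORAGE + record.transferred_gb * RATE_TRANSFER
    return round(amount, 2)

records = [UsageRecord("team-a", 500, 40), UsageRecord("team-b", 120, 300)]
invoices = {record.consumer: bill(record) for record in records}
```

Each consumer is charged only for what was metered, which is the OPEX model described earlier in this unit.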
Data-intensive Technologies for Cloud Computing
● Introduction
● Processing Approach
● System Architecture
Data-intensive Technologies for Cloud Computing
● Introduction
● Data-intensive computing is a class of parallel computing that uses
data parallelism to process large volumes of data, called big data.
● Parallel processing approaches are divided into two types:
compute-intensive and data-intensive.
● Compute-intensive: Applications that spend most of their
execution time on computation.
● Data-intensive: Applications that must process large volumes of
data and spend most of their time on I/O and data movement.
Data-intensive Technologies for Cloud Computing
● Processing Approach
● Data-intensive computing platforms use a parallel computing
approach.
● This approach combines multiple processors and disks into
computing clusters connected via a high-speed network.
● The data to be processed is partitioned and processed independently
by the computing resources available in the cluster.
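The partition-then-process-independently pattern can be sketched on a single machine. In this illustration a thread pool stands in for the nodes of a cluster, and the `partition` and `process` functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data: list, parts: int) -> list:
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def process(chunk: list) -> int:
    """Work done independently on one partition: a sum of squares."""
    return sum(x * x for x in chunk)

data = list(range(1, 11))
chunks = partition(data, parts=3)

# Each worker handles one partition without coordinating with the others.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(process, chunks))

total = sum(partial_sums)
```

Only the small partial results are combined at the end; the bulk of the data never has to move between workers.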
● Data-intensive computing platforms share several common characteristics:
● The mechanism used to bring together the data and the
programs or algorithms that perform the computation
● The programming model used
● Reliability and availability
● Scalability of both hardware and software
Data-intensive Technologies for Cloud Computing
● System Architecture
● A variety of system architectures have been implemented for
data-intensive computing.
● Architectures for data-intensive computing:
1. MapReduce
2. HPCC
● 1. MapReduce
● The MapReduce concept was developed by Google and is available as an open-source
implementation known as Hadoop.
● This project is used by Yahoo, Facebook and others.
● The MapReduce architecture uses a functional programming style in which a map
function processes key-value pairs to produce intermediate key-value pairs.
● The reduce function merges all intermediate values that share the same intermediate key.
● Hence programmers who have no experience in parallel programming can still
use a large distributed processing environment.
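The map and reduce steps can be sketched with the classic word-count example. This is a single-process illustration of the programming model only, not Hadoop's API; the function names and input lines are made up.

```python
from collections import defaultdict

def map_fn(_key, line: str):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: list):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)

def map_reduce(records):
    groups = defaultdict(list)
    for key, value in records:
        for intermediate_key, intermediate_value in map_fn(key, value):
            groups[intermediate_key].append(intermediate_value)  # shuffle step
    return dict(reduce_fn(key, values) for key, values in groups.items())

lines = [(0, "the cloud stores data"), (1, "the data grows")]
counts = map_reduce(lines)
```

The programmer writes only `map_fn` and `reduce_fn`; in a real framework the grouping and distribution across machines are handled by the runtime.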
● 2. HPCC (High-Performance Computing Cluster)
● LexisNexis Risk Solutions independently developed and implemented
a solution for data-intensive computing called HPCC.
● The LexisNexis approach builds clusters from commodity hardware
running the Linux operating system.
● Custom system software and middleware components were created and layered
on the base Linux operating system to provide the execution environment and
distributed file system support that is essential for data-intensive
computing.
● LexisNexis also implemented a new high-level language for data-intensive
computing called ECL.
Cloud Storage from LANs to WANs
❏ Cloud Characteristics
❏ Distributed Data Storage
❏ Applications Utilizing Cloud Storage
★ Cloud Characteristics
★ Three characteristics of cloud computing should be considered before choosing
storage in the cloud.