Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
1K views

Unit 2 Data Storage and Cloud Computing

The document discusses data storage and cloud computing. It describes different types of data storage including file storage, block storage, and object storage. It also discusses various enterprise data storage solutions like direct attached storage (DAS), storage area networks (SAN), and network attached storage (NAS). Cloud storage concepts like data management, provisioning cloud storage, and data intensive technologies for cloud computing are also covered.

Uploaded by

High School
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
1K views

Unit 2 Data Storage and Cloud Computing

The document discusses data storage and cloud computing. It describes different types of data storage including file storage, block storage, and object storage. It also discusses various enterprise data storage solutions like direct attached storage (DAS), storage area networks (SAN), and network attached storage (NAS). Cloud storage concepts like data management, provisioning cloud storage, and data intensive technologies for cloud computing are also covered.

Uploaded by

High School
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 92

Unit-2 : Data Storage and Cloud Computing

[7Hrs]

Data Storage: Introduction to Enterprise Data Storage, Direct Attached Storage,


Storage Area Network, Network Attached Storage, Data Storage Management, File
System, Cloud Data Stores,Using Grids for Data Storage.

Cloud Storage: Data Management, Provisioning Cloud storage,Data Intensive


Technologies for Cloud Computing. Cloud Storage from LANs to WANs: Cloud
Characteristics, Distributed Data Storage.

1
Data Storage

● Data storage:Files and documents are recorded digitally and saved in a


storage system for future use
● A huge amount of data is continuously generated, collected, stored, and
analyzed through software.
● The most prevalent forms of data storage are file storage, block
storage, and object storage, with each being ideal for different
purposes.

2
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Storage

● File storage
● In file storage, data is stored in files, the files are organized in folders, and
the folders are organized under a hierarchy of directories and subdirectories.
● To locate a file, all you or your computer system need is the path—from
directory to subdirectory to folder to file.
● If you need to store very large or unstructured data volumes, you should
consider block-based or object-based storage
● Example:Harddrive,google drive etc.
3
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Storage

● Block Storage:
● Block storage breaks a file into equally-sized chunks (or blocks) of data and
stores each block separately under a unique address.
● Rather than conforming to a rigid directory/subdirectory/folder structure,
blocks can be stored anywhere in the system.
● To access any file, the server's operating system uses the unique address to
pull the blocks back together into the file, which takes less time than
navigating through directories and file hierarchies to access a file.
● Example:Block Storage are SAN, iSCSI, and local disks. 4
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Storage

● Object Storage:
● unstructured media and web content like email, videos, image files, web
pages, and sensor data produced by the Internet of Things (IoT).
● object is a simple, self-contained repository that includes the data,
metadata (descriptive information associated with an object), and a unique
identifying ID number.
● This information enables an application to locate and access the object.
● Example: storing objects like videos and photos on Facebook, songs on
Spotify, or files in online collaboration services, such as Dropbox 5
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Different Storage Types
File Storage Block Storage Object Storage

6
Introduction to Enterprise Data Storage,

● An Enterprise Storage System is a centralized repository for business


information.

● Enterprise data storage allows businesses to store and access large


volumes of company information.

● The size of data that businesses can store depends on the storage type
they use.
7
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Introduction to Enterprise Data Storage,

● For most large companies, a good data storage platform is essential to


security and success
● Business requires that huge amounts of data be stored safely but also be
easily accessible.
● Enterprise storage is a centralized repository for business
information that provides common data management, protection
and data sharing functions through connections to computer
systems. 8
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Introduction to Enterprise Data Storage,

● Main types of Enterprise(business) Data storage


● Direct Attached Storage(DAS)
● Storage Area Network(SAN)
● Network Attached Storage(NAS)

9
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Direct Attached Storage(DAS)

● Introduction of DAS

● Advantage of DAS

● Disadvantage of DAS

10
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
11
Direct Attached Storage(DAS)

● Introduction of DAS
● Direct-attached storage (DAS) is a type of storage that is attached directly to a
computer without going through a network.
● The storage might be connected internally or externally.
● Only the host computer can access the data directly.
● Most servers, desktops and laptops contain an internal hard disk drive (HDD)
or solid-state drive (SSD).
● Some computers also use external DAS devices.

12
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Direct Attached Storage(DAS)

● Introduction of DAS
● In some cases, an enterprise server might connect directly to drives
that are shared by other servers.
● A direct-attached storage device is not networked.
● An external DAS device connects directly to a computer through an interface
such as Small Computer System Interface (SCSI), Serial Advanced Technology
Attachment (SATA), Serial-Attached SCSI (SAS), FC or Internet SCSI (iSCSI).

13
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Direct Attached Storage(DAS)

● Advantage of DAS
● DAS can provide users with better performance than networked storage
because the server does not have to traverse a network to read and write
data, which is why many organizations turn to DAS for applications that
require high performance.
● DAS is also less complex than network-based storage systems, making it
easier to implement and maintain, and it is cheaper.

14
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Direct Attached Storage(DAS)

● Disadvantage of DAS

● It has limited scalability


● lacks the type of centralized management
● backup capabilities available to other storage platforms.
● it can't be easily shared data

15
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Storage Area Network(SAN)

● Introduction of SAN

● Advantage of SAN

● Disadvantage of SAN

16
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Storage Area Network(SAN)

● Introduction of SAN
● A Storage Area Network (SAN) is a specialized, high-speed network
that provides network access to storage devices.
● SANs are typically composed of hosts, switches, storage elements,
and storage devices that are interconnected using a variety of
technologies, topologies, and protocols.

17
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Storage Area Network(SAN)

● Introduction of SAN
● Traditionally, only a limited number of storage devices could attach to
a server, limiting a network's storage capacity.
● But a SAN introduces networking flexibility enabling one server, or
many heterogeneous servers across multiple data centers, to share a
common storage utility.

18
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Storage Area Network(SAN)

● Advantage of SAN
● Simplified storage administration
● Disk mirroring
● Low cost of storage management
● Instant and real-time information
● Ability to boot itself and expand the storage capacity
● SAN is not directly attached to any particular server or network, SAN
can be shared by all

19
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Storage Area Network(SAN)

● Disadvantage of SAN
● If client computers need intensive data transfer then SAN is not the right choice. SAN
is good for low data traffic
● More expensive
● It is very hard to maintain
● As all client computers share the same set of storage devices so sensitive data can
be leaked. It is preferable not to store confidential information on this network.
● Poor implementation results in a performance bottleneck
● Not affordable for small business
● Require a high-level technical person
20
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Network Attached Storage(NAS)

● Introduction of NAS

● Advantage of NAS

● Disadvantage of NAS

21
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Network Attached Storage(NAS)

● Introduction of NAS
● An NAS device is a storage device connected to a network that allows
storage and retrieval of data from a central location for authorised network
users and varied clients.
● NAS is a centralized, file server, which allows multiple users to store and share
files over a TCP/IP network via Wifi or an Ethernet cable.
● It is also commonly known as a NAS box, NAS unit, NAS server, or NAS head.

22
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Network Attached Storage(NAS)

● Introduction of NAS
● Network Protocols: TCP/IP protocols –i.e. Transmission Control Protocol (TCP)
and Internet Protocol (IP)—are used for data transfer, but the network
protocols for data sharing can vary based on the type of client.

23
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Network Attached Storage(NAS)

● Advantage of NAS
● Simple to operate, a dedicated IT professional is often not required
● Lower cost
● Easy data backup, so it’s always accessible when you need it
● Good at centralising data storage in a safe, reliable way
● Disadvantage of NAS
● Out-of-sync data
● Reliability and accessibility issues if storage goes down
24
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
25
26
27
Data Storage management

● Data storage Management tool must rely on policies which govern the usage
of storage devices .
● Data Storage management refers to the software and processes that
improve the performance of data storage resources.
● It may include network virtualization, replication, mirroring, security,
compression, deduplication, traffic analysis, process automation, storage
provisioning and memory management.

28
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Storage management

● Storage management makes it possible to reassign storage capacity quickly


as business needs change.
● Storage management techniques can be applied to primary,backup or
archived storage.
● Primary storage holds actively or frequently accessed data
● Backup storage holds copies of primary storage data for use of disaster
recovery
● Archived storage holds outdated or rarely used data that must be retained
for compliance or business continuity.
29
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Storage management

● Storage provisioning is a management technique that assign storage


capacity to servers,computer,virtual machines and other devices.
● It may use automation to allocate storage space in a networked
environment.
● Intelligent storage management uses software policies and algorithm to
automate the provisioning (and de-provisioning) of storage
resources,continuously monitoring data utilisation and re-balancing data
placement without human intervention.

30
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System

● Introduction
● Ghost File System
● Gluster File System
● Hadoop File System
● XtreemFS: A Distributed and Replicated File System
● Kosmos File System
● CloudFS
● Google File system(GFS)
31
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Introduction

● A file system is a structure used in computer to store data on a hard


disk.
● When we install a new hard disk, we need to partition and format it using
a file system before storing data.
● Following file systems in use in Windows OS; NTFS(New Technology File
System),FAT32(File allocation table),EXT4(most common Linux file
system) etc

32
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Introduction

FAT:
● FAT was planned for systems with very small RAM and small disks. It required
much less system resources compared to other file systems like UNIX.
NTFS
● NTFS is much simpler than FAT.
● While files are used, the system areas can be customized,enlarged, or moved as
required. NTFS has much more security incorporated.
● NTFS is not apt for small-sized disks.
33
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Introduction

● File system typically provide mechanism for reading,writing


,modifying,deleting or organising files in folders and
directories,
● Cloud file system are specifically designed to be distributed
and operated in the cloud based environment.

34
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System

● In cloud file systems, the considerations are:


● It must sustain basic file system functionality.
● It should be an open source.
● It should be grown-up enough that users will at least think about trusting
their data to it.
● It should be shared, i.e., available over a network.
● It should be paralleling scalable.
● It should provide honest data protection, still on commodity hardware with
only internal storage.
35
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Ghost File System

● Ghost cloud file system is used in Amazon Web Services (AWS).


● It gives high redundant elastic mountable, cost-effective and standards-
based file system.
● A fully featured scalable and stable cloud file systems is provided by ghost
cloud file system.
● GFS (Ghost File System) run over Amazon’s S3, EC2 and SimpleDB web
services.
● When using GFS, user can have complete control of the data and can be
accessed as a standard network disk drive.
36
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Ghost File System

● Features of Ghost CFS


● Mature elastic file system in the cloud.
● All files and metadata duplicated across multiple AWS availability regions.
● WebDav for standard mounting on any Linux, Windows or Mac server or
client in the world.
● FTP access.
● Web interface for user management and for file upload/download.
● File name search.
● Side-loading of files from torrent and from URL.
37
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Ghost File System

● Benefits of Ghost CFS


● Elastic and cost efficient: Pay for what you use from 1 GB to hundreds of
terabytes.
● Multi-region redundancy: Aiming to take advantage of AWS’s 99.99% availability
● Highly secure: Uses your own AWS account (ghost cannot access your data).
● No administration: Scales elastically with built in redundancy—no provisioning or
backup.
● Anywhere: Mount on a server or client or access files via a web page or from a
mobile phone.

38
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Gluster File System

● GlusterFS is an open source, distributed file system capable of handling


multiple clients and large data.
● GlusterFS clusters storage devices over network, aggregating disk and
memory resources and managing data as a single unit.
● GlusterFS is based on a stackable user space design and delivers good
performance for even heavier workloads.
● GlusterFS supports clients with valid IP address in network.

39
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Gluster File System

● Users no longer locked with legacy storage platforms which are costly and
monolithic.
● GlusterFS gives users the ability to deploy scale-out, virtualized storage,
centrally managed pool of storage.
● Attributes of GlusterFS include scalability and performance, high
availability, global namespace, elastic hash algorithm, elastic volume
manager, gluster console manager, and standards-based.

40
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Hadoop File System

● A distributed file system designed to run on commodity hardware is known


as Hadoop Distributed File System (HDFS).
● In HDFS, files are stored in blocks ranging from 64 MB to 1024 MB.
● The default size is 64 MB.
● The blocks will be distributed across the cluster and replicated for fault
tolerance.

41
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
42
Cloud File System :XtreemFS

● XtreemFS: A Distributed and Replicated File System


● XtreemFS is a distributed, replicated and open source.
● XtreemFS allows users to mount and access files via WWW.
● Engaging XtreemFS a user can replicate the files across data centres
to reduce network congestion, latency and increase data availability.
● Installing XtreemFS is quite easy, but replicating the files is bit
difficult.

43
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :Kosmos File System

● Kosmos Distributed File System (KFS) gives high performance with


availability and reliability.
● For example, search engines, data mining, grid computing, etc.
● It is deployed in C++ using standard system components such as STL,
boost libraries, aio, log4cpp.
● KFS is incorporated with Hadoop and Hypertable.

44
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System :CloudFS

● CloudFS is a distributed file system to solve problems when file system


is itself provided as a service.
● CloudFS is based on GlusterFS, a basic distributed file system, and
supported by Red Hat and hosted by Fedora.
● There are really three production level distributed/parallel file systems
that come close to the requirements for the cloud file systems:
Lustre(Linux and cluster.), PVFS2(The Parallel Virtual File System (PVFS) ) and
GlusterFS.
45
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud File System:Google file System

● It is scalable distributed file system for large distributed data-intensive


applications.
● Developed by google
● Single file is not stored into single server,files are divided into multiple
chunk
● GFS master only read the metadata of file
● Publish paper https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

46
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Data Stores

● What is Data stores?


● Distributed Data Store
● Types of Data Stores :BigTable,Dynamo:

47
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Data Stores

● A data store is a data repository where data are stored as objects.


● Data store includes data repositories, flat files that can store data.
● Data stores can be of different types:
● Relational databases (Examples: MySQL, PostgreSQL, Microsoft SQL Server, Oracle
● Database)
● Object-oriented databases
● Operational data stores
● Schema-less data stores, e.g. Apache Cassandra or Dynamo
● Paper files
● Data files (spreadsheets, flat files, etc)
48
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Data Stores:Distributed Data Store

● A Distributed Data Store is like a distributed database where users store information
on multiple nodes.
● These kinds of data store are non-relational databases that searches data quickly
over a large multiple nodes.
● Examples for this kind of data storage are Google’s BigTable, Amazon’s Dynamo and
Windows Azure Storage.
● Some Distributed Data Stores use to recover the original file when parts of that file
are damaged or unavailable by using forward error correction techniques.
● Others download that file from a diverse mirror.

49
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Data Stores:Types of Data Stores:BigTable

● BigTable is a compressed, high performance and proprietary data storage


system construct on Google File System, Chubby Lock Service, SSTable and a
small number of other Google technologies.
● BigTable was developed in 2004 and is used in number of Google applications
such as web indexing, Google Earth, Google Reader, Google Maps, Google Book
Search, MapReduce, Blogger.com, Google Code hosting, Orkut, YouTube and
Gmail.
● Advantage for developing BigTable includes scalability and better performance
control.
50
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Data Stores:Types of Data Stores:BigTable

● BigTable charts two random string values (row and column key) and
timestamp into an associated random byte array.

● BigTable is designed to scale into the petabyte range across


multiple machines and easy to add more machines and
automatically start using resources available without any
configuration changes.
51
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Bigtable storage model

52
Bigtable architecture

53
Cloud Data Stores:Types of Data Stores:BigTable

● Other similar softwares are as follows:


● Apache Accumulo: Construct on top of Hadoop, ZooKeeper and economy.
Server-side programming mechanism deployed in Java environment.
● Apache Cassandra: Dynamo’s distributed design and BigTable’s facts and
numbers form adds simultaneously in Apache Cassandra, which uses Java.
● Hbase: Supports BigTable and Java programming language.
● Hypertable: Designed for cluster of servers especially for storage and
processing.
● KDI: Kosmix stab to make a BigTable clone and is written in C++.
54
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Data Stores:Types of Data Stores:Dynamo

● Dynamo: A Distributed Storage System


● Dynamo is a vastly offered, proprietary key-value structured storage system
or a dispersed datastore. Dispersed storage systems are well-suited for storing unstructured data like digital

media of all types

● It can act as databases and also distributed hash tables (DHTs).


● It is used with parts of Amazon web services such as Amazon S3.
● Dynamo is the most powerful relational database available in World Wide
Web.
● Relational databases have been used a lot in retail sites, to make visitors
browse andSubject:Cloud
search for products easily.
Computing:Unit-2:Data Storage and Cloud Computing
55
Cloud Data Stores:Types of Data Stores:Dynamo

● It is difficult to create redundancy and parallelism with relational


databases which is a single point failure.
● Replication is also not possible.
● Dynamo is a distributed storage system and not a relational database.
● Similar to a relational database it stores information to be retrieved;
however, it stores the data as objects and not as tables.
● The advantage of using Dynamo is responsive and consistent in
creating a distributed storage solution.
56
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
AWS Lambda is a serverless compute service that runs your
code in response to events and automatically manages the
underlying compute resources for you 57
Using Grids for Data Storage

● What is grids?
● Grid Storage for Grid Computing
● Grid Oriented Storage (GOS)

58
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Using Grids for Data Storage

● What is grids?
● Grid computing is a computing infrastructure that combines computer
resources spread over different geographical locations to achieve a
common goal.

59
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Using Grids for Data Storage

● Grid Storage for Grid Computing


● Grid computing established its stand as an understood architecture, as it
provides users and applications to use shared pool of resources.
● The compute grid connects computers both desktops and servers and storage
across an organization.
● It virtualizes heterogeneous and remotely located components into a single
system.
● Grid computing allows sharing of computing and data resources for multiple
workloads and enables collaboration both within and across organizations.
60
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Using Grids for Data Storage

● Grid Storage for Grid Computing


● Storage for grid computing requires a common file system to present as a single
storage space to all workloads.
● Presently grid computing system uses NAS type of storage.
● NAS provides transparency but limits scale and storage management
capabilities.
● To set the unique demands of the compute grid on its storage infrastructure,
storage for the grid must be abnormally flexible.

61
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Using Grids for Data Storage

● Grid Storage for Grid Computing


● DAS is basically not an option.
● Virtualization is a start, providing the single unit behaviour where the global
filing system requires data compute grid.
● Due to this, SAN architectures are used.
● However, the scale of these SANs is beyond the capabilities of fibre channel.

62
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Using Grids for Data Storage

● Grid Oriented Storage (GOS)


● GOS is a dedicated data storage architecture connected directly to a computational
grid.
● It supports and acts as a data bank and large supply for data if needed, which can
be shared among multiple grid clients.
● GOS is a successor of Network-Attached Storage (NAS) products in the grid
computing era.
● GOS accelerates all kinds of applications in terms of performance and transparency.
● A GOS system contains multiple hard disks, arranged into logical, redundant storage
containers like traditional file servers.
63
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Using Grids for Data Storage

● Grid Oriented Storage (GOS)


● GOS deals with long-distance, heterogeneous and single-image file
operations.
● GOS acts as a file server and uses file-based GOS-FS protocol.
● Similar to GridFTP, GOS-FS integrates a parallel stream engine and Grid
Security Infrastructure (GSI).
● GOS-FS can be used as an underlying platform to utilize the available
bandwidth and accelerate performance in grid-based applications.
64
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage
● Cloud storage is a part of cloud computing.
● Cloud storage can be accessible through web-based applications maintained
by the third party (service provider).
● Cloud storage is nothing but virtualized storage on demand called as Data
storage as a Service (DaaS).
● Cloud storage can be deployed in many ways. For example:
● Local data (desktop/laptop) can be backed up to cloud storage.
● A virtual disk can be ‘sync’ to the cloud and distributed.
● The cloud can be used as a reservoir for storing data.

65
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage
● Following are some additional cloud storage attributes:
● Resource pooling and multi-tenancy: Multiple consumers can use
shared single storage device. Storage resources are pooled and
consumers can be assigned and unassigned resources according to
their needs.
● Scalable and elastic: Virtualized storage can be easily expanded on
need basis.

66
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage
● Following are some additional cloud storage attributes:
● Accessible standard protocols including HTTP, FTP, XML, SOAP and
REST.
● Service-based: Consumers no need to invest, that is, no CAPEX (Capital
Expenditure) and only pay for usage, that is, OPEX (Operational
Expenditure).
● Pricing based on usage
● Shared and collaborative
● On-demand self-service
67
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Management for Cloud Storage

● Introduction
● Cloud Data Management Interface (CDMI)
● Cloud Storage Requirements

68
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Management for Cloud Storage

● Introduction
● Cloud storage should incorporate new services according to change of
time.
● For cloud storage, a standard document is placed by SNIA, Storage
Industry Resource Domain Model (SIRDM).
● Figure shows the SIRDM model which uses CDMI standards.
● SIRDM model adopts three metadata:
● storage metadata, data metadata and user metadata.
● By using these metadata, cloud storage interface can offer services
69
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Management for Cloud Storage
● User metadata is used by the cloud to find the data objects and
containers.
● Storage system metadata is used by the cloud to offer basic storage
functions like assigning,modifying and access control.
● Data system metadata is used by the cloud to offer data as a service
based on user requirements and controls the operation based on that
data.

70
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
71
Data Management for Cloud Storage
● Cloud Data Management Interface (CDMI)
● To create, retrieve, update and delete objects in a cloud ,the cloud
data management interface (CDMI) is used.
● The functions in CDMI are:
● Cloud storage offerings are discovered by clients
● Management of containers and the data
● Sync metadata with containers an objects

72
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data Management for Cloud Storage
● Cloud Data Management Interface (CDMI)
● CDMI is also used to manage containers, domains, security access
and billing information.
● CDMI standard is also used as protocols for accessing storage.
● CDMI defines how to manage data and also ways of storing and
retrieving it.
● ‘Data path’ means how data is stored and retrieved.
● ‘Control path’ means how data is managed.
● CDMI standard supports both data path and control path interface. 73
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Provisioning Cloud Storage

● Cloud means sharing third party resources via the Internet.


● This sharing can be done on need basis and there is no need to invest
any infrastructure at consumers end.
● Capacity of storage can be increased on need basis and can be done
using multi-tenancy methods.

74
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Provisioning Cloud Storage
● By adopting Cloud Data Management Interface (CDMI), standard
service providers can implement the method for metering the storage
and data usage of consumers.
● This interface also helps the providers for billing to the IT
organizations based on their usage.
● Advantage of this interface is that IT organizations need not write/use
different adapters used by the service providers.

75
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● Introduction
● Processing Approach
● System Architecture

76
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● Introduction
● Data-intensive computing is a related type of computing which use
parallelism concept for processing large volumes of data, called big data.
● Parallel processing approaches are divided into two types: compute-
intensive and data intensive.

77
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● Introduction
● Compute-intensive:Applications which need more
execution time for computational requirements
● Data-intensive :Applications which to try to find
large volume of data and time in process.

78
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● Processing Approach
● Data-intensive computing platforms use a parallel computing
approach.
● This approach combines multiple processors and disks as
computing clusters connected via high-speed network.
● The data that are needed to be processed are independently done
by computing resources available in the clusters.

79
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● Processing Approach
● There are many common characteristics of data-intensive
● The principle mechanism used for collection of the data and
programs or algorithms to perform the computation
● Programming model used
● Reliability and availability
● Scalability of both hardware and software

80
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● System Architecture
● For data-intensive computing an array of system architectures have
been implemented.
● Architecture for data-intensive computing

1. MapReduce

2. HPCC

81
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● System Architecture
● 1. MapReduce
● MapReduce concept which is developed by Google and available as open-source
implementation known as Hadoop.
● This project is used by Yahoo, Facebook and others.
● To create a map function, the MapReduce architecture uses a functional
programming style using key-value pair.
● Reduce function merges all intermediate values using intermediate keys.
● Hence programmers who do not have experience in parallel programming can simply
use a large distributed processing environment without any problem.
82
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● System Architecture
● 2. HPCC:(High-Performance Computing Cluster).
● Developed by Lexis Nexis Risk Solutions called LexisNexis.
● LexisNexis Risk Solutions independently developed and implemented
a solution for data intensive computing called the HPCC .
● The LexisNexis method structure clusters with commodity hardware
that runs in Linux OS.

83
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Data-intensive Technologies for Cloud Computing

● System Architecture
● 2. HPCC
● Custom system software and middleware parts were created and layered
to provide the execution environment and distributed file system support
that is essential for data-intensive computing on the base of Linux operating
system.
● A new high-level language for data-intensive computing called ECL is also
implemented by LexisNexis.

84
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

❏ Cloud Characteristics
❏ Distributed Data Storage.
❏ Application Utilizing Cloud storage

85
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

★ Cloud Characteristics
★ There are three characteristics of a cloud computing ,considered before choosing
storage in cloud.

1.Computer power is elastic, when it can perform parallel operations.

e.g.Google’s App Engine,

2.Data is retained at an unknown host server.

3.Data is duplicated often over distant locations.e.g.Amazon’s S3


86
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

● Distributed Data Storage.


● Data storage for the new generation of WWW applications through
organizations like Google, Amazon and Yahoo.
● new generation of applications require processing of data to a tune of
terabytes and even peta bytes. This is accomplished by distributed services.
● Following Database are used for distributed data storage
● Amazon Dynamo
● CouchDB
● ThruDB 87
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

● Following Database are used for distributed data storage


● Amazon Dynamo
● It is a fully managed, serverless, key-value NoSQL database designed to
run high-performance applications at any scale
● supports key–value and document data structures
● DynamoDB uses synchronous replication across multiple data centers[4]
for high durability and availability.

88
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

● Following Database are used for distributed data storage


● CouchDB
● CouchDB aspires the Four Pillars of Data Management:
● 1. Save: ACID compliant, save efficiently
● 2. See: Easy retrieval, straightforward describing procedures, fulltext
search
● 3. Secure: Strong compartmentalization, ACL, connections over SSL
● 4. Share: Distributed means
89
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

● Following Database are used for distributed data storage


● ThruDB
● ThruDB aspires to be universal in simplifying the administration of the up-
to-date WWW data level (indexing, caching, replication, backup) by supplying
a reliable set of services:
● Thrucene for indexing
● Throxy for partitioning and burden balancing
● Thrudoc for article storage

90
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
Cloud Storage from LANs to WANs

❏ Application Utilizing Cloud storage


❏ Online File Storage :DropBox,Box.net,Live Mesh,Oosah,JungleDisk
❏ Cloud Storage Companies:Most of these service providers have a free test or offer
some sort of free storage space.Box cloud storage,Amazon cloud,SugarSync online
backup: SugarSync,Hubic online storage,Google cloud drive: Google
❏ Online Book Marking Service:Microsoft Labs lately launched Thumbtack, a new
bookmarking application.
❏ Online Photo Editing Service :Online Photo Editors,Photoshop Express
Editor:Picnik,Splashup,FotoFlexer,Pixer.us
91
Subject:Cloud Computing:Unit-2:Data Storage and Cloud Computing
92

You might also like