IBM Object Storage
Deepak Rangarao
Vasfi Gucer
Redpaper
Introduction
Object storage is the primary data resource used in the cloud, and it is also increasingly used
for on-premises solutions. Object storage is growing for the following reasons:
It is designed for scale in many ways (multi-site, multi-tenant, massive amounts of data).
It is easy to use and yet meets the growing demands of enterprises for a broad expanse of
applications and workloads.
It allows users to balance storage cost, location, and compliance control requirements
across data sets and essential applications.
Due to its characteristics, object storage is becoming a significant storage repository for
active archive of unstructured data, both for public and private clouds.
IBM® Cloud Object Storage provides industry-leading flexibility, enabling your organization to
handle the unpredictable and constantly changing needs of business and evolving workloads.
This IBM Redpaper™ publication explains the architecture of IBM Cloud Object Storage and
the technology behind the product. In other words, it is an under-the-hood guide for IBM
Cloud Object Storage.
The target audience for this paper is IBM Cloud Object Storage architects, IT specialists, and
technologists.
IBM Cloud Object Storage gives you the choice to deploy object storage on-premises, in the
public cloud, or both on-premises and in the cloud in a hybrid solution. In addition, public
cloud services (Standard Object Storage, Vault Object Storage) can be configured in either a
Regional or Cross Regional model, providing even more choices when it comes to the level of
data protection and resiliency that you need for workloads.
In addition, IBM Cloud Object Storage is available in several licensing models, including
perpetual, subscription, or consumption.
Additional reference: You can find a detailed discussion about IBM Cloud Object Storage
use case scenarios and deployment options in Cloud Object Storage as a Service: IBM
Cloud Object Storage from Theory to Practice - For developers, IT architects and IT
specialists, SG24-8385.
The IBM Cloud Object Storage architecture is composed of three functional components.
Each of these components runs ClevOS software that can be deployed on compatible,
industry-standard hardware. The three components include:
IBM Cloud Object Storage Manager
IBM Cloud Object Storage Manager provides an out of band management interface that is
used for administrative tasks, such as system configuration, storage provisioning, and
monitoring the health and performance of the system.
Figure 2 illustrates the most simplistic architecture layout of the different components in IBM
Cloud Object Storage.
[Figure 2: DATA, Accesser Software, Site, Slicestor Software]
IBM Cloud Object Storage API: At the time of writing this paper, the IBM Cloud Object
Storage API is not available but is expected to be available in 1H 2017.
The IBM Cloud Object Storage API will be geared towards consistency and integration with
the remainder of the IBM Cloud features. This API will be consistent with IBM Cloud API
guidelines, including cosmetic aspects, such as JSON encoding, and functional items,
such as Identity and Access Management (IAM) compliance and OAuth2 authentication for
IBM ID.
Objects that are created by using the S3 API will be able to be accessed by using the IBM
Cloud Object Storage API and vice versa.
Note that the final name of the API and the API features described here are subject to
change at the time of general availability of the API.
Core concepts
This section provides information about IBM Cloud Object Storage core concepts.
Device sets
IBM Cloud Object Storage uses the concept of device sets to group Slicestor devices
(Figure 3). Each device set contains a number of Slicestor devices equal to the width.
Device sets can be spread across one or multiple data centers and regions.
Width
The width of a system is the number of Slicestor devices that the data is striped across in a
device set and storage pool. For example, a storage pool that has 30 storage devices is a
30-wide storage pool. As the storage pool grows, additional device sets of 30 more devices
are added; however, the width of the storage pool will stay at 30.
Storage pools
Storage pools are a set of one or more device sets, as shown in Figure 4.
[Figure 4: Storage Pool 1, a set of one or more device sets]
Storage pools can be spread across one or multiple data centers and regions as they consist
of one or many device sets.
Vaults
Vaults are logical storage containers for data objects that are contained in a storage pool, as
shown in Figure 5.
[Figure 5: VAULT, Storage Pool 1]
Vaults span multiple device sets and are automatically spread across all the device sets in
the storage pool so as to optimize access speeds.
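The relationship between width, device sets, storage pools, and vaults can be sketched as a small data model. This is only an illustration of the concepts described above; the class names, the device naming scheme, and the `add_device_set` helper are hypothetical and are not part of the product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeviceSet:
    """A group of Slicestor devices; its size equals the pool width."""
    devices: List[str]


@dataclass
class StoragePool:
    """One or more device sets that all share the same width."""
    width: int
    device_sets: List[DeviceSet] = field(default_factory=list)
    vaults: List[str] = field(default_factory=list)

    def add_device_set(self, devices: List[str]) -> None:
        # The pool grows one full device set at a time, so the width never changes.
        if len(devices) != self.width:
            raise ValueError(f"a device set needs exactly {self.width} devices")
        self.device_sets.append(DeviceSet(devices))

    def device_count(self) -> int:
        return self.width * len(self.device_sets)


# Hypothetical 30-wide pool that has grown from one device set to two.
pool = StoragePool(width=30)
pool.add_device_set([f"slicestor-{i:03d}" for i in range(30)])
pool.add_device_set([f"slicestor-{i:03d}" for i in range(30, 60)])
pool.vaults.append("vault-a")  # a vault is a logical container inside the pool

print(pool.width, len(pool.device_sets), pool.device_count())  # 30 2 60
```

The pool holds 60 devices after the second device set is added, but its width is still 30, which matches the 30-wide example in the Width section.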
Geo dispersal
Geo dispersal allows IBM Cloud Object Storage Accesser and IBM Cloud Object Storage
Slicestor nodes to be distributed across multiple sites and regions for scalability, accessibility,
and security, as shown in Figure 6.
[Figure 6: Accesser Software, Slicestor Software]
Additional reference on how IBM Cloud Object Storage works: You can refer to the
following website for more information about the internals of IBM Cloud Object Storage:
http://ibm.co/2njtFrm
Client application pushes data to IBM Cloud Object Storage
As shown in Figure 7, client applications use one of the available API interfaces for IBM Cloud
Object Storage to push data to the Accesser.
[Figure 7: Original DATA, Accesser, S3 Compatible API / IBM COS Native API]
The IBM Cloud Object Storage Accesser component splits the data into 4 MB segments. For
example, a 1 GB object is split into 250 4 MB segments, as shown in Figure 8.
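The segment arithmetic can be checked with a short sketch. It assumes decimal units (1 MB = 1,000,000 bytes), which is what makes the 1 GB example come out to 250 segments, and it assumes that a final partial segment is simply shorter than the rest. The function names are illustrative only.

```python
SEGMENT_SIZE = 4 * 1000 * 1000  # 4 MB in decimal units (assumption, see lead-in)


def segment_count(object_size: int, segment_size: int = SEGMENT_SIZE) -> int:
    """Number of segments needed, rounding a final partial segment up."""
    return -(-object_size // segment_size)  # ceiling division


def split_into_segments(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Yield consecutive segments; only the last one may be shorter."""
    for offset in range(0, len(data), segment_size):
        yield data[offset:offset + segment_size]


print(segment_count(1000 * 1000 * 1000))  # 1 GB object -> 250

# Tiny segment size so the split is easy to see.
segments = list(split_into_segments(b"x" * 10, segment_size=4))
print([len(s) for s in segments])  # [4, 4, 2]
```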
Accesser slicing
The IBM Cloud Object Storage Accesser component slices the individual segments of the
input data based on the defined or default width, as shown in Figure 9.
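To make the idea of slicing a segment across the width concrete, the following sketch uses a generic threshold erasure code: polynomial interpolation over the prime field GF(257). This paper does not describe the mathematics of the product's dispersal algorithm, so this is a stand-in technique rather than the actual implementation. The function names and the `width` and `threshold` values are assumptions chosen for illustration.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points` over GF(P)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(segment: bytes, width: int, threshold: int):
    """Return `width` slices; any `threshold` of them rebuild the segment.

    Slice symbols are integers in 0..256, so they are kept as lists of ints.
    """
    padded = segment + bytes(-len(segment) % threshold)
    slices = [[] for _ in range(width)]
    for off in range(0, len(padded), threshold):
        points = list(enumerate(padded[off:off + threshold]))
        for x in range(width):
            slices[x].append(_lagrange_eval(points, x))
    return slices


def rebuild(available: dict, threshold: int, length: int) -> bytes:
    """Rebuild the segment from a {slice_index: slice} mapping."""
    chosen = sorted(available.items())[:threshold]
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(x, sl[s]) for x, sl in chosen]
        out.extend(_lagrange_eval(points, x) for x in range(threshold))
    return bytes(out[:length])


segment = b"hello object storage"
slices = disperse(segment, width=6, threshold=4)

# Pretend slices 0 and 3 are unavailable; the remaining four are enough.
available = {i: slices[i] for i in (1, 2, 4, 5)}
print(rebuild(available, threshold=4, length=len(segment)))  # b'hello object storage'
```

Requires Python 3.8 or later for the modular inverse form of `pow`. A production erasure code would work on bytes directly (for example over GF(256)) and would be far faster; the point here is only that a segment becomes `width` slices and that a subset of them is sufficient to read it back.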
SecureSlice
SecureSlice combines all-or-nothing-transform (AONT) encryption with an information dispersal
algorithm (IDA) to form a computational secret sharing scheme. AONT is applied as a
preprocessing step to IDA using the following internal encoding:
1. Append an integrity check value.
2. Generate random encryption key = R.
3. Encrypt data by using encryption key R.
4. Calculate the hash of encrypted data = H.
5. Calculate H XOR R and append the result to the encrypted data to create the AONT package.
Figure 10 shows the IBM Cloud Object Storage SecureSlice AONT encryption.
Figure 10 IBM Cloud Object Storage SecureSlice AONT encryption
The IBM Cloud Object Storage Accesser component uses SecureSlice to transform the data
into segments and slices the data based on a defined width. Erasure coding adds a further
layer of security: the data is not compromised unless the required number of slices is
obtained, as shown in Figure 11.
What this means for you: If security is compromised in a region, the full content will not
be exposed. If one region is offline, your applications continue to run without disruption and
without you having to intervene. This process is called always-on availability, which is a
benefit for IBM Cloud Object Storage customers.
What this means for you: IBM Cloud Object Storage encrypts and slices data as soon as
it comes in, with the slices dispersed across multiple regions automatically. If a write
operation is confirmed, data is protected immediately. This process means that if a region
goes down, data can still be delivered from the slices that exist in remaining regions.
Applications that rely on that data remain up and running. They can survive regional
outages.
Reading data from IBM Cloud Object Storage with IBM Cloud Object Storage Accesser
This section describes how data is read from IBM Cloud Object Storage with IBM Cloud
Object Storage Accesser.
SecureSlice
SecureSlice combines AONT encryption with IDA to form a computational secret sharing
scheme using the following internal AONT encoding:
1. Strip the appended result from the end of the encrypted data.
2. Calculate the hash of the encrypted data = H.
3. Exclusive OR (XOR) the hash with the stripped result to recover the random key = R.
4. Use the key to decrypt the data.
5. Verify the integrity of the data.
Figure 13 IBM Cloud Object Storage SecureSlice decryption
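The five decoding steps can be sketched in the same way. As before, the SHA-256 check value and the toy SHA-256 keystream cipher are assumptions that stand in for the product's configured algorithms, and the function names are hypothetical. The sample package is built inline so that the decoder can be exercised on its own.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # SHA-256 digest size, so the hash and the key XOR cleanly


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (stand-in for a real cipher; not for production)."""
    out = bytearray()
    for block in range(0, len(data), 32):
        pad = hashlib.sha256(key + (block // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[block:block + 32], pad))
    return bytes(out)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aont_decode(package: bytes) -> bytes:
    encrypted, masked = package[:-KEY_LEN], package[-KEY_LEN:]  # 1. strip result
    h = hashlib.sha256(encrypted).digest()                      # 2. H = hash
    r = _xor(h, masked)                                         # 3. XOR -> key R
    checked = _keystream_xor(r, encrypted)                      # 4. decrypt
    data, check = checked[:-32], checked[-32:]
    if not hmac.compare_digest(hashlib.sha256(data).digest(), check):
        raise ValueError("integrity check failed")              # 5. verify
    return data


# Build a sample package the way the write path would, to exercise the decoder.
payload = b"segment payload"
r = secrets.token_bytes(KEY_LEN)
encrypted = _keystream_xor(r, payload + hashlib.sha256(payload).digest())
package = encrypted + _xor(hashlib.sha256(encrypted).digest(), r)

print(aont_decode(package))  # b'segment payload'

# Flipping a single bit changes H, so the wrong key is recovered and step 5 fails.
tampered = bytes([package[0] ^ 1]) + package[1:]
try:
    aont_decode(tampered)
except ValueError as err:
    print(err)  # integrity check failed
```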
Security
The underlying IBM Cloud Object Storage Manager, IBM Cloud Object Storage Accesser,
and IBM Cloud Object Storage Slicestor appliances have multiple levels of security:
OS-level security: Uses a pared-down Linux distribution that has only the minimal
essential functions for the individual components. NSA hardening guidelines are applied.
Firewall: Each appliance has its own firewall.
Monitoring: Uses AES-encrypted SNMP messages and encrypted system logs.
Object-level authentication is done by using a Public Key and Secret Access Key
mechanism.
Key characteristics
IBM Cloud Object Storage security provides the following key characteristics:
All crucial configuration information is digitally signed to avoid being compromised.
Certificate-based authentication of every node (Manager, Accesser, and Slicestor) is
provided.
Transmission and storage of data are inherently private and secure.
No copy of data is present in any single disk, node, or location.
TLS is supported for network connections within IBM Cloud Object Storage for
data-in-motion protection.
TLS is supported on Client to Accesser network connections for data-in-motion protection.
Encryption
IDAs are used to separate the data into unrecognizable slices. These individual slices are
distributed across a network of data centers, which makes transmission and storage of data
inherently private and secure. No complete copy of the object is stored in any single storage
node.
Key characteristics
IBM Cloud Object Storage encryption provides the following key characteristics:
SecureSlice is a standard product feature.
SecureSlice encryption ensures the confidentiality of data at rest on Slicestor storage
nodes, provided that no more than N Slicestor nodes have their data exposed, where
N = IDA read threshold - 1.
SecureSlice does not require a key management system.
SecureSlice can be configured to use any of the following combinations of encryption and
data integrity algorithms:
– RC4-128 encryption with MD5-128 hash for data integrity.
– AES-128 encryption with MD5-128 hash for data integrity.
– AES-256 encryption with SHA-256 hash for data integrity.
By default, encryption is enabled and all data is RC4 128-bit encrypted. There is an option to
enable AES 256-bit encryption.
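The confidentiality bound stated above (N = IDA read threshold - 1) is simple arithmetic, sketched below. The second function is an inference rather than a statement from this paper: if a read needs a threshold number of slices, then up to width minus threshold nodes can be unavailable. The width and threshold values are hypothetical.

```python
def max_exposed_nodes(read_threshold: int) -> int:
    """N from the text: nodes that can be exposed without revealing data."""
    return read_threshold - 1


def max_unavailable_nodes(width: int, read_threshold: int) -> int:
    """Nodes that can be offline while a read can still reach the threshold."""
    return width - read_threshold


width, read_threshold = 12, 7  # hypothetical configuration

print(max_exposed_nodes(read_threshold))             # 6
print(max_unavailable_nodes(width, read_threshold))  # 5
```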
What this means for you: IBM Cloud Object Storage is designed with a high level of
security in mind to protect your data from security breaches. From built-in encryption of
data at rest and in motion, to a range of authentication and access control options, the IBM
Cloud Object Storage solution includes a wide range of capabilities that are designed to
help meet your security requirements. These security capabilities are implemented to help
enable better security, without compromising scalability, availability, ease of management,
or economic efficiency.
Scalability
IBM Cloud Object Storage software has been tested at web-scale with production
deployments that exceed hundreds of petabytes of capacity, and can scale to exabytes (EB).
Key characteristics
IBM Cloud Object Storage scalability provides the following key characteristics:
Internet-style, distributed, shared-nothing, and peer-to-peer architecture
Yottabyte-scale global namespace with 10^38 object IDs available per vault
Increased storage capacity and performance by adding Slicestor nodes
Scale to thousands of Slicestor storage nodes in a single system
No limit on the number of Accesser nodes
You can deploy the nodes as needed based on your performance requirements.
Near-linear increase in system throughput and HTTP operations per second as the
system grows
What this means for you: IBM Cloud Object Storage is built for cloud scale and can scale
to exabytes (EB) while maintaining availability, reliability, manageability, and cost-effective
options, without any compromise. The upward scalability is virtually unlimited with this
offering.
Availability
IBM Cloud Object Storage can operate with no downtime during software upgrades, hardware
refreshes, and in the face of hardware failures, such as disk failure or node failure, or even
when an entire site is unavailable. This availability is achieved through geo-dispersed erasure
coding.
Key characteristics
IBM Cloud Object Storage availability provides the following key characteristics:
Non-disruptive code upgrades are initiated and managed by the IBM Cloud Object
Storage Manager component as rolling upgrades.
Slicestor hardware refreshes involve installing new hardware, evacuating data from
individual Slicestor nodes, and performing writes simultaneously to the new Slicestor
nodes.
Authors
This paper was produced by a team of specialists from around the world working at the
International Technical Support Organization, San Jose Center.
Deepak Rangarao is an Executive IT Specialist with the Global Analytics CTO Office at IBM.
He has more than 18 years of experience in the telecommunications, public sector, finance,
and insurance industries in both pre-sales and post-sales capacities. Deepak’s key consulting
experience includes hybrid cloud, data management, data warehousing, advanced analytics,
and business intelligence solutions. He is a member of the IT Certification board at IBM,
mentors IT Specialist and Architects, and is also a member of the Global Cloud Expertise
Council at IBM, helping field technical staff and customers understand the value of IBM
Cloud. Deepak has a Masters Degree in Information Technology from RMIT, Australia, and
several developer certifications around Visual Basic, MS SQL Server (Microsoft Certified
Professional), IBM Cloud Developer Certification, and Apache Spark (O’Reilly Media).
Vasfi Gucer is an IBM Technical Content Services Project Leader with the Digital Services
Group. He has more than 20 years of experience in the areas of systems management,
networking hardware, and software. He writes extensively and teaches IBM classes
worldwide about IBM products. His focus has been primarily on cloud computing for the last
6 years. Vasfi is also an IBM Certified Senior IT Specialist, Project Management Professional
(PMP), IT Infrastructure Library (ITIL) V2 Manager, and ITIL V3 Expert.
Bert Dufranse
International Technical Support Organization
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
Redbooks (logo) ® Redbooks®
IBM® Redpaper™
Accesser, Slicestor, and Storage Beyond Scale are trademarks or registered trademarks of Cleversafe, Inc.,
an IBM Company.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
REDP-5435-00
ISBN 0738456039
Printed in U.S.A.
ibm.com/redbooks