IBM Storage Ceph
Redpaper
IBM Redbooks
August 2023
REDP-5715-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 4. Disaster recovery and backup and archive: IBM Storage Ceph as an S3
Backup Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.1 Benefits of IBM Storage Ceph for Veritas NetBackup. . . . . . . . . . . . . . . . . . . . . . 68
4.1.2 Scalability and flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.3 Data protection and disaster recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.1.4 Efficiency and cost effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2 Implementing Veritas NetBackup with IBM Storage Ceph . . . . . . . . . . . . . . . . . . . . . . 69
4.2.1 Certified and supported solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2.2 IBM Storage Insights. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 IBM Storage Ceph multi-site replication with Veritas NetBackup . . . . . . . . . . . . . . . . . 70
4.3.1 Configuration of Veritas NetBackup components . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4 IBM Storage Ready Nodes for Ceph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4.1 IBM Storage Ceph Editions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Notices
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation, registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright
and trademark information” at https://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
IBM®, IBM Cloud®, Redbooks®, Redbooks (logo)®
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Ceph, JBoss, OpenShift, and Red Hat are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries
in the United States and other countries.
VMware and the VMware logo are registered trademarks or trademarks of VMware, Inc. or its subsidiaries in
the United States and/or other jurisdictions.
Veritas, the Veritas logo, Backup Exec and NetBackup are trademarks or registered trademarks of Veritas
Technologies LLC or its affiliates in the U.S. and other countries. Other names may be trademarks of their
respective owners.
Other company, product, or service names may be trademarks or service marks of others.
Preface
IBM® Storage Ceph is an IBM-supported distribution of the open-source Ceph platform that
provides massively scalable object, block, and file storage in a single system.
IBM Storage Ceph is designed to operationalize AI with enterprise resiliency, consolidate data with software simplicity, and run on multiple hardware platforms to provide flexibility and lower costs.
It is engineered to be self-healing and self-managing with no single point of failure, and it includes storage analytics for critical insights into growing amounts of data. IBM Storage Ceph can be
used as an easy and efficient way to build a data lakehouse for IBM watsonx.data and for
next-generation AI workloads.
This IBM Redpaper publication explains the implementation details and use cases of IBM
Storage Ceph. For more information on the IBM Storage Ceph architecture and the technology behind the product, see the IBM Redpaper publication IBM Storage Ceph Concepts and Architecture Guide, REDP-5721.
The target audience for this publication is IBM Storage Ceph architects, IT specialists, and
storage administrators.
Authors
This paper was produced by a team of specialists from around the world.
Thanks also to James Pool, James Eckersall, and Ian Watson from Red Hat, Inc., who started this journey by providing the initial insight and documentation for Chapter 7, “S3 enterprise user authentication (IDP) and authorization (RBAC)” on page 119 in this document.
The team extends its gratitude to the upstream community and to the IBM and Red Hat Ceph documentation teams for their contributions to continuously improve the Ceph documentation.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our papers to be as helpful as possible. Send us your comments about this paper or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
1.1 Introduction
Before using the IBM Storage Ceph Dashboard, we must install some prerequisites, which are documented in the IBM Storage Ceph documentation.
We start here just after the Ceph bootstrap command, which, among other configuration tasks, installs and starts a Ceph Monitor daemon and a Ceph Manager daemon for a new IBM Storage Ceph cluster as containers on the local node.
Note: You can also watch the video that describes how to install IBM Storage Ceph using
the command line.
The bootstrap process also provides the URL, login, and password to first connect to the IBM
Storage Ceph Dashboard.
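For reference, a minimal command-line sketch of the bootstrap step is shown below. The monitor IP address 10.0.0.11 is an assumption for this sample cluster; adjust it to your environment.
[root@ceph-node01 ~]# cephadm bootstrap --mon-ip 10.0.0.11
The command output ends with the Dashboard URL and the generated admin user and password.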
Our sample cluster is made of four nodes, which is the minimum number supported by IBM.
See Figure 1-1.
The bootstrap node is ceph-node01. Each node is connected to 4 HDD disks and 3 SSD disks that will be used for OSDs.
2. From this page, we can either edit the configuration or add a host. The first host was
installed by the bootstrap; it runs all the daemons of the cluster.
We can choose to add one host at a time or multiple hosts at the same time. In our
configuration, hosts 02 to 04 are named ceph-node[02-04].demo.ibm.local. When adding
hosts, it is also convenient to select labels to apply. Labels are used to automatically
deploy the corresponding daemon on the host. See Figure 1-4 on page 4.
Tip: Clicking the question mark presents you with some examples.
Adding multiple hosts at once is faster but does not allow you to specify different labels on
each host. On the other hand, adding one host at a time allows you to select different labels
but can become cumbersome on large configurations. Here, every host runs the monitor
and the osd daemon, so we can select these labels.
To spread the different workloads across all hosts, we want to achieve the following
daemon configuration, as shown in Table 1-1.
ceph-node01.demo.ibm.local   X   X
ceph-node02.demo.ibm.local   X   X
ceph-node03.demo.ibm.local   X   X
ceph-node04.demo.ibm.local   X   X
3. We have to edit ceph-node01 to add the osd label, ceph-node02 to add the mgr label, and
ceph-node03 and ceph-node04 to add the rgw label. We also need to move the grafana
daemon from ceph-node01 to ceph-node04. See Figure 1-5 on page 5.
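For reference, the same label assignments can also be applied from the command line with the ceph orch host label add command. This is a hedged sketch that reuses our sample host names:
[root@ceph-node01 ~]# ceph orch host label add ceph-node01.demo.ibm.local osd
[root@ceph-node01 ~]# ceph orch host label add ceph-node02.demo.ibm.local mgr
[root@ceph-node01 ~]# ceph orch host label add ceph-node03.demo.ibm.local rgw
[root@ceph-node01 ~]# ceph orch host label add ceph-node04.demo.ibm.local rgw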
4. Deploying the daemons can take a while, but we can proceed to the OSD creation by
clicking Next.
In our example, we will use HDD drives as "Primary devices" and SSD drives as "DB devices"
to support BlueStore and increase performance.
1. We first select the Advanced mode. See Figure 1-6.
2. Then, we click Add Primary devices. Using the filters at the top right, we can choose to
display only the hdd type. Once the HDD devices are selected, we can click the Add button. See
Figure 1-7 on page 6.
3. Once Primary devices are selected, we can add WAL devices and DB devices. Here we
choose to use ssd devices for DB devices only. See Figure 1-8 on page 6.
Note: Since the BlueStore journal is always placed on the fastest device, using a DB
device provides the same benefit that the WAL device provides while also allowing for
storing additional metadata, so no dedicated WAL is required.
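The same layout (HDDs as data devices and SSDs holding the BlueStore DB) can also be expressed as an OSD service specification and applied with ceph orch apply. This is only an illustrative sketch; the service_id value and the host pattern are assumptions:
[root@ceph-node01 ~]# cat > osd_spec.yaml << EOF
service_type: osd
service_id: hdd_with_ssd_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
EOF
[root@ceph-node01 ~]# ceph orch apply -i osd_spec.yaml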
But as an example, we can edit the Grafana service to move it to ceph-node04 to free up
some resources from ceph-node01.
1. We select grafana and select Edit in the drop-down list. See Figure 1-9.
2. Once done, we can review the configuration and then launch the configuration of the
cluster.
Hosts should display the four nodes and their instanced services. See Figure 1-11.
OSDs should show all the OSDs in and up, meaning they are available to be used. See
Figure 1-12 on page 9.
2. We can edit the Hosts list to select the first 3 hosts and reduce Count to 3. See Figure 1-14
on page 10.
3. Checking the Services page will quickly show the new placement of the mon service.
In this example we are using Firefox, but the process should be quite similar on other
browsers.
1. Access Firefox Settings by clicking Application Settings (1), and then Settings (2), as
shown in Figure 1-15 on page 11.
2. Select Privacy & Security (1), go down on the right pane to find Certificates, then click
View Certificates…(2). See Figure 1-16.
3. On the Servers tab of Certificate Manager, click Add Exception… See Figure 1-17 on
page 12.
4. Add the ceph-node04 Grafana URL and click Get Certificate, then click Confirm
Security Exception. See Figure 1-18.
5. We can click OK to validate the configuration and close the Settings tab.
The radosgw is a client to Ceph Storage that provides object access to other client
applications. Client applications use standard APIs to communicate with the RADOS
Gateway, and the RADOS Gateway uses librados module calls to communicate with the Ceph
cluster.
The RGW is a separate service that connects to the Ceph cluster and provides object storage
access to its clients. In a production environment, it is recommended that you run more than
one RGW instance behind a load balancer.
2. We can verify the deployment of the service on the two selected hosts.
3. The Pools tab (Cluster → Pools) will now show the three pools that the rgw service
requires for its internal storage needs.
The type is, of course, ingress. It will use the default rgw backend service that was created, and it
will be deployed on the hosts with the rgw label, with a virtual IP address, a frontend port, and a monitor port.
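For comparison, an equivalent ingress service specification could look like the following sketch. The backend service name rgw.default, the virtual IP, the frontend port, and the monitor port values are placeholders for this example:
[root@ceph-node01 ~]# cat > ingress_spec.yaml << EOF
service_type: ingress
service_id: rgw.default
placement:
  label: rgw
spec:
  backend_service: rgw.default
  virtual_ip: 10.0.0.100/24
  frontend_port: 8080
  monitor_port: 1967
EOF
[root@ceph-node01 ~]# ceph orch apply -i ingress_spec.yaml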
2. We specify the name default.rgw.buckets.data, select erasure as the pool type and
select the rgw label in Applications to deploy the pool on rgw hosts. Click on the + to be
able to create our own Erasure Code profile. See Figure 1-23.
3. Let us create a k=4, m=2 profile, which tolerates the loss of 2 OSDs (m=2) by distributing each
object across 6 (k+m=6) OSDs. To do so, complete the dialog box as shown in Figure 1-24 on
page 18. Then click Create EC Profile, then Create Pool.
Note: These settings are for demonstration purposes only. For a supported production
environment, 7 (4+2+1) Ceph nodes would be required for such Erasure Coding settings.
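If you prefer the CLI, a hedged equivalent of these dashboard steps is to create the erasure code profile and the pool with the following commands. The profile name ec-4-2 and the host failure domain are illustrative choices:
[root@ceph-node01 ~]# ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
[root@ceph-node01 ~]# ceph osd pool create default.rgw.buckets.data erasure ec-4-2
[root@ceph-node01 ~]# ceph osd pool application enable default.rgw.buckets.data rgw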
Here we only need to specify the user ID and the Full name. Let us use demouser in both
cases. The user's S3 keys will be created automatically.
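The same user can also be created from the command line with radosgw-admin. This sketch reuses the demouser ID from this example; the output contains the generated S3 access key and secret key:
[root@ceph-node01 ~]# radosgw-admin user create --uid=demouser --display-name="demouser"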
We have created a user, so we can now create a bucket and assign it to that user.
2. Select Object Gateway → Buckets, then Create, and complete the dialog box as follows.
Placement target corresponds to the EC pool we created previously, meaning the bucket
will be protected using erasure coding. See Figure 1-26 on page 20.
Note: Locking allows you to protect the bucket and its contents from deletion for the specified
period of time.
2. Using the RHEL workstation, open a terminal and enter aws configure, then copy and
paste the keys from the Dashboard. See Figure 1-28 on page 21.
3. Then, press Enter twice to keep Default region name and Default output format set to None.
AWS CLI is now configured to use your bucket. Let us copy a file to it by entering the
following command:
[demouser@workstation ~]$ aws --endpoint http://10.0.0.15:8080 s3 cp <some
file> s3://mybucket
4. To list the content of the bucket, use the ls command:
[demouser@workstation ~]$ aws --endpoint http://10.0.0.15:8080 s3 ls
s3://mybucket
This will show the raw and logical space used on OSDs and also some statistics on the
default.rgw.buckets.data pool.
Note: In this capture, 73.79 GB is equivalent to 4x10 GB because of the 4+2 EC profile.
2. We give a name to the pool (rbd_pool) and we associate Applications with the rbd label.
See Figure 1-31 on page 23.
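The image mapping and mount steps that lead to the rbd_mnt folder used below can be summarized with the following hedged sketch. It assumes an image named rbd_image was created in rbd_pool, and that the client keyring and ceph.conf are already present on the workstation; the mount point name is also an assumption:
[demouser@workstation ~]$ sudo rbd map rbd_pool/rbd_image
/dev/rbd0
[demouser@workstation ~]$ sudo mkfs.xfs /dev/rbd0
[demouser@workstation ~]$ mkdir rbd_mnt
[demouser@workstation ~]$ sudo mount /dev/rbd0 rbd_mnt/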
We can now use the rbd_mnt folder like any other folder in the filesystem, such as by moving
a file into the new mounted folder. We will use it to create snapshots. See Example 1-2.
Example 1-2 Moving the file into the new mounted folder
[demouser@workstation ~]$ mv test.file rbd_mnt/
[demouser@workstation ~]$ ll -h rbd_mnt/test.file
2. Create a snapshot. Select the Snapshots tab, then + Create. A name is automatically
assigned. See Figure 1-34.
3. Going back to the command console, we can now delete the file. See Example 1-3 on
page 26.
4. Rolling back a snapshot means overwriting the existing image. Because this can be time
consuming, it is recommended to use a snapshot clone instead. To do so, we must first
change the snapshot from unprotected to protected using the drop-down menu. See
Figure 1-35.
5. Next we will create a clone of the snapshot. In the dropdown menu, we select Clone and
enter the name of the clone. See Figure 1-36 on page 27.
6. In order to restore the deleted file, we create a new directory to mount the clone, then we
map the cloned disk before mounting it in the new directory. We can now copy or move the
deleted file back to the rbd_mnt directory. See Example 1-4 on page 28.
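For reference, the protect, clone, and map operations in steps 4 to 6 correspond to the following rbd commands. The image, snapshot, clone, and mount point names are assumptions based on this example:
[demouser@workstation ~]$ sudo rbd snap protect rbd_pool/rbd_image@snap-1
[demouser@workstation ~]$ sudo rbd clone rbd_pool/rbd_image@snap-1 rbd_pool/rbd_image_clone
[demouser@workstation ~]$ sudo rbd map rbd_pool/rbd_image_clone
/dev/rbd1
[demouser@workstation ~]$ mkdir clone_mnt
[demouser@workstation ~]$ sudo mount /dev/rbd1 clone_mnt/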
As POSIX uses inodes to uniquely reference its content, CephFS offers a dedicated
component that allows CephFS client machines to locate the actual RADOS objects for a
given inode. This component is known as the MetaData Server or MDS.
Before a CephFS client can physically access the content of a file or a directory, technically
an inode, it contacts one of the active MDSs in the Ceph cluster to obtain the RADOS object
name for the inode. Once the MDS provides the CephFS client with the information, it can
contact the Object Storage Daemon that protects the Placement Group containing the data.
The CRUSH algorithm provides the object name to OSD mapping like for any Ceph access
method.
The MDS will also keep track of the metadata for the directory or the file such as ACLs.
To configure a Ceph File System you must deploy one or more MDSs (at least 2 to provide a
highly available architecture). Once the MDS components are up and running, you can then
create your shared file system.
This chapter describes how to configure a MetaData Server and create a file system through
the Ceph dashboard. The configuration will create two Ceph pools:
cephfs_data to contain the file system data.
cephfs_metadata to contain the file system metadata.
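As an alternative to the dashboard steps that follow, the same configuration can be achieved from the command line. This is a hedged sketch that assumes the file system name mycephfs used later in this chapter and an MDS placement of two daemons:
[root@ceph-node01 ~]# ceph osd pool create cephfs_data
[root@ceph-node01 ~]# ceph osd pool create cephfs_metadata
[root@ceph-node01 ~]# ceph fs new mycephfs cephfs_metadata cephfs_data
[root@ceph-node01 ~]# ceph orch apply mds mycephfs --placement=2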
2. Create the cephFS pools: Select Pools and click + Create and complete the dialog box as
shown in Figure 1-39.
3. Repeat the same to create the cephfs_metadata pool with the following values:
– Name: cephfs_metadata
– Pool type: replicated
– Applications: cephfs
Once completed, you can verify your configuration. See Figure 1-40.
5. Next, we create a new user and grant it read/write access to the file system. See
Example 1-6.
Example 1-6   Authorize the client user on the file system
[root@ceph-node01 ~]# ceph fs authorize mycephfs client.demouser / rw
[client.demouser]
key = AQDr/CZl2+FUOxAA56mXDsY8tYUbnkdeP1Kj2w==
6. Once the key is generated, it can be used to mount the filesystem on the client using fstab,
for example. For this demonstration, we will use ceph-fuse, a userspace client for CephFS
filesystems, because it is easier to use.
To use ceph-fuse on the workstation we need to export the key to a keyring file and copy it
to the workstation. See Example 1-7.
Example 1-7 Export the key to a keyring file and copy it to the workstation
[root@ceph-node01 ~]# ceph auth get client.demouser -o
/etc/ceph/ceph.client.demouser.keyring
exported keyring for client.demouser
[root@ceph-node01 ~]# scp /etc/ceph/ceph.client.demouser.keyring
workstation:/etc/ceph/
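With the keyring in place on the workstation, the actual ceph-fuse mount could look like the following sketch; the mount point cephfs_mnt is an assumption for this example:
[demouser@workstation ~]$ mkdir cephfs_mnt
[demouser@workstation ~]$ sudo ceph-fuse -n client.demouser cephfs_mnt/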
8. The filesystem can now be used. From the Dashboard we can see the File Systems
properties. See Figure 1-41.
1.9 Monitoring
Ceph Dashboard also provides the storage administrator with monitoring information.
The bell icon in the upper right corner gives access to the last active notifications. Solved
notifications are automatically deleted from this list. Some failure notifications may even
provide the storage administrator with solutions to apply.
Each of these views can be focused on one host or one OSD. The example in Figure 1-45
shows the ceph-node01 performance details.
The Cluster menu also provides easy access to the logs, which is a real help when sorting events
while troubleshooting a problem. See Figure 1-46 on page 35.
The Monitoring menu is also convenient for listing active and previous alerts with detailed
information about them, but the most useful tab may be the Silences tab, which allows you to easily
create silences for specific alerts when doing maintenance on the cluster, to avoid flooding
the logs. See Figure 1-47.
Overall, we saw that the Ceph Dashboard is a very powerful tool that can greatly ease the
day-to-day work of a storage administrator.
IBM Storage Ceph can be used as an easy and efficient way to build a data lakehouse for
IBM watsonx.data and for next-generation AI workloads. This chapter covers the
configuration steps for integrating IBM Storage Ceph and IBM watsonx.data and has the
following sections:
“Using IBM Storage Ceph with IBM watsonx.data” on page 38
“Creating users” on page 38
“Creating buckets” on page 42
“Performance optimizations” on page 54
“Querying less structured data (S3-Select)” on page 55
IBM Storage Ceph also supports multiple open, vendor-neutral formats for analytic datasets and
enables different engines to access and share the same data, using formats like Parquet,
Avro, Apache ORC, and more. In this chapter, we cover the configuration steps for
integrating IBM Storage Ceph and IBM watsonx.data.
2. The user creation page will prompt for some information about the user. Fill out all
mandatory fields as indicated by the red asterisks. When finished, click the blue Create
User button. See Figure 2-2 on page 39.
3. Upon creating a user, the user management pane should reflect the addition of the new
user. Click the arrow to see information about the newly created user. See Figure 2-3 on
page 40.
4. The user information is displayed on the Details tab. From there, click the Keys tab. See
Figure 2-4.
5. After selecting the Keys tab, click the blue Show button to view or copy the user's
credentials. See Figure 2-5 on page 41.
6. The access and secret key will be needed to add a bucket in the watsonx.data
Infrastructure Manager. See Figure 2-6.
The command output will include the access and secret keys that will be needed to add a
bucket to watsonx.data from the Infrastructure Manager.
2. The bucket creation page will prompt for some information about the bucket you wish to
create. Fill out all mandatory fields as indicated by the red asterisks. Ensure the user
created in 2.2, “Creating users” on page 38 is set as the bucket owner.
3. When finished, click the blue Create Bucket button.
Employing a credentials file eliminates the need to source environmental variables every time
you start a new shell session. The commands shown in Example 2-2 will create a credentials
file in the location expected by s5cmd and many other s3 utilities that utilize the S3 SDKs.
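A minimal sketch of such a credentials file, using placeholder keys, could look like the following:
$ mkdir -p ~/.aws
$ cat > ~/.aws/credentials << EOF
[default]
aws_access_key_id = <ACCESS_KEY>
aws_secret_access_key = <SECRET_KEY>
EOF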
With a credentials file configured, you can now create a bucket using the following command:
s5cmd --endpoint-url http://s3.ceph.example.com mb s3://bucket-name
To avoid needing to specify the endpoint URL with each command, you can create a bash
alias as follows:
alias s5cmd='s5cmd --endpoint-url http://s3.ceph.example.com'
The alias command can be added to your .bashrc to ensure that it is configured whenever
you start a new session.
2. From the dropdown menu select Add Bucket. See Figure 2-9 on page 45.
3. This will open the Add bucket prompt where you will provide information about the bucket
being added to watsonx.data.
4. Select IBM Storage Ceph from the Bucket type dropdown menu. Then, populate the
fields for the bucket name, the endpoint for the IBM Storage Ceph cluster that is being
used with watsonx.data, as well as the access and secret keys. See Figure 2-10 on
page 46.
5. Once these steps are completed, a connection test can be performed to verify that
everything is functioning properly. See Figure 2-11 on page 47.
6. Upon successful connection, a “Successful” message along with a green checkmark will
be displayed. See Figure 2-12 on page 48.
7. Indicate whether the bucket should be activated now. See Figure 2-13 on page 49.
Important: Activating the bucket will terminate any in-flight queries against data in any
bucket.
8. Finally, the bucket will be associated with a catalog. Supported catalog types include
Apache Iceberg and Apache Hive. Apache Iceberg is the preferred catalog type when
using IBM Storage Ceph to persist lakehouse data. Iceberg tables are designed to work
well with object stores and engines will spend significantly less time planning queries that
operate against large tables.
2. You will be presented with a prompt where you can select an engine to associate with the
catalog. See Figure 2-15 on page 51.
3. Once selected you can press the red Save and restart engine button.
2. The pseudo-directory established through the command line can be viewed by navigating
to the Objects tab from the Bucket details page. See Figure 2-17.
2. To create a schema, select Create schema from the drop-down menu. A prompt will
appear, allowing you to define your schema's details. Select the catalog associated with
the IBM Ceph Storage bucket. Choose a descriptive name and use the path created in the
previous section. Finally, click the blue Create button. See Figure 2-19.
2.5.3 Partitioning
Partitioning tables based on the values in a specific column can significantly reduce query
processing time, particularly for queries involving predicate filtering (WHERE statements). A
common practice is to partition tables containing order or sales data by a specific time period,
such as day, month, or year. This approach enables query engines to eliminate irrelevant
partitions, minimizing the number of reads from the storage system and consequently
accelerating query execution.
2.5.4 Bucketing
Bucketing can speed up queries where there is a filter on a column that was used for
bucketing, by using a hash function to identify the objects for the corresponding bucket. The
buckets containing data not relevant to the query are pruned, resulting in fewer reads to the
storage system and subsequently faster query times. The downside of bucketing is that data
cannot be inserted into the table by introducing new objects. Instead, buckets need to be
rewritten.
support the ANALYZE statement, which can be used to collect table and column statistics for
Hive tables. Apache Iceberg tables improve on this by storing partition statistics in table
metadata. In other words, statistics are calculated when writing to Iceberg tables. This has
clear advantages over periodic table and column analysis.
For more information, see S3 select operations (Technology Preview) - IBM Documentation.
By sending a notification when an object is synced to a zone, external systems can gain
insight into object-level syncing status. The bucket notification event types s3:ObjectSynced:*
and s3:ObjectSynced:Created, when configured via the bucket notification mechanism,
trigger a notification event from the synced RGW upon successful object synchronization. It's
important to note that both the topics and the notification configuration should be set up
separately in each zone that generates notification events.
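As a hedged illustration only, configuring such a notification with the AWS CLI in the zone that generates the events might look like the following. The topic name, the RGW endpoint, and the bucket name are assumptions:
$ cat > sync-notification.json << EOF
{
  "TopicConfigurations": [
    {
      "Id": "sync-events",
      "TopicArn": "arn:aws:sns:default::synctopic",
      "Events": ["s3:ObjectSynced:*"]
    }
  ]
}
EOF
$ aws --endpoint=http://<rgw-endpoint>:8080 s3api put-bucket-notification-configuration \
    --bucket mybucket --notification-configuration file://sync-notification.json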
Among the many other awards the bank has received, Euromoney named İşbank Central
and Eastern Europe’s Best Digital Bank1, while it was also crowned “Turkish Bank of the
Year” at The Financial Times’ The Banker Awards2.
Object Storage as a Service (OSaaS) is a delivery model that brings the benefits of cloud-like
object storage to an organization's on-premises, private cloud infrastructure. With OSaaS,
organizations can leverage the scalability, flexibility, and cost-effectiveness of object storage
without having to manage the underlying hardware and software.
1 https://www.euromoney.com/article/2a3xfv52m4i9d6mrd6j9c/awards/awards-for-excellence/cees-best-digital-bank-2022-isbank
2 https://www.isbank.com.tr/en/about-us/the-banker-crowns-isbank-bank-of-the-year
Balance the need for cost-effective storage solutions with the need for seamless
integration with emerging technologies like cloud computing.
Adopt a proactive approach to data management in order to stay competitive and meet
their evolving storage needs effectively.
Empowering users with self-service tools for Day-2 operations, such as creating new users,
updating their quotas, and monitoring usage, promotes agility and operational efficiency. This
reduces administrative bottlenecks and empowers teams to respond swiftly to evolving
storage needs, enhancing overall productivity.
A self-service model also ensures cost optimization, as organizations can closely align
storage resources with actual usage, eliminating waste. Additionally, with robust monitoring
and reporting features, businesses gain valuable insights into storage patterns, enabling
data-driven decision-making and strategic planning for their needs.
In summary, an OSaaS model with a strong focus on self-service capabilities can provide
enterprises with the following benefits:
Increased agility and operational efficiency.
Reduced administrative overhead.
Cost optimization.
Improved decision-making.
As a result, OSaaS can be a valuable tool for enterprises looking to overcome their persistent
storage challenges.
This architectural design can be a valuable solution for organizations that are looking to
improve their storage management practices.
Creating a user
Much like the public cloud approach, the process commences with the creation of a user in
İşbank. This crucial initial step grants authorized individuals the ability to effortlessly establish
user accounts that are specifically tailored to their unique storage demands and preferences.
2. The end user making the request can view user-related information on the portal and also
have the information transmitted via email (Figure 3-2 on page 61). The sample
integration at İşbank has been developed using Python and JavaScript.
Figure 3-2 The requester can receive an email containing the keys to access the environment
2. Users who receive this alert can also fulfill their needs by submitting quota increase
requests through the S3 user catalog item on the portal (see Figure 3-4).
Figure 3-4 A sample view of the catalog item used during quota updates
Figure 3-5 shows the IBM Storage Ceph API response that indicates that the quota increase
request is completed.
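Behind the portal, a quota update can be performed through the IBM Storage Ceph Dashboard REST API or, equivalently, with radosgw-admin. This is only a hedged sketch; the user ID and the 50G size limit are example values:
[root@ceph-node01 ~]# radosgw-admin quota set --quota-scope=user --uid=demouser --max-size=50G
[root@ceph-node01 ~]# radosgw-admin quota enable --quota-scope=user --uid=demouser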
IBM Storage Ceph provides a Prometheus exporter to collect and export IBM Storage Ceph
performance counters from the collection point in ceph-mgr. Enabling the Prometheus module
in the mgr service will initiate the collection of detailed metrics with the option to use custom
scrape intervals. (Figure 3-6 and Figure 3-7).
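Enabling the module and, optionally, adjusting the scrape interval can be done with the following commands; the 15-second interval is only an example value:
[root@ceph-node01 ~]# ceph mgr module enable prometheus
[root@ceph-node01 ~]# ceph config set mgr mgr/prometheus/scrape_interval 15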
Tip: The Prometheus Manager module needs to be restarted after changing its configuration.
Figure 3-6 Sample Grafana dashboard showing one of the pools usage and its performance
Figure 3-7 Sample Grafana dashboard showing one of the pools usage and its performance
One of the important factors related to performance and monitoring is logging. RGW
operation logs can be written to the respective directory on each RGW server using the
globally configured rgw_ops_log_file_path setting. (Figure 3-8).
Figure 3-8 The global config of the log directory for RGW operations
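A hedged sketch of setting these options with the ceph config command is shown below; the log file path is an example value, and the exact set of options may vary by release:
[root@ceph-node01 ~]# ceph config set client.rgw rgw_enable_ops_log true
[root@ceph-node01 ~]# ceph config set client.rgw rgw_ops_log_rados false
[root@ceph-node01 ~]# ceph config set client.rgw rgw_ops_log_file_path /var/log/ceph/rgw-ops.log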
Using the Filebeat agent with a ready integration for Logstash, logs in the relevant directory
on the RGW server are directed to Logstash for indexing. Relevant sections parsed from the
logs can be filtered and viewed through the Kibana interface (Figure 3-9 on page 64).
Note: Authorization is required to allow users access to the relevant index pattern
containing RGW logs.
Figure 3-9 Kibana view of the "IBM Storage Ceph" index pattern
3.3.4 Documentation
You should maintain an up-to-date documentation link that summarizes your S3
environment's architecture, details how to use your object storage infrastructure, and provides
valuable information such as architectural standards, usage recommendations, and S3
endpoint information. This information will benefit the admin teams responsible for managing
the infrastructure of object storage.
Figure 3-10 on page 65 shows a sample documentation view detailing all the required
information for the environment.
Figure 3-10 Sample documentation view detailing all the required information for the environment
4.1 Introduction
Backup and recovery are essential IT operations for data protection and business continuity.
As data volumes explode, organizations struggle to find cost-effective and scalable backup
solutions that meet stringent recovery time objectives (RTOs). Traditional backup appliances
and public cloud options have limitations, making it difficult to achieve both cost-effectiveness
and RTO compliance.
To address these challenges, many organizations are turning to on-premises object storage
solutions like IBM Storage Ceph to extend cloud models for backup and recovery in a
cost-effective way. IBM Storage Ceph is a software-defined data storage solution that can
help lower enterprise data storage costs. By leveraging the S3-compatible interface of IBM
Storage Ceph, organizations can use it as a scalable and reliable backup target.
Implementing Veritas NetBackup with IBM Storage Ceph offers several advantages for
organizations:
Note: In this chapter we discuss IBM Storage Ceph, but this scenario also applies to Red
Hat Ceph Storage.
You need to configure Veritas NetBackup to use IBM Storage Ceph as the cloud storage
provider using Amazon S3 API. See Figure 4-1 on page 69. Veritas NetBackup version 10.1.1
or later supports IBM Storage Ceph as a backup target via the S3 application programming
interface (API). A list of S3 cloud storage vendors certified for NetBackup can be found here.
Figure 4-2 shows Veritas NetBackup with IBM Storage Ceph implementation for a single site.
Figure 4-2 Veritas NetBackup with IBM Storage Ceph for a single site
Organizations can also implement IBM Storage Ceph using multi-site active/active replication
for a more resilient solution. In this case, the NetBackup Service URL points to a load
balancer, which can fail over access to the DR Ceph cluster if the production Ceph cluster is
unavailable. When NetBackup is accessing the DR Ceph cluster, backup and restore
operations can continue without interruption. When the Prod IBM Storage Ceph cluster is
back online, the data will be synchronized, and the load balancer will fail access back to
the Prod IBM Storage Ceph cluster.
Figure 4-3 IBM Storage Ceph multi-site replication with Veritas NetBackup
Figure 4-4 Two Ceph clusters are configured with multi-site RGW replication fronted by a HAProxy server for Veritas
NetBackup
Perform the following steps to configure the storage server for IBM Storage Ceph:
1. Click Add storage server. See Figure 4-5.
2. Select Cloud storage and click Start. See Figure 4-6 on page 72.
3. Select Red Hat Ceph Storage and continue. See Figure 4-7.
4. Add a region to Red Hat Ceph Storage. The location constraint is the identifier that cloud
providers use to access buckets in the associated region. For public clouds that support
AWS v4 signing, you must specify the location constraint.
For on-premises Ceph clusters, both the region name and location constraint should refer
to the Ceph Object Gateway zone group, rgw-zonegroup. The zone group used in the
Ceph cluster is multizg. The service URL is the endpoint for object storage, which in our
case is the HAProxy server. Leave the rest of the parameters at their default values. Click
Add. See Figure 4-8 on page 73.
5. The region is added, as shown in Figure 4-9 on page 74. Click Next.
6. In the next window, as shown in Figure 4-10, enter the access key and secret access key
and click Next. This is the access key and secret access key of the Ceph S3 user.
Figure 4-10 Enter the access key and secret access key
7. NetBackup divides the backup data into chunks called objects. The performance of
NetBackup to S3 storage is determined by the combination of object size, number of
parallel connections and the read or write buffer size. By default the object size is 32 MB
and compression is enabled. Click Next in the next window (Figure 4-11 on page 75).
Figure 4-14 on page 77 shows an example hardware configuration of a 1.2 PB IBM Storage
Ceph cluster.
Figure 4-14 Example hardware configuration of a 1.2 PB IBM Storage Ceph cluster
The Premium edition includes the Red Hat Enterprise Linux (RHEL) server operating system
software license and support.
The Pro edition does not include the Red Hat Enterprise Linux (RHEL) server operating system
software license; therefore, organizations have the flexibility to bring their own or use their
existing RHEL licenses.
Objects in S3 are stored in buckets. Buckets can be created by users and can store an
unlimited number of objects. In this chapter, we will show you some real-world use cases
where we use bucket notifications.
5.1 Introduction
Objects in S3 are stored in buckets. Buckets can be created by users and can store an
unlimited number of objects. When an object is created, deleted, replicated, or otherwise
changed in a bucket, you can receive a notification so that you can immediately take action.
S3 bucket notifications can be very useful for a variety of tasks, such as:
Automating workflows: For example, you could use S3 bucket notifications to trigger a
Lambda function to resize an image file whenever a new image is uploaded to a bucket.
Monitoring changes: You could use S3 bucket notifications to monitor changes to your
data and send alerts if something unexpected happens.
Auditing: You could use S3 bucket notifications to log all changes to your data so that you
can track who made what changes and when.
We can configure S3 to generate events for a specific set of actions (such as GET and PUT) on a
per-bucket basis. These events are sent to a configured endpoint, which can then trigger the
user-defined next steps.
We can then use the bucket notification feature in IBM Storage Ceph to send a message to a
Kafka topic whenever an object is uploaded to or deleted from a bucket. The topic can then be consumed by a
serverless OpenShift service like Knative, which can scale from 0 to the number of compute
services needed to process the ingested data in the S3 object store. See Figure 5-2 on
page 81.
As a prerequisite, we need one of those endpoints configured and accessible from the Ceph
nodes where the RGW services are running.
Next, we create a topic in RGW that points to the preconfigured endpoint where we want to
send events for each action that happens on the bucket, such as object PUT, DELETE, and
so on.
After we configure the S3/RGW topic, we need to define at the bucket level which topic the
bucket will send notifications to, and which actions will trigger new events to be sent to the
topic.
At this point we start receiving notifications in the Kafka Topic called BucketEvents when a
user PUTs or DELETEs an object in bucket DemoBucket.
Synchronous notifications
When the original bucket notification was released, notifications were sent synchronously, as
part of the operation that triggered them. In this mode, the operation is acknowledged
(ACKed) only after the notification is sent to the topic’s configured endpoint. This means that
the round-trip time of the notification (the time it takes to send the notification to the topic’s
endpoint plus the time it takes to receive the acknowledgment) is added to the latency of the
RGW operation itself.
As you can imagine this implementation has its drawbacks in certain situations.
Even if the message is acknowledged by the endpoint, it is considered successfully
delivered and will not be retried.
When the messaging endpoint is down, using synchronous notifications will slow down
your production S3 operations in the RGW as RGW will not get an acknowledgement until
the messaging endpoint times out.
Synchronous notifications are not retried, so if an event or message is produced while the
endpoint is unavailable or down, it will not be sent.
Asynchronous notifications
Persistent bucket notifications were introduced with the IBM Storage Ceph 5.3 Pacific
version. The idea behind this new feature was to allow for reliable and asynchronous delivery
of notifications from the RADOS Gateway (RGW) to the endpoint configured at the topic.
With persistent notifications, RGW will retry sending notifications even if the endpoint is down
or a network disconnect occurs during the operation (that is, notifications are retried if not
successfully delivered to the endpoint).
So, we will have X number of hospitals (edge) providing chest X-ray images, some normal
and some with pneumonia. The images will be sent to a central site, a laboratory where we
have an AI/ML model that can be trained to infer the images and determine whether the new
images being processed at the hospitals are from patients with pneumonia.
All of the data modelling and training can be done with AI tools, for example watsonx.ai.
We have an AI/ML model, but we need to ensure that it can scale on demand as the number
of hospitals using the service increases over time, leading to an increased number of images
that need to be processed.
We need to analyze images as they are uploaded into the system at the hospitals and give
the results of the analysis to the doctor. We also want to send all images to the central site so
we can retrain the model for improvements and then seamlessly redeploy the updated model
at all the edge sites (hospitals).
1. Ingest chest X-rays into an S3 object store based on IBM Storage Ceph.
2. The S3 IBM Ceph Object store sends notifications to a Kafka topic.
3. A Knative Eventing KafkaSource Listener triggers a Knative Serving function when it
receives a message from a Kafka topic.
4. An ML-trained model running in a pod/container assesses the risk of pneumonia for
incoming images.
5. The application notifies the doctor of the results.
6. If the model's certainty in the result is low, anonymize the image data and send it to an S3
object storage located at the Central Science Lab.
This pipeline is showcased by Guillaume Moutier from Red Hat in this OpenShift Commons
YouTube video (slides are available here). There is also information on how to demo this
event-driven pattern at this link.
5.5.1 Prerequisites
Let us start with the prerequisites.
1. We need to have an RGW service configured and running. In our example, we have an
RGW service running on node ceph-node02 on port 8080. See Example 5-1.
4. Once the Kafka topic is set up, we will create a new set of S3 credentials (an access key
and secret key) via the RGW. We will then use an S3 CLI client, such as the AWS CLI, to
create a bucket called demobucket. See Example 5-4.
Note: You can use any other S3 client. We downloaded and installed the AWS CLI
client by following the instructions at this link.
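A hedged sketch of these two steps, using the user1 ID that appears in the notification output later in this chapter, could be:
[root@ceph-node02 ~]# radosgw-admin user create --uid=user1 --display-name="user1"
$ aws configure          # paste the generated access key and secret key when prompted
$ aws --endpoint=http://ceph-node02:8080 s3 mb s3://demobucket
make_bucket: demobucket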
2. Once the JSON file is created, we can use the AWS CLI to create the RGW topic. The
name of the RGW topic will match the name of the Kafka topic we created previously,
bucketevents. See Example 5-6.
"TopicArn": "arn:aws:sns:default::bucketevents"
}
$ aws --endpoint=http://ceph-node02:8080 sns list-topics
{
"Topics": [
{
"TopicArn": "arn:aws:sns:default::bucketevents"
}
]
}
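A hedged sketch of such a create-topic call, assuming a Kafka broker reachable at kafka.example.local:9092 and broker-level acknowledgments, could look like the following:
$ aws --endpoint=http://ceph-node02:8080 sns create-topic --name=bucketevents \
    --attributes='{"push-endpoint": "kafka://kafka.example.local:9092", "kafka-ack-level": "broker"}'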
3. The next step is to set up the bucket notification. We will add a new bucket notification to
our bucket demobucket. This notification will send an event to the RGW topic bucketevents
whenever an object is created or deleted. See Example 5-7.
In the above example, we set the ID/name of our topic to kafkanotifications. We set the
TopicArn to the ARN of the RGW topic we created in the previous step (Example 5-6). Finally,
we specified the actions that will trigger events, which in this case are object creation and
deletion (Example 5-7). To get a list of all supported actions/events follow this link.
Once the JSON notify file has been created, we can apply its configuration to the demobucket.
See Example 5-8.
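A hedged sketch of such a notify file and of the command that applies it, reusing the names from this example, could look like the following:
$ cat > notification.json << EOF
{
  "TopicConfigurations": [
    {
      "Id": "kafkanotifications",
      "TopicArn": "arn:aws:sns:default::bucketevents",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    }
  ]
}
EOF
$ aws --endpoint=http://ceph-node02:8080 s3api put-bucket-notification-configuration \
    --bucket demobucket --notification-configuration file://notification.json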
At this point if the setup is working as expected, any new object creation or deletion will send
a bucket notification to the Kafka topic.
Example 5-11   The client has consumed the event notification created by the creation of an object
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "kafkanotification",
"bucket": {
"name": "demobucket",
"ownerIdentity": {
"principalId": "user1"
},
"arn": "arn:aws:s3:::demobucket",
"id": "2c814c32-819c-4508-985a-bde6e18e46eb.24172.3"
},
"object": {
"key": "hosts",
"size": 1330,
"eTag": "b7828f6b873e43d1bacec3670a9f4510",
"versionId": "",
"sequencer": "0F381465861F2223",
"metadata": [
{
"key": "x-amz-content-sha256",
"val":
"1ef29c6abb4b7bc3cc04fb804b1850aecf84dc87493330b66c31bef5209680f3"
},
{
"key": "x-amz-date",
"val": "20230927T141127Z"
}
],
"tags": []
}
},
"eventId": "1695823887.589438.b7828f6b873e43d1bacec3670a9f4510",
"opaqueData": ""
}
]
}
In a real-life situation, we would have an application consuming events from this topic and
taking specific actions when an event is received.
If you have issues with the S3 bucket notification configuration, and the events from RGW are
not reaching Kafka, a good place to start is the RGW logs. You will need to enable debug
mode in the RGW subsystem. This link contains instructions on how to enable debug mode for
the RGW service.
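As a hedged example, the RGW debug level can be raised at run time with commands like the following, and lowered again after troubleshooting:
[root@ceph-node01 ~]# ceph config set client.rgw debug_rgw 20
# ... reproduce the issue and check the RGW logs ...
[root@ceph-node01 ~]# ceph config set client.rgw debug_rgw 1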
inside your OpenShift cluster (there is also the possibility of connecting Fusion Data
Foundation in external mode to an already deployed stand-alone IBM Storage Ceph cluster).
Deploying Ceph in OpenShift gives us the advantage of all the automation and lifecycle
management provided by OpenShift Operators. In the case of Fusion Data Foundation, we
have the Rook Operator taking care of deploying Ceph. If we are deploying on an OpenShift
on-premises cluster, we will get the RadosGW service, which provides S3 object storage
functionality, deployed for us.
One of the advantages of working with Fusion Data Foundation is the ease of use. As an
example, we will show how to configure S3 bucket notifications in three easy steps with the help of
the custom resources provided by Rook.
5.6.1 Prerequisites
The following are the prerequisites.
1. Ensure that the OpenShift cluster is running. See Example 5-12.
Example 5-13 Ensure that IBM Storage Fusion with the Fusion Data Foundation service deployed
$ oc get csv -n ibm-spectrum-fusion-ns
NAME DISPLAY VERSION REPLACES PHASE
isf-operator.v2.6.1 IBM Storage Fusion 2.6.1 isf-operator.v2.6.0
Succeeded
$ oc get storagecluster -n openshift-storage
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 4h39m Ready 2023-10-02T10:36:03Z 4.13.2
$ oc get cephcluster -n openshift-storage
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE
MESSAGE HEALTH EXTERNAL FSID
ocs-storagecluster-cephcluster /var/lib/rook 3 4h40m Ready
Cluster created successfully HEALTH_OK
684c0d97-8fea-471d-84fb-b78c91a1e769
3. We need to configure a RADOS Gateway (RGW) cephobjectstores custom resource.
This custom resource deploys the RGW service, which gives us access to our Ceph
Storage S3 RGW endpoint. An OpenShift service is also created, along with an external
route if S3 connectivity is required from outside the OpenShift cluster. We are deploying
the service without SSL for simplicity; for production deployments, HTTPS/SSL is
recommended. The RGW service is deployed with FDF out of the box for on-premises
deployments. If you are running on a cloud deployment, you can follow this link for
step-by-step instructions on how to set up the cephobjectstores custom resource. See
Example 5-14.
Notifications
A CephBucketNotification defines what bucket actions trigger the notification and which topic
to send notifications to. A CephBucketNotification may also define a filter, based on the
object's name and other object attributes. Notifications can be associated with buckets
created via ObjectBucketClaims by adding labels to an ObjectBucketClaim.
Topics
A CephBucketTopic represents an endpoint (of types: Kafka, AMQP0.9.1 or HTTP), or a
specific resource inside this endpoint (e.g. a Kafka or an AMQP topic, or a specific URI in an
HTTP server). The CephBucketTopic also holds any additional info needed for a
CephObjectStore's RADOS Gateways (RGW) to connect to the endpoint. Topics don't belong
to a specific bucket or notification. Notifications from multiple buckets may be sent to the
same topic, and one bucket (via multiple CephBucketNotifications) may send notifications to
multiple topics.
1. We will start with the configuration of the topic, using the CephBucketTopic custom
resource provided by Rook. We will name our topic bucketevents. Most of the parameters
have already been explained in 5.5, “Step by step basic example of configuring event
notifications for a bucket in IBM Storage Ceph” on page 85. You can follow this link if you
would like to see a detailed explanation of any of the parameters we are using in this
example.
As the URI for Kafka, we are using the service FQDN as we will be accessing the Kafka
endpoint from a different namespace from where Kafka is running. In this example, Kafka
is running in the namespace data-pipeline, and our topic is configured in the
openshift-storage namespace. See Example 5-17.
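A hedged sketch of what such a CephBucketTopic resource could look like, reusing the names from this example (the useSSL and ackLevel values are assumptions), is:
$ cat << EOF | oc apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBucketTopic
metadata:
  name: bucketevents
  namespace: openshift-storage
spec:
  objectStoreName: ocs-storagecluster-cephobjectstore
  objectStoreNamespace: openshift-storage
  opaqueData: my@email.com
  endpoint:
    kafka:
      uri: kafka://kafka-clus-kafka-brokers.data-pipeline.svc:9092
      useSSL: false
      ackLevel: broker
EOF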
2. Next, we create the CephBucketNotification custom resource named bucket-notification,
which references the bucketevents topic and defines which events trigger notifications.
Listing the resource shows that it was created:
bucket-notification   29m
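A hedged sketch of the corresponding CephBucketNotification resource, reusing the event types that appear later in Example 5-23, could be:
$ cat << EOF | oc apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBucketNotification
metadata:
  name: bucket-notification
  namespace: openshift-storage
spec:
  topic: bucketevents
  events:
    - s3:ObjectCreated:Put
    - s3:ObjectCreated:Copy
EOF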
3. Next, using the RGW storage class, we create an Object Bucket Claim (OBC). The OBC
will take care of creating a bucket and user credentials in RGW for us. By using bucket
notification labels, the OBC will also configure the bucket notification. In our case, we will use
the bucket notification we created in the previous step, called bucket-notification. See
Example 5-19.
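A hedged sketch of such an ObjectBucketClaim, using the bucket notification label convention provided by Rook, could be the following; the storage class name is an assumption based on the default FDF RGW class:
$ cat << EOF | oc apply -f -
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-noti-bucket
  namespace: openshift-storage
  labels:
    bucket-notification-bucket-notification: bucket-notification
spec:
  generateBucketName: databucket2
  storageClassName: ocs-storagecluster-ceph-rgw
EOF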
2. We add the credentials to our AWS CLI credentials file $HOME/.aws/credentials with a
profile name of notify. See Example 5-21.
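A hedged sketch of this step, reading the keys from the secret that the OBC creates (the secret has the same name as the claim), could be:
$ AWS_ACCESS_KEY_ID=$(oc get secret ceph-noti-bucket -n openshift-storage -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
$ AWS_SECRET_ACCESS_KEY=$(oc get secret ceph-noti-bucket -n openshift-storage -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
$ cat >> ~/.aws/credentials << EOF
[notify]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
EOF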
3. We are using the AWS CLI client from inside OCP so we will use the OCP service FQDN
as the RGW S3 endpoint. We can get the OCP service FQDN from the OBC config map
with the same name. See Example 5-22.
Example 5-22 get the OCP service FQDN from the OBC config map with the same name.
$ oc get cm/ceph-noti-bucket -o json | jq .data
{
  "BUCKET_HOST": "rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc",
  "BUCKET_NAME": "databucket2-f9c6479b-3a22-45b4-9564-a9507b4d5d93",
  "BUCKET_PORT": "80",
  "BUCKET_REGION": "",
  "BUCKET_SUBREGION": ""
}
4. Now that the AWS CLI is configured, we can run the get-bucket-notification s3api
command to check if the configuration is in place for our bucket databucket2. See
Example 5-23.
Example 5-23 Check if the configuration is in place for our bucket databucket2
$ aws --profile notify
--endpoint=http://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-stora
ge.svc s3api get-bucket-notification-configuration --bucket
databucket2-f9c6479b-3a22-45b4-9564-a9507b4d5d93 | jq .
"TopicConfigurations": [
{
"Id": "bucket-notification",
"TopicArn": "arn:aws:sns:ocs-storagecluster-cephobjectstore::bucketevents",
"Events": [
"s3:ObjectCreated:Put",
"s3:ObjectCreated:Copy"
]
}
]
5. As a final verification, let us upload something to the bucket and check if it triggers the
event and sends it to our Kafka topic bucketevents. See Example 5-24.
Example 5-24 Upload something to the bucket and check if it triggers the event
$ aws --profile notify
--endpoint=http://rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-stora
ge.svc s3 cp /etc/resolv.conf
s3://databucket2-f9c6479b-3a22-45b4-9564-a9507b4d5d93
upload: ../../etc/resolv.conf to
s3://databucket2-f9c6479b-3a22-45b4-9564-a9507b4d5d93/resolv.conf
6. To double-check that our configuration is functional and that the events are reaching their
destination topic, bucketevents, we will use the Kafka consumer available in the Kafka
pod. See Example 5-25.
Example 5-25 Double-check that our configuration is functional and that the events are reaching their
destination topic
$ oc exec -n data-pipeline -it kafka-clus-kafka-0 -- bash
$ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server
kafka-clus-kafka-brokers:9092 --topic bucketevents --from-beginning
{"Records":[{"eventVersion":"2.2","eventSource":"ceph:s3","awsRegion":"ocs-storage
cluster-cephobjectstore","eventTime":"2023-10-03T07:00:19.271850Z","eventName":"Ob
jectCreated:Put","userIdentity":{"principalId":"obc-openshift-storage-ceph-noti-bu
cket-684c0d97-8fea-471d-84fb-b78c91a1e769"},"requestParameters":{"sourceIPAddress"
:""},"responseElements":{"x-amz-request-id":"479fd891-23fe-4435-8006-31a4be9cc7ff.
74097.11704580900520310930","x-amz-id-2":"12171-ocs-storagecluster-cephobjectstore
-ocs-storagecluster-cephobjectstore"},"s3":{"s3SchemaVersion":"1.0","configuration
Id":"bucket-notification2","bucket":{"name":"databucket2-056d7d8b-b626-4ad6-99fe-b
a5e147561bd","ownerIdentity":{"principalId":"obc-openshift-storage-ceph-noti-bucke
t2-684c0d97-8fea-471d-84fb-b78c91a1e769"},"arn":"arn:aws:s3:ocs-storagecluster-cep
hobjectstore::databucket2-056d7d8b-b626-4ad6-99fe-ba5e147561bd","id":"479fd891-23f
e-4435-8006-31a4be9cc7ff.75784.1"},"object":{"key":"resolv.conf","size":920,"eTag"
:"698a41f5f188c0d3a9e4298fde4631f4","versionId":"","sequencer":"03BC1B65C32C3712",
"metadata":[{"key":"x-amz-content-sha256","val":"ebdf560272a77357195c39e98340b77e1
8c8a8ce2025ee950e9e0c7b01467ab8"},{"key":"x-amz-date","val":"20231003T070018Z"}],"
tags":[]}},"eventId":"1696316419.305605.698a41f5f188c0d3a9e4298fde4631f4","opaqueD
ata":"my@email.com"}]}
With this final verification we have concluded that our configuration is working and we are
able to configure S3 bucket notifications in our OpenShift application buckets.
Fusion Data Foundation services are primarily made available to applications in the form of
storage classes that represent the following components:
– Block storage devices, catering primarily to database workloads, for example, Red Hat
OpenShift Container Platform logging and monitoring, and PostgreSQL.
– Shared and distributed file system, catering primarily to software development,
messaging, and data aggregation workloads, for example, Jenkins build sources and
artifacts, Wordpress uploaded content, Red Hat OpenShift Container Platform registry,
and messaging using JBoss AMQ.
– Multi-cloud object storage, featuring a lightweight S3 API endpoint that can abstract the
storage and retrieval of data from multiple cloud object stores.
– On-premises object storage, featuring a robust S3 API endpoint that scales to tens of
petabytes and billions of objects, primarily targeting data intensive applications, for
example, the storage and access of row, columnar, and semi-structured data with
applications like Spark, Presto, Red Hat AMQ Streams (Kafka), and even machine
learning frameworks like TensorFlow and PyTorch.
Note: Throughout this document, IBM Storage Fusion Data Foundation (FDF) and Red
Hat OpenShift Data Foundation (ODF) are used interchangeably because they have the
same functionality.
single OpenShift cluster, stretched across two data centers with low latency and one
arbiter node.
– Regional-DR (Tech Preview)
Regional-DR ensures business continuity during the unavailability of a geographical
region, accepting a predictable amount of data loss. In the public cloud, this would be
similar to protecting from a regional failure. This solution is based on asynchronous
volume-level replication for block and file volumes.
Zone failure in Metro-DR and region failure in Regional-DR are usually expressed
using the terms Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
– RPO
RPO measures how often you back up or snapshot persistent data. In practice, it
represents the maximum amount of data that can be lost or that must be reentered after
an outage.
– RTO
RTO is the maximum amount of time a business can be down without incurring
unacceptable losses. It answers the question, "How long can our system take to
recover after we are notified of a business disruption?"
Table 6-1 summarizes the different offerings and corresponding use cases.
Regional-DR solution comprises Red Hat Advanced Cluster Management for Kubernetes
(RHACM) and OpenShift Data Foundation (ODF) or Fusion Data Foundation (FDF)
components to provide application and data mobility across OpenShift Container Platform
clusters.
Figure 6-1 on page 101 shows the IBM Fusion Data Foundation RegionalDR offering.
Figure 6-2 on page 102 shows the IBM Fusion Data Foundation MetroDR offering.
The following list details the Stretched Cluster OpenShift DR offering:
– One OpenShift and one FDF internal cluster stretched across 2/3 data centers.
– FDF is deployed internally. No external IBM Storage Ceph is required.
– No Advanced Cluster Manager is required.
– Supported platforms: VMware and baremetal only.
– Data types: Block, file and object.
– Latency: <10 ms RTT for all nodes.
– Network: High bandwidth (Campus Network).
Figure 6-3 shows the IBM Fusion Data Foundation stretched mode offering.
Ceph is designed for reliable networks and cluster components, with random failures
assumed across the CRUSH map. For switch failures, Ceph is designed to route around the
loss and maintain data integrity with the remaining OSDs and monitors. However, in some
cases, such as a "stretched-cluster" deployment where a significant portion of the cluster
relies on a single network component, stretch mode may be necessary to ensure data
integrity.
Ceph supports two standard stretched configurations: a two-site setup and a three-site
setup. In the two-site setup, each of the two data sites holds a copy of the data, and a third
site hosts a tiebreaker monitor. This monitor selects a winner if the network connection fails
while both data centers remain operational. The tiebreaker monitor can be a VM and can
tolerate higher latency than the main sites.
IBM Storage Ceph, when configured in stretched mode, offers synchronous replication
between sites for all of its clients: RBD (block), CephFS (shared file system), and RGW (object).
MetroDR is a synchronous replication solution. Write IO at the primary site is replicated to the
secondary site before the write acknowledgment is returned, which increases the latency of
write workloads.
In stretched mode, Ceph uses a replication factor of 4, meaning that two copies of each data
object are stored at the primary site and two copies are stored at the secondary site.
Both of the previous points, synchronous replication and keeping four copies of the data,
degrade the IOPS performance that you can achieve with MetroDR, which is especially
noticeable with random writes of small blocks.
A deep dive into MetroDR concepts is available in an OpenShift Data Foundation video
series.
While Ceph prioritizes data integrity and consistency, stretched clusters may compromise
data availability:
An inconsistent network (netsplit) can prevent Ceph from marking OSDs as down and
removing them from the PG, even though the primary PG cannot replicate data (a situation
that, under normal non-netsplit circumstances, would result in the marking of the affected
OSDs as down and their removal from the PG). If this happens, Ceph cannot satisfy its
durability guarantees, and consequently, IO will be blocked.
Constraints do not guarantee the replication of data across data centers, even though it
may appear that the data is correctly replicated. For example, in a scenario with two data
centers (A and B) and a CRUSH rule that targets three replicas with a min_size of 2, a PG
may go active with two replicas in A and zero replicas in B. This means that if A is lost, the
data is lost and Ceph cannot operate on it. This situation is surprisingly difficult to avoid
using only standard CRUSH rules.
In this section, Figure 6-4 is explained in detail, covering the most important aspects to
consider regarding networking, server hardware, and Ceph component placement.
Network
In a stretched architecture, the network is crucial for the overall health and performance of the
cluster.
IBM Storage Ceph has Layer 3 routing capabilities, allowing for different subnets/CIDRs on
each site and enabling communication between Ceph servers and components via L3
routing.
IBM Storage Ceph can be configured with two different networks: the public network and the
cluster network. The Ceph public network must be accessible and connected between all
three sites (the two data sites plus the arbiter site), because all IBM Storage Ceph services,
such as MONs, OSDs, and RGWs, require communication over the public network. The
private or cluster network is used only by the OSD services, so it only needs to be configured
on the two data sites where the OSDs reside. There is no need to configure the cluster
network on the arbiter site.
Network reliability is of huge importance. Flapping networks between the three sites will
introduce data availability and performance issues into the cluster. It is not just about the
network being accessible but also the consistency of its latency. Frequent spikes in network
latency can lead to false alarms and other problems.
Regarding latency, a maximum of 10 ms RTT between the data sites is required to run an IBM
Storage Ceph cluster in stretched mode. By data sites, we mean the sites with servers
containing OSDs (disks). The latency to the arbiter site can be as high as 100 ms RTT, so it
may be beneficial to deploy the arbiter node as a VM in a cloud provider, if allowed by
security.
If the arbiter node is configured on a cloud provider or on a remote network that is reached
through a WAN, the recommendation is to set up a VPN between the data sites and the
arbiter site and to enable encryption in transit. The messenger v2 encryption feature
encrypts the communications between the different Ceph services, so all the communications
between the MON in the arbiter site and the other components on the data sites will be
encrypted.
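Encryption in transit is controlled through the messenger v2 secure mode settings. The
following is a minimal sketch of the upstream configuration commands involved; verify the
exact procedure and the supported modes against the IBM Storage Ceph documentation for
your release before applying them:
ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure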
Because of how Ceph works, latency between sites in a stretch cluster affects both the
stability of the cluster and the performance of every write. Ceph follows a strong consistency
model, which requires every write to be copied to all OSDs configured in the replica count
before the client receives an acknowledgment (ACK). This means that the client will not
receive the ACK for a write until two copies have been written to each site, adding the
round-trip time (RTT) between sites to the latency of every write workload.
Another important consideration is the throughput between the data sites. The maximum
client throughput will be limited by the connection (Gbit/s) between the sites. Recovery from a
failed node or site is also important. When a node fails, Ceph always performs IO recovery
from the primary OSD. This means that 66% of the recovery traffic will be remote, reading
from the other site and utilizing the inter-site bandwidth that is shared with client IO.
By default, Ceph always reads data from the primary OSD, regardless of its location. This can
cause a significant amount of cross-site read traffic. The read_from_local_replica feature
(primary OSD affinity) forces clients to read from the local OSD (the OSD in the same site)
instead, regardless of whether it is the primary OSD. Testing has shown that the
read_from_local_replica feature improves read performance, especially for smaller block
sizes, and reduces cross-site traffic. This feature is now available for RBD (block) in the
standard internal deployment of Fusion Data Foundation (FDF) in version 4.13/4.14, and
work is underway to make it available for MetroDR deployments. This is a vital feature for
stretched site setups, as it drastically reduces the number of times we need to contact OSDs
on the other site for reads, reducing inter-site traffic.
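For reference, on an upstream Ceph cluster the localized-read behavior for RBD is driven by
the rbd_read_from_replica_policy option together with a CRUSH location on each client. The
following sketch only illustrates the concept with the DC1 label used in this chapter; in FDF
and MetroDR these settings are managed by the product, so do not apply them manually
unless the documentation for your version instructs you to:
ceph config set client rbd_read_from_replica_policy localize
## and, in the ceph.conf of each client host, a location that matches its site, for example:
## [client]
## crush_location = datacenter=DC1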
Hardware requirements
The requirements and recommendations for the IBM Storage Ceph Nodes hardware are the
same as the normal (non-stretched) deployments, except for a couple of differences we will
cover in the following paragraph. Here is a link to the official Hardware section in the IBM
Storage Ceph documentation.
Important differences:
IBM Storage Ceph in stretched mode only supports full flash configurations. HDD spindles
are not supported. The reasoning behind this is that if we lose a full datacenter for what
could be a long period, we still have two replicas/copies of the data available on the
remaining site; replica 2 configurations are only supported with flash media.
IBM Storage Ceph only supports replica 4 as the replication schema. Any other replication
or erasure coding scheme is not supported. So, we need to calculate our total usable
storage accordingly.
IBM Storage Ceph cluster can only be used for the MetroDR use case; it can’t be shared
for other IBM Storage Ceph use cases, such as an S3 object store for external workloads.
IBM Storage Ceph in stretched mode can only be deployed in baremetal or VMware
platforms.
VMware deployments of IBM Storage Ceph are supported for the MetroDR use case. The
following are some best practices for VMware deployments:
– You need to reserve CPU and memory resources for the Ceph virtual machines.
– On the VMs, set the latency sensitivity option to High.
– In general, applying all VMware recommendations for low-latency use cases makes
sense.
VMware storage provided to IBM Storage Ceph for the MetroDR use case supports using
.VMDK files for the VMs. However, it is important to note that Ceph will only perform as
well as the datastore providing the .VMDKs. If the backing datastore is an
underperforming vSAN, Ceph performance will be poor. Conversely, if we use a dedicated
all-flash SAN datastore for a specific set of VMs, we can expect Ceph performance to be
much better.
Avoid using vSAN datastores for Ceph node drives, because the performance will not be
satisfactory for any IO-intensive applications.
Component placement
Ceph services (MONs, OSDs, RGWs and so forth) must be placed in a way that eliminates
single points of failure and ensures that the cluster can tolerate the loss of a full site without
affecting client service.
MONs: A minimum of 5 MONs are required, 2 MONs per data site, and 1 MON on the
arbiter site. This configuration ensures that quorum is maintained with more than 50% of
MONs available even when a full site is lost.
MGR: We can configure two or four managers, with a minimum of one manager per data
site. Four managers are recommended to provide high availability with an active/passive
pair on the remaining site in the event of a data site failure.
OSDs: Distribute equally across all the nodes in both of the data sites. Custom CRUSH
rules providing two copies in each site (using 4 copies) must be created when configuring
the stretch mode in the Ceph cluster.
RGWs: The minimum recommended is four RGW services, two per data site, to ensure
that we can still provide high availability for object storage on the remaining site in the
event of a site failure.
MDS: For CephFS Metadata services, the minimum recommended is four MDS services,
two per data site. In the case of a site failure, we will still have two MDS services on the
remaining site, one active and the other one on standby.
During the cluster bootstrap using the cephadm deployment tool, we can inject a service
definition yaml file that will do most of the cluster configuration in a single step. Example 6-1
shows an example of how to use a template when deploying an IBM Storage Ceph Cluster
configured in stretch mode. Remember, this is just an example that must be tailored to your
deployment.
Example 6-1 Example of deploying an IBM Storage Ceph Cluster configured in stretch mode
cat <<EOF > /root/cluster-spec.yaml
service_type: host              ## service_type host adds hosts to the cluster
addr: 10.0.40.2                 ## IPs, hostnames, etc will need to be replaced
hostname: ceph1                 ## depending on each individual use case
location:                       ## this is just an example template
  root: default
  datacenter: DC1               ## DC1 is the label we set in the crushmap
labels:
  - osd                         ## Placement of the services will be done with labels
  - mon                         ## This host can be scheduled to run OSD, MON and
  - mgr                         ## MGR services
---
service_type: host
addr: 10.0.40.3
hostname: ceph2
location:
  datacenter: DC1
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: 10.0.40.4
hostname: ceph3
location:
  datacenter: DC1
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: 10.0.40.5
hostname: ceph4
location:
  datacenter: DC1
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: 10.0.48.2                 ## The hosts for DC2 are on another Subnet/CIDR
hostname: ceph5
location:
  root: default
  datacenter: DC2               ## Datacenter 2 label for the crushmap
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: 10.0.48.3
hostname: ceph6
location:
  datacenter: DC2
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: 10.0.48.4
hostname: ceph7
location:
  datacenter: DC2
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: 10.0.48.5
hostname: ceph8
location:
  datacenter: DC2
labels:
  - osd
  - mds
  - rgw
---
service_type: host              ## This is the arbiter node, it has no OSDs,
addr: 10.0.49.2                 ## so also no datacenter crushmap label
hostname: ceph9
labels:
  - mon
---
service_type: mon
placement:
  label: "mon"
---
service_type: mds
service_id: mds_filesystem_one
placement:
  label: "mds"
---
service_type: mgr
service_name: mgr
placement:
  count: 4
  label: "mgr"
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  label: "osd"
spec:
  data_devices:
    all: true                   ## With this filter all available disks will be used
---
service_type: rgw
service_id: objectgw
service_name: rgw.objectgw
placement:
  count: 4
  label: "rgw"
spec:
  rgw_frontend_port: 8080
EOF
Once the YAML file is modified, we can use the --apply-spec parameter of the cephadm
bootstrap command so that the provided YAML configuration is fed into cephadm, bringing
the cluster to the state that we described in the YAML file, as shown in Example 6-2.
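The bootstrap invocation follows the pattern sketched below; the monitor IP matches host
ceph1 from Example 6-1, and the SSH user and registry JSON file are placeholders that
depend on your environment:
# cephadm bootstrap --mon-ip 10.0.40.2 --ssh-user <deployment-user> \
  --registry-json /root/registry.json --apply-spec /root/cluster-spec.yaml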
IBM Storage Ceph in stretched mode only supports a replication schema of four, so two
copies of the data are kept at each site. Even if we lose a full site, two copies of our data
remain available.
Using replica 4 reduces performance, because Ceph needs to make one extra write
compared to the standard protection schema of replica 3. This extra copy lowers performance
because Ceph, by design, waits for all replicas to acknowledge the write before responding to
the client.
Tip: There are a number of ways to mitigate the performance impact of using replica 4,
such as using a high-performance storage backend, increasing the number of OSDs, and
using Ceph features such as caching and read-from-local-replica.
How does Ceph write 2 copies per site? Ceph uses the CRUSH map to determine where to
store object replicas. The CRUSH map is a logical representation of the physical hardware
layout, with a hierarchy of buckets such as rooms, racks, and hosts.
To configure a stretch mode CRUSH map, we define two data centers under the root bucket,
and then define the hosts in each data center. For example, the terminal output in
Example 6-3 shows a stretch mode CRUSH map with the two data centers, DC1 and DC2,
and the Ceph hosts that belong to each of them.
To achieve our goal of having two copies per site, we need to define our failure domain at the
pool level. As an example, if we take the pool rbdpool, we can see that it is using a
crush_rule with an ID of 1. See Example 6-4.
If we check the failure domain that is configured on crush_rule 1, we can see that the rule
selects two hosts from each data center to store the four copies of each client write to
the pool. See Example 6-5.
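The pool-to-rule mapping and the rule definition can be inspected with commands like the
following; the pool and rule names are the ones used in this example:
# ceph osd pool get rbdpool crush_rule
# ceph osd crush rule dump stretch_rule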
"rule_id": 1,
"rule_name": "stretch_rule",
"type": 1,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "choose_firstn",
"num": 0,
"type": "datacenter"
},
{
"op": "chooseleaf_firstn",
"num": 2,
"type": "host"
},
{
"op": "emit"
}
]
}
In Figure 6-5 on page 113, you can see the correlation between the OSD CRUSH map and
the CRUSH rule that we defined in Example 6-4 on page 111.
Figure 6-5 IBM Storage Ceph Crush map and Crush rule
When stretch mode is enabled, the OSDs will only take PGs active when they peer across
data centers, assuming both are alive, with the following constraints:
Pools will increase in size from the default 3 to 4, expecting two copies on each site.
OSDs will only be allowed to connect to monitors in the same data center.
New monitors will not join the cluster if they do not specify a location.
If all the OSDs and monitors of one data center become inaccessible at once, the surviving
data center enters a degraded stretch mode, which implies the following behavior:
The cluster issues a warning, reduces the pool's min_size to 1, and allows the cluster to go
active with the data in the remaining site.
The pool size parameter is not changed, so you will also get warnings that the pools are
too small.
However, the stretch mode flag prevents the OSDs from creating extra copies in the
remaining data center, so only two copies are kept, as before.
When the missing data center comes back, the cluster enters recovery stretch mode,
which triggers the following actions:
The cluster changes the warning and allows peering, but still requires only the OSDs from
the data center that was up the whole time.
When all PGs are in a known state and are neither degraded nor incomplete, the cluster
transitions back to the regular stretch mode, where the cluster:
Ends the warning.
Restores min_size to its starting value (2) and requires both sites to peer.
Stops requiring the always-alive site when peering, so that you can fail over to the other
site if necessary.
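For reference, stretch mode is enabled on the cluster with commands like the following
upstream Ceph commands. This is a minimal sketch that assumes the monitor names match
the host names from Example 6-1, that the stretch_rule CRUSH rule already exists, and that
the tiebreaker monitor is given its own location; follow the IBM Storage Ceph documentation
for the exact procedure for your release:
ceph mon set election_strategy connectivity
ceph mon set_location ceph1 datacenter=DC1     ## repeat for each monitor in DC1
ceph mon set_location ceph5 datacenter=DC2     ## repeat for each monitor in DC2
ceph mon set_location ceph9 datacenter=DC3     ## the arbiter/tiebreaker monitor
ceph mon enable_stretch_mode ceph9 stretch_rule datacenter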
Ease of deployment and management are the highlights of running Fusion Data Foundation
(FDF) services internally on OpenShift Container Platform.
External deployment
IBM Storage Fusion Data Foundation (FDF) exposes the IBM Storage Ceph services running
outside of the OpenShift Container Platform cluster as storage classes.
OpenShift DR
OpenShift DR is a disaster recovery orchestrator for stateful applications across a set of peer
OpenShift clusters, which are deployed and managed by using RHACM. It provides
cloud-native interfaces to orchestrate the life cycle of an application’s state on persistent
volumes, which include:
– Protecting an application and its state relationship across OpenShift clusters.
– Failing over an application and its state to a peer cluster.
– Relocating an application and its state to the previously deployed cluster.
Both OpenShift clusters in the data sites are managed ACM clusters. Each OpenShift cluster
deploys its own instance of Fusion Data Foundation in external mode, and both FDF
instances connect to the same IBM Storage Ceph cluster running in stretched mode.
Applications protected by MetroDR are active/passive, meaning that an application can only
be active on one site at a time. During failover or failback, we can switch the active application
from one site to the other. You can have active applications running on both data sites
simultaneously: for example, APP1 can run active in Site 1 and passive in Site 2, while APP2
runs active in Site 2 and passive in Site 1. However, it is important to emphasize that,
independent of the PV access type (RWO or RWX), only active/passive is supported for
MetroDR protected applications.
Concurrent access to the same volume by both OpenShift clusters using a single stretched
IBM Storage Ceph cluster can corrupt the filesystem. We must avoid this at all costs. To avoid
this issue, MetroDR uses a Ceph fencing mechanism called blocklisting. We can fence any
client by IP or CIDR, and the DR operator takes care of fencing off all worker nodes on the
failed site before starting the application on the secondary site.
With the previous statement in mind, be aware that with MetroDR, once a failover is started,
all applications that MetroDR protects are failed over. There is no granular, per-application
failover in MetroDR, because all worker nodes that are part of the OpenShift cluster must be
fenced off to ensure that no file system or PV is still being accessed on the primary (failed)
site.
By default, failover and failback are triggered manually by an operator. This means that a
human must decide to start a failover, and there is no automatic failover based on application
health checks. This kind of automation can be easily developed with the tools available, but it
would require custom development from the user.
This also brings us to the topic of capacity planning. We always need enough free CPU and
memory resources on the OpenShift clusters to accommodate all the applications protected
with MetroDR.
The global load balancer (GLB) depicted in Figure 6-6 on page 116 is not included in the
MetroDR solution. Instead, site failure detection and redirection are handled by the site-level
components. In the event of a site failure, client requests are automatically redirected to the
remaining site. While application-level health checks can be implemented, they are not as
crucial in MetroDR as compared to RegionalDR, as MetroDR involves failing over the entire
site, resulting in all application requests being redirected to the remaining site.
The Failover and Failback workflows for both sets of applications are the same at a high level:
1. Our monitoring system has detected a problem with Site 1: health checks are failing, and
the application is inaccessible.
2. After the operator verifies that there is an issue with Site 1, the failover workflow is started.
3. First, we must fence all worker nodes in Site 1.
4. Once the worker nodes are fenced, we can trigger the failover from the ACM HUB UI.
5. The application PV metadata is restored in Site 2, making the PV available for the
application to start up.
6. ACM HUB deploys the application in Site 2, using the PV to restore all previous data.
7. The application successfully starts up on Site 2.
8. The global load balancer detects that the application is healthy on Site 2 and redirects
client traffic to Site 2.
9. At this point, our service is recovered, and the app is running again with zero data loss.
10.At some point, Site 1 is recovered, and all the FDF/ODF worker nodes are running again
after a restart.
11.We remove the fencing that was in place for the worker nodes in Site 1.
12.We relocate the service to Site 1, bringing the service back to its original state.
There is a visual slide deck of the Failover and Failback workflow available on the following
link.
There is also a YouTube video by Annett Clewett that shows the Failover and Failback of an
application using a queue from the IBM MQ series product. The video is available at the
following link.
The following options are available to authenticate users against IBM Storage Ceph Object
Storage.
RGW local database authentication.
LDAP authentication
Keystone authentication
Secure Token Service (STS):
– with LDAP (STS-lite)
– with keystone (STS-lite)
– with OpenID connect authentication
IBM Storage Ceph Object STS (Security Token Service) gives us the ability to authenticate
object storage users against your enterprise identity provider (LDAP, AD, and so forth)
through an OIDC (SSO) provider. STS also avoids the use of long-lived S3 keys: it provides
temporary, limited-privilege credentials that enhance your object storage security policy.
STS can be configured to work with several authentication methods, such as LDAP, Keystone,
or OpenID Connect. See Figure 7-1 on page 121.
With this method, the authentication of a user follows this high-level workflow:
1. The user first authenticates against an OIDC to get a JWT (token).
2. The user would then call the AssumeRoleWithWebIdentity API against the STS API,
passing in the role they want to access and the path to the JWT token created in Step 1.
3. RGW will check with the OIDC provider about the validity of the token.
4. STS will create and provide temporary credentials for the user.
5. The user can access the S3 resources with a specific role.
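At the command line, steps 1 and 2 of this workflow map to a Keycloak token request
followed by an STS call, roughly as in the following sketch. The realm URL, role ARN, and
RGW endpoint are the ones configured later in this chapter; the client secret, user, and
password are placeholders, and the Keycloak client must allow direct access grants:
KC_ACCESS_TOKEN=$(curl -k -s -d "client_id=ceph" -d "client_secret=<CLIENT_SECRET>" \
  -d "username=s3admin" -d "password=<PASSWORD>" -d "grant_type=password" \
  "https://keycloak-sso.apps.ocp.stg.local/auth/realms/ceph/protocol/openid-connect/token" | jq -r .access_token)
aws sts assume-role-with-web-identity --role-arn "arn:aws:iam:::role/rgwadmins" \
  --role-session-name test --endpoint=https://cephproxy1.stg.local:8443 \
  --web-identity-token="$KC_ACCESS_TOKEN"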
An Amazon Resource Name (ARN) is used to identify a user, a group, or a role, among other
entities. The ARN of a user name has the general format of arn:aws:iam:::user1. ARNs are
important because they are required in the permission policy language that S3 uses.
A permission policy instructs what actions are allowed or denied on a set of resources by a
set of principals. Principals are identified via an ARN.
Permission policies are used in two instances: bucket policy and role policy.
A bucket policy determines what actions can be performed inside an S3 bucket and is
generally used to restrict who can read or write objects. The bucket policy is administered via
the S3 API and can be self-managed by end users if they have the appropriate permissions
defined.
A role is similar to a user and has its own ARN, in the general format of
arn:aws:iam:::role/rolename. In a bucket policy, instead of giving access to users based on
their user ARN, a role ARN can be used instead. Users are then able to assume the role.
With IAM role policies, during STS authentication, users can request to assume a role and
inherit all the S3 permissions configured for that role by an RGW administrator, which allows
configuring RBAC or ABAC authorization policies. See Figure 7-3.
We will cover the basic deployment and configuration of Keycloak and IdM, because we need
an SSO (OIDC) provider and an IDP to test the STS/IAM functionality. The configuration we
are using for Keycloak and IdM is only for functional testing; use the RH SSO (Keycloak) and
IdM documentation to get a production-grade deployment in place.
2. Then, navigate to Red Hat OpenShift Operator Hub and search for Red Hat Single
Sign-On Operator. See Figure 7-5.
3. Install the Red Hat Single Sign-On operator. See Figure 7-6 on page 126.
4. After the installation of the operator is complete, go to the Red Hat Single Sign-On
Operator page and create a Keycloak instance. See Figure 7-7 on page 126.
6. Since we are not doing an advanced installation of Keycloak, we only need to specify the
storage class name we want to use and leave the other settings empty. See Figure 7-9.
The operator will now install a Keycloak instance and a PostgreSQL database to store its
records. The storage requirement for the total installation is 2 x 2 GB. The total installation
time might take up to 1 hour, depending on the resources of your cluster.
After the installation, you can retrieve the RH SSO admin credentials from the Secrets page of
the project. The values are stored in the credential-keycloak secret.
If you would like to use an external DNS server with your IdM installation, a list of records that
would need to be added to DNS server will be provided by IdM at the end of the installation.
The installation will not be complete without adding these records.
Note: The DNS server within IdM is not intended for general-purpose external use.
Before the installation, you need to configure your RHEL system for IdM.
Make sure to set your host name, for example by using the nmtui tool. Otherwise, the installer
may end up with localhost as the host name during installation; Apache refuses to use
localhost as the host name and fails to start, which causes the installer to fail as well.
Download the packages required for installation of IdM with integrated DNS with the following
command:
[root@idm ~]# dnf install ipa-server ipa-server-dns
The IdM installer requires the umask value to be set to 0022 for the root account, to allow
other users to read files created during the installation. If this value is not set to 0022, the
installer issues a warning, and some IdM functions will not work correctly. You can change this
value back to its original setting after the installation. See Example 7-4.
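Example 7-4 is not reproduced here; a sketch of the preparation and installation commands
follows. The host name is a placeholder for this example environment, and the installer
options (for example, integrated DNS) depend on your setup:
[root@idm ~]# hostnamectl set-hostname idm.stg.local
[root@idm ~]# umask 0022
[root@idm ~]# ipa-server-install --setup-dns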
The domain name has been determined based on the host name.
Please confirm the domain name [stg.local]:
The kerberos protocol requires a Realm name to be defined.
This is typically the domain name converted to uppercase.
Please provide a realm name [STG.LOCAL]:
Certain directory server operations require an administrative user.
This user is referred to as the Directory Manager and has full access
to the Directory for system management tasks and will be added to the
instance of directory server created for IPA.
The password must be at least 8 characters long.
Directory Manager password: <<ENTER DIRECTORY MANAGER PASSWORD>>
Password (confirm): <<VERIFY DIRECTORY MANAGER PASSWORD>>
The IPA server requires an administrative user, named 'admin'.
This user is a regular system account used for IPA server administration.
IPA admin password: <<ENTER ADMIN PASSWORD>>
Password (confirm): <<VERIFY ADMIN PASSWORD>>
Trust is configured but no NetBIOS domain name found, setting it now.
Enter the NetBIOS name for the IPA domain.
Only up to 15 uppercase ASCII letters, digits and dashes are allowed.
Example: EXAMPLE.
After a successful installation, IdM provides a list of records under the /tmp directory that
must be added to your DNS server. These DNS records are required for IdM to respond to
requests properly, so the installation is not complete until these records are added.
If you are using a DNS server that does not support URI record registrations, you can install
IdM with the --allow-zone-overlap option enabled. With this option, you need to make sure
your DNS server is forwarding to IdM properly. You can find example installation configuration
answers for the --allow-zone-overlap option in Example 7-7.
This includes:
* Configure a stand-alone CA (dogtag) for certificate management
* Configure the NTP client (chronyd)
* Create and configure an instance of Directory Server
* Create and configure a Kerberos Key Distribution Center (KDC)
* Configure Apache (httpd)
* Configure SID generation
* Configure the KDC to enable PKINIT
2. After creating groups, navigate to the Users tab and add the users. The following users
should be added:
s3admin
s3reader
s3writer
You can add users to their respective groups while creating them. See Figure 7-11 on
page 133.
3. After successful user creations (s3admin, s3writer and s3reader) with their respective
groups, they should look like the Figure 7-12 on page 134 (The picture shows 3 images
combined into one).
2. Navigate to User Federation from the left side menu and add an LDAP provider. Example
settings are shown in Figure 7-14 on page 135.
3. Enter the admin password in the Bind Credential field and test both the connection and the
authentication separately. You do not need to change the other settings below the Required
Settings.
4. After successfully creating an LDAP provider, save settings, synchronize users, and
navigate to the Mappers section. We need to create an LDAP Group mapper to map the
users to their respective groups during the import process to RH SSO.
5. Click the Create button to start adding your LDAP Group Mapper. You can find an example
configuration in Figure 7-15 on page 136.
6. Save your settings and click the Sync LDAP Groups to Keycloak button. In this step, if the
top message says “0 imported groups”, you can change the Mode from READ_ONLY to
LDAP_ONLY and revert this setting to READ_ONLY after the import is complete.
You can validate the imported users from the Users section on the left side menu. See
Figure 7-16 on page 137.
2. An example Ceph client configuration is shown in Figure 7-18 on page 138. Save the
settings after you enter the required values and navigate to the Credentials tab. Take note
of the client's secret, as this key will be used to authenticate our Ceph client.
3. In the following steps, you can always retrieve this value from this page. See Figure 7-19
on page 139.
We must create Ceph client mappers of the types User Property, Group Membership, and
Audience. These values will be included in the token that we receive from RH SSO (Keycloak)
during authentication and verification.
4. Firstly, we will create a User Property type of mapper named username. An example
configuration is shown in Figure 7-20.
5. Secondly, we need to create a Group Membership type of mapper with the name
ceph-groups. An example configuration is shown in Figure 7-21 on page 140.
1. We need to increase the priority of the realm's RSA key provider so that the signing key for
the JWT is presented earlier than the others. To do this, navigate to the Realm Settings
on the Keycloak interface and click rsa-generated in the Provider column. Update the
Priority value to 105 and click Save. See Figure 7-23.
2. You can run this command using one of the users and passwords. See Example 7-9 on
page 141.
5NTEwOTk3MywianRpIjoiZWEyYTZmMzYtMTgwMC00YzAxLThkMDctNjY0MzA5ZjU3MDgxIiwiaXNzIjoia
HR0cHM6Ly9rZXljbG9hay1zc28uYXBwcy5vY3Auc3RnLmxvY2FsL2F1dGgvcmVhbG1zL2NlcGgiLCJhdWQ
iOlsiY2VwaCIsImFjY291bnQiXSwic3ViIjoiczNhZG1pbiIsInR5cCI6IkJlYXJlciIsImF6cCI6ImNlc
GgiLCJzZXNzaW9uX3N0YXRlIjoiYjg4YmYwZjMtMGQwNy00MTE3LThiOGMtNmE3N2EzOGMxMDg0IiwiYWN
yIjoiMSIsImFsbG93ZWQtb3JpZ2lucyI6WyJodHRwczovL2NlcGhwcm94eTEuc3RnLmxvY2FsIiwiaHR0c
HM6Ly9jZXBoczFuMS5zdGcubG9jYWwiXSwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbImRlZmF1bHQtcm9
sZXMtY2VwaCIsIm9mZmxpbmVfYWNjZXNzIiwidW1hX2F1dGhvcml6YXRpb24iXX0sInJlc291cmNlX2FjY
2VzcyI6eyJhY2NvdW50Ijp7InJvbGVzIjpbIm1hbmFnZS1hY2NvdW50IiwibWFuYWdlLWFjY291bnQtbGl
ua3MiLCJ2aWV3LXByb2ZpbGUiXX19LCJzY29wZSI6Im9wZW5pZCBlbWFpbCBwcm9maWxlIiwic2lkIjoiY
jg4YmYwZjMtMGQwNy00MTE3LThiOGMtNmE3N2EzOGMxMDg0IiwiZW1haWxfdmVyaWZpZWQiOmZhbHNlLCJ
uYW1lIjoiUzMgQWRtaW4gQWRtaW4iLCJncm91cHMiOlsicmd3YWRtaW5zIl0sInByZWZlcnJlZF91c2Vyb
mFtZSI6InMzYWRtaW4iLCJnaXZlbl9uYW1lIjoiUzMgQWRtaW4iLCJmYW1pbHlfbmFtZSI6IkFkbWluIiw
iZW1haWwiOiJzM2FkbWluQHN0Zy5sb2NhbCJ9.ZZHz3n99kNvOI6a0JxDH5mI2BWCqbD0t2kb7PbqK8CCq
tXP0Am50pb7fXpH9Ya7GOZi-MqHgK4_bW932aS75EYgPkFiYk3NtV2VZziONNd-v0M7A7GiMT7STK5eEc2
TzX4Mg4jkiE3QUZRaJs_TPSxxs3IXXgUo08ST9e7DNIXNYmT3iuH6gbPc0oHoFSCHLqyxXyRPIe_bItq46
9GQPntcm7yBrrqA6vEiOxfoE4pkk438W0peYx0oAv5dE9WBVnmGR8Nu1O-k3ugSDsovV3ZgsVhKvAEzLO4
k36NPMD62x5dGKf5nOKdUnIJpiZt6mgR8IeKGs8AHfrKghlp4L-w","expires_in":300,"refresh_ex
pires_in":1800
...
Output omitted.
3. To verify the token, you can run the following script, which will retrieve the token using the
get_web_token.sh script. See Example 7-10.
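The verification step typically calls the Keycloak token introspection endpoint. A hedged
sketch of such a call follows; the client secret is a placeholder, and the realm URL is the one
used throughout this chapter:
curl -k -s -u "ceph:<CLIENT_SECRET>" -d "token=$KC_ACCESS_TOKEN" \
  "https://keycloak-sso.apps.ocp.stg.local/auth/realms/ceph/protocol/openid-connect/token/introspect" | jq .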
4. An example output of running this script using one of the users is shown in Example 7-11.
"sub": "s3reader",
"typ": "Bearer",
"azp": "ceph",
"session_state": "5bb81c10-3b3a-47dc-9790-6a2acc5565e3",
"name": "S3 Reader Reader",
"given_name": "S3 Reader",
"family_name": "Reader",
"preferred_username": "s3reader",
"email": "s3reader@stg.local",
"email_verified": false,
"acr": "1",
"allowed-origins": [
"https://cephproxy1.stg.local",
"https://cephs1n1.stg.local"
],
"realm_access": {
"roles": [
"default-roles-ceph",
"offline_access",
"uma_authorization"
]
},
"resource_access": {
"account": {
"roles": [
"manage-account",
"manage-account-links",
"view-profile"
]
}
},
"scope": "openid email profile",
"sid": "5bb81c10-3b3a-47dc-9790-6a2acc5565e3",
"groups": [
"ipausers",
"rgwreaders"
],
"client_id": "ceph",
"username": "s3reader",
"active": true
}
You should be able to see both the user (s3reader) and the groups (rgwreaders) that the user
belongs to in the output.
Note: The user’s access and secret keys should be changed for production environments.
2. Then, you need to assign the admin capabilities to this user with the following command:
[root@cephs1n1 ~]# radosgw-admin caps add --uid="oidc"
--caps="oidc-provider=*"
3. Now, you need to obtain the thumbprint for the x5c certificate. You will use this thumbprint
in the next step to register the OIDC client on RGW.
An example script for retrieving the x5c thumbprint can be seen in Example 7-12.
Note: The script as shown in Example 7-12 captures the first key in the array:
“.keys[0].x5c[]”. Depending on your Keycloak configuration, you may have more than
one key in the output of the curl certs command: curl -k
${RHSSO_REALM}/protocol/openid-connect/certs. If you have multiple keys, you need to
use the key with the use: "sig" parameter. This is the key/cert used to sign JWT tokens, and
the one that you need to get the thumbprint from when configuring RGW. Example 7-13
shows an example output. (This example is not an actual step to take; it only illustrates the
warning in this note.)
"kid": "LeuppLq90y1gfQlGHdgG9B7iTQ51fD4DGCA-58hrKns",
"kty": "RSA",
"alg": "RSA-OAEP",
"use": "enc", <----- Not this one!
"n":
"xhwk5ordbkSNRftiSAD-JYEq-g7KFykJMsn40PyWqWgPtINxVhLGzrAohYi5Sk2VTkohpgCkyWE63eWqP
GqnttNxVSKubO0j_IQ4PvzxDQb1yPiBc4cnsQ_b5m07ih0MfG_-nM_qU_kaiVz48ifzNRaCi0fK6H3Yi6u
5-vK7OtLRTOYcxwz4dw416QrkNTF8PKK98Qu_-5_-0yh5HAyGS6mMXhB0Esye1wUF2KCXAwwQ5HMi-QAUl
xiTlbLD7uPToMA0nqvkoZHG8ZzRJS-X5QAGgXKMJ39mIBBkLjo27U0UTkZpK0QYeZxytQiNlPRrFI0PhsI
M1mm6ivb4vQ9U9w",
"e": "AQAB",
"x5c": [
"MIIClzCCAX8CBgGKdRY7ozANBgkqhkiG9w0BAQsFADAPMQ0wCwYDVQQDDARjZXBoMB4XDTIzMDkwODEzN
TY0NVoXDTMzMDkwODEzNTgyNVowDzENMAsGA1UEAwwEY2VwaDCCASIwDQYJKoZIhvcNAQEBBQADggEPADC
CAQoCggEBAMYcJOaK3W5EjUX7YkgA/iWBKvoOyhcpCTLJ+ND8lqloD7SDcVYSxs6wKIWIuUpNlU5KIaYAp
MlhOt3lqjxqp7bTcVUirmztI/yEOD788Q0G9cj4gXOHJ7EP2+ZtO4odDHxv/pzP6lP5Golc+PIn8zUWgot
Hyuh92IurufryuzrS0UzmHMcM+HcONekK5DUxfDyivfELv/uf/tMoeRwMhkupjF4QdBLMntcFBdiglwMME
ORzIvkAFJcYk5Wyw+7j06DANJ6r5KGRxvGc0SUvl+UABoFyjCd/ZiAQZC46Nu1NFE5GaStEGHmccrUIjZT
0axSND4bCDNZpuor2+L0PVPcCAwEAATANBgkqhkiG9w0BAQsFAAOCAQEAwK2iM7r3OIYjjC7XMHteeglEf
1Dsu+f4zCRVR6YNlqDmU+UzWZAsEGu+cVxydvfNA3jEzgBs8US0nfwO9grX6tDUA5dWaqmIQbQXlhLBjz7
AjWUv0gh4Dy7r/JSq1ZmDV8kqDxotwJbmcGDI8Hm97U9tN3D2ajOYmpc31ilbiF7qYK/1jkI0xQ4iHZfon
ij85/975LosUf/7ScceNkZnfrTr8sbw9UDuuP/fXmTBvWf27/tjgdOOK8VBDvMQ9rNJYwtxUHuYC1nX7q/
OX/GyENne08rj6W+1nl8DKZmOwccevnrjcwH/tkeW309Pckop+NE/2rR4w2GjXFj9tCESIg=="
],
"x5t": "AOnP1pfgsW3RPIaw_9wplX5dJN8",
"x5t#S256": "LWJy0RSrq_VCKsYltMZsvUmPFWUuxZFC4al1ULZ2X8k"
},
{
"kid": "KQuLnre0heQKsc_BkAbWlaUvesGbMcpZw1mDHT4kSVQ",
"kty": "RSA",
"alg": "RS256",
"use": "sig", <-- This is the correct x5c key/cert!
"n":
"uJZsgG51iWUm_rLtkYgjDAIr8cew_7-aUlm7-XBd3vtNt6DGRPShqT59BBd1cpt4mg4zAan7x4RUL8ed-
nvNb0cph8DS5VLG1ON6CXm7N6FCpPx-eB_ssCjGFIyzLR9cxE3QuDUK3rO_AeEarn9mDw_fkv2edXxFLXC
xO1iAI0bSFrAb69iq33LiZh73smhQDC8zHnmLlGs2x2UhjB2_Vfa79-rGHZLsVMoLXBn-hFLTU3Td-auZX
rJHF5rFVKTrgsvl5s1a5Ek_hY8YvhhlD9BzK0gvnp1otIeU42W-gApS4CFG96cJEN4AGYJJWOlScrBnUQ8
6e9HVb0K_eB3Nfw",
"e": "AQAB",
"x5c": [
"MIIClzCCAX8CBgGKdRY6YjANBgkqhkiG9w0BAQsFADAPMQ0wCwYDVQQDDARjZXBoMB4XDTIzMDkwODEzN
TY0NVoXDTMzMDkwODEzNTgyNVowDzENMAsGA1UEAwwEY2VwaDCCASIwDQYJKoZIhvcNAQEBBQADggEPADC
CAQoCggEBALiWbIBudYllJv6y7ZGIIwwCK/HHsP+/mlJZu/lwXd77TbegxkT0oak+fQQXdXKbeJoOMwGp+
8eEVC/Hnfp7zW9HKYfA0uVSxtTjegl5uzehQqT8fngf7LAoxhSMsy0fXMRN0Lg1Ct6zvwHhGq5/Zg8P35L
9nnV8RS1wsTtYgCNG0hawG+vYqt9y4mYe97JoUAwvMx55i5RrNsdlIYwdv1X2u/fqxh2S7FTKC1wZ/oRS0
1N03fmrmV6yRxeaxVSk64LL5ebNWuRJP4WPGL4YZQ/QcytIL56daLSHlONlvoAKUuAhRvenCRDeABmCSVj
pUnKwZ1EPOnvR1W9Cv3gdzX8CAwEAATANBgkqhkiG9w0BAQsFAAOCAQEAaD2FAPmvBTVI4S6EoOgH8ktbe
AXu9odebGG14gAWZOOQduOeC4pdSemxENXN+OdoNFpJJLs2VTXasHORIo4Lx8Nw2P58so1GZec2uFS/Fpt
wk964eDwRcCt3bYbKKEmomsCPdjCCQDs/V3c1WRz3Z1pE+eqKZXnOsOr/9JEU8wcNDUeM2baXDACmvCtWH
tIxyFcyooLzBkEmHxuvKoa92C8VmuiJAIzbN8Qjym6TBSoTcZ9Ybzp99Q/EnrOccuXrEMI/Z1nwURp1miA
LgBu/BZ84bmU2Kz07vccOzdIoMi1NYfw4QVECIzk9ohzzwo2LWa9cfLxYf06qtj3pNYUkjQ=="
],
"x5t": "phbucEw9qJthh_z0MBUzNMOhHXU",
"x5t#S256": "Llz7h1B3aWyuSnyi_wpq2GGH9l7u552a_FD3sW-Y99s"
}
]
}
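Following the guidance in the preceding note, a thumbprint can be computed from the
signing key with a sketch like the following; the realm URL is the one used in this chapter,
and the temporary file paths are arbitrary:
RHSSO_REALM="https://keycloak-sso.apps.ocp.stg.local/auth/realms/ceph"
curl -k -s "${RHSSO_REALM}/protocol/openid-connect/certs" | \
  jq -r '.keys[] | select(.use=="sig") | .x5c[0]' | fold -w 64 > /tmp/oidc-sig.b64
( echo "-----BEGIN CERTIFICATE-----"; cat /tmp/oidc-sig.b64; echo "-----END CERTIFICATE-----" ) > /tmp/oidc-sig.crt
openssl x509 -in /tmp/oidc-sig.crt -fingerprint -sha1 -noout | cut -d= -f2 | tr -d ':'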
First, you must create an SSL certificate on the Rados GW host. To do this, the following
commands can be used.
1. Create a certs directory under /root and create the configuration file for the certificate
with Subject Alternate Names (SAN). We can use the certificate with the domains defined
in the SAN area. An example of a configuration file is shown in Example 7-14.
Example 7-14 Create a certs directory under /root and create the configuration file
[root@cephproxy1 certs]# cat cert.cnf
[req]
distinguished_name = req_distinguished_name
x509_extensions = x509
prompt = no
[req_distinguished_name]
countryName = TR
stateOrProvinceName = Istanbul
organizationName = IBM
organizationalUnitName = STG
commonName = cephproxy1.stg.local
[x509]
keyUsage = critical, digitalSignature, keyAgreement
extendedKeyUsage = serverAuth
subjectAltName = @sans
[sans]
DNS.1 = *.stg.local
DNS.2 = cephproxy1.stg.local
2. Then, you need to create a key for the certificate. See Example 7-15.
3. Finally, you can create your certificate file with the following command. See Example 7-16.
4. You can verify the certificate SANs with the following command. See Example 7-17.
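Examples 7-15 through 7-17 are not reproduced here; the openssl commands behind these
steps look roughly like the following sketch, which uses the cert.cnf file shown above and the
certificate.key and certificate.pem file names referenced in Example 7-22:
[root@cephproxy1 certs]# openssl genrsa -out certificate.key 4096
[root@cephproxy1 certs]# openssl req -new -x509 -key certificate.key -out certificate.pem -days 365 -config cert.cnf
[root@cephproxy1 certs]# openssl x509 -in certificate.pem -noout -text | grep -A1 "Subject Alternative Name"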
5. After successful creation, the certificate file should be added to the trusted certificates on
the workstation node, which is the first Ceph node.
To do this, you must first transfer both the certificate.pem and certificate.key files to the
/etc/pki/ca-trust/source/anchors directory on the cephs1n1.stg.local host. This can
be done by using SCP:
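Example 7-18 is not reproduced here; the transfer can be sketched as follows, with the file
names and destination host following this example:
[root@cephproxy1 certs]# scp certificate.pem certificate.key root@cephs1n1.stg.local:/etc/pki/ca-trust/source/anchors/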
6. After the transfer, you must import the certificate into the system. The following
commands, as shown in Example 7-19, can be used to import the certificate to the system
and update the ca-bundle:
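The commands are essentially the same as the ones shown later in Example 7-40 for the
RGW host; a minimal sketch for the workstation node is:
[root@cephs1n1 ~]# update-ca-trust
[root@cephs1n1 ~]# update-ca-trust extract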
Every few seconds, each keepalived daemon checks whether the haproxy daemon on the
same host is responding. Keepalived also checks that the master keepalived daemon is
running without problems. If the master keepalived daemon or the active haproxy daemon is
not responding, one of the remaining keepalived daemons running in backup mode will be
elected as master, and the virtual IP will be moved to that node.
The active haproxy instance acts as a load balancer, distributing all RGW requests across all
available RGW daemons.
After a successful import of the certificate file, you need to configure the RGW with SSL
ingress.
1. You must first label the RGW host with rgws to use it with the RGW configuration file.
The following command labels the cephproxy1 node. See Example 7-20.
2. You can verify the host label with the following command. See Example 7-21.
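Examples 7-20 and 7-21 correspond to the cephadm orchestrator label and listing
commands; a sketch of the equivalent calls (use the host name as registered in the
orchestrator) is:
[root@cephs1n1 ~]# ceph orch host label add cephproxy1 rgws
[root@cephs1n1 ~]# ceph orch host ls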
3. After successfully labeling the host, configure an RGW service with SSL using port 8443.
To do this, you will need to create a YAML file that includes the certificate and key file for
the RGW. An example config file can be used to implement an RGW service with SSL
ingress. Note that this YAML file should be created on the RGW node and will be deployed
from there. See Example 7-22.
Example 7-22 Configure an RGW service with SSL using port 8443
[root@cephproxy1 certs]# cat <<EOF >> /root/rgw-ssl.yaml
service_type: rgw
service_id: objectgw
service_name: rgw.objectgw
placement:
  count: 1
  label: rgws
spec:
  rgw_frontend_port: 8443
  rgw_frontend_type: beast
  rgw_frontend_ssl_certificate: |
    -----BEGIN CERTIFICATE-----
$( cat /root/certs/certificate.pem | grep -v CERTIFICATE | awk '{$1="    "$1}1' )
    -----END CERTIFICATE-----
    -----BEGIN RSA PRIVATE KEY-----
$( cat /root/certs/certificate.key | grep -v PRIVATE | awk '{$1="    "$1}1' )
    -----END RSA PRIVATE KEY-----
  rgw_realm: multisite
  rgw_zone: zone1
  ssl: true
extra_container_args:
  - "-v"
  - "/etc/pki:/etc/pki/:z"
EOF
# ceph orch apply -i /root/rgw-ssl.yaml
4. Note that the /etc/pki directory is bind mounted into the container that the RGW service
will be running in. This ensures that the RGW service will always use the latest updates of
the certificates imported into the RGW host.
## Python helper to register (or remove) the OIDC provider in RGW through the IAM API.
## Note: the listing starts mid-script in the source; the imports, the argument parser
## creation, and the first three arguments (--op, --access-key, --secret-key) are
## reconstructed here so that the script is complete and runnable.
import argparse
import re
import boto3

parser = argparse.ArgumentParser()
parser.add_argument("-o","--op",action="store",dest="op",default="add",choices=["add","del"])
parser.add_argument("-a","--access-key",action="store",dest="AWS_ACCESS_KEY_ID",required=True)
parser.add_argument("-s","--secret-key",action="store",dest="AWS_SECRET_ACCESS_KEY",required=True)
parser.add_argument("-r","--rgw-endpoint",action="store",dest="RGW_ENDPOINT",required=True)
parser.add_argument("-t","--oidc-thumbprint",action="store",dest="OIDC_THUMBPRINT",required=True)
parser.add_argument("-u","--oidc-url",action="store",dest="OIDC_URL",required=True)
args = parser.parse_args()

## Build the OIDC provider ARN from the provider URL (strip the http/https scheme)
OIDC_ARN = re.compile(r"https?://")
OIDC_ARN = "arn:aws:iam:::oidc-provider/" + OIDC_ARN.sub('', args.OIDC_URL).strip().strip('/')

## The RGW exposes the IAM API on the same endpoint as S3; TLS is validated against
## the system CA bundle that contains the imported certificate.
iam_client = boto3.client(
    'iam',
    aws_access_key_id=args.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=args.AWS_SECRET_ACCESS_KEY,
    endpoint_url=args.RGW_ENDPOINT,
    region_name='',
    verify='/etc/pki/tls/certs/ca-bundle.crt'
)

if args.op == "del":
    try:
        oidc_delete = iam_client.delete_open_id_connect_provider(OpenIDConnectProviderArn=OIDC_ARN)
        print("Deleted OIDC provider")
    except Exception:
        print("Provider already absent")
else:
    ## Check whether this provider ARN is already registered before creating it
    oidc_list = iam_client.list_open_id_connect_providers()
    registered = [p["Arn"] for p in oidc_list.get("OpenIDConnectProviderList", [])]
    if OIDC_ARN in registered:
        print("Provider already registered.")
    else:
        oidc_create = iam_client.create_open_id_connect_provider(
            Url=args.OIDC_URL,
            ClientIDList=["ceph"],
            ThumbprintList=[args.OIDC_THUMBPRINT]
        )
        print("Provider Created")
2. Run the Python script with the following arguments. See Example 7-24.
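A hedged sketch of the invocation is shown below; the script file name is a placeholder, the
credential option names match the argument parser in the script, the keys belong to the
oidc RGW user, and the thumbprint is the value computed earlier:
[root@cephs1n1 ~]# python3 rgw-oidc-provider.py \
  --access-key <OIDC_USER_ACCESS_KEY> --secret-key <OIDC_USER_SECRET_KEY> \
  --rgw-endpoint https://cephproxy1.stg.local:8443 \
  --oidc-thumbprint <THUMBPRINT> \
  --oidc-url https://keycloak-sso.apps.ocp.stg.local/auth/realms/ceph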
3. After running the code successfully, you can verify that the OIDC provider has been added
to Rados GW with the following command. See Example 7-25.
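The verification shown in Example 7-25 can be reproduced with the AWS CLI IAM client; a
sketch, assuming the oidc user's keys are configured for the CLI, is:
[root@cephs1n1 ~]# aws --endpoint=https://cephproxy1.stg.local:8443 iam list-open-id-connect-providers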
{
"OpenIDConnectProviderList": [
{
"Arn":
"arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.stg.local/auth/realms/ceph"
}
]
}
First, create a JSON file defining the role properties. This JSON policy document allows any
user who has been authenticated by the SSO (OIDC provider) and is part of the IDM/LDAP
rgwadmins group to assume the rgwadmins role. The condition checks for a match on the
JWT token and allows the user to assume the role if it finds a StringLike rgwadmins value in
the JWT groups section of the token. See Example 7-26.
"arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.stg.local/auth/realms/ceph"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringLike": {
"keycloak-sso.apps.ocp.stg.local/auth/realms/ceph:groups": [
"/rgwadmins"
]
}
}
}
]}
Create an RGW role using the JSON file. See Example 7-27.
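Example 7-27 follows the same radosgw-admin role create pattern shown later for the other
roles; a hedged sketch for the rgwadmins role (the JSON file name is a placeholder) is:
[root@cephs1n1 ~]# radosgw-admin role create --role-name rgwadmins \
  --assume-role-policy-doc=$(jq -rc . /root/role-rgwadmins.json)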
"arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.stg.local/auth/realms/ceph"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringLike": {
"keycloak-sso.apps.ocp.stg.local/auth/realms/ceph:groups":"rgwwriters"
}
}
}
]
}
Create a role for rgwwriters using the JSON file. See Example 7-29 on page 152.
Example 7-29 Create a role for rgwwriters using the JSON file
[root@cephs1n1 ~]# radosgw-admin role create --role-name rgwwriters \
> --assume-role-policy-doc=$(jq -rc . /root/role-rgwwriters.json)
{
"RoleId": "f5bd0b47-516b-4a31-8c17-ea5928a84b6e",
"RoleName": "rgwwriters",
"Path": "/",
"Arn": "arn:aws:iam:::role/rgwwriters",
"CreateDate": "2023-10-11T19:01:56.995Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument":
"{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\
"Federated\":[\"arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.stg.local/auth/r
ealms/ceph\"]},\"Action\":[\"sts:AssumeRoleWithWebIdentity\"],\"Condition\":{\"Str
ingLike\":{\"keycloak-sso.apps.ocp.stg.local/auth/realms/ceph:groups\":\"rgwwriter
s\"}}}]}"
}
"arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.stg.local/auth/realms/ceph"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringLike": {
"keycloak-sso.apps.ocp.stg.local/auth/realms/ceph:groups":"rgwreaders"
}
}
}
]
}
Create a role for rgwreaders using the JSON file. See Example 7-31.
Example 7-31 Create a role for rgwreaders using the JSON file
[root@cephs1n1 ~]# radosgw-admin role create --role-name rgwreaders \
--assume-role-policy-doc=$(jq -rc . /root/role-rgwreaders.json)
{
"RoleId": "b088171c-ae9c-469b-9061-6ea943daffa0",
"RoleName": "rgwreaders",
"Path": "/",
"Arn": "arn:aws:iam:::role/rgwreaders",
"CreateDate": "2023-10-11T14:43:58.483Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument":
"{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\
"Federated\":[\"arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.stg.local/auth/r
ealms/ceph\"]},\"Action\":[\"sts:AssumeRoleWithWebIdentity\"],\"Condition\":{\"Str
ingLike\":{\"keycloak-sso.apps.ocp.stg.local/auth/realms/ceph:groups\":\"rgwreader
s\"}}}]}"
}
2. Apply the role policy for rgwadmins role. See Example 7-34.
3. Next, you will create a role policy for rgwwriters. See Example 7-34.
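Each role policy is attached with the radosgw-admin role-policy put command; a hedged
sketch for the rgwadmins role follows (the policy file name and the policy name are
placeholders, and the same pattern applies to the rgwwriters and rgwreaders roles):
[root@cephs1n1 ~]# radosgw-admin role-policy put --role-name rgwadmins \
  --policy-name admin-policy \
  --policy-doc=$(jq -rc . /root/policy-rgwadmins.json)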
7. Note that you can define multiple policy statements at the same time for a specific role. An
example of a multiple-statement definition is shown in Example 7-38. You can refer to the
AWS documentation for more examples.
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:aws:s3:::stsbucket/*",
"arn:aws:s3:::stsbucket"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:aws:s3:::stsbucket/uploads",
"arn:aws:s3:::stsbucket/uploads/*"
]
},
{
"Effect": "Deny",
"Action": "s3:*",
"NotResource": [
"arn:aws:s3:::stsbucket/*",
"arn:aws:s3:::stsbucket"
]
}
]
}
5. After the transfer, update your system’s certificate bundle and restart the RGW service
with the following commands. See Example 7-40 on page 158.
Example 7-40 Update your system’s certificate bundle and restart the RGW service
[root@cephproxy1 anchors]# update-ca-trust
[root@cephproxy1 anchors]# update-ca-trust enable
[root@cephproxy1 anchors]# update-ca-trust extract
[root@cephproxy1 anchors]# ceph orch restart rgw.objectgw
6. Verify the certificate installation by doing a simple curl operation to Keycloak’s main page.
See Example 7-41.
7.4 Testing Assume Role With Web Identity for role-based access control
After a successful import of the certificates into the system, you can test the Assume Role
With Web Identity functionality for role-based access control of your IBM Storage Ceph
cluster.
An example script for the test is shown in Example 7-42. Note that the AWS_CA_BUNDLE
argument points to the certificate that was created on the RGW node.
## Note: the assignment below is reconstructed from the echo statement that follows it;
## the source listing starts mid-script.
IDM_ASSUME_ROLE_CREDS=$(aws sts assume-role-with-web-identity \
  --role-arn "arn:aws:iam:::role/$3" --role-session-name testb \
  --endpoint=https://cephproxy1.stg.local:8443 \
  --web-identity-token="$KC_ACCESS_TOKEN")
echo "aws sts assume-role-with-web-identity --role-arn arn:aws:iam:::role/$3 --role-session-name testb --endpoint=https://cephproxy1.stg.local:8443 --web-identity-token=$KC_ACCESS_TOKEN"
echo $IDM_ASSUME_ROLE_CREDS
export AWS_ACCESS_KEY_ID=$(echo $IDM_ASSUME_ROLE_CREDS | jq -r .Credentials.AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo $IDM_ASSUME_ROLE_CREDS | jq -r .Credentials.SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo $IDM_ASSUME_ROLE_CREDS | jq -r .Credentials.SessionToken)
The test-assume-role.sh script that we ran in the previous step takes care of exporting three
environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and
AWS_SESSION_TOKEN. These variables are used by the AWS CLI S3 client to authenticate
against the S3 endpoint provided by the RGW, as you can see in the next step.
Running the script with the s3admin user grants you admin privileges. This means you will
have full control over the buckets and can perform any actions you desire. You can test the
functionality with the following commands, as shown in Example 7-44.
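Example 7-44 is not reproduced in full here; typical admin-level checks with the temporary
credentials look like the following sketch (the bucket name stsbucket is the one used by the
role policies in this chapter):
$ aws --endpoint=https://cephproxy1.stg.local:8443 s3 mb s3://stsbucket
$ aws --endpoint=https://cephproxy1.stg.local:8443 s3 cp /etc/hosts s3://stsbucket/
$ aws --endpoint=https://cephproxy1.stg.local:8443 s3 ls s3://stsbucket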
With the writer role, you will only be able to write to the bucket. You can test the functionality
with the following commands, as shown in Example 7-46.
With the reader role, you will only be able to list the bucket contents and retrieve objects from
it. You can test the functionality with the following commands, as shown in Example 7-48.
PRE uploads/
2023-10-12 17:15:15 3145728 3mfile
2023-10-05 16:53:01 5242880 5mfile
2023-10-05 10:56:32 3332 index.html
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this paper.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topic in this
document. Note that some publications referenced in this list might be available in softcopy
only.
IBM Storage Ceph Concepts and Architecture Guide, REDP-5721
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Other publications
These websites are also relevant as further information sources:
IBM Storage Ceph Documentation:
https://www.ibm.com/docs/en/storage-ceph/6?topic=dashboard-monitoring-cluster
Community Ceph Documentation
https://docs.ceph.com/en/latest/monitoring/