IBM Cloud Pak for Data: An enterprise platform to operationalize data, analytics, and AI
Ebook, 563 pages, 4 hours

About this ebook

Cloud Pak for Data is IBM's modern data and AI platform that includes strategic offerings from its data and AI portfolio delivered in a cloud-native fashion with the flexibility of deployment on any cloud. The platform offers a unique approach to addressing modern challenges with an integrated mix of proprietary, open-source, and third-party services.
You'll begin by getting to grips with key concepts in modern data management and artificial intelligence (AI), reviewing real-life use cases, and developing an appreciation of the AI Ladder principle. Once you've gotten to grips with the basics, you will explore how Cloud Pak for Data helps in the elegant implementation of the AI Ladder practice to collect, organize, analyze, and infuse data and trustworthy AI across your business. As you advance, you'll discover the capabilities of the platform and extension services, including how they are packaged and priced. With the help of examples present throughout the book, you will gain a deep understanding of the platform, from its rich capabilities and technical architecture to its ecosystem and key go-to-market aspects.
By the end of this IBM book, you'll be able to apply IBM Cloud Pak for Data's prescriptive practices and leverage its capabilities to build a trusted data foundation and accelerate AI adoption in your enterprise.

Language: English
Release date: Nov 24, 2021
ISBN: 9781800567405

    Book preview

    IBM Cloud Pak for Data - Hemanth Manda

    BIRMINGHAM—MUMBAI

    IBM Cloud Pak for Data

    Copyright © 2021 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Publishing Product Manager: Ali Abidi

    Senior Editor: Roshan Kumar

    Content Development Editors: Athikho Sapuni Rishana and Priyanka Soam

    Technical Editor: Manikandan Kurup

    Copy Editor: Safis Editing

    Project Coordinator: Aparna Ravikumar Nair

    Proofreader: Safis Editing

    Indexer: Pratik Shirodkar

    Production Designer: Aparna Bhagat

    First published: October 2021

    Production reference: 2221021

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN: 978-1-80056-212-7

    www.packt.com

    Contributors

    About the authors

    Hemanth Manda heads product management at IBM and is responsible for the Cloud Pak for Data platform. He has broad experience in the technology and software industry spanning a number of strategy and execution roles over the past 20 years. In his current role, Hemanth leads a team of over 20 product managers responsible for simplifying and modernizing IBM's data and AI portfolio to support cloud-native architectures through the new platform offering that is Cloud Pak for Data. Among other things, he is responsible for rationalizing and streamlining the data and AI portfolio at IBM, a $6 billion business, and delivering new platform-wide capabilities through Cloud Pak for Data.

    Sriram Srinivasan is an IBM Distinguished Engineer leading the architecture and development of Cloud Pak for Data. His interests lie in cloud-native technologies such as Kubernetes and their practical application for both client-managed environments and Software as a Service. Prior to this role, Sriram led the development of IBM Data Science Experience Local and the dashDB Warehouse as a Service for IBM Cloud. Early on in his career at IBM, Sriram led the development of various web and Eclipse tooling platforms, such as IBM Data Server Manager and the SQL Warehousing tool. He started his career at Informix, where he worked on application servers, database tools, e-commerce products, and Red Brick data warehouse.

    Deepak Rangarao leads WW Technical Sales at IBM and is responsible for the Cloud Pak for Data platform. He has broad cross-industry experience in the data warehousing and analytics space, building analytic applications at large organizations and technical presales, both with start-ups and large enterprise software vendors. Deepak has co-authored several books on topics such as OLAP analytics, change data capture, data warehousing, and object storage and is a regular speaker at technical conferences. He is a certified technical specialist in Red Hat OpenShift, Apache Spark, Microsoft SQL Server, and web development technologies.

    About the reviewers

    Sumeet S Kapoor is a technology leader, seasoned data and AI professional, inventor, and public speaker with over 18 years of experience in the IT industry. He currently works for the IBM India software group as a solutions architect leader and enables global partners and enterprise customers on the journey of adopting data and AI platforms. He has solved complex real-world problems across industry domains and has also filed a patent in the area of AI data virtualization and governance automation. Prior to IBM, he worked as a senior technology specialist and development lead in Fortune 500 global product and consulting organizations. Sumeet enjoys running as his hobby and has successfully completed eight marathons and counting.

    Campbell Robertson is the worldwide data and AI practice leader for IBM's Customer Success Group. In his role, Campbell is responsible for providing strategy and subject matter expertise to IBM Customer Success Managers, organizations, and IBM business partners. His primary focus is to help clients make informed decisions on how they can successfully align people, processes, and policies with AI- and data-centric technology for improved outcomes and innovation. He has over 25 years of experience working with public sector organizations worldwide to deploy best-of-breed technology solutions. Campbell has an extensive background in architecture, data and AI technologies, expert labs services, IT sales, marketing, and business development.

    Table of Contents

    Preface

    Section 1: The Basics

    Chapter 1: The AI Ladder – IBM's Prescriptive Approach

    Market dynamics and IBM's Data and AI portfolio

    Introduction to the AI ladder

    The rungs of the AI ladder

    Collect – making data simple and accessible

    Organize – creating a trusted analytics foundation

    People empowering your data citizens

    Analyze – building and scaling models with trust and transparency

    Infuse – operationalizing AI throughout the business

    Customer service

    Risk and compliance

    IT operations

    Financial operations

    Business operations

    The case for a data and AI platform

    Summary

    Chapter 2: Cloud Pak for Data: A Brief Introduction

    The case of a data and AI platform – recap

    Overview of Cloud Pak for Data

    Exploring unique differentiators, key use cases, and customer adoption

    Key use cases

    Customer use case: AI claim processing

    Customer use case: data and AI platform

    Cloud Pak for Data: additional details

    An open ecosystem

    Premium IBM cartridges and third-party services

    Industry accelerators

    Packaging and deployment options

    Red Hat OpenShift

    Summary

    Section 2: Product Capabilities

    Chapter 3: Collect – Making Data Simple and Accessible

    Data – the world's most valuable asset

    Data-centric enterprises

    Challenges with data-centric delivery

    Enterprise data architecture

    NoSQL data stores – key categories

    Data virtualization – accessing data anywhere

    Data virtualization versus ETL – when to use what?

    Platform connections – streamlining data connectivity

    Data estate modernization using Cloud Pak for Data

    Summary

    Chapter 4: Organize – Creating a Trusted Analytics Foundation

    Introducing Data Operations (DataOps)

    Organizing enterprise information assets

    Establishing metadata and stewardship

    Business metadata components

    Technical metadata components

    Profiling to get a better understanding of your data

    Classifying data for completeness

    Automating data discovery and business term assignment

    Enabling trust with data quality

    Steps to assess data quality

    DataOps in action

    Automation rules around data quality

    Data privacy and activity monitoring

    Data integration at scale

    Considerations for selecting a data integration tool

    The extract, transform, and load (ETL) service in Cloud Pak for Data

    Advantages of leveraging a cloud-native platform for ETL

    Master data management

    Extending MDM toward a Digital Twin

    Summary

    Chapter 5: Analyzing: Building, Deploying, and Scaling Models with Trust and Transparency

    Self-service analytics of governed data

    BI and reporting

    Predictive versus prescriptive analytics

    Understanding AI

    AI life cycle – Transforming insights into action

    AI governance: Trust and transparency

    Automating the AI life cycle using Cloud Pak for Data

    Data science tools for a diverse data science team

    Distributed AI

    Establishing a collaborative environment and building AI models

    Choosing the right tools to use

    ModelOps – Deployment phase

    ModelOps – Monitoring phase

    Streaming data/analytics

    Distributed processing

    Summary

    Chapter 6: Multi-Cloud Strategy and Cloud Satellite

    IBM's multi-cloud strategy

    Supported deployment options

    Managed OpenShift

    AWS Quick Start

    Azure Marketplace and QuickStart templates

    Cloud Pak for Data as a Service

    Packaging and pricing

    IBM Cloud Satellite

    A data fabric for a multi-cloud future

    Summary

    Chapter 7: IBM and Partner Extension Services

    IBM and third-party extension services

    Collect extension services

    Db2 Advanced

    Informix

    Virtual Data Pipeline

    EDB Postgres Advanced Server

    MongoDB Enterprise Advanced

    Organize extension services

    DataStage

    Information Server

    Master Data Management

    Analyze cartridges – IBM Palantir

    Infuse cartridges

    Cognos Analytics

    Planning Analytics

    Watson Assistant

    Watson Discovery

    Watson API Kit

    Modernization upgrades to Cloud Pak for Data cartridges

    Extension services

    Summary

    Chapter 8: Customer Use Cases

    Improving health advocacy program efficiency

    Voice-enabled chatbots

    Risk and control automation

    Enhanced border security

    Unified Data Fabric

    Financial planning and analytics

    Summary

    Section 3: Technical Details

    Chapter 9: Technical Overview, Management, and Administration

    Technical requirements

    Architecture overview

    Characteristics of the platform

    Technical underpinnings

    The operator pattern

    The platform technical stack

    Infrastructure requirements, storage, and networking

    Understanding how storage is used

    Networking

    Foundational services and the control plane

    Cloud Pak foundational services

    Cloud Pak for Data control plane

    Management and monitoring

    Multi-tenancy, resource management, and security

    Isolation using namespaces

    Resource management and quotas

    Enabling tenant self-management

    Day 2 operations

    Upgrades

    Scale-out

    Backup and restore

    Summary

    References

    Chapter 10: Security and Compliance

    Technical requirements

    Security and Privacy by Design

    Development practices

    Vulnerability detection

    Delivering security assured container images

    Secure operations in a shared environment

    Securing Kubernetes hosts

    Security in OpenShift Container Platform

    Namespace scoping and service account privileges

    RBAC and the least privilege principle

    Workload notification and reliability assurance

    Additional considerations

    Encryption in motion and securing entry points

    Encryption at rest

    Anti-virus software

    User access and authorizations

    Authentication

    Authorization

    User management and groups

    Securing credentials

    Meeting compliance requirements

    Configuring the operating environment for compliance

    Auditing

    Integration with IBM Security Guardium

    Summary

    References

    Chapter 11: Storage

    Understanding the concept of persistent volumes

    Kubernetes storage introduction

    Types of persistent volumes

    In-cluster storage

    Optimized hyperconverged storage and compute

    Separated compute and storage Nodes

    Provisioning procedure summary

    Off-cluster storage

    NFS-based persistent volumes

    Operational considerations

    Continuous availability with in-cluster storage

    Data protection – snapshots, backups, and active-passive disaster recovery

    Quiescing Cloud Pak for Data services

    Db2 database backups and HADR

    Kubernetes cluster backup and restore

    Summary

    Further reading

    Chapter 12: Multi-Tenancy

    Tenancy considerations

    Designating tenants

    Organizational and operational implications

    Architecting for multi-tenancy

    Achieving tenancy with namespace scoping

    Ensuring separation of duties with Kubernetes RBAC and separation of duties with operators

    Securing access to a tenant instance

    Choosing dedicated versus shared compute nodes

    Reviewing the tenancy requirements

    Isolating tenants

    Tenant security and compliance

    Self-service and management

    A summary of the assessment

    In-namespace sub-tenancy with looser isolation

    Approach

    Assessing the limitations of this approach

    Summary

    Other Books You May Enjoy

    Preface

    Cloud Pak for Data is IBM's modern Data and AI platform that includes strategic offerings from its data and AI portfolio delivered in a cloud-native fashion with the flexibility of deployment on any cloud. The platform offers a unique approach to address modern challenges with an integrated mix of proprietary, open source, and third-party services.

    You will start with key concepts in modern data management and AI, review real-life use cases, and develop an appreciation of the AI Ladder principle. With this foundation, you will explore how Cloud Pak for Data helps in the elegant implementation of the AI Ladder practice to collect, organize, analyze, and infuse data and trustworthy AI across your business. As you advance, you will also discover the capabilities of the platform and extension services, including how they are packaged and priced. With examples throughout the book, you will gain a deep understanding of the platform, from its rich capabilities and technical architecture to its ecosystem and key go-to-market aspects.

    By the end of this IBM book, you will be well-versed in the concepts of IBM Cloud Pak for Data, and be able to apply its prescriptive practices and leverage its capabilities to build a trusted data foundation and accelerate AI adoption in your enterprise.

    Note

    The content in this book is comprehensive and covers multiple versions supported as of October 2021, including versions 3.5 and 4.0. Some of the services, capabilities, and features highlighted in the book might not be relevant to all versions, and as the product evolves, we expect a few more changes.

    However, the overarching message, value proposition, and underlying architecture will remain more or less consistent. Given the rapid progress and product evolution, we decided to be exhaustive while focusing on the core concepts.

    We sincerely hope that you will find this book helpful and overlook any inconsistencies attributed to product evolution.

    Who this book is for

    This book is for business executives, CIOs, CDOs, data scientists, data stewards, data engineers, and developers interested in learning about IBM's Cloud Pak for Data. Knowledge of technical concepts and familiarity with data, analytics, and AI initiatives at various levels of maturity is required to make the most of this book.

    What this book covers

    Chapter 1, The AI Ladder: IBM's Prescriptive Approach, explores market dynamics, IBM's data and AI portfolio, and a detailed overview of the AI Ladder, what it entails, and how IBM offerings map to the different rungs of the ladder.

    Chapter 2, Cloud Pak for Data: A Brief Introduction, covers IBM's modern data and AI platform in detail, along with some of its key differentiators. We will discuss Red Hat OpenShift, the implied cloud benefits it confers, and the platform foundational services that form the basis of Cloud Pak for Data.

    Chapter 3, Collect – Making Data Simple and Accessible, presents a flexible approach to the modern challenges of data-centric delivery, as data proliferates in both volume and variety, using a mix of proprietary, open source, and third-party services.

    Chapter 4, Organize – Creating a Trusted Analytics Foundation, explains how Cloud Pak for Data enables DataOps (data operations): the orchestration of people, processes, and technology to quickly deliver trusted, business-ready data to data citizens, operations, applications, and artificial intelligence (AI).

    Chapter 5, Analyzing: Building, Deploying, and Scaling Models with Trust and Transparency, explains how to analyze your data in smarter ways and benefit from visualization and AI models that empower your organization to gain new insights and make better and smarter decisions.

    Chapter 6, Multi-Cloud Strategy and Cloud Satellite, covers IBM's multi-cloud strategy, the supported deployment options, packaging and pricing, IBM Cloud Satellite, and the role of a data fabric in a multi-cloud future.

    Chapter 7, IBM and Partner Extension Services, drills down into the concepts of extension services, how they are packaged and priced, and the various IBM extension services available on Cloud Pak for Data across the Collect, Organize, Analyze, and Infuse rungs of the AI ladder, as well as how clients can benefit from an open partner ecosystem on Cloud Pak for Data.

    Chapter 8, Customer Use Cases, focuses on the importance of business outcomes and key customer use case patterns of Cloud Pak for Data while highlighting the top three use case patterns: data modernization, DataOps, and an automated AI life cycle.

    Chapter 9, Technical Overview, Management, and Administration, covers the technical concepts underpinning Cloud Pak for Data, including, but not limited to, an architecture overview, common services, Day-2 operations, infrastructure and storage support, and other advanced concepts.

    Chapter 10, Security and Compliance, looks at how the two critical prerequisites for enterprise adoption, security and governance, are addressed in Cloud Pak for Data.

    Chapter 11, Storage, covers the different storage options supported by Cloud Pak for Data and how to configure it for high availability and disaster recovery.

    Chapter 12, Multi-Tenancy, covers tenancy considerations, architecting for multi-tenancy with namespace scoping, reviewing tenancy requirements, and in-namespace sub-tenancy.

    To get the most out of this book

    Knowledge of technical concepts and familiarity with data, analytics, and AI initiatives at various levels of maturity is required to make the most of this book.

    If you are using the digital version of this book, we advise you to type the code yourself. Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the color images

    We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here:

    https://static.packt-cdn.com/downloads/9781800562127_ColorImages.pdf

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The Cloud Pak for Data control plane introduces a special persistent volume claim called user-home-pvc.

    A block of code is set as follows:

    kubectl get pvc user-home-pvc
    NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    user-home-pvc   Bound    pvc-44e5a492-9921-41e1-bc42-b96a9a4dd3dc   10Gi       RWX            nfs-client     33d

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    Port:        zencoreapi-tls 4444/TCP
    TargetPort:  4444/TCP
    Endpoints:   10.254.16.52:4444,10.254.20.23:4444

    Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: There are essentially two types of host nodes (as presented in the screenshot) – the Master and Compute (worker) nodes.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you've read IBM Cloud Pak for Data, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

    Section 1: The Basics

    In this section, we will learn about market trends, data and AI, IBM's offering portfolio, its prescriptive approach to AI adoption, and an overview of Cloud Pak for Data.

    This section comprises the following chapters:

    Chapter 1, The AI Ladder: IBM's Prescriptive Approach

    Chapter 2, Cloud Pak for Data – A Brief Introduction

    Chapter 1: The AI Ladder – IBM's Prescriptive Approach

    Digital transformation is impacting every industry and business, with data and artificial intelligence (AI) playing a prominent role. For example, some of the largest companies in the world, such as Amazon, Facebook, Uber, and Google, leverage data and AI as a key differentiator. However, not every enterprise is successful in embracing AI and monetizing their data. The AI ladder is IBM's response to this market need – it's a prescriptive approach to AI adoption and entails four simple steps or rungs of the ladder.

    In this chapter, you will learn about market dynamics and IBM's Data and AI portfolio, and get a detailed overview of the AI ladder. We will also cover what the ladder entails and how IBM offerings map to its different rungs.

    In this chapter, we will be covering the following main topics:

    Market dynamics and IBM's Data and AI portfolio

    Introduction to the AI ladder

    Collect – making data simple and accessible

    Organize – creating a trusted analytics foundation

    Analyze – building and scaling AI with trust and transparency

    Infuse – operationalizing AI throughout the business

    Market dynamics and IBM's Data and AI portfolio

    Every company in the world today is a data company. As The Economist pointed out in 2017, data is the world's most valuable resource, and unless you are leveraging your data as a strategic differentiator, you are likely missing out on opportunities.

    Simply put, data is the fuel, the cloud is the vehicle, and AI is the destination. The intersection of these three pillars of IT is the driving force behind digital transformation disrupting every company and industry. To be successful, companies need to quickly modernize their portfolio and embrace an intentional strategy to re-tool their data, AI, and application workloads by leveraging a cloud-native architecture. So, cloud platforms act as a great enabler by infusing agility, while AI is the ultimate destination, the so-called nirvana that every enterprise seeks to master.

    While the benefits of the cloud are becoming more obvious by the day, there are still several enterprises that are reluctant to embrace the public cloud right away. These enterprises are, in some cases, constrained by regulatory concerns, which make it a challenge to operate on public clouds. However, this doesn't mean that they don't see the value of the cloud and the benefits derived from embracing the cloud architecture. Everyone understands that the cloud is the ultimate destination, and that taking the necessary steps to prepare and modernize their workloads is not optional but a necessity for survival:

    Figure 1.1 – What's reshaping how businesses operate? The driving forces behind digital transformation

    IBM enjoys a strong Data and AI portfolio, with 100+ products being developed and acquired over the past 40 years, including some marquee offerings such as Db2, Informix, DataStage, Cognos Analytics, SPSS Modeler, Planning Analytics, and more. The depth and breadth of IBM's portfolio is what makes it stand out in the market. With Cloud Pak for Data, IBM is doubling down on this differentiation, further simplifying and modernizing its portfolio as customers look to a hybrid, multi-cloud future.

    Introduction to the AI ladder

    We all know data is the foundation for businesses to drive smarter decisions. Data is what fuels digital transformation. But it is AI that unlocks the value of that data, which is why AI is poised to transform businesses with the potential to add almost 16 trillion dollars to the global economy by 2030.
