What’s New in ScyllaDB Operator for Kubernetes
Tomáš Nožička
Maciej Zimnoch
Tomáš Nožička
■ Leads the development of Scylla Operator
■ Emeritus Kubernetes SIG-Apps approver
■ Used to work on a self-hosted, auto-upgrading Kubernetes control plane for Red Hat OpenShift
Principal Software Engineer
Maciej Zimnoch
■ Scylla Operator maintainer
■ Previously worked on Scylla Manager, instant messaging servers, SDN and LTE networks
Senior Software Engineer
About
■ Scylla Operator is a Kubernetes Operator that manages Scylla clusters and automates their operational tasks.
■ https://github.com/scylladb/scylla-operator
■ https://operator.docs.scylladb.com
ScyllaCluster
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: my-summit-cluster
spec:
  version: 4.5.1
  agentVersion: 2.5.2
  datacenter:
    name: us-east-1
    racks:
    - name: us-east-1a
      members: 3
      storage:
        capacity: 100Gi
      resources:
        limits:
          cpu: 8
          memory: 16Gi
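The example sets only limits. When no requests are given, Kubernetes copies the limits into the requests. So, assuming the rack `resources` apply to the Scylla container of each member, the scheduler must find 8 CPUs and 16Gi of memory per member. The Python sketch below works out the rack-level totals. It is illustrative only: the helper names are hypothetical, and the parser handles only plain integers and binary suffixes (Ki/Mi/Gi/Ti), not millicores or decimal suffixes.

```python
# Subset of the binary suffixes used by Kubernetes resource quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> int:
    """Parse a plain integer or a binary-suffixed quantity such as '16Gi'."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def rack_totals(members: int, cpu: str, memory: str, storage: str) -> dict:
    """Total CPU, memory and storage a rack asks for across all its members."""
    return {
        "cpu_cores": members * parse_quantity(cpu),
        "memory_gib": members * parse_quantity(memory) / 2**30,
        "storage_gib": members * parse_quantity(storage) / 2**30,
    }


# Values from the ScyllaCluster example above.
print(rack_totals(members=3, cpu="8", memory="16Gi", storage="100Gi"))
# {'cpu_cores': 24, 'memory_gib': 48.0, 'storage_gib': 300.0}
```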
Releases
Release   GA           EOL
1.0       2021-01-21   2021-05-06
1.1       2021-03-22   2021-06-17
1.2       2021-05-06   2021-08-10
1.3       2021-06-17   2021-09-16
1.4       2021-08-10   2021-12-03
1.5       2021-09-16   Release of 1.7 (TBA)
1.6       2021-12-03   Release of 1.8
1.7       TBA          Release of 1.9
■ 6 releases since the last Summit
■ Aiming for a ~6-week release cadence (modulo PTOs, holidays, etc.)
■ The 2 latest releases are supported
■ N-1 compatibility
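In the table, every EOL date equals the GA date of the release two minor versions later. That rule is what keeps exactly the two latest releases supported. The Python sketch below encodes this reading of the table and answers "which releases were supported on a given day". The helper names are hypothetical. Release 1.7 is left out because its GA date was still TBA.

```python
from datetime import date
from typing import Optional

# GA dates copied from the Releases table (1.7 omitted: GA was TBA).
GA = {
    "1.0": date(2021, 1, 21),
    "1.1": date(2021, 3, 22),
    "1.2": date(2021, 5, 6),
    "1.3": date(2021, 6, 17),
    "1.4": date(2021, 8, 10),
    "1.5": date(2021, 9, 16),
    "1.6": date(2021, 12, 3),
}


def eol(release: str, ga: dict = GA) -> Optional[date]:
    """EOL of `release` = GA of the release two minors later (None if not shipped yet)."""
    major, minor = (int(part) for part in release.split("."))
    return ga.get(f"{major}.{minor + 2}")


def supported(on: date, ga: dict = GA) -> list:
    """Releases that are GA and not yet EOL on the given day."""
    return sorted(
        r for r, ga_date in ga.items()
        if ga_date <= on and (eol(r, ga) is None or on < eol(r, ga))
    )


print(supported(date(2021, 10, 1)))  # ['1.4', '1.5']
```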
Activity
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Performance - experimental
A new CRD, NodeConfig, lets users specify which Kubernetes nodes should be optimized.
Available from: 1.6.0
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: cluster
spec:
  placement:
    nodeSelector:
      scylla.scylladb.com/node-type: scylla
Performance - experimental
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: cluster
spec:
  placement:
    nodeSelector:
      node-type: scylla
[Diagram: Kubernetes Nodes 1 to 4, two of which carry the label node-type: scylla; the NodeConfig applies only to the labeled nodes]
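NodeConfig selects its nodes through a nodeSelector, which follows the standard Kubernetes semantics. A node matches only if its labels contain every key/value pair in the selector, and an empty selector matches every node. The Python sketch below shows that matching rule. The node names and label assignments are hypothetical and do not reproduce the diagram.

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """A node matches when it carries every key/value pair of the nodeSelector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


selector = {"node-type": "scylla"}  # same selector as the NodeConfig above

# Hypothetical cluster: only some nodes are labeled for Scylla.
nodes = {
    "node-1": {"node-type": "scylla"},
    "node-2": {"node-type": "scylla"},
    "node-3": {},
    "node-4": {"node-type": "other"},
}

selected = sorted(name for name, labels in nodes.items() if matches_node_selector(labels, selector))
print(selected)  # ['node-1', 'node-2']
```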
Performance - experimental
Throughput [kreq/s]
       100% read   100% write   50% read / 50% write
AWS    80          180          90
EKS    70          140          60
p99 latency [ms] at 50% of throughput
       read   write   mixed: read   mixed: write
AWS    3.63   6.05    5.27          2.96
EKS    5.0    8.1     6.80          4.10
Disk-intensive workload on 3 x i3.4xlarge
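The tables can also be read as relative cost. The Python sketch below divides each EKS figure by the matching AWS figure. The numbers are copied from the tables, and the helper name is illustrative. A throughput ratio below 1 means EKS served fewer requests, and a latency ratio above 1 means EKS was slower.

```python
THROUGHPUT_KREQ_S = {
    "100% read": {"AWS": 80, "EKS": 70},
    "100% write": {"AWS": 180, "EKS": 140},
    "50% read / 50% write": {"AWS": 90, "EKS": 60},
}
P99_LATENCY_MS = {
    "read": {"AWS": 3.63, "EKS": 5.0},
    "write": {"AWS": 6.05, "EKS": 8.1},
    "mixed: read": {"AWS": 5.27, "EKS": 6.80},
    "mixed: write": {"AWS": 2.96, "EKS": 4.10},
}


def eks_to_aws_ratio(table: dict) -> dict:
    """EKS value divided by the AWS value for each workload, rounded to 3 decimals."""
    return {workload: round(v["EKS"] / v["AWS"], 3) for workload, v in table.items()}


print(eks_to_aws_ratio(THROUGHPUT_KREQ_S))
# {'100% read': 0.875, '100% write': 0.778, '50% read / 50% write': 0.667}
print(eks_to_aws_ratio(P99_LATENCY_MS))
# {'read': 1.377, 'write': 1.339, 'mixed: read': 1.29, 'mixed: write': 1.385}
```

On these numbers, EKS reached roughly 67% to 88% of the AWS throughput, with p99 latencies about 1.3 to 1.4 times higher.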
Performance
IO Tune, a 2-minute disk benchmark, runs as part of Scylla
startup. Since 1.2.0, users may skip it by providing
precomputed values for known hardware types.
Since 1.7.0, the benchmark result is cached in a persistent
location and reused when Scylla restarts.
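The startup decision described above can be sketched in Python as three steps. First, use precomputed values if they are given. Otherwise, reuse a cached result if one exists. Otherwise, benchmark once and cache the result. This is an illustration under assumptions, not the Operator's code. The function names, the JSON cache file and its location are hypothetical, and the fake benchmark returns placeholder values instead of running a real disk test.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def io_properties(cache_file: Path, run_benchmark: Callable[[], dict],
                  precomputed: Optional[dict] = None) -> dict:
    """Return disk I/O properties, running the benchmark at most once per volume."""
    if precomputed is not None:
        return precomputed  # user-supplied values for known hardware: skip the benchmark
    if cache_file.exists():
        return json.loads(cache_file.read_text())  # restart: reuse the cached result
    result = run_benchmark()  # first start: run the ~2 minute benchmark
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))  # persist for the next restart
    return result


benchmark_runs = []


def fake_iotune() -> dict:
    """Stand-in for the real benchmark; returns placeholder values."""
    benchmark_runs.append(1)
    return {"read_iops": 1, "write_iops": 1}


with tempfile.TemporaryDirectory() as volume:  # plays the role of the persistent volume
    cache = Path(volume) / "io_properties.json"
    first_start = io_properties(cache, fake_iotune)
    restart = io_properties(cache, fake_iotune)

print(len(benchmark_runs))  # 1
```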
Seedless mode
Scylla nodes are no longer asymmetric: seed nodes are no longer special.
The deployment model doesn’t change, but the manual
steps around seed nodes are no longer required.
All nodes can be replaced automatically.
Required minimum Scylla version: 4.3 (open source) or 2021.1 (Enterprise)
Available from: 1.4.0
Security
Image pull secrets were added to the CRD, so users
can pull Scylla images from their own private
repository.
Available from: 1.5.0
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: my-summit-cluster
spec:
  repository: custom-repo/scylla
  agentRepository: custom-repo/scylla-manager-agent
  imagePullSecrets:
  - name: repository-credentials
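The `repository-credentials` Secret referenced by `imagePullSecrets` must exist in the same namespace as the ScyllaCluster, and it must be of type `kubernetes.io/dockerconfigjson`. The Python sketch below builds such a Secret manifest. The registry host and credentials are placeholders, and the helper name is hypothetical.

```python
import base64
import json


def dockerconfigjson_secret(name: str, registry: str, username: str, password: str) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under `data` must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()},
    }


# Placeholder registry and credentials.
secret = dockerconfigjson_secret("repository-credentials", "registry.example.com", "robot", "s3cr3t")
print(json.dumps(secret, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, because kubectl accepts JSON as well as YAML. In practice, `kubectl create secret docker-registry` produces the same kind of Secret.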
Security
Communication with the Scylla API is secured
by token-based authentication.
From 1.3.0, the Operator automatically
provisions a Secret containing the token
and configures the endpoints.
Available from: 1.3.0
[Diagram: Scylla Manager Agent secured with a secret token]
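Token-based authentication relies on a shared random secret that the server compares against what each client presents. The Python sketch below generates such a token and validates it in constant time. The `Authorization: Bearer <token>` header format and the helper names are assumptions for illustration; they do not come from the slides.

```python
import hmac
import secrets


def new_auth_token(nbytes: int = 32) -> str:
    """Random URL-safe token, e.g. for storing in a Kubernetes Secret."""
    return secrets.token_urlsafe(nbytes)


def is_authorized(authorization_header: str, expected_token: str) -> bool:
    """Validate an assumed 'Bearer <token>' header value in constant time."""
    scheme, _, presented = authorization_header.partition(" ")
    if scheme != "Bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


token = new_auth_token()
print(is_authorized(f"Bearer {token}", token))  # True
print(is_authorized("Bearer guessed-token", token))  # False
```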
Stability
■ 1.2.0 - The Operator automatically provisions a PodDisruptionBudget that protects Scylla nodes from voluntary disruptions
■ 1.2.0 - Enhanced Operator deployment model
■ 1.4.0 - The Operator webhook was extracted into a separate entity
■ 1.5.0 - Operator deployments are protected by a PodDisruptionBudget
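A PodDisruptionBudget limits voluntary disruptions such as node drains. With `maxUnavailable`, Kubernetes allows an eviction only while the number of healthy Pods stays above the expected count minus `maxUnavailable`. The Python sketch below reproduces that arithmetic. The value `maxUnavailable: 1` for a 3-member rack is an assumption for illustration, not a setting quoted from the Operator.

```python
def disruptions_allowed(expected_pods: int, healthy_pods: int, max_unavailable: int) -> int:
    """Voluntary evictions a maxUnavailable-style PodDisruptionBudget still permits."""
    desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


# Assumed: 3 Scylla members guarded by maxUnavailable: 1.
print(disruptions_allowed(expected_pods=3, healthy_pods=3, max_unavailable=1))  # 1: one drain may proceed
print(disruptions_allowed(expected_pods=3, healthy_pods=2, max_unavailable=1))  # 0: further drains must wait
```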
Reconciliation
■ Scylla Operator was rewritten in 1.4.0 using informers and the other machinery used by Kubernetes controllers
● Cache-based, with optimistic concurrency and live calls on demand (same as Kubernetes controllers)
● 94% fewer API calls from the Scylla cluster sidecars
● 82% fewer API calls from the controller
● Less bug-prone (typed)
● The machinery is battle-tested by Kubernetes controllers
■ Full reconciliation [since 1.4.0]
● Any change to a field in the ScyllaCluster custom resource is reconciled automatically
● Previously, only a few fields could be changed, and each required dedicated logic
● Users can adjust resources, placement and repository spec [1.5.0]
■ Pruning of old resources (e.g. Services on scale-down) [since 1.4.0]
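Optimistic concurrency means every write carries the `resourceVersion` the controller last read. If the object changed in the meantime, the API server rejects the write with a conflict, and the controller re-reads and retries. Pruning deletes objects that no longer appear in the desired state. The Python sketch below models both with an in-memory fake API server. `FakeAPIServer`, `reconcile` and the member names are hypothetical. Real controllers read from an informer cache and create objects with a separate call.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Raised when a write is based on a stale resourceVersion."""


@dataclass
class FakeAPIServer:
    """Hypothetical in-memory stand-in for the Kubernetes API server."""
    objects: dict = field(default_factory=dict)  # name -> (resource_version, spec)

    def update(self, name: str, spec: dict, resource_version: int) -> None:
        current_rv, _ = self.objects.get(name, (0, None))
        if resource_version != current_rv:
            raise Conflict(name)  # the object changed since the caller read it
        self.objects[name] = (current_rv + 1, spec)

    def delete(self, name: str) -> None:
        self.objects.pop(name, None)


def reconcile(api: FakeAPIServer, desired: dict) -> None:
    """Converge `api` towards `desired` (name -> spec), then prune leftovers."""
    for name, spec in desired.items():
        while True:
            rv, current = api.objects.get(name, (0, None))  # read (a cache in real controllers)
            if current == spec:
                break  # already up to date: no write needed
            try:
                api.update(name, spec, rv)  # write guarded by the version we read
                break
            except Conflict:
                continue  # lost the race: re-read and retry
    for name in set(api.objects) - set(desired):
        api.delete(name)  # prune, e.g. a member's Service after scale-down


api = FakeAPIServer()
reconcile(api, {f"member-{i}": {"port": 9042} for i in range(3)})
reconcile(api, {f"member-{i}": {"port": 9042} for i in range(2)})  # scale down from 3 to 2
print(sorted(api.objects))  # ['member-0', 'member-1']
```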
User experience
■ The updatedMembers and stale fields help determine rack status more reliably [1.4.0]
■ ScyllaCluster.status supports the observedGeneration API concept [1.4.0]
■ Users can force a rolling restart of a ScyllaCluster by setting
spec.forceRedeploymentReason [1.4.0]
■ Validating webhooks now report all errors at once instead of one at a time [1.5.0]
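`observedGeneration` lets clients tell whether the status already reflects the latest spec. Every spec change bumps `metadata.generation`, and the controller records the generation it has processed in `status.observedGeneration`. The Python sketch below shows the check a script might run after setting `spec.forceRedeploymentReason`. The helper name and the cluster snapshots are hypothetical.

```python
def spec_observed(cluster: dict) -> bool:
    """True once status.observedGeneration has caught up with metadata.generation."""
    observed = cluster.get("status", {}).get("observedGeneration")
    return observed is not None and observed >= cluster["metadata"]["generation"]


# Hypothetical snapshots: a user has just set spec.forceRedeploymentReason,
# which bumped metadata.generation from 4 to 5.
just_patched = {
    "metadata": {"generation": 5},
    "spec": {"forceRedeploymentReason": "pick up new config"},
    "status": {"observedGeneration": 4},
}
picked_up = {**just_patched, "status": {"observedGeneration": 5}}

print(spec_observed(just_patched), spec_observed(picked_up))  # False True
```

Matching generations only show that the Operator has seen the change. Whether the rolling restart has finished would still be read from the rack status fields, such as updatedMembers.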
Testing
■ We added an integrated end-to-end (e2e) suite in 1.2.0
■ Coverage is gradually increasing (in addition to QA coverage)
■ New features must include e2e tests
■ Tests run in parallel since 1.7.0
What’s next
■ Additional performance R&D
■ Persistent storage support
■ Managed TLS (internode + client)
■ Manual MultiDC
■ Managed MultiDC
■ Managed Scylla credentials
■ More deployment methods (OLM / OperatorHub, OpenShift Marketplace)
■ Supporting Azure Cloud
■ Autoscaling
■ And much more!
Thank you!
Stay in touch
scylladb-users@googlegroups.com
scylladb-users.slack.com
#scylla-operator
