
1

From Ceilometer
to Telemetry
Not so alarming!

A Julien Danjou & Nick Barcet presentation
for
OpenStack in action! 4
on the 5th December 2013

2

Speakers
Nick Barcet
VP Products @ eNovance
Co-founded the Ceilometer project at the Folsom
summit and led the project through incubation
Julien Danjou
Ceilometer Lead Dev @ eNovance
Has been a core Ceilometer contributor from the
outset, taking over the PTL reins for Havana

3

State of the project
● Officially named OpenStack Telemetry
● Havana is the first integrated release
● Community growth
○ Grizzly: 30 contributors, 267 commits
○ Havana: 57 contributors, 434 commits

4

What was done during
the Havana cycle?

5

UDP transport
● Faster, stateless
● Lighter (msgpack encoding)
but…

● No delivery guarantee
● Not signed
▶ Use case: gathering metrics for alarms
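
To make the trade-off concrete, here is a minimal sketch of pushing one sample as a single UDP datagram. It is an illustration only, not Ceilometer's publisher code: the function names and the sample fields are hypothetical, and JSON from the standard library stands in for msgpack so the example has no dependencies (if the msgpack package is installed, `msgpack.packb` could be passed as `encode`).

```python
import json
import socket


def send_sample(sample, address, encode=None):
    """Encode one sample and send it as a single, unsigned UDP datagram."""
    # JSON is a stand-in encoder here; the slide mentions msgpack.
    encode = encode or (lambda obj: json.dumps(obj).encode("utf-8"))
    payload = encode(sample)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Fire and forget: no connection, no acknowledgement, no retry.
        sock.sendto(payload, address)
    return len(payload)


def demo():
    """Send one sample to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        sample = {"name": "cpu_util", "volume": 87.5}
        send_sample(sample, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
    return json.loads(data.decode("utf-8"))
```

The sender never learns whether the datagram arrived, which is acceptable for frequently refreshed alarm metrics but not for data that must not be lost.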

6

Improved API
● Group samples by fields when requesting
statistics (?groupby[]=user_id)
● Limit the number of items returned (?limit=42)
● Provides links to other resources in the API
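
The two query parameters above can be sketched as a small URL builder. This is an assumption-laden illustration: the host name and the helper `build_query_url` are hypothetical, and which endpoint accepts which parameter should be checked against the API reference.

```python
from urllib.parse import urlencode


def build_query_url(base_url, path, groupby=None, limit=None):
    """Build an API URL carrying the groupby[] and limit parameters."""
    query = [("groupby[]", field) for field in (groupby or [])]
    if limit is not None:
        query.append(("limit", limit))
    url = base_url + path
    if query:
        # safe="[]" keeps the brackets readable instead of percent-encoding them.
        url += "?" + urlencode(query, safe="[]")
    return url


# Hypothetical deployment URL, used only for illustration.
BASE = "http://ceilometer.example:8777"
stats_url = build_query_url(BASE, "/v2/meters/cpu_util/statistics",
                            groupby=["user_id"])
limited_url = build_query_url(BASE, "/v2/meters/cpu_util", limit=42)
```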

7

Send your own samples
Users or operators can
send samples
➔ Leverage the
statistics
➔ Usable for alarming

POST /v2/meters/mymeter
[{
    "counter_type": "gauge",
    "counter_unit": "megabyte",
    "counter_volume": 142.0,
    "user_id": "efd87807-12d2-4b38-9c70-5f5c2ac427ff",
    "project_id": "35b17138-b364-4e6a-a131-8f3099c5be68",
    "resource_id": "bd9431c1-8d69-4ad3-803a-8d4a6b89fd36",
    "resource_metadata": {
        "name1": "value1",
        "name2": "value2"
    },
    "source": "mypaasplatform",
    "timestamp": "2013-09-10T20:34:13.711330"
}]
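
A request like the one above could be prepared from Python roughly as follows. This is a sketch under assumptions: the base URL and token are placeholders, `build_sample_request` is a hypothetical helper, and the request is only built, not sent.

```python
import json
from urllib.request import Request

# A reduced version of the sample shown on the slide.
SAMPLE = {
    "counter_type": "gauge",
    "counter_unit": "megabyte",
    "counter_volume": 142.0,
    "resource_id": "bd9431c1-8d69-4ad3-803a-8d4a6b89fd36",
    "source": "mypaasplatform",
}


def build_sample_request(base_url, meter, samples, token):
    """Prepare (but do not send) a POST carrying a list of samples."""
    body = json.dumps(samples).encode("utf-8")
    return Request(
        url=f"{base_url}/v2/meters/{meter}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Keystone token header; the value is a placeholder.
            "X-Auth-Token": token,
        },
    )


request = build_sample_request(
    "http://ceilometer.example:8777", "mymeter", [SAMPLE], "secret-token"
)
```

Passing `request` to `urllib.request.urlopen` would perform the call against a real endpoint.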

8

New storage backends

9

Database TTL
Previously:
No way to purge data.
Ceilometer produces a lot of data
(gigabytes per day)
Now:
ceilometer-expirer will drop data older
than the configured time-to-live delay
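
The expiry rule itself is simple enough to sketch. The snippet below is an in-memory illustration of "drop data older than the time-to-live", not the actual ceilometer-expirer code; the function name and the sample layout are assumptions.

```python
from datetime import datetime, timedelta


def expire_samples(samples, ttl_seconds, now):
    """Return only the samples that are not older than the time-to-live."""
    cutoff = now - timedelta(seconds=ttl_seconds)
    return [s for s in samples if s["timestamp"] >= cutoff]


now = datetime(2013, 12, 5, 12, 0, 0)
stored = [
    {"meter": "cpu_util", "timestamp": datetime(2013, 12, 5, 11, 30, 0)},
    {"meter": "cpu_util", "timestamp": datetime(2013, 11, 1, 9, 0, 0)},
]
# Keep one day of data: the November sample is dropped.
kept = expire_samples(stored, ttl_seconds=86400, now=now)
```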

10

Hyper-V

➔ Disk, network and CPU usage

11

New meters
● API endpoints
○ Meter the requests made to API servers (Neutron,
Glance, Nova, Swift, etc.)

● Neutron bandwidth
○ Meter the bandwidth consumed by each project
○ Traffic labeled as configured by operator
(based on source/destination)
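
Labeling traffic by source/destination can be pictured as matching the remote address against operator-defined networks. The sketch below is illustrative only: the CIDR ranges, the flow record layout and the function names are hypothetical, and it does not reproduce how Neutron implements metering labels.

```python
import ipaddress
from collections import defaultdict

# Operator-configured rules (hypothetical networks); first match wins.
LABEL_RULES = [
    ("Compute", ipaddress.ip_network("10.0.1.0/24")),
    ("Object", ipaddress.ip_network("10.0.2.0/24")),
]
DEFAULT_LABEL = "Ext"  # anything outside the known networks


def label_for(remote_ip):
    """Return the traffic label for the remote end of a flow."""
    address = ipaddress.ip_address(remote_ip)
    for label, network in LABEL_RULES:
        if address in network:
            return label
    return DEFAULT_LABEL


def bandwidth_by_label(flows):
    """Sum transferred bytes per (project, label)."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["project_id"], label_for(flow["remote_ip"]))] += flow["bytes"]
    return dict(totals)
```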

12

Neutron Traffic Labels
[Diagram: traffic between VMs, the Internet and Swift nodes, labeled Ext, Compute and Object.]

13

Alarms

Regularly watch meter statistics
and trigger actions based on
threshold crossings.

14

Alarms architecture
[Diagram: Ceilometer API (HTTP), Ceilometer alarm evaluator, RPC bus, and several Ceilometer alarm notifiers that trigger Webhook, SMS, e-mail… actions.]

15

Alarm types
● Threshold alarms
Triggered once a value crosses a threshold
“Call a Webhook as soon as CPU usage goes above 80%”

● Combination alarms
Triggered once all alarms in the combination are triggered
“Call a Webhook as soon as alarm “foo” and alarm “bar” are
triggered”
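
The two alarm types can be sketched as two small state functions. This is a simplified illustration, not the evaluator's code: the function names and the "alarm"/"ok" state strings are assumptions, and only the "all alarms triggered" combination described above is modeled.

```python
import operator

# Comparison operators keyed by short names such as "gt".
COMPARATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def threshold_state(value, comparison_operator, threshold):
    """A threshold alarm fires when the value crosses the threshold."""
    crossed = COMPARATORS[comparison_operator](value, threshold)
    return "alarm" if crossed else "ok"


def combination_state(states):
    """A combination alarm fires once all underlying alarms are firing."""
    if states and all(state == "alarm" for state in states):
        return "alarm"
    return "ok"


# "Call a Webhook as soon as CPU usage goes above 80%"
cpu_alarm = threshold_state(87.5, "gt", 80.0)
# "... as soon as alarm foo and alarm bar are triggered"
both = combination_state([cpu_alarm, threshold_state(10, "gt", 50)])
```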

16

Alarms API
POST /v2/alarms

GET /v2/alarms/foobar
PUT /v2/alarms/foobar

{
    "alarm_actions": ["http://site:8000/alarm"],
    "insufficient_data_actions": ["http://site:8000/nodata"],
    "ok_actions": ["http://site:8000/ok"],
    "comparison_operator": "gt",
    "description": "An alarm",
    "evaluation_periods": 2,
    "matching_metadata": {"key_name": "key_value"},
    "meter_name": "storage.objects",
    "name": "SwiftObjectAlarm",
    "period": 240,
    "statistic": "avg",
    "threshold": 200.0
}

DELETE /v2/alarms/foobar
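
How the fields of such a definition might interact can be sketched as follows. This is a deliberately simplified model that only handles the "avg" statistic and the "gt" operator from the example; the function names, the state strings and the (timestamp, volume) sample layout are assumptions, not the real evaluator.

```python
from collections import defaultdict

ALARM = {
    "alarm_actions": ["http://site:8000/alarm"],
    "insufficient_data_actions": ["http://site:8000/nodata"],
    "ok_actions": ["http://site:8000/ok"],
    "evaluation_periods": 2,
    "period": 240,
    "threshold": 200.0,
}


def evaluate(alarm, samples, now):
    """Average each of the last N periods and compare with the threshold."""
    period = alarm["period"]
    count = alarm["evaluation_periods"]
    start = now - count * period
    buckets = defaultdict(list)
    for timestamp, volume in samples:
        if start <= timestamp < now:
            buckets[int((timestamp - start) // period)].append(volume)
    if len(buckets) < count:
        return "insufficient data"  # at least one period has no sample
    averages = [sum(buckets[i]) / len(buckets[i]) for i in range(count)]
    crossed = all(avg > alarm["threshold"] for avg in averages)
    return "alarm" if crossed else "ok"


def actions_for(alarm, state):
    """Pick the URLs to call for the given state."""
    key = {
        "alarm": "alarm_actions",
        "ok": "ok_actions",
        "insufficient data": "insufficient_data_actions",
    }[state]
    return alarm[key]
```

With a period of 240 seconds and two evaluation periods, the alarm in this model only fires when the average stays above 200.0 for two consecutive 240-second windows.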

17

Heat & auto-scaling
[Diagram: Heat Engine, my_stack (Instance) and Ceilometer (API service, Alarm evaluator, Compute Agent). Arrows: creates alarms, injects user metadata, monitors instances, triggers alarm.]

18

Heat & auto-scaling
[Diagram: Heat Engine, my_stack (three Instances) and Ceilometer (API, Alarms, Compute). Arrows: injects user metadata, alarming, scales out stack.]

19

Heat & auto-scaling
[Diagram: Heat Engine, my_stack (five Instances) and Ceilometer (API, Alarms, Compute). Arrows: injects user metadata, alarming, scales out stack.]

20

Events storage
(Almost) all OpenStack components send notifications on
events: let’s store them.
➔ Useful to be able to re-generate samples
➔ Useful to generate new samples we did not think about
➔ Allows double-entry accounting
➔ Auditability
Not yet complete, to be continued in Icehouse
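
Why stored events help can be shown with a toy example: once raw events are kept, a meter nobody planned for can be derived after the fact. The event layout and the function `regenerate_samples` are hypothetical; only the idea of replaying stored notifications comes from the slide.

```python
def regenerate_samples(events, event_type, meter_name):
    """Derive one gauge sample per stored event of the given type."""
    return [
        {
            "counter_name": meter_name,
            "counter_type": "gauge",
            "counter_volume": 1,
            "project_id": event["project_id"],
            "resource_id": event["resource_id"],
            "timestamp": event["timestamp"],
        }
        for event in events
        if event["event_type"] == event_type
    ]


# Stored notifications (made-up data for illustration).
EVENTS = [
    {"event_type": "compute.instance.create.end", "project_id": "p1",
     "resource_id": "vm-1", "timestamp": "2013-12-05T10:00:00"},
    {"event_type": "compute.instance.delete.end", "project_id": "p1",
     "resource_id": "vm-0", "timestamp": "2013-12-05T10:05:00"},
]
created = regenerate_samples(EVENTS, "compute.instance.create.end",
                             "instance.created")
```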

21

Exciting ideas for Icehouse
we’re going to hack on.

22

General improvements
● Split the collector into two logical pieces
● Rely on notifications for samples rather than
RPC
● Bring the SQLAlchemy and MongoDB drivers
almost to parity
● Support for hardware polling
● Support Ironic

23

API improvements
● Complex filtering and query DSL
x OR y AND z

● /v2/samples
(a.k.a. /v2/meters without the meter)
● Return rate rather than absolute value
● More statistics functions (rate of change,
moving-window averages…)
● Bulk requests

24

Alarming
● Exclude low sample counts
● Allow time-constrained alarms

25

Distributed polling
Leveraging Tooz and Taskflow to distribute
tasks among workers (agents).
★ Ability to distribute the polling
★ Replace alarm evaluator custom distributor

26

OpenStack
Telemetry

Ceilometer

#openstack-ceilometer @ Freenode

The end.

27

Backup slides

28

Heat & auto-scaling
[Diagram: Heat Engine, my_stack (Instance) and Ceilometer (API service, Meter store, Compute Agent, Alarm evaluator). Arrows: provides alarm rules, reports samples, queries stats.]
