1
Google Cloud Platform monitoring with Zabbix
Max Kuzkin (maxkuzkin@gmail.com)
version 24 | Sep-02-2016
2
Zabbix – Terminology
Host = any monitored device where an Agent (or an SNMP, JMX, or
IPMI interface) is installed
Examples: Linux/Windows Server, Router, etc.
Item = particular metric configured for a particular Host
Examples: system.cpu.load[all,avg5], system.cpu.num[online], script[echo,hello,world],
net.tcp.service[http] …
Key = type of Item that can be gathered from the Host
Examples: system.cpu.load[<cpu>,<mode>], system.cpu.num[<type>],
script[<parameter1>,<parameter2>,...], net.tcp.service[<service>,<ip>,<port>] …
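The relation between a Key and an Item can be sketched in a few lines of shell: an Item is a Key with its parameters filled in. The function name `parse_item_key` is hypothetical, and the sketch deliberately ignores quoted parameters and nested arrays, which real Zabbix keys also allow.

```shell
#!/usr/bin/env bash
# Split an item key such as system.cpu.load[all,avg5] into its key name
# and parameters. Sketch only: quoted parameters and nested arrays are
# not handled.
parse_item_key() {
    local item="$1"
    echo "key: ${item%%\[*}"            # text before the first "["
    case "$item" in
        *\[*\])
            local params="${item#*\[}"  # drop everything up to the first "["
            params="${params%\]}"       # drop the closing "]"
            local IFS=','               # split the remainder on commas
            local p
            for p in $params; do
                echo "param: $p"
            done
            ;;
    esac
}

parse_item_key 'system.cpu.load[all,avg5]'
# key: system.cpu.load
# param: all
# param: avg5
```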
3
Zabbix: Operations Flow
[Diagram: Operations, DB, and two Hosts, each running an Agent that collects
system.cpu.load[all,avg5], system.cpu.num[online], script[echo,hello,world], net.tcp.service[http] …]
4
Zabbix Key → Item Mapping
Key + Parameters
Data type mapping
Value mapping
Type of Information Data Type (numeric)
5
Extending Zabbix
1. External Checks
Syntax: script[<parameter1>, <parameter2>,...]
Example: check_oracle.sh["-h", "{HOST.CONN}"]
More: https://www.zabbix.com/documentation/2.4/manual/config/items/itemtypes/external
2. User Parameters
Syntax: UserParameter=key[*], command
Example: UserParameter=ping[*], echo $1
More: https://www.zabbix.com/documentation/2.4/manual/config/items/userparameters
External Checks are demoed in this presentation.
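As a minimal sketch of an external check: the Zabbix server runs the script named in the item key, passes the key parameters as command-line arguments, and takes whatever the script prints to standard output as the item value. The script name `file_count.sh` and the function below are hypothetical and only illustrate that contract.

```shell
#!/usr/bin/env bash
# file_count.sh (hypothetical): print the number of entries in a directory.
# An item key like file_count.sh["/var/log"] would make the server run
#   file_count.sh "/var/log"
# and store the printed number as the item value.
file_count() {
    local dir="$1"
    local n
    n=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
    echo $((n))   # arithmetic expansion strips any padding added by wc
}

# As a standalone script, the last line would be: file_count "$1"
```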
6
Zabbix External Checks
/etc/zabbix/zabbix_server.conf
/usr/lib/zabbix/externalscripts/some-script.sh
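The two paths above are presumably linked by the ExternalScripts parameter in zabbix_server.conf, which tells the server where to look for external check scripts. The helper below is a sketch, not part of Zabbix; the function name is hypothetical, and if the parameter appears more than once the sketch simply prints the last uncommented value.

```shell
#!/usr/bin/env bash
# Print the ExternalScripts directory set in a Zabbix server config file.
# Commented-out lines (starting with "#") are ignored.
external_scripts_dir() {
    local conf="${1:-/etc/zabbix/zabbix_server.conf}"
    sed -n 's/^[[:space:]]*ExternalScripts[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1
}
```

Calling `external_scripts_dir` with no argument reads /etc/zabbix/zabbix_server.conf; an empty result means the parameter is not set explicitly and the built-in default applies.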
7
Monitoring Model
Google Cloud
8
Google Cloud Monitoring
[Diagram: Operations; Google Cloud Platform with App Engine (Service 1, Service 2),
Cloud SQL (DB 1, DB 2), and the Monitoring API]
9
Google Monitoring API v3
Try it yourself: https://developers.google.com/apis-explorer/#p/monitoring.googleapis.com/v3/
#   Method                             Description
1   collectdTimeSeries.create          Stackdriver Monitoring Agent only: Creates a new time series. This method is only for use by the Google Monitoring Agent. Use projects.timeSeries.create instead.
2   groups.create                      Creates a new group.
3   groups.delete                      Deletes an existing group.
4   groups.get                         Gets a single group.
5   groups.list                        Lists the existing groups.
6   groups.members.list                Lists the monitored resources that are members of a group.
7   groups.update                      Updates an existing group. You can change any group attributes except name.
8   metricDescriptors.create           Creates a new metric descriptor. User-created metric descriptors define custom metrics (/monitoring/custom-metrics).
9   metricDescriptors.delete           Deletes a metric descriptor. Only user-created custom metrics (/monitoring/custom-metrics) can be deleted.
10  metricDescriptors.get              Gets a single metric descriptor.
11  metricDescriptors.list             Lists metric descriptors that match a filter.
12  monitoredResourceDescriptors.get   Gets a single monitored resource descriptor.
13  monitoredResourceDescriptors.list  Lists monitored resource descriptors that match a filter. This method does not require a Stackdriver account.
14  timeSeries.create                  Creates or adds data to one or more time series.
15  timeSeries.list                    Lists time series that match a filter.
10
Some App Engine Counters
#  Counter ID                           Value                Description
1  http/server/response_count           DELTA, INT64         Delta HTTP response count.
2  http/server/response_latencies       DELTA, DISTRIBUTION  HTTP response latency.
3  system/network/received_bytes_count  DELTA, INT64         Delta count of incoming network bandwidth.
4  system/network/sent_bytes_count      DELTA, INT64         Delta count of outgoing network bandwidth.
5  system/cpu/usage                     GAUGE, INT64         CPU usage in megacycles.
Many more: https://cloud.google.com/monitoring/api/metrics
Services can also define their own custom counters.
11
timeSeries.list parameters
#  Parameter                 Example
1  Project Name              projects/odin-ap
2  Period of aggregation     3600s
3  Overall aggregation rule  REDUCE_SUM
4  Fields for grouping       metric.label.response_code
5  Series aggregation rule   ALIGN_SUM
6  Filter                    metric.type = "appengine.googleapis.com/http/server/response_count"
                             AND resource.label.module_id = "service"
                             AND metric.label.response_code >= 500
                             AND metric.label.response_code <= 599
7  Start Time                2016-08-27T23:00:00.000Z
8  End Time                  2016-08-28T00:00:00.000Z
More on filters: https://cloud.google.com/monitoring/api/v3/filters#comparisons
12
timeSeries.list example call
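The parameters listed for timeSeries.list can be sketched as a direct REST call with curl. The mapping of the slide's labels to query parameters is my reading of the v3 REST interface: "Period of aggregation" as aggregation.alignmentPeriod, "Series aggregation rule" as aggregation.perSeriesAligner, "Overall aggregation rule" as aggregation.crossSeriesReducer, and "Fields for grouping" as aggregation.groupByFields. The function name and the ACCESS_TOKEN and CURL environment variables are assumptions of this sketch, not something the slides define.

```shell
#!/usr/bin/env bash
# Sketch: call timeSeries.list for the 5xx response count.
# Arguments: project name (e.g. projects/odin-ap), start time, end time.
# ACCESS_TOKEN must hold an OAuth 2.0 token; CURL can override the curl binary.
list_5xx_count() {
    local project="$1" start="$2" end="$3"
    local filter='metric.type = "appengine.googleapis.com/http/server/response_count"'
    filter="$filter AND resource.label.module_id = \"service\""
    filter="$filter AND metric.label.response_code >= 500"
    filter="$filter AND metric.label.response_code <= 599"

    # -G sends the url-encoded pairs as a GET query string
    "${CURL:-curl}" -sG "https://monitoring.googleapis.com/v3/${project}/timeSeries" \
        -H "Authorization: Bearer ${ACCESS_TOKEN}" \
        --data-urlencode "filter=${filter}" \
        --data-urlencode "interval.startTime=${start}" \
        --data-urlencode "interval.endTime=${end}" \
        --data-urlencode "aggregation.alignmentPeriod=3600s" \
        --data-urlencode "aggregation.perSeriesAligner=ALIGN_SUM" \
        --data-urlencode "aggregation.crossSeriesReducer=REDUCE_SUM" \
        --data-urlencode "aggregation.groupByFields=metric.label.response_code"
}
```

One possible invocation, assuming the gcloud CLI is installed and authenticated: `ACCESS_TOKEN=$(gcloud auth print-access-token) list_5xx_count projects/odin-ap 2016-08-27T23:00:00.000Z 2016-08-28T00:00:00.000Z`. The response is a JSON document whose timeSeries entries carry the aggregated points.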
13
GCP Monitoring API Command Line
https://github.com/odin-public/gcpmetrics
14
GCP Monitoring API Command Line
$ pip install gcpmetrics
15
Zabbix: External Checks
Tracking the count of HTTP 5xx responses
(server errors) using the Google Cloud
Platform Monitoring API v3
16
[Diagram: Zabbix Host with Web Monitoring (Web Scenario), Agent, and the gcpmetrics External Check;
Google Cloud Platform with App Engine services, Cloud SQL DBs, and the Monitoring API v3;
DB; Operations; Detailed Analysis]
17
Configure Timeouts of Zabbix
• A default installation of Zabbix is configured to terminate all
scripts after 3 seconds. The Google Monitoring API may take 5-10
seconds or more to respond, depending on the actual query.
• We suggest setting the Zabbix Timeout to 30 seconds.
Note: This applies to both the server
and agent configurations.
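A sketch of applying the timeout change from the shell follows. The Timeout parameter is the one the slide refers to; the function name is hypothetical, and the agent config path in the usage note is an assumption (only /etc/zabbix/zabbix_server.conf appears in this deck).

```shell
#!/usr/bin/env bash
# Set Timeout=<seconds> in a Zabbix config file, replacing any existing
# uncommented Timeout line. Commented defaults are left untouched.
set_zabbix_timeout() {
    local conf="$1" seconds="${2:-30}"
    local tmp
    tmp=$(mktemp)
    # Copy every line except an active Timeout setting
    grep -v -E '^[[:space:]]*Timeout[[:space:]]*=' "$conf" > "$tmp" || true
    printf 'Timeout=%s\n' "$seconds" >> "$tmp"
    # Write back through the original file to keep its owner and permissions
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

For example, `set_zabbix_timeout /etc/zabbix/zabbix_server.conf 30` and, assuming the agent config lives next to it, `set_zabbix_timeout /etc/zabbix/zabbix_agentd.conf 30`. The server and agent need a restart to pick up the new value.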
18
Install gcpmetrics
CentOS example:
$ easy_install pip
$ pip install --upgrade gcpmetrics
Check that installation was successful with:
$ gcpmetrics --version
19
Configure ExternalScripts
$ cd /usr/lib/zabbix/externalscripts
$ gcpmetrics --init-config ./gcp
Creating folder: ./gcp...
Creating configuration file: ./gcp/config.yaml...
Creating key file: ./gcp/keyfile.json...
Configuration initialized, use --config to reference it.
20
Edit config.yaml: project, service, etc.
21
Update keyfile.json
22
Create 2 scripts in externalscripts
$ cd /externalscripts/
$ vim tm-http5xx-absolute.sh
$ vim tm-http5xx-relative.sh
Two scripts are needed to demo both the Absolute and Delta behavior of Zabbix,
because Zabbix requires each Item key on a Host to be unique, so two Items cannot refer to the same script with the same parameters.
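The difference between the two behaviors can be sketched outside Zabbix. An item stored as is keeps each collected value unchanged, while an item stored as a delta keeps the change from the previous value. The function below assumes the "simple change" flavor of delta (the slides do not say which one was used) and its name is hypothetical.

```shell
#!/usr/bin/env bash
# Read one absolute value per line and print the change between
# consecutive values, similar to a delta (simple change) item.
simple_change() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

printf '%s\n' 10 12 12 20 | simple_change
# 2
# 0
# 8
```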
23
Create 2 Items
tm-http5xx-absolute.sh tm-http5xx-relative.sh
24
Create Graph with those 2 Items
25
Graph with those 2 Items
absolute
relative
Some 5xx error
happened there
26
Thank You!
Questions?
