Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Yun Cheol Ha
Sr. PostgreSQL Specialist Solutions Architect
Amazon Web Services
Aurora PostgreSQL
Performance Monitoring
and troubleshooting
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
AWS DATA & AI ROADSHOW 2024
2
Agenda
• Performance monitoring approach
• PostgreSQL and Amazon Aurora PostgreSQL architecture
• Amazon Aurora PostgreSQL Performance monitoring tools and
interfaces
• Common Performance monitoring use cases
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Performance Monitoring Approach
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Root cause and symptoms of database performance issues
Poor
Database
Performance
High DB
Connections
High
Memory
Utilization
High
read/write
latency
High I/O
Usage
High CPU
Utilization
High query
execution
time
Lower
throughput
High Active
sessions
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Performance Monitoring Approach
System
Monitoring
Database
Monitoring
Analyzing and
Identifying
bottlenecks
Optimization
Establish Performance Baseline
Monitoring Key Performance Indicators (KPI)
Alert and Notifications
Monitoring and analysis Tools and interface
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
PostgreSQL and Aurora PostgreSQL Architecture
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Understanding PostgreSQL Architecture
Backend
Process
Backend
Process
Backend
Process
Postmaster
Lock Space
Shared Buffers
WAL Buffers
CLOG Buffers
Other Buffers
OS Buffers
Storage
Background Writer, WAL Writer, Checkpointer,
Archiver, Logging Collector, Stats Collector,
AutoVacuum Launcher
Client Libraries
(libpq, JDBC)
Client Application
Shared Memory
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Amazon Aurora PostgreSQL distributed architecture
8
Shared storage volume
Availability
Zone 2
Availability
Zone 3
SQL
Transactions
Caching
SQL
Transactions
Caching
SQL
Transactions
Caching
• Purpose-built log-structured
distributed storage system
designed for databases
• Storage volume is striped
across hundreds of storage
nodes distributed over 3
different availability zones
• Six copies of data, two copies
in each availability zone to
protect against AZ+1 failures
• 10GB of segment size
• Primary and replicas (up to 15)
all point to the same storage
Availability
Zone 1
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
1. Receive log records and add
to in-memory queue
2. Persist records in hot log and ACK
3. Organize records and identify
gaps in log
4. Gossip with peers to fill in holes
5. Coalesce log records into new
page versions
6. Periodically stage log and new
page versions to Amazon S3
7. Periodically garbage collect old versions
8. Periodically validate CRC codes on blocks
Notes:
All steps are asynchronous
Only steps 1 and 2 are
in foreground latency path
Log records
Database
Instance
Incoming queue
Storage node
S3 BACKUP
1
2
3
4
5
6
7
8
Update
queue
ACK
Hot
Log
Data
pages
Continuous backup
GC
Scrub
Sort
group
Peer to peer gossip
Peer
Storage
Nodes
Coalesce
Anatomy of storage node
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Aurora PostgreSQL Performance Monitoring Tools and
interfaces
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Overview of monitoring in Amazon Aurora PostgreSQL
Amazon Aurora comes with comprehensive monitoring built-in
Publishing database logs
(errors, audit, and slow
queries)
to a centralized log store
Query and wait-
level
performance data
Additional database-
specific
metrics at up to 1 second
granularity
Amazon
CloudWatch
Metrics
Amazon
CloudWatch
Logs
Performance
Insights
Enhanced
Monitoring DevOps Guru for RDS
ML-Powered Capability
to detect Anomalous
behavior
Monitor core (CPU, memory) and
transactional (throughput, latency)
metrics
PostgreSQL statistics views for database activity monitoring
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
CloudWatch Metrics
CloudWatch(CW) gathers metrics on the host underlying the RDS database. You can view
these metrics in the RDS console under the monitoring tab.
CloudWatch Metrics:
• CPU Utilization
• DB Connections
• Free Storage
• Free Memory
• Write IOPS
• Read IOPS
• Network Throughput
• BufferCacheHitRatio
• Maximum Used TransactionID
• VolumeBytesUsed
• VolumeWriteIOPs
• Choose time period
• Compare RDS instances
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Enhanced Monitoring
Enhanced Monitoring gathers finer grained OS metrics from an agent installed on the RDS
host.
• By default metrics are stored for 30 days. Governed by RDSOSMetrics log group in
CloudWatch
• Incurs additional CloudWatch costs based on granularity (from 1 to 60 seconds).
• Process list with Total CPU bandwidth (CPU%) and Total memory used (MEM%).
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Using CloudWatch logs
Publishing your Aurora database logs (Postgresql.log and Upgrade.log) to Amazon
CloudWatch Logs is a best practice.
• Ensures logs are preserved in highly durable storage
• Perform real-time analysis of log data
Enable Query Logging
• log_min_duration_statement – Limit in milliseconds for a statement to be logged
• log_statement – Determines which statements are logged [none, all, ddl, dml]
• Note: Too much logging will degrade performance in production systems
• Log_statement doesn’t have correlation with log_min_duration_statement but both are independent
parameters
• Query information is included in postgres.log file
Retention
• rds.log_retention_period (default 3 days)
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Amazon RDS Performance Insights
• Easy and powerful dashboard showing load
on your database
• Uses Average Active Session (AAS) as a
load aggregation method over time
• Helps you identify source of bottlenecks:
• top SQL queries, wait statistics, DB engine counters
• Consolidated Metrics
• CloudWatch, os metrics, db metrics
• Adjustable time frame (hour, day, week,
month)
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Performance Insights Example: Wait Bottleneck
All engines have a connections list showing
– active
– idle
We sample every second
§ For each active session, collect:
– SQL,
– State :CPU, I/O, Lock, Commit log wait, etc.
– Host
– User
Expose as “Average Active Sessions” (AAS)
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Performance Insights – Sampling
Sampling every second for current activity( active, idle etc ) and wait event of each backend
process
Query run: often
Fast query run: rarely
Slow query
User 1
User 2
User 3
Time
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
AAS compared to Max CPU
Max vCPU
Performance Insights – Max vCPU chart
Wait events
AAS < 1 :
Database is not blocked
AAS ~= 0 :
Database basically idle.
Problems are in the APP not
DB
AAS < # of CPUs :
CPU available
Are any single sessions 100%
active?
AAS > # of CPUs :
Could have performance
problems
AAS >> # of CPUS :
There is a bottleneck
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
AWS DATA & AI ROADSHOW 2024
Wait events
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Reference.Waitevents.html
• Wait event shows the type of event for which the backend is waiting.
• This could commonly indicate performance problems.
§ List of the most common wait events
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
20
SQL Statistics in Performance Insights
• Correlating SQL level wait events with database level wait events
• per second statistics and per call statistics
• Calls per sec, Avg latency(ms) per call, Blk hits, Blk reads, Blk writes, Local blk
hits, Temp blk read etc
• The SQL statistics are an average for the selected time range.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
AWS DATA & AI ROADSHOW 2024
Consolidated Metrics Dashboard View
21
• Aurora PostgreSQL database health summary – pre defined
• Custom dashboard
• CloudWatch metrics
• OS metrics
• Database metrics
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Performance Insights Example: Wait Bottleneck
Get contextual help
for wait events
Identify
problem
statement
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ PostgreSQL collects and reports information about server activities.
§ pg_locks, pg_stat_user_tables, pg_statio_user_tables
PostgreSQL native monitoring statistics
§ pg_stat_activity view
– One row per server process showing
information related to the current activity of
that process
– Monitoring query connections, query
processing status ( active/idle ), currently
active query start time and wait events
(locks/IOs etc)
§ pg_stat_statements view
– One row for each distinct combination of
database ID, user ID, query ID ( normalized query )
– The module tracks planning and execution
statistics of all the SQL statements executed in
the server
• Identifying slow and top queries
• Cumulative values
– pg_stat_statements extension should be installed
and registered in the database system.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Query Analysis
§ Slow Running Query Plan Logging
– auto_explain module provides a means for Logging execution plans of slow statements without
running EXPLAIN manually.
– auto_explain.log_min_duration
• Sets a minimum statement execution time, in milliseconds, that will cause the statement’s plan to be logged
• Setting this to 0 logs all plans and -1 (the default) disables logging of plans.
• auto_explain.log_analyze can be enabled with auto_explain.log_timing=off to print EXPLAIN ANALYZE output.
– aurora_stat_plans function
• pg_stat_statements.track : none, top(default), all
§ EXPLAIN
– EXPLAIN command returns the query plan, which PostgreSQL chose, and help you understand what
the query planner decides to execute a query.
– The output of EXPLAIN has one line for each node in the plan tree. It shows node type and cost
estimation for each node operation.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
AWS DATA & AI ROADSHOW 2024
• Finds database performance
anomalies
• Analyzes the anomaly
• Highlights
§ Prevalent wait events
§ Prevalent SQL statements
§ Other anomalous metrics
• Recommends next steps
LOCKS
78%
SELECT NAME FROM
CUSTOMERS;
SELECT ITEM FROM F;
MEMORY
What to do
about
locking
issues . . .
Amazon DevOps Guru for RDS
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Common Performance monitoring use cases
https://catalog.us-east-1.prod.workshops.aws/workshops/af784ecc-7ceb-4b86-8c7c-e03cbdbe4e0d/en-US
• Low system memory due to query memory consumption
• Impact of idle connections
• High CPU due to run-away query
• Vacuum not able to cleanup dead tuples
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Low system memory due to query memory consumption
§ Problem
– Our Aurora PostgreSQL writer instance memory is dropping rapidly with less than 10
connections.
– This is an OLTP workload and we expect more users.
– However we are concerned about running out of memory when more users login
§ Symptoms
– Freeable memory dropped to below 5% within 10 minutes of workload startup
– Number of database connections is less than 10
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Monitoring from
Performance Insight
dashboard
1. Database Load
2. Metrics dashboard
• Aurora PostgreSQL
database health summary
• CPU utilization
• Free Memory
• Connection utilization
Low system memory due to query memory consumption
BufferFileRead&Write wait
event
High wait CPU
Low FreeableMemory
Low # of Connections
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Low system memory due to query memory consumption
1. Metrics dashboard
2. Custom dashboard
3. Add Widget
• FreeableMemory under
CloudWatch metrics
• FreeableMemory size
dropped to around
200MB from over 3G
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Major memory consumers in PostgreSQL
1. Global shared memory for data cache
2. Query memory usage
3. Connection memory usage
§ Rule out 1 and 3 cases
Low system memory due to query memory consumption
SELECT name, setting, unit, boot_val, reset_val
FROM pg_settings
WHERE name in ( 'shared_buffers', 'work_mem', 'maintenance_work_mem', 'autovacuum_work_mem', 'logical_decoding_work_mem');
name | setting | unit | boot_val | reset_val
---------------------------+---------+------+----------+-----------
autovacuum_work_mem | 499402 | kB | -1 | 499402
logical_decoding_work_mem | 65536 | kB | 65536 | 65536
maintenance_work_mem | 261120 | kB | 65536 | 261120
shared_buffers | 1309394 | 8kB | 16384 | 1309394
work_mem | 4096 | kB | 4096 | 4096
1
3
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Review Query Memory usage using OS process list from Enhanced monitoring
– RES displays the actual physical memory being used by the process.
– 4 processes and its parallel work processes have high memory usages.
§ Check correlation between the process memory(RSS) and FreeableMemory usage from
Performance Insight Metrics Custom dashboard
Low system memory due to query memory consumption
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Identify the top queries contributing the database load the most from Performance Insight
Dimensions top SQL tab
– The select statement has many block hit and temp writes per second with very high avg latency.
– IPC:HashGrowBatchesAllocate and IPC:HashBuildHashOuter wait events are related to hash join and its
memory allocation.
– IO:BufFileRead and IO:BufFileWrite occur when PostgreSQL creates temporary files.
Low system memory due to query memory consumption
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Review query plan of the top SQL
– This query has suboptimal plan due to outdated statistics on the tsmall table.
Low system memory due to query memory consumption
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Solutions
– Stop query execution
– Increase server memory
– Query Optimization
§ The query is optimized after statistics
update on the table with ANALYZE
command
§ No more Freeable Memory drop
Low system memory due to query memory consumption
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Impact of idle connections
§ Problem
– Our Aurora PostgreSQL is low on memory. We have a low volume, read-only workload
running on the system. How can we find out what is consuming the memory ?
§ Symptoms
– CloudWatch Freeable Memory metric is below 5%.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Monitoring from
Performance Insight
dashboard
1. Database Load
2. Metrics dashboard
• Aurora PostgreSQL
database health summary
• CPU utilization
• Free Memory
• Connection utilization
Impact of idle connections
Hardly see Database load
Very low Nice CPU usage
Low FreeableMemory
Very high # of Connections
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Major memory consumers in PostgreSQL
1. Global shared memory for data cache
2. Query memory usage
3. Connection memory usage
§ Rule out 1 and 2 cases
SELECT name, setting, unit, boot_val, reset_val
FROM pg_settings
WHERE name in ( 'shared_buffers', 'work_mem', 'maintenance_work_mem', 'autovacuum_work_mem', 'logical_decoding_work_mem');
name | setting | unit | boot_val | reset_val
---------------------------+---------+------+----------+-----------
autovacuum_work_mem | 499402 | kB | -1 | 499402
logical_decoding_work_mem | 65536 | kB | 65536 | 65536
maintenance_work_mem | 261120 | kB | 65536 | 261120
shared_buffers | 1309394 | 8kB | 16384 | 1309394
work_mem | 4096 | kB | 4096 | 4096
1
2
Impact of idle connections
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Impact of idle connections
§ Review connection memory usage from Performance Insight Metrics dashboard
§ Add widget from Custom dashboard
• FreeableMemory CloudWatch metric and DB Resident Set Size(RSS) OS metric
• User Max Connection Database metric and DatabaseConnections CloudWatch metric
• Task Total, Tasks Sleeping, Tasks Running OS metrics 1000 database connections
Almost 0 tasks.runnning
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Impact of idle connections
§ Solutions
– Client side Connection pooling
– External Connection pooling such as PgBouncer, Pgpool-II, or RDS Proxy
§ RDS Proxy is deployed with the same workload
– There is only slight Freeable Memory drop.
– Aurora PostgreSQL can handle the same workload with 13 database connections.
– The number of Tasks Running is < 1, which indicates connections are sitting in idle state most of
time. There are opportunity to further reduce RDS Proxy connection pool size.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
High CPU due to run-away query
§ Problem
– Our Aurora PostgreSQL database has been running fine until yesterday. Something
happened today that drove the CPU usage to 100%. Everything takes much longer to
run. Our production system is down because of that. What need help bringing things
back to normal.
§ Symptoms
– CPU Utilization total(Percent) metric is at 100 %.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Monitoring from
Performance Insight
dashboard
1. Database Load
2. Metrics dashboard
• Aurora PostgreSQL
database health summary
• CPU utilization
• Connection utilization
High nice CPU usage
Low # of Connections
High CPU due to run-away query
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ High nice CPU usage with 6 database connections
§ The most common reasons that can cause high CPU usage with PostgreSQL
1. High number of active connections
2. Inefficient or run-away query
§ Rule out 1
High CPU due to run-away query
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Review query CPU usage using OS process list from Enhanced monitoring
High CPU due to run-away query
§ Process id 22435 and its parallel worker
processes consume 74.5% of the
instance CPU.
§ Identify the SQL statement by querying
pg_stat_activity dynamic statistics view
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
High CPU due to run-away query
§ Solutions
– Stop query execution
– Increase available CPU
– Query optimization
§ Cancel the SQL
§ CPU utilization goes back
to normal.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Problem
– We have been seeing increasing query response time.
• Frequent updates on the tables affected
• autovacuum is enabled
§ Symptoms
– Elevated query response time
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
§ Simulation of a long running transaction blocking vacuum from cleaning up dead tuples
Vacuum not able to cleanup dead tuples
user1 user2
2 x buffer
cache hit
increase
Execution
time
increase
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ The most common reasons that can increase query execution time
1. Change of query execution plan
2. Autovacuum is turned off
3. Table is bloated
4. Vacuum is blocked, and unable to cleanup dead tuples effectively
• Long-running transactions
• Abandon replication slots
• Orphaned prepare transactions
• Long-running transactions on read replica
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Review query plans
– query plans of two runs are identical
Query plan of Run #1
Query plan of Run #4
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Review global and table level autovacuum settings
– No table level autovacuum configuration is set for pgbench_accounts table
– Global configuration setting is used for the table
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Review table bloat
– The n_live_tup and n_dead_tup indicate that half of the table is bloated.
§ Verify problem with vacuum cleaning up the dead tuples
– 9,999,999 dead tuples are not able to be cleanup.
– Any active long running transaction(s) that are accessing those tuples could prevent vacuum from cleaning
the dead tuples.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Identify vacuum blockers
– Check any long running transactions
– pid 28210 has a transaction opened, but not currently executing any commands. It is idle in
transaction state. It has been running for over 40 minutes.
– The SQL from user1 is the in the idle in transaction state.
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Identify vacuum blockers
– Check for any abandon replication slots
– Check for any orphaned prepare transactions
– Check for any long running transactions on read replica
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Vacuum not able to cleanup dead tuples
§ Solutions
– End the long running transaction
– Run vacuum verbose on table to manually cleanup dead tuples
– For long term, follow the below practices to avoid future issues
• Setup process for monitoring table bloat
• Avoid long running transaction
• Set idle_in_transaction_session_timeout
§ Manual vacuum after terminating the long running transaction
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
Summary
§ Get used to Aurora PostgreSQL monitoring tools and PostgreSQL native statistics views
– CloudWatch
– Enhanced Monitoring and os process list
– Performance insights and wait events
– Top SQLs
– Consolidated Metrics dashboard
– pg_locks, pg_stat_user_tables, pg_statio_user_tables
§ Review database highest bottlenecks and related Top SQLs and metrics
§ List solutions based on the analysis on the monitoring information
§ Make a decision on the solution and apply it and verify the changes
© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DATA & AI ROADSHOW 2024
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
Yun Cheol Ha
yuncheol@amazon.com

More Related Content

[D3T1S04] Aurora PostgreSQL performance monitoring and troubleshooting by use cases

  • 1. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved. Yun Cheol Ha Sr. PostgreSQL Specialist Solutions Architect Amazon Web Services Aurora PostgreSQL Performance Monitoring and troubleshooting
  • 2. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 AWS DATA & AI ROADSHOW 2024 2 Agenda • Performance monitoring approach • PostgreSQL and Amazon Aurora PostgreSQL architecture • Amazon Aurora PostgreSQL Performance monitoring tools and interfaces • Common Performance monitoring use cases
  • 3. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Performance Monitoring Approach
  • 4. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Root cause and symptoms of database performance issues Poor Database Performance High DB Connections High Memory Utilization High read/write latency High I/O Usage High CPU Utilization High query execution time Lower throughput High Active sessions
  • 5. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Performance Monitoring Approach System Monitoring Database Monitoring Analyzing and Identifying bottlenecks Optimization Establish Performance Baseline Monitoring Key Performance Indicators (KPI) Alert and Notifications Monitoring and analysis Tools and interface
  • 6. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 PostgreSQL and Aurora PostgreSQL Architecture
  • 7. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Understanding PostgreSQL Architecture Backend Process Backend Process Backend Process Postmaster Lock Space Shared Buffers WAL Buffers CLOG Buffers Other Buffers OS Buffers Storage Background Writer, WAL Writer, Checkpointer, Archiver, Logging Collector, Stats Collector, AutoVacuum Launcher Client Libraries (libpq, JDBC) Client Application Shared Memory
  • 8. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Amazon Aurora PostgreSQL distributed architecture 8 Shared storage volume Availability Zone 2 Availability Zone 3 SQL Transactions Caching SQL Transactions Caching SQL Transactions Caching • Purpose-built log-structured distributed storage system designed for databases • Storage volume is striped across hundreds of storage nodes distributed over 3 different availability zones • Six copies of data, two copies in each availability zone to protect against AZ+1 failures • 10GB of segment size • Primary and replicas (up to 15) all point to the same storage Availability Zone 1
  • 9. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 1. Receive log records and add to in-memory queue 2. Persist records in hot log and ACK 3. Organize records and identify gaps in log 4. Gossip with peers to fill in holes 5. Coalesce log records into new page versions 6. Periodically stage log and new page versions to Amazon S3 7. Periodically garbage collect old versions 8. Periodically validate CRC codes on blocks Notes: All steps are asynchronous Only steps 1 and 2 are in foreground latency path Log records Database Instance Incoming queue Storage node S3 BACKUP 1 2 3 4 5 6 7 8 Update queue ACK Hot Log Data pages Continuous backup GC Scrub Sort group Peer to peer gossip Peer Storage Nodes Coalesce Anatomy of storage node
  • 10. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Aurora PostgreSQL Performance Monitoring Tools and interfaces
  • 11. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Overview of monitoring in Amazon Aurora PostgreSQL Amazon Aurora comes with comprehensive monitoring built-in Publishing database logs (errors, audit, and slow queries) to a centralized log store Query and wait- level performance data Additional database- specific metrics at up to 1 second granularity Amazon CloudWatch Metrics Amazon CloudWatch Logs Performance Insights Enhanced Monitoring DevOps Guru for RDS ML-Powered Capability to detect Anomalous behavior Monitor core (CPU, memory) and transactional (throughput, latency) metrics PostgreSQL statistics views for database activity monitoring
  • 12. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 CloudWatch Metrics CloudWatch(CW) gathers metrics on the host underlying the RDS database. You can view these metrics in the RDS console under the monitoring tab. CloudWatch Metrics: • CPU Utilization • DB Connections • Free Storage • Free Memory • Write IOPS • Read IOPS • Network Throughput • BufferCacheHitRatio • Maximum Used TransactionID • VolumeBytesUsed • VolumeWriteIOPs • Choose time period • Compare RDS instances
  • 13. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Enhanced Monitoring Enhanced Monitoring gathers finer grained OS metrics from an agent installed on the RDS host. • By default metrics are stored for 30 days. Governed by RDSOSMetrics log group in CloudWatch • Incurs additional CloudWatch costs based on granularity (from 1 to 60 seconds). • Process list with Total CPU bandwidth (CPU%) and Total memory used (MEM%).
  • 14. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Using CloudWatch logs Publishing your Aurora database logs (Postgresql.log and Upgrade.log) to Amazon CloudWatch Logs is a best practice. • Ensures logs are preserved in highly durable storage • Perform real-time analysis of log data Enable Query Logging • log_min_duration_statement – Limit in milliseconds for a statement to be logged • log_statement – Determines which statements are logged [none, all, ddl, dml] • Note: Too much logging will degrade performance in production systems • Log_statement doesn’t have correlation with log_min_duration_statement but both are independent parameters • Query information is included in postgres.log file Retention • rds.log_retention_period (default 3 days)
  • 15. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Amazon RDS Performance Insights • Easy and powerful dashboard showing load on your database • Uses Average Active Session (AAS) as a load aggregation method over time • Helps you identify source of bottlenecks: • top SQL queries, wait statistics, DB engine counters • Consolidated Metrics • CloudWatch, os metrics, db metrics • Adjustable time frame (hour, day, week, month)
  • 16. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Performance Insights Example: Wait Bottleneck All engines have a connections list showing – active – idle We sample every second § For each active session, collect: – SQL, – State :CPU, I/O, Lock, Commit log wait, etc. – Host – User Expose as “Average Active Sessions” (AAS)
  • 17. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Performance Insights – Sampling Sampling every second for current activity( active, idle etc ) and wait event of each backend process Query run: often Fast query run: rarely Slow query User 1 User 2 User 3 Time
  • 18. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 AAS compared to Max CPU Max vCPU Performance Insights – Max vCPU chart Wait events AAS < 1 : Database is not blocked AAS ~= 0 : Database basically idle. Problems are in the APP not DB AAS < # of CPUs : CPU available Are any single sessions 100% active? AAS > # of CPUs : Could have performance problems AAS >> # of CPUS : There is a bottleneck
  • 19. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 AWS DATA & AI ROADSHOW 2024 Wait events https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Reference.Waitevents.html • Wait event shows the type of event for which the backend is waiting. • This could commonly indicate performance problems. § List of the most common wait events
  • 20. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 20 SQL Statistics in Performance Insights • Correlating SQL level wait events with database level wait events • per second statistics and per call statistics • Calls per sec, Avg latency(ms) per call, Blk hits, Blk reads, Blk writes, Local blk hits, Temp blk read etc • The SQL statistics are an average for the selected time range.
  • 21. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 AWS DATA & AI ROADSHOW 2024 Consolidated Metrics Dashboard View 21 • Aurora PostgreSQL database health summary – pre defined • Custom dashboard • CloudWatch metrics • OS metrics • Database metrics
  • 22. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Performance Insights Example: Wait Bottleneck Get contextual help for wait events Identify problem statement
  • 23. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § PostgreSQL collects and reports information about server activities. § pg_locks, pg_stat_user_tables, pg_statio_user_tables PostgreSQL native monitoring statistics § pg_stat_activity view – One row per server process showing information related to the current activity of that process – Monitoring query connections, query processing status ( active/idle ), currently active query start time and wait events (locks/IOs etc) § pg_stat_statements view – One row for each distinct combination of database ID, user ID, query ID ( normalized query ) – The module tracks planning and execution statistics of all the SQL statements executed in the server • Identifying slow and top queries • Cumulative values – pg_stat_statements extension should be installed and registered in the database system.
  • 24. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Query Analysis § Slow Running Query Plan Logging – auto_explain module provides a means for Logging execution plans of slow statements without running EXPLAIN manually. – auto_explain.log_min_duration • Sets a minimum statement execution time, in milliseconds, that will cause the statement’s plan to be logged • Setting this to 0 logs all plans and -1 (the default) disables logging of plans. • auto_explain.log_analyze can be enabled with auto_explain.log_timing=off to print EXPLAIN ANALYZE output. – aurora_stat_plans function • pg_stat_statements.track : none, top(default), all § EXPLAIN – EXPLAIN command returns the query plan, which PostgreSQL chose, and help you understand what the query planner decides to execute a query. – The output of EXPLAIN has one line for each node in the plan tree. It shows node type and cost estimation for each node operation.
  • 25. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 AWS DATA & AI ROADSHOW 2024 • Finds database performance anomalies • Analyzes the anomaly • Highlights § Prevalent wait events § Prevalent SQL statements § Other anomalous metrics • Recommends next steps LOCKS 78% SELECT NAME FROM CUSTOMERS; SELECT ITEM FROM F; MEMORY What to do about locking issues . . . Amazon DevOps Guru for RDS
  • 26. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Common Performance monitoring use cases https://catalog.us-east-1.prod.workshops.aws/workshops/af784ecc-7ceb-4b86-8c7c-e03cbdbe4e0d/en-US • Low system memory due to query memory consumption • Impact of idle connections • High CPU due to run-away query • Vacuum not able to cleanup dead tuples
  • 27. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Low system memory due to query memory consumption § Problem – Our Aurora PostgreSQL writer instance memory is dropping rapidly with less than 10 connections. – This is an OLTP workload and we expect more users. – However we are concerned about running out of memory when more users login § Symptoms – Freeable memory dropped to below 5% within 10 minutes of workload startup – Number of database connections is less than 10
  • 28. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Monitoring from Performance Insight dashboard 1. Database Load 2. Metrics dashboard • Aurora PostgreSQL database health summary • CPU utilization • Free Memory • Connection utilization Low system memory due to query memory consumption BufferFileRead&Write wait event High wait CPU Low FreeableMemory Low # of Connections
  • 29. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Low system memory due to query memory consumption 1. Metrics dashboard 2. Custom dashboard 3. Add Widget • FreeableMemory under CloudWatch metrics • FreeableMemory size dropped to around 200MB from over 3G
  • 30. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Major memory consumers in PostgreSQL 1. Global shared memory for data cache 2. Query memory usage 3. Connection memory usage § Rule out 1 and 3 cases Low system memory due to query memory consumption SELECT name, setting, unit, boot_val, reset_val FROM pg_settings WHERE name in ( 'shared_buffers', 'work_mem', 'maintenance_work_mem', 'autovacuum_work_mem', 'logical_decoding_work_mem'); name | setting | unit | boot_val | reset_val ---------------------------+---------+------+----------+----------- autovacuum_work_mem | 499402 | kB | -1 | 499402 logical_decoding_work_mem | 65536 | kB | 65536 | 65536 maintenance_work_mem | 261120 | kB | 65536 | 261120 shared_buffers | 1309394 | 8kB | 16384 | 1309394 work_mem | 4096 | kB | 4096 | 4096 1 3
  • 31. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Review Query Memory usage using OS process list from Enhanced monitoring – RES displays the actual physical memory being used by the process. – 4 processes and its parallel work processes have high memory usages. § Check correlation between the process memory(RSS) and FreeableMemory usage from Performance Insight Metrics Custom dashboard Low system memory due to query memory consumption
  • 32. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Identify the top queries contributing the database load the most from Performance Insight Dimensions top SQL tab – The select statement has many block hit and temp writes per second with very high avg latency. – IPC:HashGrowBatchesAllocate and IPC:HashBuildHashOuter wait events are related to hash join and its memory allocation. – IO:BufFileRead and IO:BufFileWrite occur when PostgreSQL creates temporary files. Low system memory due to query memory consumption
  • 33. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Review query plan of the top SQL – This query has suboptimal plan due to outdated statistics on the tsmall table. Low system memory due to query memory consumption
  • 34. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Solutions – Stop query execution – Increase server memory – Query Optimization § The query is optimized after statistics update on the table with ANALYZE command § No more Freeable Memory drop Low system memory due to query memory consumption
  • 35. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Impact of idle connections § Problem – Our Aurora PostgreSQL is low on memory. We have a low volume, read-only workload running on the system. How can we find out what is consuming the memory ? § Symptoms – CloudWatch Freeable Memory metric is below 5%.
  • 36. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Monitoring from Performance Insight dashboard 1. Database Load 2. Metrics dashboard • Aurora PostgreSQL database health summary • CPU utilization • Free Memory • Connection utilization Impact of idle connections Hardly see Database load Very low Nice CPU usage Low FreeableMemory Very high # of Connections
  • 37. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Major memory consumers in PostgreSQL 1. Global shared memory for data cache 2. Query memory usage 3. Connection memory usage § Rule out 1 and 2 cases SELECT name, setting, unit, boot_val, reset_val FROM pg_settings WHERE name in ( 'shared_buffers', 'work_mem', 'maintenance_work_mem', 'autovacuum_work_mem', 'logical_decoding_work_mem'); name | setting | unit | boot_val | reset_val ---------------------------+---------+------+----------+----------- autovacuum_work_mem | 499402 | kB | -1 | 499402 logical_decoding_work_mem | 65536 | kB | 65536 | 65536 maintenance_work_mem | 261120 | kB | 65536 | 261120 shared_buffers | 1309394 | 8kB | 16384 | 1309394 work_mem | 4096 | kB | 4096 | 4096 1 2 Impact of idle connections
  • 38. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Impact of idle connections § Review connection memory usage from Performance Insight Metrics dashboard § Add widget from Custom dashboard • FreeableMemory CloudWatch metric and DB Resident Set Size(RSS) OS metric • User Max Connection Database metric and DatabaseConnections CloudWatch metric • Task Total, Tasks Sleeping, Tasks Running OS metrics 1000 database connections Almost 0 tasks.runnning
  • 39. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Impact of idle connections § Solutions – Client side Connection pooling – External Connection pooling such as PgBouncer, Pgpool-II, or RDS Proxy § RDS Proxy is deployed with the same workload – There is only slight Freeable Memory drop. – Aurora PostgreSQL can handle the same workload with 13 database connections. – The number of Tasks Running is < 1, which indicates connections are sitting in idle state most of time. There are opportunity to further reduce RDS Proxy connection pool size.
  • 40. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 High CPU due to run-away query § Problem – Our Aurora PostgreSQL database has been running fine until yesterday. Something happened today that drove the CPU usage to 100%. Everything takes much longer to run. Our production system is down because of that. What need help bringing things back to normal. § Symptoms – CPU Utilization total(Percent) metric is at 100 %.
  • 41. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Monitoring from Performance Insight dashboard 1. Database Load 2. Metrics dashboard • Aurora PostgreSQL database health summary • CPU utilization • Connection utilization High nice CPU usage Low # of Connections High CPU due to run-away query
  • 42. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § High nice CPU usage with 6 database connections § The most common reasons that can cause high CPU usage with PostgreSQL 1. High number of active connections 2. Inefficient or run-away query § Rule out 1 High CPU due to run-away query
  • 43. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Review query CPU usage using OS process list from Enhanced monitoring High CPU due to run-away query § Process id 22435 and its parallel worker processes consume 74.5% of the instance CPU. § Identify the SQL statement by querying pg_stat_activity dynamic statistics view
  • 44. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 High CPU due to run-away query § Solutions – Stop query execution – Increase available CPU – Query optimization § Cancel the SQL § CPU utilization goes back to normal.
  • 45. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Problem – We have been seeing increasing query response time. • Frequent updates on the tables affected • autovacuum is enabled § Symptoms – Elevated query response time
  • 46. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 § Simulation of a long running transaction blocking vacuum from cleaning up dead tuples Vacuum not able to cleanup dead tuples user1 user2 2 x buffer cache hit increase Execution time increase
  • 47. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § The most common reasons that can increase query execution time 1. Change of query execution plan 2. Autovacuum is turned off 3. Table is bloated 4. Vacuum is blocked, and unable to cleanup dead tuples effectively • Long-running transactions • Abandon replication slots • Orphaned prepare transactions • Long-running transactions on read replica
  • 48. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Review query plans – query plans of two runs are identical Query plan of Run #1 Query plan of Run #4
  • 49. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Review global and table level autovacuum settings – No table level autovacuum configuration is set for pgbench_accounts table – Global configuration setting is used for the table
  • 50. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Review table bloat – The n_live_tup and n_dead_tup indicate that half of the table is bloated. § Verify problem with vacuum cleaning up the dead tuples – 9,999,999 dead tuples are not able to be cleanup. – Any active long running transaction(s) that are accessing those tuples could prevent vacuum from cleaning the dead tuples.
  • 51. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Identify vacuum blockers – Check any long running transactions – pid 28210 has a transaction opened, but not currently executing any commands. It is idle in transaction state. It has been running for over 40 minutes. – The SQL from user1 is the in the idle in transaction state.
  • 52. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Identify vacuum blockers – Check for any abandon replication slots – Check for any orphaned prepare transactions – Check for any long running transactions on read replica
  • 53. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Vacuum not able to cleanup dead tuples § Solutions – End the long running transaction – Run vacuum verbose on table to manually cleanup dead tuples – For long term, follow the below practices to avoid future issues • Setup process for monitoring table bloat • Avoid long running transaction • Set idle_in_transaction_session_timeout § Manual vacuum after terminating the long running transaction
  • 54. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 Summary § Get used to Aurora PostgreSQL monitoring tools and PostgreSQL native statistics views – CloudWatch – Enhanced Monitoring and os process list – Performance insights and wait events – Top SQLs – Consolidated Metrics dashboard – pg_locks, pg_stat_user_tables, pg_statio_user_tables § Review database highest bottlenecks and related Top SQLs and metrics § List solutions based on the analysis on the monitoring information § Make a decision on the solution and apply it and verify the changes
  • 55. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS DATA & AI ROADSHOW 2024 © 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thank you! Yun Cheol Ha yuncheol@amazon.com