This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory and disk configuration parameters, designing schemas and queries efficiently, and leveraging caching strategies.
Devrim Gunduz gives a presentation on Write-Ahead Logging (WAL) in PostgreSQL. WAL records every change in write-ahead log (WAL) files before the corresponding changes are written to the data files, which allows crash recovery by replaying the WAL. WAL files are also used for replication, backup, and point-in-time recovery (PITR), replaying them to restore the database to a previous state. Checkpoints write all dirty shared buffers to disk and update the pg_control file with the checkpoint location.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
This document discusses PostgreSQL replication. It provides an overview of replication, including its history and features. Replication allows data to be copied from a primary database to one or more standby databases. This allows for high availability, load balancing, and read scaling. The document describes asynchronous and synchronous replication modes.
Best Practices for Becoming an Exceptional Postgres DBA (EDB)
Drawing from our teams who support hundreds of Postgres instances and production database systems for customers worldwide, this presentation provides real-world best practices from the nation's top DBAs. Learn top-notch monitoring and maintenance practices, get resource planning advice that can help prevent, resolve, or eliminate common issues, learn top database tuning tricks for increasing system performance and, ultimately, gain greater insight into how to improve your effectiveness as a DBA.
PostgreSQL Replication High Availability Methods (Mydbops)
These slides illustrate the need for replication in PostgreSQL, why you need a replicated DB topology, terminology, replication nodes, and more.
This document discusses PostgreSQL statistics and how to use them effectively. It provides an overview of various PostgreSQL statistics sources like views, functions and third-party tools. It then demonstrates how to analyze specific statistics like those for databases, tables, indexes, replication and query activity to identify anomalies, optimize performance and troubleshoot issues.
The document discusses PostgreSQL query planning and tuning. It covers the key stages of query execution including syntax validation, query tree generation, plan estimation, and execution. It describes different plan nodes like sequential scans, index scans, joins, and sorts. It emphasizes using EXPLAIN to view and analyze the execution plan for a query, which can help identify performance issues and opportunities for optimization. EXPLAIN shows the estimated plan while EXPLAIN ANALYZE shows the actual plan after executing the query.
The paperback version is available on lulu.com: http://goo.gl/fraa8o
This is the first volume of the PostgreSQL database administration book. The book covers the steps for installing, configuring and administering PostgreSQL 9.3 on Debian GNU/Linux. It covers the logical and physical aspects of PostgreSQL, and two chapters are dedicated to the backup/restore topic.
Materials for the 29th study session: "A Very Basic Introduction to PostgreSQL Recovery"
See also http://www.interdb.jp/pgsql (Coming soon!)
Aimed at beginners. Explains how PostgreSQL's WAL, CHECKPOINT, and online backup mechanisms work.
Once you have read this, continue with → http://www.slideshare.net/satock/29shikumi-backup
There are many ways to run high availability with PostgreSQL. Here, we present a template for you to create your own customized, high-availability solution using Python and, for maximum accessibility, a distributed configuration store like ZooKeeper or etcd.
The document provides an overview of PostgreSQL performance tuning. It discusses caching, query processing internals, and optimization of storage and memory usage. Specific topics covered include the PostgreSQL configuration parameters for tuning shared buffers, work memory, and free space map settings.
This document summarizes Grand Unified Configuration (GUC) parameters in PostgreSQL. It describes how GUC parameters can be modified, the contexts in which modifications can be reverted, and how to view current parameter settings and sources using pg_settings. It provides examples of modifying parameters at different scopes like system-wide, database-level, and for individual users.
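As a minimal sketch of that workflow (the database and role names are invented, and ALTER SYSTEM requires PostgreSQL 9.4 or later):
  -- inspect a parameter, its current value, source, and change context
  SELECT name, setting, unit, context, source
  FROM pg_settings
  WHERE name = 'work_mem';

  -- change it at different scopes
  ALTER SYSTEM SET work_mem = '64MB';           -- cluster-wide, written to postgresql.auto.conf
  ALTER DATABASE appdb SET work_mem = '64MB';   -- per database
  ALTER ROLE reporting SET work_mem = '256MB';  -- per role
  SELECT pg_reload_conf();                      -- apply reloadable changes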
This technical presentation shows you the best practices with EDB Postgres tools, that are designed to make database administration easier and more efficient:
● Tune a new database using Postgres Expert
● Set up streaming replication in EDB Postgres Enterprise Manager (PEM)
● Create a backup schedule in EDB Postgres Backup and Recovery
● Automatically failover with EDB Postgres Failover Manager
● Use SQL Profiler and Index Advisor to add indexes
The presentation also included a demonstration. To access the recording, visit www.enterprisedb.com and go to the webcast recordings section, or email info@enterprisedb.com.
Creating a complete disaster recovery strategy (MariaDB plc)
Jens Bollmann, Principal Consultant at MariaDB, discusses all of the disaster recovery features and tools available in MariaDB, including MariaDB Flashback for point-in-time rollback, MariaDB Backup for incremental backup/restore, delayed replication and dedicated/tiered databases for backups.
MySQL replication has evolved a lot in 5.6, 5.7, and 8.0. This presentation focuses on the changes made in parallel replication, up to and including MySQL 8.0. It was presented at the Mydbops database meetup on 04-08-2016 in Bangalore.
Architecture for building scalable and highly available Postgres Cluster (Ashnikbiz)
As PostgreSQL has made its way into business-critical applications, many customers who are using Oracle RAC for high availability and load balancing have asked for similar functionality when using PostgreSQL.
In this Hangout session we discuss architectures and alternatives, based on real-life experience, for achieving high availability and load balancing when you deploy PostgreSQL. We also present some of the key tools and how to deploy them effectively within this architecture.
In 40 minutes the audience will learn a variety of ways to make a PostgreSQL database suddenly go out of memory on a box with half a terabyte of RAM.
Developers' and DBAs' best practices for preventing this will also be discussed, as well as a bit of Postgres and Linux memory management internals.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
PostgreSQL is a very popular and feature-rich DBMS. At the same time, PostgreSQL has a set of annoying wicked problems which haven't been resolved in decades. Miraculously, with just a small patch to the PostgreSQL core that extends its API, it appears possible to solve these wicked problems in a new engine built as an extension.
- Greenplum Database is an open source relational database system designed for big data analytics. It uses a massively parallel processing (MPP) architecture that distributes data and processing across multiple servers or "segments" to achieve high performance.
- The master node coordinates the segments and handles connections from client applications. It parses queries, generates execution plans, and manages query dispatch, execution and results retrieval.
- Segments store and process data in parallel. They each have their own storage, memory and CPU resources in a "shared nothing" architecture to ensure scalability.
Optimizing MariaDB for maximum performance (MariaDB plc)
When it comes to optimizing the performance of a database, DBAs have to look at everything from the OS to the network. In this session, MariaDB Enterprise Architect Manjot Singh shares best practices for getting the most out of MariaDB. He highlights recommended OS settings, important configuration and tuning parameters, options for improving replication and clustering performance and features such as query result caching.
The document discusses tools for troubleshooting database performance issues. It describes operating system tools like ps, vmstat, iostat that can help identify hardware and resource bottlenecks. It also covers PostgreSQL-specific tools like the pg_stat views and logs that provide insight into query performance and activity. Benchmarks like pgbench, bonnie++, and the more complex DBT2 are presented as options for reproducing and analyzing problems in a controlled way. The overall approach presented is to start with less invasive tools and progress to more targeted benchmarks if needed to pinpoint severe issues.
The document provides an overview of five steps to optimize PostgreSQL performance: 1) application design, 2) query tuning, 3) hardware/OS configuration, 4) PostgreSQL configuration, and 5) caching. It discusses best practices for schema design, indexing, queries, transactions, and connection management to improve performance. Key recommendations include normalizing schemas, indexing commonly used columns, batching queries and transactions, using prepared statements, and implementing caching at multiple levels.
The document discusses performance troubleshooting for databases. It provides an overview of common issues ("moles") that can impact database performance and tools/techniques for identifying and resolving them. Some key points:
- Most database performance issues are not actually problems with the database itself but other areas like hardware, OS, middleware, or application code.
- A small number (less than 10%) of issues usually account for the vast majority (90%) of performance degradation.
- The first steps in troubleshooting are establishing a baseline configuration and gathering performance metrics from across the full software stack using tools like OS monitoring utilities, database admin views, and benchmarks.
- Common types of performance issues ("moles") include
This document summarizes the results of benchmarking PostgreSQL database performance on several cloud platforms, including AWS EC2, RDS, Google Compute Engine, DigitalOcean, Rackspace, and Heroku.
The benchmarks tested small and large instance sizes across the clouds on different workload types, including in-memory and disk-based transactions and queries. Key metrics measured were transactions per second (TPS), load time to set up the database, and cost per TPS and load bandwidth.
The results show large performance and cost variations between clouds and instance types. In general, dedicated instances like EC2 outperformed shared instances, and DBaaS options like RDS were more expensive but offered higher availability. The document discusses challenges
On X86 systems, using an Unbreakable Enterprise Kernel (UEK) is recommended over other enterprise distributions as it provides better hardware support, security patches, and testing from the larger Linux community. Key configuration recommendations include enabling maximum CPU performance in BIOS, using memory types validated by Oracle, ensuring proper NUMA and CPU frequency settings, and installing only Oracle-validated packages to avoid issues. Monitoring tools like top, iostat, sar and ksar help identify any CPU, memory, disk or I/O bottlenecks.
Data deduplication is a hot topic in storage and saves significant disk space for many environments, with some trade-offs. We'll discuss what deduplication is and where the open source solutions are versus commercial offerings. The presentation leans toward the practical – where attendees can use it in their real-world projects (what works, what doesn't, should you use it in production, etcetera).
This document provides tips for optimizing performance with SAP Sybase IQ. It discusses sizing recommendations for memory, CPUs, storage and configuration options. Key aspects of sizing include allocating 4-8GB of RAM per core and 75% of RAM to IQ caches. For load performance, 1 CPU can load 10-20GB of data per hour, while queries typically use 1-2 CPUs. The document also covers index types in IQ and considerations for when to apply indexes.
Kafka on ZFS: Better Living Through Filesystems (Confluent)
(Hugh O'Brien, Jet.com) Kafka Summit SF 2018
You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!)
Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka.
-Striping cheap disks to maximize instance IOPS
-Block compression to reduce disk usage by ~80% (JSON data)
-Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments
-Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free
We’ll cover:
-Basic Principles
-Adapting ZFS for cloud instances (gotchas)
-Performance tuning for Kafka
-Benchmarks
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto... (Ceph Community)
This document discusses Dell's support for CEPH storage solutions and provides an agenda for a CEPH Day event at Dell. Key points include:
- Dell is a certified reseller of Red Hat-Inktank CEPH support, services, and training.
- The agenda covers why Dell supports CEPH, hardware recommendations, best practices shared with CEPH colleagues, and a concept for research data storage that is seeking input.
- Recommended CEPH architectures, components, configurations, and considerations are discussed for planning and implementing a CEPH solution. Dell server hardware options that could be used are also presented.
The document discusses establishing a performance baseline for a PostgreSQL database. It recommends gathering hardware, operating system, database, and application configuration details. The baseline involves configuring these layers with generally recommended settings, including updating hardware/OS, using appropriate filesystem and PostgreSQL configuration settings, and setting up regular maintenance tasks. Establishing a baseline configuration helps identify potential performance issues and allows comparison to other systems.
MongoDB stores data in files on disk that are broken into variable-sized extents containing documents. These extents, as well as separate index structures, are memory mapped by the operating system for efficient read/write. A write-ahead journal is used to provide durability and prevent data corruption after crashes by logging operations before writing to the data files. The journal increases write performance by 5-30% but can be optimized using a separate drive. Data fragmentation over time can be addressed using the compact command or adjusting the schema.
Ceph Community Talk on High-Performance Solid State Ceph (Ceph Community)
The document summarizes a presentation given by representatives from various companies on optimizing Ceph for high-performance solid state drives. It discusses testing a real workload on a Ceph cluster with 50 SSD nodes that achieved over 280,000 read and write IOPS. Areas for further optimization were identified, such as reducing latency spikes and improving single-threaded performance. Various companies then described their contributions to Ceph performance, such as Intel providing hardware for testing and Samsung discussing SSD interface improvements.
The document summarizes a presentation on optimizing Linux, Windows, and Firebird for heavy workloads. It describes two customer implementations using Firebird - a medical company with 17 departments and over 700 daily users, and a repair services company with over 500 daily users. It discusses tuning the operating system, hardware, CPU, RAM, I/O, network, and Firebird configuration to improve performance under heavy loads. Specific recommendations are provided for Linux and Windows configuration.
Gluster for Geeks: Performance Tuning Tips & Tricks (GlusterFS)
This document summarizes a webinar on performance tuning tips and tricks for GlusterFS. The webinar covered planning cluster hardware configuration to meet performance requirements, choosing the correct volume type for workloads, key tuning parameters, benchmarking techniques, and the top 5 causes of performance issues. The webinar provided guidance on optimizing GlusterFS performance through hardware sizing, configuration, implementation best practices, and tuning.
MariaDB Server Performance Tuning & Optimization (MariaDB plc)
This document discusses various techniques for optimizing MariaDB server performance, including:
- Tuning configuration settings like the buffer pool size, query cache size, and thread pool settings.
- Monitoring server metrics like CPU usage, memory usage, disk I/O, and MariaDB-specific metrics.
- Analyzing slow queries with the slow query log and EXPLAIN statements to identify optimization opportunities like adding indexes.
The document discusses disk I/O performance in SQL Server 2005. It begins with some questions about which queries and RAID configurations would affect disk I/O the most. It then covers the basics of I/O and different RAID levels, their pros and cons. The document provides an overview of monitoring physical and logical disk performance, and offers tips on tuning disk I/O performance when bottlenecks occur. It concludes with resources for further information.
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas... (Red_Hat_Storage)
Red Hat Ceph Storage can utilize flash technology to accelerate applications in three ways: 1) utilize flash caching to accelerate critical data writes and reads, 2) utilize storage tiering to place performance critical data on flash and less critical data on HDDs, and 3) utilize all-flash storage to accelerate performance when all data is critical or caching/tiering cannot be used. The document then discusses best practices for leveraging NVMe SSDs versus SATA SSDs in Ceph configurations and optimizing Linux settings.
This session will cover performance-related developments in Red Hat Gluster Storage 3 and share best practices for testing, sizing, configuration, and tuning.
Join us to learn about:
Current features in Red Hat Gluster Storage, including 3-way replication, JBOD support, and thin-provisioning.
Features that are in development, including network file system (NFS) support with Ganesha, erasure coding, and cache tiering.
New performance enhancements related to the area of remote directory memory access (RDMA), small-file performance, FUSE caching, and solid state disks (SSD) readiness.
The document summarizes the results of benchmarking and comparing the performance of PostgreSQL databases hosted on Amazon EC2, RDS, and Heroku. It finds that EC2 provides the most configuration options but requires more management, RDS offers simplified deployment but less configuration options, and Heroku requires no management but has limited configuration and higher costs. Benchmark results show EC2 performing best for raw performance while RDS and Heroku trade off some performance for manageability. Heroku was the most expensive option.
This document provides a summary of a presentation on becoming an accidental PostgreSQL database administrator (DBA). It covers topics like installation, configuration, connections, backups, monitoring, slow queries, and getting help. The presentation aims to help those suddenly tasked with DBA responsibilities to not panic and provides practical advice on managing a PostgreSQL database.
Howdah - An Application using Pylons, PostgreSQL, Simpycity and Exceptable (Command Prompt, Inc.)
Aurynn Shaw
This mini-tutorial covers building a small application on Howdah, an open source, Python-based web development framework by Command Prompt, Inc. We will cover the full process of designing a vertically coherent application on Howdah, integrating DB-level stored procedures, DB exception propagation through Exceptable, DB access through Simpycity, authentication through repoze.who, permissions through VerticallyChallenged, and application views through Pylons. By the end of the talk, we will have covered a full application built on The Stack, and common pitfalls in using Howdah components.
The document discusses PostgreSQL backup and recovery options including:
- pg_dump and pg_dumpall for creating database and cluster backups respectively.
- pg_restore for restoring backups in various formats.
- Point-in-time recovery (PITR) which allows restoring the database to a previous state by restoring a base backup and replaying write-ahead log (WAL) segments up to a specific point in time.
- The process for enabling and performing PITR including configuring WAL archiving, taking base backups, and restoring from backups while replaying WAL segments.
This document provides an overview of advanced PostgreSQL administration topics covered in a presentation, including installation, initialization, configuration, starting and stopping the Postmaster, connections, authentication, security, data directories, shared memory sizing, the write-ahead log, and vacuum settings. The document includes configuration examples from postgresql.conf and discusses parameters for tuning memory usage, connections, authentication and security.
Scott Bailey
Few things we model in our databases are as complicated as time. The major database vendors have struggled for years with implementing the base data types to represent time. And the capabilities and functionality vary wildly among databases. Fortunately PostgreSQL has one of the best implementations out there. We will look at PostgreSQL's core functionality, discuss temporal extensions, modeling temporal data, time travel and bitemporal data.
PostgreSQL Replication describes PostgreSQL Replicator, an open source solution for replicating PostgreSQL databases. Key features include asynchronous replication from a master to multiple slaves, supporting various replication types like role, grant, and large object replication without triggers. Replicator uses a Master Control Process to manage replication between nodes. It allows unlimited slaves without impacting the master and operates on any server.
Bruce Momjian
Pg_Migrator allows data to be transferred between major Postgres versions without a dump/restore. This talk explains the internal workings of pg_migrator and includes a pg_migrator demonstration.
Adrian Klaver
An exploration of various Python projects (PyRTF, ReportLab, xlwt) that help with presenting your data in formats (RTF, PDF, XLS) that other people want. I will step through a simple data extraction and conversion process using the above software to create an RTF, PDF and XLS file respectively.
PostgreSQL, Extensible to the Nth Degree: Functions, Languages, Types, Rules,... (Command Prompt, Inc.)
Jeff Davis
I'll be showing how the extensible pieces of PostgreSQL fit together to give you the full power of native functionality -- including performance. These pieces, when combined, make PostgreSQL able to do almost anything you can imagine. A variety of add-ons have been very successful in PostgreSQL merely by using this extensibility. Examples in this talk will range from PostGIS (a GIS extension for PostgreSQL) to DBI-Link (manage any data source accessible via Perl DBI).
The document discusses using pg_proctab, a PostgreSQL extension that provides functions to query operating system process and statistics tables from within PostgreSQL. It demonstrates how to use pg_proctab to monitor CPU and memory usage, I/O, and other process-level metrics for queries. The document also shows how to generate custom reports on database activity and performance by taking snapshots before and after queries and analyzing the differences.
Jeff Davis
UNIQUE indexes have long held a unique position among constraints: they are the only way to express a constraint that two tuples in a table conflict without resorting to triggers and locks (which severely impact performance). But what if you want to impose the constraint that one person can't be in two places at the same time? In other words, you have a schedule, and you want to be sure that two periods of time for the same person do not overlap. This is nearly impossible to do efficiently with the current version of PostgreSQL -- and most other database systems. I will be presenting Generalized Index Constraints, which is being submitted for inclusion in the next PostgreSQL release, along with the PERIOD data type (available now from PgFoundry). I will show how these can, together, offer a fast, scalable, and highly concurrent solution to a very common business requirement. A business requirement is still a requirement even if your current database system can't do it!
Implementing the Future of PostgreSQL Clustering with Tungsten (Command Prompt, Inc.)
Robert Hodges
Users have traditionally used database clusters to solve database availability and performance requirements. However, clustering requirements are changing as hardware improvements make performance concerns obsolete for many users. In this talk I will discuss how the Tungsten project uses master/slave replication, group communications, and rules processing to develop easy-to-manage database clusters that solve database availability, protect data, and address hardware utilization. Our implementation is based on existing PostgreSQL capabilities like Londiste and WAL shipping, which we eventually plan to replace with our own log-based replication. Come see the future of database clustering with Tungsten!
Josh Berkus
Most users know that PostgreSQL has a 23-year development history. But did you know that Postgres code is used for over a dozen other database systems? Thanks to our liberal licensing, many companies and open source projects over the years have taken the Postgres or PostgreSQL code, changed it, added things to it, and/or merged it into something else. Illustra, Truviso, Aster, Greenplum, and others have seen the value of Postgres not just as a database but as some darned good code they could use. We'll explore the lineage of these forks, and go into the details of some of the more interesting ones.
Joshua D. Drake
Are you tired of not having a real solution for PITR? Enter PITRTools, a single and secure solution for using Point In Time Recovery for PostgreSQL.
This document provides an overview of Bucardo, an open source tool for replicating and synchronizing PostgreSQL databases. Bucardo uses triggers and asynchronous notifications to replicate data changes between a master and slave database. It allows custom filtering and processing of replication events. The document discusses Bucardo's architecture, installation, configuration, administration, and limitations.
Matt Smiley
This is a basic primer aimed primarily at developers or DBAs new to Postgres. The format is a Q/A-style tour with examples, based on common questions and pitfalls. We begin with a quick tour of relevant parts of the Postgres catalog, aiming to answer simple but important questions like:
How many rows does the optimizer think my table has?
When was it last analyzed?
Which other tables also have a column named "foo"?
How often is this index used?
Rod Anderson
For the small-business support person, being able to provide PostgreSQL hosting for a small set of specific applications without having to build and support several Pg installations is necessary. By building a multi-tenant Pg cluster with one tenant per database and each application in its own schema, maintenance and support become much simpler. The issues that present themselves are how to provide and control DBA and user access to the database and how to get the applications into their own schemas. With this comes the need to keep logging in to the database (pg_hba.conf) as simple as possible.
The document discusses the history of database normalization. It explains the concepts of first normal form (1NF), second normal form (2NF), and third normal form (3NF). 1NF requires eliminating duplicate columns and creating separate tables for related data. 2NF builds on 1NF by removing subsets of data that apply to multiple rows. 3NF builds on 1NF and 2NF by removing columns not dependent on the primary key. The document notes that fourth and fifth normal forms are not commonly used, while sixth normal form only applies to alien databases. It concludes by stating that denormalization is key to data warehousing.
The document provides a historical overview of databases from the 1950s to present. It describes the earliest databases that were directly linked to applications in memory, then the development of network, hierarchical and relational database models. It discusses Edgar Codd's influential paper on relational database theory in 1970 and the emergence of relational database management systems. The summary traces key events like the rise of SQL and impact of the personal computer and internet on databases.
Leo Hsu and Regina Obe
We'll demonstrate integrating PostGIS in both PHP and ASP.NET applications.
We'll demonstrate using the new PostGIS 1.5 geography offering to extend existing web applications with proximity analysis.
More advanced uses include displaying maps and stats using OpenLayers and WMS/WFS services, and rolling your own WFS-like service using the PostGIS KML/GML and/or GeoJSON output functions.
6. What Flavor is Your DB?
W ►Web Application (Web)
● DB smaller than RAM
● 90% or more “one-liner” queries
O ►Online Transaction Processing (OLTP)
● DB slightly larger than RAM to 1TB
● 20-40% small data write queries, some large transactions
D ►Data Warehousing (DW)
● Large to huge databases (100GB to 100TB)
● Large complex reporting queries
● Large bulk loads of data
● Also called "Decision Support" or "Business Intelligence"
7. P.E. Tips
►Engineer for the problems you have
● not for the ones you don't
►A little overallocation is cheaper than downtime
● unless you're an OEM, don't stint a few GB
● resource use will grow over time
►Test, Tune, and Test Again
● you can't measure performance by “it seems fast”
►Most server performance is thresholded
● “slow” usually means “25x slower”
● it's not how fast it is, it's how close you are to capacity
9. Hardware Basics
►Four basic components:
● CPU
● RAM
● I/O: Disks and disk bandwidth
● Network
►Different priorities for different applications
● Web: CPU, Network, RAM, ... I/O W
● OLTP: balance all O
● DW: I/O, CPU, RAM D
10. Getting Enough CPU
►Most applications today are CPU-bound
● even I/O takes CPU
►One Core, One Query
● PostgreSQL is a multi-process application
▬ Except for IOwaits, each core can only process one query at a time.
▬ How many concurrent queries do you need?
● Best performance at 1 core per no more than two concurrent queries
►So if you can up your core count, do
● you don't have to pay for licenses for the extra cores!
11. CPU Tips
►CPU
● SMP scaling isn't perfect; fewer, faster cores are usually better than more, slower ones
▬ exception: highly cacheable web applications W
▬ more processors with fewer cores each should perform better
● CPU features which matter
▬ Speed
▬ Large L2 cache helps with large data
▬ 64-bit performance can be 5-20% better
– especially since it lets you use large RAM
– but sometimes it isn't an improvement
12. Getting Enough RAM
►RAM use is "thresholded"
● as long as you are above the amount of RAM you need, even by 1%, the server will be fast
● go even 1% over what you have, and things slow down a lot
►Critical RAM thresholds
● Do you have enough RAM to keep the database in shared_buffers? W
▬ RAM 6x the size of the DB
● Do you have enough RAM to cache the whole database? O
▬ RAM 2x to 3x the on-disk size of the database
● Do you have enough RAM for sorts & aggregates? D
▬ What's the largest data set you'll need to work with?
▬ For how many users
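A quick way to see where you stand against those thresholds is to compare on-disk database size with installed RAM; a minimal sketch using standard catalog functions:
  -- on-disk size of each database, largest first (compare against available RAM)
  SELECT datname,
         pg_size_pretty(pg_database_size(datname)) AS on_disk_size
  FROM pg_database
  ORDER BY pg_database_size(datname) DESC;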
13. Other RAM Issues
►Get ECC RAM
● Better to know about bad RAM before it corrupts your data.
►What else will you want RAM for?
● RAMdisk?
● SWRaid?
● Applications?
14. Getting Enough I/O
►Will your database be I/O Bound?
● many writes: bound by transaction log
● database 3x larger than RAM: bound by I/O for every query
►Optimize for the I/O you'll need
● if your DB is terabytes, spend most of your money on disks
● calculate how long it will take to read your entire database from disk
● don't forget the transaction log!
15. I/O Decision Tree
[Flowchart: a decision tree working through “lots of writes?”, “fits in RAM?”, “terabytes of data?”, “afford good HW RAID?”, and “mostly read?” to choose between simple mirrored disks, SW RAID, HW RAID, SAN/NAS, RAID 5, and RAID 1+0.]
16. I/O Tips
►RAID
● get battery backup and turn your write cache on
● SAS has 2x the real throughput of SATA
● more spindles = faster database
▬ big disks are generally slow
►SAN/NAS
● measure lag time: it can kill response time
● how many channels?
▬ “gigabit” is only 100MB/s
▬ make sure multipath works
● use fiber if you can afford it
17. SSD: Not There Yet
►Fast
● 1 SSD as fast as a 4-drive RAID
● low-energy and low-profile
►But not reliable
● MTBF in months or weeks
● Mainly good for static data
● Seeks are supposed to be as fast as scans …
▬ but they're not
►Don't rely on SSD now
● but you will be using it next year
18. Network
►Network can be your bottleneck
● lag time
● bandwidth
● oversubscribed switches
►Have dedicated connections
● between appserver and database server
● between database server and failover server
● multiple interfaces!
►Data Transfers
● Gigabit is 100MB/s
● Calculate capacity for data copies, standby, dumps
19. The Most Important Hardware Advice:
►Quality matters
● not all CPUs are the same
● not all RAID cards are the same
● not all server systems are the same
● one bad piece of hardware, or a bad driver, can destroy your application performance
►High-performance databases mean hardware expertise
● the statistics don't tell you everything
● vendors lie
● you will need to research different models and combinations
● read the pgsql-performance mailing list
20. The Most Important Hardware Advice:
►So Test, Test, Test!
● CPU: PassMark, sysbench, Spec CPU
● RAM: memtest, cachebench, Stream
● I/O: bonnie++, dd, iozone
● Network: bwping, netperf
● DB: pgBench, sysbench
►Make sure you test your hardware before you put your database on it
● “Try before you buy”
● Never trust the vendor or your sysadmins
22. Spread Your Files Around
►Separate the transaction log if possible O D
● pg_xlog directory
● on a dedicated disk/array, performs 10-50% faster
● many WAL options only work if you have a separate drive
which partition \ number of drives/arrays    1    2    3
OS/applications                              1    1    1
transaction log                              1    1    2
database                                     1    2    3
23. Spread Your Files Around
►Tablespaces for large tables O D
● try giving the most used table/index its own tablespace & disk
▬ if that table gets more transactions than any other
▬ if that table is larger than any other
▬ having tables and indexes in separate tablespaces helps with very large tables
● however, often not worth the headache for most applications
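A minimal sketch of giving a hot table its own tablespace on a dedicated array (the path, table, and index names are examples, not from the deck):
  CREATE TABLESPACE fastdisk LOCATION '/mnt/array2/pgdata';
  ALTER TABLE orders SET TABLESPACE fastdisk;                    -- moves the table (takes a lock)
  CREATE INDEX orders_created_idx ON orders (created_at) TABLESPACE fastdisk;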
24. Linux Tuning
►Filesystems
● XFS & JFS are best in OLTP tests O
▬ but can be unstable on RHEL
● Otherwise, use Ext3
● Reduce logging
▬ data=writeback, noatime, nodiratime
►OS tuning
● must increase shmmax, shmall in kernel
● use deadline scheduler to speed writes O
● check your kernel version carefully for performance issues!
▬ any 2.6 before 2.6.9 is bad
25. Solaris Tuning
►Filesystems
● ZFS for very large DBs D
● UFS for everything else W O
● Mount the transaction log on a partition with forcedirectio
▬ even if it's on the same disk
● turn off full_page_writes with UFS
►OS configuration
● no need to configure shared memory, semaphores in Solaris 10
● compile PostgreSQL with aggressive optimization using Sun Studio 11/12
26. FreeBSD Tuning
►Filesystems
● Increase readahead on the FS O D
vfs.read_max = 64
►OS tuning
● need to increase shmall, shmmax and semaphores: W O D
kern.ipc.shmmax = (1/3 RAM in bytes)
kern.ipc.shmall = (1/3 RAM in pages)
kern.ipc.semmap = 256
kern.ipc.semmni = 256
kern.ipc.semmns = 512
kern.ipc.semmnu = 256
28. Set up Monitoring!
►Get warning ahead of time
● know about performance problems before they go critical
● set up alerts
▬ 80% of capacity is an emergency!
● set up trending reports
▬ is there a pattern of steady growth?
►Monitor everything
● cpu / io / network load
● disk space & memory usage
►Use your favorite tools
● Nagios, Cacti, Reconnoiter, Hyperic, OpenNMS
30. shared_buffers
►Increase: how much?
● shared_buffers are usually a minority of RAM
▬ use filesystem cache for data
● but should be large: 1/4 of RAM on a dedicated server
▬ as of 8.1, no reason to worry about too large
● cache_miss statistics can tell you if you need more
● more buffers needed especially for: W O
▬ many concurrent queries
▬ many CPUs
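One way to approximate the cache-miss statistics the slide mentions is the buffer hit ratio from pg_stat_database; a sketch:
  -- blks_hit = found in shared_buffers, blks_read = had to go to the OS/disk
  SELECT datname,
         blks_hit,
         blks_read,
         round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS hit_ratio
  FROM pg_stat_database;
  -- a persistently low ratio on a busy database suggests shared_buffers (or RAM) is too small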
31. Other memory parameters
►work_mem
● non-shared
▬ lower it for many connections W O
▬ raise it for large queries D
● watch for signs of misallocation
▬ swapping RAM: too much work_mem
▬ log temp files: not enough work_mem
● probably better to allocate by task/ROLE
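A sketch of the per-task/ROLE allocation suggested above (role names are examples; ALTER SYSTEM requires 9.4+, newer than the servers this deck targets):
  ALTER ROLE reporting SET work_mem = '256MB';  -- large sorts/aggregates (DW)
  ALTER ROLE webapp SET work_mem = '4MB';       -- many short queries (Web/OLTP)
  ALTER SYSTEM SET log_temp_files = 0;          -- log every temp file, to spot "not enough work_mem"
  SELECT pg_reload_conf();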
32. Other memory parameters
►maintenance_work_mem
● the faster vacuum completes, the better
▬ but watch out for multiple autovacuum workers!
● raise to 256MB to 1GB for large databases
● also used for index creation
▬ raise it for bulk loads
33. Commits
►wal_buffers
● raise it to 8MB for SMP systems
►checkpoint_segments
● more if you have the disk: 16, 64, 128
►synchronous_commit W
● response time more important than data integrity?
● turn synchronous_commit = off
● lose a finite amount of data in a shutdown
►effective_io_concurrency
● set to number of disks or channels
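Expressed as settings, a rough sketch (values are starting points, not universal; on the 8.x releases this deck targets you would edit postgresql.conf directly, since ALTER SYSTEM arrived in 9.4 and checkpoint_segments was replaced by max_wal_size in 9.5):
  ALTER SYSTEM SET wal_buffers = '8MB';             -- takes effect after a restart
  ALTER SYSTEM SET synchronous_commit = off;        -- only if losing the last few commits is acceptable
  ALTER SYSTEM SET effective_io_concurrency = 4;    -- roughly the number of disks/channels
  SELECT pg_reload_conf();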
34. Query tuning
►effective_cache_size
● RAM available for queries
● set it to 2/3 of your available RAM
►default_statistics_target D
● raise to 200 to 1000 for large databases
● now defaults to 100
● setting statistics per column is better
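Per-column statistics, as recommended above, can be set like this (table and column names are examples):
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
  ANALYZE orders;                                   -- rebuild the stats with the larger sample
  -- or raise the global default (ALTER SYSTEM is 9.4+; edit postgresql.conf on older releases)
  ALTER SYSTEM SET default_statistics_target = 200;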
35. Maintenance
►Autovacuum
● turn it on for any application which gets constant writes W O
● not so good for batch writes -- do manual vacuum for bulk loads D
● make sure to include analyze
● have 100's or 1000's of tables? use multiple autovacuum workers
▬ but not more than ½ your cores
►Vacuum delay
● 50-100ms
● makes vacuum take much longer, but with little impact on performance
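A sketch of these maintenance settings (values are starting points only; the per-table override and ALTER SYSTEM assume a reasonably modern release, and the table name is invented):
  ALTER SYSTEM SET autovacuum = on;
  ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '50ms';          -- the 50-100ms delay above
  ALTER SYSTEM SET autovacuum_max_workers = 4;                     -- not more than half your cores; needs a restart
  ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.05);  -- example: one huge, busy table
  SELECT pg_reload_conf();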
37. Schema Design
►Table design
● do not optimize prematurely
▬ normalize your tables and wait for a proven issue to denormalize
▬ Postgres is designed to perform well with normalized tables
● Entity-Attribute-Value tables and other innovative designs tend to perform poorly
● think of when data needs to be updated, as well as read
▬ sometimes you need to split tables which will be updated at different times
▬ don't trap yourself into updating the same rows multiple times
● BLOBs are slow
▬ have to be completely rewritten, compressed
38. Schema Design
►Indexing
● index most foreign keys
● index common WHERE criteria
● index common aggregated columns
● learn to use special index types: expressions, full text, partial
►Not Indexing
● indexes cost you on updates, deletes
▬ especially with HOT
● too many indexes can confuse the planner
● don't index: tiny tables, low-cardinality columns
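Examples of the special index types mentioned above (table and column names are invented):
  CREATE INDEX orders_customer_fk ON orders (customer_id);                    -- foreign key column
  CREATE INDEX users_lower_email ON users (lower(email));                     -- expression index
  CREATE INDEX orders_open_idx ON orders (created_at) WHERE status = 'open';  -- partial index
  CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector('english', body)); -- full text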
39. Right indexes?
►pg_stat_user_indexes
● shows indexes not being used
● note that it doesn't record unique index usage
►pg_stat_user_tables
● shows seq scans: index candidates?
● shows heavy update/delete tables: index less
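Two sample queries against those views (standard columns; the thresholds are a matter of judgment):
  -- indexes that have never been scanned: candidates to drop
  SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
  WHERE idx_scan = 0;

  -- tables with heavy sequential scanning: candidates for new indexes
  SELECT relname, seq_scan, idx_scan, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables
  ORDER BY seq_scan DESC
  LIMIT 20;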
40. Partitioning
►Partition large or growing tables
● historical data
▬ data will be purged
▬ massive deletes are server-killers
● very large tables
▬ anything over 1GB / 10m rows
▬ partition by active/passive
►Application must be partition-compliant
● every query should include the partition key
● pre-create your partitions
▬ do not create them on demand … they will lock
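A sketch of the idea using declarative partitioning (PostgreSQL 10+; the deck predates this and used inheritance plus constraint exclusion, but the principle is the same, and the table is invented):
  CREATE TABLE events (
      event_time timestamptz NOT NULL,
      payload    text
  ) PARTITION BY RANGE (event_time);

  -- pre-create partitions, as advised above
  CREATE TABLE events_2024_01 PARTITION OF events
      FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
  CREATE TABLE events_2024_02 PARTITION OF events
      FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

  -- purging history becomes a cheap DROP instead of a massive DELETE
  DROP TABLE events_2024_01;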
41. Query design
►Do more with each query
● PostgreSQL does well with fewer larger queries
● not as well with many small queries
● avoid doing joins, tree-walking in middleware
►Do more with each transaction
● batch related writes into large transactions
►Know the query gotchas (per version)
● try swapping NOT IN and NOT EXISTS for bad queries
● avoid multiple outer joins before 8.2 if you can
● try to make sure that index/key types match
● avoid unanchored text searches "ILIKE '%josh%'"
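The NOT IN / NOT EXISTS swap, sketched with invented tables:
  -- often slow, and surprising when customer_id can be NULL:
  SELECT * FROM customers c
  WHERE c.id NOT IN (SELECT customer_id FROM orders);

  -- usually plans better as an anti-join:
  SELECT * FROM customers c
  WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);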
42. But I use ORM!
►Object-Relational Mapping != high performance
● ORM is for ease of development
● make sure your ORM allows "tweaking" queries
● applications which are pushing the limits of performance
probably can't use ORM
▬ but most don't have a problem
43. It's All About Caching
►Use prepared queries W O
►Cache, cache everywhere W O
● plan caching: on the PostgreSQL server
● parse caching: in some drivers
● data caching:
▬ in the appserver
▬ in memcached
▬ in the client (javascript, etc.)
● use as many kinds of caching as you can
►think carefully about cache invalidation
● and avoid “cache storms”
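Server-side prepared queries, the first item above, in their simplest SQL form (the table name is invented; drivers usually do this for you through their prepared-statement APIs):
  PREPARE get_user (int) AS
      SELECT id, name FROM users WHERE id = $1;   -- parsed once; the plan is cached
  EXECUTE get_user(42);                           -- executed many times
  EXECUTE get_user(43);
  DEALLOCATE get_user;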
44. Connection Management
►Connections take resources W O
● RAM, CPU
● transaction checking
►Make sure you're only using connections you need
● look for “<IDLE>” and “<IDLE> in Transaction”
● log and check for a pattern of connection growth
▬ may indicate a “connection leak”
● make sure that database and appserver timeouts are synchronized
● if your app requires > 500 database connections, you need better pooling
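A sketch of how to look for those idle sessions; on 9.2+ the information is in pg_stat_activity.state, while the “<IDLE>” strings above come from the older current_query column:
  SELECT state, count(*)
  FROM pg_stat_activity
  GROUP BY state
  ORDER BY count(*) DESC;
  -- watch especially for a growing count of 'idle in transaction'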
45. Pooling
►New connections are expensive W
● use persistent connections or connection pooling software
▬ appservers
▬ pgBouncer / pgPool
● set pool size to the maximum connections needed
▬ establishing hundreds of new connections in a few seconds can bring down your application
[Diagram: multiple webservers connect through a connection pool to PostgreSQL.]
47. Optimize Your Queries in Test
►Before you go production
● simulate user load on the application
● monitor and fix slow queries
● look for worst procedures
►Look for “bad queries”
● queries which take too long
● data updates which never complete
● long-running stored procedures
● interfaces issuing too many queries
● queries which block
49. Finding bad queries
►Log Analysis
● dozens of logging options
● log_min_duration_statement
● pgfouine
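For example, to log every statement slower than half a second and then feed the log to pgfouine (ALTER SYSTEM shown for brevity; edit postgresql.conf on the 8.x releases this deck targets):
  ALTER SYSTEM SET log_min_duration_statement = '500ms';
  SELECT pg_reload_conf();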
50. Fixing bad queries
►EXPLAIN ANALYZE
● things to look for:
▬ bad rowcount estimates
▬ sequential scans
▬ high-count loops
● reading explain analyze is an art
▬ it's an inverted tree
▬ look for the deepest level at which the problem occurs
● try re-writing complex queries several ways
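A minimal example of the two forms (the table is invented); compare the estimated row counts from EXPLAIN with the actual counts from EXPLAIN ANALYZE at each node to spot the bad estimates mentioned above:
  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;            -- estimated plan only
  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;    -- runs the query, shows actual rows and timing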