Tracking of page changes for backup purposes. PTRACK [POC]
From | Anastasia Lubennikova |
---|---|
Subject | Tracking of page changes for backup purposes. PTRACK [POC] |
Date | |
Msg-id | 429c92fd-dd2d-54e0-a41d-3673a0726f57@postgrespro.ru |
Responses | Re: Tracking of page changes for backup purposes. PTRACK [POC] |
List | pgsql-hackers |
In this thread I would like to raise the issue of incremental backups. What I suggest is to choose one direction, so that we can concentrate our community efforts.

There are already a number of tools that provide incremental backup, and we can see five principal techniques they use:

1. Use file modification time as a marker that the file has changed.
2. Compute file checksums and compare them.
3. LSN-based mechanisms. Back up pages with LSN >= last backup LSN.
4. Scan all WAL files in the archive since the previous backup and collect information about changed pages.
5. Track page changes on the fly (ptrack).

They can also be combined to achieve better performance.

My personal candidate is the last one, since it provides page-level granularity, while most of the other approaches can only do file-level incremental backups or require additional reads or calculations.

In a nutshell, with the ptrack patch PostgreSQL can track page changes on the fly. Each time a relation page is updated, the page is marked in a special PTRACK bitmap fork for this relation. As one page requires just one bit in the PTRACK fork, such bitmaps are quite small. Tracking implies some minor overhead on the database server operation but speeds up incremental backups significantly.

A detailed overview of the implementation with all pros and cons, patches, and links to the related threads can be found here:
https://wiki.postgresql.org/index.php?title=PTRACK_incremental_backups

Patches for v 10.1 and v 9.6 are attached.

Since ptrack is basically just an API for use in backup tools, it is impossible to test the patch independently. It is now integrated with our backup utility, called pg_probackup. You can find it here:
https://github.com/postgrespro/pg_probackup
Let me know if you find the documentation too long and complicated, and I'll write a brief how-to for ptrack backups.

Spoiler: please consider this patch and README as a proof of concept.
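
To make the one-bit-per-page idea from the overview above concrete, here is a minimal Python sketch of a per-relation change bitmap. It is a toy model, not code from the ptrack patch: the class and method names are hypothetical, the 8 KB page size is PostgreSQL's default block size, and questions such as crash safety, concurrency, and when the bitmap may safely be reset are deliberately left out.

```python
"""Toy model of page-level change tracking with a one-bit-per-page bitmap.

Illustrative only: the class and method names are hypothetical and do not
correspond to the ptrack patch or to any PostgreSQL API.
"""

PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


class PageChangeBitmap:
    """Keeps one bit per page of a single relation file."""

    def __init__(self, n_pages: int) -> None:
        self.n_pages = n_pages
        # Round up so that every page has a bit.
        self.bits = bytearray((n_pages + 7) // 8)

    def mark(self, page_no: int) -> None:
        """Record that page_no was modified."""
        if not 0 <= page_no < self.n_pages:
            raise IndexError(page_no)
        self.bits[page_no // 8] |= 1 << (page_no % 8)

    def changed_pages(self) -> list:
        """Return the page numbers an incremental backup would copy."""
        return [
            p for p in range(self.n_pages)
            if self.bits[p // 8] & (1 << (p % 8))
        ]

    def clear(self) -> None:
        """Reset every bit (assumption: done by the tool after a backup)."""
        self.bits = bytearray(len(self.bits))


# A 1 GB relation segment holds 131072 pages of 8 KB each,
# so its bitmap occupies 16384 bytes (16 KB).
segment = PageChangeBitmap((1 << 30) // PAGE_SIZE)
segment.mark(3)
segment.mark(42)
segment.mark(3)  # marking the same page twice is harmless
print(len(segment.bits), segment.changed_pages())  # 16384 [3, 42]
```

The point of the sketch is the size ratio: the backup tool only needs to read a 16 KB bitmap to learn which pages of a 1 GB segment to copy, instead of reading every page to compare LSNs or checksums.
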
The patch can be improved in some ways, but in its current state PTRACK is a stable prototype, reviewed and tested well enough to find many non-trivial corner cases and subtle problems. Any discussion of a change-tracking algorithm must be aware of them. Feel free to share your concerns and point out any shortcomings of the idea or the implementation.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company