| 1 | +$PostgreSQL: pgsql/src/backend/access/transam/README,v 1.1 2004/08/01 20:57:59 tgl Exp $ |
| 2 | + |
| 3 | +The Transaction System |
| 4 | +---------------------- |
| 5 | + |
| 6 | +PostgreSQL's transaction system is a three-layer system. The bottom layer |
| 7 | +implements low-level transactions and subtransactions, on top of which rests |
| 8 | +the main loop's control code, which in turn implements user-visible |
| 9 | +transactions and savepoints. |
| 10 | + |
| 11 | +The middle layer of code is called by postgres.c before and after the |
| 12 | +processing of each query: |
| 13 | + |
| 14 | + StartTransactionCommand |
| 15 | + CommitTransactionCommand |
| 16 | + AbortCurrentTransaction |
| 17 | + |
| 18 | +Meanwhile, the user can alter the system's state by issuing the SQL commands |
| 19 | +BEGIN, COMMIT, ROLLBACK, SAVEPOINT, ROLLBACK TO or RELEASE. The traffic cop |
| 20 | +redirects these calls to the top-level routines |
| 21 | + |
| 22 | + BeginTransactionBlock |
| 23 | + EndTransactionBlock |
| 24 | + UserAbortTransactionBlock |
| 25 | + DefineSavepoint |
| 26 | + RollbackToSavepoint |
| 27 | + ReleaseSavepoint |
| 28 | + |
| 29 | +respectively. Depending on the current state of the system, these functions |
| 30 | +call low-level functions to activate the real transaction system: |
| 31 | + |
| 32 | + StartTransaction |
| 33 | + CommitTransaction |
| 34 | + AbortTransaction |
| 35 | + CleanupTransaction |
| 36 | + StartSubTransaction |
| 37 | + CommitSubTransaction |
| 38 | + AbortSubTransaction |
| 39 | + CleanupSubTransaction |
| 40 | + |
| 41 | +Additionally, within a transaction, CommandCounterIncrement is called to |
| 42 | +increment the command counter, which allows future commands to "see" the |
| 43 | +effects of previous commands within the same transaction. Note that this is |
| 44 | +done automatically by CommitTransactionCommand after each query inside a |
| 45 | +transaction block, but some utility functions also do it internally to allow |
| 46 | +some operations (usually in the system catalogs) to be seen by future |
| 47 | +operations in the same utility command (for example, in DefineRelation it is |
| 48 | +done after creating the heap so the pg_class row is visible, to be able to |
| 49 | +lock it). |
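
The visibility rule behind CommandCounterIncrement can be illustrated with a
toy model. This is only a sketch in Python, not PostgreSQL code: the class
ToyTransaction and its methods are hypothetical, and the rule "a row is
visible only to commands later than the one that wrote it" is a simplified
assumption about how the command counter is used.

```python
class ToyTransaction:
    """Toy model: a row is visible only to commands after the one that wrote it."""

    def __init__(self):
        self.command_id = 0   # current value of the (hypothetical) command counter
        self.rows = []        # list of (value, command id that wrote the row)

    def insert(self, value):
        # Tag the new row with the command that created it.
        self.rows.append((value, self.command_id))

    def visible_rows(self):
        # Only rows written by strictly earlier commands are seen.
        return [value for value, written_by in self.rows
                if written_by < self.command_id]

    def command_counter_increment(self):
        self.command_id += 1


xact = ToyTransaction()
xact.insert("pg_class row for new heap")
before = xact.visible_rows()     # the writing command cannot see its own row
xact.command_counter_increment()
after = xact.visible_rows()      # a later command in the same transaction can
```

In this model, a utility command that needs to see its own catalog changes
has to bump the counter itself, which mirrors the DefineRelation case above.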
| 50 | + |
| 51 | + |
| 52 | +For example, consider the following sequence of user commands: |
| 53 | + |
| 54 | +1) BEGIN |
| 55 | +2) SELECT * FROM foo |
| 56 | +3) INSERT INTO foo VALUES (...) |
| 57 | +4) COMMIT |
| 58 | + |
| 59 | +In the main processing loop, this results in the following function call |
| 60 | +sequence: |
| 61 | + |
| 62 | + / StartTransactionCommand; |
| 63 | + / ProcessUtility; << BEGIN |
| 64 | +1) < BeginTransactionBlock; |
| 65 | + \ CommitTransactionCommand; |
| 66 | + \ StartTransaction; |
| 67 | + |
| 68 | + / StartTransactionCommand; |
| 69 | +2) / ProcessQuery; << SELECT * FROM foo |
| 70 | + \ CommitTransactionCommand; |
| 71 | + \ CommandCounterIncrement; |
| 72 | + |
| 73 | + / StartTransactionCommand; |
| 74 | +3) / ProcessQuery; << INSERT INTO foo VALUES (...) |
| 75 | + \ CommitTransactionCommand; |
| 76 | + \ CommandCounterIncrement; |
| 77 | + |
| 78 | + / StartTransactionCommand; |
| 79 | + / ProcessUtility; << COMMIT |
| 80 | +4) < EndTransactionBlock; |
| 81 | + \ CommitTransaction; |
| 82 | + \ CommitTransactionCommand; |
| 83 | + |
| 84 | +The point of this example is to demonstrate the need for |
| 85 | +StartTransactionCommand and CommitTransactionCommand to be state smart -- they |
| 86 | +should call CommandCounterIncrement between the calls to BeginTransactionBlock |
| 87 | +and EndTransactionBlock and outside these calls they need to do normal start, |
| 88 | +commit or abort processing. |
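
To make the "state smart" requirement concrete, here is a minimal sketch in
Python of a toy state machine. It is not the xact.c implementation. The state
names TBLOCK_DEFAULT, TBLOCK_STARTED, TBLOCK_BEGIN and TBLOCK_INPROGRESS are
assumptions introduced for illustration (only TBLOCK_END appears in this
document), and the point at which the toy calls StartTransaction is likewise
a modelling assumption.

```python
class ToyXact:
    """Toy model of the middle layer; records which low-level calls happen."""

    def __init__(self):
        self.state = "TBLOCK_DEFAULT"
        self.trace = []

    def start_transaction_command(self):
        # Outside a transaction block: do a normal transaction start.
        if self.state == "TBLOCK_DEFAULT":
            self.trace.append("StartTransaction")
            self.state = "TBLOCK_STARTED"
        # Inside a block there is nothing to start.

    def begin_transaction_block(self):
        if self.state == "TBLOCK_STARTED":
            self.state = "TBLOCK_BEGIN"

    def end_transaction_block(self):
        if self.state == "TBLOCK_INPROGRESS":
            self.state = "TBLOCK_END"

    def commit_transaction_command(self):
        if self.state == "TBLOCK_STARTED":
            # Single statement outside a block: commit right away.
            self.trace.append("CommitTransaction")
            self.state = "TBLOCK_DEFAULT"
        elif self.state == "TBLOCK_BEGIN":
            # BEGIN just finished: the block is now open.
            self.state = "TBLOCK_INPROGRESS"
        elif self.state == "TBLOCK_INPROGRESS":
            # Between BEGIN and COMMIT: only bump the command counter.
            self.trace.append("CommandCounterIncrement")
        elif self.state == "TBLOCK_END":
            # COMMIT was seen: now really close the transaction.
            self.trace.append("CommitTransaction")
            self.state = "TBLOCK_DEFAULT"


def run(commands):
    """Feed a list of SQL command strings through the toy main loop."""
    xact = ToyXact()
    for command in commands:
        xact.start_transaction_command()
        if command == "BEGIN":
            xact.begin_transaction_block()
        elif command == "COMMIT":
            xact.end_transaction_block()
        xact.commit_transaction_command()
    return xact.trace


block_trace = run(["BEGIN", "SELECT", "INSERT", "COMMIT"])
single_trace = run(["SELECT"])
```

The same two entry points produce different low-level calls depending only on
the recorded state, which is the behaviour the example above is meant to
demonstrate.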
| 89 | + |
| 90 | +Furthermore, suppose the "SELECT * FROM foo" caused an abort condition. In |
| 91 | +this case AbortCurrentTransaction is called, and the transaction is put in |
| 92 | +aborted state. In this state, any user input is ignored except for |
| 93 | +transaction-termination statements, or ROLLBACK TO <savepoint> commands. |
| 94 | + |
| 95 | +Transaction aborts can occur in two ways: |
| 96 | + |
| 97 | +1) system dies from some internal cause (syntax error, etc) |
| 98 | +2) user types ROLLBACK |
| 99 | + |
| 100 | +The reason we have to distinguish them is illustrated by the following two |
| 101 | +situations: |
| 102 | + |
| 103 | + case 1 case 2 |
| 104 | + ------ ------ |
| 105 | +1) user types BEGIN 1) user types BEGIN |
| 106 | +2) user does something 2) user does something |
| 107 | +3) user does not like what 3) system aborts for some reason |
| 108 | + she sees and types ABORT (syntax error, etc) |
| 109 | + |
| 110 | +In case 1, we want to abort the transaction and return to the default state. |
| 111 | +In case 2, there may be more commands coming our way which are part of the |
| 112 | +same transaction block; we have to ignore these commands until we see a COMMIT |
| 113 | +or ROLLBACK. |
| 114 | + |
| 115 | +Internal aborts are handled by AbortCurrentTransaction, while user aborts are |
| 116 | +handled by UserAbortTransactionBlock. Both of them rely on AbortTransaction |
| 117 | +to do all the real work. The only difference is what state we enter after |
| 118 | +AbortTransaction does its work: |
| 119 | + |
| 120 | +* AbortCurrentTransaction leaves us in TBLOCK_ABORT, |
| 121 | +* UserAbortTransactionBlock leaves us in TBLOCK_ENDABORT |
| 122 | + |
| 123 | +Low-level transaction abort handling is divided into two phases: |
| 124 | +* AbortTransaction executes as soon as we realize the transaction has |
| 125 | + failed. It should release all shared resources (locks, etc.) so that we do |
| 126 | + not delay other backends unnecessarily. |
| 127 | +* CleanupTransaction executes when we finally see a user COMMIT |
| 128 | + or ROLLBACK command; it cleans things up and gets us out of the transaction |
| 129 | + internally. In particular, we mustn't destroy TopTransactionContext until |
| 130 | + this point. |
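
The two abort paths and the two abort phases can be sketched with a toy model
in Python. This is an illustration only: the class ToyAbortModel is
hypothetical, the state names TBLOCK_INPROGRESS and TBLOCK_DEFAULT are
assumptions not taken from this document, and the model starts from the
assumption that BEGIN has already been processed.

```python
class ToyAbortModel:
    """Toy model of internal vs. user aborts inside a transaction block."""

    def __init__(self):
        self.state = "TBLOCK_INPROGRESS"   # assumed: BEGIN already processed
        self.trace = []

    def abort_current_transaction(self):
        # Internal failure: release resources now, then wait for the user.
        if self.state == "TBLOCK_INPROGRESS":
            self.trace.append("AbortTransaction")
            self.state = "TBLOCK_ABORT"

    def user_abort_transaction_block(self):
        # User typed ROLLBACK. If we already failed, resources are released.
        if self.state == "TBLOCK_INPROGRESS":
            self.trace.append("AbortTransaction")
        self.state = "TBLOCK_ENDABORT"

    def commit_transaction_command(self):
        if self.state == "TBLOCK_ENDABORT":
            # Second phase: final cleanup, back to the default state.
            self.trace.append("CleanupTransaction")
            self.state = "TBLOCK_DEFAULT"
        # In TBLOCK_ABORT we stay put: further commands are ignored.


def case_1():
    """User does not like what she sees and types ROLLBACK."""
    m = ToyAbortModel()
    m.user_abort_transaction_block()
    m.commit_transaction_command()
    return m


def case_2():
    """System aborts, one more command arrives, then the user types ROLLBACK."""
    m = ToyAbortModel()
    m.abort_current_transaction()
    m.commit_transaction_command()        # ignored command; still aborted
    state_while_waiting = m.state
    m.user_abort_transaction_block()
    m.commit_transaction_command()
    return m, state_while_waiting


m1 = case_1()
m2, waiting_state = case_2()
```

In both cases AbortTransaction runs exactly once and as early as possible,
while CleanupTransaction is deferred until the user ends the block.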
| 131 | + |
| 132 | +Also, note that when a transaction is committed, we don't close it right away. |
| 133 | +Rather it's put in TBLOCK_END state, which means that when |
| 134 | +CommitTransactionCommand is called after the query has finished processing, |
| 135 | +the transaction has to be closed. The distinction is subtle but important, |
| 136 | +because it means that control will leave the xact.c code with the transaction |
| 137 | +open, and the main loop will be able to keep processing inside the same |
| 138 | +transaction. So, in a sense, transaction commit is also handled in two |
| 139 | +phases, the first at EndTransactionBlock and the second at |
| 140 | +CommitTransactionCommand (which is where CommitTransaction is actually |
| 141 | +called). |
| 142 | + |
| 143 | +The rest of the code in xact.c consists of routines to support the creation and |
| 144 | +finishing of transactions and subtransactions. For example, AtStart_Memory |
| 145 | +takes care of initializing the memory subsystem at main transaction start. |
| 146 | + |
| 147 | + |
| 148 | +Subtransaction handling |
| 149 | +----------------------- |
| 150 | + |
| 151 | +Subtransactions are implemented using a stack of TransactionState structures, |
| 152 | +each of which has a pointer to its parent transaction's struct. When a new |
| 153 | +subtransaction is to be opened, PushTransaction is called, which creates a new |
| 154 | +TransactionState, with its parent link pointing to the current transaction. |
| 155 | +StartSubTransaction is in charge of initializing the new TransactionState to |
| 156 | +sane values, and properly initializing other subsystems (AtSubStart routines). |
| 157 | + |
| 158 | +When closing a subtransaction, either CommitSubTransaction has to be called |
| 159 | +(if the subtransaction is committing), or AbortSubTransaction and |
| 160 | +CleanupSubTransaction (if it's aborting). In either case, PopTransaction is |
| 161 | +called so the system returns to the parent transaction. |
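
The parent-linked stack described above can be sketched as follows. This is a
Python illustration under stated assumptions, not the C data structure: the
fields name and nesting_level and the class ToyTransactionStack are
hypothetical, and the real TransactionState holds far more information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionState:
    name: str                                   # hypothetical label
    nesting_level: int                          # 1 for the top-level transaction
    parent: Optional["TransactionState"] = None


class ToyTransactionStack:
    """Stack of TransactionState records linked through parent pointers."""

    def __init__(self):
        self.current = TransactionState("top", 1)

    def push_transaction(self, name):
        # The new state's parent link points at the current transaction.
        self.current = TransactionState(
            name, self.current.nesting_level + 1, self.current)

    def pop_transaction(self):
        # Return to the parent; the top-level state cannot be popped.
        if self.current.parent is None:
            raise RuntimeError("cannot pop the top-level transaction")
        self.current = self.current.parent


stack = ToyTransactionStack()
stack.push_transaction("sp1")
stack.push_transaction("sp2")
depth_inside = stack.current.nesting_level
stack.pop_transaction()
name_after_pop = stack.current.name
```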
| 162 | + |
| 163 | +One important point regarding subtransaction handling is that several may need |
| 164 | +to be closed in response to a single user command. That's because savepoints |
| 165 | +have names, and we allow committing or rolling back a savepoint by name, which |
| 166 | +is not necessarily the one that was last opened. In the case of subtransaction |
| 167 | +commit this is not a problem, and we close all the involved subtransactions |
| 168 | +right away by calling CommitTransactionToLevel, which in turn calls |
| 169 | +CommitSubTransaction and PopTransaction as many times as needed. |
| 170 | + |
| 171 | +In the case of subtransaction abort (when the user issues ROLLBACK TO |
| 172 | +<savepoint>), things are not so easy. We have to keep the subtransactions |
| 173 | +open and return control to the main loop. So what RollbackToSavepoint does is |
| 174 | +abort the innermost subtransaction and put it in TBLOCK_SUBENDABORT state, and |
| 175 | +put the rest in TBLOCK_SUBABORT_PENDING state. Then we return control to the |
| 176 | +main loop, which will in turn return control to us by calling |
| 177 | +CommitTransactionCommand. At this point we can close all subtransactions that |
| 178 | +are marked with the "abort pending" state. When that's done, the outermost |
| 179 | +subtransaction is created again, to conform to SQL's definition of ROLLBACK TO. |
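
The deferred handling of ROLLBACK TO can be sketched with a toy model. This
is a Python illustration only: representing the stack as a list of
[name, state] pairs is an assumption, as is the state name
TBLOCK_SUBINPROGRESS used for a running subtransaction.

```python
PENDING = "TBLOCK_SUBABORT_PENDING"
ENDABORT = "TBLOCK_SUBENDABORT"
RUNNING = "TBLOCK_SUBINPROGRESS"   # assumed name for a live subtransaction


def rollback_to_savepoint(stack, target):
    """Mark subtransactions for abort; stack runs from outermost to innermost."""
    names = [name for name, _ in stack]
    if target not in names:
        raise ValueError("no such savepoint: " + target)
    # Pick the most recently opened savepoint with that name.
    idx = len(names) - 1 - names[::-1].index(target)
    for entry in stack[idx:]:
        entry[1] = PENDING
    # The innermost one is aborted right away.
    stack[-1][1] = ENDABORT


def commit_transaction_command(stack):
    """Close everything marked, then re-create the outermost marked one."""
    marked = [entry for entry in stack if entry[1] in (PENDING, ENDABORT)]
    if not marked:
        return
    target_name = marked[0][0]
    del stack[len(stack) - len(marked):]
    stack.append([target_name, RUNNING])


stack = [["a", RUNNING], ["b", RUNNING], ["c", RUNNING]]
rollback_to_savepoint(stack, "b")
states_after_rollback = [state for _, state in stack]
commit_transaction_command(stack)
```

After the second step the savepoint named in ROLLBACK TO exists again as a
fresh subtransaction, while everything opened after it is gone.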
| 180 | + |
| 181 | +Other subsystems are allowed to start "internal" subtransactions, which are |
| 182 | +handled by BeginInternalSubTransaction. This is to allow implementing |
| 183 | +exception handling, e.g. in PL/pgSQL. ReleaseCurrentSubTransaction and |
| 184 | +RollbackAndReleaseCurrentSubTransaction allow the subsystem to close said |
| 185 | +subtransactions. The main difference between this and the savepoint/release |
| 186 | +path is that BeginInternalSubTransaction is allowed when no explicit |
| 187 | +transaction block has been established, while DefineSavepoint is not. |
| 188 | + |
| 189 | + |
| 190 | +pg_clog and pg_subtrans |
| 191 | +----------------------- |
| 192 | + |
| 193 | +pg_clog and pg_subtrans are permanent (on-disk) storage of transaction-related |
| 194 | +information. There is a limited number of pages of each kept in memory, so |
| 195 | +in many cases there is no need to actually read from disk. However, if |
| 196 | +there's a long-running transaction or a backend sitting idle with an open |
| 197 | +transaction, it may be necessary to read and write this information on |
| 198 | +disk. They also allow the information to persist across server restarts. |
| 199 | + |
| 200 | +pg_clog records the commit status for each transaction. A transaction can be |
| 201 | +in progress, committed, aborted, or "sub-committed". This last state means |
| 202 | +that it's a subtransaction that's no longer running, but its parent has not |
| 203 | +updated its state yet (either it is still running, or the backend crashed |
| 204 | +without updating its status). A sub-committed transaction's status will be |
| 205 | +updated again to the final value as soon as the parent commits or aborts, or |
| 206 | +when the parent is detected to be aborted. |
| 207 | + |
| 208 | +Savepoints are implemented using subtransactions. A subtransaction is a |
| 209 | +transaction inside a transaction; it gets its own TransactionId, but its |
| 210 | +commit or abort status depends not only on whether it committed itself, but |
| 211 | +also on whether its parent transaction committed. To implement multiple |
| 212 | +savepoints in a transaction we allow unlimited transaction nesting depth, so |
| 213 | +any particular subtransaction's commit state is dependent on the commit status |
| 214 | +of each and every ancestor transaction. |
| 215 | + |
| 216 | +The "subtransaction parent" (pg_subtrans) mechanism records, for each |
| 217 | +transaction, the TransactionId of its parent transaction. This information is |
| 218 | +stored as soon as the subtransaction is created. Top-level transactions do |
| 219 | +not have a parent, so they leave their pg_subtrans entries set to the default |
| 220 | +value of zero (InvalidTransactionId). |
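
How the two structures combine to answer "did this subtransaction commit?"
can be sketched as a walk up the parent chain. This is a toy model in Python:
the dictionaries standing in for pg_clog and pg_subtrans and the function
effective_status are hypothetical, and treating a missing pg_clog entry as
"in progress" is a simplifying assumption.

```python
INVALID_XID = 0   # the pg_subtrans default: no parent


def effective_status(xid, clog, subtrans):
    """Resolve a transaction's status, following parents while sub-committed.

    clog maps xid -> "in progress" | "committed" | "aborted" | "sub-committed".
    subtrans maps a subtransaction's xid -> its parent's xid.
    """
    status = clog.get(xid, "in progress")
    while status == "sub-committed":
        parent = subtrans.get(xid, INVALID_XID)
        if parent == INVALID_XID:
            # A sub-committed entry without a parent is inconsistent toy data.
            raise ValueError("sub-committed xid %d has no parent" % xid)
        xid = parent
        status = clog.get(xid, "in progress")
    return status


# Top-level transaction 100 with nested subtransactions 101 and 102.
subtrans = {101: 100, 102: 101}
clog = {100: "in progress", 101: "sub-committed", 102: "sub-committed"}

while_running = effective_status(102, clog, subtrans)
clog[100] = "committed"
after_commit = effective_status(102, clog, subtrans)
```

The innermost subtransaction's fate is decided by its ancestors, which is why
the parent link has to be recorded as soon as the subtransaction is created.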
| 221 | + |
| 222 | +pg_subtrans is used to check whether the transaction in question is still |
| 223 | +running --- the main Xid of a transaction is recorded in the PGPROC struct, |
| 224 | +but since we allow arbitrary nesting of subtransactions, we can't fit all Xids |
| 225 | +in shared memory, so we have to store them on disk. Note, however, that for |
| 226 | +each transaction we keep a "cache" of Xids that are known to be part of the |
| 227 | +transaction tree, so we can skip looking at pg_subtrans unless we know the |
| 228 | +cache has overflowed. See storage/ipc/sinval.c for the gory details. |
| 229 | + |
| 230 | +slru.c is the supporting mechanism for both pg_clog and pg_subtrans. It |
| 231 | +implements the LRU policy for in-memory buffer pages. The high-level routines |
| 232 | +for pg_clog are implemented in transam.c, while the low-level functions are in |
| 233 | +clog.c. pg_subtrans is contained completely in subtrans.c. |
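
The LRU policy mentioned above can be sketched generically. This is not the
slru.c implementation: ToyPageCache and the read_page callback are
hypothetical names, and the sketch ignores dirty pages, write-back and
locking entirely.

```python
from collections import OrderedDict


class ToyPageCache:
    """Fixed-size page cache that evicts the least recently used page."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page      # callback: page number -> page contents
        self.pages = OrderedDict()      # oldest entry first, newest last

    def get(self, pageno):
        if pageno in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(pageno)
            return self.pages[pageno]
        if len(self.pages) >= self.capacity:
            # Cache full: drop the least recently used page.
            self.pages.popitem(last=False)
        page = self.read_page(pageno)
        self.pages[pageno] = page
        return page


disk_reads = []


def fake_read(pageno):
    # Stand-in for a disk read; records which pages were fetched.
    disk_reads.append(pageno)
    return "page-%d" % pageno


cache = ToyPageCache(capacity=2, read_page=fake_read)
cache.get(1)
cache.get(2)
cache.get(1)      # hit: no disk read, page 1 becomes most recent
cache.get(3)      # evicts page 2, the least recently used
cache.get(2)      # miss again: page 2 must be re-read
```

With only a handful of buffers, recently touched pages stay in memory, which
matches the observation that in many cases no disk read is needed.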