--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -1,4 +1,4 @@
-$PostgreSQL: pgsql/src/backend/executor/README,v 1.10 2009/10/12 18:10:41 tgl Exp $
+$PostgreSQL: pgsql/src/backend/executor/README,v 1.11 2009/10/26 02:26:29 tgl Exp $
 
 The Postgres Executor
 =====================
@@ -160,41 +160,38 @@ modified tuple. SELECT FOR UPDATE/SHARE behaves similarly, except that its
 action is just to lock the modified tuple and return results based on that
 version of the tuple.
 
-To implement this checking, we actually re-run the entire query from scratch
-for each modified tuple, but with the scan node that sourced the original
-tuple set to return only the modified tuple, not the original tuple or any
-of the rest of the relation. If this query returns a tuple, then the
-modified tuple passes the quals (and the query output is the suitably
-modified update tuple, if we're doing UPDATE). If no tuple is returned,
-then the modified tuple fails the quals, so we ignore it and continue the
-original query. (This is reasonably efficient for simple queries, but may
-be horribly slow for joins. A better design would be nice; one thought for
-future investigation is to treat the tuple substitution like a parameter,
-so that we can avoid rescanning unrelated nodes.)
-
-Note a fundamental bogosity of this approach: if the relation containing
-the original tuple is being used in a self-join, the other instance(s) of
-the relation will be treated as still containing the original tuple, whereas
-logical consistency would demand that the modified tuple appear in them too.
-But we'd have to actually substitute the modified tuple for the original,
-while still returning all the rest of the relation, to ensure consistent
-answers. Implementing this correctly is a task for future work.
-
-In UPDATE/DELETE, only the target relation needs to be handled this way,
-so only one special recheck query needs to execute at a time. In SELECT FOR
-UPDATE, there may be multiple relations flagged FOR UPDATE, so it's possible
-that while we are executing a recheck query for one modified tuple, we will
-hit another modified tuple in another relation. In this case we "stack up"
-recheck queries: a sub-recheck query is spawned in which both the first and
-second modified tuples will be returned as the only components of their
-relations. (In event of success, all these modified tuples will be locked.)
-Again, this isn't necessarily quite the right thing ... but in simple cases
-it works. Potentially, recheck queries could get nested to the depth of the
-number of FOR UPDATE/SHARE relations in the query.
-
-It should be noted also that UPDATE/DELETE expect at most one tuple to
-result from the modified query, whereas in the FOR UPDATE case it's possible
-for multiple tuples to result (since we could be dealing with a join in
-which multiple tuples join to the modified tuple). We want FOR UPDATE to
-lock all relevant tuples, so we process all tuples output by all the stacked
-recheck queries.
+To implement this checking, we actually re-run the query from scratch for
+each modified tuple (or set of tuples, for SELECT FOR UPDATE), with the
+relation scan nodes tweaked to return only the current tuples --- either
+the original ones, or the updated (and now locked) versions of the modified
+tuple(s). If this query returns a tuple, then the modified tuple(s) pass
+the quals (and the query output is the suitably modified update tuple, if
+we're doing UPDATE). If no tuple is returned, then the modified tuple(s)
+fail the quals, so we ignore the current result tuple and continue the
+original query.
+
+In UPDATE/DELETE, only the target relation needs to be handled this way.
+In SELECT FOR UPDATE, there may be multiple relations flagged FOR UPDATE,
+so we obtain lock on the current tuple version in each such relation before
+executing the recheck.
+
+It is also possible that there are relations in the query that are not
+to be locked (they are neither the UPDATE/DELETE target nor specified to
+be locked in SELECT FOR UPDATE/SHARE). When re-running the test query
+we want to use the same rows from these relations that were joined to
+the locked rows. For ordinary relations this can be implemented relatively
+cheaply by including the row TID in the join outputs and re-fetching that
+TID. (The re-fetch is expensive, but we're trying to optimize the normal
+case where no re-test is needed.) We have also to consider non-table
+relations, such as a ValuesScan or FunctionScan. For these, since there
+is no equivalent of TID, the only practical solution seems to be to include
+the entire row value in the join output row.
+
+We disallow set-returning functions in the targetlist of SELECT FOR UPDATE,
+so as to ensure that at most one tuple can be returned for any particular
+set of scan tuples. Otherwise we'd get duplicates due to the original
+query returning the same set of scan tuples multiple times. (Note: there
+is no explicit prohibition on SRFs in UPDATE, but the net effect will be
+that only the first result row of an SRF counts, because all subsequent
+rows will result in attempts to re-update an already updated target row.
+This is historical behavior and seems not worth changing.)
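
The control flow the new text describes can be condensed into a short
sketch. The toy C program below is illustrative only, not the executor's
actual code: Tuple, LockedRel, lock_latest_version(), and
plan_passes_quals() are all hypothetical stand-ins. What it shows is the
shape of the new design: lock the current version of the tuple in every
flagged relation first, pin each relation's scan to that version, re-run
the plan once, and emit or discard the pending result row depending on
whether anything comes out.

    /*
     * Hypothetical sketch of the recheck control flow; not PostgreSQL code.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct Tuple
    {
        int key;
        int value;              /* column tested by the query's quals */
    } Tuple;

    typedef struct LockedRel
    {
        const char *name;       /* relation flagged FOR UPDATE */
        Tuple latest;           /* newest committed version of its tuple */
        Tuple *substitute;      /* when set, the scan returns only this */
    } LockedRel;

    /* Stand-in for chasing the update chain and locking the newest version. */
    static Tuple *
    lock_latest_version(LockedRel *rel)
    {
        /* The real executor may have to wait for the updater to commit. */
        return &rel->latest;
    }

    /*
     * Stand-in for re-running the plan with the scans pinned: the toy qual
     * is "value > 0" on every locked relation's substituted tuple.
     */
    static bool
    plan_passes_quals(LockedRel *rels, int nrels)
    {
        for (int i = 0; i < nrels; i++)
            if (rels[i].substitute->value <= 0)
                return false;
        return true;
    }

    /*
     * Decide whether a result row that depends on concurrently-modified
     * tuples should still be emitted.
     */
    static bool
    recheck(LockedRel *rels, int nrels)
    {
        /* Lock the current tuple version in *all* flagged relations first... */
        for (int i = 0; i < nrels; i++)
            rels[i].substitute = lock_latest_version(&rels[i]);

        /* ...then re-run the plan once against those versions. */
        return plan_passes_quals(rels, nrels);
    }

    int
    main(void)
    {
        LockedRel rels[] = {
            {"t1", {1, 5}, NULL},   /* new version still passes the qual */
            {"t2", {7, 0}, NULL},   /* new version fails it */
        };

        puts(recheck(rels, 2) ? "emit row" : "skip row");   /* skip row */
        return 0;
    }

Compare the removed text: the old code began a recheck as soon as it hit
one modified tuple, and could then hit another modified tuple inside that
recheck, stacking sub-recheck queries to arbitrary depth. Taking all the
locks up front means a single, non-nested recheck suffices.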
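
The row-identity bookkeeping for relations that are joined but not locked
can be sketched the same way. Again, every name below (RowMark,
fetch_by_tid(), restore_joined_row()) is made up for illustration and does
not claim to match the backend's real data structures; the point is only
that a table row can be recovered from a remembered TID at recheck time,
while a VALUES or function scan row has no TID and must travel whole in
the join output.

    /*
     * Hypothetical sketch of row identity for unlocked relations; not
     * PostgreSQL code.
     */
    #include <stdio.h>

    typedef struct Row
    {
        int a;
        int b;
    } Row;

    typedef struct Tid          /* (block, offset) heap address */
    {
        unsigned block;
        unsigned offset;
    } Tid;

    typedef enum MarkKind
    {
        MARK_TID,               /* ordinary table: remember the TID */
        MARK_WHOLE_ROW          /* VALUES/function scan: keep the row */
    } MarkKind;

    typedef struct RowMark      /* extra column carried in join outputs */
    {
        MarkKind kind;
        Tid tid;                /* valid when kind == MARK_TID */
        Row wholerow;           /* valid when kind == MARK_WHOLE_ROW */
    } RowMark;

    /* Toy single-page "heap"; real code would read the indicated page. */
    static Row heap[] = {{1, 10}, {2, 20}, {3, 30}};

    static Row
    fetch_by_tid(Tid tid)
    {
        return heap[tid.offset];
    }

    /* At recheck time, recover exactly the row that was joined originally. */
    static Row
    restore_joined_row(const RowMark *mark)
    {
        if (mark->kind == MARK_TID)
            return fetch_by_tid(mark->tid); /* cheap to carry, re-fetch */
        return mark->wholerow;              /* no TID: carried whole */
    }

    int
    main(void)
    {
        RowMark table_mark = {MARK_TID, {0, 2}, {0, 0}};
        RowMark values_mark = {MARK_WHOLE_ROW, {0, 0}, {9, 99}};

        Row r1 = restore_joined_row(&table_mark);   /* (3,30) re-fetched */
        Row r2 = restore_joined_row(&values_mark);  /* (9,99) carried */

        printf("table row (%d,%d), values row (%d,%d)\n",
               r1.a, r1.b, r2.a, r2.b);
        return 0;
    }

The asymmetry matches the cost argument in the text: a TID is cheap to
carry on every join output row, and the expensive re-fetch happens only on
the rare recheck path, while the whole-row copy is the unavoidable
fallback when no TID exists.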