good, you'll have to ram them down people's throats." -- Howard Aiken

From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
        by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
        for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
Received: from localhost (majordom@localhost)
        by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
        Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
        (envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 10:11:55 -0400
Received: (from majordom@localhost)
        by hub.org (8.9.3/8.9.3) id KAA30030
        for pgsql-hackers-outgoing; Tue, 19 Oct 1999 10:11:00 -0400 (EDT)
> 2. private cache holds uncommitted system tuples.
> 3. relpages of shared cache are updated immediately by
>    physical change and corresponding buffer pages are
>    marked dirty.
> 4. on commit, the contents of uncommitted tuples except
>    relpages, reltuples, ... are copied to corresponding tuples
>    in shared cache and the combined contents are
>    committed.
> If so, catalog cache invalidation would no longer be needed.
> But synchronization of step 4 may be difficult.

I think the main problem is that relpages and reltuples shouldn't
be kept in pg_class columns at all, because they need to have
very different update behavior from the other pg_class columns.

The rest of pg_class is update-on-commit, and we can lock down any one
row in the normal MVCC way (if transaction A has modified a row and
transaction B also wants to modify it, B waits for A to commit or abort,
so it can know which version of the row to start from).  Furthermore,
there can legitimately be several different values of a row in use in
different places: the latest committed, an uncommitted modification, and
one or more old values that are still being used by active transactions
because they were current when those transactions started.  (BTW, the
present relcache is pretty bad about maintaining pure MVCC transaction
semantics like this, but it seems clear to me that that's the direction
we want to go in.)
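
To make the "several different values of a row in use in different places" point concrete, here is a minimal sketch of snapshot-based visibility. It is a toy model, not PostgreSQL code: the names RowVersion, Row and visible_to are hypothetical, and aborted transactions, deletions and row locking are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class RowVersion:
    value: str
    created_by: int  # id of the transaction that wrote this version


@dataclass
class Row:
    versions: list = field(default_factory=list)  # oldest first

    def visible_to(self, xid, snapshot):
        """Return the newest version written by `xid` itself or by a
        transaction that had already committed when `snapshot` was taken."""
        for version in reversed(self.versions):
            if version.created_by == xid or version.created_by in snapshot:
                return version.value
        return None


committed = {1}
row = Row([RowVersion("old", created_by=1)])

snapshot_3 = frozenset(committed)            # transaction 3 starts here

row.versions.append(RowVersion("new", created_by=2))
committed.add(2)                             # transaction 2 commits

row.versions.append(RowVersion("newer", created_by=4))  # 4 is uncommitted

snapshot_5 = frozenset(committed)            # transaction 5 starts here

seen_by_3 = row.visible_to(3, snapshot_3)    # old value, current at its start
seen_by_4 = row.visible_to(4, frozenset(committed))  # its own modification
seen_by_5 = row.visible_to(5, snapshot_5)    # latest committed value
```

Three transactions looking at the same row each get a different, legitimate answer, which is the behavior the rest of pg_class can live with and relpages cannot.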

relpages cannot operate this way.  To be useful for avoiding lseeks,
relpages *must* change exactly when the physical file changes.  It
matters not at all whether the particular transaction that extended the
file ultimately commits or not.  Moreover there can be only one correct
value (per relation) across the whole system, because there is only one
length of the relation file.
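
A rough sketch of that behavior, under the assumption that a single shared counter is bumped whenever the file is physically extended. RelationFile and run_transaction are hypothetical names for illustration only, and a Python list stands in for the on-disk blocks.

```python
class RelationFile:
    """Toy relation file whose cached page count follows physical growth."""

    def __init__(self):
        self.pages = []     # stands in for the blocks of the on-disk file
        self.relpages = 0   # the one system-wide cached length

    def extend(self, page):
        # relpages changes at the moment the file grows, not at commit.
        self.pages.append(page)
        self.relpages = len(self.pages)
        return self.relpages - 1  # block number of the new page


def run_transaction(rel, new_pages, commit):
    """Extend the file, then commit or abort; neither outcome touches relpages."""
    for page in new_pages:
        rel.extend(page)
    return "committed" if commit else "aborted"


rel = RelationFile()
outcome_a = run_transaction(rel, ["a1", "a2"], commit=False)  # aborts
outcome_b = run_transaction(rel, ["b1"], commit=True)
pages_after = rel.relpages  # counts the aborted transaction's pages too
```

The aborted transaction's pages still count, because the file really is that long; there is no transaction-private or rolled-back value of relpages in this model.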

If we want to take reltuples seriously and try to maintain it
on-the-fly, then I think it needs still a third behavior.  Clearly
it cannot be updated using MVCC rules, or we lose all writer
concurrency (if A has added tuples to a rel, B would have to wait
for A to commit before it could update reltuples...).  Furthermore
"updating" isn't a simple matter of storing what you think the new
value is; otherwise two transactions adding tuples in parallel would
leave the wrong answer after B commits and overwrites A's value.
I think it would work for each transaction to keep track of a net delta
in reltuples for each table it's changed (total tuples added less total
tuples deleted), and then atomically add that value to the table's
shared reltuples counter during commit.  But that still leaves the
problem of how you use the counter during a transaction to get an
accurate answer to the question "If I scan this table now, how many tuples
will I see?"  At the time the question is asked, the current shared
counter value might include the effects of transactions that have
committed since your transaction started, and therefore are not visible
under MVCC rules.  I think getting the correct answer would involve
making an instantaneous copy of the current counter at the start of
your xact, and then adding your own private net-uncommitted-delta to
the saved shared counter value when asked the question.  This doesn't
look real practical --- you'd have to save the reltuples counts of
*all* tables in the database at the start of each xact, on the off
chance that you might need them.  Ugh.  Perhaps someone has a better
idea.  In any case, reltuples clearly needs different mechanisms than
the ordinary fields in pg_class do, because updating it will be a
performance bottleneck otherwise.
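
The delta scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL code: SharedRelTuples and Transaction are hypothetical names, a mutex stands in for whatever atomic-add primitive would really be used, and the snapshot taken in the constructor is exactly the copy of *all* tables' counts that makes the idea look impractical.

```python
import threading


class SharedRelTuples:
    """Shared per-table tuple counters; updates are atomic additions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def add(self, table, delta):
        with self._lock:
            self._counts[table] = self._counts.get(table, 0) + delta

    def read(self, table):
        with self._lock:
            return self._counts.get(table, 0)

    def snapshot(self):
        with self._lock:
            return dict(self._counts)


class Transaction:
    def __init__(self, shared):
        self.shared = shared
        self.start_counts = shared.snapshot()  # copy of every table's count
        self.deltas = {}                       # private net delta per table

    def insert(self, table, n=1):
        self.deltas[table] = self.deltas.get(table, 0) + n

    def delete(self, table, n=1):
        self.deltas[table] = self.deltas.get(table, 0) - n

    def visible_tuples(self, table):
        # Saved counter at xact start plus our own uncommitted net delta.
        return self.start_counts.get(table, 0) + self.deltas.get(table, 0)

    def commit(self):
        # Add the net delta instead of storing an absolute value, so
        # parallel writers cannot overwrite each other's contribution.
        for table, delta in self.deltas.items():
            self.shared.add(table, delta)
        self.deltas = {}

    def abort(self):
        self.deltas = {}


shared = SharedRelTuples()
shared.add("t", 100)
a, b = Transaction(shared), Transaction(shared)
a.insert("t", 10)
b.insert("t", 5)
b.delete("t", 2)
a.commit()
b_sees = b.visible_tuples("t")   # ignores A's commit, which B cannot see
b.commit()
final_count = shared.read("t")   # both deltas are reflected
```

Here B's estimate stays at 103 even after A commits, and the shared counter ends at 113 with neither writer waiting on the other. The cost is the per-transaction snapshot of every table's counter.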

If we allow reltuples to be updated only by vacuum-like events, as
it is now, then I think keeping it in pg_class is still OK.

In short, it seems clear to me that relpages should be removed from
pg_class and kept somewhere else if we want to make it more reliable
than it is now, and the same for reltuples (but reltuples doesn't
behave the same as relpages, and probably ought to be handled
differently).

regards, tom lane

************

From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
        by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
        for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
Received: from localhost (majordom@localhost)
        by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
        Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
        (envelope-from owner-pgsql-hackers)
Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 21:07:01 -0400
Received: (from majordom@localhost)
        by hub.org (8.9.3/8.9.3) id VAA50644
        for pgsql-hackers-outgoing; Tue, 19 Oct 1999 21:06:06 -0400 (EDT)