Commit 6dd86c2

Fix nbtsort.c's page space accounting.

Commit dd299df, which made heap TID a tiebreaker nbtree index column, introduced new rules on page space management to make suffix truncation safe. In general, suffix truncation needs to have a small amount of extra space available on the new left page when splitting a leaf page. This is needed in case it turns out that truncation cannot even "truncate away the heap TID column", resulting in a larger-than-firstright leaf high key with an explicit heap TID representation.

Despite all this, CREATE INDEX/nbtsort.c did not account for the possible need for extra heap TID space on leaf pages when deciding whether or not a new item could fit on current page. This could lead to "failed to add item to the index page" errors when CREATE INDEX/nbtsort.c tried to finish off a leaf page that lacked space for a larger-than-firstright leaf high key (it only had space for firstright tuple, which was just short of what was needed following "truncation").

Several conditions needed to be met all at once for CREATE INDEX to fail. The problem was in the hard limit on what will fit on a page, which tends to be masked by the soft fillfactor-wise limit. The easiest way to recreate the problem seems to be a CREATE INDEX on a low cardinality text column, with tuples that are of non-uniform width, using a fillfactor of 100.

To fix, bring nbtsort.c in line with nbtsplitloc.c, which already pessimistically assumes that all leaf page splits will have high keys that have a heap TID appended.

Reported-By: Andreas Joseph Krogh
Discussion: https://postgr.es/m/VisenaEmail.c5.3ee7fe277d514162.16a6d785bea@tc7-visena
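
The hard-limit change described in the message can be illustrated with a small standalone sketch. It is not code from nbtsort.c: the `sketch_` names are hypothetical, and the constants assume a typical 64-bit build where the maximum alignment is 8 bytes and a heap TID (`ItemPointerData`) occupies 6 bytes. It only models the arithmetic of the "will the item fit" test before and after the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for a typical 64-bit build; not taken from the commit. */
#define SKETCH_MAXIMUM_ALIGNOF 8	/* stand-in for MAXIMUM_ALIGNOF */
#define SKETCH_HEAP_TID_SIZE   6	/* stand-in for sizeof(ItemPointerData) */

/* Round len up to the next multiple of the assumed alignment. */
static size_t
sketch_maxalign(size_t len)
{
	size_t		mask = (size_t) SKETCH_MAXIMUM_ALIGNOF - 1;

	return (len + mask) & ~mask;
}

/* Hard limit before the fix: only the new item itself has to fit. */
static bool
sketch_fits_before_fix(size_t pgspc, size_t itupsz)
{
	assert(itupsz == sketch_maxalign(itupsz));	/* caller passes aligned size */
	return pgspc >= itupsz;
}

/*
 * Hard limit after the fix: on leaf pages, room for an appended heap TID is
 * reserved on top of the item, so a later high key that grows by one aligned
 * TID still fits.
 */
static bool
sketch_fits_after_fix(size_t pgspc, size_t itupsz, bool isleaf)
{
	size_t		reserve = isleaf ? sketch_maxalign(SKETCH_HEAP_TID_SIZE) : 0;

	assert(itupsz == sketch_maxalign(itupsz));
	return pgspc >= itupsz + reserve;
}
```

Under these assumptions, a leaf page with exactly 40 free bytes accepts a 40-byte item under the old test but not under the new one, which is the narrow window in which the old code could later run out of room for the enlarged high key.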
1 parent dd69597 commit 6dd86c2

1 file changed, +42 -18 lines changed

src/backend/access/nbtree/nbtsort.c
@@ -841,6 +841,7 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	OffsetNumber last_off;
 	Size pgspc;
 	Size itupsz;
+	bool isleaf;

 	/*
 	 * This is a handy place to check for cancel interrupts during the btree
@@ -855,9 +856,12 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	pgspc = PageGetFreeSpace(npage);
 	itupsz = IndexTupleSize(itup);
 	itupsz = MAXALIGN(itupsz);
+	/* Leaf case has slightly different rules due to suffix truncation */
+	isleaf = (state->btps_level == 0);

 	/*
-	 * Check whether the item can fit on a btree page at all.
+	 * Check whether the new item can fit on a btree page on current level at
+	 * all.
 	 *
 	 * Every newly built index will treat heap TID as part of the keyspace,
 	 * which imposes the requirement that new high keys must occasionally have
@@ -870,16 +874,29 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	 * the reserved space. This should never fail on internal pages.
 	 */
 	if (unlikely(itupsz > BTMaxItemSize(npage)))
-		_bt_check_third_page(wstate->index, wstate->heap,
-							 state->btps_level == 0, npage, itup);
+		_bt_check_third_page(wstate->index, wstate->heap, isleaf, npage,
+							 itup);

 	/*
-	 * Check to see if page is "full". It's definitely full if the item won't
-	 * fit. Otherwise, compare to the target freespace derived from the
-	 * fillfactor. However, we must put at least two items on each page, so
-	 * disregard fillfactor if we don't have that many.
+	 * Check to see if current page will fit new item, with space left over to
+	 * append a heap TID during suffix truncation when page is a leaf page.
+	 *
+	 * It is guaranteed that we can fit at least 2 non-pivot tuples plus a
+	 * high key with heap TID when finishing off a leaf page, since we rely on
+	 * _bt_check_third_page() rejecting oversized non-pivot tuples. On
+	 * internal pages we can always fit 3 pivot tuples with larger internal
+	 * page tuple limit (includes page high key).
+	 *
+	 * Most of the time, a page is only "full" in the sense that the soft
+	 * fillfactor-wise limit has been exceeded. However, we must always leave
+	 * at least two items plus a high key on each page before starting a new
+	 * page. Disregard fillfactor and insert on "full" current page if we
+	 * don't have the minimum number of items yet. (Note that we deliberately
+	 * assume that suffix truncation neither enlarges nor shrinks new high key
+	 * when applying soft limit.)
 	 */
-	if (pgspc < itupsz || (pgspc < state->btps_full && last_off > P_FIRSTKEY))
+	if (pgspc < itupsz + (isleaf ? MAXALIGN(sizeof(ItemPointerData)) : 0) ||
+		(pgspc < state->btps_full && last_off > P_FIRSTKEY))
 	{
 		/*
 		 * Finish off the page and write it out.
@@ -889,7 +906,6 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 		ItemId ii;
 		ItemId hii;
 		IndexTuple oitup;
-		BTPageOpaque opageop = (BTPageOpaque) PageGetSpecialPointer(opage);

 		/* Create new page of same level */
 		npage = _bt_blnewpage(state->btps_level);
@@ -910,14 +926,20 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 		_bt_sortaddtup(npage, ItemIdGetLength(ii), oitup, P_FIRSTKEY);

 		/*
-		 * Move 'last' into the high key position on opage
+		 * Move 'last' into the high key position on opage. _bt_blnewpage()
+		 * allocated empty space for a line pointer when opage was first
+		 * created, so this is a matter of rearranging already-allocated space
+		 * on page, and initializing high key line pointer. (Actually, leaf
+		 * pages must also swap oitup with a truncated version of oitup, which
+		 * is sometimes larger than oitup, though never by more than the space
+		 * needed to append a heap TID.)
 		 */
 		hii = PageGetItemId(opage, P_HIKEY);
 		*hii = *ii;
 		ItemIdSetUnused(ii); /* redundant */
 		((PageHeader) opage)->pd_lower -= sizeof(ItemIdData);

-		if (P_ISLEAF(opageop))
+		if (isleaf)
 		{
 			IndexTuple lastleft;
 			IndexTuple truncated;
@@ -943,15 +965,13 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 			 * tuple, it cannot just be copied in place (besides, we want
 			 * to actually save space on the leaf page). We delete the
 			 * original high key, and add our own truncated high key at the
-			 * same offset. It's okay if the truncated tuple is slightly
-			 * larger due to containing a heap TID value, since this case is
-			 * known to _bt_check_third_page(), which reserves space.
+			 * same offset.
 			 *
 			 * Note that the page layout won't be changed very much. oitup is
 			 * already located at the physical beginning of tuple space, so we
 			 * only shift the line pointer array back and forth, and overwrite
-			 * the latter portion of the space occupied by the original tuple.
-			 * This is fairly cheap.
+			 * the tuple space previously occupied by oitup. This is fairly
+			 * cheap.
 			 */
 			ii = PageGetItemId(opage, OffsetNumberPrev(last_off));
 			lastleft = (IndexTuple) PageGetItem(opage, ii);
@@ -979,9 +999,9 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 		Assert((BTreeTupleGetNAtts(state->btps_minkey, wstate->index) <=
 				IndexRelationGetNumberOfKeyAttributes(wstate->index) &&
 				BTreeTupleGetNAtts(state->btps_minkey, wstate->index) > 0) ||
-			   P_LEFTMOST(opageop));
+			   P_LEFTMOST((BTPageOpaque) PageGetSpecialPointer(opage)));
 		Assert(BTreeTupleGetNAtts(state->btps_minkey, wstate->index) == 0 ||
-			   !P_LEFTMOST(opageop));
+			   !P_LEFTMOST((BTPageOpaque) PageGetSpecialPointer(opage)));
 		BTreeInnerTupleSetDownLink(state->btps_minkey, oblkno);
 		_bt_buildadd(wstate, state->btps_next, state->btps_minkey);
 		pfree(state->btps_minkey);
@@ -1018,6 +1038,10 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	}

 	/*
+	 * By here, either original page is still the current page, or a new page
+	 * was created that became the current page. Either way, the current page
+	 * definitely has space for new item.
+	 *
 	 * If the new item is the first for its page, stash a copy for later. Note
 	 * this will only happen for the first item on a level; on later pages,
 	 * the first item for a page is copied from the prior page in the code
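
To see why the soft fillfactor-wise limit tends to mask the hard limit, the combined test from the patched `if` condition can be modeled in isolation. The sketch below is a hedged approximation, not the nbtsort.c code: the `sketch_` names are hypothetical, an 8192-byte block size and an 8-byte aligned heap TID reserve are assumed, and the target free space is assumed to be computed as block size times (100 - fillfactor) / 100. The offset constants mirror `P_HIKEY` (1) and `P_FIRSTKEY` (2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values; labeled stand-ins, not definitions from PostgreSQL. */
#define SKETCH_BLCKSZ           8192	/* assumed default block size */
#define SKETCH_HEAP_TID_RESERVE 8		/* assumed MAXALIGN(sizeof(ItemPointerData)) */
#define SKETCH_P_HIKEY          1		/* offset of the high key slot */
#define SKETCH_P_FIRSTKEY       2		/* offset of the first data item */

/* Assumed formula for the soft limit: free space to leave per fillfactor. */
static size_t
sketch_target_free_space(int fillfactor)
{
	return (size_t) SKETCH_BLCKSZ * (size_t) (100 - fillfactor) / 100;
}

/*
 * Model of the patched "page is full" decision: the hard limit applies
 * always, the soft limit only once the page holds at least two items.
 */
static bool
sketch_page_is_full(size_t pgspc, size_t itupsz, bool isleaf,
					size_t btps_full, unsigned last_off)
{
	size_t		needed = itupsz + (isleaf ? SKETCH_HEAP_TID_RESERVE : 0);

	assert(last_off >= SKETCH_P_HIKEY);

	if (pgspc < needed)
		return true;			/* hard limit: item (plus reserve) won't fit */

	/* soft limit: below target free space and minimum item count reached */
	return pgspc < btps_full && last_off > SKETCH_P_FIRSTKEY;
}
```

With a fillfactor of 90 the soft limit in this model is 819 bytes, so a page is declared full long before the last few bytes matter. With a fillfactor of 100 the soft limit is 0 and only the hard limit remains, which matches the commit message's observation that fillfactor 100 is the easiest way to hit the problem.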
