
Commit 58a359e
Speedup tuple deformation with additional function inlining
This adjusts slot_deform_heap_tuple() to add special-case loops to eliminate much of the branching that was done within the body of the main deform loop.

Previously, while looping over each attribute to deform, slot_deform_heap_tuple() would always recheck if the given attribute was NULL by looking at HeapTupleHasNulls() and if so, went on to check the tuple's NULL bitmap. Since many tuples won't contain any NULLs, we can just check HeapTupleHasNulls() once and when there are no NULLs, use a more compact version of the deforming loop which contains no NULL checking code at all.

The same is possible for the "slow" mode checking part of the loop. That variable was checked several times for each attribute, once to determine if the offset to the attribute value could be taken from the attcacheoff, and again to check if the offset could be cached for next time. These "slow" checks can mostly be eliminated by instead having multiple loops. Initially, we can start in the non-slow loop and break out of that loop if and only if we must stop caching the offset. This eliminates branching for both slow and non-slow deforming methods.

The amount of code required for the no nulls / non-slow version is very small. It's possible to have separate loops like this due to the fact that once we move into slow mode, we never need to switch back into non-slow mode for a given tuple.

We have the compiler take care of writing out the multiple required loops by having a pg_attribute_always_inline function which gets called various times passing in constant values for the "slow" and "hasnulls" parameters. This allows the compiler to eliminate const-false branches and remove comparisons for const-true ones.

This commit has shown overall query performance increases of around 5-20% in deform-heavy OLAP-type workloads.
Author: David Rowley
Reviewed-by: Victor Yegorov
Discussion: https://postgr.es/m/CAGnEbog92Og2CpC2S8=g_HozGsWtt_3kRS1sXjLz0jKSoCNfLw@mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvo9e0XG71WrefYaRv5n4xNPLK4k8LjD0mSR3c9KR2vi2Q@mail.gmail.com
1 parent d85ce01 commit 58a359e

File tree

1 file changed: +154 −54 lines

src/backend/executor/execTuples.c

+154 −54
```diff
@@ -991,54 +991,40 @@ tts_buffer_heap_store_tuple(TupleTableSlot *slot, HeapTuple tuple,
 }
 
 /*
- * slot_deform_heap_tuple
- *		Given a TupleTableSlot, extract data from the slot's physical tuple
- *		into its Datum/isnull arrays.  Data is extracted up through the
- *		natts'th column (caller must ensure this is a legal column number).
+ * slot_deform_heap_tuple_internal
+ *		An always inline helper function for use in slot_deform_heap_tuple to
+ *		allow the compiler to emit specialized versions of this function for
+ *		various combinations of "slow" and "hasnulls".  For example, if a
+ *		given tuple has no nulls, then we needn't check "hasnulls" for every
+ *		attribute that we're deforming.  The caller can just call this
+ *		function with hasnulls set to constant-false and have the compiler
+ *		remove the constant-false branches and emit more optimal code.
  *
- * This is essentially an incremental version of heap_deform_tuple:
- * on each call we extract attributes up to the one needed, without
- * re-computing information about previously extracted attributes.
- * slot->tts_nvalid is the number of attributes already extracted.
+ * Returns the next attnum to deform, which can be equal to natts when the
+ * function manages to deform all requested attributes.  *offp is an input and
+ * output parameter which is the byte offset within the tuple to start deforming
+ * from which, on return, gets set to the offset where the next attribute
+ * should be deformed from.  *slowp is set to true when subsequent deforming
+ * of this tuple must use a version of this function with "slow" passed as
+ * true.
  *
- * This is marked as always inline, so the different offp for different types
- * of slots gets optimized away.
+ * Callers cannot assume when we return "attnum" (i.e. all requested
+ * attributes have been deformed) that slow mode isn't required for any
+ * additional deforming as the final attribute may have caused a switch to
+ * slow mode.
  */
-static pg_attribute_always_inline void
-slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
-					   int natts)
+static pg_attribute_always_inline int
+slot_deform_heap_tuple_internal(TupleTableSlot *slot, HeapTuple tuple,
+								int attnum, int natts, bool slow,
+								bool hasnulls, uint32 *offp, bool *slowp)
 {
 	TupleDesc	tupleDesc = slot->tts_tupleDescriptor;
 	Datum	   *values = slot->tts_values;
 	bool	   *isnull = slot->tts_isnull;
 	HeapTupleHeader tup = tuple->t_data;
-	bool		hasnulls = HeapTupleHasNulls(tuple);
-	int			attnum;
 	char	   *tp;				/* ptr to tuple data */
-	uint32		off;			/* offset in tuple data */
 	bits8	   *bp = tup->t_bits;	/* ptr to null bitmap in tuple */
-	bool		slow;			/* can we use/set attcacheoff? */
-
-	/* We can only fetch as many attributes as the tuple has. */
-	natts = Min(HeapTupleHeaderGetNatts(tuple->t_data), natts);
-
-	/*
-	 * Check whether the first call for this tuple, and initialize or restore
-	 * loop state.
-	 */
-	attnum = slot->tts_nvalid;
-	if (attnum == 0)
-	{
-		/* Start from the first attribute */
-		off = 0;
-		slow = false;
-	}
-	else
-	{
-		/* Restore state from previous execution */
-		off = *offp;
-		slow = TTS_SLOW(slot);
-	}
+	bool		slownext = false;
 
 	tp = (char *) tup + tup->t_hoff;
 
@@ -1050,14 +1036,20 @@ slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
 		{
 			values[attnum] = (Datum) 0;
 			isnull[attnum] = true;
-			slow = true;		/* can't use attcacheoff anymore */
-			continue;
+			if (!slow)
+			{
+				*slowp = true;
+				return attnum + 1;
+			}
+			else
+				continue;
 		}
 
 		isnull[attnum] = false;
 
+		/* calculate the offset of this attribute */
 		if (!slow && thisatt->attcacheoff >= 0)
-			off = thisatt->attcacheoff;
+			*offp = thisatt->attcacheoff;
 		else if (thisatt->attlen == -1)
 		{
 			/*
@@ -1066,31 +1058,140 @@ slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
 			 * pad bytes in any case: then the offset will be valid for either
 			 * an aligned or unaligned value.
 			 */
-			if (!slow &&
-				off == att_nominal_alignby(off, thisatt->attalignby))
-				thisatt->attcacheoff = off;
+			if (!slow && *offp == att_nominal_alignby(*offp, thisatt->attalignby))
+				thisatt->attcacheoff = *offp;
 			else
 			{
-				off = att_pointer_alignby(off, thisatt->attalignby, -1,
-										  tp + off);
-				slow = true;
+				*offp = att_pointer_alignby(*offp,
+											thisatt->attalignby,
+											-1,
+											tp + *offp);
+
+				if (!slow)
+					slownext = true;
 			}
 		}
 		else
 		{
 			/* not varlena, so safe to use att_nominal_alignby */
-			off = att_nominal_alignby(off, thisatt->attalignby);
+			*offp = att_nominal_alignby(*offp, thisatt->attalignby);
 
 			if (!slow)
-				thisatt->attcacheoff = off;
+				thisatt->attcacheoff = *offp;
+		}
+
+		values[attnum] = fetchatt(thisatt, tp + *offp);
+
+		*offp = att_addlength_pointer(*offp, thisatt->attlen, tp + *offp);
+
+		/* check if we need to switch to slow mode */
+		if (!slow)
+		{
+			/*
+			 * We're unable to deform any further if the above code set
+			 * 'slownext', or if this isn't a fixed-width attribute.
+			 */
+			if (slownext || thisatt->attlen <= 0)
+			{
+				*slowp = true;
+				return attnum + 1;
+			}
 		}
+	}
 
-		values[attnum] = fetchatt(thisatt, tp + off);
+	return natts;
+}
 
-		off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+/*
+ * slot_deform_heap_tuple
+ *		Given a TupleTableSlot, extract data from the slot's physical tuple
+ *		into its Datum/isnull arrays.  Data is extracted up through the
+ *		natts'th column (caller must ensure this is a legal column number).
+ *
+ * This is essentially an incremental version of heap_deform_tuple:
+ * on each call we extract attributes up to the one needed, without
+ * re-computing information about previously extracted attributes.
+ * slot->tts_nvalid is the number of attributes already extracted.
+ *
+ * This is marked as always inline, so the different offp for different types
+ * of slots gets optimized away.
+ */
+static pg_attribute_always_inline void
+slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
+					   int natts)
+{
+	bool		hasnulls = HeapTupleHasNulls(tuple);
+	int			attnum;
+	uint32		off;			/* offset in tuple data */
+	bool		slow;			/* can we use/set attcacheoff? */
+
+	/* We can only fetch as many attributes as the tuple has. */
+	natts = Min(HeapTupleHeaderGetNatts(tuple->t_data), natts);
 
-		if (thisatt->attlen <= 0)
-			slow = true;		/* can't use attcacheoff anymore */
+	/*
+	 * Check whether the first call for this tuple, and initialize or restore
+	 * loop state.
+	 */
+	attnum = slot->tts_nvalid;
+	if (attnum == 0)
+	{
+		/* Start from the first attribute */
+		off = 0;
+		slow = false;
+	}
+	else
+	{
+		/* Restore state from previous execution */
+		off = *offp;
+		slow = TTS_SLOW(slot);
+	}
+
+	/*
+	 * If 'slow' isn't set, try deforming using deforming code that does not
+	 * contain any of the extra checks required for non-fixed offset
+	 * deforming.  During deforming, if or when we find a NULL or a variable
+	 * length attribute, we'll switch to a deforming method which includes the
+	 * extra code required for non-fixed offset deforming, a.k.a slow mode.
+	 * Because this is performance critical, we inline
+	 * slot_deform_heap_tuple_internal passing the 'slow' and 'hasnull'
+	 * parameters as constants to allow the compiler to emit specialized code
+	 * with the known-const false comparisons and subsequent branches removed.
+	 */
+	if (!slow)
+	{
+		/* Tuple without any NULLs? We can skip doing any NULL checking */
+		if (!hasnulls)
+			attnum = slot_deform_heap_tuple_internal(slot,
+													 tuple,
+													 attnum,
+													 natts,
+													 false, /* slow */
+													 false, /* hasnulls */
+													 &off,
+													 &slow);
+		else
+			attnum = slot_deform_heap_tuple_internal(slot,
+													 tuple,
+													 attnum,
+													 natts,
+													 false, /* slow */
+													 true,	/* hasnulls */
+													 &off,
+													 &slow);
+	}
+
+	/* If there's still work to do then we must be in slow mode */
+	if (attnum < natts)
+	{
+		/* XXX is it worth adding a separate call when hasnulls is false? */
+		attnum = slot_deform_heap_tuple_internal(slot,
+												 tuple,
+												 attnum,
+												 natts,
+												 true,	/* slow */
+												 hasnulls,
+												 &off,
+												 &slow);
 	}
 
 	/*
@@ -1104,7 +1205,6 @@ slot_deform_heap_tuple(TupleTableSlot *slot, HeapTuple tuple, uint32 *offp,
 		slot->tts_flags &= ~TTS_FLAG_SLOW;
 	}
 
-
 const TupleTableSlotOps TTSOpsVirtual = {
 	.base_slot_size = sizeof(VirtualTupleTableSlot),
 	.init = tts_virtual_init,
```
