Commit 8870917

Apply auto-vectorization to the inner loop of numeric multiplication.

Compile numeric.c with -ftree-vectorize where available, and adjust the innermost loop of mul_var() so that it is amenable to being auto-vectorized. (Mainly, that involves making it process the arrays left-to-right not right-to-left.)

Applying -ftree-vectorize actually makes numeric.o smaller, at least with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too. Independently of that, fixing the inner loop to be vectorizable also makes things a bit faster. But doing both is a huge win for multiplications with lots of digits. For me, the numeric regression test is the same speed to within measurement noise, but numeric_big is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o bloat quite a bit, and the additional speed improvement is very marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com

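
To illustrate the loop change described in the message, here is a minimal standalone sketch of the two loop shapes. It is not the PostgreSQL source: the function names and the `last` parameter (standing in for the `Min(...)` bound) are hypothetical, and plain `int` arrays are used in place of the real digit types. It only shows that both directions add the same products into the same accumulator slots.

```c
#include <assert.h>

/*
 * Old shape (hypothetical wrapper): walk var2's digits right-to-left,
 * with a second index i counting down through the accumulator.
 * "last" is the highest var2 index to include; -1 means no digits.
 */
static void
accum_right_to_left(int *dig, int i1, int var1digit,
                    const int *var2digits, int last)
{
    int i2, i;

    assert(last >= -1);
    for (i2 = last, i = i1 + i2 + 2; i2 >= 0; i2--)
        dig[i--] += var1digit * var2digits[i2];
}

/*
 * New shape (hypothetical wrapper): one induction variable, counting up,
 * indexing both arrays from a fixed base pointer.
 */
static void
accum_left_to_right(int *dig, int i1, int var1digit,
                    const int *var2digits, int last)
{
    int *dig_i1_2 = &dig[i1 + 2];
    int i2;

    assert(last >= -1);
    for (i2 = 0; i2 <= last; i2++)
        dig_i1_2[i2] += var1digit * var2digits[i2];
}
```

Each iteration touches a distinct element of `dig` and nothing is carried from one element to the next, so the two functions leave `dig` in the same state. The second form has a single upward-counting index, which is the kind of loop a compiler's auto-vectorizer generally handles more readily.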
1 parent 695de5d commit 8870917

2 files changed: +15 -3 lines changed

src/backend/utils/adt/Makefile

+3 lines changed
@@ -125,6 +125,9 @@ clean distclean maintainer-clean:
 
 like.o: like.c like_match.c
 
+# Some code in numeric.c benefits from auto-vectorization
+numeric.o: CFLAGS += ${CFLAGS_VECTORIZE}
+
 varlena.o: varlena.c levenshtein.c
 
 include $(top_srcdir)/src/backend/common.mk

src/backend/utils/adt/numeric.c

+12 -3 lines changed
@@ -8191,6 +8191,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 	int res_weight;
 	int maxdigits;
 	int *dig;
+	int *dig_i1_2;
 	int carry;
 	int maxdig;
 	int newdig;
@@ -8327,10 +8328,18 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result,
 		 *
 		 * As above, digits of var2 can be ignored if they don't contribute,
 		 * so we only include digits for which i1+i2+2 <= res_ndigits - 1.
+		 *
+		 * This inner loop is the performance bottleneck for multiplication,
+		 * so we want to keep it simple enough so that it can be
+		 * auto-vectorized. Accordingly, process the digits left-to-right
+		 * even though schoolbook multiplication would suggest right-to-left.
+		 * Since we aren't propagating carries in this loop, the order does
+		 * not matter.
 		 */
-		for (i2 = Min(var2ndigits - 1, res_ndigits - i1 - 3), i = i1 + i2 + 2;
-			 i2 >= 0; i2--)
-			dig[i--] += var1digit * var2digits[i2];
+		i = Min(var2ndigits - 1, res_ndigits - i1 - 3);
+		dig_i1_2 = &dig[i1 + 2];
+		for (i2 = 0; i2 <= i; i2++)
+			dig_i1_2[i2] += var1digit * var2digits[i2];
 	}
 
 	/*
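
The new comment's point that carries are not propagated inside the inner loop can be seen in a toy schoolbook multiplication. The sketch below is an illustration under stated assumptions, not mul_var() itself: `toy_mul` and `TOY_NBASE` are hypothetical names, digits are base-10000 values stored most significant first, and inputs are capped at 20 digits so the unnormalized sums fit in an `int` (the real code guards against overflow differently).

```c
#include <assert.h>

#define TOY_NBASE 10000     /* hypothetical: base-10000 digits */

/*
 * Multiply two most-significant-first digit arrays a[0..na-1] and
 * b[0..nb-1]. The caller supplies res with room for na + nb ints.
 */
static void
toy_mul(const int *a, int na, const int *b, int nb, int *res)
{
    int nres = na + nb;
    int i, i1, i2, carry;

    assert(na > 0 && nb > 0);
    /* assumption: at most 20 products per slot keeps sums within int */
    assert(na <= 20 && nb <= 20);

    for (i = 0; i < nres; i++)
        res[i] = 0;

    /* Accumulate raw products; no carry handling here at all. */
    for (i1 = 0; i1 < na; i1++)
    {
        int d1 = a[i1];
        int *row = &res[i1 + 1];

        /* left-to-right, each iteration independent of the others */
        for (i2 = 0; i2 < nb; i2++)
            row[i2] += d1 * b[i2];
    }

    /* One separate pass, right-to-left, to normalize into digits. */
    carry = 0;
    for (i = nres - 1; i >= 0; i--)
    {
        int v = res[i] + carry;

        carry = v / TOY_NBASE;
        res[i] = v % TOY_NBASE;
    }
    assert(carry == 0);
}
```

For example, multiplying the digits {1234, 5678} by {9} yields {1, 1111, 1102}, that is, 12345678 * 9 = 111111102. Because the accumulation loop only adds independent products, the order in which it visits `i2` has no effect on the result; only the final normalization pass depends on direction.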
