Re: POC: GROUP BY optimization
От | Andrey Lepikhov |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | e3602ccb-e643-2e79-ed2c-1175a80533a1@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Список | pgsql-hackers |
On 20/7/2023 18:46, Tomas Vondra wrote: > On 7/20/23 08:37, Andrey Lepikhov wrote: >> On 3/10/2022 21:56, Tom Lane wrote: >>> Revert "Optimize order of GROUP BY keys". >>> >>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and >>> several follow-on fixes. >>> ... >>> Since we're hard up against the release deadline for v15, let's >>> revert these changes for now. We can always try again later. >> >> It may be time to restart the project. As a first step, I rebased the >> patch on the current master. It wasn't trivial because of some latest >> optimizations (a29eab, 1349d27 and 8d83a5d). >> Now, Let's repeat the review and rewrite the current path according to >> the reasons uttered in the revert commit. > 1) procost = 1.0 - I guess we could make this more realistic by doing > some microbenchmarks and tuning the costs for the most expensive cases. Ok, some thoughts on this part of the task. As I see, we have not so many different operators: 26 with fixed width and 16 with variable width: SELECT o.oid,oprcode,typname,typlen FROM pg_operator o JOIN pg_type t ON (oprleft = t.oid) WHERE (oprname='<') AND oprleft=oprright AND typlen>0 ORDER BY o.oid; SELECT o.oid,oprcode,typname,typlen FROM pg_operator o JOIN pg_type t ON (oprleft = t.oid) WHERE (oprname='<') AND oprleft=oprright AND typlen<0 ORDER BY o.oid; Benchmarking procedure of types with fixed length can be something like below: CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE ( nrows integer, exec_time float ) AS $$ DECLARE data json; step integer; BEGIN SET work_mem='1GB'; FOR step IN 0..3 LOOP SELECT pow(100, step)::integer INTO nrows; DROP TABLE IF EXISTS test CASCADE; EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x FROM generate_series(1,%s) AS gs;', typ, nrows); EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON) SELECT * FROM test ORDER BY (x) DESC INTO data; SELECT data->0->'Execution Time' INTO exec_time; RETURN NEXT; END LOOP; END; $$ LANGUAGE plpgsql VOLATILE; Execution of SELECT * FROM pass_sort('integer'); shows quite linear grow of the execution time. So, using '2.0 * cpu_operator_cost' as a cost for the integer type (as a basis) we can calculate costs for other operators. Variable-width types, i think, could require more complex technique to check dependency on the length. Does this way look worthwhile? -- regards, Andrey Lepikhov Postgres Professional
В списке pgsql-hackers по дате отправления: