Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-04Extract make_SAOP_expr() function from match_orclause_to_indexcol()Alexander Korotkov
This commit extracts the code to generate ScalarArrayOpExpr on top of the list of expressions from match_orclause_to_indexcol() into a separate function make_SAOP_expr(). This function was extracted to be used in optimization for conversion of 'x IN (VALUES ...)' to 'x = ANY ...'. make_SAOP_expr() is placed in clauses.c file as only two additional headers were needed there compared with other places. Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com Author: Alena Rybakina <a.rybakina@postgrespro.ru> Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-02-17Implement Self-Join EliminationAlexander Korotkov
The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
2025-01-16Add OLD/NEW support to RETURNING in DML queries.Dean Rasheed
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries to explicitly return old and new values by using the special aliases "old" and "new", which are automatically added to the query (if not already defined) while parsing its RETURNING list, allowing things like: RETURNING old.colname, new.colname, ... RETURNING old.*, new.* Additionally, a new syntax is supported, allowing the names "old" and "new" to be changed to user-supplied alias names, e.g.: RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ... This is useful when the names "old" and "new" are already defined, such as inside trigger functions, allowing backwards compatibility to be maintained -- the interpretation of any existing queries that happen to already refer to relations called "old" or "new", or use those as aliases for other relations, is not changed. For an INSERT, old values will generally be NULL, and for a DELETE, new values will generally be NULL, but that may change for an INSERT with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule changes the command type. Therefore, we put no restrictions on the use of old and new in any DML queries. Dean Rasheed, reviewed by Jian He and Jeff Davis. Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-11-26Reordering DISTINCT keys to match input path's pathkeysRichard Guo
The ordering of DISTINCT items is semantically insignificant, so we can reorder them as needed. In fact, in the parser, we absorb the sorting semantics of the sortClause as much as possible into the distinctClause, ensuring that one clause is a prefix of the other. This can help avoid a possible need to re-sort. In this commit, we attempt to adjust the DISTINCT keys to match the input path's pathkeys. This can likewise help avoid re-sorting, or allow us to use incremental-sort to save efforts. For DISTINCT ON expressions, the parser already ensures that they match the initial ORDER BY expressions. When reordering the DISTINCT keys, we must ensure that the resulting pathkey list matches the initial distinctClause pathkeys. This introduces a new GUC, enable_distinct_reordering, which allows the optimization to be disabled if needed. Author: Richard Guo Reviewed-by: Andrei Lepikhov Discussion: https://postgr.es/m/CAMbWs48dR26cCcX0f=8bja2JKQPcU64136kHk=xekHT9xschiQ@mail.gmail.com
2024-09-10Introduce an RTE for the grouping stepRichard Guo
If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com
2024-04-09revert: Transform OR clauses to ANY expressionAlexander Korotkov
This commit reverts 72bd38cc99 due to implementation and design issues. Reported-by: Tom Lane Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us
2024-04-07Transform OR clauses to ANY expressionAlexander Korotkov
Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) on the preliminary stage of optimization when we are still working with the expression tree. Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op' is an operator which returns boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). Sometimes it can lead to not optimal plan. This is why there is a or_to_any_transform_limit GUC. It specifies a threshold value of length of arguments in an OR expression that triggers the OR-to-ANY transformation. Generally, more groupable OR arguments mean that transformation will be more likely to win than to lose. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-12-19Prevent integer overflow when forming tuple width estimates.Tom Lane
It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere. Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
2023-11-16Ensure we preprocess expressions before checking their volatility.Tom Lane
contain_mutable_functions and contain_volatile_functions give reliable answers only after expression preprocessing (specifically eval_const_expressions). Some places understand this, but some did not get the memo --- which is not entirely their fault, because the problem is documented only in places far away from those functions. Introduce wrapper functions that allow doing the right thing easily, and add commentary in hopes of preventing future mistakes from copy-and-paste of code that's only conditionally safe. Two actual bugs of this ilk are fixed here. We failed to preprocess column GENERATED expressions before checking mutability, so that the code could fail to detect the use of a volatile function default-argument expression, or it could reject a polymorphic function that is actually immutable on the datatype of interest. Likewise, column DEFAULT expressions weren't preprocessed before determining if it's safe to apply the attmissingval mechanism. A false negative would just result in an unnecessary table rewrite, but a false positive could allow the attmissingval mechanism to be used in a case where it should not be, resulting in unexpected initial values in a new column. In passing, re-order the steps in ComputePartitionAttrs so that its checks for invalid column references are done before applying expression_planner, rather than after. The previous coding would not complain if a partition expression contains a disallowed column reference that gets optimized away by constant folding, which seems to me to be a behavior we do not want. Per bug #18097 from Jim Keener. Back-patch to all supported versions. Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
2023-10-26Add trailing commas to enum definitionsPeter Eisentraut
Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-02-15Rename force_parallel_mode to debug_parallel_queryDavid Rowley
force_parallel_mode is meant to be used to allow us to exercise the parallel query infrastructure to ensure that it's working as we expect. It seems some users think this GUC is for forcing the query planner into picking a parallel plan regardless of the costs. A quick look at the documentation would have made them realize that they were wrong, but the GUC is likely too conveniently named which, evidently, seems to often result in users expecting that it forces the planner into usefully parallelizing queries. Here we rename the GUC to something which casual users are less likely to mistakenly think is what they need to make their query run more quickly. For now, the old name can still be used. We'll revisit if the old name mapping can be removed once the buildfarm configs are all updated. Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvrsOi92_uA7PEaHZMH-S4Xv+MGhQWA+GrP8b1kjpS1HjQ@mail.gmail.com
2023-01-30Make Vars be outer-join-aware.Tom Lane
Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner. To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason. In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join. This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner. Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.) Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-08-03Fix incorrect tests for SRFs in relation_can_be_sorted_early().Tom Lane
Commit fac1b470a thought we could check for set-returning functions by testing only the top-level node in an expression tree. This is wrong in itself, and to make matters worse it encouraged others to make the same mistake, by exporting tlist.c's special-purpose IS_SRF_CALL() as a widely-visible macro. I can't find any evidence that anyone's taken the bait, but it was only a matter of time. Use expression_returns_set() instead, and stuff the IS_SRF_CALL() genie back in its bottle, this time with a warning label. I also added a couple of cross-reference comments. After a fair amount of fooling around, I've despaired of making a robust test case that exposes the bug reliably, so no test case here. (Note that the test case added by fac1b470a is itself broken, in that it doesn't notice if you remove the code change. The repro given by the bug submitter currently doesn't fail either in v15 or HEAD, though I suspect that may indicate an unrelated bug.) Per bug #17564 from Martijn van Oosterhout. Back-patch to v13, as the faulty patch was. Discussion: https://postgr.es/m/17564-c7472c2f90ef2da3@postgresql.org
2022-05-21Avoid overflow hazard when clamping group counts to "long int".Tom Lane
Several places in the planner tried to clamp a double value to fit in a "long" by doing (long) Min(x, (double) LONG_MAX); This is subtly incorrect, because it casts LONG_MAX to double and potentially back again. If long is 64 bits then the double value is inexact, and the platform might round it up to LONG_MAX+1 resulting in an overflow and an undesirably negative output. While it's not hard to rewrite the expression into a safe form, let's put it into a common function to reduce the risk of someone doing it wrong in future. In principle this is a bug fix, but since the problem could only manifest with group count estimates exceeding 2^63, it seems unlikely that anyone has actually hit this or will do so anytime soon. We're fixing it mainly to satisfy fuzzer-type tools. That being the case, a HEAD-only fix seems sufficient. Andrey Lepikhov Discussion: https://postgr.es/m/ebbc2efb-7ef9-bf2f-1ada-d6ec48f70e58@postgrespro.ru
2022-04-08Apply PGDLLIMPORT markings broadly.Robert Haas
Up until now, we've had a policy of only marking certain variables in the PostgreSQL header files with PGDLLIMPORT, but now we've decided to mark them all. This means that extensions running on Windows should no longer operate at a disadvantage as compared to extensions running on Linux: if the variable is present in a header file, it should be accessible. Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com
2022-03-24Invent recursive_worktable_factor GUC to replace hard-wired constant.Tom Lane
Up to now, the planner estimated the size of a recursive query's worktable as 10 times the size of the non-recursive term. It's hard to see how to do significantly better than that automatically, but we can give users control over the multiplier to allow tuning for specific use-cases. The default behavior remains the same. Simon Riggs Discussion: https://postgr.es/m/CANbhV-EuaLm4H3g0+BSTYHEGxJj3Kht0R+rJ8vT57Dejnh=_nA@mail.gmail.com
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-06-10Reconsider the handling of procedure OUT parameters.Tom Lane
Commit 2453ea142 redefined pg_proc.proargtypes to include the types of OUT parameters, for procedures only. While that had some advantages for implementing the SQL-spec behavior of DROP PROCEDURE, it was pretty disastrous from a number of other perspectives. Notably, since the primary key of pg_proc is name + proargtypes, this made it possible to have multiple procedures with identical names + input arguments and differing output argument types. That would make it impossible to call any one of the procedures by writing just NULL (or "?", or any other data-type-free notation) for the output argument(s). The change also seems likely to cause grave confusion for client applications that examine pg_proc and expect the traditional definition of proargtypes. Hence, revert the definition of proargtypes to what it was, and undo a number of complications that had been added to support that. To support the SQL-spec behavior of DROP PROCEDURE, when there are no argmode markers in the command's parameter list, we perform the lookup both ways (that is, matching against both proargtypes and proallargtypes), succeeding if we get just one unique match. In principle this could result in ambiguous-function failures that would not happen when using only one of the two rules. However, overloading of procedure names is thought to be a pretty rare usage, so this shouldn't cause many problems in practice. Postgres-specific code such as pg_dump can defend against any possibility of such failures by being careful to specify argmodes for all procedure arguments. This also fixes a few other bugs in the area of CALL statements with named parameters, and improves the documentation a little. catversion bump forced because the representation of procedures with OUT arguments changes. Discussion: https://postgr.es/m/3742981.1621533210@sss.pgh.pa.us
2021-04-08Speedup ScalarArrayOpExpr evaluationDavid Rowley
ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand side have traditionally been evaluated by using a linear search over the array. When these arrays contain large numbers of elements then this linear search could become a significant part of execution time. Here we add a new method of evaluating ScalarArrayOpExpr expressions to allow them to be evaluated by first building a hash table containing each element, then on subsequent evaluations, we just probe that hash table to determine if there is a match. The planner is in charge of determining when this optimization is possible and it enables it by setting hashfuncid in the ScalarArrayOpExpr. The executor will only perform the hash table evaluation when the hashfuncid is set. This means that not all cases are optimized. For example CHECK constraints containing an IN clause won't go through the planner, so won't get the hashfuncid set. We could maybe do something about that at some later date. The reason we're not doing it now is from fear that we may slow down cases where the expression is evaluated only once. Those cases can be common, for example, a single row INSERT to a table with a CHECK constraint containing an IN clause. In the planner, we enable this when there are suitable hash functions for the ScalarArrayOpExpr's operator and only when there is at least MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array. The threshold is currently set to 9. Author: James Coleman, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
2021-01-21Fix pull_varnos' miscomputation of relids set for a PlaceHolderVar.Tom Lane
Previously, pull_varnos() took the relids of a PlaceHolderVar as being equal to the relids in its contents, but that fails to account for the possibility that we have to postpone evaluation of the PHV due to outer joins. This could result in a malformed plan. The known cases end up triggering the "failed to assign all NestLoopParams to plan nodes" sanity check in createplan.c, but other symptoms may be possible. The right value to use is the join level we actually intend to evaluate the PHV at. We can get that from the ph_eval_at field of the associated PlaceHolderInfo. However, there are some places that call pull_varnos() before the PlaceHolderInfos have been created; in that case, fall back to the conservative assumption that the PHV will be evaluated at its syntactic level. (In principle this might result in missing some legal optimization, but I'm not aware of any cases where it's an issue in practice.) Things are also a bit ticklish for calls occurring during deconstruct_jointree(), but AFAICS the ph_eval_at fields should have reached their final values by the time we need them. The main problem in making this work is that pull_varnos() has no way to get at the PlaceHolderInfos. We can fix that easily, if a bit tediously, in HEAD by passing it the planner "root" pointer. In the back branches that'd cause an unacceptable API/ABI break for extensions, so leave the existing entry points alone and add new ones with the additional parameter. (If an old entry point is called and encounters a PHV, it'll fall back to using the syntactic level, again possibly missing some valid optimization.) Back-patch to v12. The computation is surely also wrong before that, but it appears that we cannot reach a bad plan thanks to join order restrictions imposed on the subquery that the PlaceHolderVar came from. The error only became reachable when commit 4be058fe9 allowed trivial subqueries to be collapsed out completely, eliminating their join order restrictions. Per report from Stephan Springl. Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-12-21Disallow SRFs when considering sorts below Gather MergeTomas Vondra
While we do allow SRFs in ORDER BY, scan/join processing should not consider such cases - such sorts should only happen via final Sort atop a ProjectSet. So make sure we don't try adding such sorts below Gather Merge, just like we do for expressions that are volatile and/or not parallel safe. Backpatch to PostgreSQL 13, where this code was introduced as part of the Incremental Sort patch. Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=H5M8LSchj5e5OxpTAg@mail.gmail.com Discussion: https://postgr.es/m/295524.1606246314%40sss.pgh.pa.us
2020-12-03Improve estimation of OR clauses using extended statistics.Dean Rasheed
Formerly we only applied extended statistics to an OR clause as part of the clauselist_selectivity() code path for an OR clause appearing in an implicitly-ANDed list of clauses. This meant that it could only use extended statistics if all sub-clauses of the OR clause were covered by a single extended statistics object. Instead, teach clause_selectivity() how to apply extended statistics to an OR clause by handling its ORed list of sub-clauses in a similar manner to an implicitly-ANDed list of sub-clauses, but with different combination rules. This allows one or more extended statistics objects to be used to estimate all or part of the list of sub-clauses. Any remaining sub-clauses are then treated as if they are independent. Additionally, to avoid double-application of extended statistics, this introduces "extended" versions of clause_selectivity() and clauselist_selectivity(), which include an option to ignore extended statistics. This replaces the old clauselist_selectivity_simple() function which failed to completely ignore extended statistics when called from the extended statistics code. A known limitation of the current infrastructure is that an AND clause under an OR clause is not treated as compatible with extended statistics (because we don't build RestrictInfos for such sub-AND clauses). Thus, for example, "(a=1 AND b=1) OR (a=2 AND b=2)" will currently be treated as two independent AND clauses (each of which may be estimated using extended statistics), but extended statistics will not currently be used to account for any possible overlap between those clauses. Improving that is left as a task for the future. Original patch by Tomas Vondra, with additional improvements by me. Discussion: https://postgr.es/m/20200113230008.g67iyk4cs3xbnjju@development
2020-03-30Allow the planner-related functions and hook to accept the query string.Fujii Masao
This commit adds query_string argument into the planner-related functions and hook and allows us to pass the query string to them. Currently there is no user of the query string passed. But the upcoming patch for the planning counters will add the planning hook function into pg_stat_statements and the function will need the query string. So this change will be necessary for that patch. Also this change is useful for some extensions that want to use the query string in their planner hook function. Author: Pascal Legrand, Julien Rouhaud Reviewed-by: Yoshikazu Imai, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/CAOBaU_bU1m3_XF5qKYtSj1ua4dxd=FWDyh2SH4rSJAUUfsGmAQ@mail.gmail.com Discussion: https://postgr.es/m/1583789487074-0.post@n3.nabble.com
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-05-22Phase 2 pgindent run for v12.Tom Lane
Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-03-27Add support for multivariate MCV listsTomas Vondra
Introduce a third extended statistic type, supported by the CREATE STATISTICS command - MCV lists, a generalization of the statistic already built and used for individual columns. Compared to the already supported types (n-distinct coefficients and functional dependencies), MCV lists are more complex, include column values and allow estimation of much wider range of common clauses (equality and inequality conditions, IS NULL, IS NOT NULL etc.). Similarly to the other types, a new pseudo-type (pg_mcv_list) is used. Author: Tomas Vondra Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
2019-02-12Allow extensions to generate lossy index conditions.Tom Lane
For a long time, indxpath.c has had the ability to extract derived (lossy) index conditions from certain operators such as LIKE. For just as long, it's been obvious that we really ought to make that capability available to extensions. This commit finally accomplishes that, by adding another API for planner support functions that lets them create derived index conditions for their functions. As proof of concept, the hardwired "special index operator" code formerly present in indxpath.c is pushed out to planner support functions attached to LIKE and other relevant operators. A weak spot in this design is that an extension needs to know OIDs for the operators, datatypes, and opfamilies involved in the transformation it wants to make. The core-code prototypes use hard-wired OID references but extensions don't have that option for their own operators etc. It's usually possible to look up the required info, but that may be slow and inconvenient. However, improving that situation is a separate task. I want to do some additional refactorization around selfuncs.c, but that also seems like a separate task. Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
2019-01-29Rename nodes/relation.h to nodes/pathnodes.h.Tom Lane
The old name of this file was never a very good indication of what it was for. Now that there's also access/relation.h, we have a potential confusion hazard as well, so let's rename it to something more apropos. Per discussion, "pathnodes.h" is reasonable, since a good fraction of the file is Path node definitions. While at it, tweak a couple of other headers that were gratuitously importing relation.h into modules that don't need it. Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
2019-01-29Refactor planner's header files.Tom Lane
Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us