|
| 1 | +Subselect notes from Vadim. |
| 2 | + |
| 3 | + |
| 4 | + |
| 5 | +From owner-pgsql-hackers@hub.org Fri Feb 13 09:01:19 1998 |
| 6 | +Received: from renoir.op.net (root@renoir.op.net [209.152.193.4]) |
| 7 | + by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA11576 |
| 8 | + for <maillist@candle.pha.pa.us>; Fri, 13 Feb 1998 09:01:17 -0500 (EST) |
| 9 | +Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id IAA09761 for <maillist@candle.pha.pa.us>; Fri, 13 Feb 1998 08:41:22 -0500 (EST) |
| 10 | +Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id IAA08135; Fri, 13 Feb 1998 08:40:17 -0500 (EST) |
| 11 | +Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Feb 1998 08:38:42 -0500 (EST) |
| 12 | +Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id IAA06646 for pgsql-hackers-outgoing; Fri, 13 Feb 1998 08:38:35 -0500 (EST) |
| 13 | +Received: from dune.krasnet.ru (dune.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id IAA04568 for <hackers@postgreSQL.org>; Fri, 13 Feb 1998 08:37:16 -0500 (EST) |
| 14 | +Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86]) |
| 15 | + by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA13717 |
| 16 | + for <hackers@postgreSQL.org>; Fri, 13 Feb 1998 20:51:03 +0700 (KRS) |
| 17 | + (envelope-from vadim@sable.krasnoyarsk.su) |
| 18 | +Message-ID: <34E44FBA.D64E7997@sable.krasnoyarsk.su> |
| 19 | +Date: Fri, 13 Feb 1998 20:50:50 +0700 |
| 20 | +From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su> |
| 21 | +Organization: ITTS (Krasnoyarsk) |
| 22 | +X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386) |
| 23 | +MIME-Version: 1.0 |
| 24 | +To: PostgreSQL Developers List <hackers@postgreSQL.org> |
| 25 | +Subject: [HACKERS] Subselects are in CVS... |
| 26 | +Content-Type: text/plain; charset=us-ascii |
| 27 | +Content-Transfer-Encoding: 7bit |
| 28 | +Sender: owner-pgsql-hackers@hub.org |
| 29 | +Precedence: bulk |
| 30 | +Status: OR |
| 31 | + |
| 32 | +Here are some implementation notes and open issues... |
| 33 | + |
| 34 | +First, the implementation uses a new type of parameter - PARAM_EXEC - to |
| 35 | +deal with correlation Vars. When query_planner() is called, it first tries |
| 36 | +to replace every upper-query Var referenced in the current query with a |
| 37 | +Param of this type. Some global variables are used to keep the mapping of |
| 38 | +Vars to Params and Params to Vars. |
| 39 | + |
| 40 | +After this, all of the current query's SubLinks are processed: for each |
| 41 | +SubLink found in the query's qual, union_planner() (the old planner() |
| 42 | +function) is called to plan the corresponding subselect (union_planner() |
| 43 | +calls query_planner() for a "simple" query and supports UNIONs). After the |
| 44 | +subselect is planned, the optimizer knows whether it is a correlated, |
| 45 | +uncorrelated or _indirectly_ correlated query (it references some |
| 46 | +grandparent Vars but no parent ones: uncorrelated from the parent's point of view). |
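The three-way classification described above can be illustrated with a minimal sketch. It assumes each outer reference in a subquery is tagged with how many query levels up it points (1 = parent, 2 = grandparent, and so on). The names SubqueryKind and classify_subquery() are hypothetical and are not taken from the planner source.

```c
#include <assert.h>

/* Hypothetical classification of a subquery by the outer Vars it uses. */
typedef enum {
    UNCORRELATED,
    CORRELATED,
    INDIRECTLY_CORRELATED
} SubqueryKind;

/*
 * levels_up[i] says how many query levels above the subquery the i-th
 * outer reference lives: 1 = parent, 2 = grandparent, and so on.
 * (Assumed representation, for illustration only.)
 */
SubqueryKind classify_subquery(const int *levels_up, int nrefs)
{
    int refs_parent = 0;
    int refs_ancestor = 0;

    assert(nrefs >= 0);
    for (int i = 0; i < nrefs; i++) {
        if (levels_up[i] == 1)
            refs_parent = 1;
        else if (levels_up[i] > 1)
            refs_ancestor = 1;
    }
    if (refs_parent)
        return CORRELATED;            /* uses at least one parent Var */
    if (refs_ancestor)
        return INDIRECTLY_CORRELATED; /* only grandparent or higher Vars */
    return UNCORRELATED;              /* no outer references at all */
}
```

Under this sketch, a subquery whose only outer references are two or more levels up is classified as indirectly correlated, which matches the "uncorrelated from the parent's point of view" case.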
| 47 | + |
| 48 | +For uncorrelated and indirectly correlated subqueries of EXPRession or |
| 49 | +EXISTS type, SubLinks are replaced with "normal" clauses from the |
| 50 | +SubLink->Oper list (I changed this list to be a list of EXPR nodes, |
| 51 | +not just Oper ones). The right sides of these nodes are replaced with |
| 52 | +PARAM_EXEC parameters. This is the second use of the new parameter type. |
| 53 | +At run-time these parameters get their values from the result of subquery |
| 54 | +evaluation (i.e. from the target list of the subquery). The execution plan |
| 55 | +of the subquery itself becomes an init plan of the parent query. The InitPlan |
| 56 | +knows which parameters are to get values from the subquery's results and is |
| 57 | +executed "on-demand" (for the query select * from table where x > 0 and |
| 58 | +y > (select max(a) from table_a) the subquery will not be executed at all |
| 59 | +if there are no tuples with x > 0 _and_ y is not used in an index scan). |
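The "on-demand" behaviour can be sketched as a lazily filled parameter slot. This is a toy model, not the executor code: ExecParam, get_param(), max_a() and count_matches() are all hypothetical names, and the subquery is stood in for by a function pointer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a PARAM_EXEC slot filled by an InitPlan (hypothetical). */
typedef struct {
    bool evaluated;              /* has the subquery run yet? */
    int  value;                  /* cached subquery result */
    int  run_count;              /* how many times the subquery actually ran */
    int  (*run_subquery)(void);  /* stands in for executing the init plan */
} ExecParam;

/* Return the parameter value, running the subquery on first use only. */
int get_param(ExecParam *p)
{
    assert(p->run_subquery != 0);
    if (!p->evaluated) {
        p->value = p->run_subquery();
        p->evaluated = true;
        p->run_count++;
    }
    return p->value;
}

/* Pretend result of "select max(a) from table_a". */
int max_a(void)
{
    return 42;
}

/* Count rows satisfying: x > 0 and y > (select max(a) from table_a). */
int count_matches(const int *x, const int *y, int nrows, ExecParam *p)
{
    int matches = 0;

    for (int i = 0; i < nrows; i++) {
        /* && short-circuits, so the param is fetched only when x > 0 */
        if (x[i] > 0 && y[i] > get_param(p))
            matches++;
    }
    return matches;
}
```

In this sketch, if no row has x > 0 the subquery stand-in is never called, and otherwise it runs exactly once however many rows are checked.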
| 60 | + |
| 61 | +SubLinks for subqueries of all other types are transformed into a |
| 62 | +new type of Expr node - SUBPLAN_EXPR. Expr->args are just the correlation |
| 63 | +variables from the _parent_ query. Expr->oper is a new SubPlan node. |
| 64 | + |
| 65 | +This node is used for InitPlans too. It keeps the subquery range table, |
| 66 | +the indices of Params which are to get values from _parent_ query Vars |
| 67 | +(i.e. from Expr->args), the indices of Params into which the subquery's |
| 68 | +results are to be substituted (this is for InitPlans), the SubLink |
| 69 | +and the subquery's execution plan. |
| 70 | + |
| 71 | +The Plan node was changed to know about dependencies on Params from |
| 72 | +parent queries and InitPlans, and to keep a list of changed Params |
| 73 | +(from the above) so that it is re-scanned if this list is not NULL. |
| 74 | +Also added are a list of InitPlans (actually, all of them for the current |
| 75 | +query are in the topmost plan node now) and a list of other SubPlans (from |
| 76 | +plan->qual) - to initialize them and let them know about changed |
| 77 | +Params (from the list of their "interests"). |
| 78 | + |
| 79 | +After all SubLinks are processed, query_planner() calls the qual |
| 80 | +canonicalizer and does its "normal" work. Thanks to the use of Params, the |
| 81 | +optimizer is mostly unchanged. |
| 82 | + |
| 83 | +Now, the Executor. To get subplans re-evaluated without ExecutorStart() |
| 84 | +and ExecutorEnd() (without opening and closing relations and indices |
| 85 | +and without many palloc() and pfree() calls - this is what SQL functions do |
| 86 | +on each call), ExecReScan() now supports most Plan types... |
| 87 | + |
| 88 | +Explanation of EXPLAIN. |
| 89 | + |
| 90 | +vac=> explain select * from tmp where x >= (select max(x2) from test2 |
| 91 | +where y2 = y and exists (select * from tempx where tx = x)); |
| 92 | +NOTICE: QUERY PLAN: |
| 93 | + |
| 94 | +Seq Scan on tmp (cost=40.03 size=101 width=8) |
| 95 | + SubPlan |
| 96 | + ^^^^^^^ subquery is in the Seq Scan's qual; its plan is below |
| 97 | + -> Aggregate (cost=2.05 size=0 width=0) |
| 98 | + InitPlan |
| 99 | + ^^^^^^^^ EXISTS subsubquery is InitPlan of subquery |
| 100 | + -> Seq Scan on tempx (cost=4.33 size=1 width=4) |
| 101 | + -> Result (cost=2.05 size=0 width=0) |
| 102 | + ^^^^^^ EXISTS subsubquery was transformed into Param |
| 103 | + and so we have Result node here |
| 104 | + -> Index Scan on test2 (cost=2.05 size=1 width=4) |
| 105 | + |
| 106 | + |
| 107 | +Open issues. |
| 108 | + |
| 109 | +1. No read permissions checking (easy, just not done yet). |
| 110 | +2. readfuncs.c can't read subplans (easy, not critical, because we |
| 111 | + currently don't use the ascii representation of execution plans anywhere). |
| 112 | +3. ExecReScan() doesn't support all plan types. At least support for |
| 113 | + MergeJoin has to be implemented. |
| 114 | +4. Memory leaks in ExecReScan(). |
| 115 | +5. I need advice: if a subquery introduced with NOT IN doesn't return |
| 116 | + any tuples then the qualification fails, yes? |
| 117 | +6. Regression tests !!!!!!!!!!!!!!!!!!!! |
| 118 | + (Could we use data/queries from MySQL's crash.me? |
| 119 | + Is it copyrighted? Could they give us rights?) |
| 120 | +7. Performance. |
| 121 | + - Should be good when the subquery is transformed into an InitPlan. |
| 122 | + - Something should be done for uncorrelated subqueries introduced |
| 123 | + with ANY/ALL - still thinking. Currently, the subplan is re-scanned |
| 124 | + for each parent tuple - very slow... |
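On open issue 5, standard SQL three-valued logic treats x NOT IN (empty set) as true, so the qualification would pass rather than fail. The sketch below models that rule only. The names SqlBool, SqlInt and sql_not_in() are hypothetical, and this is not how the executor evaluates SubPlans.

```c
#include <assert.h>
#include <stdbool.h>

/* Three-valued SQL truth value (hypothetical helper type). */
typedef enum { SQL_FALSE, SQL_TRUE, SQL_UNKNOWN } SqlBool;

/* A nullable int4 value (hypothetical helper type). */
typedef struct {
    bool is_null;
    int  value;
} SqlInt;

/*
 * Evaluate "x NOT IN (set)" under standard SQL semantics:
 *   - empty set             -> TRUE (the loop body never runs)
 *   - some element equals x -> FALSE
 *   - otherwise, a NULL on either side of any comparison -> UNKNOWN
 *   - otherwise             -> TRUE
 */
SqlBool sql_not_in(SqlInt x, const SqlInt *set, int n)
{
    bool saw_unknown = false;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (x.is_null || set[i].is_null)
            saw_unknown = true;      /* comparison yields UNKNOWN */
        else if (x.value == set[i].value)
            return SQL_FALSE;        /* definite match: NOT IN is false */
    }
    return saw_unknown ? SQL_UNKNOWN : SQL_TRUE;
}
```

A WHERE clause keeps a row only when the result is SQL_TRUE, so with an empty subquery result every parent tuple qualifies, even when x itself is NULL.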
| 125 | + |
| 126 | +Results of some tests. TMP is a table with x, y (int4), x in 0-9, |
| 127 | +y = 100 - x, 1000 tuples (10 duplicates of each tuple). TEST2 is a table |
| 128 | +with x2, y2 (int4), x2 in 1-99, y2 = 100 - x2, 10000 tuples (100 dups). |
| 129 | + |
| 130 | + Trying |
| 131 | + |
| 132 | +select * from tmp where x >= (select max(x2) from test2 where y2 = y); |
| 133 | + |
| 134 | + and |
| 135 | + |
| 136 | +begin; |
| 137 | +select y as ty, max(x2) as mx into table tsub from test2, tmp |
| 138 | +where y2 = y group by ty; |
| 139 | +vacuum tsub; |
| 140 | +select x, y from tmp, tsub where x >= mx and y = ty; |
| 141 | +drop table tsub; |
| 142 | +end; |
| 143 | + |
| 144 | + Without an index on test2(y2): |
| 145 | + |
| 146 | +SubSelect -> 320 sec |
| 147 | +Using temp table -> 32 sec |
| 148 | + |
| 149 | + With an index on test2(y2): |
| 150 | + |
| 151 | +SubSelect -> 17 sec (2M of memory) |
| 152 | +Using temp table -> 32 sec (12M of memory: -S 8192) |
| 153 | + |
| 154 | +Vadim |
| 155 | + |
| 156 | + |