Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 1e7c4bb

Browse files
committed
Change unknown-type literals to type text in SELECT and RETURNING lists.
Previously, we left such literals alone if the query or subquery had no properties forcing a type decision to be made (such as an ORDER BY or DISTINCT clause using that output column). This meant that "unknown" could be an exposed output column type, which has never been a great idea because it could result in strange failures later on. For example, an outer query that tried to do any operations on an unknown-type subquery output would generally fail with some weird error like "failed to find conversion function from unknown to text" or "could not determine which collation to use for string comparison". Also, if the case occurred in a CREATE VIEW's query then the view would have an unknown-type column, causing similar failures in queries trying to use the view. To fix, at the tail end of parse analysis of a query, forcibly convert any remaining "unknown" literals in its SELECT or RETURNING list to type text. However, provide a switch to suppress that, and use it in the cases of SELECT inside a set operation or INSERT command. In those cases we already had type resolution rules that make use of context information from outside the subquery proper, and we don't want to change that behavior. Also, change creation of an unknown-type column in a relation from a warning to a hard error. The error should be unreachable now in CREATE VIEW or CREATE MATVIEW, but it's still possible to explicitly say "unknown" in CREATE TABLE or CREATE (composite) TYPE. We want to forbid that because it's nothing but a foot-gun. This change creates a pg_upgrade failure case: a matview that contains an unknown-type column can't be pg_upgraded, because reparsing the matview's defining query will now decide that the column is of type text, which doesn't match the cstring-like storage that the old materialized column would actually have. Add a checking pass to detect that. While at it, we can detect tables or composite types that would fail, essentially for free. Those would fail safely anyway later on, but we might as well fail earlier. This patch is by me, but it owes something to previous investigations by Rahila Syed. Also thanks to Ashutosh Bapat and Michael Paquier for review. Discussion: https://postgr.es/m/CAH2L28uwwbL9HUM-WR=hromW1Cvamkn7O-g8fPY2m=_7muJ0oA@mail.gmail.com
1 parent 123f03b commit 1e7c4bb

26 files changed

+386
-36
lines changed

doc/src/sgml/ref/create_view.sgml

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -251,9 +251,10 @@ CREATE VIEW [ <replaceable>schema</> . ] <replaceable>view_name</> AS WITH RECUR
251251
<programlisting>
252252
CREATE VIEW vista AS SELECT 'Hello World';
253253
</programlisting>
254-
is bad form in two ways: the column name defaults to <literal>?column?</>,
255-
and the column data type defaults to <type>unknown</>. If you want a
256-
string literal in a view's result, use something like:
254+
is bad form because the column name defaults to <literal>?column?</>;
255+
also, the column data type defaults to <type>text</>, which might not
256+
be what you wanted. Better style for a string literal in a view's
257+
result is something like:
257258
<programlisting>
258259
CREATE VIEW vista AS SELECT text 'Hello World' AS hello;
259260
</programlisting>

doc/src/sgml/typeconv.sgml

Lines changed: 49 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -984,7 +984,8 @@ domain's base type for all subsequent steps.
984984
<para>
985985
If all inputs are of type <type>unknown</type>, resolve as type
986986
<type>text</type> (the preferred type of the string category).
987-
Otherwise, <type>unknown</type> inputs are ignored.
987+
Otherwise, <type>unknown</type> inputs are ignored for the purposes
988+
of the remaining rules.
988989
</para>
989990
</step>
990991

@@ -1076,6 +1077,53 @@ but <type>integer</> can be implicitly cast to <type>real</>, the union
10761077
result type is resolved as <type>real</>.
10771078
</para>
10781079
</example>
1080+
</sect1>
1081+
1082+
<sect1 id="typeconv-select">
1083+
<title><literal>SELECT</literal> Output Columns</title>
1084+
1085+
<indexterm zone="typeconv-select">
1086+
<primary>SELECT</primary>
1087+
<secondary>determination of result type</secondary>
1088+
</indexterm>
1089+
1090+
<para>
1091+
The rules given in the preceding sections will result in assignment
1092+
of non-<type>unknown</> data types to all expressions in a SQL query,
1093+
except for unspecified-type literals that appear as simple output
1094+
columns of a <command>SELECT</> command. For example, in
1095+
1096+
<screen>
1097+
SELECT 'Hello World';
1098+
</screen>
1099+
1100+
there is nothing to identify what type the string literal should be
1101+
taken as. In this situation <productname>PostgreSQL</> will fall back
1102+
to resolving the literal's type as <type>text</>.
1103+
</para>
1104+
1105+
<para>
1106+
When the <command>SELECT</> is one arm of a <literal>UNION</>
1107+
(or <literal>INTERSECT</> or <literal>EXCEPT</>) construct, or when it
1108+
appears within <command>INSERT ... SELECT</>, this rule is not applied
1109+
since rules given in preceding sections take precedence. The type of an
1110+
unspecified-type literal can be taken from the other <literal>UNION</> arm
1111+
in the first case, or from the destination column in the second case.
1112+
</para>
1113+
1114+
<para>
1115+
<literal>RETURNING</> lists are treated the same as <command>SELECT</>
1116+
output lists for this purpose.
1117+
</para>
1118+
1119+
<note>
1120+
<para>
1121+
Prior to <productname>PostgreSQL</> 10, this rule did not exist, and
1122+
unspecified-type literals in a <command>SELECT</> output list were
1123+
left as type <type>unknown</>. That had assorted bad consequences,
1124+
so it's been changed.
1125+
</para>
1126+
</note>
10791127

10801128
</sect1>
10811129
</chapter>

src/backend/catalog/heap.c

Lines changed: 2 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -490,18 +490,8 @@ CheckAttributeType(const char *attname,
490490
char att_typtype = get_typtype(atttypid);
491491
Oid att_typelem;
492492

493-
if (atttypid == UNKNOWNOID)
494-
{
495-
/*
496-
* Warn user, but don't fail, if column to be created has UNKNOWN type
497-
* (usually as a result of a 'retrieve into' - jolly)
498-
*/
499-
ereport(WARNING,
500-
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
501-
errmsg("column \"%s\" has type %s", attname, "unknown"),
502-
errdetail("Proceeding with relation creation anyway.")));
503-
}
504-
else if (att_typtype == TYPTYPE_PSEUDO)
493+
if (atttypid == UNKNOWNOID ||
494+
att_typtype == TYPTYPE_PSEUDO)
505495
{
506496
/*
507497
* Refuse any attempt to create a pseudo-type column, except for a

src/backend/parser/analyze.c

Lines changed: 27 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -156,13 +156,15 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
156156
Query *
157157
parse_sub_analyze(Node *parseTree, ParseState *parentParseState,
158158
CommonTableExpr *parentCTE,
159-
bool locked_from_parent)
159+
bool locked_from_parent,
160+
bool resolve_unknowns)
160161
{
161162
ParseState *pstate = make_parsestate(parentParseState);
162163
Query *query;
163164

164165
pstate->p_parent_cte = parentCTE;
165166
pstate->p_locked_from_parent = locked_from_parent;
167+
pstate->p_resolve_unknowns = resolve_unknowns;
166168

167169
query = transformStmt(pstate, parseTree);
168170

@@ -570,10 +572,17 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
570572
* otherwise the behavior of SELECT within INSERT might be different
571573
* from a stand-alone SELECT. (Indeed, Postgres up through 6.5 had
572574
* bugs of just that nature...)
575+
*
576+
* The sole exception is that we prevent resolving unknown-type
577+
* outputs as TEXT. This does not change the semantics since if the
578+
* column type matters semantically, it would have been resolved to
579+
* something else anyway. Doing this lets us resolve such outputs as
580+
* the target column's type, which we handle below.
573581
*/
574582
sub_pstate->p_rtable = sub_rtable;
575583
sub_pstate->p_joinexprs = NIL; /* sub_rtable has no joins */
576584
sub_pstate->p_namespace = sub_namespace;
585+
sub_pstate->p_resolve_unknowns = false;
577586

578587
selectQuery = transformStmt(sub_pstate, stmt->selectStmt);
579588

@@ -1269,6 +1278,10 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt)
12691278
pstate->p_windowdefs,
12701279
&qry->targetList);
12711280

1281+
/* resolve any still-unresolved output columns as being type text */
1282+
if (pstate->p_resolve_unknowns)
1283+
resolveTargetListUnknowns(pstate, qry->targetList);
1284+
12721285
qry->rtable = pstate->p_rtable;
12731286
qry->jointree = makeFromExpr(pstate->p_joinlist, qual);
12741287

@@ -1843,11 +1856,19 @@ transformSetOperationTree(ParseState *pstate, SelectStmt *stmt,
18431856
/*
18441857
* Transform SelectStmt into a Query.
18451858
*
1859+
* This works the same as SELECT transformation normally would, except
1860+
* that we prevent resolving unknown-type outputs as TEXT. This does
1861+
* not change the subquery's semantics since if the column type
1862+
* matters semantically, it would have been resolved to something else
1863+
* anyway. Doing this lets us resolve such outputs using
1864+
* select_common_type(), below.
1865+
*
18461866
* Note: previously transformed sub-queries don't affect the parsing
18471867
* of this sub-query, because they are not in the toplevel pstate's
18481868
* namespace list.
18491869
*/
1850-
selectQuery = parse_sub_analyze((Node *) stmt, pstate, NULL, false);
1870+
selectQuery = parse_sub_analyze((Node *) stmt, pstate,
1871+
NULL, false, false);
18511872

18521873
/*
18531874
* Check for bogus references to Vars on the current query level (but
@@ -2350,6 +2371,10 @@ transformReturningList(ParseState *pstate, List *returningList)
23502371
/* mark column origins */
23512372
markTargetListOrigins(pstate, rlist);
23522373

2374+
/* resolve any still-unresolved output columns as being type text */
2375+
if (pstate->p_resolve_unknowns)
2376+
resolveTargetListUnknowns(pstate, rlist);
2377+
23532378
/* restore state */
23542379
pstate->p_next_resno = save_next_resno;
23552380

src/backend/parser/parse_clause.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -471,7 +471,8 @@ transformRangeSubselect(ParseState *pstate, RangeSubselect *r)
471471
* Analyze and transform the subquery.
472472
*/
473473
query = parse_sub_analyze(r->subquery, pstate, NULL,
474-
isLockedRefname(pstate, r->alias->aliasname));
474+
isLockedRefname(pstate, r->alias->aliasname),
475+
true);
475476

476477
/* Restore state */
477478
pstate->p_lateral_active = false;

src/backend/parser/parse_cte.c

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -241,7 +241,7 @@ analyzeCTE(ParseState *pstate, CommonTableExpr *cte)
241241
/* Analysis not done already */
242242
Assert(!IsA(cte->ctequery, Query));
243243

244-
query = parse_sub_analyze(cte->ctequery, pstate, cte, false);
244+
query = parse_sub_analyze(cte->ctequery, pstate, cte, false, true);
245245
cte->ctequery = (Node *) query;
246246

247247
/*
@@ -393,11 +393,10 @@ analyzeCTETargetList(ParseState *pstate, CommonTableExpr *cte, List *tlist)
393393

394394
/*
395395
* If the CTE is recursive, force the exposed column type of any
396-
* "unknown" column to "text". This corresponds to the fact that
397-
* SELECT 'foo' UNION SELECT 'bar' will ultimately produce text. We
398-
* might see "unknown" as a result of an untyped literal in the
399-
* non-recursive term's select list, and if we don't convert to text
400-
* then we'll have a mismatch against the UNION result.
396+
* "unknown" column to "text". We must deal with this here because
397+
* we're called on the non-recursive term before there's been any
398+
* attempt to force unknown output columns to some other type. We
399+
* have to resolve unknowns before looking at the recursive term.
401400
*
402401
* The column might contain 'foo' COLLATE "bar", so don't override
403402
* collation if it's already set.

src/backend/parser/parse_expr.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1846,7 +1846,7 @@ transformSubLink(ParseState *pstate, SubLink *sublink)
18461846
/*
18471847
* OK, let's transform the sub-SELECT.
18481848
*/
1849-
qtree = parse_sub_analyze(sublink->subselect, pstate, NULL, false);
1849+
qtree = parse_sub_analyze(sublink->subselect, pstate, NULL, false, true);
18501850

18511851
/*
18521852
* Check that we got a SELECT. Anything else should be impossible given

src/backend/parser/parse_node.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,7 @@ make_parsestate(ParseState *parentParseState)
5151

5252
/* Fill in fields that don't start at null/false/zero */
5353
pstate->p_next_resno = 1;
54+
pstate->p_resolve_unknowns = true;
5455

5556
if (parentParseState)
5657
{

src/backend/parser/parse_target.c

Lines changed: 31 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -288,13 +288,42 @@ transformExpressionList(ParseState *pstate, List *exprlist,
288288
}
289289

290290

291+
/*
292+
* resolveTargetListUnknowns()
293+
* Convert any unknown-type targetlist entries to type TEXT.
294+
*
295+
* We do this after we've exhausted all other ways of identifying the output
296+
* column types of a query.
297+
*/
298+
void
299+
resolveTargetListUnknowns(ParseState *pstate, List *targetlist)
300+
{
301+
ListCell *l;
302+
303+
foreach(l, targetlist)
304+
{
305+
TargetEntry *tle = (TargetEntry *) lfirst(l);
306+
Oid restype = exprType((Node *) tle->expr);
307+
308+
if (restype == UNKNOWNOID)
309+
{
310+
tle->expr = (Expr *) coerce_type(pstate, (Node *) tle->expr,
311+
restype, TEXTOID, -1,
312+
COERCION_IMPLICIT,
313+
COERCE_IMPLICIT_CAST,
314+
-1);
315+
}
316+
}
317+
}
318+
319+
291320
/*
292321
* markTargetListOrigins()
293322
* Mark targetlist columns that are simple Vars with the source
294323
* table's OID and column number.
295324
*
296-
* Currently, this is done only for SELECT targetlists, since we only
297-
* need the info if we are going to send it to the frontend.
325+
* Currently, this is done only for SELECT targetlists and RETURNING lists,
326+
* since we only need the info if we are going to send it to the frontend.
298327
*/
299328
void
300329
markTargetListOrigins(ParseState *pstate, List *targetlist)

src/bin/pg_upgrade/check.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,10 @@ check_and_dump_old_cluster(bool live_check)
9999
check_for_reg_data_type_usage(&old_cluster);
100100
check_for_isn_and_int8_passing_mismatch(&old_cluster);
101101

102+
/* Pre-PG 10 allowed tables with 'unknown' type columns */
103+
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
104+
old_9_6_check_for_unknown_data_type_usage(&old_cluster);
105+
102106
/* 9.5 and below should not have roles starting with pg_ */
103107
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
104108
check_for_pg_role_prefix(&old_cluster);

src/bin/pg_upgrade/pg_upgrade.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -442,6 +442,7 @@ void pg_putenv(const char *var, const char *val);
442442
void new_9_0_populate_pg_largeobject_metadata(ClusterInfo *cluster,
443443
bool check_mode);
444444
void old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster);
445+
void old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster);
445446

446447
/* parallel.c */
447448
void parallel_exec_prog(const char *log_file, const char *opt_log_file,

src/bin/pg_upgrade/version.c

Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -185,3 +185,100 @@ old_9_3_check_for_line_data_type_usage(ClusterInfo *cluster)
185185
else
186186
check_ok();
187187
}
188+
189+
190+
/*
191+
* old_9_6_check_for_unknown_data_type_usage()
192+
* 9.6 -> 10
193+
* It's no longer allowed to create tables or views with "unknown"-type
194+
* columns. We do not complain about views with such columns, because
195+
* they should get silently converted to "text" columns during the DDL
196+
* dump and reload; it seems unlikely to be worth making users do that
197+
* by hand. However, if there's a table with such a column, the DDL
198+
* reload will fail, so we should pre-detect that rather than failing
199+
* mid-upgrade. Worse, if there's a matview with such a column, the
200+
* DDL reload will silently change it to "text" which won't match the
201+
* on-disk storage (which is like "cstring"). So we *must* reject that.
202+
* Also check composite types, in case they are used for table columns.
203+
* We needn't check indexes, because "unknown" has no opclasses.
204+
*/
205+
void
206+
old_9_6_check_for_unknown_data_type_usage(ClusterInfo *cluster)
207+
{
208+
int dbnum;
209+
FILE *script = NULL;
210+
bool found = false;
211+
char output_path[MAXPGPATH];
212+
213+
prep_status("Checking for invalid \"unknown\" user columns");
214+
215+
snprintf(output_path, sizeof(output_path), "tables_using_unknown.txt");
216+
217+
for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
218+
{
219+
PGresult *res;
220+
bool db_used = false;
221+
int ntups;
222+
int rowno;
223+
int i_nspname,
224+
i_relname,
225+
i_attname;
226+
DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
227+
PGconn *conn = connectToServer(cluster, active_db->db_name);
228+
229+
res = executeQueryOrDie(conn,
230+
"SELECT n.nspname, c.relname, a.attname "
231+
"FROM pg_catalog.pg_class c, "
232+
" pg_catalog.pg_namespace n, "
233+
" pg_catalog.pg_attribute a "
234+
"WHERE c.oid = a.attrelid AND "
235+
" NOT a.attisdropped AND "
236+
" a.atttypid = 'pg_catalog.unknown'::pg_catalog.regtype AND "
237+
" c.relkind IN ('r', 'c', 'm') AND "
238+
" c.relnamespace = n.oid AND "
239+
/* exclude possible orphaned temp tables */
240+
" n.nspname !~ '^pg_temp_' AND "
241+
" n.nspname !~ '^pg_toast_temp_' AND "
242+
" n.nspname NOT IN ('pg_catalog', 'information_schema')");
243+
244+
ntups = PQntuples(res);
245+
i_nspname = PQfnumber(res, "nspname");
246+
i_relname = PQfnumber(res, "relname");
247+
i_attname = PQfnumber(res, "attname");
248+
for (rowno = 0; rowno < ntups; rowno++)
249+
{
250+
found = true;
251+
if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
252+
pg_fatal("could not open file \"%s\": %s\n", output_path,
253+
strerror(errno));
254+
if (!db_used)
255+
{
256+
fprintf(script, "Database: %s\n", active_db->db_name);
257+
db_used = true;
258+
}
259+
fprintf(script, " %s.%s.%s\n",
260+
PQgetvalue(res, rowno, i_nspname),
261+
PQgetvalue(res, rowno, i_relname),
262+
PQgetvalue(res, rowno, i_attname));
263+
}
264+
265+
PQclear(res);
266+
267+
PQfinish(conn);
268+
}
269+
270+
if (script)
271+
fclose(script);
272+
273+
if (found)
274+
{
275+
pg_log(PG_REPORT, "fatal\n");
276+
pg_fatal("Your installation contains the \"unknown\" data type in user tables. This\n"
277+
"data type is no longer allowed in tables, so this cluster cannot currently\n"
278+
"be upgraded. You can remove the problem tables and restart the upgrade.\n"
279+
"A list of the problem columns is in the file:\n"
280+
" %s\n\n", output_path);
281+
}
282+
else
283+
check_ok();
284+
}

0 commit comments

Comments
 (0)