Commit 0a3ab45

jianhe-fun authored and Commitfest Bot committed
COPY (on_error set_null)
Extend the "on_error action" with a new option: on_error set_null.

The current grammar does not let us use "on_error null". Supporting it would mean that, in the values of all COPY command options, null would have to change from a reserved word to a non-reserved word, so we chose "on_error set_null" instead.

Any data type conversion error during the COPY FROM process results in the affected column being set to NULL. This applies only when using a non-binary format for COPY FROM.

However, not-null constraints are still enforced. If a column has a not-null constraint, a successful (on_error set_null) action causes a not-null constraint violation. This also applies when the column type is a domain with a not-null constraint. A regression test for a domain with a not-null constraint has been added.

Author: Jian He <jian.universality@gmail.com>
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Jim Jones <jim.jones@uni-muenster.de>
Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com>
Reviewed-by: Yugo NAGATA <nagata@sraoss.co.jp>
Reviewed-by: torikoshia <torikoshia@oss.nttdata.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=BP3d1_aSfe_+Q@mail.gmail.com
1 parent 470273d commit 0a3ab45
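
The semantics described in the commit message can be illustrated with a small standalone model. This is only a sketch and not code from the patch: `parse_int_field` and `load_row_set_null` are hypothetical names, every column is assumed to be an integer, and a failed `strtol` parse stands in for a data type conversion error. It shows the two rules the message states: a field that fails conversion becomes null, and a not-null column still rejects that null.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a column input function: parse a base-10 int. */
static bool
parse_int_field(const char *text, int *out)
{
    char *end;
    long  v;

    if (text == NULL || *text == '\0')
        return false;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return false;

    *out = (int) v;
    return true;
}

/*
 * Model of "on_error set_null" for one row (hypothetical helper).
 * A field that fails conversion is stored as null.  Returns false when
 * that null lands in a column flagged not_null, modelling the constraint
 * violation that is still raised.
 */
static bool
load_row_set_null(const char *const *fields, const bool *not_null,
                  int ncols, int *values, bool *nulls)
{
    for (int i = 0; i < ncols; i++)
    {
        nulls[i] = !parse_int_field(fields[i], &values[i]);
        if (nulls[i])
        {
            values[i] = 0;
            if (not_null[i])
                return false;   /* not-null constraint still enforced */
        }
    }
    return true;
}
```

With the input row `{"1", "abc", "3"}`, the second field is stored as null when the column is nullable, and the whole row is rejected when that column is flagged not-null.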

9 files changed, +277 -52 lines changed
doc/src/sgml/ref/copy.sgml

Lines changed: 25 additions & 11 deletions
@@ -394,23 +394,36 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one,
+      and <literal>set_null</literal> means replace columns containing invalid
+      input values with <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
      </para>
      <para>
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
      </para>
+     <para>
+      For <literal>ignore</literal> option, a <literal>NOTICE</literal> message
+      containing the ignored row count is emitted at the end of the <command>COPY
+      FROM</command> if at least one row was discarded.
+      For <literal>set_null</literal> option,
+      a <literal>NOTICE</literal> message indicating the number of rows
+      where invalid input values were replaced with null is emitted
+      at the end of the <command>COPY FROM</command> if at least one row was replaced.
+     </para>
      <para>
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_null</literal> option, a <literal>NOTICE</literal>
+      message containing the line of the input file and the column name where
+      value was replaced with <literal>NULL</literal> for each input conversion
+      failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding input conversion failed rows.
      </para>
     </listitem>
    </varlistentry>
@@ -458,7 +471,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_null</literal>.
      </para>
     </listitem>
    </varlistentry>

src/backend/commands/copy.c

Lines changed: 4 additions & 2 deletions
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
                  parser_errposition(pstate, def->location)));

     /*
-     * Allow "stop", or "ignore" values.
+     * Allow "stop", "ignore", "set_null" values.
      */
     if (pg_strcasecmp(sval, "stop") == 0)
         return COPY_ON_ERROR_STOP;
     if (pg_strcasecmp(sval, "ignore") == 0)
         return COPY_ON_ERROR_IGNORE;
+    if (pg_strcasecmp(sval, "set_null") == 0)
+        return COPY_ON_ERROR_SET_NULL;

     ereport(ERROR,
             (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
                 (errcode(ERRCODE_SYNTAX_ERROR),
                  errmsg("only ON_ERROR STOP is allowed in BINARY mode")));

-    if (opts_out->reject_limit && !opts_out->on_error)
+    if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         /*- translator: first and second %s are the names of COPY option, e.g.
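
The second hunk above matters because `COPY_ON_ERROR_STOP` is 0: the old test `!opts_out->on_error` was true only for `stop`, so once a third, nonzero enum value exists it would have let `REJECT_LIMIT` through together with `set_null`. The following standalone sketch illustrates both changes. It is not PostgreSQL code: the `Sketch*` names, `equals_ignore_case`, `parse_on_error`, and `reject_limit_allowed` are hypothetical, and a hand-written case-insensitive comparison stands in for `pg_strcasecmp`.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical mirror of CopyOnErrorChoice, plus an "invalid" marker. */
typedef enum
{
    SKETCH_ON_ERROR_STOP = 0,
    SKETCH_ON_ERROR_IGNORE,
    SKETCH_ON_ERROR_SET_NULL,
    SKETCH_ON_ERROR_INVALID
} SketchOnError;

/* Portable case-insensitive equality, standing in for pg_strcasecmp() == 0. */
static bool
equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    return *a == *b;            /* true only if both strings ended */
}

/* Map an option string to a choice; unknown strings are reported as invalid. */
static SketchOnError
parse_on_error(const char *sval)
{
    if (equals_ignore_case(sval, "stop"))
        return SKETCH_ON_ERROR_STOP;
    if (equals_ignore_case(sval, "ignore"))
        return SKETCH_ON_ERROR_IGNORE;
    if (equals_ignore_case(sval, "set_null"))
        return SKETCH_ON_ERROR_SET_NULL;
    return SKETCH_ON_ERROR_INVALID;
}

/* A nonzero reject limit is accepted only together with "ignore". */
static bool
reject_limit_allowed(long reject_limit, SketchOnError on_error)
{
    return reject_limit == 0 || on_error == SKETCH_ON_ERROR_IGNORE;
}
```

Comparing against the `ignore` value explicitly, rather than testing for "not stop", keeps the check correct if further actions are added later.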

src/backend/commands/copyfrom.c

Lines changed: 32 additions & 10 deletions
@@ -1467,14 +1467,22 @@ CopyFrom(CopyFromState cstate)
     /* Done, clean up */
     error_context_stack = errcallback.previous;

-    if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-        cstate->num_errors > 0 &&
+    if (cstate->num_errors > 0 &&
         cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-        ereport(NOTICE,
-                errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
-                              "%" PRIu64 " rows were skipped due to data type incompatibility",
-                              cstate->num_errors,
-                              cstate->num_errors));
+    {
+        if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+            ereport(NOTICE,
+                    errmsg_plural("%" PRIu64 " row was skipped due to data type incompatibility",
+                                  "%" PRIu64 " rows were skipped due to data type incompatibility",
+                                  cstate->num_errors,
+                                  cstate->num_errors));
+        else if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+            ereport(NOTICE,
+                    errmsg_plural("invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
+                                  "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
+                                  cstate->num_errors,
+                                  cstate->num_errors));
+    }

     if (bistate != NULL)
         FreeBulkInsertState(bistate);
@@ -1614,6 +1622,19 @@ BeginCopyFrom(ParseState *pstate,
         }
     }

+    if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
+    {
+        int         attr_count = list_length(cstate->attnumlist);
+
+        cstate->domain_with_constraint = (bool *) palloc0(attr_count * sizeof(bool));
+        foreach_int(attno, cstate->attnumlist)
+        {
+            int         i = foreach_current_index(attno);
+            Form_pg_attribute att = TupleDescAttr(tupDesc, attno - 1);
+            cstate->domain_with_constraint[i] = DomainHasConstraints(att->atttypid);
+        }
+    }
+
     /* Set up soft error handler for ON_ERROR */
     if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
     {
@@ -1622,10 +1643,11 @@ BeginCopyFrom(ParseState *pstate,
         cstate->escontext->error_occurred = false;

         /*
-         * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-         * options later
+         * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_SET_NULL.
+         * We'll add other options later
          */
-        if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+        if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+            cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
             cstate->escontext->details_wanted = false;
     }
     else
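
The end-of-command summary in the first hunk above now depends on which action was in effect. The standalone sketch below models that choice. It makes several assumptions: `format_copy_summary` and the `Sketch*` names are hypothetical, `snprintf` replaces `ereport(NOTICE, ...)`, the `LOG_VERBOSITY` test is left out, and the plural form is picked with a plain `count == 1` check, whereas `errmsg_plural` defers to the translation's plural rules.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of CopyOnErrorChoice. */
typedef enum
{
    SKETCH_ON_ERROR_STOP = 0,
    SKETCH_ON_ERROR_IGNORE,
    SKETCH_ON_ERROR_SET_NULL
} SketchOnError;

/*
 * Write the summary text into buf and return its length, or return 0 and
 * leave buf empty when no summary applies.  Message wording is copied
 * from the diff; English-only plural selection is a simplification.
 */
static int
format_copy_summary(char *buf, size_t len, SketchOnError mode,
                    uint64_t num_errors)
{
    if (len > 0)
        buf[0] = '\0';
    if (num_errors == 0)
        return 0;

    if (mode == SKETCH_ON_ERROR_IGNORE)
    {
        if (num_errors == 1)
            return snprintf(buf, len,
                            "%" PRIu64 " row was skipped due to data type incompatibility",
                            num_errors);
        return snprintf(buf, len,
                        "%" PRIu64 " rows were skipped due to data type incompatibility",
                        num_errors);
    }

    if (mode == SKETCH_ON_ERROR_SET_NULL)
    {
        if (num_errors == 1)
            return snprintf(buf, len,
                            "invalid values in %" PRIu64 " row was replaced with null due to data type incompatibility",
                            num_errors);
        return snprintf(buf, len,
                        "invalid values in %" PRIu64 " rows were replaced with null due to data type incompatibility",
                        num_errors);
    }

    return 0;                   /* stop: nothing to summarize */
}
```

In both modes the count is a number of rows. For `set_null` that means a row with several replaced fields contributes one to the total.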

src/backend/commands/copyfromparse.c

Lines changed: 102 additions & 28 deletions
@@ -947,6 +947,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
     int         fldct;
     int         fieldno;
     char       *string;
+    bool        current_row_erroneous = false;

     tupDesc = RelationGetDescr(cstate->rel);
     attr_count = list_length(cstate->attnumlist);
@@ -1024,7 +1025,8 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
         }

         /*
-         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors.
+         * If ON_ERROR is specified with set_null, try to replace with null.
          */
         else if (!InputFunctionCallSafe(&in_functions[m],
                                         string,
@@ -1035,47 +1037,119 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
         {
             Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);

-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            if (cstate->opts.on_error == COPY_ON_ERROR_SET_NULL)
             {
                 /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
+                 * we use it to count number of rows (not fields!) that
+                 * successfully applied on_error set_null.
+                 */
+                if (!current_row_erroneous)
+                    current_row_erroneous = true;
+
+                cstate->escontext->error_occurred = false;
+                Assert(cstate->domain_with_constraint != NULL);

-                if (cstate->cur_attval)
+                /*
+                 * when column type is domain with constraints, we may
+                 * need another InputFunctionCallSafe to error out domain
+                 * constraint violation.
+                 */
+                if (!cstate->domain_with_constraint[m] ||
+                    InputFunctionCallSafe(&in_functions[m],
+                                          NULL,
+                                          typioparams[m],
+                                          att->atttypmod,
+                                          (Node *) cstate->escontext,
+                                          &values[m]))
                 {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
-                                   cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
+                    nulls[m] = true;
+                    values[m] = (Datum) 0;
+
+                    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+                    {
+                        char       *attval;
+
+                        /*
+                         * Since we emit line number and column info in the below
+                         * notice message, we suppress error context information other
+                         * than the relation name.
+                         */
+                        Assert(!cstate->relname_only);
+                        Assert(cstate->cur_attval);
+
+                        cstate->relname_only = true;
+                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                        ereport(NOTICE,
+                                errmsg("setting to null due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+                                       cstate->cur_lineno,
+                                       cstate->cur_attname,
+                                       attval));
+                        pfree(attval);
+
+                        /* reset relname_only */
+                        cstate->relname_only = false;
+                    }
+
+                    cstate->cur_attname = NULL;
+                    continue;
                 }
+                else if (string == NULL)
+                    ereport(ERROR,
+                            errcode(ERRCODE_NOT_NULL_VIOLATION),
+                            errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+                            errdatatype(typioparams[m]));
                 else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
-                                   cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
+                    ereport(ERROR,
+                            errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                            errmsg("invalid input value for domain %s: \"%s\"",
+                                   format_type_be(typioparams[m]), string));
             }
+            else if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+            {
+                cstate->num_errors++;

-            return true;
+                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+                {
+                    /*
+                     * Since we emit line number and column info in the below
+                     * notice message, we suppress error context information other
+                     * than the relation name.
+                     */
+                    Assert(!cstate->relname_only);
+                    cstate->relname_only = true;
+
+                    if (cstate->cur_attval)
+                    {
+                        char       *attval;
+
+                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                        ereport(NOTICE,
+                                errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+                                       cstate->cur_lineno,
+                                       cstate->cur_attname,
+                                       attval));
+                        pfree(attval);
+                    }
+                    else
+                        ereport(NOTICE,
+                                errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+                                       cstate->cur_lineno,
+                                       cstate->cur_attname));
+
+                    /* reset relname_only */
+                    cstate->relname_only = false;
+                }
+                return true;
+            }
         }

         cstate->cur_attname = NULL;
         cstate->cur_attval = NULL;
     }

+    if (current_row_erroneous)
+        cstate->num_errors++;
+
     Assert(fieldno == attr_count);

     return true;
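
The `current_row_erroneous` flag in the hunk above exists so that `num_errors` counts rows rather than individual fields when `set_null` is in effect. A reduced standalone sketch of just that bookkeeping follows. It is not the patch's code: `SketchCopyState` and `apply_set_null_to_row` are hypothetical, conversion results are passed in as precomputed booleans, and the domain re-check and NOTICE output are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the COPY state to the one counter of interest. */
typedef struct
{
    uint64_t num_errors;        /* rows that had at least one field nulled */
} SketchCopyState;

/*
 * Process one row.  conversion_failed[i] says whether field i failed its
 * input conversion; nulls[i] receives the resulting null flag.  The row is
 * counted once, after the loop, no matter how many fields failed.
 */
static void
apply_set_null_to_row(SketchCopyState *state, const bool *conversion_failed,
                      bool *nulls, size_t nfields)
{
    bool current_row_erroneous = false;

    for (size_t i = 0; i < nfields; i++)
    {
        nulls[i] = conversion_failed[i];
        if (conversion_failed[i])
            current_row_erroneous = true;
    }

    if (current_row_erroneous)
        state->num_errors++;
}
```

Feeding three rows through this helper, with two failed fields in the first, none in the second, and one in the third, leaves `num_errors` at 2. In the `ignore` branch no such flag is needed, because the function returns at the first failed field.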

src/bin/psql/tab-complete.in.c

Lines changed: 1 addition & 1 deletion
@@ -3297,7 +3297,7 @@ match_previous_words(int pattern_id,

     /* Complete COPY <sth> FROM filename WITH (ON_ERROR */
     else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "ON_ERROR"))
-        COMPLETE_WITH("stop", "ignore");
+        COMPLETE_WITH("stop", "ignore", "set_null");

     /* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
     else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))

src/include/commands/copy.h

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 {
     COPY_ON_ERROR_STOP = 0,     /* immediately throw errors, default */
     COPY_ON_ERROR_IGNORE,       /* ignore errors */
+    COPY_ON_ERROR_SET_NULL,     /* set error field to null */
 } CopyOnErrorChoice;

 /*

src/include/commands/copyfrom_internal.h

Lines changed: 6 additions & 0 deletions
@@ -108,6 +108,12 @@ typedef struct CopyFromStateData
                                  * att */
     bool       *defaults;       /* if DEFAULT marker was found for
                                  * corresponding att */
+    /*
+     * Set to true if the corresponding att data type is domain with constraint.
+     * normally this field is NULL, except when on_error is specified as SET_NULL.
+     */
+    bool       *domain_with_constraint;
+
     bool        volatile_defexprs;  /* is any of defexprs volatile? */
     List       *range_table;    /* single element list of RangeTblEntry */
     List       *rteperminfos;   /* single element list of RTEPermissionInfo */
