Age | Commit message (Collapse) | Author |
|
As pointed out by Tom Lane, the patch introduced fragile and invasive
design around plan invalidation handling when locking of prunable
partitions was deferred from plancache.c to the executor. In
particular, it violated assumptions about CachedPlan immutability and
altered executor APIs in ways that are difficult to justify given the
added complexity and overhead.
This also removes the firstResultRels field added to PlannedStmt in
commit 28317de72, which was intended to support deferred locking of
certain ModifyTable result relations.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us
|
|
Previously, "COPY table TO" command worked only with plain tables and
did not support materialized views, even when they were populated and
had physical storage. To copy rows from materialized views,
"COPY (query) TO" command had to be used, instead.
This commit extends "COPY table TO" to support populated materialized
views directly, improving usability and performance, as "COPY table TO"
is generally faster than "COPY (query) TO". Note that copying from
unpopulated materialized views will still result in an error.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Reviewed-by: Vignesh C <vignesh21@gmail.com>
Reviewed-by: Fujii Masao <masao.fujii@gmail.com>
Discussion: https://postgr.es/m/CACJufxHVxnyRYy67hiPePNCPwVBMzhTQ6FaL9_Te5On9udG=yg@mail.gmail.com
|
|
The change improves efficiency by eliminating unnecessary copying of
CopyFormatOptions.
The coverity also complained about inefficiencies caused by
pass-by-value.
Oversight in 7717f6300 and 2e4127b6d.
Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (per reports from coverity)
Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3L6YCpPksTQMzjD_CvwDEhW3D_t=5md9BvvdOs5k+TA=Q@mail.gmail.com
|
|
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei <kou@clear-code.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
|
|
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.
This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.
* Changes to locking when executing generic plans:
AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.
This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.
ExecutorStart() and ExecutorStart_hook implementations now return a
boolean value indicating whether plan initialization succeeded with a
valid PlanState tree in QueryDesc.planstate, or false otherwise, in
which case QueryDesc.planstate is NULL. Hook implementations are
required to call standard_ExecutorStart() at the beginning, and if it
returns false, they should do the same without proceeding.
* Testing:
To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after deferred locks.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:
if (prev_ExecutorStart)
plan_valid = prev_ExecutorStart(queryDesc, eflags);
else
plan_valid = standard_ExecutorStart(queryDesc, eflags);
if (!plan_valid)
return false;
<extension-code>
return true;
Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.
The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
|
|
Backpatch-through: 13
|
|
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution. The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun. This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented. That led to assertion failures in some corner
cases. The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan. (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)
This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.
Per report from Yugo Nagata, who also reviewed and tested
this patch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
|
|
If the query is rewritten into a NOTIFY command by a DO INSTEAD
rule, we'd get an assertion failure, or in non-assert builds
issue a rather confusing error message. Improve that.
Also fix a longstanding grammar mistake in a nearby error message.
Per bug #18664 from Alexander Lakhin. Back-patch to all supported
branches.
Tender Wang and Tom Lane
Discussion: https://postgr.es/m/18664-ffd0ebc2386598df@postgresql.org
|
|
Since backslash is (typically) not special in CSV data, we should
not be treating \. as special either. The server historically did
this to keep CSV and TEXT modes more alike and to support V2 protocol;
but V2 protocol is long dead, and the inconsistency with CSV standards
is annoying. Remove that behavior in CopyReadLineText, and make some
minor consequent code simplifications.
On the client side, we need to fix psql so that it does not check
for \. except when reading data from STDIN (that is, the script
source). We must do that regardless of TEXT/CSV mode or there is
no way to end the COPY short of script EOF. Also, be careful
not to send the \. to the server in that case.
This is a small compatibility break in that other applications
beside psql may need similar adjustment. Also, using an older
version of psql with a v18 server may result in misbehavior
during CSV-mode COPY IN.
Daniel Vérité, reviewed by vignesh C, Robert Haas, and myself
Discussion: https://postgr.es/m/ed659f37-a9dd-42a7-82b9-0da562cc4006@manitou-mail.org
|
|
The handling of binary and text formats are quite different here, so
it's more clear to check for the format first and have two separate
loops.
Author: jian he <jian.universality@gmail.com>
Reviewed-by: Ilia Evdokimov, Junwang Zhao
Discussion: https://www.postgresql.org/message-id/CACJufxFzHCeFBQF0M%2BSgk_NwknWuQ4oU7tS1isVeBrbhcKOHkg@mail.gmail.com
|
|
I also took the liberty of changing
errmsg("COPY DEFAULT only available using COPY FROM")
to
errmsg("COPY %s cannot be used with %s", "DEFAULT", "COPY TO")
because the original wording is unlike all other messages that indicate
option incompatibility. This message was added by commit 9f8377f7a279
(16-era), in whose development thread there was no discussion on this
point.
Backpatch to 17.
|
|
This allows a RETURNING clause to be appended to a MERGE query, to
return values based on each row inserted, updated, or deleted. As with
plain INSERT, UPDATE, and DELETE commands, the returned values are
based on the new contents of the target table for INSERT and UPDATE
actions, and on its old contents for DELETE actions. Values from the
source relation may also be returned.
As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be
used as the source relation for other operations such as WITH queries
and COPY commands.
Additionally, a special function merge_action() is provided, which
returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action
executed for each row. The merge_action() function can be used
anywhere in the RETURNING list, including in arbitrary expressions and
subqueries, but it is an error to use it anywhere outside of a MERGE
query's RETURNING list.
Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera,
Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut,
and Wolfgang Walther.
Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com
|
|
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
|
|
Issue introduced by b619852086ed.
Per buildfarm member koel.
|
|
This commit fixes an oversight introduced in c61a2f58418e, where COPY TO
would attempt to do encoding conversions even if the encodings of the
client and the server matched for multi-byte encodings. All conversions
go through pg_any_to_server() that makes the conversion a no-op when the
encodings of the client and the server match, even for multi-byte
encodings.
The logic was fine, but setting CopyToStateData->need_transcoding would
cause strlen() to be called for nothing for each attribute of all the
rows copied, and that was showing high in some profiles (more attributes
make that easier to reach). This change improves the runtime of some
worst-case COPY TO queries by 15%~ (number present at least here).
This is a performance improvement, so no backpatch is done out of
caution as this is not a regression.
Reported-by: Andres Freund
Analyzed-by: Andres Freund
Author: Michael Paquier
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/20240206020504.edijzczkgd25ek6z@awork3.anarazel.de
|
|
This has come up in 2889fd23be56, reverted later on, and is still useful
on its own to reduce a bit the differences between the code paths
dedicated to CSV and text.
Discussion: https://postgr.es/m/ZcCKwAeFrlOqPBuN@paquier.xyz
|
|
This reverts commit 2889fd23be56, following a discussion with Andres
Freund as this callback, being called once per attribute when sending a
relation's row, can involve a lot of indirect function calls (more
attributes to deal with means more impact). The effects of a dispatch
at this level would become more visible when improving the per-row code
execution of COPY TO, impacting future potential performance
improvements.
Discussion: https://postgr.es/m/20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de
|
|
These routines are used by the text and CSV formats to send an output
representation of a string, applying quotes if required. This is
similar to 95fb5b49024a, reducing the number of "if" branches that need
to be checked on a per-row basis when sending representation of fields
in text or CSV mode.
While on it, this simplifies the signature of CopyAttributeOutCSV() as
it is possible to know that an attribute is alone on a line thanks to
CopyToState. Headers should not use quotes, even if forced at query
level.
Extracted from a larger patch by the same author.
Author: Sutou Kouhei
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
|
|
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
|
|
These options already exist, but you need to specify a column list for
them, which can be cumbersome. We already have the possibility of all
columns for FORCE QUOTE, so this is simply extending that facility to
FORCE_NULL and FORCE_NOT_NULL.
Author: Zhang Mingli
Reviewed-By: Richard Guo, Kyatoro Horiguchi, Michael Paquier.
Discussion: https://postgr.es/m/CACJufxEnVqzOFtqhexF2+AwOKFrV8zHOY3y=p+gPK6eB14pn_w@mail.gmail.com
|
|
This commit introduces descriptively-named macros for the
identifiers used in wire protocol messages. These new macros are
placed in a new header file so that they can be easily used by
third-party code.
Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
|
|
In ancient times, these belonged to arguments or fields that were
actually of type long, but now they are not anymore, so this "L"
decoration is just confusing. (Some other 0L and other "L" constants
remain, where they are actually associated with a long type.)
|
|
The COPY documentation is quite clear that "COPY relation TO" copies
rows from only the named table, not any inheritance children it may
have. However, if you enabled row-level security on the table then
this stopped being true, because the code forgot to apply the ONLY
modifier in the "SELECT ... FROM relation" query that it constructs
in order to allow RLS predicates to be attached. Fix that.
Report and patch by Antonin Houska (comment adjustments and test case
by me). Back-patch to all supported branches.
Discussion: https://postgr.es/m/3472.1675251957@antos
|
|
Backpatch-through: 11
|
|
These two options are only available with COPY FROM, so the extra logic
in charge of checking the validity of the attributes given has no
purpose.
Author: Zhang Mingli
Reviewed-by: Richard Guo, Kyotaro Horiguchi
Discussion: https://postgr.es/m/F28F0B5A-766F-4D33-BF44-43B3A052D833@gmail.com
|
|
This is useful as a way for extensions to process COPY TO rows in the
way they see fit (say auditing, analytics, backend, etc.) without the
need to invoke an external process running as the OS user running the
backend through PROGRAM that requires superuser rights. COPY FROM
already provides a similar callback for logical replication. For COPY
TO, the callback is triggered when we are ready to send a row in
CopySendEndOfRow(), which is the same code path as when sending a row
to a frontend or a pipe/file.
A small test module, test_copy_callbacks, is added to provide some
coverage for this facility.
Author: Bilva Sanaba, Nathan Bossart
Discussion: https://postgr.es/m/253C21D1-FCEB-41D9-A2AF-E6517015B7D7@amazon.com
|
|
Run pgindent, pgperltidy, and reformat-dat-files.
I manually fixed a couple of comments that pgindent uglified.
|
|
These are useless and distracting. We wouldn't have written the code
with them to begin with, so there's no reason to keep them.
Author: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
|
|
There are three parallel ways to call parse/analyze: with fixed
parameters, with variable parameters, and by supplying your own parser
callback. Some of the involved functions were confusingly named and
made this API structure more confusing. This patch renames some
functions to make this clearer:
parse_analyze() -> parse_analyze_fixedparams()
pg_analyze_and_rewrite() -> pg_analyze_and_rewrite_fixedparams()
(Otherwise one might think this variant doesn't accept parameters, but
in fact all three ways accept parameters.)
pg_analyze_and_rewrite_params() -> pg_analyze_and_rewrite_withcb()
(Before, and also when considering pg_analyze_and_rewrite(), one might
think this is the only way to pass parameters. Moreover, the parser
callback doesn't necessarily need to parse only parameters, it's just
one of the things it could do.)
parse_fixed_parameters() -> setup_parse_fixed_parameters()
parse_variable_parameters() -> setup_parse_variable_parameters()
(These functions don't actually do any parsing, they just set up
callbacks to use during parsing later.)
This patch also adds some const decorations to the fixed-parameters
API, so the distinction from the variable-parameters API is more
clear.
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://www.postgresql.org/message-id/flat/c67ce276-52b4-0239-dc0e-39875bf81840@enterprisedb.com
|
|
c532d15 has split the logic of COPY commands into multiple files, one
change being to move the internals of BeginCopy() to BeginCopyTo().
Originally the code was written so as we'd switch back-and-forth between
the current execution memory context and the dedicated memory context
for the COPY command, and this refactoring has introduced an extra
switch to the current memory context from the COPY context once
BeginCopyTo() is done with the past logic coming from BeginCopy().
The code was correctly doing the analyze, rewrite and planning phases in
the COPY context, but it was not assigning "copy_file" (FILE* used when
copying to a source file) and "filename" in the COPY context, making the
COPY status data inconsistent.
Author: Bharath Rupireddy
Reviewed-by: Japin Li
Discussion: https://postgr.es/m/CALj2ACWvVa69foi9jhHFY=2BuHxAoYboyE+vXQTARwxZcJnVrQ@mail.gmail.com
Backpatch-through: 14
|
|
The COPY CSV format supports the HEADER option to output a header
line. This patch adds the same option to the default text format. On
input, the HEADER option causes the first line to be skipped, same as
with CSV.
Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr>
Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
|
|
|
|
Backpatch-through: 10
|
|
In the "simple Query" code path, it's fine for parse analysis or
execution of a utility statement to scribble on the statement's node
tree, since that'll just be thrown away afterwards. However it's
not fine if the node tree is in the plan cache, as then it'd be
corrupted for subsequent executions. Up to now we've dealt with
that by having individual utility-statement functions apply
copyObject() if they were going to modify the tree. But that's
prone to errors of omission. Bug #17053 from Charles Samborski
shows that CREATE/ALTER DOMAIN didn't get this memo, and can
crash if executed repeatedly from plan cache.
In the back branches, we'll just apply a narrow band-aid for that,
but in HEAD it seems prudent to have a more principled fix that
will close off the possibility of other similar bugs in future.
Hence, let's hoist the responsibility for doing copyObject up into
ProcessUtility from its children, thus ensuring that it happens for
all utility statement types.
Also, modify ProcessUtility's API so that its callers can tell it
whether a copy step is necessary. It turns out that in all cases,
the immediate caller knows whether the node tree is transient, so
this doesn't involve a huge amount of code thrashing. In this way,
while we lose a little bit in the execute-from-cache code path due
to sometimes copying node trees that wouldn't be mutated anyway,
we gain something in the simple-Query code path by not copying
throwaway node trees. Statements that are complex enough to be
expensive to copy are almost certainly ones that would have to be
copied anyway, so the loss in the cache code path shouldn't be much.
(Note that this whole problem applies only to utility statements.
Optimizable statements don't have the issue because we long ago made
the executor treat Plan trees as read-only. Perhaps someday we will
make utility statement execution act likewise, but I'm not holding
my breath.)
Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us
Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
|
|
Also "make reformat-dat-files".
The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
|
|
|
|
The command (TO or FROM), its type (file, pipe, program or callback),
and the number of tuples excluded by a WHERE clause in COPY FROM are
added to the progress reporting already available.
The column "lines_processed" is renamed to "tuples_processed" to
disambiguate the meaning of this column in the cases of CSV and BINARY
COPY and to be more consistent with the other catalog progress views.
Bump catalog version, again.
Author: Matthias van de Meent
Reviewed-by: Michael Paquier, Justin Pryzby, Bharath Rupireddy, Josef
Šimánek, Tomas Vondra
Discussion: https://postgr.es/m/CAEze2WiOcgdH4aQA8NtZq-4dgvnJzp8PohdeKchPkhMY-jWZXA@mail.gmail.com
|
|
Protocol version 3 was introduced in PostgreSQL 7.4. There shouldn't be
many clients or servers left out there without version 3 support. But as
a courtesy, I kept just enough of the old protocol support that we can
still send the "unsupported protocol version" error in v2 format, so that
old clients can display the message properly. Likewise, libpq still
understands v2 ErrorResponse messages when establishing a connection.
The impetus to do this now is that I'm working on a patch to COPY
FROM, to always prefetch some data. We cannot do that safely with the
old protocol, because it requires parsing the input one byte at a time
to detect the end-of-copy marker.
Reviewed-by: Tom Lane, Alvaro Herrera, John Naylor
Discussion: https://www.postgresql.org/message-id/9ec25819-0a8a-d51a-17dc-4150bb3cca3b%40iki.fi
|
|
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands. This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).
Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
|
|
Backpatch-through: 9.5
|
|
Copy.c has grown really large. Split it into more manageable parts:
- copy.c now contains only a few functions that are common to COPY FROM
and COPY TO.
- copyto.c contains code for COPY TO.
- copyfrom.c contains code for initializing COPY FROM, and inserting the
tuples to the correct table.
- copyfromparse.c contains code for reading from the client/file/program,
and parsing the input text/CSV/binary format into tuples.
All of these parts are fairly complicated, and fairly independent of each
other. There is a patch being discussed to implement parallel COPY FROM,
which will add a lot of new code to the COPY FROM path, and another patch
which would allow INSERTs to use the same multi-insert machinery as COPY
FROM, both of which will require refactoring that code. With those two
patches, there's going to be a lot of code churn in copy.c anyway, so now
seems like a good time to do this refactoring.
The CopyStateData struct is also split. All the formatting options, like
FORMAT, QUOTE, ESCAPE, are put in a new CopyFormatOption struct, which
is used by both COPY FROM and TO. Other state data are kept in separate
CopyFromStateData and CopyToStateData structs.
Reviewed-by: Soumyadeep Chakraborty, Erik Rijkers, Vignesh C, Andres Freund
Discussion: https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi
|