Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 9d5f071

Browse files
committed
Make PostgresNode.pm check server status more carefully.
PostgresNode blithely ignored the exit status of pg_ctl, and in general made no effort to be sure that the server was running when it should be. This caused it to miss server crashes, which is a serious shortcoming in a test scaffold. Make it complain if pg_ctl fails, and modify the start and stop logic to complain if the server doesn't start, or doesn't stop, when expected. Also, have it turn off the "restart_after_crash" configuration parameter in created clusters, as bitter experience has shown that leaving that on can mask crashes too. We might at some point need variant functions that allow for, eg, server start failure to be expected. But no existing test case appears to want that, and it surely shouldn't be the default behavior. Note that this *will* break the buildfarm, as it will expose known bugs that the previous testing failed to. I'm committing it despite that, to verify that we get the expected failures in the buildfarm not just in manual testing. Back-patch into 9.6 where PostgresNode was introduced. (The 9.6 branch is not expected to show any failures.) Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
1 parent 551cc9a commit 9d5f071

File tree

1 file changed

+26
-16
lines changed

1 file changed

+26
-16
lines changed

src/test/perl/PostgresNode.pm

Lines changed: 26 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -402,6 +402,7 @@ sub init
402402
open my $conf, ">>$pgdata/postgresql.conf";
403403
print $conf "\n# Added by PostgresNode.pm\n";
404404
print $conf "fsync = off\n";
405+
print $conf "restart_after_crash = off\n";
405406
print $conf "log_statement = all\n";
406407
print $conf "port = $port\n";
407408

@@ -636,18 +637,19 @@ sub start
636637
my $port = $self->port;
637638
my $pgdata = $self->data_dir;
638639
my $name = $self->name;
640+
BAIL_OUT("node \"$name\" is already running") if defined $self->{_pid};
639641
print("### Starting node \"$name\"\n");
640642
my $ret = TestLib::system_log('pg_ctl', '-w', '-D', $self->data_dir, '-l',
641643
$self->logfile, 'start');
642644

643645
if ($ret != 0)
644646
{
645-
print "# pg_ctl failed; logfile:\n";
647+
print "# pg_ctl start failed; logfile:\n";
646648
print TestLib::slurp_file($self->logfile);
647-
BAIL_OUT("pg_ctl failed");
649+
BAIL_OUT("pg_ctl start failed");
648650
}
649651

650-
$self->_update_pid;
652+
$self->_update_pid(1);
651653
}
652654

653655
=pod
@@ -656,6 +658,10 @@ sub start
656658
657659
Stop the node using pg_ctl -m $mode and wait for it to stop.
658660
661+
Note: if the node is already known stopped, this does nothing.
662+
However, if we think it's running and it's not, it's important for
663+
this to fail. Otherwise, tests might fail to detect server crashes.
664+
659665
=cut
660666

661667
sub stop
@@ -667,9 +673,8 @@ sub stop
667673
$mode = 'fast' unless defined $mode;
668674
return unless defined $self->{_pid};
669675
print "### Stopping node \"$name\" using mode $mode\n";
670-
TestLib::system_log('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
671-
$self->{_pid} = undef;
672-
$self->_update_pid;
676+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-m', $mode, 'stop');
677+
$self->_update_pid(0);
673678
}
674679

675680
=pod
@@ -687,7 +692,7 @@ sub reload
687692
my $pgdata = $self->data_dir;
688693
my $name = $self->name;
689694
print "### Reloading node \"$name\"\n";
690-
TestLib::system_log('pg_ctl', '-D', $pgdata, 'reload');
695+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, 'reload');
691696
}
692697

693698
=pod
@@ -706,9 +711,9 @@ sub restart
706711
my $logfile = $self->logfile;
707712
my $name = $self->name;
708713
print "### Restarting node \"$name\"\n";
709-
TestLib::system_log('pg_ctl', '-D', $pgdata, '-w', '-l', $logfile,
710-
'restart');
711-
$self->_update_pid;
714+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-w', '-l', $logfile,
715+
'restart');
716+
$self->_update_pid(1);
712717
}
713718

714719
=pod
@@ -727,7 +732,8 @@ sub promote
727732
my $logfile = $self->logfile;
728733
my $name = $self->name;
729734
print "### Promoting node \"$name\"\n";
730-
TestLib::system_log('pg_ctl', '-D', $pgdata, '-l', $logfile, 'promote');
735+
TestLib::system_or_bail('pg_ctl', '-D', $pgdata, '-l', $logfile,
736+
'promote');
731737
}
732738

733739
# Internal routine to enable streaming replication on a standby node.
@@ -805,22 +811,26 @@ archive_command = '$copy_command'
805811
# Internal method
806812
sub _update_pid
807813
{
808-
my $self = shift;
814+
my ($self, $is_running) = @_;
809815
my $name = $self->name;
810816

811817
# If we can open the PID file, read its first line and that's the PID we
812-
# want. If the file cannot be opened, presumably the server is not
813-
# running; don't be noisy in that case.
814-
if (open my $pidfile, $self->data_dir . "/postmaster.pid")
818+
# want.
819+
if (open my $pidfile, '<', $self->data_dir . "/postmaster.pid")
815820
{
816821
chomp($self->{_pid} = <$pidfile>);
817822
print "# Postmaster PID for node \"$name\" is $self->{_pid}\n";
818823
close $pidfile;
824+
825+
# If we found a pidfile when there shouldn't be one, complain.
826+
BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running;
819827
return;
820828
}
821829

822830
$self->{_pid} = undef;
823-
print "# No postmaster PID\n";
831+
print "# No postmaster PID for node \"$name\"\n";
832+
# Complain if we expected to find a pidfile.
833+
BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running;
824834
}
825835

826836
=pod

0 commit comments

Comments
 (0)