Intermittent SIGSEGV, v1.26.2 #4588

klemenStanic · 2025-02-10T20:54:35Z

We are experiencing intermittent SIGSEGV signals, each followed by an abrupt restart of the dragonfly systemd service.
This issue first happened in v1.20.1, which made us upgrade to v1.26.2.

After upgrading multiple servers to v1.26.2, this issue became more frequent.
We are running each Dragonfly instance as a standalone server with no replication / HA. The servers are dedicated to running Dragonfly.
At the time of the crash, there was still more than 40 GB of RAM available.

We are using the following settings:

--pidfile=/var/run/dragonfly/dragonfly.pid
--log_dir=/var/log/dragonfly
--dir=/mnt/dragonfly_storage
--max_log_size=200
--version_check=true
--maxmemory=240gb
--bind=10.0.1.55
--cache_mode=true
--dbnum=1024
--snapshot_cron=0 21 * * *
--dbfilename=dump

We are on fully up-to-date Ubuntu 24.04, kernel 6.8.0-52-generic.

When the issue occurs, journalctl shows:

Feb 10 21:01:53 *** dragonfly[25637]: *** SIGSEGV received at time=1739217713 on cpu 9 ***
Feb 10 21:01:53 *** dragonfly[25637]: PC: @     0x61e0ed6dd37a  (unknown)  dfly::SliceSnapshot::OnDbChange()
Feb 10 21:08:24 *** systemd[1]: dragonfly.service: Main process exited, code=dumped, status=11/SEGV
Feb 10 21:08:24 *** systemd[1]: dragonfly.service: Failed with result 'core-dump'.
Feb 10 21:08:24 *** systemd[1]: dragonfly.service: Consumed 13h 40min 53.800s CPU time, 249.6G memory peak, 0B memory swap peak.
Feb 10 21:08:25 *** systemd[1]: dragonfly.service: Scheduled restart job, restart counter is at 1.
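
The time= field in the SIGSEGV line is a Unix timestamp, so it can be compared with the snapshot schedule from the settings above. The following is a minimal Python sketch, not part of the original report. It assumes the host runs at UTC+1 (inferred from the journal prefix 21:01:53 versus the epoch value) and that --snapshot_cron=0 21 * * * is evaluated in that same local time. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Value copied from the journal line "SIGSEGV received at time=1739217713".
CRASH_EPOCH = 1739217713

# Assumption: the host runs at UTC+1, matching the journal prefix "Feb 10 21:01:53".
LOCAL_TZ = timezone(timedelta(hours=1))


def seconds_after_daily_slot(epoch: int, hour: int, minute: int, tz: timezone) -> int:
    """Seconds between the same-day hour:minute slot and the epoch time, both in tz.

    The result is negative if the epoch time falls before the slot.
    """
    moment = datetime.fromtimestamp(epoch, tz)
    slot = moment.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return int((moment - slot).total_seconds())


crash_local = datetime.fromtimestamp(CRASH_EPOCH, LOCAL_TZ)
# 21:00 is the slot described by --snapshot_cron=0 21 * * *.
delay = seconds_after_daily_slot(CRASH_EPOCH, 21, 0, LOCAL_TZ)
print(crash_local.isoformat(), delay)
```

Under those assumptions the signal arrives 113 seconds after the 21:00 slot, which is consistent with a crash while the scheduled snapshot is in progress.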

dmesg -T:

[Mon Feb 10 21:04:41 2025] INFO: task dragonfly:25637 blocked for more than 122 seconds.
[Mon Feb 10 21:04:41 2025]       Not tainted 6.8.0-52-generic #53-Ubuntu
[Mon Feb 10 21:04:41 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Feb 10 21:04:41 2025] task:dragonfly       state:D stack:0     pid:25637 tgid:25637 ppid:1      flags:0x00004002
[Mon Feb 10 21:04:41 2025] Call Trace:
[Mon Feb 10 21:04:41 2025]  <TASK>
[Mon Feb 10 21:04:41 2025]  __schedule+0x27c/0x6b0
[Mon Feb 10 21:04:41 2025]  schedule+0x33/0x110
[Mon Feb 10 21:04:41 2025]  do_exit+0x117/0x530
[Mon Feb 10 21:04:41 2025]  do_group_exit+0x35/0x90
[Mon Feb 10 21:04:41 2025]  get_signal+0x96e/0x9b0
[Mon Feb 10 21:04:41 2025]  arch_do_signal_or_restart+0x39/0x120
[Mon Feb 10 21:04:41 2025]  syscall_exit_to_user_mode+0x206/0x260
[Mon Feb 10 21:04:41 2025]  do_syscall_64+0x8c/0x180
[Mon Feb 10 21:04:41 2025]  ? __f_unlock_pos+0x12/0x20
[Mon Feb 10 21:04:41 2025]  ? ksys_write+0xe6/0x100
[Mon Feb 10 21:04:41 2025]  ? syscall_exit_to_user_mode+0x86/0x260
[Mon Feb 10 21:04:41 2025]  ? do_syscall_64+0x8c/0x180
[Mon Feb 10 21:04:41 2025]  ? irqentry_exit_to_user_mode+0x7b/0x260
[Mon Feb 10 21:04:41 2025]  ? irqentry_exit+0x43/0x50
[Mon Feb 10 21:04:41 2025]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[Mon Feb 10 21:04:41 2025] RIP: 0033:0x79b5f3498d71
[Mon Feb 10 21:04:41 2025] RSP: 002b:000061e1257ae200 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[Mon Feb 10 21:04:41 2025] RAX: fffffffffffffe00 RBX: 000061e1257ae5d0 RCX: 000079b5f3498d71
[Mon Feb 10 21:04:41 2025] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 000061e1257ae5f8
[Mon Feb 10 21:04:41 2025] RBP: 000061e1257ae240 R08: 0000000000000000 R09: 00000000ffffffff
[Mon Feb 10 21:04:41 2025] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Mon Feb 10 21:04:41 2025] R13: 0000000000000000 R14: 000061e1257ae5a8 R15: 000061e1257ae5f8
[Mon Feb 10 21:04:41 2025]  </TASK>
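
The register dump above can be decoded to see what task 25637 was doing in user space when the kernel started tearing the process down. The following is a rough Python sketch, not part of the original report. It assumes an x86-64 host and uses the futex constants from the Linux headers; the function names are hypothetical. It describes only the task shown in this trace (pid equal to tgid, so presumably the main thread), not necessarily the thread that faulted.

```python
# x86-64 syscall number and futex constants from the Linux UAPI headers.
SYS_FUTEX = 202
FUTEX_COMMANDS = {0: "FUTEX_WAIT", 1: "FUTEX_WAKE", 9: "FUTEX_WAIT_BITSET", 10: "FUTEX_WAKE_BITSET"}
FUTEX_PRIVATE_FLAG = 128
FUTEX_CLOCK_REALTIME = 256
# Kernel-internal return code for a syscall interrupted by a signal.
ERESTARTSYS = 512


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def decode_futex_op(op: int) -> str:
    """Split a futex op argument into its command and option flags."""
    cmd = op & ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
    parts = [FUTEX_COMMANDS.get(cmd, f"cmd {cmd}")]
    if op & FUTEX_PRIVATE_FLAG:
        parts.append("FUTEX_PRIVATE_FLAG")
    if op & FUTEX_CLOCK_REALTIME:
        parts.append("FUTEX_CLOCK_REALTIME")
    return "|".join(parts)


# Values copied from the dmesg output above.
orig_rax = 0x00000000000000CA  # syscall number
rsi = 0x0000000000000189       # second syscall argument (futex op)
rax = 0xFFFFFFFFFFFFFE00       # syscall return value

print(orig_rax == SYS_FUTEX, decode_futex_op(rsi), to_signed64(rax) == -ERESTARTSYS)
```

Read this way, the task was parked in a futex wait that was interrupted by the signal, and the kernel frames (get_signal, do_group_exit, do_exit) show it then blocked while the process was exiting.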

No useful data in dragonflydb error/warn/info logs.


romange · 2025-02-11T08:03:37Z

The issue happens during snapshotting. I would still love to have the INFO log. Could you also please configure the system to create a core file, so that the next time it crashes you can share it with us?
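
As a starting point for the core file request, here is a minimal Python sketch that reports whether core dumps look enabled on a Linux host. It is an illustration rather than an official Dragonfly procedure, and the function names are hypothetical. It inspects the limits of the process running the script, not those of the dragonfly service; for a systemd unit the limit is typically governed by the unit's LimitCORE= setting, and /proc/<pid>/limits shows the effective values for a running process.

```python
import resource
from pathlib import Path

CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def core_limit_allows_dump(soft_limit: int) -> bool:
    """A soft RLIMIT_CORE of 0 disables core files; any other value permits them."""
    return soft_limit != 0


def describe_core_pattern(pattern: str) -> str:
    """Classify the kernel core_pattern: piped to a helper program or written to a file."""
    pattern = pattern.strip()
    if not pattern:
        return "empty"
    # A leading '|' means the kernel pipes the dump to a user-space handler.
    return "pipe" if pattern.startswith("|") else "file"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("core limit (soft, hard):", soft, hard)
    print("core dumps permitted for this process:", core_limit_allows_dump(soft))
    if CORE_PATTERN_PATH.exists():
        raw = CORE_PATTERN_PATH.read_text()
        print("core_pattern:", raw.strip(), "->", describe_core_pattern(raw))
```

If the pattern is piped to a handler, the core file is stored wherever that handler puts it rather than in the working directory of the process. With a memory peak near 250G, the dump destination also needs enough free space.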

romange · 2025-02-11T08:08:38Z

Feel free to send them via a DM on Discord or by email.

kostasrim · 2025-02-11T08:11:01Z

+1 for the coredumps and info logs

Looking at the systemd journal, the crash appears to be in dfly::SliceSnapshot::OnDbChange().

adiholden assigned kostasrim Feb 11, 2025
