Intermittent SIGSEGV, v1.26.2 #4588

Open
klemenStanic opened this issue Feb 10, 2025 · 3 comments

@klemenStanic

We are experiencing intermittent SIGSEGV signals, each followed by an abrupt restart of the dragonfly systemd service.
This issue first happened in v1.20.1, which made us upgrade to v1.26.2.

After upgrading multiple servers to v1.26.2, this issue became more frequent.
We are running each Dragonfly instance as a standalone server with no replication or HA. The servers are dedicated to running Dragonfly.
At the time of the crash, there was still more than 40GB of RAM available.

We are using the following settings:

--pidfile=/var/run/dragonfly/dragonfly.pid
--log_dir=/var/log/dragonfly
--dir=/mnt/dragonfly_storage
--max_log_size=200
--version_check=true
--maxmemory=240gb
--bind=10.0.1.55
--cache_mode=true
--dbnum=1024
--snapshot_cron=0 21 * * *
--dbfilename=dump

We are on fully up-to-date Ubuntu 24.04, kernel 6.8.0-52-generic.

When the issue occurs, journalctl shows:

Feb 10 21:01:53 *** dragonfly[25637]: *** SIGSEGV received at time=1739217713 on cpu 9 ***
Feb 10 21:01:53 *** dragonfly[25637]: PC: @     0x61e0ed6dd37a  (unknown)  dfly::SliceSnapshot::OnDbChange()
Feb 10 21:08:24 *** systemd[1]: dragonfly.service: Main process exited, code=dumped, status=11/SEGV
Feb 10 21:08:24 *** systemd[1]: dragonfly.service: Failed with result 'core-dump'.
Feb 10 21:08:24 *** systemd[1]: dragonfly.service: Consumed 13h 40min 53.800s CPU time, 249.6G memory peak, 0B memory swap peak.
Feb 10 21:08:25 *** systemd[1]: dragonfly.service: Scheduled restart job, restart counter is at 1.
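As a rough cross-check of when the crash happened relative to the scheduled snapshot, the sketch below converts the `time=` epoch from the SIGSEGV line and compares it with `--snapshot_cron=0 21 * * *`. It assumes that the cron schedule and the journal both use the same local time zone, one hour ahead of UTC; that offset is inferred by comparing the epoch with the journal timestamp, not stated in the issue. The variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Value copied from the journal line "SIGSEGV received at time=1739217713".
SIGSEGV_EPOCH = 1739217713

# Assumption: the journal prints local time one hour ahead of UTC. This is inferred
# by comparing the converted epoch (20:01:53 UTC) with the journal's 21:01:53.
JOURNAL_TZ = timezone(timedelta(hours=1))

crash_utc = datetime.fromtimestamp(SIGSEGV_EPOCH, tz=timezone.utc)
crash_local = crash_utc.astimezone(JOURNAL_TZ)

# Assumption: --snapshot_cron=0 21 * * * fires at 21:00 in that same local time.
snapshot_start = crash_local.replace(hour=21, minute=0, second=0, microsecond=0)
if snapshot_start > crash_local:
    # The crash happened before today's 21:00 run, so use the previous day's run.
    snapshot_start -= timedelta(days=1)

elapsed_s = int((crash_local - snapshot_start).total_seconds())

# Gap between the SIGSEGV line (21:01:53) and systemd reporting the exit (21:08:24).
exit_local = crash_local.replace(hour=21, minute=8, second=24)
dump_window_s = int((exit_local - crash_local).total_seconds())

print(f"SIGSEGV at {crash_utc:%Y-%m-%d %H:%M:%S} UTC ({crash_local:%H:%M:%S} local)")
print(f"{elapsed_s} s after the scheduled 21:00 snapshot")
print(f"{dump_window_s} s until systemd reported the process exit")
```

Under these assumptions the crash lands about two minutes into the 21:00 snapshot run. The roughly 6.5-minute gap before systemd reported `code=dumped` may reflect time spent writing a very large core file, but that is an interpretation, not something the logs state.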

dmesg -T:

[Mon Feb 10 21:04:41 2025] INFO: task dragonfly:25637 blocked for more than 122 seconds.
[Mon Feb 10 21:04:41 2025]       Not tainted 6.8.0-52-generic #53-Ubuntu
[Mon Feb 10 21:04:41 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Feb 10 21:04:41 2025] task:dragonfly       state:D stack:0     pid:25637 tgid:25637 ppid:1      flags:0x00004002
[Mon Feb 10 21:04:41 2025] Call Trace:
[Mon Feb 10 21:04:41 2025]  <TASK>
[Mon Feb 10 21:04:41 2025]  __schedule+0x27c/0x6b0
[Mon Feb 10 21:04:41 2025]  schedule+0x33/0x110
[Mon Feb 10 21:04:41 2025]  do_exit+0x117/0x530
[Mon Feb 10 21:04:41 2025]  do_group_exit+0x35/0x90
[Mon Feb 10 21:04:41 2025]  get_signal+0x96e/0x9b0
[Mon Feb 10 21:04:41 2025]  arch_do_signal_or_restart+0x39/0x120
[Mon Feb 10 21:04:41 2025]  syscall_exit_to_user_mode+0x206/0x260
[Mon Feb 10 21:04:41 2025]  do_syscall_64+0x8c/0x180
[Mon Feb 10 21:04:41 2025]  ? __f_unlock_pos+0x12/0x20
[Mon Feb 10 21:04:41 2025]  ? ksys_write+0xe6/0x100
[Mon Feb 10 21:04:41 2025]  ? syscall_exit_to_user_mode+0x86/0x260
[Mon Feb 10 21:04:41 2025]  ? do_syscall_64+0x8c/0x180
[Mon Feb 10 21:04:41 2025]  ? irqentry_exit_to_user_mode+0x7b/0x260
[Mon Feb 10 21:04:41 2025]  ? irqentry_exit+0x43/0x50
[Mon Feb 10 21:04:41 2025]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[Mon Feb 10 21:04:41 2025] RIP: 0033:0x79b5f3498d71
[Mon Feb 10 21:04:41 2025] RSP: 002b:000061e1257ae200 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[Mon Feb 10 21:04:41 2025] RAX: fffffffffffffe00 RBX: 000061e1257ae5d0 RCX: 000079b5f3498d71
[Mon Feb 10 21:04:41 2025] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 000061e1257ae5f8
[Mon Feb 10 21:04:41 2025] RBP: 000061e1257ae240 R08: 0000000000000000 R09: 00000000ffffffff
[Mon Feb 10 21:04:41 2025] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Mon Feb 10 21:04:41 2025] R13: 0000000000000000 R14: 000061e1257ae5a8 R15: 000061e1257ae5f8
[Mon Feb 10 21:04:41 2025]  </TASK>
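One way to read the register dump above: on x86_64, `ORIG_RAX` holds the number of the system call the task was in, and `RAX` holds its return value as an unsigned 64-bit integer. The sketch below decodes the two values from this report; the lookup tables are deliberately tiny and cover only the values seen here.

```python
# Register values copied from the dmesg hung-task report in the issue.
ORIG_RAX = 0x00000000000000CA  # system call number the task entered
RAX = 0xFFFFFFFFFFFFFE00       # system call return value when the report was taken

# Minimal lookup tables covering only the values seen here (x86_64 numbering).
X86_64_SYSCALLS = {202: "futex"}
KERNEL_INTERNAL_ERRNOS = {512: "ERESTARTSYS"}


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


syscall_nr = ORIG_RAX
ret = to_signed64(RAX)

print(f"syscall {syscall_nr} = {X86_64_SYSCALLS.get(syscall_nr, 'unknown')}")
print(f"return {ret} = -{KERNEL_INTERNAL_ERRNOS.get(-ret, 'unknown')}")
```

This suggests the main thread (pid 25637) was waiting in `futex` when the signal interrupted it (`-ERESTARTSYS`), and the `do_exit` → `schedule` frames with `state:D` are consistent with it waiting while the process was being torn down, possibly while a core dump was written. The trace alone does not prove that; the core file itself would be needed to see what `OnDbChange()` was doing.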

There is no useful data in the Dragonfly error, warning, or info logs.

@romange
Collaborator

romange commented Feb 11, 2025

The issue happens during snapshotting. I would still love to see the INFO log. Could you also configure the system to produce a core file, so that the next time it crashes you can share it with us?
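Before waiting for the next crash, it may help to confirm that the running process is actually allowed to dump core. The sketch below reads the pid from the `--pidfile` path used in this issue, then reports the "Max core file size" row of `/proc/<pid>/limits` and the kernel's `core_pattern`. The helper names are hypothetical; under systemd, the core-size limit is typically raised with `LimitCORE=infinity` in the service unit.

```python
from pathlib import Path

# Path taken from --pidfile in the issue; adjust it if your unit uses another one.
PIDFILE = Path("/var/run/dragonfly/dragonfly.pid")
CORE_LINE = "Max core file size"


def parse_core_limit(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) core-size limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith(CORE_LINE):
            # The remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(CORE_LINE):].split()[:2]
            return soft, hard
    raise ValueError(f"{CORE_LINE!r} not found")


def core_limit_is_zero(soft: str) -> bool:
    """Return True when the soft RLIMIT_CORE is 0, i.e. core files are disabled."""
    return soft == "0"


def main() -> None:
    try:
        pid = PIDFILE.read_text().strip()
        limits_text = Path(f"/proc/{pid}/limits").read_text()
        core_pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError as exc:
        print(f"Could not inspect the Dragonfly process: {exc}")
        return

    soft, hard = parse_core_limit(limits_text)
    print(f"RLIMIT_CORE soft={soft} hard={hard}")
    print(f"kernel.core_pattern={core_pattern!r}")
    if core_limit_is_zero(soft):
        print("RLIMIT_CORE is 0; consider LimitCORE=infinity in dragonfly.service.")
    if core_pattern.startswith("|"):
        # A leading '|' means the kernel pipes the dump to a helper program.
        print("Cores are piped to a helper program; check that helper's own size limits.")


if __name__ == "__main__":
    main()
```

Given `--maxmemory=240gb`, a full core can be very large, so the target directory (or the helper's storage) needs enough free space for it.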

@romange
Collaborator

romange commented Feb 11, 2025

Feel free to send it, along with any instructions, via a DM on Discord or by email.

@kostasrim
Contributor

+1 for the core dumps and INFO logs.

Looking at the systemd journal, the crash appears to be in dfly::SliceSnapshot::OnDbChange().
