renobles Posted February 8, 2019
Recently my server has started having a kernel panic roughly once every 12 hours. The only way to get it back up is a hard reboot. I've enabled troubleshooting mode in CA Fix Common Problems and captured the output of the panic (below). Any ideas?
Feb 8 02:21:51 Tower kernel: general protection fault: 0000 [#1] SMP PTI
Feb 8 02:21:51 Tower kernel: CPU: 3 PID: 17930 Comm: sleep Not tainted 4.18.20-unRAID #1
Feb 8 02:21:51 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q77M vPro, BIOS P1.00 04/06/2012
Feb 8 02:21:51 Tower kernel: RIP: 0010:__schedule+0x541/0x542
Feb 8 02:21:51 Tower kernel: Code: 74 08 4c 89 e7 e8 c2 17 a3 ff 48 8b 45 d0 65 48 33 04 25 28 00 00 00 74 05 e8 28 59 a1 ff 58 5a 5b 41 5c 41 5d 41 5e 41 5f 5d <c3> 65 48 8b 04 25 00 5c 01 00 48 8b 50 10 48 85 d2 74 42 48 83 b8
Feb 8 02:21:51 Tower kernel: RSP: 0018:ffffc9000fbd7e50 EFLAGS: 00010246
Feb 8 02:21:51 Tower kernel: RAX: ffff880212da7d80 RBX: ffffc9000fbd7ea8 RCX: ffff880214dff600
Feb 8 02:21:51 Tower kernel: RDX: 13ee998aded91800 RSI: 000000006c708540 RDI: ffff88021e3a0c00
Feb 8 02:21:51 Tower kernel: RBP: ffffc9000fbd7e98 R08: 000077ff80000000 R09: 0000000000000000
Feb 8 02:21:51 Tower kernel: R10: 0000000000000004 R11: ffff88021e3a0c80 R12: ffff880214dfec00
Feb 8 02:21:51 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 8 02:21:51 Tower kernel: FS: 0000150f6c708540(0000) GS:ffff88021e380000(0000) knlGS:0000000000000000
Feb 8 02:21:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 8 02:21:51 Tower kernel: CR2: 000014bf648da000 CR3: 0000000117b00001 CR4: 00000000001606e0
Feb 8 02:21:51 Tower kernel: Call Trace:
Feb 8 02:21:51 Tower kernel: ? do_nanosleep+0x81/0x161
Feb 8 02:21:51 Tower kernel: ? hrtimer_nanosleep+0x99/0xf9
Feb 8 02:21:51 Tower kernel: ? hrtimer_init+0x2/0x2
Feb 8 02:21:51 Tower kernel: ? __se_sys_nanosleep+0x79/0x94
Feb 8 02:21:51 Tower kernel: ? do_syscall_64+0x57/0xe6
Feb 8 02:21:51 Tower kernel: ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 8 02:21:51 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd mpt3sas e1000e glue_helper raid_class scsi_transport_sas intel_cstate intel_uncore ahci i2c_i801 intel_rapl_perf i2c_core libahci video backlight ie31200_edac button pcc_cpufreq
Feb 8 02:21:51 Tower kernel: ---[ end trace 636252fd9269676d ]---
Feb 8 02:21:51 Tower kernel: RIP: 0010:__schedule+0x541/0x542
Feb 8 02:21:51 Tower kernel: Code: 74 08 4c 89 e7 e8 c2 17 a3 ff 48 8b 45 d0 65 48 33 04 25 28 00 00 00 74 05 e8 28 59 a1 ff 58 5a 5b 41 5c 41 5d 41 5e 41 5f 5d <c3> 65 48 8b 04 25 00 5c 01 00 48 8b 50 10 48 85 d2 74 42 48 83 b8
Feb 8 02:21:51 Tower kernel: RSP: 0018:ffffc9000fbd7e50 EFLAGS: 00010246
Feb 8 02:21:51 Tower kernel: RAX: ffff880212da7d80 RBX: ffffc9000fbd7ea8 RCX: ffff880214dff600
Feb 8 02:21:51 Tower kernel: RDX: 13ee998aded91800 RSI: 000000006c708540 RDI: ffff88021e3a0c00
Feb 8 02:21:51 Tower kernel: RBP: ffffc9000fbd7e98 R08: 000077ff80000000 R09: 0000000000000000
Feb 8 02:21:51 Tower kernel: R10: 0000000000000004 R11: ffff88021e3a0c80 R12: ffff880214dfec00
Feb 8 02:21:51 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 8 02:21:51 Tower kernel: FS: 0000150f6c708540(0000) GS:ffff88021e380000(0000) knlGS:0000000000000000
Feb 8 02:21:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 8 02:21:51 Tower kernel: CR2: 000014bf648da000 CR3: 0000000117b00001 CR4: 00000000001606e0
Feb 8 02:21:51 Tower kernel: general protection fault: 0000 [#2] SMP PTI
Feb 8 02:21:51 Tower kernel: CPU: 3 PID: 17937 Comm: awk Tainted: G D 4.18.20-unRAID #1
Feb 8 02:21:51 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q77M vPro, BIOS P1.00 04/06/2012
Feb 8 02:21:51 Tower kernel: RIP: 0010:show_map_vma.isra.5+0x99/0x134
Feb 8 02:21:51 Tower kernel: Code: d8 81 4c 89 e6 48 89 ef e8 76 1c fd ff eb 67 48 8b 83 90 00 00 00 48 85 c0 75 0f 48 89 df e8 56 ae eb ff 48 85 c0 75 7b eb 18 <48> 8b 40 58 48 85 c0 74 e8 48 89 df e8 4d 83 87 00 48 85 c0 75 63
Feb 8 02:21:51 Tower kernel: RSP: 0018:ffffc9000fbe7da0 EFLAGS: 00010206
Feb 8 02:21:51 Tower kernel: RAX: 2000000000000000 RBX: ffff880212852b40 RCX: 0000000000000e06
Feb 8 02:21:51 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff88008ee0a980
Feb 8 02:21:51 Tower kernel: RBP: ffff88008ee0a980 R08: 0000000000000000 R09: 0000000000000001
Feb 8 02:21:51 Tower kernel: R10: 0000000000000000 R11: ffff88006e396e04 R12: 0000000000000000
Feb 8 02:21:51 Tower kernel: R13: ffff880210ed72c0 R14: ffff8801ed17cb00 R15: 0000000000000dd6
Feb 8 02:21:51 Tower kernel: FS: 0000151d42013a80(0000) GS:ffff88021e380000(0000) knlGS:0000000000000000
Feb 8 02:21:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 8 02:21:51 Tower kernel: CR2: 00000000006a83f8 CR3: 0000000117b00003 CR4: 00000000001606e0
Feb 8 02:21:51 Tower kernel: Call Trace:
Feb 8 02:21:51 Tower kernel: show_pid_map+0xd/0x1d
Feb 8 02:21:51 Tower kernel: seq_read+0x2a1/0x38b
Feb 8 02:21:51 Tower kernel: __vfs_read+0x2e/0x133
Feb 8 02:21:51 Tower kernel: ? vm_mmap_pgoff+0xa4/0xe2
Feb 8 02:21:51 Tower kernel: vfs_read+0x9a/0x11f
Feb 8 02:21:51 Tower kernel: ksys_read+0x58/0xa6
Feb 8 02:21:51 Tower kernel: do_syscall_64+0x57/0xe6
Feb 8 02:21:51 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 8 02:21:51 Tower kernel: RIP: 0033:0x151d423485e1
Feb 8 02:21:51 Tower kernel: Code: fe ff ff 50 48 8d 3d 2e 27 0a 00 e8 49 21 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 19 c1 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
Feb 8 02:21:51 Tower kernel: RSP: 002b:00007ffe761b8e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Feb 8 02:21:51 Tower kernel: RAX: ffffffffffffffda RBX: 00007ffe761b8e50 RCX: 0000151d423485e1
Feb 8 02:21:51 Tower kernel: RDX: 0000000000002000 RSI: 0000151d42f06000 RDI: 0000000000000004
Feb 8 02:21:51 Tower kernel: RBP: 00007ffe761b8eec R08: 00000000ffffffff R09: 0000000000000000
Feb 8 02:21:51 Tower kernel: R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000004
Feb 8 02:21:51 Tower kernel: R13: 0000000000001000 R14: 00007ffe761b8ef0 R15: 0000000000002000
Feb 8 02:21:51 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd mpt3sas e1000e glue_helper raid_class scsi_transport_sas intel_cstate intel_uncore ahci i2c_i801 intel_rapl_perf i2c_core libahci video backlight ie31200_edac button pcc_cpufreq
Feb 8 02:21:51 Tower kernel: ---[ end trace 636252fd9269676e ]---
Feb 8 02:21:51 Tower kernel: RIP: 0010:__schedule+0x541/0x542
Feb 8 02:21:51 Tower kernel: Code: 74 08 4c 89 e7 e8 c2 17 a3 ff 48 8b 45 d0 65 48 33 04 25 28 00 00 00 74 05 e8 28 59 a1 ff 58 5a 5b 41 5c 41 5d 41 5e 41 5f 5d <c3> 65 48 8b 04 25 00 5c 01 00 48 8b 50 10 48 85 d2 74 42 48 83 b8
Feb 8 02:21:51 Tower kernel: RSP: 0018:ffffc9000fbd7e50 EFLAGS: 00010246
Feb 8 02:21:51 Tower kernel: RAX: ffff880212da7d80 RBX: ffffc9000fbd7ea8 RCX: ffff880214dff600
Feb 8 02:21:51 Tower kernel: RDX: 13ee998aded91800 RSI: 000000006c708540 RDI: ffff88021e3a0c00
Feb 8 02:21:51 Tower kernel: RBP: ffffc9000fbd7e98 R08: 000077ff80000000 R09: 0000000000000000
Feb 8 02:21:51 Tower kernel: R10: 0000000000000004 R11: ffff88021e3a0c80 R12: ffff880214dfec00
Feb 8 02:21:51 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 8 02:21:51 Tower kernel: FS: 0000151d42013a80(0000) GS:ffff88021e380000(0000) knlGS:0000000000000000
Feb 8 02:21:51 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 8 02:21:51 Tower kernel: CR2: 00000000006a83f8 CR3: 0000000117b00003 CR4: 00000000001606e0
Feb 8 02:22:01 Tower kernel: general protection fault: 0000 [#3] SMP PTI
Feb 8 02:22:01 Tower kernel: CPU: 3 PID: 17947 Comm: monitor Tainted: G D 4.18.20-unRAID #1
Feb 8 02:22:01 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q77M vPro, BIOS P1.00 04/06/2012
Feb 8 02:22:01 Tower kernel: RIP: 0010:vma_interval_tree_insert+0x2d/0x7b
Feb 8 02:22:01 Tower kernel: Code: 08 48 89 f8 49 89 f0 45 31 c9 48 2b 17 4c 8b 97 98 00 00 00 48 c1 ea 0c 49 8d 7c 12 ff ba 01 00 00 00 49 8b 08 48 85 c9 74 1f <48> 39 79 18 73 04 48 89 79 18 4c 3b 51 40 4c 8d 41 10 72 06 4c 8d
Feb 8 02:22:01 Tower kernel: RSP: 0018:ffffc9000fbf7d68 EFLAGS: 00010206
Feb 8 02:22:01 Tower kernel: RAX: ffff880211664cc0 RBX: ffff8801fac03da8 RCX: 2000000000000000
Feb 8 02:22:01 Tower kernel: RDX: 0000000000000000 RSI: ffff8801fac03dc8 RDI: 0000000000000000
Feb 8 02:22:01 Tower kernel: RBP: ffff880210ed50c0 R08: ffff8801ee169120 R09: ffff8801ee169118
Feb 8 02:22:01 Tower kernel: R10: 0000000000000000 R11: ffff8802116646e0 R12: ffff880211664cc0
Feb 8 02:22:01 Tower kernel: R13: ffff8802116646e0 R14: ffff8802116646f0 R15: 0000000000000000
Feb 8 02:22:01 Tower kernel: FS: 0000151d67f93740(0000) GS:ffff88021e380000(0000) knlGS:0000000000000000
Feb 8 02:22:01 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 8 02:22:01 Tower kernel: CR2: 00000000011ec028 CR3: 0000000193198003 CR4: 00000000001606e0
Feb 8 02:22:01 Tower kernel: Call Trace:
Feb 8 02:22:01 Tower kernel: vma_link+0x63/0x7e
Feb 8 02:22:01 Tower kernel: mmap_region+0x313/0x412
Feb 8 02:22:01 Tower kernel: do_mmap+0x3e9/0x43f
Feb 8 02:22:01 Tower kernel: vm_mmap_pgoff+0x99/0xe2
Feb 8 02:22:01 Tower kernel: ksys_mmap_pgoff+0x6c/0x94
Feb 8 02:22:01 Tower kernel: do_syscall_64+0x57/0xe6
Feb 8 02:22:01 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 8 02:22:01 Tower kernel: RIP: 0033:0x151d6bd850a3
Feb 8 02:22:01 Tower kernel: Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74 56 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7d 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f
Feb 8 02:22:01 Tower kernel: RSP: 002b:00007ffeedbfd858 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
Feb 8 02:22:01 Tower kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000151d6bd850a3
Feb 8 02:22:01 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000022 RDI: 0000000000000000
Feb 8 02:22:01 Tower kernel: RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000000
Feb 8 02:22:01 Tower kernel: R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000001
Feb 8 02:22:01 Tower kernel: R13: 0000000000000022 R14: 0000000000000002 R15: 0000000000000004
Feb 8 02:22:01 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs md_mod x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd mpt3sas e1000e glue_helper raid_class scsi_transport_sas intel_cstate intel_uncore ahci i2c_i801 intel_rapl_perf i2c_core libahci video backlight ie31200_edac button pcc_cpufreq
Feb 8 02:22:01 Tower kernel: ---[ end trace 636252fd9269676f ]---
Feb 8 02:22:01 Tower kernel: RIP: 0010:__schedule+0x541/0x542
Feb 8 02:22:01 Tower kernel: Code: 74 08 4c 89 e7 e8 c2 17 a3 ff 48 8b 45 d0 65 48 33 04 25 28 00 00 00 74 05 e8 28 59 a1 ff 58 5a 5b 41 5c 41 5d 41 5e 41 5f 5d <c3> 65 48 8b 04 25 00 5c 01 00 48 8b 50 10 48 85 d2 74 42 48 83 b8
Feb 8 02:22:01 Tower kernel: RSP: 0018:ffffc9000fbd7e50 EFLAGS: 00010246
Feb 8 02:22:01 Tower kernel: RAX: ffff880212da7d80 RBX: ffffc9000fbd7ea8 RCX: ffff880214dff600
Feb 8 02:22:01 Tower kernel: RDX: 13ee998aded91800 RSI: 000000006c708540 RDI: ffff88021e3a0c00
Feb 8 02:22:01 Tower kernel: RBP: ffffc9000fbd7e98 R08: 000077ff80000000 R09: 0000000000000000
Feb 8 02:22:01 Tower kernel: R10: 0000000000000004 R11: ffff88021e3a0c80 R12: ffff880214dfec00
Feb 8 02:22:01 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Feb 8 02:22:01 Tower kernel: FS: 0000151d67f93740(0000) GS:ffff88021e380000(0000) knlGS:0000000000000000
Feb 8 02:22:01 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 8 02:22:01 Tower kernel: CR2: 00000000011ec028 CR3: 0000000193198003 CR4: 00000000001606e0
John_M Posted February 8, 2019
The first thing I'd do is run MemTest for 24 hours or so.
renobles Posted February 8, 2019
Good call. I'll get that started now.
renobles Posted February 10, 2019
So, MemTest ran for 24 hours without any errors. What else can I try?
John_M Posted February 10, 2019
You could post your diagnostics zip (Tools -> Diagnostics). It might not show anything suspicious, but we'd have information about your hardware and software configuration. Is this a server that has previously worked reliably but has recently started to have problems?
renobles Posted February 10, 2019
Thanks for the quick response. Attached is the diagnostics file. The server has generally been quite stable. As far as I can remember, this is the first time it has had a kernel panic and frozen hard. About a month ago there were some controller issues, so I put in an LSI controller and moved all disks to it. Since then there were no issues until the first kernel panic around mid-week.
tower-diagnostics-20190209-2240.zip
John_M Posted February 10, 2019
I don't see anything alarming in your diagnostics. The only thing out of the ordinary is this at the very end of your syslog:
Feb 9 22:41:49 Tower kernel: sd 7:0:2:0: attempting task abort! scmd(00000000d4352fa5)
Feb 9 22:41:49 Tower kernel: sd 7:0:2:0: [sdd] tag#0 CDB: opcode=0x85 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00
Feb 9 22:41:49 Tower kernel: scsi target7:0:2: handle(0x000a), sas_address(0x4433221101000000), phy(1)
Feb 9 22:41:49 Tower kernel: scsi target7:0:2: enclosure logical id(0x500605b001600880), slot(2)
Feb 9 22:41:49 Tower kernel: sd 7:0:2:0: task abort: SUCCESS scmd(00000000d4352fa5)
Feb 9 22:41:49 Tower kernel: sd 7:0:2:0: Power-on or device reset occurred
which coincides with your request for the diagnostics dump. Now, /dev/sdd is your Crucial BX SSD, which hasn't produced a SMART report. I'm guessing that the reset was a result of the failed SMART request. So, while it may not be a problem, it would be worth investigating. I've never used a Crucial SSD in any system I've built, but I vaguely remember reports on this forum of them sometimes behaving oddly, so it might be worth a bit of research. It might be worth moving it to a motherboard SATA port. Your SanDisk SSD is fine; I've used them a lot and never had any issues.
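For anyone who wants to check the guess about the SMART request, the aborted CDB in the syslog excerpt can be decoded by hand. The following is a minimal Python sketch, not a definitive tool: the byte offsets are my reading of the ATA PASS-THROUGH (16) layout from the SCSI/ATA Translation standard and should be treated as an assumption, and the function names `decode_cdb` and `is_smart_read_data` are made up for illustration.

```python
# Sketch: decode the 16-byte CDB from the "task abort" syslog line.
# Assumption: field offsets follow the ATA PASS-THROUGH (16) layout
# (opcode 0x85) as described in the SCSI/ATA Translation standard.

ATA_PASS_THROUGH_16 = 0x85   # SCSI opcode seen in the log
ATA_CMD_SMART = 0xB0         # ATA SMART command
SMART_READ_DATA = 0xD0       # SMART sub-command carried in the features field
SMART_LBA_MID = 0x4F         # fixed signature bytes required by SMART commands
SMART_LBA_HIGH = 0xC2


def decode_cdb(hex_bytes: str) -> dict:
    """Split a space-separated hex CDB string into the fields of interest."""
    cdb = bytes.fromhex(hex_bytes)  # fromhex skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != ATA_PASS_THROUGH_16:
        raise ValueError("not a 16-byte ATA PASS-THROUGH CDB")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,  # bits 4:1 of byte 1
        "features": cdb[4],                # low byte of the features field
        "sector_count": cdb[6],
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "command": cdb[14],                # ATA command register
    }


def is_smart_read_data(fields: dict) -> bool:
    """True if the decoded fields look like an ATA SMART READ DATA request."""
    return (
        fields["command"] == ATA_CMD_SMART
        and fields["features"] == SMART_READ_DATA
        and fields["lba_mid"] == SMART_LBA_MID
        and fields["lba_high"] == SMART_LBA_HIGH
    )


if __name__ == "__main__":
    # CDB bytes copied from the syslog line for [sdd] above.
    fields = decode_cdb("85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00")
    for name, value in fields.items():
        print(f"{name:>12}: 0x{value:02x}")
    print("SMART READ DATA:", is_smart_read_data(fields))
```

If the offsets above are right, the command byte is 0xb0 and the features byte is 0xd0, which is consistent with the aborted command being a SMART data read sent to the SSD through the LSI controller.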
JorgeB Posted February 10, 2019
32 minutes ago, John_M said: "It might be worth moving it to a motherboard SATA port."
Yeah, it should be moved, if for nothing else so that trim can work.
renobles Posted February 14, 2019
Thanks for the responses. I moved that SSD over to the motherboard, and haven't had a lockup since then.