SamuraiMarv, posted July 1:
At the end of May I migrated my server to new hardware and ran into some issues covered here. After removing the GPU, I thought the issue was fixed, since I could successfully run a parity check that finished in the normal time of about 24 hours. At midnight my normal monthly parity check started, but it appears to be stuck at 6.8%. I've looked through the logs and nothing jumps out at me. I've attached my diagnostics file for review. Any help would be greatly appreciated.
edi-diagnostics-20240701-0935.zip
JorgeB, posted July 1:
The Unraid driver crashed. This is almost always a hardware issue, but sometimes it can be a kernel/hardware compatibility issue. I would try upgrading to the 7.0.0 beta, which includes a much newer kernel. If it keeps happening, it's almost certainly hardware.
SamuraiMarv, posted July 1:
JorgeB said: "Unraid driver crashed, this is almost always a hardware issue, but sometimes it can be a kernel/hardware compatibility issue, I would try upgrading to beta 7.0.0, it includes a much newer kernel, if it keeps happening it's almost certainly hardware."
Are there any beta changes I should be aware of or worried about before upgrading? I'm running an Asus Z690 Maximus Hero, and from other posts I've seen, a lot of people seem to be having issues with this board and Unraid. Does the newer version include better support for newish motherboards? I know there isn't really a compatibility list maintained anymore; I'm just wondering in general whether the beta has better support for newer hardware.
JorgeB, posted July 1:
SamuraiMarv said: "Are there any beta changes I should be aware or worried about before upgrading?"
It should be mostly OK, but read the release notes.
SamuraiMarv, posted July 1:
JorgeB said: "Should be mostly OK, but read the release notes."
After my last issue I was able to run a parity check, and it worked fine twice. My concern is that I upgrade to the beta, run another parity check, have it work, assume things are good, and then next month the same thing happens. Is there a way for me to truly confirm the issue is resolved after upgrading?
JorgeB, posted July 1:
SamuraiMarv said: "Is there a way for me to truly confirm the issue is resolved after upgrading?"
You would need to run it a few times, but based on the description, it looks more like an intermittent hardware issue. If it were kernel related, I would expect it to fail every time.
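"Run it a few times" can be made a little more concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative and assumes that every parity check independently hits the fault with the same probability p, which a real intermittent fault need not do. Under that assumption, the chance of seeing n clean checks while the fault is still present is (1 - p)^n, so you keep running checks until that chance drops below whatever risk you are willing to accept. The function name `clean_runs_needed` is a hypothetical helper, not part of Unraid.

```python
import math


def clean_runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Consecutive clean parity checks needed before a fault that still fails
    each run with probability p_fail would, with the given confidence, have
    shown up at least once.

    Assumes every run is independent with the same failure probability,
    which is an idealization of an intermittent hardware fault.
    """
    if not 0 < p_fail < 1:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(n clean runs | fault still present) = (1 - p_fail) ** n.
    # Return the smallest n with (1 - p_fail) ** n <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


# Illustrative: a fault that hit roughly one run in three
# (e.g. two clean checks, then a failure).
print(clean_runs_needed(1 / 3))
# Illustrative: a fault that hit roughly every other run.
print(clean_runs_needed(0.5))
```

With these illustrative numbers the answers are 8 and 5 clean checks respectively, which is in line with the "run it a few times" advice: the rarer the fault, the more clean runs it takes to trust a pass.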
SamuraiMarv, posted July 1:
JorgeB said: "You would need to run it a few times, but based on the description, it suggests more an intermittent hardware issue, if it was kernel related I would expect to fail every time."
Thank you. I've upgraded to the beta, I don't see any errors in the logs, and I will run a couple of parity checks.
SamuraiMarv, posted July 1:
JorgeB said: "You would need to run it a few times, but based on the description, it suggests more an intermittent hardware issue, if it was kernel related I would expect to fail every time."
Looks like I spoke too soon: it's failed again, and I see a lot of memory errors in the logs.
edi-diagnostics-20240701-1525.zip
JorgeB, posted July 2, quoting his earlier reply:
"it includes a much newer kernel, if it keeps happening it's almost certainly hardware."
SamuraiMarv, posted July 2:
OK, is there anything specific I can check hardware-wise? I'm not getting any errors or warnings in the log until I run a parity check, unlike with the old version, where I'd get an AER warning about twice every 24 hours. I've already run a RAM test to make sure my memory was good, and there's no new motherboard BIOS either. I can't think of anything else to check. Should I swap motherboards, since the Z690 Maximus Hero seems to have a lot of reported issues with Unraid? Obviously I'd rather not swap hardware, but if it comes to that I'm willing to. I'm just trying to get ideas for other things to check.
JorgeB, posted July 2:
If you have multiple sticks, try using the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM. If issues continue, the board or CPU would be the next suspects.
SamuraiMarv, posted July 2:
JorgeB said: "If you have multiple sticks try using the server with just one, if the same try with a different one, that will basically rule out bad RAM, if issues continue board or CPU would be the next suspects."
Even with a 24+ hour memtest that passed, you think it could still be the RAM? I'll try one stick at a time after the current parity check either passes or stalls. So far it's running without any issues; the only difference is that this time I stopped all Docker containers and VMs. There have been no errors in the logs since the last reboot, and it seems to be running at the normal speed as well. I'll see what happens. There is one weird issue, though: the MAIN screen no longer shows any of my disk information. It was all there before I started the parity check, and the disks all still show up in the DASHBOARD tab, so I'm guessing it's a UI glitch in 7.0?
SamuraiMarv, posted July 2:
I just got this block of kernel errors, and the parity check has stalled. I'll reboot and start testing one memory stick at a time. Though since this mentions a kernel/driver bug, could it be a kernel issue? I'm more than willing to test anything. I've also attached another diagnostics file; as far as I can tell, there were no warnings or errors before this kernel error.
Jul 2 09:50:04 EDI webgui: Successful login user root from 10.0.0.141
Jul 2 09:55:17 EDI kernel: ------------[ cut here ]------------
Jul 2 09:55:17 EDI kernel: kernel BUG at drivers/md/unraid.c:1617!
Jul 2 09:55:17 EDI kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Jul 2 09:55:17 EDI kernel: CPU: 6 PID: 7994 Comm: unraidd0 Tainted: P O 6.8.12-Unraid #3
Jul 2 09:55:17 EDI kernel: Hardware name: ASUS System Product Name/ROG MAXIMUS Z690 HERO, BIOS 3603 05/28/2024
Jul 2 09:55:17 EDI kernel: RIP: 0010:unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: Code: 00 83 3d 30 20 00 00 03 7e 16 41 8b 57 98 89 e9 48 c7 c7 21 43 76 a1 48 8b 73 20 e8 6e 6f 97 df 41 f6 87 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 46 18 41 c7 47 b0 00 10 00 00 49 8b 57 10
Jul 2 09:55:17 EDI kernel: RSP: 0018:ffffc9000156fdb0 EFLAGS: 00010246
Jul 2 09:55:17 EDI kernel: RAX: 0000000000000000 RBX: ffff888147b8d548 RCX: 0000000000000000
Jul 2 09:55:17 EDI kernel: RDX: 0000000000000000 RSI: ffffffff82c75be0 RDI: ffff88810a76ce38
Jul 2 09:55:17 EDI kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
Jul 2 09:55:17 EDI kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88814225b130
Jul 2 09:55:17 EDI kernel: R13: 000000000000000a R14: ffff88813febd608 R15: ffff888147b8da78
Jul 2 09:55:17 EDI kernel: FS: 0000000000000000(0000) GS:ffff88a03f180000(0000) knlGS:0000000000000000
Jul 2 09:55:17 EDI kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 2 09:55:17 EDI kernel: CR2: 00001504a5c7d0e8 CR3: 0000000005416000 CR4: 0000000000750ef0
Jul 2 09:55:17 EDI kernel: PKRU: 55555554
Jul 2 09:55:17 EDI kernel: Call Trace:
Jul 2 09:55:17 EDI kernel: <TASK>
Jul 2 09:55:17 EDI kernel: ? __die_body+0x1a/0x5c
Jul 2 09:55:17 EDI kernel: ? die+0x30/0x49
Jul 2 09:55:17 EDI kernel: ? do_trap+0x7b/0xfe
Jul 2 09:55:17 EDI kernel: ? unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: ? unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: ? do_error_trap+0x6e/0x98
Jul 2 09:55:17 EDI kernel: ? unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: ? exc_invalid_op+0x4c/0x60
Jul 2 09:55:17 EDI kernel: ? unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: ? asm_exc_invalid_op+0x16/0x20
Jul 2 09:55:17 EDI kernel: ? unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: ? preempt_latency_start+0x2b/0x46
Jul 2 09:55:17 EDI kernel: ? preempt_latency_start+0x2b/0x46
Jul 2 09:55:17 EDI kernel: md_thread+0xf4/0x122 [md_mod]
Jul 2 09:55:17 EDI kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jul 2 09:55:17 EDI kernel: ? __pfx_md_thread+0x10/0x10 [md_mod]
Jul 2 09:55:17 EDI kernel: kthread+0xf4/0xff
Jul 2 09:55:17 EDI kernel: ? __pfx_kthread+0x10/0x10
Jul 2 09:55:17 EDI kernel: ret_from_fork+0x21/0x36
Jul 2 09:55:17 EDI kernel: ? __pfx_kthread+0x10/0x10
Jul 2 09:55:17 EDI kernel: ret_from_fork_asm+0x1b/0x30
Jul 2 09:55:17 EDI kernel: </TASK>
Jul 2 09:55:17 EDI kernel: Modules linked in: xt_connmark xt_comment iptable_raw xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap ipvlan veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype br_netfilter md_mod zfs(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc ixgbe xfrm_algo mdio igc xe drm_gpuvm drm_exec gpu_sched drm_ttm_helper drm_suballoc_helper i915 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy ttm crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel sha512_ssse3 sha256_ssse3 drm_display_helper sha1_ssse3 aesni_intel crypto_simd cryptd
Jul 2 09:55:17 EDI kernel: drm_kms_helper btusb btrtl rapl btbcm btintel intel_cstate drm bluetooth mei_pxp mei_hdcp wmi_bmof mpt3sas thunderbolt nvme intel_uncore mei_me intel_gtt i2c_i801 i2c_smbus agpgart ahci input_leds raid_class mei nvme_core led_class ecdh_generic joydev scsi_transport_sas libahci i2c_core ecc tpm_crb video vmd thermal tpm_tis fan tpm_tis_core tpm wmi backlight acpi_tad acpi_pad button [last unloaded: xfrm_algo]
Jul 2 09:55:17 EDI kernel: ---[ end trace 0000000000000000 ]---
Jul 2 09:55:17 EDI kernel: pstore: backend (efi_pstore) writing error (-5)
Jul 2 09:55:17 EDI kernel: RIP: 0010:unraidd+0x1174/0x1265 [md_mod]
Jul 2 09:55:17 EDI kernel: Code: 00 83 3d 30 20 00 00 03 7e 16 41 8b 57 98 89 e9 48 c7 c7 21 43 76 a1 48 8b 73 20 e8 6e 6f 97 df 41 f6 87 69 ff ff ff 02 75 02 <0f> 0b 48 8b 43 20 49 03 46 18 41 c7 47 b0 00 10 00 00 49 8b 57 10
Jul 2 09:55:17 EDI kernel: RSP: 0018:ffffc9000156fdb0 EFLAGS: 00010246
Jul 2 09:55:17 EDI kernel: RAX: 0000000000000000 RBX: ffff888147b8d548 RCX: 0000000000000000
Jul 2 09:55:17 EDI kernel: RDX: 0000000000000000 RSI: ffffffff82c75be0 RDI: ffff88810a76ce38
Jul 2 09:55:17 EDI kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
Jul 2 09:55:17 EDI kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88814225b130
Jul 2 09:55:17 EDI kernel: R13: 000000000000000a R14: ffff88813febd608 R15: ffff888147b8da78
Jul 2 09:55:17 EDI kernel: FS: 0000000000000000(0000) GS:ffff88a03f180000(0000) knlGS:0000000000000000
Jul 2 09:55:17 EDI kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 2 09:55:17 EDI kernel: CR2: 00001504a5c7d0e8 CR3: 00000002d53f8000 CR4: 0000000000750ef0
Jul 2 09:55:17 EDI kernel: PKRU: 55555554
Jul 2 09:55:17 EDI kernel: ------------[ cut here ]------------
Jul 2 09:55:17 EDI kernel: WARNING: CPU: 6 PID: 7994 at kernel/exit.c:820 do_exit+0x83/0x904
Jul 2 09:55:17 EDI kernel: Modules linked in: xt_connmark xt_comment iptable_raw xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap ipvlan veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype br_netfilter md_mod zfs(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs bridge stp llc ixgbe xfrm_algo mdio igc xe drm_gpuvm drm_exec gpu_sched drm_ttm_helper drm_suballoc_helper i915 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iosf_mbi drm_buddy ttm crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit ghash_clmulni_intel sha512_ssse3 sha256_ssse3 drm_display_helper sha1_ssse3 aesni_intel crypto_simd cryptd
Jul 2 09:55:17 EDI kernel: drm_kms_helper btusb btrtl rapl btbcm btintel intel_cstate drm bluetooth mei_pxp mei_hdcp wmi_bmof mpt3sas thunderbolt nvme intel_uncore mei_me intel_gtt i2c_i801 i2c_smbus agpgart ahci input_leds raid_class mei nvme_core led_class ecdh_generic joydev scsi_transport_sas libahci i2c_core ecc tpm_crb video vmd thermal tpm_tis fan tpm_tis_core tpm wmi backlight acpi_tad acpi_pad button [last unloaded: xfrm_algo]
Jul 2 09:55:17 EDI kernel: CPU: 6 PID: 7994 Comm: unraidd0 Tainted: P D O 6.8.12-Unraid #3
Jul 2 09:55:17 EDI kernel: Hardware name: ASUS System Product Name/ROG MAXIMUS Z690 HERO, BIOS 3603 05/28/2024
Jul 2 09:55:17 EDI kernel: RIP: 0010:do_exit+0x83/0x904
Jul 2 09:55:17 EDI kernel: Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 80 00 9d 00 48 83 bb e0 07 00 00 00 74 02 <0f> 0b 48 8b bb f8 06 00 00 e8 45 ff 9c 00 48 8b 83 f0 06 00 00 83
Jul 2 09:55:17 EDI kernel: RSP: 0018:ffffc9000156fee0 EFLAGS: 00010282
Jul 2 09:55:17 EDI kernel: RAX: 0000000080000000 RBX: ffff88810a5b8000 RCX: 0000000000000000
Jul 2 09:55:17 EDI kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
Jul 2 09:55:17 EDI kernel: RBP: 000000000000000b R08: 0000000000000000 R09: 0000000000000000
Jul 2 09:55:17 EDI kernel: R10: 00003fffffffffff R11: ffffffffffffffff R12: ffff88810a281100
Jul 2 09:55:17 EDI kernel: R13: ffff888147b52100 R14: 0000000000000002 R15: ffffffff8223ab4c
Jul 2 09:55:17 EDI kernel: FS: 0000000000000000(0000) GS:ffff88a03f180000(0000) knlGS:0000000000000000
Jul 2 09:55:17 EDI kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 2 09:55:17 EDI kernel: CR2: 00001504a5c7d0e8 CR3: 00000002d53f8000 CR4: 0000000000750ef0
Jul 2 09:55:17 EDI kernel: PKRU: 55555554
Jul 2 09:55:17 EDI kernel: Call Trace:
Jul 2 09:55:17 EDI kernel: <TASK>
Jul 2 09:55:17 EDI kernel: ? __warn+0x99/0x11a
Jul 2 09:55:17 EDI kernel: ? report_bug+0xdb/0x155
Jul 2 09:55:17 EDI kernel: ? do_exit+0x83/0x904
Jul 2 09:55:17 EDI kernel: ? handle_bug+0x3c/0x63
Jul 2 09:55:17 EDI kernel: ? exc_invalid_op+0x13/0x60
Jul 2 09:55:17 EDI kernel: ? asm_exc_invalid_op+0x16/0x20
Jul 2 09:55:17 EDI kernel: ? do_exit+0x83/0x904
Jul 2 09:55:17 EDI kernel: ? __pfx_md_thread+0x10/0x10 [md_mod]
Jul 2 09:55:17 EDI kernel: ? kthread+0xf4/0xff
Jul 2 09:55:17 EDI kernel: make_task_dead+0x10f/0x10f
Jul 2 09:55:17 EDI kernel: rewind_stack_and_make_dead+0x17/0x17
Jul 2 09:55:17 EDI kernel: RIP: 0000:0x0
Jul 2 09:55:17 EDI kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Jul 2 09:55:17 EDI kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Jul 2 09:55:17 EDI kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul 2 09:55:17 EDI kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 2 09:55:17 EDI kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jul 2 09:55:17 EDI kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 2 09:55:17 EDI kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jul 2 09:55:17 EDI kernel: </TASK>
Jul 2 09:55:17 EDI kernel: ---[ end trace 0000000000000000 ]---
edi-diagnostics-20240702-1005.zip
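When the same crash keeps coming back across reboots, boards, and kernel versions, it can help to pull just the crash "signatures" (the source location of each `kernel BUG at` and `WARNING:` line) out of every saved syslog and compare them. The sketch below assumes the syslog line layout seen in the excerpt above (`<Mon> <day> <hh:mm:ss> <host> kernel: ...`); other logging setups may format lines differently. `SIGNATURE` and `crash_signatures` are hypothetical helper names, not part of Unraid.

```python
import re
from typing import Iterable, List, Tuple

# Line layout assumed from the excerpt above:
#   "<Mon> <day> <hh:mm:ss> <host> kernel: kernel BUG at <file>:<line>!"
#   "<Mon> <day> <hh:mm:ss> <host> kernel: WARNING: CPU: <n> PID: <n> at <file>:<line> ..."
SIGNATURE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?:kernel BUG at (?P<bug>\S+)!"
    r"|WARNING: CPU: \d+ PID: \d+ at (?P<warn>\S+) )"
)


def crash_signatures(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, kind, source location) for each kernel BUG/WARNING line."""
    hits = []
    for line in lines:
        m = SIGNATURE.match(line.strip())
        if m is None:
            continue  # not a BUG/WARNING header line (register dump, call trace, ...)
        if m.group("bug"):
            hits.append((m.group("ts"), "BUG", m.group("bug")))
        else:
            hits.append((m.group("ts"), "WARNING", m.group("warn")))
    return hits


sample = [
    "Jul 2 09:55:17 EDI kernel: kernel BUG at drivers/md/unraid.c:1617!",
    "Jul 2 09:55:17 EDI kernel: RIP: 0010:unraidd+0x1174/0x1265 [md_mod]",
    "Jul 2 09:55:17 EDI kernel: WARNING: CPU: 6 PID: 7994 at kernel/exit.c:820 do_exit+0x83/0x904",
]
for hit in crash_signatures(sample):
    print(hit)
```

A recurring location such as `drivers/md/unraid.c:1617` only tells you where the driver tripped, not why; as JorgeB points out, a driver crash like this is usually a symptom of a hardware problem rather than proof of a software bug.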
itimpi, posted July 2:
SamuraiMarv said: "Even with doing a 24+ hour mem test that passed you think it could still be the RAM?"
Unfortunately, yes. If memtest fails you definitely have a problem, but passing is not as definitive. You can still have RAM problems that only show up under load or under specific conditions. That is why, if the symptoms still suggest a RAM-related issue, it is recommended to try with fewer RAM sticks: that puts less load on the memory controller, which is then less likely to go wrong.
SamuraiMarv, posted July 2:
itimpi said: "Unfortunately yes. If the memtest fails you definitely have a problem, but passing is not as definitive. You can still have RAM problems that only show up under load or specific conditions. That is why it is recommended trying with less RAM sticks if the symptoms still suggest it could be a RAM related issue as that puts less load on the RAM controller and is thus less likely to go wrong."
Makes sense. I'm currently running a new parity check with only one stick. Since it typically takes a couple of hours for the parity check to stall, it will be quite a while before I get through all the sticks, but hopefully this will rule the RAM out once and for all. Unless, of course, there's a faster way to determine whether the RAM is the cause?
SamuraiMarv, posted July 4:
I gave up on trying to figure it out with that board and swapped out the Z690 Maximus Hero, since it's given me nothing but issues trying to get Unraid running stably. I switched to a ProArt Z790 and also got a 4-stick RAM kit; I was previously using two separate 2-stick kits, and since DDR5 can be really finicky, I decided to rule that out as well. I just got everything swapped and running. I'm starting a parity check and going to bed; hopefully when I wake up tomorrow it'll be halfway done and not stalled. At this point, if it stalls again, the only culprit left would be the CPU, assuming it truly is an intermittent hardware issue.
SamuraiMarv, posted July 4:
It just stalled; I've attached the diagnostics. I highly doubt my CPU is bad. Is there anything else I could be missing? Some BIOS setting, maybe? This is on a brand-new motherboard with 4 brand-new sticks of RAM, all of which already passed memtest. I'm grasping at straws at this point, as this doesn't make any sense. On the prior board it passed 2 parity checks with no problem until this month.
edi-diagnostics-20240704-0006.zip
JorgeB, posted July 4:
The Unraid driver is still crashing. See if you can try with a different CPU.
SamuraiMarv, posted July 4:
JorgeB said: "Unraid driver is still crashing, see if you can try with a different CPU."
I've downgraded back to 6.12.10, but I still saw the kernel error during a parity check. For now I'm running a Prime95 test on the CPU. Sourcing a new CPU would be expensive and not easily doable, so before going through that trouble I want to see whether it fails Prime95. It's been running for 11 minutes so far with no issues. I'll let it run for at least an hour; if nothing turns up, I really don't think the CPU is bad. I had this CPU running in my gaming rig for several months before it moved to the Unraid server. Worst case, if it fails I'll try to source a new CPU, but what then? What if I get a new CPU and still see these crashes during a parity check?
JorgeB, posted July 4:
SamuraiMarv said: "If I get a new cpu and I'm still seeing these crashes on a parity check?"
I obviously cannot guarantee that a new CPU will resolve the problem, but something is causing those crashes. Ideally you would have another PC, or be able to borrow some parts, that you could swap around to test; or, if you have another PC, try running Unraid on it, even if just for testing.
SamuraiMarv, posted July 4:
JorgeB said: "I obviously cannot guarantee that a new CPU is going to resolve the problem, but something is causing those crashes, ideally you would have another PC or be able to borrow some parts, that you could swap around to try, or if you have another PC try running Unraid there, even if just for testing."
I was able to source another 14900K and swap out the CPU. It's running a parity check now; hopefully this time it finishes. There was one weird issue, though: after it booted back up, the server name had reverted to "Tower". I'm not sure why or what caused this. All of my other settings are still present; it's just a pretty weird issue. I've attached the diagnostics just in case it helps. Hopefully you were right and it was the CPU; fingers crossed that the parity check finishes this time.
tower-diagnostics-20240704-1847.zip
JorgeB, posted July 5:
SamuraiMarv said: "server name had reverted back to 'Tower' I'm not sure why or what caused this?"
That could be a flash drive issue; with all the crashing, one of the config files may have been corrupted.
SamuraiMarv, posted July 5:
JorgeB said: "That could be a flash drive issue, possibly with all the crashing one of the config files got corrupted."
That makes sense. It looks like the CPU was the culprit after all: this parity check has been running for the last 21 hours and says it'll finish in 3 hours. If/when it does finish, besides renaming my server back to its original name, is there anything else I should be on the lookout for in terms of possibly corrupted config files? I also plan to have the server run at least 2 more parity checks and a couple of reboots, just to be on the safe side. Is there anything else you would suggest before I consider the issue fixed? With the last CPU, it did successfully pass 2 parity checks last month before suddenly failing this month.
JorgeB, posted July 5:
SamuraiMarv said: "besides renaming my server back to its original name is there anything else I should be on the look out for in terms of possibly messed up config files?"
You should just need to rename it.
SamuraiMarv said: "Is there anything else you would suggest I should do before thinking the issue is fixed?"
Not really.
SamuraiMarv, posted July 5:
Halfway through the second parity check, I saw this in the log. I think it refers to the PCIe port used by one of the NVMe devices on my board. The only NVMe devices I have are two 1 TB NVMe drives mirrored as a cache. Should I be worried that one of them is about to die? I don't see any SMART errors on either drive.
Jul 5 15:30:30 EDI webGUI: Successful login user root from 10.0.0.141
Jul 5 17:00:53 EDI kernel: pcieport 0000:00:1b.4: AER: Corrected error message received from 0000:04:00.0
Jul 5 17:00:53 EDI kernel: nvme 0000:04:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Jul 5 17:00:53 EDI kernel: nvme 0000:04:00.0: device [15b7:5006] error status/mask=00000001/0000e000
Jul 5 17:00:53 EDI kernel: nvme 0000:04:00.0: [ 0] RxErr
edi-diagnostics-20240705-1859.zip
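The `error status/mask=00000001/0000e000` pair in the log above is a raw dump of the device's AER Correctable Error Status and Mask registers, and the kernel's `[ 0] RxErr` line is its own decoding of bit 0. A minimal decoder, using the bit layout of the correctable-error registers from the PCI Express Base Specification, shows that only a Receiver Error was reported here, and that the mask value just marks three other correctable error types as masked. `decode_aer_line` and `bit_names` are hypothetical helper names for illustration.

```python
import re
from typing import Dict, List

# Bit positions in the PCIe AER Correctable Error Status/Mask registers
# (PCI Express Base Specification). Bits not listed here are reserved.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Matches the "error status/mask=XXXXXXXX/YYYYYYYY" field of the kernel message.
STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def bit_names(value: int) -> List[str]:
    """Names of the bits set in a 32-bit correctable-error register value."""
    return [CORRECTABLE_BITS.get(bit, f"reserved bit {bit}")
            for bit in range(32) if (value >> bit) & 1]


def decode_aer_line(line: str) -> Dict[str, List[str]]:
    """Decode the status/mask pair from a kernel 'error status/mask=' line."""
    m = STATUS_MASK.search(line)
    if m is None:
        raise ValueError("no 'error status/mask=' field in line")
    status, mask = (int(group, 16) for group in m.groups())
    return {"status": bit_names(status), "masked": bit_names(mask)}


line = ("Jul 5 17:00:53 EDI kernel: nvme 0000:04:00.0: device [15b7:5006] "
        "error status/mask=00000001/0000e000")
print(decode_aer_line(line))
```

Because these are corrected errors, a single event mainly says that the link hit a receiver error and recovered on its own; how often the message repeats over time is arguably more informative than any one occurrence.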