August 20, 2024

I have a brand-new UnRAID server that keeps "locking up" right before the parity sync finishes, and I'm not sure what's going on. I've replaced everything except the hard drives (which are known to be in working order). The host CPU is a Ryzen 5 5600G, and C-states are turned off in the BIOS.

The most recent crash (detected by Uptime Kuma) was around 2024-08-19 19:08:55. Here is the syslog around that time:

Aug 19 19:00:01 nas crond[1542]: failed parsing crontab for user root: Invalid frequency setting of /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1
Aug 19 19:08:05 nas kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Aug 19 19:08:05 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Adapter removed
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Warning firmware error detected FWSM: 0xFFFFFFFF
Aug 19 19:08:08 nas kernel: br0: port 1(eth0) entered disabled state
Aug 19 19:08:11 nas ntpd[1517]: Deleting interface #1 br0, 10.0.1.5#123, interface stats: received=193, sent=193, dropped=0, active_time=30806 secs
Aug 19 19:08:11 nas ntpd[1517]: 162.159.200.1 local addr 10.0.1.5 -> <null>
Aug 19 19:08:11 nas ntpd[1517]: 216.239.35.4 local addr 10.0.1.5 -> <null>
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: attempting task abort!scmd(0x0000000092884838), outstanding for 30065 ms & timeout 30000 ms
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: [sdf] tag#1732 CDB: opcode=0x88 88 00 00 00 00 02 76 d5 7c a8 00 00 04 00 00 00
Aug 19 19:08:35 nas kernel: scsi target7:0:4: handle(0x000d), sas_address(0x4433221104000000), phy(4)
Aug 19 19:08:35 nas kernel: scsi target7:0:4: enclosure logical id(0x500605b007eda110), slot(7)
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: task abort: SUCCESS scmd(0x0000000092884838)
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: attempting task abort!scmd(0x000000002ca9186f), outstanding for 30105 ms & timeout 30000 ms
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: [sdg] tag#1735 CDB: opcode=0x88 88 00 00 00 00 02 76 d5 7c a8 00 00 04 00 00 00
Aug 19 19:08:35 nas kernel: scsi target7:0:5: handle(0x000e), sas_address(0x4433221105000000), phy(5)
Aug 19 19:08:35 nas kernel: scsi target7:0:5: enclosure logical id(0x500605b007eda110), slot(6)
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: task abort: SUCCESS scmd(0x000000002ca9186f)
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: Power-on or device reset occurred
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:4:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:5:0: Power-on or device reset occurred
Aug 19 19:09:34 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:34 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:09:53 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Aug 19 19:10:13 nas kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)

A few lines look "curious":

Aug 19 19:08:05 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Adapter removed
Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Warning firmware error detected FWSM: 0xFFFFFFFF
Aug 19 19:08:08 nas kernel: br0: port 1(eth0) entered disabled state
Aug 19 19:08:11 nas ntpd[1517]: Deleting interface #1 br0, 10.0.1.5#123, interface stats: received=193, sent=193, dropped=0, active_time=30806 secs
Aug 19 19:08:11 nas ntpd[1517]: 162.159.200.1 local addr 10.0.1.5 -> <null>
Aug 19 19:08:11 nas ntpd[1517]: 216.239.35.4 local addr 10.0.1.5 -> <null>

I've confirmed that the NIC works in another system (it's a genuine Intel X520-DA2). Besides the network going down, I also can't log in on the local console: the login times out after about 60 seconds.

Any ideas? I've let the array rebuild twice, and both times it locked up right at the end with a similar syslog.

Edited August 20, 2024 by nvroom
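The mpt2sas lines above print a raw 32-bit log_info value next to the originator/code/sub_code fields decoded from it. The sketch below shows that split so other log_info values from the same HBA can be read the same way. It assumes the bit layout used by the Linux mpt3sas driver's log-info display (bus type in bits 28-31, originator in bits 24-27, code in bits 16-23, sub-code in bits 0-15); it only reproduces the fields the kernel prints and does not say what a given code means, which needs LSI/Broadcom documentation.

```python
"""Split an LSI/Broadcom SAS log_info word into the fields mpt2sas prints.

Assumption: bit layout as in the Linux mpt3sas driver's log-info display
(bus type 28-31, originator 24-27, code 16-23, sub_code 0-15).
The meaning of individual code/sub_code values is NOT decoded here.
"""

from typing import NamedTuple

# Originator numbers, named the way the kernel message names them.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}
SAS_BUS_TYPE = 3  # bus type value for SAS in the top nibble


class LogInfo(NamedTuple):
    bus_type: int
    originator: str
    code: int
    sub_code: int


def decode_log_info(value: int) -> LogInfo:
    """Extract each bit field from a 32-bit log_info value."""
    bus_type = (value >> 28) & 0xF
    originator = ORIGINATORS.get((value >> 24) & 0xF, "unknown")
    code = (value >> 16) & 0xFF
    sub_code = value & 0xFFFF
    return LogInfo(bus_type, originator, code, sub_code)


def format_like_kernel(value: int) -> str:
    """Render the fields in the same shape as the syslog line."""
    info = decode_log_info(value)
    return (f"log_info(0x{value:08x}): originator({info.originator}), "
            f"code(0x{info.code:02x}), sub_code(0x{info.sub_code:04x})")


if __name__ == "__main__":
    # The two log_info values that appear in the syslog above.
    for raw in (0x31120303, 0x31080000):
        print(format_like_kernel(raw))
```

Run as-is, it should print the same two decodes the kernel logged, which is a quick sanity check that the assumed layout matches this driver.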
August 20, 2024 (Community Expert)

4 hours ago, Cosm1c said:
Aug 19 19:08:35 nas kernel: sd 7:0:4:0: Power-on or device reset occurred
Aug 19 19:08:35 nas kernel: sd 7:0:5:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:1:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:4:0: Power-on or device reset occurred
Aug 19 19:08:36 nas kernel: sd 7:0:5:0: Power-on or device reset occurred

These are usually a power/connection problem: check or replace the cables, both power and SATA/SAS, and try again.
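One way to check whether the resets cluster on particular drives (which would fit a shared cable or power lead) is to tally the reset and abort lines per SCSI address. The address in each sd line is host:channel:target:LUN, so all three resetting drives here sit on SCSI host 7, presumably the LSI HBA. This is a minimal sketch that assumes the exact message wording in the syslog above; whether the affected drives actually share a breakout cable or power lead still has to be checked physically.

```python
"""Tally SCSI resets/aborts per device and note ixgbe adapter removals.

Assumption: message text matches the syslog excerpt in this thread.
Usage: python tally.py /path/to/syslog  (falls back to a built-in sample).
"""

import re
import sys
from collections import Counter

# e.g. "sd 7:0:4:0: Power-on or device reset occurred"
RESET_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")
# e.g. "sd 7:0:4:0: attempting task abort!scmd(...)"
ABORT_RE = re.compile(r"sd (\d+:\d+:\d+:\d+): attempting task abort")
# e.g. "ixgbe 0000:04:00.0: Adapter removed"
NIC_GONE_RE = re.compile(r"ixgbe (\S+): Adapter removed")

SAMPLE = [
    "Aug 19 19:08:05 nas kernel: sd 7:0:1:0: Power-on or device reset occurred",
    "Aug 19 19:08:06 nas kernel: ixgbe 0000:04:00.0: Adapter removed",
    "Aug 19 19:08:35 nas kernel: sd 7:0:4:0: attempting task abort!scmd(0x0000000092884838), outstanding for 30065 ms & timeout 30000 ms",
    "Aug 19 19:08:35 nas kernel: sd 7:0:5:0: attempting task abort!scmd(0x000000002ca9186f), outstanding for 30105 ms & timeout 30000 ms",
    "Aug 19 19:08:35 nas kernel: sd 7:0:4:0: Power-on or device reset occurred",
    "Aug 19 19:08:35 nas kernel: sd 7:0:5:0: Power-on or device reset occurred",
    "Aug 19 19:08:36 nas kernel: sd 7:0:1:0: Power-on or device reset occurred",
    "Aug 19 19:08:36 nas kernel: sd 7:0:4:0: Power-on or device reset occurred",
    "Aug 19 19:08:36 nas kernel: sd 7:0:5:0: Power-on or device reset occurred",
]


def summarize(lines):
    """Return (resets per address, aborts per address, removed NIC PCI IDs)."""
    resets, aborts, removed_nics = Counter(), Counter(), []
    for line in lines:
        if m := RESET_RE.search(line):
            resets[m.group(1)] += 1
        elif m := ABORT_RE.search(line):
            aborts[m.group(1)] += 1
        elif m := NIC_GONE_RE.search(line):
            removed_nics.append(m.group(1))
    return resets, aborts, removed_nics


def main(argv):
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8", errors="replace") as fh:
            lines = fh.readlines()
    else:
        lines = SAMPLE
    resets, aborts, nics = summarize(lines)
    # Counter returns 0 for addresses that only appear in one of the tallies.
    for addr in sorted(set(resets) | set(aborts)):
        print(f"{addr}: {resets[addr]} reset(s), {aborts[addr]} abort(s)")
    for pci in nics:
        print(f"ixgbe reported adapter {pci} removed")


if __name__ == "__main__":
    main(sys.argv)
```

If the counts concentrate on drives that share one cable or power connector, that supports the cabling explanation; if they are spread across every port on the HBA, the controller itself (or its slot or cooling) becomes a more likely suspect.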
August 20, 2024 (Author)

8 hours ago, JorgeB said:
These are usually a power/connection problem: check or replace the cables, both power and SATA/SAS, and try again.

What about this one?
ixgbe 0000:04:00.0: Warning firmware error detected FWSM: 0xFFFFFFFF
August 20, 2024 (Author)

5 minutes ago, JorgeB said:
Look for a firmware update, if available.

Will do. I'm also wondering whether the LSI card is overheating and dropping the connection, since it happens right at the end of the 10-hour parity sync...
September 3, 2024 (Author)

No idea; I gave up on this. I replaced every single component inside this server except the drives themselves, and the same issues remained. I'll use my license on a different box and just run Proxmox on this one, since that seems stable for now.