mdizzle Posted July 2, 2020

I am at my wits' end. I have had an unRaid server for almost 10 years. I recently started having crashes on my system, which had 10-year-old hardware. When it crashes, it is not responsive in any way: no video, no keyboard response, nothing. I have to do a cold shutdown and reboot to get back into it.

The first thing I did was upgrade unRaid to the latest version, and I didn't have any issues with that. But it was still crashing, so I decided it was a good time to upgrade the hardware. My friend and I purchased two sets of mobo, processor, memory and controllers from the same vendor. He installed his and it worked fine. I installed mine and it's still been crashing, especially when doing a parity sync.

I've tried everything: mem tests, new PSU, new controllers. It would still crash when running the parity sync/rebuild. The only things that have not changed are the case and the actual drives. I replaced the mobo with a new identical one and that seemed to help. I was able to start the parity sync and it ran for over 24 hours, and then it crashed.

I have a feeling something is going on with a drive but I can't seem to fix it. Every time I run a parity sync, it crashes. Currently it shows one drive as unmountable and the parity drive as emulated. I can't get through a full parity sync/rebuild. I'm attempting an xfs_repair on the problem drive after reading numerous posts to try to figure out the issue. That's been running for a while now; it's a 10 TB drive.

I've attached my diagnostics. If anyone has any thoughts, I could use the help; this has become very frustrating.

tower-diagnostics-20200702-1121.zip
mdizzle Posted July 3, 2020

The xfs_repair was running when the system seemed to crash, so I don't know what happened with it. I had to reboot, and the drive still shows as unmountable.
Vr2Io Posted July 3, 2020

Errors are being reported on the LSI storage controller; try updating its BIOS and firmware.

Jul 2 07:55:02 Tower kernel: mpt2sas_cm0: LSISAS2308: FWVersion(18.00.00.00), ChipRevision(0x05), BiosVersion(07.35.00.00)
Jul 2 09:40:39 Tower kernel: sd 9:0:0:0: [sdc] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:40:39 Tower kernel: sd 9:0:0:0: [sdc] tag#6379 CDB: opcode=0x88 88 00 00 00 00 03 a3 81 29 c0 00 00 00 08 00 00
Jul 2 09:40:39 Tower kernel: print_req_error: I/O error, dev sdc, sector 15628052928
Jul 2 09:40:57 Tower kernel: sd 9:0:1:0: [sdf] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:40:57 Tower kernel: sd 9:0:1:0: [sdf] tag#6379 CDB: opcode=0x88 88 00 00 00 00 05 74 ff ff 40 00 00 00 08 00 00
Jul 2 09:40:57 Tower kernel: print_req_error: I/O error, dev sdf, sector 23437770560
Jul 2 09:41:07 Tower kernel: sd 9:0:2:0: [sdj] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:41:07 Tower kernel: sd 9:0:2:0: [sdj] tag#6379 CDB: opcode=0x88 88 00 00 00 00 03 a3 81 29 c0 00 00 00 08 00 00
Jul 2 09:41:07 Tower kernel: print_req_error: I/O error, dev sdj, sector 15628052928
Jul 2 09:41:17 Tower kernel: sd 9:0:4:0: [sdl] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:41:17 Tower kernel: sd 9:0:4:0: [sdl] tag#6379 CDB: opcode=0x88 88 00 00 00 00 04 8c 3f ff 40 00 00 00 08 00 00
Jul 2 09:41:17 Tower kernel: print_req_error: I/O error, dev sdl, sector 19532873536
Jul 2 09:41:26 Tower kernel: sd 9:0:3:0: [sdk] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:41:26 Tower kernel: sd 9:0:3:0: [sdk] tag#6379 CDB: opcode=0x88 88 00 00 00 00 05 74 ff ff 40 00 00 00 08 00 00
Jul 2 09:41:26 Tower kernel: print_req_error: I/O error, dev sdk, sector 23437770560
Jul 2 09:41:36 Tower kernel: sd 9:0:5:0: [sdm] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:41:36 Tower kernel: sd 9:0:5:0: [sdm] tag#6379 CDB: opcode=0x88 88 00 00 00 00 04 8c 3f ff 40 00 00 00 08 00 00
Jul 2 09:41:36 Tower kernel: print_req_error: I/O error, dev sdm, sector 19532873536
Jul 2 09:41:46 Tower kernel: sd 9:0:6:0: [sdn] tag#6379 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Jul 2 09:41:46 Tower kernel: sd 9:0:6:0: [sdn] tag#6379 CDB: opcode=0x88 88 00 00 00 00 03 a3 81 29 c0 00 00 00 08 00 00
Jul 2 09:41:46 Tower kernel: print_req_error: I/O error, dev sdn, sector 15628052928

2 hours ago, mdizzle said: drive still shows as unmountable.

That is a separate problem: the file system is corrupt.
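The `CDB:` lines in the log above can be decoded by hand to cross-check them against the `print_req_error` sector numbers. Here is a minimal Python sketch, assuming the standard SCSI READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2 to 9, and a 4-byte transfer length in bytes 10 to 13). The function name `decode_read16` is made up for illustration and is not part of any unRaid tooling.

```python
from typing import Tuple


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) for a READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    # Assumption: only READ(16) is handled, since that is the opcode in the log.
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # number of blocks requested
    return lba, blocks


# CDB copied from the sdc entry in the log
lba, blocks = decode_read16("88 00 00 00 00 03 a3 81 29 c0 00 00 00 08 00 00")
print(lba, blocks)  # 15628052928 8
```

The decoded LBA matches the sector reported for sdc, so each failed request was an 8-block read at the logged sector.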
mdizzle Posted July 3, 2020

Thanks Benson. I will do that and post the results.
mdizzle Posted July 3, 2020

I'm having a hell of a time finding the right BIOS and firmware update. Can anyone point me in the right direction?
JorgeB Posted July 4, 2020

Go to Broadcom's support site; it's under legacy.
mdizzle Posted August 27, 2020

Well, after a long road of hard drive repairs and whatnot, my array is back, but it still crashes and I don't know why. I've attached my diagnostics, if anyone can look and see what might be the issue. Thanks.

tower-diagnostics-20200826-1632.zip
Vr2Io Posted August 27, 2020

First, please avoid writing to the disks; if it crashes again during a write, the file system may become corrupt again.

Next, troubleshoot:
- Stop the array from auto-starting.
- Stress the memory, i.e. run a memory test.
- Stress the CPU, i.e. mount a UD disk and run "md5sum file" in multiple sessions, or anything else that loads the CPU.
- Start the array in maintenance mode and run a "dd" READ test on all disks.
- If possible, swap in another PSU.

Perform the tests step by step to confirm each part is working normally.

I notice you changed the HBA to a Marvell 88SE9485.
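The "dd" READ test in the list above amounts to reading each disk from start to end and watching for I/O errors or a crash. A rough Python equivalent is sketched below, assuming it is pointed at a device node or file you are allowed to read. `read_test` and its parameters are illustrative names, and plain `dd` to `/dev/null` would do the same job.

```python
import time


def read_test(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read `path` sequentially to the end, discarding the data; return bytes read.

    Any OSError (for example an I/O error reported by the kernel) propagates
    to the caller, which is the signal this test is looking for.
    """
    total = 0
    start = time.monotonic()
    # Unbuffered binary read so each request goes straight to the OS.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file/device
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else 0.0
    print(f"{path}: read {total} bytes in {elapsed:.1f} s ({rate:.1f} MB/s)")
    return total
```

A call such as `read_test("/dev/sdX")` (where `/dev/sdX` is a placeholder, not a device taken from the diagnostics) reads the whole disk without writing anything, which fits the advice to avoid writes while the cause of the crashes is unknown.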
JorgeB Posted August 27, 2020

4 hours ago, mdizzle said: my array is back but it still crashes and I don't know why.

Please explain in more detail where/when it crashes. You're using a SAS2LP (or similar) controller, and those have not been recommended for a long time.
mdizzle Posted September 10, 2020

Johnnie,

Random crashes mostly, but they seem to happen more frequently when running a parity check. In the past, I saw some drive issues and I replaced a problem drive hoping it would help, and it did, but I still get the crashes. I have tried 4 different controllers, including two different recommended LSI controllers. In addition, my friend and I bought the same mb, chip, memory and LSI controllers and has the same issues. The SAS2LP controller you refer to was in my old machine, which ran for 9 years before it started having issues, hence the upgrade.

I've tested the memory before and it came up clean. I haven't tested the HDDs specifically, but I would've expected some warning from unRaid if the drives were going bad. It's frustrating because I've tried everything short of changing the chip, which is expensive, and possibly replacing the memory.
JorgeB Posted September 10, 2020

6 hours ago, mdizzle said: the SAS2LP controller you refer to, i had in my old machine that ran for 9 years before it started having issues, hence the upgrade.

The SAS2LP used to work great with v5, but since v6 many users have started having problems with it (same for the SASLP, which uses the same driver). Many times the issues begin out of the blue after an Unraid upgrade. That doesn't mean it's the problem here, since it doesn't usually make the server crash, but it's definitely not recommended.

The last diags you posted are from just after rebooting, so there's not much to see. If it keeps crashing, or you can make it crash, use this then post that syslog.