weirdcrap Posted September 21, 2022 (edited)

v6.11-RC5

This is a follow-up to this post. I had repaired the errors and was running a final non-correcting parity check when one of my disks decided to sh*t the bed? Or maybe the SAS controller lost contact? Now UnRAID is stuck on a paused "Read Check".

This was the last thing in the log before a flood of read and write failures:

Sep 21 16:47:26 VOID kernel: mpt2sas_cm1: log_info(0x31110d01): originator(PL), code(0x11), sub_code(0x0d01)
Sep 21 16:47:26 VOID kernel: sd 12:0:2:0: [sdo] tag#1652 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=3s
Sep 21 16:47:26 VOID kernel: sd 12:0:2:0: [sdo] tag#1652 Sense Key : 0x2 [current]
Sep 21 16:47:26 VOID kernel: sd 12:0:2:0: [sdo] tag#1652 ASC=0x4 ASCQ=0x0
Sep 21 16:47:26 VOID kernel: sd 12:0:2:0: [sdo] tag#1652 CDB: opcode=0x88 88 00 00 00 00 00 04 29 0f 48 00 00 04 00 00 00
Sep 21 16:47:26 VOID kernel: I/O error, dev sdo, sector 69799752 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0

Now it's like the webui is half broken? Neither the button to resume nor the button to cancel the check works. Firefox acts like it's loading (the refresh symbol changes to stop) but nothing ever happens. I tried to change the spin-down delay on the failed disk so I could run an extended SMART test, and the Apply button doesn't work. I tried to run a short SMART test and the button for it doesn't work either.

I guess at this point I just restart the server to try to restore some semblance of functionality?

void-diagnostics-20220921-1650.zip

Edited September 21, 2022 by weirdcrap
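For anyone trying to read the raw SCSI lines in the syslog excerpt above, here is a minimal Python sketch that decodes the quoted CDB and sense values. It assumes the standard READ(16) layout (opcode in byte 0, 8-byte LBA in bytes 2-9, 4-byte transfer length in bytes 10-13) and uses the standard SCSI meanings for sense key 0x2 and ASC/ASCQ 0x04/0x00. The function names and the tiny lookup tables are hypothetical and only cover the values that appear in this log.

```python
# Decode the READ(16) CDB and the sense data quoted in the syslog excerpt.
# The lookup tables are illustrative and intentionally incomplete: they only
# hold the values that appear in this particular log.

CDB_HEX = "88 00 00 00 00 00 04 29 0f 48 00 00 04 00 00 00"

OPCODES = {0x88: "READ(16)"}
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE"}


def decode_read16(cdb_hex):
    """Split a 16-byte READ(16) CDB into opcode, starting LBA and block count."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16:
        raise ValueError("expected a 16-byte CDB")
    return {
        "opcode": OPCODES.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9, big-endian
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13, big-endian
    }


def describe_sense(key, asc, ascq):
    """Return a readable description for a sense key and ASC/ASCQ pair."""
    key_text = SENSE_KEYS.get(key, "unknown sense key")
    detail = ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ")
    return f"{key_text}: {detail}"


if __name__ == "__main__":
    print(decode_read16(CDB_HEX))
    print(describe_sense(0x2, 0x4, 0x0))
```

The decoded LBA comes out as 69799752, which matches the sector in the kernel's "I/O error, dev sdo" line, so the failed command is a 1024-block read starting at that sector. A "not ready" response only says the device stopped answering; on its own it does not tell you whether the disk, the cabling, or the controller is at fault.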
trurl Posted September 21, 2022

No SMART report for disk13, but disabled/emulated disk13 is mounted and 91% full. I guess you will have to shut down and check connections.
weirdcrap (Author) Posted September 21, 2022 (edited)

18 minutes ago, trurl said: No SMART report for disk13, but disabled/emulated disk13 is mounted and 91% full. I guess you will have to shut down and check connections.

I'm actually remote and probably won't be able to get to it for about a week. I was hoping I could give it a reboot in the meantime to at least fix the webui, so I can run a SMART test and see whether the disk has failed or something happened with the connections. Or maybe someone like johnny has some insight into what the controller errors point to.

I guess to play it safe I should just shut it down until I can get home. How do I initiate a clean power down from the terminal, since the webui seems to be borked?

Edited September 21, 2022 by weirdcrap
JorgeB Posted September 22, 2022

9 hours ago, weirdcrap said: How do I initiate a clean power down from the terminal since the webui seems to be borked?

powerdown
weirdcrap (Author) Posted September 22, 2022

1 hour ago, JorgeB said: powerdown

It threw a deprecated error in the logs but seems to have worked. Thanks.

I've gone ahead and ordered a new drive, which should hopefully be waiting for me when I get home. I'll update this post once I can get my hands on the server and see what's what.
weirdcrap (Author) Posted September 26, 2022 (edited)

So this continues to get stranger. I got home and ran a short SMART test on the disk and it passed. I transferred it to another PC to do a long SMART test, because I had issues with it not spinning down in UD. It passed that as well.

Despite it passing, I opted to replace the disk anyway. I precleared the new disk and started the parity rebuild, in a different slot than the previous disk and everything. Not even an hour later the new disk has also failed, throwing a ton of errors. What gives? If it was a cabling issue, it shouldn't have followed me to another slot...

void-diagnostics-20220925-2144.zip

EDIT: CA AppData backup apparently got real pissed that I didn't have the array started for a few days (it may have tried to run a backup) and wouldn't let me stop the array. I ended up having to force the system down so I could move the drive to another slot and try again.

EDIT2: So there's something really strange going on here. Now every time I try to stop the array it hangs on unmounting the disk shares. Last time it was the CA user share, this time it's the cache drive.

syslog.txt

EDIT3: I'm giving the rebuild one more go in another slot, on an entirely different RAID controller this time. If it fails again, hopefully someone here has a better idea of what's going on than I do.

Edited September 26, 2022 by weirdcrap
Solution: JorgeB Posted September 26, 2022

Disk dropped offline so there's no valid SMART report.
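When a disk drops offline like this, the syslog tends to fill with kernel I/O error lines of the kind quoted in the first post. A rough way to see which device letter the errors belong to is to tally them per device and operation. The Python sketch below assumes the "I/O error, dev ..., sector ... op ...:(READ)" line format shown in that post; the function name and regular expression are hypothetical, and the only sample data is the log excerpt from this thread.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   I/O error, dev sdo, sector 69799752 op 0x0:(READ) flags 0x0 ...
IO_ERROR = re.compile(
    r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+) "
    r"op 0x[0-9a-fA-F]+:\((?P<op>\w+)\)"
)


def tally_io_errors(lines):
    """Count I/O error lines per (device, operation) pair."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("op"))] += 1
    return counts


# Two lines taken from the excerpt in the first post; only one is an I/O error.
sample = [
    "Sep 21 16:47:26 VOID kernel: sd 12:0:2:0: [sdo] tag#1652 ASC=0x4 ASCQ=0x0",
    "Sep 21 16:47:26 VOID kernel: I/O error, dev sdo, sector 69799752 "
    "op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0",
]

if __name__ == "__main__":
    for (dev, op), count in tally_io_errors(sample).items():
        print(f"{dev} {op}: {count}")
```

To run it against a real log, pass an open file instead of the sample list, for example `tally_io_errors(open("syslog.txt"))`. Keep in mind that device letters such as sdo can change between boots, so the counts are only comparable within a single boot.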