CaptainTivo Posted July 21, 2020 (edited) Hello, This morning my Unraid Main page is showing disk5 in an error state but also showing the same disk as unassigned. I've never seen this before, but I assume that the disk is bad and needs to be replaced. I had intended to retire this disk anyway (an old 2 TB disk), so this is a good time. The question is what to do: in the past, when I had a bad disk, I simply replaced it with a new disk of the same size and let Unraid rebuild the array. I have a new disk that I can put into the array, but it is 8 TB, not 2 TB. Should I remove the old 2 TB disk and replace it with the new 8 TB one? Will Unraid rebuild the array with that config? If not, can I just copy (e.g. using rsync) the files from disk5 to another disk which has 2 TB of free space and then remove disk5? Attached are a screenshot of the Main page and the diagnostics logs. Thanks. tower-diagnostics-20200721-0622.zip Edited July 21, 2020 by CaptainTivo: wrong screen cap
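For the copy-then-remove option asked about above, a quick pre-flight check that the data actually fits can save a failed transfer. The following is only a minimal sketch, not an Unraid tool: the helper names `dir_size_bytes` and `fits_on` are made up for illustration, the destination `/mnt/disk3` is a hypothetical choice (the thread does not say which disk has the free space), and it assumes the data disks are mounted under `/mnt/diskN`.

```python
import os
import shutil


def dir_size_bytes(root):
    """Sum the sizes of regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_on(source, destination, margin=0.05):
    """Return True if the files under source fit in destination's free space
    while leaving `margin` (a fraction of destination capacity) unused."""
    needed = dir_size_bytes(source)
    usage = shutil.disk_usage(destination)
    reserve = int(usage.total * margin)
    return needed <= usage.free - reserve


if __name__ == "__main__":
    # Assumed mount points; adjust to the actual source and target disks.
    src, dst = "/mnt/disk5", "/mnt/disk3"
    if os.path.isdir(src) and os.path.isdir(dst):
        print("fits" if fits_on(src, dst) else "does not fit")
    else:
        print("source or destination is not mounted")
```

The margin is there because filling a data disk to the last byte is rarely a good idea; the 5% default is an arbitrary assumption.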
JorgeB Posted July 21, 2020 The disk looks fine. This is most likely an issue with the SASLP, since they are known to drop drives for no reason; it could also be a cable/connection issue.
CaptainTivo Posted July 21, 2020 Ok, thanks. So I re-seated the connectors and re-booted but the disk still shows disabled. Should I run a parity check to rebuild or what?
JonathanM Posted July 22, 2020 https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive
CaptainTivo Posted July 23, 2020 OK. I did the re-enable rebuild procedure and it appears to be working fine now. Thanks for the help.
CaptainTivo Posted July 23, 2020 Looks like I spoke too soon. (I removed the SOLVED tag. I hope that is OK.) This morning, the array is again reporting that disk5 is in an error state. I decided to simply reboot and see if the HBA would work long enough to do a SMART test on the drive. If I was sure it was OK, I would simply copy the (reconstructed) data to another disk and remove disk5. So I rebooted, and now I get the weirdness where the Main page shows disk5 with the red x and also shows it in the unassigned disks area. BUT disk5 is NOT showing in the drop-down, so I can't simply re-assign it back to disk5. To further complicate things, there is a green dialog box showing "Notice{} - array turned good"!!!! This is clearly not true. Anyway, what to do now?
1) Start the array and hope that it has not forgotten that disk5 existed. I could still copy the data (reconstructed from the other disks and parity) to free space on another disk.
2) Start the array in Maintenance Mode and run a SMART test on the disk. If it's good, I could always mount it and copy from there.
3) Other?
Attached are two diagnostics, one after disk5 was put into the error state (write errors to the disk) and the other after I rebooted to reset the HBA, in the current state.
tower-diagnostics-20200723-1104.zip - Unraid put disk5 into error.
tower-diagnostics-20200723-1620.zip - after reboot but not starting the array. This is the current state of the server.
Thanks again for taking the time to help. Also attached is the Main web page.
JonathanM Posted July 24, 2020
CaptainTivo said: "Anyway, what to do now?"
Replace the HBA with an LSI-based card.
On 7/21/2020 at 11:07 AM, johnnie.black said: "most likely an issue with the SASLP, since they are known to drop drives without a reason"
JorgeB Posted July 24, 2020 You can swap that disk with one using the onboard controller before rebuilding to see whether it happens again, but either way you should replace that controller; they have not been recommended for some time due to various known issues.
CaptainTivo Posted July 24, 2020
johnnie.black said: "You can swap that disk with one using the onboard controller before rebuilding to see whether it happens again, but either way you should replace that controller; they have not been recommended for some time due to various known issues."
I think you are right. I have been using the SASLP since I built the machine 9 years ago (version 4!). I had been running 6.7.2 since it came out with no problems, but earlier this week I updated to 6.8.3 and this problem started. It could be a coincidence, but it is suggestive. As it happens, I bought an LSI SAS 9207-8i / LSI00301 a few months ago but did not install it. Question: can I install it now, with the disk in error state? I think I can simply swap out the card without changing anything in the config, right? Alternatively, I can restore a backup of the 6.7.2 OS and see if I can get the server back to a stable state and then install the new HBA. What do you think?
JorgeB Posted July 24, 2020
CaptainTivo said: "Question: can I install it now, with the disk in error state?"
Yep, you can rebuild it after.
CaptainTivo Posted July 25, 2020 OK. I replaced the AOC-SASLP with an LSI SAS 9207-8i card and rebuilt the drive. All seems well. Now to proceed with the array shrink.