ElonMuskie Posted February 25
Hello all,

I have been getting SMART errors ("uncorrectable error cnt") on my cache drive for a while. I used to have a Samsung SSD 860 500 GB in my system that gave me this error, so I replaced it with a Samsung 870 1 TB. The 1 TB drive eventually started giving me the same error, but I was unable to find any issue with it.

I have been busy recently and haven't had much time to maintain my server, but today I went to use it and discovered an "Unable to write to cache" error in the Fix Common Problems popup. Now when I reboot the server and start the array, it takes minutes to bring the array online, I am unable to access my shares, and the Docker service fails to start. I am assuming this is all related to the cache drive not working properly, but maybe there are additional problems?

A previous theory was a bad SATA cable or a bad port on the mobo, so I tried a new cable and a different port on the mobo, to no avail. I also tried a new SATA power cable, but that did not work either. I have a Dell HBA330 SAS controller that I tried connecting the SSD to, but Unraid would not even see the drive after a reboot. I have two array drives on that controller, so I know the controller is working.

Does anyone have any ideas what may be causing these issues? Is it just a bad drive? I have attached my diagnostics file to the post. Thanks in advance!

the-beast-diagnostics-20240225-1258.zip
trurl Posted February 25 (Solution)
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed: read failure  70%        4527             13602456

replace
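For anyone who wants to check a self-test log like the one above without reading it by eye, here is a minimal Python sketch. It assumes the log text has already been saved or pasted in (for example from the SMART report in a diagnostics zip) and that rows follow the layout shown in this thread. The function name `failed_self_tests` and the "status contains the word failure" rule are my own assumptions for illustration, not anything Unraid or smartctl provides.

```python
import re

# One row of the self-test log table, e.g.
# "# 1  Short offline  Completed: read failure  70%  4527  13602456"
# Group 2 holds the test description and status together, because the
# two columns cannot be split reliably once the spacing is collapsed.
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def failed_self_tests(log_text):
    """Return the rows whose status text mentions a failure.

    Assumption: a failed test has the word "failure" in its status,
    as in "Completed: read failure" from the log in this thread.
    """
    failures = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match and "failure" in match.group(2).lower():
            failures.append({
                "num": int(match.group(1)),
                "description_and_status": match.group(2),
                "remaining_pct": int(match.group(3)),
                "lifetime_hours": int(match.group(4)),
                "lba_of_first_error": match.group(5),
            })
    return failures
```

Any row this returns means a self-test stopped on an error, which is the situation that prompted the "replace" advice here.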
ElonMuskie Posted February 25
Thanks, @trurl! Is there anything I can do to recover the data on the cache drive, or will I need to start over? Additionally, is there anything I need to do to prevent this from happening again?
itimpi Posted February 25
I would try to copy (not move) as much of the data off that drive as you can. It will then need replacing, as it appears to be failing if it cannot complete the SMART test. You seem to have been a bit unlucky with your SSDs, because Samsung are one of the top brands and one does not expect their SSDs to fail early. There should be no setting needed in Unraid to prevent this - it is a hardware issue.
ElonMuskie Posted February 26
I'll be replacing the SSD with an identical drive (hopefully tonight). @itimpi, since I am unable to view my shares once the array is up, will I be able to unassign the bad SSD and copy the data to the new drive if they are both in Unassigned Devices? Is there another way I should copy the data from the old drive to the new one? I see that I can start the array without the cache drive, but I haven't looked into what that will do to the system.
trurl Posted February 26
You might try unassigning the bad cache drive to see if your user shares come back. Of course, they won't include anything still on cache. You don't need user shares to do the copy anyway. Install the Dynamix File Manager plugin. It will let you work with folders and files directly on the server. When you get the new cache assigned, you can try to mount the bad cache as an Unassigned Device and copy its files to the new cache.
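If you would rather script the copy than do it through the file manager, the idea of "copy, not move, and take whatever is still readable" can be sketched in Python as below. This is only a rough sketch under assumptions: the helper name `copy_tree_best_effort` is made up for illustration, and the example paths in the comments (`/mnt/disks/old_cache` for the bad drive mounted as an Unassigned Device, `/mnt/cache` for the new pool) are guesses that need checking against your own system before running anything.

```python
import os
import shutil


def copy_tree_best_effort(src_root, dst_root):
    """Copy every readable file from src_root into dst_root.

    Nothing is deleted or moved on the source. A file that cannot be
    read is skipped and recorded, so one bad sector does not abort the
    whole copy. A skipped file may leave a partial copy at the
    destination, so check the returned list afterwards.
    """
    failed = []

    def on_walk_error(exc):
        # Raised when a directory itself cannot be listed.
        failed.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies data plus timestamps
            except OSError as exc:
                failed.append((src, str(exc)))
    return failed


# Hypothetical usage; both paths are assumptions, adjust them first:
# problems = copy_tree_best_effort("/mnt/disks/old_cache", "/mnt/cache")
# for path, reason in problems:
#     print("could not copy:", path, reason)
```

The returned list tells you which files were lost to read errors and will need restoring from a backup or recreating.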