Can I run a filesystem check/fix when a drive is down?



I have been having all sorts of strange problems with my Unraid setup off and on for years now. Seemingly good drives die, and I get disk errors even though SMART health and diagnostics show no issues. Sometimes disks will just go missing, and after a reboot everything is fine. I suspect I have some sort of power issue that is causing IO errors making Unraid think disks are bad and marking them disabled or missing. Right now I have Disk 3 missing and emulated, and I have tried to rebuild it onto two other drives, both of which failed. I have a Synology that I want to copy all my files to, but even with Disk 3 emulated, the entire contents of source folders disappear hours into the copy and then the copy fails. Occasionally the shares have all disappeared too, but a reboot brings them back. Needless to say, I have a mess of problems and I just want to get my files off while I still can. I am guessing I have some corruption in my filesystem and wondered if I could do a filesystem check with a drive down, or if that would permanently mess something up. I don't want to spend another $100 on a drive for this thing, as the drives that are in it are new and failing and there are two disks in it that are empty. I just want to get my files off and start over without the current Disk 3 in the array and try a new power supply.

 

Also, just now, I stopped the array and disk2 is now missing along with disk3, but I know if I reboot it will be back online and showing healthy.

 

Ideas?

tower-diagnostics-20220105-1950.zip

 

Edit****

I put the array into maintenance mode, and ran a filesystem check on the emulated drive, which I didn't know I could do.  It replayed 6 transactions, and now at least some of the missing files are back.

Edited by djonesax
8 hours ago, JorgeB said:

The SASLP has not been recommended for a long time. It's a good idea to replace it, especially if the disks giving you issues are connected to it.

 

Thanks, I understand, but I am looking to retire this server and didn't want to replace the disk. I also have two unused disks in the array, so if I decide to keep it, I can recreate the array and really don't need this disk. That being said, I don't actually think there is anything wrong with the disks, but rather a hardware issue somewhere causing IO errors making Unraid think that the disks are bad and disabling them. Also, disks randomly have problems or drop offline at times. Like yesterday, when I would stop the array Disk2 would go missing but a reboot would bring it back. Or times when I would do a directory listing on a large directory, the server would lock up for a bit, and then all my shares would be gone, or a bunch of files would be missing, but a reboot would fix it. I ran a filesystem check on disk3, which fixed 6 errors, and ever since I've had no disk problems and my file copies are going fine, but I am now getting increasing CRC error counts on disk1. I've tried new disks, cables, different ports, and redistributing power to even out the rails. The issues are so weird that I'm thinking there is a power supply or motherboard issue, and I didn't feel like going down the debugging road of replacing parts until I fix the issue, so I just bought a Synology and am hoping for a less hands-on solution. Unraid itself is great, but the overall experience over 10+ years for me, perhaps mostly related to hardware, hasn't been the greatest. In my opinion, it requires someone with sysadmin and good hardware knowledge to keep it running optimally, and I'm a little burnt out on it. Sorry for the rant.

9 minutes ago, djonesax said:

I understand, but I am looking to retire this server and didn't want to replace the disk.

I didn't say you should replace the disk, but the controller.

 

9 minutes ago, djonesax said:

That being said, I don't actually think there is anything wrong with the disks, but rather a hardware issue somewhere causing IO errors making Unraid think that the disks are bad and disabling them.

Yes, possibly the controller, and probably the controller if the issues have been happening on disks connected there.

21 hours ago, djonesax said:

issue that is causing IO errors making Unraid think disks are bad and marking them disabled or missing

Unraid doesn't disable a disk because it thinks the disk is bad; it disables the disk because a write to it failed for some reason.

 

It has to disable the disk because the failed write makes it out-of-sync with parity. That failed write, and any subsequent writes to the disk, are emulated by parity and can be recovered by rebuilding.
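To illustrate the idea of emulation, here is a minimal sketch assuming a single parity disk computed as the bytewise XOR of the data disks. The 4-byte "sectors", the values, and the helper name xor_blocks are made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "sectors" on three data disks (illustrative values only).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x0F, 0x1E, 0x2D, 0x3C])

# Parity is kept in sync with all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# disk3 drops out: its contents are emulated from parity plus the other disks.
emulated_disk3 = xor_blocks([parity, disk1, disk2])
assert emulated_disk3 == disk3

# A write aimed at the disabled disk cannot reach it, so only parity is updated.
new_disk3 = bytes([0xFF, 0x00, 0xFF, 0x00])
parity = xor_blocks([disk1, disk2, new_disk3])

# Rebuilding onto a replacement recovers the data as written while emulated.
rebuilt = xor_blocks([parity, disk1, disk2])
assert rebuilt == new_disk3
```

Note that under this scheme the emulated contents depend on every other disk reading back correctly, so read errors on another disk or on parity during a rebuild would affect the rebuilt data.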

 

Connection problems are much more common than bad disks. In your case it seems the controller is to blame.


@JorgeB and @trurl

 

Thanks, I agree it would seem that way, and both Disk2 and Disk3 are on the controller. Parity and Disk1 are on the motherboard, and they have had issues too. Maybe it was happenstance, but for months it seemed to run better after I split the drives evenly across the two PS rails versus all on one rail. It also seemed to run better after taking power away from the cache disk. The PS is at minimum 500W (could be 750W, I don't remember and the label is hidden) with 6 HDDs and an SSD. If I had a poorly performing power supply, could that cause the controller to act up?
