elecgnosis Posted November 11, 2023 (edited)

After moving recently, I started a new parity check to see whether my disks were still okay. When I checked in on it later, it seemed to have run for about six hours, then quit with read errors on disk 1 and connectivity problems with five other drives. There were also UDMA errors on a hot spare that didn't even have any data on it besides a preclear record.

I shut down the server without downloading diagnostics (sorry). I checked that the data and power cables were secure, moved some drives around in the server's slots, then powered back on. Next, I unassigned the drive with read errors and mounted it in Unassigned Devices. I ran a BTRFS scrub, which returned an exit code of 0, meaning no errors. Then I ran an extended SMART test on it and on all of the other drives that had connectivity problems. Hopefully those show up in the diagnostics I've included here.

Today, I tried assigning my hot spare to the array in the slot that the read-error disk had occupied. I accepted that the replacement drive would be overwritten by the rebuild and started the array. The operation didn't run for very long before the replacement drive came up with read errors, putting it in an unmountable state.

I tried to stop the array, but none of the server buttons in Main are responding. I tried stopping the read-check as well, but nothing responds there either. I can click around the interface to do other things, just not this. I was able to pull both the diagnostics and the system log. I also noticed that the system log had an error for /var/log being 97% full. I don't know if that's why the UI is acting the way it is. The last time I dealt with something like this, I remember finding something on the forum saying that stopping loop2 could get it working again, but I haven't tried that yet.

The BTRFS scrub leads me to believe that the original drive with the read error is okay. Should I just do a new config? Otherwise, I'll take any other steps you folks think I should try.

Edited November 11, 2023 by elecgnosis: probable power supply issue, no need to keep open.
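On the /var/log warning mentioned above: here is a minimal Python sketch of one way to check how full that filesystem is. It assumes Python 3 is available on the server, which may not be the case on a stock install. The function names `percent_used` and `is_nearly_full` and the 90% threshold are illustrative choices, not anything taken from Unraid itself.

```python
import shutil


def percent_used(path: str) -> float:
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """True when the filesystem holding `path` is at or above `threshold` percent."""
    # The 90% default is an arbitrary, hypothetical cutoff for illustration.
    return percent_used(path) >= threshold
```

Calling `percent_used("/var/log")` on the server would give a figure comparable to the 97% in the system log, though it may differ by a point or so from what `df` reports because the two compute the ratio slightly differently.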
elecgnosis Posted November 11, 2023

Forgot the files, here they are: omni-diagnostics-20231110-1910.zip
Solution JorgeB Posted November 11, 2023

Looks more like a power/connection problem.

6 hours ago, elecgnosis said: "The BTRFS scrub leads me to believe that the original drive with the read error is okay. Should I just do a new config?"

I would do a new config with the old disk1, after checking/replacing cables or trying a different PSU.
elecgnosis Posted November 11, 2023

Thanks. Do you have any guidance for the UI not responding? I can force a shutdown if I have to; I'm just trying to avoid it.
JorgeB Posted November 11, 2023

Type 'reboot' in the CLI. If it doesn't reboot after 5 or 10 minutes, you will need to force it.
elecgnosis Posted November 11, 2023 (edited)

I started the array with a new config and almost immediately got errors on a different drive from the ones I was having trouble with before. I'm going to go through the same SMART/scrub checks, but I'm not expecting any surprises.

Previously, I was running off of a voltage-regulating UPS. I didn't keep it during the move, so I wonder if that's the secret sauce here. I'm using an 850W power supply that has been rock-solid otherwise. I have a new UPS coming, but not for a couple of weeks. If I still have trouble on the other side of that, I'll look into replacing the power supply. Marking this as solved for now.

Edited November 11, 2023 by elecgnosis
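For anyone repeating the scrub check described in this thread, here is a rough Python sketch of how the scrub's exit status might be interpreted. The status meanings are my reading of the btrfs-scrub man page and should be checked against the version installed. The helper names and the example mount point `/mnt/disks/old_disk1` are hypothetical, and Python 3 is assumed to be available.

```python
import subprocess
from typing import List

# Exit statuses as I understand them from the btrfs-scrub man page
# (an assumption to verify against your installed btrfs-progs).
SCRUB_EXIT_MEANINGS = {
    0: "scrub succeeded (no uncorrectable errors reported)",
    1: "scrub could not be performed",
    2: "nothing to resume",
    3: "scrub found uncorrectable errors",
}


def scrub_command(mount_point: str) -> List[str]:
    """Build the scrub command for a mounted btrfs filesystem."""
    # -B keeps the scrub in the foreground, so the exit status reflects the result.
    return ["btrfs", "scrub", "start", "-B", mount_point]


def describe_scrub_exit(code: int) -> str:
    """Translate a scrub exit status into a short description."""
    return SCRUB_EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def run_scrub(mount_point: str) -> str:
    """Run the scrub (needs root and a real btrfs mount) and describe the outcome."""
    result = subprocess.run(scrub_command(mount_point), check=False)
    return describe_scrub_exit(result.returncode)
```

Something like `run_scrub("/mnt/disks/old_disk1")` would then report the outcome in words. A clean scrub only says the data that could be read matched its checksums; it does not rule out the power or cabling problems suspected above.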