willdouglas Posted March 7, 2020 (edited)

Had some drives fail, luckily within warranty. The first drive was replaced with a "hot spare" drive I leave in the server but don't allocate. During the rebuild another drive failed, but the rebuild completed. The first drive was removed and replaced, and I tried to begin the next rebuild, but I observed some funny behavior that persisted through reboots: the array had difficulty starting and stopping, and VMs and docker apps were missing. The issues cleared when I pulled the second failed disk out. I also moved all drives to the SATA ports, since all the failures were on the same SAS breakout. The rebuild completed, but another drive reported as bad during the juggling.

I'm pretty sure I went beyond my failure limit at some point during the rebuilds and all the swapping around; my disk space utilization has dropped about 5TB from start to finish. My array is currently reporting good drive health, but it's also reporting no user shares, and when I try to start VMs I get an error. I thought I might be able to add the shares back, but when I try nothing happens and "starting services..." is added to the status at the bottom of the GUI.

I'm not looking to get back to where I was, more just wondering how to make the media I can recover available so it can be pulled off. Ideally I'd like to recover what I can and re-provision the whole setup from scratch. There have been some hardware and configuration changes to the machine through its life, and I'm sure I've got a nice pile of mistakes stacked precariously on top of other mistakes. I don't want to make things worse for myself during the recovery process. There is nothing I care about on this array, but there is a lot of stuff, so I'd like to save time/bandwidth/admin work if possible.

cadance-diagnostics-20200307-1231.zip

Edited March 7, 2020 by willdouglas: grammar/readability.
Squid Posted March 7, 2020

You need to run the File System Check on disk 3.

1 hour ago, willdouglas said: "My array is currently reporting good drive health"

It doesn't appear that you've rebuilt onto disk 3 yet either. Either way, you are still going to have to run the file system check.
willdouglas Posted March 7, 2020

Will wait for the rebuild to complete, run the file system checks, and report back. Thanks!
willdouglas Posted March 8, 2020

The rebuild completed and the disk check/repair has been run. User shares are available, but several show as empty when navigated over the network. I can browse the filesystem (for instance, my TV directory) from the webgui, but accessing the same directory over the network displays it as empty.

cadance-diagnostics-20200308-0943.zip
willdouglas Posted March 8, 2020

If there's a low-effort way to get the shares back to normal I'll give it a shot, but I think I can pull from the individual drives via SCP and get a decent transfer rate. I'll sort it out on the other side.
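A minimal sketch of what pulling a share from the individual drives could look like, for anyone following along. It only builds and prints the copy commands; it does not run them. Assumptions not taken from the thread: the server answers to the hostname `cadance` (guessed from the diagnostics file name), the share folder is called `TV`, the array disks are numbered 1 to 3, and `/srv/recovered` is a hypothetical folder that already exists on the receiving machine. It relies on Unraid mounting each array disk at `/mnt/disk<N>`. It also swaps in `rsync` over SSH in place of plain SCP, since `--partial` lets an interrupted copy pick up where it left off.

```python
import shlex


def build_rsync_commands(host, share, disk_numbers, dest_root):
    """Return one rsync command (as an argument list) per array disk.

    All names passed in are illustrative assumptions, not values
    confirmed anywhere in the thread.
    """
    commands = []
    for n in disk_numbers:
        # Unraid mounts each array disk at /mnt/disk<N>. No trailing slash
        # on the source, so rsync copies the share folder itself.
        source = f"root@{host}:/mnt/disk{n}/{share}"
        # One destination folder per disk, so files from a damaged disk
        # never silently overwrite a good copy from another disk.
        dest = f"{dest_root.rstrip('/')}/disk{n}/"
        commands.append(["rsync", "-av", "--partial", source, dest])
    return commands


if __name__ == "__main__":
    # Print the commands for review; nothing is executed here.
    for cmd in build_rsync_commands("cadance", "TV", [1, 2, 3], "/srv/recovered"):
        print(shlex.join(cmd))
```

Keeping each disk's files in a separate folder means the sorting can happen on the other side, after everything readable has been copied off.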
Squid Posted March 8, 2020

Now there are problems on disk 2, which wants you to run the checks on it. It wouldn't be a bad idea to run a memtest via the boot menu to rule out memory as a possible cause.
willdouglas Posted March 9, 2020

Ran the file system check and let it correct the errors. The outcome wasn't pretty. I did upgrade the RAM in this host a few weeks before the first failure, so I'm leaving memtest running overnight and tracking down some spare DIMMs.

I also checked the pulled drives in a different machine, and they're definitely a mixture of dead/dying/busted. One refuses to spin up at all, and the other two throw tons of errors but will let me view the directory structure before refusing to function beyond that.
willdouglas Posted March 21, 2020

Had some life events get in the way of my troubleshooting. Memtest wouldn't boot for me from the Unraid install, so I pulled the newest version and ran that. It took a few days but came out perfect: four passes and no errors.

I've moved on to full rebuild mode in the interest of having a working system in place during the coronavirus lockdown. I have a new bootable USB drive with a fresh image of 6.8.3, and I'm currently pre-clearing all my disks because I was already seeing errors on one of the new drives.