Philby1975 Posted February 18, 2022

Hi, I'm so confused that I'm not sure how to explain all the steps I've taken, but I'll try not to write a novel. I hope I have posted in the right area; I did lots of searching but couldn't find anything similar.

Setup: KCAS 850GM PSU, ASUS X570-P motherboard, AMD Ryzen 9 3900X CPU, LSI SAS9201-16e 16-port SAS/SATA 6Gb/s PCIe HBA, two 8TB parity drives, and 9 data drives of various sizes.

A couple of weeks ago I finally purchased a 4-connector HBA card (the LSI SAS9201-16e 16-port SAS/SATA 6Gb/s PCIe HBA) and installed some extra drives. All good (only using one of its ports at this stage).

Last Friday Disk 2 was in an error state (1st pic) and being emulated; I only had one parity drive at that time. I jumped online and ordered three new 8TB drives, intending to use one to replace disk 2, one as a second parity (which I had been planning anyway) and one as a spare.

The drives arrived yesterday and I used one to replace disk 2. It was marked as having errors soon after the rebuild from parity began. I tried another new drive and it was also marked as having errors soon after the rebuild began. Then:
Swapped the SATA cable - same.
Swapped the power cable - same.
Connected it to the HBA card - same.
Swapped the SATA port and started in maintenance mode - it rebuilt at over 100MB/s, but Disk 4 showed read errors.

This afternoon, when the rebuild was complete, I simply restarted the server and Disk 4 still showed an error state and was being emulated. I shut down, added the 2nd parity drive and restarted, and now it shows:
Disk 2 is unmountable
Disk 4 is in an error state and disabled

I started in maintenance mode and was able to rebuild the 2nd parity. I removed disk 4, put it on a Windows computer, and did a clean and a scan: no problems. I put it back into Unraid and rebuilt disk 4, and then disk 9 went... just simply chasing my tail.

... A few days later.

I swapped Disk 9 this morning and the array would not start, claiming that it was the wrong encryption passphrase.
I checked a dozen times that it was copying and pasting correctly, swapped back to the original disk and tried another dozen times to get the array started. I ended up shutting down as I had other more urgent things to do... then started it up again and the array started first go with the same copy and paste from where I keep it. It's a total mystery to me, but I'm getting nervous about all these 'weird little issues' that keep happening.

Yesterday I had some time, so I swapped disk 9, but then disk 4 was 'missing'. Disk 4 is recognised in a separate Ubuntu VM (physical disk attached), and I can even enter the encryption key and browse the directories, so the data seems to be all there. I tried replacing Disk 4 with my last new 8TB drive; it lets me select it and then just says 'wrong disk'.

I took Disk 4 back to the Windows PC, did a clean and created a partition. Now the BIOS (on the Unraid machine) can see it, but it does not show up in Unraid with lsblk --scsi. Currently I can't start the array, not even in maintenance mode, as the option is greyed out.

I just can't understand why it was working fine until I added the last couple of drives. It doesn't seem to have anything to do with the SATA leads, the power leads, or the HBA card, as I have interchanged all those things so many times. I'm not sure if there is any way to see whether the BIOS is detecting a drive that is attached to the HBA card, but again... it doesn't seem to matter how the disk is attached.

Any help would be greatly appreciated.

tower-diagnostics-20220218-1636.zip
JorgeB Posted February 18, 2022

3 hours ago, Philby1975 said: "Currently can't start the array,"

You need to enter the old encryption passphrase twice; only the 2nd field is filled in.

As for the disk errors, the diags are from after rebooting, so there's not much to see.
Philby1975 Posted February 18, 2022

13 hours ago, JorgeB said: "You need to enter the old encryption passphrase twice, just the 2nd one is filled. About the disks errors diags are after rebooting, so not much to see."

Wow, totally missed that. I've never had to enter it twice before... is there a reason that it is suddenly asking for that? Thank you so much for your reply.

So I entered the passphrase twice and it was going to let me start the array, but that would have been with 2 disks down, so I shut down, replaced disk 4 with my last 8TB drive, and when I restarted I had Disks 4, 5 and 9 out of action (see pic). Attached are the diagnostics from before the reboot.

Going to return to the previous drive config and see what happens. I'm guessing that starting the array right now would be a bad, bad thing... but I'm also quite confident that the data on Disk 5 is still intact. Disk 9 had nothing on it anyway, but I really need Disk 4 to rebuild.

tower-diagnostics-20220219-0922.zip
Philby1975 Posted February 18, 2022

So I'm back to the previous drive config and Unraid sees the same as it saw before... I didn't touch disk 5 on either of the previous reboots; however, after changing disk 4 (also moving it from onboard SATA to HBA SATA), disk 5 is suddenly back.

I don't really want to start the array like this, so I'm going to replace disk 9 with the original disk, which has been cleaned and scanned (0 errors) on a Windows machine.

tower-diagnostics-20220219-0935.zip
Philby1975 Posted February 18, 2022

OK, so far so good. I also plugged this one in via onboard SATA rather than the HBA. Will restart and rebuild. In the meantime I will also clean and scan disk 4.

Hoping the diags can provide some information... Are there any tools to test an HBA card?

This diag is from before starting the array. I will post another after the array is started.

tower-diagnostics-20220219-0943.zip
Philby1975 Posted February 19, 2022

Not sure if it might help, but around the same time that the first disk failed I noticed that write cache was disabled on a few disks. I enabled it manually, but each time I've restarted or changed the disk config, more disks come up with it disabled on boot, so now I'm manually enabling it on most of them every time it boots.

tower-diagnostics-20220219-1101.zip

This diagnostics file is from after the array had been running for a while. Preclear is in progress, but I stopped the rebuild on Disk 9 as it was running extremely slowly (40-odd days for a 1TB disk).

Thanks in advance for any light you can shed on this issue.

Edited February 19, 2022 by Philby1975
JorgeB Posted February 19, 2022

10 hours ago, Philby1975 said: "is there a reason that it is suddenly asking for that?"

I don't use encryption, but it has been known to happen when there's a missing/upgraded disk.

10 hours ago, Philby1975 said: "when I restarted I had Disk 4,5, and 9 out of action"

Disks are not being detected, and there are also some issues with disk7 in this boot, so you likely have a power/connection problem. You should also update the LSI firmware since it's very old.

9 hours ago, Philby1975 said: "Diagnostics file is after the array has been running for a while."

These show errors on two disks connected to the onboard SATA controller, again suggesting a power/connection problem.
Philby1975 Posted February 19, 2022

Thank you so much, mate. This is really appreciated, as I've invested a lot into Unraid and preached far and wide amongst friends about how great it is.

So power is more likely the issue than the HBA card? I will start by updating the firmware... any idea where I might find a good, simple walkthrough? I wouldn't put my hand up for being stupid, but Linux is not my native environment.

I have just put the third different disk into Disk 9 for a rebuild (after preclear), and it's still going at 300-400 KB/s rather than the 100+ MB/s I would expect. I'm guessing this may all be part of the same problem?
JorgeB Posted February 20, 2022

15 hours ago, Philby1975 said: "So power is more likely the issue than the HBA card?"

Yep, since there are also issues with disks connected to the onboard SATA controller.
Philby1975 Posted February 20, 2022

Thank you. Will let you know how I go.
Philby1975 Posted February 25, 2022

Hi JorgeB, I hope you are well.

I updated the HBA card's firmware (it was advertised as having P19... I was surprised to see 9), and I have rigged another power supply to the hard drives. I had some good signs and got down to only 1 hard drive with issues at one point. Now it's back to three.

I'm hoping to borrow some hardware tomorrow to swap the whole thing over to different gear and test, as I'm starting to wonder if it's not a hardware issue. The reason for this is that I have had disks show up as SSDs that aren't, and a disk that was selected for the array but was actually an unassigned device... I have also removed a bunch of Docker containers, but they still show up as 'needing updates' when I run Fix Common Problems.

I have attached the diags again, as things are now with the updated HBA and a separate PSU for the hard drives. Hoping they might give you some more clues. Appreciate your help.

tower-diagnostics-20220225-1903.zip
JorgeB Posted February 25, 2022

The diags are from after rebooting, so there's not much to see other than filesystem corruption on disks 4 and 9.
Philby1975 Posted March 6, 2022

Hi JorgeB, just wanted to give you an update. I actually fried 10 of my 11 disks while testing and trying things... that obviously had an impact, so I completely forgot about all the help you'd provided. It was very much appreciated, so I didn't want to just disappear.

I have shelved everything for now and have all the important stuff on cloud backup... but one day soon I will start again. I might even swap the boards on some of the drives and see if I can get anything back.

Again, thank you for all your help.