TimV, posted February 10, 2021:

I'm running Unraid 6.9-rc2 with six 8 TB IronWolf drives and a 10 TB parity drive, plus a Broadcom SAS 9300-8i HBA. I can't go back to an earlier version of Unraid because it won't work with my motherboard, an Asus TUF Gaming B460M-PLUS (6.8.x didn't work with the motherboard's NIC).

I originally hooked up every drive except parity to the HBA and was getting random read errors that would kick drives out of the array. Sometimes it took 100+ errors before a drive got kicked out; other times it was a single error and BAM, gone!

I posted my diagnostics here and was told to upgrade the HBA firmware and check the cables. I upgraded the firmware and swapped in different cables, and at the same time moved as many drives as possible to the motherboard SATA ports. That has greatly reduced the errors, and a drive that used to have tons of them hasn't had a peep since, so I'm 99.9% sure it's an HBA problem and not an actual drive problem. Interestingly, this mostly happens at night, as if some sequence of events triggers the supposed read error.

The one drive still on the HBA got kicked out yesterday, and I'm tired of always having to rebuild parity, so I want to either pull it or just remove it from my shares. I'm currently using unBalance to move the data off that drive onto the others. Rather than rebuild the array, my idea is to edit my share definitions and exclude that drive from all shares. My theory is that it will then sit in the array as an idle member until I decide what to do, and as long as it's idle, it's much less likely to get kicked out.

I don't know whether my version of Unraid isn't playing nicely with the HBA or the HBA is just finicky. Maybe I'll get another HBA? Any recommendations on a new HBA, or in general?

trurl, posted February 10, 2021:

Quote from TimV: "My theory is that the drive will just sit there as part of the array as an idle member until I decide on a course of action."

If another drive fails, that drive will be used along with all the others to rebuild the failed one. Any unreliable disk in the array puts the others at increased risk.

Quote from TimV: "Any recommendations on a new HBA or just in general?"

If possible, before rebooting and preferably with the array started, go to Tools - Diagnostics and attach the complete diagnostics ZIP file to your NEXT post in this thread.

TimV (author), posted February 10, 2021:

Here are my diagnostics. Note that I shut the system down last night after parity was rebuilt, to avoid another surprise this morning.

cygnus-diagnostics-20210210-1220.zip

trurl, posted February 10, 2021:

Quote from TimV: "parity was rebuilt"

Do you mean you rebuilt parity without the problem disk assigned?

TimV (author), posted February 10, 2021:

Quote from trurl: "Do you mean you rebuilt parity without the problem disk assigned?"

No, I rebuilt parity with all the original drives included. Disk2 is the black sheep drive (serial WKD32TB8).

TimV (author), posted February 12, 2021:

Just got read errors not long after I started a move: 896 read errors popped up, yet the disk is still shown as valid, whereas other times I get one read error and it's kicked out. Diagnostics are attached.

There's no user data on the disk, but it shows 55 GB of space used, and I'm not sure whether there are any valid files. When I go in as root from the command line and run ls -lia, nothing shows up. My 30+ years of Unix experience tell me there's nothing really there, but you never know.

cygnus-diagnostics-20210212-1105.zip

trurl, posted February 12, 2021:

Quote from TimV: "896 read errors just popped up and it's still in a valid disk whereas other times I get 1 read error and it's kicked out."

Unraid disables a disk when a write to it fails. If a read fails and can't be recovered, Unraid gets the data from the parity calculation (parity plus all the other disks) and tries to write it back to the disk; if that write fails, the disk is disabled.

Run an extended SMART test on that disk.

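The behaviour trurl describes can be sketched in Python. This is a hypothetical model for illustration, not Unraid's actual driver code: it assumes single parity (as in this array, with one parity disk), where each parity block is the XOR of the corresponding blocks on every data disk, and the names `xor_blocks`, `handle_read_error`, `write_back` and `WriteError` are invented for the example. It shows why many read errors alone don't disable a disk (the data is rebuilt and rewritten successfully) while a single failed write does, and why every other disk must be readable for the reconstruction to work.

```python
from functools import reduce
from typing import Callable, List, Tuple


class WriteError(Exception):
    """Hypothetical stand-in for a failed write to the physical disk."""


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte strings together, column by column (single-parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def handle_read_error(
    other_data: List[bytes],
    parity: bytes,
    write_back: Callable[[bytes], None],
) -> Tuple[bytes, bool]:
    """Hypothetical model of how a failed read is handled with single parity.

    other_data: the same block as read from every *other* data disk
    parity:     the same block as read from the parity disk
    write_back: writes the block to the failing disk; raises WriteError on failure
    Returns (reconstructed_block, disk_disabled).
    """
    # The missing block is parity XOR all other data blocks, so every
    # other disk must be readable for this step to succeed.
    reconstructed = xor_blocks([parity, *other_data])
    try:
        write_back(reconstructed)  # rewrite the unreadable sector
    except WriteError:
        return reconstructed, True  # write failed: the disk gets disabled
    return reconstructed, False  # write succeeded: error is counted, disk stays enabled
```

For example, with three data blocks `d1`, `d2`, `d3` and `parity = xor_blocks([d1, d2, d3])`, calling `handle_read_error([d1, d3], parity, write_back)` returns `d2` and `False` when `write_back` succeeds, and `d2` and `True` when it raises `WriteError`. Unraid's dual-parity mode uses different math for the second parity disk, which this sketch does not model.
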
TimV (author), posted February 12, 2021:

I'm trying to run the extended test. I've started it twice so far, and each time it just sits there with a spinning circle reporting 10% done for over an hour. I can see the disk is spun up and active. My guess is either something isn't right, or it's hitting every sector and won't update the percentage until it's done.

trurl, posted February 13, 2021:

Quote from TimV: "it's hitting every sector"

Yes, the extended test takes several hours.

TimV (author), posted February 13, 2021:

Many hours later, I'm at 50%. Maybe it'll be done by morning. lol

JorgeB, posted February 13, 2021:

The SMART report for that disk shows the estimated time for an extended test, e.g.:

Extended self-test routine recommended polling time: ( 983) minutes.

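As a worked example, the polling time can be pulled out of a saved SMART report and converted to hours. The sketch below uses hypothetical helpers (`extended_test_minutes`, `format_duration`, `minutes_per_progress_step`), not anything that ships with Unraid. It assumes the report text contains the line JorgeB quotes; smartctl usually splits that label across two lines, so the regex allows any whitespace between the words. It also assumes the drive reports self-test progress in 10% steps, as ATA drives do, which would explain the progress display sitting at 10% for over an hour.

```python
import re
from typing import Optional

# smartctl typically prints this label over two lines:
#   Extended self-test routine
#   recommended polling time:    ( 983) minutes.
# \s+ between the words lets the pattern match either layout.
_POLL_RE = re.compile(
    r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def extended_test_minutes(smart_report: str) -> Optional[int]:
    """Return the drive's estimated extended-test duration in minutes, or None."""
    match = _POLL_RE.search(smart_report)
    return int(match.group(1)) if match else None


def format_duration(minutes: int) -> str:
    """Render a minute count as 'H h M min'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins} min"


def minutes_per_progress_step(total_minutes: int, step_percent: int = 10) -> float:
    """Approximate time between progress updates, assuming updates every step_percent %."""
    return total_minutes * step_percent / 100
```

With the 983-minute example, `format_duration(983)` gives `"16 h 23 min"` and `minutes_per_progress_step(983)` gives 98.3, i.e. roughly an hour and a half between progress updates on a disk with that estimate. The estimate assumes the disk is otherwise idle; other activity on the disk during the test can make it take longer.
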
TimV (author), posted February 13, 2021:

It finished this morning with no errors.

TimV (author), posted February 13, 2021:

ST8000VN004-2M2101_WKD32TB8-20210212-2308.txt

JorgeB, posted February 14, 2021:

Quote from TimV (February 10, 2021, 2:51 PM): "Any recommendations on a new HBA or just in general?"

The one you're using is a good option, but it might be a counterfeit, or there may be some other problem or compatibility issue. For a list of recommended controllers, see here: