September 11, 2023
I'm not sure what's going on. I backed up all my data and decided to move over to ZFS. I created a ZFS cache and a single backup drive using SpaceInvader One's videos and it went fine. I then tried to create a 7x 14TB pool, and every time it asks me to format, it fails and disables the drive. When I click format again it disables the next drive, and so on each time I click format. I tried creating a RAID 0 setup and a RAIDZ1 setup and both fail. Am I doing something wrong?
nas-diagnostics-20230911-1139.zip
September 11, 2023 (Community Expert)
Devices are reporting as busy:
Sep 11 08:38:13 NAS emhttpd: shcmd (4290): /sbin/wipefs -a /dev/sdd
Sep 11 08:38:13 NAS root: wipefs: error: /dev/sdd: probing initialization failed: Device or resource busy
Reboot and try to re-format after array start. If it still fails, post new diags.
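For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch. It only assumes the log line format shown in the excerpt above; the function name `find_busy_devices` is hypothetical and not part of Unraid or any tool.

```python
import re

# Matches the wipefs failure format seen in the syslog excerpt from this thread.
BUSY_RE = re.compile(
    r"wipefs: error: (?P<device>/dev/\S+): .*Device or resource busy"
)

def find_busy_devices(log_text):
    """Return device paths that wipefs reported as busy, in first-seen order."""
    devices = []
    for line in log_text.splitlines():
        match = BUSY_RE.search(line)
        if match and match.group("device") not in devices:
            devices.append(match.group("device"))
    return devices

# The two log lines quoted in the thread.
sample = """\
Sep 11 08:38:13 NAS emhttpd: shcmd (4290): /sbin/wipefs -a /dev/sdd
Sep 11 08:38:13 NAS root: wipefs: error: /dev/sdd: probing initialization failed: Device or resource busy
"""

print(find_busy_devices(sample))
```

If the list is non-empty after a fresh reboot and array start, that is the point at which new diagnostics are worth posting.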
September 11, 2023 (Author)
That fixed it! Thank you. Since I'm here, which setup would be best for surviving 2 future drive failures?
1) 6x 14TB with 2 14TB parity drives
2) 7x 14TB in RAIDZ1 with 1 parity drive and one ZFS drive failure?
3) 8x 14TB in RAIDZ2
September 11, 2023 (Community Expert)
25 minutes ago, buster84 said: 7x 14TB in RAIDZ1 with 1 parity drive and one ZFS drive failure?
This is not possible. The other two options have similar redundancy. The array has the advantage that if more than 2 drives fail, the remaining ones can still be accessed; raidz has a clear performance advantage.
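The "similar redundancy" point can be checked with some rough arithmetic. The sketch below is only an approximation: it assumes option 1 means 6 data drives plus 2 parity drives, it ignores filesystem and ZFS overhead, and the helper names are made up for illustration.

```python
DRIVE_TB = 14  # drive size discussed in the thread

def parity_array(data_drives, parity_drives, drive_tb=DRIVE_TB):
    """Unraid-style array: every data drive keeps its own filesystem."""
    return {
        "usable_tb": data_drives * drive_tb,
        "tolerated_failures": parity_drives,
    }

def raidz(total_drives, parity_level, drive_tb=DRIVE_TB):
    """Single raidz vdev: roughly (n - p) drives of space, before ZFS overhead."""
    return {
        "usable_tb": (total_drives - parity_level) * drive_tb,
        "tolerated_failures": parity_level,
    }

option_1 = parity_array(data_drives=6, parity_drives=2)  # assumed 6 data + 2 parity
option_3 = raidz(total_drives=8, parity_level=2)         # 8 drives in raidz2

print("Option 1:", option_1)
print("Option 3:", option_3)
```

Both come out at about 84 TB usable with two tolerated failures. The difference is what happens on a third failure: in the array the surviving data drives are still readable individually, while a raidz2 pool is lost.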
September 11, 2023 (Author)
OK, that makes sense, so I'm left with options 1 and 3. Which one would have faster read/write speeds: option 1 with RAID 0 and dual parity, or option 3 using RAIDZ2?
September 11, 2023 (Author)
49 minutes ago, JorgeB said: Option 3 will be much faster, with option 1 there's no raid.
Perfect, thank you!
September 15, 2023 (Author)
On 9/11/2023 at 12:51 PM, JorgeB said: Option 3 will be much faster, with option 1 there's no raid.
OK, after a few days I became indecisive. I went back to XFS and ran the dual parity check. Then I deleted that, made every drive ZFS but without pools, and redid parity. It finished, but this time one of the disks had an IO error.
Quote:
pool: disk3
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
config:
NAME STATE READ WRITE CKSUM
disk3 ONLINE 0 0 0
/dev/md3p1 ONLINE 3 73 0
errors: 11 data errors, use '-v' for a list
I'm going to run a full SMART extended test on this once I figure out how to restart the server. So far I haven't been able to shut the system down or stop the array; it's stuck at stopping. In our previous conversation I updated the firmware on the LSI SAS card, and now with this new error I'm wondering what else it could be. Is this truly hardware that is failing and that I need to track down, or maybe a bug with ZFS? Also, without forcefully turning it off with the power button, is there any way to force a shutdown from the terminal? It's stuck right now. I posted the diagnostics.
nas-diagnostics-20230915-1024.zip
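To make the READ/WRITE/CKSUM columns in that output easier to act on, here is a small Python sketch that pulls the counters out of `zpool status` text. It is an illustration under assumptions: the whitespace layout of the sample is reconstructed (the quoted output lost its original spacing), plain integer counters are assumed (large values may be abbreviated in real output), and `parse_error_counters` is a hypothetical helper.

```python
def parse_error_counters(status_text):
    """Map each device name to its (read, write, cksum) error counters."""
    counters = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if in_table:
            if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
                counters[fields[0]] = tuple(int(f) for f in fields[2:5])
            else:
                in_table = False  # a blank line or the "errors:" line ends the table
    return counters

# Values taken from the post above; the spacing is an assumption.
sample = """\
  pool: disk3
 state: SUSPENDED
config:

        NAME          STATE     READ WRITE CKSUM
        disk3         ONLINE       0     0     0
          /dev/md3p1  ONLINE       3    73     0

errors: 11 data errors, use '-v' for a list
"""

counters = parse_error_counters(sample)
suspect = [name for name, errs in counters.items() if any(errs)]
print(suspect)
```

Read and write errors with a zero checksum count, as here, fit the later replies in this thread that point at the connection, controller, or power rather than the disk itself.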
September 15, 2023 (Community Expert)
It's not logged as a disk problem. It could be power/connection, but I've seen some issues with SAS2 LSI controllers and large-capacity Seagate disks, so it could also be that; SAS3 models usually don't have these issues.
4 minutes ago, buster84 said: Also, without forcefully turning it off with the power button, is there any way to force a shutdown from the terminal?
Try typing 'reboot' on the CLI. If it doesn't reboot after 5 minutes, you'll need to force it.
September 15, 2023 (Author)
OK, using the terminal didn't work, so I force-restarted it. I then took out the LSI card and checked the thermal paste. I'm not sure if it was completely dried out or if a specific type was used, but it was rock solid. I had to use a plastic card to scrape it off the heatsink and the chip die, along with soaking it in alcohol. Eventually I cleaned it up and put on new thermal paste. Hopefully this fixes my issues; if not, the next step will be a new LSI card that is SAS3 (I wish I had realized I was buying a SAS2; I need to look out for that next time). After this error, should I re-run parity, or since it says it's valid, does that mean it already wrote corrections to the drive if they were needed?
September 15, 2023 (Community Expert)
28 minutes ago, buster84 said: After this error, should I re-run parity
You should run it again.
September 15, 2023 (Author)
2 hours ago, JorgeB said: You should run it again.
OK, I wasn't sure, but I copied a little data over to test and disk 3 got disabled. Was this the LSI, or another error? I thought at first it might be the power supply, but now I'm really suspecting the LSI card and might just buy another one, as I can't even restore my server while this keeps happening. Another thought is that the LSI-to-SAS cable may be bad.
nas-diagnostics-20230915-1439.zip
Edited September 15, 2023 by buster84
September 15, 2023 (Author)
OK, so to test, I pulled 2 drives off the LSI card and removed one of the cords. I only have 4 drives on the LSI now. I then tried to run xfs_repair and realized they're formatted ZFS, LOL. I now see that doing it this way may not be the best. I should either do all XFS or do a raidz2. Is there a repair method for ZFS like there is for XFS? If not, I'll just reformat everything and do the pool again.
Edit: went ahead and did XFS; letting the parity drives rebuild now.
Edited September 15, 2023 by buster84
September 17, 2023 (Author)
Well, I didn't get far in the parity rebuild before disk 3 became disabled. I then swapped cords with another disk to see if the problem followed the cord, and it didn't. I then noticed more disk read errors, so I didn't do anything after that except run the SMART extended test. It took 20ish hours and passed. I realize now that my old setup, which worked without any issues, had the 6 14TB drives on the motherboard SAS controller and only 3TB drives on the LSI, so I'm thinking you're right that this controller is failing to work with the 14TB drives. I'm going to move at least 6 of them to my motherboard, leave the last 2 on the LSI, and see if it can handle just 2 drives for now. I'm going to have to order another card and do some research to make sure I get a SAS3 version. Do you have any recommendations for specific cards to look for, since I thought I had bought a SAS3 3.0 card already (LSI 2308)? I'm also posting my logs in case you see something else.
nas-smart-20230917-0651 (drive3).zip
nas-diagnostics-20230917-1004 (9:17).zip
Edited September 17, 2023 by buster84
September 18, 2023 (Community Expert)
16 hours ago, buster84 said: I bought a SAS3 3.0 card already (LSI 2308)
SAS2308 is a SAS2 model (usually LSI 9207). I've never seen any issue with SAS3 models (SAS3008), e.g. LSI 9300-8i.
September 18, 2023 (Author)
I'll be on the lookout for one. Thank you. Quick question: I contacted Seagate because my read error rate was 14 million, and they said I can swap the drive for a refurbished one since it's under warranty. Should I swap it, or is the hard drive just reporting errors caused by the LSI rather than real errors?
Edited September 18, 2023 by buster84
September 18, 2023 (Community Expert)
11 minutes ago, buster84 said: my read error rate was 14 million
That's usually normal with Seagate drives. See here for how to see the actual errors, if any:
https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888
September 18, 2023 (Author)
OK, thank you, that's a relief. Time to go shopping.
Edit: In my search for a 9300 I came across a thread talking about SAS controllers overheating. I took a thermal gun and noticed that the card is at 180-190F. I read that the max is 55C, or 131F, so it's definitely overheating at the heatsink. I decided to buy a small fan for the card to see if it makes a difference. It'll arrive tomorrow. I know these cards have issues on their own, but I'm wondering if maybe they're caused by overheating. I can't return this card, so I figure I'll give this a try first and report back.
Edit 2: Interesting. I turned all my fans on high and now it's reading around 100-130F. Still hot, but the parity check isn't showing any errors this time, and the speed is at 250mbps vs 150mbps when I started parity before.
Edited September 18, 2023 by buster84
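The readings above mix Fahrenheit and Celsius, so a quick conversion helps put them side by side. This is just the standard formula in Python; the 55C limit is the figure the poster read, not a value checked against a datasheet.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

LIMIT_C = 55  # limit quoted in the post above (unverified)

# Readings mentioned in the post: before and after turning the fans up.
for deg_f in (180, 190, 100, 130):
    deg_c = f_to_c(deg_f)
    status = "over" if deg_c > LIMIT_C else "within"
    print(f"{deg_f} F = {deg_c:.1f} C ({status} the quoted {LIMIT_C} C limit)")
```

By this arithmetic the original 180-190F readings are roughly 82-88C, well above the quoted limit, while the 100-130F readings with fans on high fall just under it.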
September 20, 2023 (Author)
On 9/18/2023 at 2:54 AM, JorgeB said: SAS2308 is a SAS2 model (usually LSI 9207), I've never seen any issue with SAS3 models (SAS3008), e.g. LSI 9300-8i
I think I figured out the issue. So far everything has been working fine since I increased my internal case fan speeds. I also put a 40mm fan on the LSI heatsink yesterday and it's doing much better. I've been able to copy data back over without any errors popping up like before. I'm starting to think the issue was overheating, since the card was made for a server, not a desktop. Thank you for all your help. If it happens again I'll update the thread and buy the 9300-8i or a 9300-16i.
Edited September 20, 2023 by buster84
September 25, 2023 (Author)
Just updating the post. It's been almost a week now and this is what I noticed. When I had 4 14TB drives connected to the LSI, disk 3 got disabled again shortly after I started my data copy. I then moved disk 3 to a motherboard SATA connector and rebuilt the drive, and it's been running for 5 days now without any issues at all. So either my card rejects 56TB of total capacity and only accepts 42TB, or my card is defective, or it got damaged from not having proper cooling. At this point I'm happy with where the setup is, and I think I've troubleshot this to the max. I still have 1 more free SATA port on my motherboard if needed. If I run into any more issues, I'm immediately buying the 9300-16i. Hopefully this post will help someone else in the future, as I tried looking up the LSI 9207-8i with high-capacity drives and came up short.
Edited September 25, 2023 by buster84
October 11, 2023 (Author, Solution)
Update 10/11/23 (found the real reason disks were being disabled)
Shortly after my last post I got a power supply on sale and installed it. I also remembered that you said this whole issue could be my power supply, so I thought maybe you were right. I did more testing and moved all 8 14TB Exos drives back onto the LSI. I then copied over all my data, and it's been working 100% normally for almost 3 weeks now. No disabled disks, write errors, or anything else, with 2 parity drives.
This is my advice for anyone getting disabled disks in Unraid: change your power supply first, it's probably on its way out. I spent so much time troubleshooting when the entire time it was actually the power supply. I hope this post saves others from losing their data and saves them time troubleshooting.