Cessquill Posted March 11, 2021 Posted March 11, 2021 (edited)

NOTE: There's a TL;DR section at the end of this post with the required steps.

People with specific Seagate Ironwolf disks on LSI controllers have been having issues with Unraid 6.9.0 and 6.9.1. Typically the drive could drop off the system when spinning up. Getting it back online would require checking, unassigning, reassigning and rebuilding its contents (about 24 hours). It happened to me three times in a week across two of my four affected drives.

The drive in question is the 8TB Ironwolf ST8000VN004, although the 10TB model has also been mentioned, so it may affect several models. There have been various comments and suggestions across the threads, and it appears that there is a workaround. The workaround is reversible, so if an official fix comes along you can revert your settings.

This thread is here to consolidate the great advice given by @TDD, @SimonF, @JorgeB and others, to hopefully make it easier for people to follow. It is also here to hopefully provide a central place for those with the same hardware combination to track developments.

NOTE: Carry out these steps at your own risk. Whilst I will list each step I did, and it's all possible within Unraid, it's your data. Read through, and only carry anything out if you feel comfortable. I'm far from an expert - I'm just consolidating valuable information that was scattered around - if this is doing more harm than good, or is repeated elsewhere, then close this off.

The solution involves making changes to the settings of the Ironwolf disk. This is done by running some Seagate command line utilities (SeaChest), explained by @TDD here. The changes we will be making are:

Disable EPC
Disable Low Current Spinup (not confirmed if this is required)

The Seagate utilities refer to disks slightly differently than Unraid does, but there is a way to translate one to the other, explained by @SimonF here.

I have carried out these steps and it looks to have solved the issue for me. I've therefore listed them below in case it helps anybody. It is nowhere near as long-winded as it looks - I've just listed literally every step. Note that I am not really a Linux person, so getting the Seagate utilities onto Unraid might look like a right kludge. If there's a better way, let me know.

All work is carried out from a Windows machine. I use Notepad to prepare commands beforehand, so I can construct each command first, then copy and paste it into the terminal. If you have the option, make these changes before upgrading Unraid...

Part 1: Identify the disk(s) you need to work on

EDIT: See the end of this part for an alternate method of identifying the disks

1. Go down your drives list on the Unraid Main tab. Note down the part in brackets next to any relevant disk (e.g. sdg, sdaa, sdac, sdad)
2. Open up a Terminal window from the header bar in Unraid
3. Type the following command and press enter. This will give you a list of all drives with their sg and sd references...

sg_map

4. Note down the sg reference of each drive you identified in step 1 (e.g. sdg=sg6, sdaa=sg26, etc.) - see the short loop sketched at the end of this part if you have several to look up

There is a second way to get the disk references which you may prefer. It uses SeaChest, so needs carrying out after Part 2 (below). @TDD explains it in this post here...
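If you have a few drives to look up, a quick way to pull just those lines out of the sg_map output is a small loop like this (a minimal sketch - sdg, sdaa, sdac and sdad are just the example names from step 1, so substitute your own):

# print the sg reference alongside each sd device we care about
for d in sdg sdaa sdac sdad; do
    sg_map | grep -w "/dev/$d"
done

Each matching line pairs a /dev/sgYY reference with its /dev/sdXX device; the sg reference is the one the SeaChest commands in Part 3 need.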
Part 2: Get SeaChest onto Unraid

NOTE: I copied SeaChest onto my flash drive, and then into the tmp folder. There's probably a better way of doing this (a rough command-line sketch of this part and Part 3 is included after Part 3 below).

EDIT: Since writing this, the zip file to download has changed its structure; I've updated the instructions to match the new download.

5. Open your flash drive from Windows (e.g. \\tower\flash), create a folder called "seachest" and enter it
6. Go to https://www.seagate.com/gb/en/support/software/seachest/ and download "SeaChest Utilities"
7. Open the downloaded zip file and navigate to Linux\Lin64\ubuntu-20.04_x86_64\ (when this guide was written, it was just "Linux\Lin64"; the naming of the ubuntu folder may change in future downloads)
8. Copy all files from there to the seachest folder on your flash drive

Now we need to move the seachest folder to /tmp. I used mc, but many will just copy it over with a command. The rest of this part takes place in the Terminal window opened in step 2...

9. Open Midnight Commander by typing "mc"
10. Using arrows and enter, click the ".." entry on the left side
11. Using arrows and enter, click the "/boot" folder
12. Tab to switch to the right panel, then use arrows and enter to click the ".."
13. Using arrows and enter, click the "/tmp" folder
14. Tab back to the left panel and press F6 and enter to move the seachest folder into tmp
15. F10 to exit Midnight Commander

Finally, we need to change to the seachest folder in /tmp and make these utilities executable...

16. Enter the following commands...

cd /tmp/seachest

...to change to your new seachest folder, and...

chmod +x SeaChest_*

...to make the files executable.

Part 3: Making the changes to your Seagate drive(s)

EDIT: When this guide was written, there was what looked like a version number at the end of each file name, represented by XXXX below. Now each file has "_x86_64-linux-gnu" instead, so where it mentions XXXX you need to replace it with that.

This is all done in the Terminal window. The commands here have two things that may be different on your setup - the suffix of the SeaChest files you downloaded (XXXX) and the drive you're working on (YY). This is where Notepad comes in handy - plan out all the required commands first.

17. Get the info about a drive...

./SeaChest_Info_XXXX -d /dev/sgYY -i

...in my case (as an example) "./SeaChest_Info_150_11923_64 -d /dev/sg6 -i". You should notice that EPC has "enabled" next to it and that Low Current Spinup is enabled.

18. Disable EPC...

./SeaChest_PowerControl_XXXX -d /dev/sgYY --EPCfeature disable

...for example "./SeaChest_PowerControl_1100_11923_64 -d /dev/sg6 --EPCfeature disable"

19. Repeat step 17 to confirm EPC is now disabled
20. Repeat steps 17-19 for any other disks you need to set

21. Disable Low Current Spinup...

./SeaChest_Configure_XXXX -d /dev/sgYY --lowCurrentSpinup disable

...for example "./SeaChest_Configure_1170_11923_64 -d /dev/sg6 --lowCurrentSpinup disable". It is not possible to check this without rebooting, but if you do not get any errors it's likely to be fine.

22. Repeat step 21 for any other disks

You should now be good to go. Once this was done (it took about 15 minutes) I rebooted and then upgraded from 6.8.3 to 6.9.1. It's been fine since, whereas before I would get a drive dropping off every few days. Make sure you have a full backup of 6.8.3, and don't make too many system changes for a while in case you need to roll back.

SeaChest will be removed when you reboot the system (as it's in /tmp). If you want to retain it on your boot drive, copy it to /tmp instead of moving it. You will need to copy it off /boot to run it each time, as you need to make it executable.

It's completely fine if you want to hold off for an official fix. I'm not so sure it will be a software fix though, since it affects these specific drives only. It may be a firmware update for the drive, which may just make similar changes to the above.
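For reference, the whole of Parts 2 and 3 can also be done from the terminal without Midnight Commander. This is only a rough sketch, assuming the seachest folder was copied to the root of the flash drive (so it appears as /boot/seachest); XXXX again stands for whatever suffix your SeaChest files have, and sg6 and sg7 are example drive references - use your own from Part 1:

# copy SeaChest from the flash drive to /tmp and make the tools executable
cp -r /boot/seachest /tmp/
cd /tmp/seachest
chmod +x SeaChest_*

# for each affected drive: show current settings, then disable EPC and Low Current Spinup
for dev in /dev/sg6 /dev/sg7; do
    ./SeaChest_Info_XXXX -d "$dev" -i
    ./SeaChest_PowerControl_XXXX -d "$dev" --EPCfeature disable
    ./SeaChest_Configure_XXXX -d "$dev" --lowCurrentSpinup disable
done

Run the Info command again for each drive afterwards to confirm EPC now shows as disabled, then reboot as described above.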
As an afterthought, looking through these Seagate utilities, it might be possible to write a user script to completely automate this (the command-line sketch after Part 3 above would be a starting point). Another alternative is to boot into a Linux USB and run it outside of Unraid (it would be more difficult to identify the drives, though).

***********************************************

TL;DR - Just the Steps

I've had to do this several times myself and wanted somewhere to just get all the commands I'll need...

Get all /dev/sgYY numbers from the list (compared to the dashboard disk assignments)...

sg_map

Download SeaChest from https://www.seagate.com/gb/en/support/software/seachest/

Extract and copy the seachest folder to /tmp

Change to seachest and make the files executable...

cd /tmp/seachest
chmod +x SeaChest_*

For each drive you need to change (XXXX is the suffix on the seachest files, YY is the number obtained above)...

./SeaChest_Info_XXXX -d /dev/sgYY -i
./SeaChest_PowerControl_XXXX -d /dev/sgYY --EPCfeature disable
./SeaChest_Configure_XXXX -d /dev/sgYY --lowCurrentSpinup disable

Repeat the first info command at the end to confirm EPC is disabled. Cold boot to make sure all is sorted.

Edited October 9, 2023 by Cessquill
Tweaked title to be more specific, tweaked text to reflect no issues for two months, tweaked to clarify entering commands; added "./" to the start of commands as it's now required; added TL;DR summary section at the end 6 14 Quote
JorgeB Posted March 11, 2021 Posted March 11, 2021 Nice work, if you don't object I was thinking of moving this to the general guides section or it will likely drop from the 1st page here. Quote
Cessquill Posted March 11, 2021 Author Posted March 11, 2021 Just now, JorgeB said: Nice work, if you don't object I was thinking of moving this to the general guides section or it will likely drop from the 1st page here. Of course, yes. Partly seeking confirmation I hadn't done something seriously wrong - little bit out of my depth! Quote
TDD Posted March 11, 2021 Posted March 11, 2021 Thank you for the work bringing this together. There is an easy way to target just the disks you want to modify:

SeaChest_PowerControl_1100_11923_64 -s --onlySeagate

I believe most of the tools actually allow this -s switch. See screenshot. This allows you to skip the 'map' part and makes this easier :-)! Kev. Quote
Cessquill Posted March 11, 2021 Author Posted March 11, 2021 41 minutes ago, TDD said: Thank you for the work bringing this together. There is an easy way to just target the disks you want to modify. SeaChest_PowerControl_1100_11923_64 -s --onlySeagate I believe most tools actually allow this -s switch. See screenshot. This allows you to skip the 'map' part and make this easier :-)! Kev. Thanks for that - I did see onlySeagate when trawling through the text doc manuals; forgot to go back to it (before I'd got SC working). Quote
RockDawg Posted March 13, 2021 Posted March 13, 2021 I am stuck on one thing. When I unzip the SeaChestUtilities.zip file and go to /Linux/Lin64/, there are 3 folders in there, no files. The folders are centos-7_aarch64, centos-7_x86_64 and ubuntu-20.04_x86_64. Which do I want? Quote
Cessquill Posted March 13, 2021 Author Posted March 13, 2021 13 minutes ago, RockDawg said: I am stuck on one thing. When I unzip the SeaChestUtilities.zip file and go to /Linux/Lin64/, there are 3 folders in there, no files. The folders are centos-7_aarch64, centos-7_x86_64 and ubuntu-20.04_x86_64. Which do I want? That's changed since I did it last week. I'm just starting to test, but @TDD or @JorgeB may be more help here Quote
RockDawg Posted March 13, 2021 Posted March 13, 2021 I tried the ubuntu files just for the info command and it worked. So I am going to try continuing with those. Quote
Cessquill Posted March 13, 2021 Author Posted March 13, 2021 Just now, RockDawg said: I tried the ubuntu files just for the info command and it worked. So I am going to try continuing with those. I'd have thought centos, but not being a Linux guy I'm not sure (or how much difference it makes). I'll update the post when it's clear. Quote
RockDawg Posted March 13, 2021 Posted March 13, 2021 (edited) I'm not a Linux guy either. At all. I just figured if it wasn't the right one it would throw an error. It seemed to work but the drive is still disabled after a reboot. I assume I still have to unassign, reassign and rebuild the drive? And these changes will merely keep it from going off line again? Edited March 13, 2021 by RockDawg Quote
Cessquill Posted March 13, 2021 Author Posted March 13, 2021 Just now, RockDawg said: I assume I still have to unassign, reassign and rebuild the drive? And these changes will merely keep it from going off line again? Yes Quote
RockDawg Posted March 13, 2021 Posted March 13, 2021 (edited) Anyone know why the drive has to be rebuilt? Did the data get corrupted? So people with more than one (or 2 if they had dual parity) that went out at the same time lost data? That's pretty scary that something like that could happen just by upgrading Unraid versions! Edited March 13, 2021 by RockDawg Quote
TDD Posted March 14, 2021 Posted March 14, 2021 12 hours ago, Cessquill said: That's changed since I did it last week. I'm just starting to test, but @TDD or @JorgeB may be more help here Linux guy here. Use the Ubuntu ones. If they don't work for unknown reasons, I have an archive of the older tool set. Kev. 1 Quote
JorgeB Posted March 14, 2021 Posted March 14, 2021 13 hours ago, RockDawg said: So people with more than one (or 2 if they had dual parity) that went out at the same time lost data? Unraid only disables as many disks as there are parity devices. 1 Quote
Neejrow Posted March 14, 2021 Posted March 14, 2021 I just want to say thank you very much for this guide. Made it very simple to fix. Hopefully this resolves my issue that I just ran into today! 1 Quote
Cessquill Posted March 14, 2021 Author Posted March 14, 2021 Updated original post to reflect new structure of SeaChest Utilities zip file 3 Quote
Vitor Ventura Posted March 24, 2021 Posted March 24, 2021 Hello there. Does your server just randomly crash, without any info in the log? No pattern - it simply crashes? I have that, and I have 1 LSI controller and 1 ST8000VN004. On the Unraid RC version it doesn't happen, only on the stable versions... Quote
JorgeB Posted March 25, 2021 Posted March 25, 2021 11 hours ago, Vitor Ventura said: Does your server just randomly crash? Likely unrelated to this issue. Quote
Danny N Posted March 29, 2021 Posted March 29, 2021

Hi, so this is probably not related to this issue, but last week I made major changes to my system in order to add an SSD cache and a GPU for a VM. For this I added 2 LSI 9207 cards via an Asus Hyper M.2 expander (due to only having 2 PCIe slots with 4 lanes or more, one of which was already in use). Anyway, it's a little complex but seems to work(ish); however, I'm having problems with the LSI cards dropping out and then crashing. This is using 6x ST16000NM001G with 4 data drives and dual parity, and this issue has so far caused 4 disks to become disabled: data 2 and 3 on the first parity check after the changes, at around 5% completion. The second attempt worked and data 2 and 3 rebuilt successfully. At this point I made a full backup and ran another parity check to confirm it was running OK, and it crashed at approx 10% with parity 1 disabled. I then swapped the controllers around so that controller 2 had the HDDs connected and controller 1 had the SSDs, ran a parity rebuild for the parity 1 disk, and at around 4% it crashed again with data 3 disabled. Unfortunately I'd only enabled logging after the second crash and didn't realise the logs were only saved in RAM, so I have no logs.

I'm aware that the ST16000NM001G is not an Ironwolf, but I've read that they are very similar to the 16TB Ironwolf drives, so they may be affected. I originally thought this was due to bent pins on the CPU, which happened during this rebuild when I dropped the CPU (after it stuck to the underside of the cooler) and crushed it against the case while trying to catch it. This completely flattened 8 pins, but according to the diagram on WikiChip these are for memory channel A and GND (pin 2 from the corner broke, but this is only power). The CPU ran happily during a stress test and is currently 2 hours into a memtest with 0 errors, so if this isn't the issue then I can only assume it's the signal integrity between the CPU and the 9207s, which I'll test by dropping the link speed down to Gen 2 and hoping this doesn't affect my 10Gb NIC.

Full system spec before:
DATA: ST16000NM001G x6
Cache: none
VM data: Samsung 860 1TB via Unassigned Drives
Docker data: SanDisk 3D Ultra 960GB via Unassigned Drives
(these were connected via mobo ports and via a cheap SATA card I had lying around in PCIEX1_1)
GPU: 1660 Super for Plex (in PCIEX16_1)
CPU: 3950X
Mobo: Asus B550-M
RAM: 64GB Corsair Vengeance (non-ECC) @ 3600MHz
PSU: Corsair 850W RMx
Case: Fractal Design Node 804
with APC UPS 700VA

Damaged pin details: according to WikiChip (link to pic) the damaged pins were C39 - K39 (C39 - K38 fully flattened), and AP1 to AU1 were slightly bent. After repair, B39 fell off as it was not only flattened but had actually folded in half, and A39, C39, E39 and J39 still had a thin section on the top part of the pin right where it was bent. The system booted and passed a CPU stress test etc. (I didn't consider doing a memtest at this time).

Full system spec after:
DATA: ST16000NM001G x6
Cache: 2x MX500 2TB
VM data: 2x Samsung 860 1TB via pools
Docker data: SanDisk 3D Ultra 960GB and Samsung 860 1TB via pools
(these are via 2x LSI 9207 in PCIEX16_1 via Hyper M.2 slots 2 and 3, with the HDDs on one card and the SSDs on the other)
NIC: Asus XG-C100C (in PCIEX16_1 via Hyper M.2 slot 4)
GPU: 1660 Super for Plex (in PCIEX16_1 via Hyper M.2 slot 1)
GPU2: RX 570 (intended for a Win 10 VM, currently unused, in PCIEX16_2)
CPU: 3950X (now with bent and missing pins)
RAM: 64GB Corsair Vengeance (non-ECC) @ 3600MHz
Mobo: Asus B550-M
PSU: Corsair 850W RMx
with APC UPS 700VA
Case: Fractal Design Node 804 (yeah, it's a very tight build)

I'll update if I find the issue (or get logs of it now I have those set up), but there's only a slim chance it's related (still got at least 22 hours of memtest to go, though). Sorry for the long comment, but more detail hopefully helps. Quote
Danny N Posted March 29, 2021 Posted March 29, 2021 (edited) Edit to the above: I do have this log from after it finished the first rebuild and crashed, disabling the parity disk, though it probably doesn't help much. syslog Edited March 29, 2021 by Danny N attached file was in the middle of the text Quote
trurl Posted March 29, 2021 Posted March 29, 2021 Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread Quote
Danny N Posted March 29, 2021 Posted March 29, 2021 Thanks - currently doing a memtest for a day, so I'll do this tomorrow once I get back into Unraid. 3 minutes ago, trurl said: Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread Quote
Danny N Posted March 30, 2021 Posted March 30, 2021 (edited) 23 hours ago, trurl said: Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread This is only about a minute after bootup; hopefully it helps. For now I'm going to try dropping the PCIe link speed down to Gen 2 to see if it's the ribbon cables (the shielded ones) for the HBA cards. dnas-diagnostics-20210330-1804.zip EDIT: Adding syslog dnas-syslog-20210330-1711.zip Edited March 30, 2021 by Danny N adding syslog Quote
Danny N Posted April 1, 2021 Posted April 1, 2021 (edited) On 3/30/2021 at 6:06 PM, Danny N said: This is only about a minute after bootup; hopefully it helps. For now I'm going to try dropping the PCIe link speed down to Gen 2 to see if it's the ribbon cables (the shielded ones) for the HBA cards. dnas-diagnostics-20210330-1804.zip EDIT: Adding syslog dnas-syslog-20210330-1711.zip OK, it seems to have been the PCIe link speed - after dropping to Gen 2 I've now had a successful parity rebuild on 2 drives and then a full parity check without error. This is the first time it has completed a parity check successfully, and it had never finished 2 operations back to back before either, so I'm going to say my problem had nothing to do with this issue. EDIT: thanks for the help Edited April 1, 2021 by Danny N see edit tag Quote
optiman Posted April 8, 2021 Posted April 8, 2021 Thank you all for this thread - very helpful. I'm still on 6.8.3 and I have several Seagate ST8000NM0055 (standard 512E) drives on firmware SN04, which are listed as Enterprise Capacity. I just checked and Seagate has a firmware update for this model, SN05. I also have several Seagate ST12000NE0008 Ironwolf Pro drives with firmware EN01; no firmware updates are available. My controller is an LSI 9305-24i x8, BIOS P14 and firmware P16_IT. I've had zero issues, uptime 329 days. I was thinking of using the Seagate-provided bootable Linux USB flash builder, booting to that, and running the commands outside of Unraid. Given I only have Seagate drives, I will need to do them all. Has anyone tried this with success? Quote