6.9.x, LSI Controllers & Ironwolf Disks Disabling - Summary & Fix


Cessquill


NOTE: There's a TL;DR section at the end of this post with required steps

People with specific Seagate Ironwolf disks on LSI controllers have been having issues with Unraid 6.9.0 and 6.9.1.  Typically, when spinning up, a drive could drop off the system.  Getting it back online would require checking, unassigning, reassigning and rebuilding its contents (about 24 hours).  It happened to me three times in a week, across two of my four affected drives.

 

The drive in question is the 8TB Ironwolf ST8000VN004, although the 10TB model has also been mentioned, so it may affect several models.

 

There have been various comments and suggestions across the threads, and it appears that there is a workaround.  The workaround is reversible, so if an official fix comes along you can revert your settings.  This thread is here to consolidate the great advice given by @TDD, @SimonF, @JorgeB and others, to hopefully make it easier for people to follow.

 

This thread is also here to hopefully provide a central place for those with the same hardware combo to track developments.

 

NOTE: Carry out these steps at your own risk.  Whilst I will list each step I did, and it's all possible within Unraid, it's your data.  Read through, and only carry anything out if you feel comfortable.  I'm far from an expert - I'm just consolidating valuable information that was scattered across several threads - so if this is doing more harm than good, or is repeated elsewhere, then close this off.

 

The solution involves making changes to the settings of the Ironwolf disk.  This is done by running some Seagate command-line utilities (SeaChest), as explained by @TDD here.

The changes we will be making are:

  • Disable EPC (Extended Power Conditions - a power-management feature)
  • Disable Low Current Spinup (not confirmed whether this is required)

 

The Seagate utilities refer to disks slightly differently from Unraid, but there is a way to translate one to the other, explained by @SimonF here.

 

I have carried out these steps and it looks to have solved the issue for me.  I've therefore listed them below in case it helps anybody.  It is nowhere near as long-winded as it looks - I've just listed literally every step.

 

Note that I am not really a Linux person, so getting the Seagate utilities onto Unraid might look like a right kludge.  If there's a better way, let me know.  All work is carried out on a Windows machine.  I use Notepad to prepare commands beforehand, so I can construct each command first, then copy and paste it into the terminal.

 

If you have the option, make these changes before upgrading Unraid...

 

Part 1: Identify the disk(s) you need to work on

EDIT: See the end of this part for an alternate method of identifying the disks
1. Go down your drives list on the Unraid main tab.  Note down the part in brackets next to any relevant disk (eg, sdg, sdaa, sdac, sdad)
2. Open up a Terminal window from the header bar in Unraid
3. Type the following command and press enter.  This will give you a list of all drives with their sg and sd references.

sg_map

4. Note down the sg reference of each drive you identified in step 1 (eg, sdg=sg6, sdaa=sg26, etc.)
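
The output is a simple two-column mapping of sg device to sd device, and will look something like this (your device names will differ):

/dev/sg0  /dev/sda
/dev/sg6  /dev/sdg
/dev/sg26  /dev/sdaa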

 


 

There is a second way to get the disk references, which you may prefer.  It uses SeaChest, so needs carrying out after Part 2 (below).  @TDD explains it in this post here...

 

Part 2: Get SeaChest onto Unraid
NOTE: I copied SeaChest onto my flash drive, and then into the tmp folder.  There's probably a better way of doing this.

EDIT: Since writing this the zip file to download has changed its structure, I've updated the instructions to match the new download.
5. Open your flash drive from Windows (eg \\tower\flash), create a folder called "seachest" and enter it
6. Go to https://www.seagate.com/gb/en/support/software/seachest/ and download "SeaChest Utilities"
7. Open the downloaded zip file and navigate to Linux\Lin64\ubuntu-20.04_x86_64\ (when this guide was written, it was just "Linux\Lin64".  The naming of the ubuntu folder may change in future downloads) 
8. Copy all files from there to the seachest folder on your flash drive

Now we need to move the seachest folder to /tmp.  I used mc, but many will just copy it over with a command (there's a one-liner after step 15 if you prefer that).  The rest of this part takes place in the Terminal window opened in step 2...

9. Open Midnight Commander by typing "mc"
10. Using arrows and enter, click the ".." entry on the left side
11. Using arrows and enter, click the "/boot" folder
12. Tab to switch to the right panel, use arrows and enter to click the ".."
13. Using arrows and enter, click the "/tmp" folder
14. Tab back to the left panel and press F6 and enter to move the seachest folder into tmp
15. F10 to exit Midnight Commander
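
If you'd rather skip Midnight Commander, steps 9-15 can be replaced with a single copy command, which also leaves the original on the flash drive for next time:

cp -r /boot/seachest /tmp/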

 

Finally, we need to change to the seachest folder on /tmp and make these utilities executable...
16. Enter the following commands...

cd /tmp/seachest

...to change to your new seachest folder, and...

chmod +x SeaChest_*

...to make the files executable.

 

Part 3: Making the changes to your Seagate drive(s)

EDIT: When this guide was written, there was what looked like a version number at the end of each file name, represented by XXXX below.  Now each file ends with "_x86_64-linux-gnu", so where it mentions XXXX you need to replace it with that.

 

This is all done in the Terminal window.  The commands here have two things that may be different on your setup - the version of SeaChest downloaded (XXXX) and the drive you're working on (YY).  This is where Notepad comes in handy - plan out all the required commands first.

 

17. Get the info about a drive...

./SeaChest_Info_XXXX -d /dev/sgYY -i

...in my case (as an example) "./SeaChest_Info_150_11923_64 -d /dev/sg6 -i"

You should notice that EPC has "enabled" next to it, and that Low Current Spinup is also enabled.
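
The relevant part of the info output will look something like this (the exact wording may differ between SeaChest versions, so treat this as a rough guide):

Features Supported:
    EPC [Enabled]
    Low Current Spinup [Enabled]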

 

18. Disable EPC...

./SeaChest_PowerControl_XXXX -d /dev/sgYY --EPCfeature disable

...for example "./SeaChest_PowerControl_1100_11923_64 -d /dev/sg6 --EPCfeature disable"


19. Repeat step 17 to confirm EPC is now disabled
20. Repeat steps 17-19 for any other disks you need to set

 

21. Disable Low Current Spinup...

./SeaChest_Configure_XXXX -d /dev/sgYY --lowCurrentSpinup disable

...for example "./SeaChest_Configure_1170_11923_64 -d /dev/sg6 --lowCurrentSpinup disable"
It is not possible to check this without rebooting, but if you do not get any errors it's likely to be fine.
22. Repeat step 21 for any other disks

 

You should now be good to go.  Once this was done (it took about 15 minutes) I rebooted and then upgraded from 6.8.3 to 6.9.1.  It's been fine since, whereas before I would get a drive dropping off every few days.  Make sure you have a full backup of 6.8.3, and don't make too many system changes for a while in case you need to roll back.

 

SeaChest will be removed when you reboot the system (as it's in /tmp).  If you want to retain it on your boot drive, copy it to /tmp instead of moving it.  You will need to copy it off /boot each time you want to run it, as files on the flash drive can't be made executable.
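
If you want it available straight after every boot without doing this manually, one option (an assumption on my part - I haven't tested it) is to add the copy and chmod to /boot/config/go, which Unraid runs at startup:

# copy SeaChest off the flash drive and make it executable at boot
cp -r /boot/seachest /tmp/
chmod +x /tmp/seachest/SeaChest_*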

 

It's completely fine if you want to hold off for an official fix.  I'm not so sure it will be a software fix though, since it only affects these specific drives.  It may be a firmware update for the drive, which may just make similar changes to the above.

 

As an afterthought, looking through these Seagate utilities, it might be possible to write a user script to completely automate this.  Another alternative is to boot into a Linux USB stick and run it outside of Unraid (though it would be more difficult to identify the drives).
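
If anyone wants to experiment, below is a rough sketch of what such a script might look like.  It assumes the binaries are in /tmp/seachest with the "_x86_64-linux-gnu" suffix, and that SeaChest_Info prints the model number in its output - treat it as a starting point, not a tested script:

#!/bin/bash
# Sketch only: disable EPC and Low Current Spinup on every ST8000VN004 found.
# Adjust the path, file suffix and model number to match your setup.
SC=/tmp/seachest
for dev in /dev/sg*; do
  if "$SC/SeaChest_Info_x86_64-linux-gnu" -d "$dev" -i 2>/dev/null | grep -q "ST8000VN004"; then
    echo "Updating $dev"
    "$SC/SeaChest_PowerControl_x86_64-linux-gnu" -d "$dev" --EPCfeature disable
    "$SC/SeaChest_Configure_x86_64-linux-gnu" -d "$dev" --lowCurrentSpinup disable
  fi
done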

 

 

***********************************************

TL;DR - Just the Steps

I've had to do this several times myself and wanted somewhere to just get all the commands I'll need...

 

Get all /dev/sgYY numbers from the list (compare against the dashboard disk assignments)...

sg_map

 

Download seachest from https://www.seagate.com/gb/en/support/software/seachest/

Extract and copy seachest folder to /tmp

Change to seachest and make files executable...

cd /tmp/seachest
chmod +x SeaChest_*

 

For each drive you need to change (XXXX is the suffix of the seachest files, YY is the number obtained above)...

./SeaChest_Info_XXXX -d /dev/sgYY -i
./SeaChest_PowerControl_XXXX -d /dev/sgYY --EPCfeature disable
./SeaChest_Configure_XXXX -d /dev/sgYY --lowCurrentSpinup disable
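
As a concrete example, with the current "_x86_64-linux-gnu" file suffix and a drive at /dev/sg6, those three commands would be:

./SeaChest_Info_x86_64-linux-gnu -d /dev/sg6 -i
./SeaChest_PowerControl_x86_64-linux-gnu -d /dev/sg6 --EPCfeature disable
./SeaChest_Configure_x86_64-linux-gnu -d /dev/sg6 --lowCurrentSpinup disable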

Repeat the first info command at the end to confirm EPC is disabled.  Cold boot to make sure all is sorted.

Edited by Cessquill
Tweaked title to be more specific, tweaked text to reflect no issues for two months, tweaked to clarify entering command; added "./" to start of command as it's now required; added TLDR summary section at the end

Thank you for the work bringing this together.  There is an easy way to just target the disks you want to modify.

 

SeaChest_PowerControl_1100_11923_64 -s --onlySeagate

 

I believe most tools actually allow this -s switch.  See screenshot.  This allows you to skip the 'map' part and make this easier :-)!

 

Kev.

 

[Screenshot: output of the -s scan]

41 minutes ago, TDD said:

Thank you for the work bringing this together.  There is an easy way to just target the disks you want to modify.

 

SeaChest_PowerControl_1100_11923_64 -s --onlySeagate

 

I believe most tools actually allow this -s switch.  See screenshot.  This allows you to skip the 'map' part and make this easier :-)!

 

Kev.

 

[Screenshot: output of the -s scan]

Thanks for that - I did see onlySeagate when trawling through the text doc manuals; forgot to go back to it (before I'd got SC working).

13 minutes ago, RockDawg said:

I am stuck on one thing.  When I unzip the SeaChestUtilities.zip file and go to /Linux/Lin64/, there are 3 folders in there, no files.  The folders are centos-7_aarch64, centos-7_x86_64 and ubuntu-20.04_x86_64.  Which do I want?

That's changed since I did it last week.  I'm just starting to test, but @TDD or @JorgeB may be more help here.


I'm not a Linux guy either.  At all.  I just figured if it wasn't the right one it would throw an error.  It seemed to work, but the drive is still disabled after a reboot.  I assume I still have to unassign, reassign and rebuild the drive?  And these changes will merely keep it from going offline again?

Edited by RockDawg

Anyone know why the drive has to be rebuilt?  Did the data get corrupted?  So people with more than one drive (or two, if they had dual parity) that went out at the same time lost data?

 

That's pretty scary that something like that could happen just by upgrading Unraid versions!

Edited by RockDawg
  • 2 weeks later...

Hi - probably not related to this issue, but last week I made major changes to my system in order to add an SSD cache and a GPU for a VM.  For this I added 2 LSI 9207 cards via an ASUS Hyper M.2 expander (due to only having 2 PCIe slots with 4 lanes or more, one of which was already in use).  Anyway, it's a little complex but seems to work(ish), except I'm having problems with the LSI cards either dropping out or crashing.  This is with 6x ST16000NM001G - 4 data drives and dual parity - and the issue has so far caused 4 disks to become disabled: data 2 and 3 on the first parity check after the changes, at around 5% completion.  The second attempt worked and data 2 and 3 rebuilt successfully.  At that point I made a full backup and ran another parity check to confirm it was running OK, and it crashed at approx 10% with parity 1 disabled.  I then swapped the controllers around so that controller 2 had the HDDs connected and controller 1 had the SSDs, ran a parity rebuild for the parity 1 disk, and at around 4% it crashed again with data 3 disabled.  Unfortunately I'd only enabled logging on the second crash, and didn't realise the logs were only saved in RAM, so I have no logs.

I'm aware that the ST16000NM001G is not an Ironwolf, but I've read that these are very similar to Seagate's 16TB Ironwolf drives, so they may be affected.  I originally thought this was due to bent pins on the CPU, which happened during this rebuild when I dropped the CPU (it had stuck to the underside of the cooler) and crushed it against the case while trying to catch it.  This completely flattened 8 pins, but according to the diagram on WikiChip these are for memory channel A and GND (pin 2 from the corner broke, but this is only power).  The CPU ran happily during a stress test and is currently 2 hours into a memtest with 0 errors.  So if that isn't the issue, I can only assume it's the signal integrity between the CPU and the 9207s, which I'll test by dropping the link speed down to Gen 2 and hoping that doesn't affect my 10Gb NIC.

Full system spec before:
DATA: ST16000NM001G x6
cache: none
VM data: Samsung 860 1TB via Unassigned Devices
Docker data: SanDisk 3D Ultra 960GB via Unassigned Devices
(these were connected via mobo ports and a cheap SATA card I had lying around in pciex1_1)
GPU: 1660 Super for Plex (in pciex16_1)
CPU: 3950X

mobo: ASUS B550-M
RAM: 64GB Corsair Vengeance (non-ECC) @ 3600MHz
PSU: Corsair 850W RMx

case: Fractal Design Node 804
with APC UPS 700VA

Damaged pin details:
according to WikiChip (see 1600px-OPGA-1331_pinmap.svg.png)

The damaged pins were C39 - K39 (C39 - K38 fully flattened), and AP1 to AU1 were slightly bent.  After the repair, B39 fell off, as it was not only flattened but had actually folded in half :( and A39, C39, E39 and J39 still had a thin section at the top of the pin, right where it was bent.  The system booted and passed a CPU stress test etc. (didn't consider doing a memtest at this time).


Full system spec after:
DATA: ST16000NM001G x6
cache: 2x MX500 2TB
VM data: 2x Samsung 860 1TB via pools
Docker data: SanDisk 3D Ultra 960GB and Samsung 860 1TB via pools

(these are connected via 2x LSI 9207 cards in pciex16_1 via Hyper M.2 slots 2 and 3, with the HDDs on one card and the SSDs on the other)

NIC: ASUS XG-C100C (in pciex16_1 via Hyper M.2 slot 4)
GPU: 1660 Super for Plex (in pciex16_1 via Hyper M.2 slot 1)
GPU2: RX 570 (intended for a Win 10 VM, currently unused, in pciex16_2)
CPU: 3950X (now with bent and missing pins)
RAM: 64GB Corsair Vengeance (non-ECC) @ 3600MHz

mobo: ASUS B550-M
PSU: Corsair 850W RMx
with APC UPS 700VA
case: Fractal Design Node 804 (yeah, it's a very tight build)

 

I'll update if I find the issue (or get logs of it, now I have those set up), but there's a slim chance it's related (still got at least 22 hours of memtest to go, though).

Sorry for the long comment, but more detail hopefully helps.

23 hours ago, trurl said:

Go to Tools  - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread 

This is only about a minute after bootup - hopefully it helps.  For now I'm going to try dropping the PCIe link speed down to Gen 2 to see if it's the ribbon cables (the shielded ones) for the HBA cards.

dnas-diagnostics-20210330-1804.zip

EDIT: Adding syslog

dnas-syslog-20210330-1711.zip

Edited by Danny N
adding syslog
On 3/30/2021 at 6:06 PM, Danny N said:

This is only about a minute after bootup - hopefully it helps.  For now I'm going to try dropping the PCIe link speed down to Gen 2 to see if it's the ribbon cables (the shielded ones) for the HBA cards.

dnas-diagnostics-20210330-1804.zip 127.87 kB · 0 downloads

EDIT: Adding syslog

dnas-syslog-20210330-1711.zip 25.79 kB · 0 downloads

OK, it seems to be the PCIe link speed.  After dropping to Gen 2 I've now had a successful parity rebuild on 2 drives, and then a full parity check without error.  This is the first time it's completed a parity check successfully, and it had never finished 2 operations back to back before either, so I'm going to say this has nothing to do with the issue in this thread.
EDIT: thanks for the help :)

Edited by Danny N
see edit tag

Thank you all for this thread, very helpful.  I'm still on 6.8.3 and I have several Seagate ST8000NM0055 (standard 512e) drives with firmware SN04, which are listed as Enterprise Capacity.  I just checked and Seagate has a firmware update for this model, SN05.  I also have several Seagate ST12000NE0008 Ironwolf Pro drives with firmware EN01, with no firmware updates available.  My controller is an LSI 9305-24i x8, BIOS P14 and firmware P16_IT.  I've had zero issues, uptime 329 days.

 

I was thinking of using the Seagate-provided bootable Linux USB flash builder, booting to that, and running the commands outside of Unraid.  Given I only have Seagate drives, I will need to do them all.  Has anyone tried this with success?

