6.9.x, LSI Controllers & Ironwolf Disks Disabling - Summary & Fix


Cessquill

Thanks guys! My issue is that the instructions to disable Low Current Spinup did not work for me on any of my drives. Even after a cold boot, they show the feature as still enabled. Weird, because after issuing the command it says it successfully disabled the feature and a reboot is required, yet the feature is still enabled.

 

I guess I will just have to take a chance and upgrade. 

 

So what should I watch for?  Errors in the syslog, disk log, SMART log?  Will the main screen show the errors counting up?


 

And if I do see errors, what should I do? Roll back ASAP? I don't want to lose data.

 

Thanks!

Link to comment

I apologize if this has been cited, but more often than not it is necessary to completely power down and ensure the drives have zero power then start up.  Just like updating your motherboard BIOS.  Other than that, AFAIK the issue is only with the mentioned models.  The fix has held for me since day one of application.

 

Kev.

Link to comment
  • 2 weeks later...

Glad I found this thread. LSI + 2 x 8 TB IronWolf: I had errors and got both data and parity disabled, but after stopping and starting the array the errors were gone. I ran an extended SMART test on all disks with no problems, then one day later, in the middle of the night, the error appeared again. It looks like once rclone started to pull data from gdrive (scheduled for midnight), the disk had to spin up and errored.

Anyway, I used SeaChest to disable both EPC and Low Current Spinup, power cycled the server, and checked again with SeaChest: both are still disabled. Rebuilding now, let's see what happens. Brand new disks, brand new server, and once a week I have issues. And to think that my old MicroServer Gen8 running XPEnology was rock solid.
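For anyone who wants to double-check the state after a power cycle, here is a minimal sketch that pulls the Low Current Spinup state out of SeaChest_Info output. It assumes the "Low Current Spinup: <state>" line format that appears in the info dumps posted in this thread. The function name low_current_spinup_state is made up for illustration, and the binary name and /dev/sg handle in the comment are just the ones quoted in this thread, so yours will differ.

```shell
#!/bin/sh
# Print the Low Current Spinup state found in SeaChest_Info output on stdin.
# Assumption: the output contains a line like "Low Current Spinup: Disabled",
# as seen in the info dumps posted in this thread.
low_current_spinup_state() {
    # Strip the label and leading whitespace, print only the first match.
    sed -n 's/^[[:space:]]*Low Current Spinup:[[:space:]]*//p' | head -n 1
}

# Example call (binary name and device handle taken from this thread, adjust to yours):
#   SeaChest_Info_150_11923_64 -d /dev/sg4 -i | low_current_spinup_state
```

It reads from stdin, so the info output can be piped straight in or read from a saved file. An empty result means the line was not found in the output.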

 

@TDD and @Cessquill - any issues so far, or is it stable with that fix?

Edited by cpu
Link to comment
On 9/23/2021 at 4:10 AM, Cessquill said:

I don't think you'll have problems.  This only affected 1 or 2 of all the Ironwolf models, none of which you have.

 

I can confirm that ST8000NM0055 drives are most definitely affected by this issue.  This bit me hard when I upgraded to v6.9.2 back in April.  I had to roll back to 6.8.3 to recover from a dual-drive "failure" and inability to rebuild on 6.9.2, and never attempted any of the fixes posted here.  I felt extremely lucky to escape without losing data, and I'm still running 6.8.3.

 

 

On 9/23/2021 at 3:59 PM, optiman said:

I ran a parity check and it just finished without any errors.  syslog looks good.  I would guess that I would already see errors if I was going to have the issue.

 

Thanks!

 

optiman, glad to read this worked for you.  Since it has been a couple weeks, is your system still okay?  I'm starting to feel a little trapped on 6.8.3, so I'll probably have to apply this fix.  Since we both have ST8000NM0055 drives, your results matter most to me.

 

I was hopeful that this was a bug in 6.9.x that would be fixed in 6.10, and that I wouldn't need to do the drive fix.  Came here to see if anyone had tested this on 6.10 without applying these fixes, but no dice.

 

Paul

Edited by Pauven
Link to comment

I disabled EPC and did the upgrade, and I haven't had any issues. I guess EPC was the issue because, as I mentioned in my other post, I wasn't able to disable Low Current Spinup. As long as you disable EPC, you should be good to go.

Based on input from others, it sounds like a future Unraid version will not change this for us. The issue is between LSI and Seagate. We need updated LSI drivers that address this, or updated firmware from Seagate that addresses these issues.

 

For me, I'm moving back to WD drives.  I used those for years without any issues.

Edited by optiman
Link to comment
  • 2 weeks later...

Hi All, 

 

I would like to say thanks, this thread solved my issue.

I have 4 x ST10000VN0004 10 TB HDDs that had SC60 firmware (purchased 3-4 years ago)
and 2 x ST10000VN0008 10 TB HDDs that have SC61 firmware (purchased 6 months ago).

I checked the Seagate website for firmware updates and there was one for the ST10000VN0004 drives, which updates SC60 firmware to SC61.

I was getting tons of read errors on the 4 drives, and occasionally one of them would drop when connected to my LSI card.
When I connected the drives to my motherboard I had no issues for months.

I tried replacing cables, no luck. I have an LSI card and an HP expander; I took the drives off the expander and connected them directly to the LSI card, no luck. In the end I thought it was my SATA backplane, since the drives worked with no problems directly on the motherboard.

I didn't even think that firmware on the drives could have a compatibility issue with LSI cards.

Anyway, happy it's all sorted.

Thanks so much.

 

 

Link to comment
  • 2 weeks later...

Bad news: it started happening again just now. It hit one of the new HDDs and one of the old ones at the same time, and both have the updated firmware. I have connected them back to the motherboard. I'll get it stable again and then try the EPC fix from the first post.

I'll be back in about 15-20 days to let you know how it went.

 

 

Link to comment

Config:  5 8TB Seagate non-Ironwolfs as data drives, 10 TB Seagate Ironwolf ST10000VN0008 as parity drive.


Updated to Unraid 6.9.x from 6.8.3 about a week ago (around Oct 17).

Last night my server flagged the IronWolf parity drive as having failed a SMART test and having too many bad sectors. Then it marked the drive with a red x.
Today I put the IronWolf into a Win10 PC and ran SeaTools, all tests except the long generic. SeaTools found no problem with it. I read around on the forums and found this thread, then downloaded SeaChest and disabled EPC (on a Win10 PC).

After adding the IronWolf back to the array there were SMART problems again, and the drive was marked as failed. I wasn't even able to get it listed as unassigned.

After reverting to 6.8.3 the drive came up as unassigned and I was able to set it up as the parity drive. But when the array came back up to rebuild parity, it again complained about SMART errors and flagged the parity drive with a red x.

Now I don't know what to do. Should I enable EPC again and continue trying with 6.8.3 (which had worked fine before), or should I forget about the IronWolf for now and buy another parity drive?
Attached are the latest diagnostics from when I was back on 6.8.3. I also have diagnostics available from 6.9.x (last stable version).

schiethucken-diagnostics-20211025-1925.zip

Link to comment
Quote

7. Open the downloaded zip file and navigate to Linux\Lin64\ubuntu-20.04_x86_64\ (when this guide was written, it was just "Linux\Lin64".  The naming of the ubuntu folder may change in future downloads) 

Hello!

 

I just added a second parity for the first time ever and of course I ended up with this problem I think.  It's my only ST8000VN004 drive (lots of ST4000 drives with no issues otherwise). 

 

It appears that Seagate has changed their folder structure a bit in the SeaChestUtilities.zip and I'm unsure which route to take. 

 

Right now the folder structure in the zip is: Linux->RAID or Non-RAID->centos-7-x86_64 or centos-7_aarch64

 

Not sure which one I should be grabbing to start this process.

 

Thanks for any help!

Link to comment
23 minutes ago, Legion47 said:

@noja

I used the Non-RAID variant for x86_64 architecture. It's most likely also the one you want.

Let's put it this way: If you needed aarch64, you'd probably know. It's for ARM-based systems, which aren't very common for home server use.
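If you are unsure which one your server needs, `uname -m` on the Unraid console prints the architecture. Below is a small sketch that maps that string to the folder names reported above. The paths are only what was described in this thread (Linux, then Non-RAID, then the centos-7 folder) and may well change in later downloads; the function name seachest_dir_for_arch is made up for illustration.

```shell
#!/bin/sh
# Map an architecture string (as printed by `uname -m`) to the Non-RAID folder
# names reported in this thread for the SeaChestUtilities zip.
# Assumption: the folder layout is exactly as described in the post above.
seachest_dir_for_arch() {
    case "$1" in
        x86_64)  echo "Linux/Non-RAID/centos-7-x86_64" ;;
        aarch64) echo "Linux/Non-RAID/centos-7_aarch64" ;;
        *)       echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Example call:
#   seachest_dir_for_arch "$(uname -m)"
```

On a typical Intel or AMD home server this should print the x86_64 folder.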

I appreciate that, thanks for the guidance!

Link to comment

Does anyone know if or when this issue will be fixed in Unraid?

My install consists of 8 Seagate 8 TB IronWolf drives, two of which are used for parity. I have two 2 TB SSDs used as cache. I started with Unraid 6.8.3 and all worked great. I updated to 6.9.1 in April and all worked great until I shut down to install a couple of unassigned drives for miscellaneous use. When I restarted, one of my parity drives and three data drives refused to start. I posted for help and found this was a common issue for users with Seagate drives. I decided to roll back to 6.8 and wait for a future version that fixes this issue. Thanks for your help.

Link to comment

It's my understanding that the issue isn't with Unraid at all, so the answer would be never. If you have an LSI controller and Seagate drives, it's best to just follow the instructions and disable EPC to be safe.

Again, my understanding is that the issue is with the combination of an LSI controller and some Seagate drives. We need updated LSI drivers that could help address this, or updated firmware from Seagate that addresses it. Let's hope new Seagate firmware at least has EPC disabled by default.

Link to comment
42 minutes ago, optiman said:

It's my understanding that the issue isn't with Unraid at all, so the answer would be never. 

 

If this is true, then why does the problem only appear after upgrading to Unraid 6.9.x?  

 

I'd been running on 6.8.3 for a long time without issues. Last spring I upgraded to 6.9.2 and bam! The issues hit immediately. I never did the EPC fix. The problem was incessant on 6.9.x, and I didn't want to risk losing data by playing around with drive settings while I had 2 drives out and was already risking data loss, so I rolled back to 6.8.3 and the problem went away. Half a year later, it's been smooth sailing on 6.8.3. I stayed on 6.8.3 because it works and there wasn't anything in the 6.9.x branch I needed.

 

Even the very first post here mentions that the problems started with 6.9.0, which 100% matches my experience.

 

Perhaps what you are saying is that the problem lies in the Linux kernel or one of the various drivers that were upgraded in the 6.9.x releases, and the issue is not in any of LimeTech's Unraid code.  That may be true, though I'm not sure I've seen it clearly detailed in this thread exactly where the problem lies, so I would appreciate pointers to any additional information I may have missed.

 

It certainly seems reasonable to me that since a change in 6.9.x broke this, another change in 6.10.x could fix it, so I'm not inclined to give up hope entirely.  And there have been many times LimeTech has chased down bugs in other components on behalf of their users - and this issue has been reported to them in more than one ticket so they should be aware of it, though disappointingly I've never seen them weigh in on the topic.

Link to comment
  • 1 month later...

I tried disabling EPC after one of my relatively new Seagate IronWolf drives got disabled on Unraid 6.9.2 with an LSI controller. Detailed info below:
 

Num   Ctlr          FW Ver        NVDATA        x86-BIOS    PCI Addr
----------------------------------------------------------------------------
0     SAS2008(B2)   20.00.07.00   14.01.00.08   No Image    00:03:00:00


SeaChest_PowerControl_x86_64-redhat-linux --scan --onlySeagate
ATA      /dev/sg11    ST4000VN008-2DR166  SC60


SeaChest_PowerControl_x86_64-redhat-linux -d /dev/sg11 --EPCfeature disable
==========================================================================================
 SeaChest_PowerControl - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_PowerControl Version: 3.0.2-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Fri Dec 10 01:53:39 2021        User: root
==========================================================================================

/dev/sg11 - ST4000VN008-2DR166 - ZDH9N576 - ATA
Failed to send EPC command to /dev/sg11.
EPC Feature set might not be supported.
Or EPC Feature might already be in the desired state.

Did I miss something? I don't see EPC in the info listing and I am unable to set EPC. Is this because the drive is connected to an LSI controller?
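Since the tool prints this failure as plain text, a script that applies the setting to several drives can easily miss it. Here is a minimal sketch that checks captured SeaChest_PowerControl output for the failure message quoted above. It only matches that one message, makes no assumption about the tool's exit code or what a successful run prints, and the function name epc_command_failed is made up for illustration.

```shell
#!/bin/sh
# Succeed (exit status 0) if SeaChest_PowerControl output on stdin contains
# the failure message quoted in the post above.
# Assumption: the wording "Failed to send EPC command" is stable across versions.
epc_command_failed() {
    grep -q 'Failed to send EPC command'
}

# Example call (binary name and device handle taken from this thread, adjust to yours):
#   if SeaChest_PowerControl_x86_64-redhat-linux -d /dev/sg11 --EPCfeature disable | epc_command_failed; then
#       echo "EPC change was rejected or not needed on /dev/sg11"
#   fi
```

As the tool's own message says, the failure can mean either that EPC is not supported on the drive or that it is already in the requested state, so the check only tells you the command was not accepted.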

Link to comment
On 10/28/2021 at 6:38 PM, Pauven said:

 

If this is true, then why does the problem only appear after upgrading to Unraid 6.9.x?  

I had the same questions, but I'm unable to provide more technical information about why this happened. As you said, there were updates on the Linux side (a good thing) with 6.9.x, and this issue was born. I see this as Linux continuing to evolve, and LSI and Seagate having to keep up to remain compatible. That said, this is just my opinion and there are way smarter people here than me who can speak to this. What I can tell you is that with EPC disabled, I've had zero issues. I'm running 6.9.1 because on 6.9.2 all my drives remain spun up :(

 

@ReLe It looks like Seagate has changed the file and folder format of their tools again. For example, the command I used did not have the word RedHat in it.

Back up your data and, at your own risk, try again using the version of SeaChest that I have attached for you. The files are from June 2019 and they work perfectly. Due to file size limits, I only attached the Info and PowerControl utilities. PM me if you want any of the other tools that were part of the zip file. I personally didn't need any of the others.

Unzip and give it another try. I used this version on my system, and you can follow the instructions on page 1 of this thread exactly.

 

Be sure to run the Info command first to see what the current state is.  Confirm you have the correct mapping.
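To help with confirming the mapping, here is a rough sketch that picks the /dev/sg handles for a given model out of `--scan` output. It assumes the four-column line format shown in the scan output posted earlier in this thread (interface, handle, model, firmware); the function name sg_handles_for_model is made up for illustration.

```shell
#!/bin/sh
# Print the /dev/sg* handles from SeaChest `--scan` output on stdin whose
# model column starts with the given prefix.
# Assumption: lines look like "ATA  /dev/sg11  ST4000VN008-2DR166  SC60".
sg_handles_for_model() {
    awk -v model="$1" '$2 ~ /^\/dev\/sg[0-9]+$/ && index($3, model) == 1 { print $2 }'
}

# Example call (binary name taken from this thread, adjust to yours):
#   SeaChest_PowerControl_x86_64-redhat-linux --scan --onlySeagate | sg_handles_for_model ST4000VN008
```

Handles can change between boots, so it is worth re-running the scan right before issuing any command against a drive.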

Seacheast.zip

Edited by optiman
Link to comment

Thanks @optiman, I tried with the 2019 version and got the same error as before.
The EPC feature is not listed by the info command.
I tested the drive on the motherboard's (Asus Deluxe X79) integrated ?Marvell? SATA controller and also on the Intel SATA ports, and on neither controller does the Seagate disk list EPC in the info output.

Info:
 

SeaChest_Info_150_11923_64 -d /dev/sg4 -i
==========================================================================================
 SeaChest_Info - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2019 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_Info Version: 1.5.0-1_19_23 X86_64
 Build Date: Jun 10 2019
 Today: Fri Dec 10 20:13:30 2021
==========================================================================================

/dev/sg4 - ST4000VN008-2DR166 - xxxxxxxx - ATA
        Model Number: ST4000VN008-2DR166
        Serial Number: xxxxxxxxxx
        Firmware Revision: SC60
        World Wide Name: 5000C500C97E2CB5
        Drive Capacity (TB/TiB): 4.00/3.64
        Native Drive Capacity (TB/TiB): 4.00/3.64
        Temperature Data:
                Current Temperature (C): 28
                Highest Temperature (C): 41
                Lowest Temperature (C): 0
        Power On Time:  85 days 23 hours 43 minutes
        Power On Hours: 2063.72
        MaxLBA: 7814037167
        Native MaxLBA: 7814037167
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Sector Alignment: 0
        Rotation Rate (RPM): 5980
        Form Factor: 3.5"
        Last DST information:
                DST has never been run
        Long Drive Self Test Time:  10 hours 9 minutes
        Interface speed:
                Max Speed (Gb/s): 6.0
                Negotiated Speed (Gb/s): 3.0
        Annualized Workload Rate (TB/yr): 123.25
        Total Bytes Read (TB): 20.87
        Total Bytes Written (TB): 8.17
        Encryption Support: Not Supported
        Cache Size (MiB): 64.00
        Read Look-Ahead: Enabled
        Write Cache: Enabled
        Low Current Spinup: Disabled
        SMART Status: Good
        ATA Security Information: Supported
        Firmware Download Support: Full, Segmented, Deferred, DMA
        Specifications Supported:
                ACS-3
                ACS-2
                ATA8-ACS
                ATA/ATAPI-7
                ATA/ATAPI-6
                ATA/ATAPI-5
                ATA/ATAPI-4
                SATA 3.1
                SATA 3.0
                SATA 2.6
                SATA 2.5
                SATA II: Extensions
                SATA 1.0a
                ATA8-AST
        Features Supported:
                Sanitize
                SATA NCQ
                SATA Rebuild Assist
                SATA Software Settings Preservation [Enabled]
                SATA Device Initiated Power Management
                HPA
                Power Management
                Security
                SMART [Enabled]
                DCO
                48bit Address
                PUIS
                APM [Enabled]
                GPL
                Streaming
                SMART Self-Test
                SMART Error Logging
                Write-Read-Verify
                Sense Data Reporting [Enabled]
                SCT Write Same
                SCT Error Recovery Control
                SCT Feature Control
                SCT Data Tables
                Host Logging
                Seagate In Drive Diagnostics (IDD)



However, I noticed that when I run the info command, I get the following error in dmesg:
ata5.00: invalid command format 2
Every Seagate disk queried with SeaChest_Info logs that error in dmesg under its own ATA address.
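To see which ATA addresses log that message and how often, something like the following sketch could be run against the kernel log. It only looks for the exact "invalid command format" text quoted above and expects the device name in the usual "ataN.NN:" form; the function name count_invalid_command_format is made up for illustration.

```shell
#!/bin/sh
# Count "invalid command format" kernel messages per ATA device in log text on stdin.
# Works with or without the "[ timestamp]" prefix that dmesg normally prints.
count_invalid_command_format() {
    awk '/invalid command format/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                dev = $i
                sub(/:$/, "", dev)   # drop the trailing colon
                n[dev]++
            }
    }
    END { for (d in n) print d, n[d] }' | sort
}

# Example call:
#   dmesg | count_invalid_command_format
```

Each output line is a device address followed by the number of matching messages, which makes it easy to compare against the list of drives that were queried.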
 

Edited by ReLe
Link to comment
  • 4 weeks later...
On 12/10/2021 at 7:30 PM, ReLe said:

Thanks @optiman, I tried with the 2019 version and got the same error as before.
The EPC feature is not listed by the info command.
I tested the drive on the motherboard's (Asus Deluxe X79) integrated ?Marvell? SATA controller and also on the Intel SATA ports, and on neither controller does the Seagate disk list EPC in the info output.


Try updating the drive firmware first.

Link to comment
