6.9.x, LSI Controllers & Ironwolf Disks Disabling - Summary & Fix



15 hours ago, optiman said:

I was thinking of using the Seagate-provided bootable Linux USB flash builder, booting to that, and running the commands outside of Unraid.  Given I only have Seagate drives, I will need to do them all.  Has anyone tried this with success?

I haven't tried the bootable Seagate utility, but I assume it just loads to a command prompt with the tools preinstalled.  For me it was easier to go via Unraid (plus no downtime).

 

In other news, I haven't had a single issue since applying the above (before that, I had three issues in about a week).
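For anyone who wants the short version, "the above" boils down to two SeaChest commands run from the Unraid console.  This is only a sketch: `/dev/sg4` is an example device node (every SeaChest tool can list devices with `--scan`), and flag spellings can differ between SeaChest builds, so check `--help` on yours.

```shell
# Sketch of the two tweaks, run from the Unraid console.
# /dev/sg4 is an example device node -- substitute your own
# (list candidates with: ./SeaChest_Basics --scan).

# 1) Disable the Extended Power Conditions (EPC) feature set
./SeaChest_PowerControl -d /dev/sg4 --EPCfeature disable

# 2) Disable low-current spinup
./SeaChest_Configure -d /dev/sg4 --lowCurrentSpinup disable
```

Both changes are reversible by re-running the same commands with the corresponding enable values.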



Popular Posts

People with specific Seagate Ironwolf disks on LSI controllers have been having issues with Unraid 6.9.0 and 6.9.1.  Typically when spinning up the drive could drop off the system.  Getting it back on

Updated original post to reflect new structure of SeaChest Utilities zip file

Linux guy here.  Use the Ubuntu ones.  If they don't work for unknown reasons, I have an archive of the older tool set.   Kev.


I'm trying to decide if I should update the firmware on the 8TB drives or leave them on SN04.  I haven't had any issues, and they say you should not upgrade unless you are having issues.

 

Advice please - upgrade those 8TB drives to SN05 first, or leave them, make the changes using TDD's instructions, and then upgrade Unraid?

1 minute ago, MisterWolfe said:

I'm on 6.9.1 and have an LSI 9200-8i controller. No issues at all with my IronWolf drives, thankfully. I wonder if the issue is card-version specific.

I only had issues with one model of IronWolf drive (mentioned in the first post).  All others were fine.  Trouble is, out of 16 IronWolfs, 4 were that model.


I only know of the exact 8TB unit in question that requires this tweak.

 

I presented Seagate with all the info on the issue but got meaningless responses back.  I was hoping to chat with the hardware/firmware guys.

We can only hope the intel makes it to where it needs to be.

 

For the record, my testing was done on both of my LSI controllers, and the same outcome occurred prior to the fix:

 

[1000:0064]01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)

[1000:0072]02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

 

Kev.


I have three ST8000AS002 drives that have started having this issue too, but this fix did not work. All three have the exact same number of read errors, only during parity checks and always near the end. I am not sure if there is another issue at work as well.


Thanks for the work, but I need your advice on whether I am affected, since I have slightly different symptoms.

I have one ST8000VN004 drive and an LSI SAS2008 controller with the newest IT-mode firmware (at least it was as of August).

 

My problem is I have read errors on that 8TB IronWolf, only read errors. The drive does not get disabled, but the counter is increasing.

The read errors started about a week ago.

I had been on the 6.9 beta since about November with no problems, except for one time when the exact same disk had a lot of errors, which was probably due to a plug not fully inserted after adding a new drive. That time I had many UDMA CRC errors in SMART, which have not increased since I reseated the cable, so that is probably not related(?).

 

Shall I go ahead and apply this fix, or do you guys think it's unrelated?

Logs attached.

unraid-nick-diagnostics-20210416-0104.zip

7 hours ago, YB96 said:

My problem is I have read errors on that 8TB IronWolf, only read errors.

Errors appear to always happen during spin-up; try the fix, or see if disabling spin down for that disk helps.


OK, now we have several Seagate models affected; something is just not right here.  Does anyone know what has actually changed to cause this?  Can the Unraid team fix this in a future release, or does this mean that everyone running Seagate drives is at risk?  Even if the fix works today, how do you know it will be OK in the next release?  It seems there is a deeper issue here that must be addressed.

 

With all of this, I'm staying on 6.8.3 for now and continuing to enjoy my trouble-free server.  I don't have any spare drives to test with, and data loss is not an option for me.

 

A fix that does not involve messing with drive firmware or options would be much appreciated.

1 hour ago, optiman said:

Can the Unraid team fix this in a future release, or does this mean that everyone running Seagate drives is at risk?

No. If there's a problem, it's between the LSI driver and some Seagate drives; either LSI or Seagate would need to fix it.

1 hour ago, optiman said:

Even if the fix works today, how do you know it will be OK in the next release?

Because it was a fault with either the drive or the controller, and the fix was a change to the drive's settings.  I understand that other systems have also had problems with this drive/controller combo.  Any future upgrade could theoretically surface a previously unfound issue with anything.

 

If the manufacturers don't step up, then I'd reconsider whether to use their hardware for server work in future.  Just as I wouldn't set up a pfSense box using Realtek NICs.

 

If it helps, I've had zero issues since reining in the drive's settings.

18 hours ago, JorgeB said:

Errors appear to always happen during spin-up; try the fix, or see if disabling spin down for that disk helps.

Thanks. Today my drive got kicked from the array and I have to rebuild it. When I was following your guide I noticed something: EPC was already disabled, so this might indicate that disabling the low-current spinup is indeed required.

 

Don't know if it helps in spotting something, but this is my drive information.

/dev/sg4 - ST8000VN004-2M2101 - *hidden* - ATA
        Model Number: ST8000VN004-2M2101
        Serial Number: *hidden*
        Firmware Revision: SC60
        World Wide Name: 5000C500CF63B876
        Drive Capacity (TB/TiB): 8.00/7.28
        Native Drive Capacity (TB/TiB): 8.00/7.28
        Temperature Data:
                Current Temperature (C): 34
                Highest Temperature (C): 59
                Lowest Temperature (C): 19
        Power On Time:  224 days 17 hours 
        Power On Hours: 5393.00
        MaxLBA: 15628053167
        Native MaxLBA: 15628053167
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Sector Alignment: 0
        Rotation Rate (RPM): 7200
        Form Factor: 3.5"
        Last DST information:
                Time since last DST (hours): 3869.00
                DST Status/Result: 0x0
                DST Test run: 0x1
        Long Drive Self Test Time:  12 hours 30 minutes 
        Interface speed:
                Max Speed (Gb/s): 6.0
                Negotiated Speed (Gb/s): 6.0
        Annualized Workload Rate (TB/yr): 308.52
        Total Bytes Read (TB): 150.15
        Total Bytes Written (TB): 39.79
        Encryption Support: Not Supported
        Cache Size (MiB): 256.00
        Read Look-Ahead: Enabled
        Write Cache: Enabled
        Low Current Spinup: Ultra Low Enabled
        SMART Status: Unknown or Not Supported
        ATA Security Information: Supported
        Firmware Download Support: Full, Segmented, Deferred
        Specifications Supported:
                ACS-4
                ACS-3
                ACS-2
                ATA8-ACS
                ATA/ATAPI-7
                ATA/ATAPI-6
                ATA/ATAPI-5
                SATA 3.3
                SATA 3.2
                SATA 3.1
                SATA 3.0
                SATA 2.6
                SATA 2.5
                SATA II: Extensions
                SATA 1.0a
                ATA8-AST
        Features Supported:
                Sanitize
                SATA NCQ
                SATA Rebuild Assist
                SATA Software Settings Preservation [Enabled]
                SATA Device Initiated Power Management
                Power Management
                Security
                SMART [Enabled]
                48bit Address
                PUIS
                GPL
                Streaming
                SMART Self-Test
                SMART Error Logging
                Write-Read-Verify
                DSN
                AMAC
                EPC
                Sense Data Reporting
                SCT Write Same
                SCT Error Recovery Control
                SCT Feature Control
                SCT Data Tables
                Host Logging
                Set Sector Configuration
                Seagate In Drive Diagnostics (IDD)
        Adapter Information:
                Vendor ID: 1000h
                Product ID: 0072h
                Revision: 0003h

 

Edited by YB96
hide the S/N

There very well could be edge cases with other Ironwolf drives but assuredly it is an issue with the ST8000VN004.

 

I would not bet on a timely, if ever, firmware update for the drive itself.

 

The two changes make the drive more aggressive with its spinup and readiness, to compensate for the driver timing out while waiting for its ready state.

You have nothing to lose by making these changes as they are reversible; the amount of power saving is negligible IMHO, and the benefits of an upgraded Unraid are worth it.

 

Try and see!

 

Kev.


I just wanted to give a quick THANK YOU for this post. 

 

I was receiving multiple errors when my 8TB IronWolf drive would spin up.  I went through cables, relocating the drive on the controller, and finally trying a new 8TB IronWolf drive.  The issue persisted through all of my measures.  Digging a bit deeper, I found this post. I tried the SeaChest commands and the spin-up errors are resolved.

 

For anyone else searching the forum, here is the syslog output from any time the drive would spin up (sometimes with read errors in Unraid, sometimes no read errors, as in the example below):

 

Apr 17 11:03:37 Tower emhttpd: spinning up /dev/sdc
Apr 17 11:03:53 Tower kernel: sd 7:0:1:0: attempting task abort!scmd(0x000000009175e648), outstanding for 15282 ms & timeout 15000 ms
Apr 17 11:03:53 Tower kernel: sd 7:0:1:0: [sdc] tag#1097 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
Apr 17 11:03:53 Tower kernel: scsi target7:0:1: handle(0x0009), sas_address(0x4433221101000000), phy(1)
Apr 17 11:03:53 Tower kernel: scsi target7:0:1: enclosure logical id(0x5c81f660d1f49300), slot(2)
Apr 17 11:03:56 Tower kernel: sd 7:0:1:0: task abort: SUCCESS scmd(0x000000009175e648)
Apr 17 11:03:56 Tower emhttpd: read SMART /dev/sdc

 

After disabling low-current spinup and EPC, here is the result of spinning up the same drive (no errors!):


Apr 17 12:08:42 Tower emhttpd: spinning up /dev/sdc
Apr 17 12:08:51 Tower emhttpd: read SMART /dev/sdc
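If you want to check whether your own syslog shows the same signature, grepping for the task-abort lines is a quick test.  A minimal sketch (sample log lines are inlined here so it runs anywhere; on a live Unraid box, run the same grep against /var/log/syslog):

```shell
# Count task-abort events in a syslog. Sample lines are inlined so this
# is self-contained; on Unraid use:
#   grep -c 'attempting task abort' /var/log/syslog
printf '%s\n' \
  'Apr 17 11:03:37 Tower emhttpd: spinning up /dev/sdc' \
  'Apr 17 11:03:53 Tower kernel: sd 7:0:1:0: attempting task abort!scmd(0x000000009175e648), outstanding for 15282 ms & timeout 15000 ms' \
  'Apr 17 12:08:51 Tower emhttpd: read SMART /dev/sdc' |
  grep -c 'attempting task abort'
# -> 1
```

A count of zero after applying the fix is what you are hoping to see.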
 

On 4/16/2021 at 11:48 PM, TDD said:

There very well could be edge cases with other Ironwolf drives but assuredly it is an issue with the ST8000VN004.

 

I would not bet on a timely, if ever, firmware update for the drive itself.

 

The two changes make the drive more aggressive with its spinup and readiness, to compensate for the driver timing out while waiting for its ready state.

You have nothing to lose by making these changes as they are reversible; the amount of power saving is negligible IMHO, and the benefits of an upgraded Unraid are worth it.

 

Try and see!

 

Kev.

 

Just to add to this a bit further: I am running both 4TB ST4000VN008 and 8TB ST8000VN004 IronWolf drives.  I have not had a single error on my five 4TB drives since I started my build about 5 months ago.  As soon as I added an 8TB IronWolf to my array, the errors started. One more thing I find interesting is that I swapped an 8TB IronWolf into my parity slot about 6 weeks ago and have had no errors on that parity drive.  I am not sure why the parity drive behaves differently.  I disabled EPC and low-power spinup on all the 8TB drives (parity and array) and left the 4TB drives as is.


My 8TB IronWolf was the sole Seagate, and it was the parity drive that errored out.  It all comes down strictly to how idle the drive is, and spin-ups beyond that.

 

Kev.

  • 2 weeks later...
On 4/19/2021 at 10:30 AM, TDD said:

My 8TB IronWolf was the sole Seagate, and it was the parity drive that errored out.  It all comes down strictly to how idle the drive is, and spin-ups beyond that.

 

Kev.

 

I can confirm this.  Ever since I changed the ST8000VN004 drives (4 of them) to never spin down and always be spun up, they have been fine and have not dropped from the array.


I want to add that my solo 8TB drive, my parity, does spin down and up as needed and is not always spinning.  This fix does not affect any requests to go idle.

 

Kev.

  • 2 weeks later...

WOW! Thank you for this thread. I have been scratching my head about this all week. I posted here. Instead of tweaking the drives, I am just going to disable the spin-down delay for any drives in the enclosure and hope for an update to the driver.

8 hours ago, edrohler said:

WOW! Thank you for this thread. I have been scratching my head about this all week. I posted here. Instead of tweaking the drives, I am just going to disable the spin-down delay for any drives in the enclosure and hope for an update to the driver.

Just saw your post in the unBALANCE thread and was about to suggest you check here.  No need now :)

  • Cessquill changed the title to 6.9.x, LSI Controllers & Ironwolf Disks Disabling - Summary & Fix

Hi,

I have followed the steps to disable EPC and Low Current Spinup; all commands finished without errors.
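(One quick way to confirm the settings actually stuck is to re-read the drive information and look at the relevant fields.  Sketch only: `/dev/sg4` is an example device node, and the field names are those shown in the SeaChest_Info dump earlier in this thread.)

```shell
# Confirm both tweaks took effect (sketch; /dev/sg4 is an example node).
# After the fix, "Low Current Spinup" should read Disabled, and EPC
# should no longer be listed as an enabled feature.
./SeaChest_Info -d /dev/sg4 --deviceInfo | grep -i -E 'EPC|Low Current Spinup'
```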

 

This morning, one of my two ST8000VN004 disks (disk 1 - sdf) showed the read disk error below:

[Attached screenshot: Disk1_Error.png]

 

I ran a SMART test with no errors, rebooted Unraid, and now Disk 1 shows no errors.

I would really appreciate any help.

 

 

 


I have 3 of these drives in my array, and since applying the fix I have had no problems; my array has been on 24/7 (currently over 44 days) with multiple spin-ups and spin-downs of all the drives.

 

I am not close to the level of expertise of others here, but my guess is that this is a separate, unrelated issue.

 

Kevin

On 5/13/2021 at 7:22 PM, lgil said:

Hi,

I have followed the steps to disable EPC and Low Current Spinup; all commands finished without errors.

 

This morning, one of my two ST8000VN004 disks (disk 1 - sdf) showed the read disk error below:

[Attached screenshot: Disk1_Error.png]

 

I ran a SMART test with no errors, rebooted Unraid, and now Disk 1 shows no errors.

I really appreciate any help.

 

 

 

 

An update :)

 

I set everything back to the previous configuration and rebooted my Unraid server.

 

Before starting the array, I tried this process one more time, and it is working now.

 

May 15 05:11:49 honeysnas emhttpd: spinning down /dev/sde

May 15 05:11:49 honeysnas emhttpd: spinning down /dev/sdf

May 15 06:21:17 honeysnas emhttpd: read SMART /dev/sdf

May 15 06:51:21 honeysnas emhttpd: spinning down /dev/sdf

 

May 15 09:45:33 honeysnas emhttpd: read SMART /dev/sde

May 15 09:45:33 honeysnas emhttpd: read SMART /dev/sdf

May 15 10:17:25 honeysnas emhttpd: spinning down /dev/sde

May 15 10:17:25 honeysnas emhttpd: spinning down /dev/sdf

 

No read errors since yesterday afternoon.

 

Thank you @Cessquill, great work.

  • 3 weeks later...

I am very glad that I stumbled across this thread - thanks @Cessquill !

 

I've got an LSI card ordered from Art Of Server that I am waiting to have delivered. I've gone for the LSI 9201-8i, which I understand is the same as the 9211-8i but without the IR-mode NVRAM chip. I was just about to order some more drives and was about to hit the buy button on the Seagates.

 

Does anyone know if this issue affects the ST8000NE001? This is the same drive as the ST8000VN004, except the former is the IronWolf Pro and the latter the standard IronWolf. I was about to buy the Pro drive, not because of the 'Pro' moniker but for the extra 2 years of warranty. And given that the retailer does a 48-hour replacement service for the lifetime of the warranty, that extra two years could be of benefit; the price difference was only £10 GBP.

 

Running Unraid 6.9.2

 

Thanks in advance 😎

Edited by TangoEchoAlpha
