6.9.x, LSI Controllers & Ironwolf Disks Disabling - Summary & Fix


Cessquill


15 hours ago, optiman said:

I was thinking of using the Seagate-provided bootable USB Linux flash builder, booting into that, and running the commands outside of Unraid.  Given I only have Seagate drives, I will need to do them all.  Has anyone tried this with success?

I haven't tried the bootable Seagate utility, but I was assuming it would just load to a command prompt with the tools preinstalled.  For me it was easier to go via Unraid (plus no downtime).

 

In other news, I haven't had a single issue since applying the above (before that, I had three issues in about a week).

Link to comment

I'm trying to decide if I should update the fw on the 8tb drives or leave them on SN04.  I haven't had any issues and they say we should not upgrade unless you are having issues.

 

Advice please - upgrade to SN05 on those 8TB drives first, or leave them, make the changes using TDD's instructions, and then upgrade Unraid?

Link to comment
1 minute ago, MisterWolfe said:

I'm on 6.9.1 and have an LSI 9200-8i controller. No issues at all with my ironwolf drives, thankfully. I wonder if the issue is card version specific.

I only had issues with one model of Ironwolf drive (mentioned in the first post).  All others were fine.  Trouble is, out of 16 Ironwolfs, 4 were that model.

Link to comment

The only drive I know of that requires this tweak is the exact 8TB unit in question.

 

I presented Seagate with all the info on the issue but got meaningless responses back.  I was hoping to chat with the hardware/firmware guys.

We can only hope the intel makes it to where it needs to be.

 

For the record, my testing was done on both of my LSI controllers, and both showed the same outcome prior to the fix:

 

[1000:0064]01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)

[1000:0072]02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

 

Kev.

Link to comment

I have three St8000as002 drives that have started having this issue too, but this fix did not work. All three show the exact same number of read errors, only during parity checks and always near the end. I am not sure if there is another issue at work as well.

Link to comment

Thanks for the work, but I need your advice if I am affected since I have slightly different symptoms.

I have one ST8000VN004 drive and an LSI SAS 2008 controller with the newest IT-mode firmware (at least it was the newest in August).

 

My problem is that I have read errors on that 8TB Ironwolf, and only read errors. The drive does not get disabled, but the counter keeps increasing.

The read Errors started about a week ago. 

I have been running the 6.9 beta since about November with no problems, except for one time when the exact same disk had a lot of errors, which was probably due to a plug not being fully inserted after I added a new drive. That time I had many UDMA CRC errors in SMART, which have not increased since I reseated the cable, so this is probably not related(?).

 

Shall I go ahead and apply this fix or do you guys think it's unrelated?

Logs attached.

unraid-nick-diagnostics-20210416-0104.zip

Link to comment

OK, now we have several Seagate models affected, so something is just not right here.  Does anyone know what has actually changed and caused this?  Can the Unraid team fix this in a future release, or does this mean that everyone running Seagate drives is at risk?  Even if the fix works today, how do you know it will be OK in the next release?  It seems there is a deeper issue here that must be addressed.

 

With all of this, I'm staying on 6.8.3 for now and continuing to enjoy my trouble-free server.  I don't have any spare drives to test with and data loss is not an option for me.

 

A fix that does not involve messing with drive firmware or options would be much appreciated.

Link to comment
1 hour ago, optiman said:

Can the Unraid team fix this in a future release, or does this mean that everyone running Seagate drives is at risk?

No. If there's a problem, it's with the LSI driver and some Seagate drives, so either LSI or Seagate would need to fix it.

Link to comment
1 hour ago, optiman said:

Even if the fix works today, how do you know it will be ok in the next release?

Because it was a fault with either the drive or the controller, and the fix was a change to the drive's settings.  I understand that other systems have also had problems with this drive/controller combo.  Any future upgrade could theoretically expose a previously undiscovered issue with anything.

 

If the manufacturers don't step up then I'd reconsider whether to use their hardware for server work in future.  Just as I wouldn't set up a pfSense box using Realtek NICs.

 

If it helps, I've had zero issues since reining in the drive's settings.

Link to comment
18 hours ago, JorgeB said:

The errors appear to always occur during spin-up; try the fix, or see if disabling spin-down for that disk helps.

Thanks. Today my drive got kicked from the array and I have to rebuild it. When I was following your guide I noticed something: EPC was already disabled, so this might indicate that disabling the low current spinup is indeed required.

 

I don't know if it helps with spotting something, but this is my drive information.

/dev/sg4 - ST8000VN004-2M2101 - *hidden* - ATA
        Model Number: ST8000VN004-2M2101
        Serial Number: *hidden*
        Firmware Revision: SC60
        World Wide Name: 5000C500CF63B876
        Drive Capacity (TB/TiB): 8.00/7.28
        Native Drive Capacity (TB/TiB): 8.00/7.28
        Temperature Data:
                Current Temperature (C): 34
                Highest Temperature (C): 59
                Lowest Temperature (C): 19
        Power On Time:  224 days 17 hours 
        Power On Hours: 5393.00
        MaxLBA: 15628053167
        Native MaxLBA: 15628053167
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Sector Alignment: 0
        Rotation Rate (RPM): 7200
        Form Factor: 3.5"
        Last DST information:
                Time since last DST (hours): 3869.00
                DST Status/Result: 0x0
                DST Test run: 0x1
        Long Drive Self Test Time:  12 hours 30 minutes 
        Interface speed:
                Max Speed (Gb/s): 6.0
                Negotiated Speed (Gb/s): 6.0
        Annualized Workload Rate (TB/yr): 308.52
        Total Bytes Read (TB): 150.15
        Total Bytes Written (TB): 39.79
        Encryption Support: Not Supported
        Cache Size (MiB): 256.00
        Read Look-Ahead: Enabled
        Write Cache: Enabled
        Low Current Spinup: Ultra Low Enabled
        SMART Status: Unknown or Not Supported
        ATA Security Information: Supported
        Firmware Download Support: Full, Segmented, Deferred
        Specifications Supported:
                ACS-4
                ACS-3
                ACS-2
                ATA8-ACS
                ATA/ATAPI-7
                ATA/ATAPI-6
                ATA/ATAPI-5
                SATA 3.3
                SATA 3.2
                SATA 3.1
                SATA 3.0
                SATA 2.6
                SATA 2.5
                SATA II: Extensions
                SATA 1.0a
                ATA8-AST
        Features Supported:
                Sanitize
                SATA NCQ
                SATA Rebuild Assist
                SATA Software Settings Preservation [Enabled]
                SATA Device Initiated Power Management
                Power Management
                Security
                SMART [Enabled]
                48bit Address
                PUIS
                GPL
                Streaming
                SMART Self-Test
                SMART Error Logging
                Write-Read-Verify
                DSN
                AMAC
                EPC
                Sense Data Reporting
                SCT Write Same
                SCT Error Recovery Control
                SCT Feature Control
                SCT Data Tables
                Host Logging
                Set Sector Configuration
                Seagate In Drive Diagnostics (IDD)
        Adapter Information:
                Vendor ID: 1000h
                Product ID: 0072h
                Revision: 0003h
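
For anyone comparing their own drives against a report like the one above, here is a minimal Python sketch that pulls out the two settings discussed in this thread. It assumes the report layout matches this dump exactly. The function name `check_drive_report` is made up for illustration, and reading a missing `[Enabled]` marker as "disabled" is an assumption, although it is consistent with EPC being reported as already disabled on this drive.

```python
def check_drive_report(text):
    """Extract Low Current Spinup and EPC state from a report laid out like the dump above."""
    low_current = None
    features = {}
    in_features = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Low Current Spinup:"):
            low_current = line.split(":", 1)[1].strip()
        elif line == "Features Supported:":
            in_features = True
        elif line.endswith(":"):
            # Any other section header ends the feature list (assumed layout).
            in_features = False
        elif in_features and line:
            # Assumption: features that are switched on carry an "[Enabled]" marker.
            enabled = line.endswith("[Enabled]")
            name = line.replace("[Enabled]", "").strip()
            features[name] = enabled
    return {
        "low_current_spinup": low_current,
        # None means EPC was not listed at all.
        "epc_enabled": features.get("EPC"),
    }


# Shortened excerpt of the report above.
SAMPLE = """\
        Low Current Spinup: Ultra Low Enabled
        SMART Status: Unknown or Not Supported
        Features Supported:
                SMART [Enabled]
                EPC
                Sense Data Reporting
        Adapter Information:
                Vendor ID: 1000h
"""

print(check_drive_report(SAMPLE))
```

On the excerpt, this reports low current spinup as still enabled and EPC as not enabled, which matches the situation described in the post.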

 

Edited by YB96
hide the S/N
Link to comment

There very well could be edge cases with other Ironwolf drives but assuredly it is an issue with the ST8000VN004.

 

I would not bet on a timely, if ever, firmware update for the drive itself.

 

The two changes make the drive more aggressive with its spinup and readiness, to compensate for the driver timing out while waiting for the drive's ready state.

You have nothing to lose by making these changes as they are reversible; the amount of power saving is negligible IMHO and the benefits of an upgraded UnRAID are worth it.

 

Try and see!

 

Kev.

Link to comment

I just wanted to give a quick THANK YOU for this post. 

 

I was receiving multiple errors when my 8TB Ironwolf drive would spin up.  I went through cables, relocating on the controller, and finally trying a new 8TB Ironwolf drive.  The issue persisted through all of my measures.  Digging a bit deeper, I found this post. I tried the SeaChest commands and the spin up errors are resolved.

 

For anyone else forum searching, here is the syslog output anytime a drive would spin up (sometimes with read errors in unraid, sometimes no read errors as in the example below):

 

Apr 17 11:03:37 Tower emhttpd: spinning up /dev/sdc
Apr 17 11:03:53 Tower kernel: sd 7:0:1:0: attempting task abort!scmd(0x000000009175e648), outstanding for 15282 ms & timeout 15000 ms
Apr 17 11:03:53 Tower kernel: sd 7:0:1:0: [sdc] tag#1097 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
Apr 17 11:03:53 Tower kernel: scsi target7:0:1: handle(0x0009), sas_address(0x4433221101000000), phy(1)
Apr 17 11:03:53 Tower kernel: scsi target7:0:1: enclosure logical id(0x5c81f660d1f49300), slot(2)
Apr 17 11:03:56 Tower kernel: sd 7:0:1:0: task abort: SUCCESS scmd(0x000000009175e648)
Apr 17 11:03:56 Tower emhttpd: read SMART /dev/sdc
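
If you want to scan your own syslog for the same pattern, here is a minimal Python sketch. It assumes the kernel message format matches the lines above; the function name `find_task_aborts` and the regular expression are illustrative only and are not part of any Unraid tool.

```python
import re

# Matches the "attempting task abort" line shown above. The pattern is an
# assumption based on this one sample and may need adjusting for other kernels.
ABORT_RE = re.compile(
    r"sd (?P<addr>[\d:]+): attempting task abort!"
    r".*outstanding for (?P<outstanding>\d+) ms & timeout (?P<timeout>\d+) ms"
)


def find_task_aborts(lines):
    """Return (scsi_address, outstanding_ms, timeout_ms) for each abort line."""
    hits = []
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            hits.append((m["addr"], int(m["outstanding"]), int(m["timeout"])))
    return hits


SAMPLE = [
    "Apr 17 11:03:37 Tower emhttpd: spinning up /dev/sdc",
    "Apr 17 11:03:53 Tower kernel: sd 7:0:1:0: attempting task abort!"
    "scmd(0x000000009175e648), outstanding for 15282 ms & timeout 15000 ms",
    "Apr 17 11:03:56 Tower kernel: sd 7:0:1:0: task abort: SUCCESS "
    "scmd(0x000000009175e648)",
]

for addr, outstanding, timeout in find_task_aborts(SAMPLE):
    print(f"{addr}: command outstanding {outstanding} ms, timeout {timeout} ms")
```

In the example above, the command had been outstanding for 15282 ms against a 15000 ms timeout, which is what triggered the abort.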

 

After disabling low current and EPC, here is the result of spinning up the same drives (no errors!):


Apr 17 12:08:42 Tower emhttpd: spinning up /dev/sdc
Apr 17 12:08:51 Tower emhttpd: read SMART /dev/sdc
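
To put a number on the difference, here is a small Python sketch that measures the time from "spinning up" to the following "read SMART" for a device. It is only an illustration: the helper names are hypothetical, the year is assumed because syslog does not record it, and month parsing assumes an English locale.

```python
from datetime import datetime


def parse_stamp(line, year=2021):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line (year is assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def spinup_seconds(lines, device, year=2021):
    """Seconds from 'spinning up <device>' to the next 'read SMART <device>'."""
    start = None
    for raw in lines:
        line = raw.strip()
        if line.endswith(f"spinning up {device}"):
            start = parse_stamp(line, year)
        elif start is not None and line.endswith(f"read SMART {device}"):
            return (parse_stamp(line, year) - start).total_seconds()
    return None  # no complete spin-up/SMART pair found


BEFORE = [
    "Apr 17 11:03:37 Tower emhttpd: spinning up /dev/sdc",
    "Apr 17 11:03:56 Tower emhttpd: read SMART /dev/sdc",
]
AFTER = [
    "Apr 17 12:08:42 Tower emhttpd: spinning up /dev/sdc",
    "Apr 17 12:08:51 Tower emhttpd: read SMART /dev/sdc",
]

print("before fix:", spinup_seconds(BEFORE, "/dev/sdc"), "s")
print("after fix: ", spinup_seconds(AFTER, "/dev/sdc"), "s")
```

For the two excerpts in this post, that works out to 19 seconds before the change and 9 seconds after it.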
 

Link to comment
On 4/16/2021 at 11:48 PM, TDD said:

There very well could be edge cases with other Ironwolf drives but assuredly it is an issue with the ST8000VN004.

 

I would not bet on a timely, if ever, firmware update for the drive itself.

 

The two changes make the drive more aggressive with its spinup and readiness, to compensate for the driver timing out while waiting for the drive's ready state.

You have nothing to lose by making these changes as they are reversible; the amount of power saving is negligible IMHO and the benefits of an upgraded UnRAID are worth it.

 

Try and see!

 

Kev.

 

Just to add to this a bit further - I am running both 4TB ST4000VN008 and 8TB ST8000VN004 Ironwolf drives.  I have not had a single error on my five 4TB drives since I started my build about 5 months ago.  As soon as I added an 8TB Ironwolf to my array, the errors started. One more thing I find interesting is that I swapped in an 8TB Ironwolf as my parity drive about 6 weeks ago and have had no errors on that drive.  I am not sure why the parity drive behaves differently.  I disabled EPC and low current spinup on all the 8TB drives (parity and array) and left the 4TB drives as they are.

Link to comment
  • 2 weeks later...
On 4/19/2021 at 10:30 AM, TDD said:

My 8TB Ironwolf was the sole Seagate and it was the parity that errored out.  It all comes down to strictly how idle the drive is and spin-ups past that.

 

Kev.

 

I can confirm this.  Ever since I changed the ST8000VN004 drives (4 of them) to never spin down, they have been fine and have not dropped from the array.

Link to comment
  • 2 weeks later...
8 hours ago, edrohler said:

WOW! Thank you for this thread. I have been scratching my head about this all week. I posted here. Instead of tweaking the drives, I am just going to disable the spin down delay for any drives in the enclosure and hope for an update to the driver. 

Just saw your post in the unbalance thread, and was about to suggest you check here.  No need now :)

Link to comment
  • Cessquill changed the title to 6.9.x, LSI Controllers & Ironwolf Disks Disabling - Summary & Fix

Hi,

I have followed the steps to disable EPC and Low Current Spinup, all commands finished without errors.

 

This morning one of my two ST8000VN004 disks (disk 1, sdf) showed a disk read error:

[Attached image: Disk1_Error.png]

I ran a SMART test, which found no errors, then rebooted Unraid, and Disk 1 now shows no errors.

I really appreciate any help.

 

 

 

Link to comment

I have 3 of these drives in my array, and since applying the fix, I have had no problems, and my array has been on 24/7 (currently over 44 days) with multiple spin ups and spin downs of all the drives.

 

I am nowhere near the level of expertise of others here, but my guess is that this is a separate, unrelated issue.

 

Kevin

Link to comment
On 5/13/2021 at 7:22 PM, lgil said:

Hi,

I have followed the steps to disable EPC and Low Current Spinup, all commands finished without errors.

 

This morning one of my two ST8000VN004 disks (disk 1, sdf) showed a disk read error:

[Attached image: Disk1_Error.png]

I ran a SMART test, which found no errors, then rebooted Unraid, and Disk 1 now shows no errors.

I really appreciate any help.

 

 

 

 

An update :)

 

I reverted to the previous configuration and rebooted my Unraid server.

Before starting the array, I ran through the process one more time, and it is working now.

 

May 15 05:11:49 honeysnas emhttpd: spinning down /dev/sde

May 15 05:11:49 honeysnas emhttpd: spinning down /dev/sdf

May 15 06:21:17 honeysnas emhttpd: read SMART /dev/sdf

May 15 06:51:21 honeysnas emhttpd: spinning down /dev/sdf

 

May 15 09:45:33 honeysnas emhttpd: read SMART /dev/sde

May 15 09:45:33 honeysnas emhttpd: read SMART /dev/sdf

May 15 10:17:25 honeysnas emhttpd: spinning down /dev/sde

May 15 10:17:25 honeysnas emhttpd: spinning down /dev/sdf

 

No read errors since yesterday afternoon

 

Thank you @Cessquill, great work.

Link to comment
  • 3 weeks later...

I am very glad that I stumbled across this thread - thanks @Cessquill !

 

I've got an LSI card ordered from Art Of Server that I am waiting to have delivered - I've gone for the LSI 9201-8i, which I understand is the same as the 9211-8i but without the IR mode NVRAM chip. I was just about to order some more drives and was about to hit the buy button on the Seagates.

 

Does anyone know if this issue affects the ST8000NE001? As I understand it, this is the same drive as the ST8000VN004, the former being the Ironwolf Pro and the latter the standard Ironwolf. I was about to buy the Pro drive, not because of the 'Pro' moniker but for the extra 2 years of warranty. Given that the retailer offers a 48-hour replacement service for the lifetime of the warranty, that extra two years could be of benefit, and the price difference was only £10.

 

Running Unraid 6.9.2

 

Thanks in advance 😎

Edited by TangoEchoAlpha
Link to comment
