6.9.x, LSI Controllers & IronWolf Disks Disabling - Summary & Fix


Cessquill


Just found this topic and thanks for the info (and the afternoon's reading, of course)!

Wish I had found it sooner... :P

 

I'll perform the magic on my sole VN004 (problem child) after the parity check completes.

FWIW - I only get a single (!) read error on it (it's been driving me nuts).

 

FWIW - my other disks are VN0022s. I have had zero issues with them.


Downloaded SeaChest, followed the instructions, and it didn't work too well.

Note: this is using the current version of SeaChest; the files end in "x86_64-redhat-linux" (from the Non-RAID directory).

Is this correct?

 

 

/dev/sg7 - ST8000VN004-2M2101 - WSD56WY4 - ATA
Failed to send EPC command to /dev/sg7.
EPC Feature set might not be supported.
Or EPC Feature might already be in the desired state.

 

 

Any suggestions? I figure I could keep trying to fix it, return what I have and get another VN0022 for a bit more $$$, or get something a bit larger.

 

Anyway - the info:

 

==========================================================================================
 SeaChest_Info - Seagate drive utilities - NVMe Enabled
 Copyright (c) 2014-2021 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
 SeaChest_Info Version: 2.1.0-2_2_3 X86_64
 Build Date: Jun 17 2021
 Today: Mon Jan 10 17:16:13 2022        User: root
==========================================================================================

/dev/sg7 - ST8000VN004-2M2101 - <edited> - ATA
        Model Number: ST8000VN004-2M2101
        Serial Number:  <edited>
        Firmware Revision: SC60
        World Wide Name: 5000C500E34389D3
        Drive Capacity (TB/TiB): 8.00/7.28
        Temperature Data:
                Current Temperature (C): 23
                Highest Temperature (C): 35
                Lowest Temperature (C): 22
        Power On Time:  11 days 3 hours
        Power On Hours: 267.00
        MaxLBA: 15628053167
        Native MaxLBA: Not Reported
        Logical Sector Size (B): 512
        Physical Sector Size (B): 4096
        Sector Alignment: 0
        Rotation Rate (RPM): 7200
        Form Factor: 3.5"
        Last DST information:
                Not supported
        Long Drive Self Test Time: Not Supported
        Interface speed:
                Max Speed (Gb/s): 6.0
                Negotiated Speed (Gb/s): 6.0
        Annualized Workload Rate (TB/yr): 1313.05
        Total Bytes Read (TB): 24.01
        Total Bytes Written (TB): 16.01
        Encryption Support: Not Supported
        Cache Size (MiB): 256.00
        Read Look-Ahead: Enabled
        Write Cache: Enabled
        Low Current Spinup: Disabled
        SMART Status: Unknown or Not Supported
        ATA Security Information: Supported
        Firmware Download Support: Full, Segmented, Deferred
        Specifications Supported:
                ACS-4
                ACS-3
                ACS-2
                ATA8-ACS
                ATA/ATAPI-7
                ATA/ATAPI-6
                ATA/ATAPI-5
                SATA 3.3
                SATA 3.2
                SATA 3.1
                SATA 3.0
                SATA 2.6
                SATA 2.5
                SATA II: Extensions
                SATA 1.0a
                ATA8-AST
        Features Supported:
                Sanitize
                SATA NCQ
                SATA Rebuild Assist
                SATA Software Settings Preservation [Enabled]
                SATA Device Initiated Power Management
                Power Management
                Security
                SMART [Enabled]
                48bit Address
                PUIS
                GPL
                Streaming
                SMART Self-Test
                SMART Error Logging
                Write-Read-Verify
                DSN
                AMAC
                EPC [Enabled]
                Sense Data Reporting
                SCT Write Same
                SCT Error Recovery Control
                SCT Feature Control
                SCT Data Tables
                Host Logging
                Set Sector Configuration
        Adapter Information:
                Vendor ID: 1000h
                Product ID: 0072h
                Revision: 0003h

Edited by dingy

Lastly - thinking it could be a firmware issue, I looked on the Seagate website. There is no new firmware available.

The only thing different between my VN0022 drives and the VN004 problem child is that the VN0022s are running SC61 FW, whereas both VN004s (I still have the old one in the system) are running SC60.

 

 

The following was in today's disk log:

Jan 9 12:08:37 Tower emhttpd: read SMART /dev/sdh
Jan 9 12:08:56 Tower emhttpd: shcmd (26): echo 128 > /sys/block/sdh/queue/nr_requests
Jan 9 12:08:59 Tower s3_sleep: included disks=sdb sdc sdd sde sdf sdh
Jan 10 00:24:59 Tower emhttpd: spinning down /dev/sdh
Jan 10 03:00:31 Tower kernel: sd 1:0:6:0: [sdh] tag#2660 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jan 10 03:00:35 Tower kernel: sd 1:0:6:0: [sdh] tag#2740 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=19s
Jan 10 03:00:35 Tower kernel: sd 1:0:6:0: [sdh] tag#2740 Sense Key : 0x2 [current]
Jan 10 03:00:35 Tower kernel: sd 1:0:6:0: [sdh] tag#2740 ASC=0x4 ASCQ=0x0
Jan 10 03:00:35 Tower kernel: sd 1:0:6:0: [sdh] tag#2740 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 40 00 00 00 08 00 00
Jan 10 03:00:35 Tower kernel: blk_update_request: I/O error, dev sdh, sector 64 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 10 03:00:53 Tower emhttpd: read SMART /dev/sdh
Jan 10 03:33:10 Tower emhttpd: spinning down /dev/sdh
Jan 10 17:16:32 Tower emhttpd: read SMART /dev/sdh
Jan 10 17:46:15 Tower emhttpd: spinning down /dev/sdh

 

The 3 AM timestamp is interesting, though, as that is when the weekly transfer of data from the cache runs...
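For anyone trying to read the kernel lines above, the CDB and sense bytes can be decoded by hand. The Python sketch below is illustrative only: it handles just the two opcodes in this log (0x85, ATA PASS-THROUGH(16), and 0x88, READ(16)) plus a small subset of sense keys and ATA command codes, so it is not a general SCSI decoder. Decoded this way, the command logged at 03:00:31 is an ATA CHECK POWER MODE query, the failed command is a READ(16) of 8 blocks at LBA 64 (matching "sector 64" in the blk_update_request line), and sense data 0x2/0x4/0x0 means NOT READY, LOGICAL UNIT NOT READY. That is consistent with, though not proof of, the drive still being spun down or spinning up when the read arrived.

```python
# Decode the CDB and sense fields from the Unraid kernel log above (sketch).
# Handles only READ(16) and ATA PASS-THROUGH(16); lookup tables are partial.

SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# A tiny subset of ATA command codes, enough for this log.
ATA_COMMANDS = {0xE5: "CHECK POWER MODE", 0xB0: "SMART", 0xEC: "IDENTIFY DEVICE"}


def parse_cdb(hex_bytes: str) -> dict:
    """Decode a CDB given as the space-separated hex bytes the kernel prints."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) == 16 and cdb[0] == 0x88:
        # READ(16): bytes 2-9 are the LBA, bytes 10-13 the transfer length
        return {
            "command": "READ(16)",
            "lba": int.from_bytes(cdb[2:10], "big"),
            "blocks": int.from_bytes(cdb[10:14], "big"),
        }
    if len(cdb) == 16 and cdb[0] == 0x85:
        # ATA PASS-THROUGH(16): byte 1 bits 4:1 = protocol,
        # byte 2 bit 5 = CK_COND, byte 14 = the ATA command itself
        return {
            "command": "ATA PASS-THROUGH(16)",
            "protocol": (cdb[1] >> 1) & 0x0F,
            "ck_cond": bool(cdb[2] & 0x20),
            "ata_command": ATA_COMMANDS.get(cdb[14], hex(cdb[14])),
        }
    return {"command": hex(cdb[0])}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Describe a sense key, plus the one ASC/ASCQ pair seen in this log."""
    text = SENSE_KEYS.get(key, hex(key))
    if (asc, ascq) == (0x04, 0x00):
        text += ": LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE"
    return text


if __name__ == "__main__":
    print(parse_cdb("85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00"))
    print(parse_cdb("88 00 00 00 00 00 00 00 00 40 00 00 00 08 00 00"))
    print(describe_sense(0x2, 0x4, 0x0))
```

Extend the two lookup tables if your own log shows other opcodes or sense codes.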


Read my post on page 4; I uploaded the older version there, which has worked for many. Give that a try and read the instructions carefully. You can check the current state by following the instructions in the first post. If EPC is not listed, your drive doesn't support it. You can also check for new drive firmware and try again.

 

I hope it helps.


Also worth noting: you need to shut down the server with the disk in it. The disk has to lose power and then power back up.

Rebooting the server doesn't cut power to the drive.

Mine still had issues until I shut down the server and powered it back on.


Yeah, this is nutty.

The older version of the software shows EPC as NOT enabled. I've tried changing the EPC settings and received the same error, "Failed to send EPC command to /dev/sg7". I will power-cycle the server (and clear the read error).

 

As for the process that logs the error, it looks like it might be SMART-related; the other drives log SMART reads at about the same time.

 

The way I figure it, if it happens again I'll replace the disk controller. I have until Feb 1 to send my replacement back, but since I'm seeing the exact same issue on different drives and different ports (on the same controller), the only thing left is the controller.

Either that, or snag another VN0022.

 

The only other option would be to install RC2 or roll back to an earlier release...

 

************************

 

UPDATE:

Yep. Nutty. Need beers nutty.

It seems my drive letters changed. Low Current Spinup has changed from Disabled to Ultra Low Enabled (FWIW, the same setting as on my VN0022 drives). EPC does not show as enabled. I guess I need to try disabling Low Current Spinup, reboot, and see what happens.

 

It appears that although disabling EPC "didn't work", something did. Guess it is a feature, not a bug.

 

UPDATE #2:

Was able to disable Low Current Spinup. Rebooted and have verified that Low Current Spinup is disabled and EPC is not enabled. Time to wait and see what breaks.

 

<EDIT>

Shout-outs for the help. Thanks!

Edited by dingy
  • 2 weeks later...
  • 5 weeks later...

Found this thread as I'm trying to decide on buying an LSI HBA to replace my Adaptec 6805, but I've got 12 x ST8000VN004s (and another 5 x 6TB IronWolf drives)... so I'm starting to think I'll just stay put. Everything has been pretty solid for me, except the other day when my controller had a fit and killed my parity (it happened while I was extracting a bunch of zip files and copying large amounts of data to the array).

 

I'm currently running without parity until I decide what I'm doing (I ran without it for years on DrivePool, so I'm not ultimately bothered). That said, does this only really affect people who spin down their drives? I generally keep all my drives spinning 24/7, so perhaps it won't be an issue?

Edited by KRiSX
  • 2 weeks later...
  • 2 weeks later...

Big thanks to @Cessquill for the write-up, and for all the other contributors to this thread!  I just successfully fixed this on my server.

 

I finally took the plunge after waiting half a year, as I started having major compatibility issues with Unassigned Devices on 6.8.3, and couldn't put off upgrading anymore.

 

I followed all the steps that Cessquill outlined, and disabled EPC on my four Seagate ST8000NM0055 drives. I used the latest SeaChest Utility files from Seagate's website, downloaded yesterday. They appear to have changed again, and this time I used the files from this path:

  • \Linux\Non-RAID\centos-7-x86_64\

 

Oddly, unlike @optiman's experience with his ST8000NM0055s, mine all had Low Current Spinup disabled, so I didn't mess with that.

 

I'm also running on a Marvell-based controller, which is probably a useful data point showing that this issue doesn't only affect LSI controllers.

 

Last time I upgraded to 6.9.x, I had major issues and could not get beyond 66GB of my parity rebuild, which is why I rolled back to 6.8.3. After applying the EPC fix and upgrading to 6.9.2, I've done multiple drive spin-downs/spin-ups with no issues, and a full parity check which completed in record time. It's perhaps too early to celebrate, but it does seem like the issue is resolved on my setup.

 

I also have two pre-cleared ST8000VN0022's that are not in my array.  They had both EPC and Low Current Spinup (Ultra Low) enabled.  I decided to leave Low Current Spinup alone, but went ahead and disabled EPC for both of these drives.  These will migrate into my array in the coming months, so I don't know yet how they'll behave.  I don't even know if I would have had issues with them, but since other users here mentioned them I decided to play it safe.

 

I also used SeaChest_Info to examine my other non-Seagate drives (surprisingly it works), and found that EPC exists and is enabled on my HGST_HUH728080ALE drives, but those don't cause any problems.

 

I kinda hate that these Seagate Exos 8TB drives are such a good value, as they've become my chosen upgrade path, so now I'll have to remember to disable EPC on all new drives going forward.  While I do like the HGST drives better, the price premium is just too much for a server this large.

 

Thanks again!!!

-Paul

  • 2 weeks later...

Hi 

Is the problem still relevant for 6.10 rc4?

And where can I find the list of affected drive models? I currently have one Seagate drive in my array, and an LSI HBA on the way (the other disks are WD Red Pro).

 

My Seagate drive is ST12000NM0008

 

 

 

  • 4 weeks later...

For those of us who are just buying these Seagate drives and have a Windows machine handy to "fix" the drives:

 

Download SeaTools and install it in the default directory.

Open an admin command prompt and run the following commands (type only the command at the start of each line, not the description after it):

 

cd C:\Program Files\Seagate\SeaChest    (sets the prompt to the correct folder)

SeaChest_Basics_x64_windows --scan    (scans for the Seagate drive; note its handle, which should look like PDXX, and use it in place of PDXX below)

SeaChest_Basics_x64_windows -d PDXX -i    (shows the info for the disk)

SeaChest_PowerControl_x64_windows -d PDXX --EPCfeature disable    (disables the EPC feature)

SeaChest_Configure_x64_windows -d PDXX --lowCurrentSpinup disable    (disables Low Current Spinup)

Power-cycle the Seagate drive (a warm reboot may not cut power to it, so shut down fully and power back on).
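If you fix several drives this way, the two commands that change settings can be generated per drive handle. The Python sketch below is a hypothetical convenience wrapper, not a Seagate tool: the executable names and flags are copied from the post, while the install folder and the ".exe" suffix are assumptions about a default Windows install. By default it only prints the commands (a dry run), and the drive still needs a full power-off afterwards.

```python
# Hypothetical wrapper that prints (or runs) the SeaChest commands from the post.
# Executable names/flags are copied from the post; the folder and ".exe" suffix
# are assumptions about a default Windows install. Run from an admin prompt.
import subprocess
from pathlib import PureWindowsPath

SEACHEST_DIR = PureWindowsPath(r"C:\Program Files\Seagate\SeaChest")


def fix_commands(handle: str) -> list:
    """Return the EPC and Low Current Spinup commands for a handle like 'PD3'."""
    if not (handle[:2].upper() == "PD" and handle[2:].isdigit()):
        raise ValueError(f"expected a handle like PD3, got {handle!r}")
    return [
        ["SeaChest_PowerControl_x64_windows", "-d", handle, "--EPCfeature", "disable"],
        ["SeaChest_Configure_x64_windows", "-d", handle, "--lowCurrentSpinup", "disable"],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            exe = SEACHEST_DIR / (cmd[0] + ".exe")
            # check=True stops at the first command that reports failure
            subprocess.run([str(exe), *cmd[1:]], check=True)


if __name__ == "__main__":
    run(fix_commands("PD3"))  # dry run: prints the two commands only
```

Replace PD3 with the handle reported by --scan, and only call run(fix_commands("PD3"), dry_run=False) once the printed commands look right.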

 

Once the drive is back up, do the following to confirm that both features are disabled.

1. Open an admin command prompt.

2. cd C:\Program Files\Seagate\SeaChest

3. SeaChest_Basics_x64_windows --scan    (note the drive's handle, which should look like PDXX)

4. SeaChest_Basics_x64_windows -d PDXX -i    (shows the info for the disk)

 

In the output, Low Current Spinup should be about halfway down and EPC near the end.
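To check the result without scanning the output by eye, the Python sketch below pulls both settings out of the info text. It assumes the layout shown earlier in this thread (a "Low Current Spinup:" line, and EPC listed under Features Supported as either "EPC" or "EPC [Enabled]"); other SeaChest versions may print this differently, and reading a bare "EPC" line as "supported but not enabled" is an assumption. Following the advice earlier in the thread, a missing EPC line is reported as "not listed", meaning the drive doesn't support it.

```python
# Pull the EPC and Low Current Spinup state out of SeaChest info text (sketch).
# Assumes the output layout quoted in this thread; other versions may differ.
import re
from typing import Optional

# "EPC" alone (assumed: supported but not enabled) or "EPC [Enabled]"
EPC_LINE = re.compile(r"^[ \t]*EPC(?:[ \t]+\[(Enabled)\])?[ \t]*$", re.MULTILINE)
SPINUP_LINE = re.compile(r"Low Current Spinup:[ \t]*(.+)")


def epc_state(info: str) -> str:
    """Return 'enabled', 'listed, not enabled' or 'not listed'."""
    m = EPC_LINE.search(info)
    if m is None:
        return "not listed"  # per this thread: the drive doesn't support EPC
    return "enabled" if m.group(1) else "listed, not enabled"


def low_current_spinup(info: str) -> Optional[str]:
    """Return the Low Current Spinup value, e.g. 'Disabled', or None if absent."""
    m = SPINUP_LINE.search(info)
    return m.group(1).strip() if m else None


if __name__ == "__main__":
    sample = """\
        Low Current Spinup: Disabled
        Features Supported:
                SMART [Enabled]
                EPC [Enabled]
                Sense Data Reporting
"""
    print(epc_state(sample), "/", low_current_spinup(sample))
```

Save the -i output to a text file and pass its contents to both functions; after a successful fix you would expect "listed, not enabled" (or "not listed") and "Disabled".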

On 4/30/2022 at 7:28 PM, krazijoe said:

For those of us that are just buying these Seagate drives and have a windows machine handy to "fix" the drives. [...]

Thanks for the info, I may try this. It sounds easier than messing with the settings on Linux; just move the drive to a Windows system.
I also have an ST8000VN004-2M21 (firmware SC60) that gives weird errors when connected to the LSI HBA. None of my WD Red 8TB drives have any issues.

So far I've just had the Seagate drive connected to the internal SATA ports (ASMedia controller) and have experienced no issues.  ¯\_(ツ)_/¯

Unraid 6.9.2, Asus AMD X370, Ryzen 2700x, LSI 2308 controller.

Edited by Glomp
2 minutes ago, Glomp said:

Thanks for the info, I may try this. Sounds easier than messing with the settings on Linux, just move the drive to a Windows system.

Very useful, yes - thank you. If I had a USB SATA dock or a benchtop test PC, I'd probably do that. My Unraid drives are in hot-swap caddies, though, so it's easier for me to do it in Linux. That said, I think I've changed 4 or 5 drives since writing this, and have had to refer back to this thread for the commands every time.

  • 4 weeks later...

Thanks to @JorgeB for pointing me to this thread, and to @Cessquill for the good work finding, working on, and resolving this obscure issue.

 

It's plagued me for a few months (and I was away traveling, so I was hoping I wouldn't drop my array while away). Simple change, reboot, all good.

Edited by Xoron
  • 5 weeks later...
  • 4 weeks later...
  • 2 weeks later...

I have experienced this on two drives this week running 6.10.3.

 

Ran the tools/commands, even on my other 8TB drives that were 'fine'. Parity rebuilt, working fine. As of today, two more drives have been disabled (drives that I ran the commands on too). Not sure if I should move the 8TB drives on and replace them with 10TB ones, or jump ship back to WD!

3 hours ago, BOGLOAD said:

Ran the tools/commands, even on my other 8tb drives that were 'fine'. Parity rebuilt, working fine. As of today, two more drives have been disabled (drives that I ran the commands on too).

Are you sure the drives are set correctly?  And they are dropping offline during normal running of Unraid (ie, not during a reboot)?

 

I have 8 of the models in my array, and all have been stable since applying these settings to all of them.  I occasionally get a drive drop off during a reboot, but I think that's because I'm pushing my PSU too hard (need to cut down on drives)

On 7/28/2022 at 6:48 PM, Cessquill said:

Are you sure the drives are set correctly?  And they are dropping offline during normal running of Unraid (ie, not during a reboot)?

Absolutely. I rebuilt the array, and things have been fine since then. I double- and triple-checked all the 8TB drives, and ran the commands again to be sure. I haven't had any drop-offs in a few days, and I've tried rebooting/power cycling a few times to see if it happens again; nothing so far. Hopefully this doesn't happen again! Got a 750W PSU here, so I'm not overly worried about using too much juice :)

4 minutes ago, BOGLOAD said:

...and I've tried rebooting/power cycling a few times to see if it happens again...

As I understand it, this issue causes the drives to drop off when they spin down/up whilst the array is running.  If you're losing drives on a machine restart, I suspect that may be a different issue.
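One way to tell the two cases apart is to check whether the I/O errors land between a spin-down and the next spin-up, rather than around a restart. The Python sketch below does that over a saved syslog. It only recognises the emhttpd and kernel message formats quoted earlier in this thread, and treating emhttpd's "read SMART" line as a sign that the disk is awake again is an assumption, so read its output as a hint rather than a diagnosis.

```python
# Flag I/O errors that hit a disk while Unraid believed it was spun down (sketch).
# Matches only the emhttpd/kernel message formats quoted in this thread.
# Assumption: emhttpd's "read SMART /dev/sdX" line means the disk is awake again.
import re
import sys
from typing import Iterable, Iterator, Tuple

SPIN_DOWN = re.compile(r"emhttpd: spinning down /dev/(\w+)")
AWAKE = re.compile(r"emhttpd: read SMART /dev/(\w+)")
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def errors_while_spun_down(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (device, timestamp, sector) for each I/O error after a spin-down."""
    spun_down = set()
    for line in lines:
        stamp = " ".join(line.split()[:3])  # e.g. "Jan 10 03:00:35"
        if m := SPIN_DOWN.search(line):
            spun_down.add(m.group(1))
        elif m := AWAKE.search(line):
            spun_down.discard(m.group(1))
        elif (m := IO_ERROR.search(line)) and m.group(1) in spun_down:
            yield m.group(1), stamp, int(m.group(2))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for dev, stamp, sector in errors_while_spun_down(log):
            print(f"{stamp}: I/O error on {dev} at sector {sector} while spun down")
```

For example, `python3 spin_errors.py syslog.txt` (both file names are placeholders). Against the Jan 10 log posted earlier in this thread, it would flag the 03:00:35 error on sdh, which falls between the 00:24:59 spin-down and the 03:00:53 SMART read.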
