doron (thread author), October 6, 2020:
dansonamission said: "The drive is a ST4000NM0023 Seagate, revision GS0D and the card HP H220 LSI 9205-8i 9207 with P20 firmware."
The manual for this drive (in fact, for the entire Constellation ES.3 series) seems to indicate (section 6.1) that an explicit NOTIFY needs to be sent to the device to recover from the spindown mode we're sending it into (Standby_Z). This is in contrast with other devices tested (e.g. WD/HGST), which spin back up automatically on the next I/O after being sent to this state. Has anyone seen positive results with this drive and this plugin? We might end up having to enumerate the drive types where this works well vs. those that fail, and build a whitelist in the plugin. Nasty 😞
Could anyone who's using (or has tried) the plugin post a brief message here reporting success or failure? Just a one-liner with:
<HDD Model> <Success/Failure> (<optional comment>)
would be great. Example:
HUH721212AL4200 Success
A PM would also work if you don't want to post. Thanks!
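To make the requested one-liners easy to tally, here is a minimal Python sketch that parses the `<HDD Model> <Success/Failure> (<optional comment>)` format and collects the models with at least one failure report. The regex, the `Report` type and the `failing_models` helper are hypothetical, not part of the plugin; accepting "Fail" as well as "Failure" is an assumption, since at least one reply uses the shorter form.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Hypothetical pattern for "<HDD Model> <Success/Failure> (<optional comment>)".
# The model is matched lazily so that it stops at the first Success/Failure word.
REPORT_RE = re.compile(
    r"^(?P<model>.+?)\s+(?P<result>Success|Failure|Fail)\b\s*"
    r"(?:\((?P<comment>.*)\))?\s*$",
    re.IGNORECASE,
)


@dataclass
class Report:
    model: str
    ok: bool
    comment: Optional[str]


def parse_report(line: str) -> Optional[Report]:
    """Return a Report for a well-formed one-liner, or None if it doesn't fit the format."""
    m = REPORT_RE.match(line.strip())
    if m is None:
        return None
    return Report(
        model=m.group("model"),
        ok=m.group("result").lower() == "success",
        comment=m.group("comment"),
    )


def failing_models(lines: Iterable[str]) -> Set[str]:
    """Models that received at least one failure report (candidates for an exclusion list)."""
    reports = (parse_report(line) for line in lines)
    return {r.model for r in reports if r is not None and not r.ok}
```

Lines that don't follow the format are simply skipped (`parse_report` returns `None`), so free-form replies don't pollute the tally.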
SimonF, October 6, 2020:
My drives are as follows. I haven't tested them with the plugin, but they work with manual commands:
HGST HUS724030ALS640 A1C4 Success
HITACHI HMRSK2000GBAS07K 3P02 Success
I will get a Seagate drive, do some testing, and report back.
SuperDan, October 7, 2020:
HITACHI HUC106060CSS600 A430 Failure
Read errors when the drives are spun down and then woken back up. They do spin back up, but on three occasions a random drive got a red X. The array is 20 drives of the above type.
doron (thread author), October 7, 2020, replying to SuperDan's HUC106060CSS600 report:
Thanks. So basically you're saying the result in your case is not consistent? Some (most?) of the time they do spin back up, but at times they get the read errors? Could you paste the syslog lines from the time of an error that red-X'ed a drive?
SuperDan, October 7, 2020:
doron said: "So basically you're saying the result in your case is not consistent? Some (most?) of the time they do spin back up but at times they get the read errors?"
That is correct. I had to re-enable the plugin to get the log entries. This time two drives got a red X. The only entries I saw related to those drives are:
Oct 7 09:32:46 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:46 unNAS kernel: md: disk17 read error, sector=0
Oct 7 09:32:51 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:51 unNAS kernel: md: disk17 write error, sector=0
Oct 7 09:41:37 unNAS kernel: blk_update_request: I/O error, dev sdu, sector 586549720 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:41:37 unNAS kernel: md: disk15 read error, sector=586549656
Oct 7 09:41:37 unNAS kernel: blk_update_request: I/O error, dev sdu, sector 586549720 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Oct 7 09:41:37 unNAS kernel: md: disk15 write error, sector=586549656
Oct 7 09:41:37 unNAS kernel: blk_update_request: I/O error, dev sdu, sector 64 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:41:37 unNAS kernel: md: disk15 write error, sector=0
SuperDan, October 7, 2020, following up on doron's request for syslog lines:
Actually, I found more log entries for one of the drives that went red X:
Oct 7 09:31:30 unNAS SAS Assist v0.6[36184]: spinning down slot 17, device /dev/sdz (/dev/sg26)
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 Sense Key : 0x2 [current] [descriptor]
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 ASC=0x4 ASCQ=0x11
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 CDB: opcode=0x28 28 00 00 00 00 40 00 00 08 00
Oct 7 09:32:46 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:46 unNAS kernel: md: disk17 read error, sector=0
Oct 7 09:32:46 unNAS kernel: sd 2:0:8:0: Power-on or device reset occurred
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 Sense Key : 0x2 [current] [descriptor]
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 ASC=0x4 ASCQ=0x11
Oct 7 09:32:51 unNAS kernel: sd 2:0:23:0: [sdz] tag#8 CDB: opcode=0x2a 2a 00 00 00 00 40 00 00 08 00
Oct 7 09:32:51 unNAS kernel: blk_update_request: I/O error, dev sdz, sector 64 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Oct 7 09:32:51 unNAS kernel: md: disk17 write error, sector=0
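The CDB lines in this log can be decoded by hand: in a 10-byte READ(10)/WRITE(10) CDB, byte 0 is the opcode (0x28 or 0x2A), bytes 2 to 5 hold the big-endian LBA and bytes 7 to 8 the transfer length in blocks. The Python sketch below (helper names are hypothetical) shows that both failing commands targeted LBA 64 for 8 blocks. That lines up with "sector 64" in the blk_update_request lines only if the drive uses 512-byte logical blocks, which is an assumption; the thread doesn't state the block size.

```python
import re

# Only the two opcodes seen in this log; anything else is shown as raw hex.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches the tail of a kernel line such as "... CDB: opcode=0x28 28 00 00 ... 00".
CDB_RE = re.compile(r"CDB: opcode=0x[0-9a-fA-F]{2}\s+((?:[0-9a-fA-F]{2}\s*)+)$")


def decode_cdb10(hex_bytes: str) -> dict:
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex."""
    b = bytes.fromhex(hex_bytes)
    if len(b) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "op": OPCODES.get(b[0], f"0x{b[0]:02x}"),
        "lba": int.from_bytes(b[2:6], "big"),     # bytes 2..5: logical block address
        "blocks": int.from_bytes(b[7:9], "big"),  # bytes 7..8: transfer length
    }


def cdb_from_log(line: str):
    """Extract and decode the CDB from a kernel log line, or return None."""
    m = CDB_RE.search(line.strip())
    return decode_cdb10(m.group(1)) if m else None
```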
doron (thread author), October 7, 2020:
Thanks very much for this.
SuperDan said:
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 Sense Key : 0x2 [current] [descriptor]
Oct 7 09:32:46 unNAS kernel: sd 2:0:23:0: [sdz] tag#24 ASC=0x4 ASCQ=0x11
Aha. That's the infamous/dreaded 02-04-11, the same as we've seen on the Seagate. Will have to exclude this series as well 😞
SuperDan said:
Oct 7 09:32:46 unNAS kernel: sd 2:0:8:0: Power-on or device reset occurred
A different drive? Was something else happening at the same time?
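The "02-04-11" shorthand is the sense key / ASC / ASCQ triple from the kernel's "Sense Key" and "ASC=... ASCQ=..." lines. In the SCSI standard tables, sense key 2h is NOT READY and ASC/ASCQ 04h/11h reads "LOGICAL UNIT NOT READY, NOTIFY (ENABLE SPINUP) REQUIRED", which matches the explicit NOTIFY described in the Seagate manual. Below is a minimal Python sketch that pulls the triple out of such log lines; the lookup tables hold only a handful of entries and the `decode_sense` name is hypothetical.

```python
import re
from typing import Dict, Iterable

# Subset of the SCSI sense key table.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Small subset of the ASC/ASCQ table (ASC 04h is the LOGICAL UNIT NOT READY family).
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x04, 0x11): "LOGICAL UNIT NOT READY, NOTIFY (ENABLE SPINUP) REQUIRED",
}

KEY_RE = re.compile(r"Sense Key : 0x([0-9a-fA-F]+)")
ASC_RE = re.compile(r"ASC=0x([0-9a-fA-F]+) ASCQ=0x([0-9a-fA-F]+)")


def decode_sense(lines: Iterable[str]) -> Dict[str, str]:
    """Find the sense key and ASC/ASCQ in kernel log lines and describe them."""
    key = asc = ascq = None
    for line in lines:
        m = KEY_RE.search(line)
        if m:
            key = int(m.group(1), 16)
        m = ASC_RE.search(line)
        if m:
            asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
    if key is None or asc is None:
        raise ValueError("no sense key / ASC lines found")
    return {
        "triple": f"{key:02x}-{asc:02x}-{ascq:02x}",
        "key": SENSE_KEYS.get(key, "OTHER"),
        "asc": ASC_ASCQ.get((asc, ascq), "not in this small table"),
    }
```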
SuperDan, October 7, 2020, replying to doron's question about the 2:0:8:0 reset:
Maybe a different drive (or drives), since my cache drives are SATA SSDs. I may as well add these drives too: they are HITACHI HUC106060CSS600 drives rebranded as NetApp drives, and they suffer from the same problem:
NETAPP X422_TAL13600A10
NETAPP X422_HCOBD600A10
SuperDan, October 7, 2020:
Something I just noticed: the other user having this problem is using an HP controller (HP H220 LSI 9205-8i 9207), and I am using a Dell H310 (LSI MegaRAID SAS 2008). Maybe proprietary firmware on the HBA has something to do with it? Just throwing it out there.
doron (thread author), October 7, 2020:
SuperDan said: "Maybe proprietary firmware on the HBA may have something to do with it?"
Hmm. Certainly a possibility, yes. Currently it's second on my list of potential causes, only because the OEM manual for the Seagate describes the drive's behavior in the Standby_Z state (which is what we're using to spin the drive down) and explicitly states that the drive will require an init op to spin up again. This is in contrast with the HGST/WD SAS drives, whose manuals explicitly specify that in the same state the drive is ready to spin back up on the next I/O. The manual for your Hitachi is silent about this (it doesn't say either way), but the 04-11 sense makes me think it behaves like the Seagate.
You'd think that standards would be standards, and the drive behavior following a standard SAS/SCSI operation would be standard... go figure. It could still be the controller; the above is just my current thinking.
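For reference, the SCSI side of a Standby_Z spindown can be sketched as a six-byte START STOP UNIT CDB (opcode 1Bh). This assumes the plugin enters Standby_Z via START STOP UNIT with POWER CONDITION = 3h (STANDBY) and a POWER CONDITION MODIFIER of 0h, which SBC-3 maps to standby_z; the thread doesn't show which command the plugin actually sends. The hypothetical Python helper below only builds the bytes and sends nothing. Note that NOTIFY (ENABLE SPINUP), the recovery step mentioned for the Seagate, is a SAS link-layer primitive rather than a SCSI command, so it has no CDB equivalent here.

```python
START_STOP_UNIT = 0x1B

# POWER CONDITION field values (byte 4, bits 7..4) used in this sketch.
PC_START_VALID = 0x0  # START bit is honoured
PC_STANDBY = 0x3      # with modifier 0h this selects standby_z


def start_stop_unit_cdb(power_condition: int, modifier: int = 0,
                        start: bool = False, immed: bool = False) -> bytes:
    """Build (but do not send) a 6-byte START STOP UNIT CDB."""
    if not (0 <= power_condition <= 0xF and 0 <= modifier <= 0xF):
        raise ValueError("power condition and modifier are 4-bit fields")
    return bytes([
        START_STOP_UNIT,
        0x01 if immed else 0x00,                              # byte 1: IMMED bit
        0x00,                                                 # byte 2: reserved
        modifier & 0x0F,                                      # byte 3: POWER CONDITION MODIFIER
        (power_condition << 4) | (0x01 if start else 0x00),   # byte 4: POWER CONDITION, START
        0x00,                                                 # byte 5: CONTROL
    ])


standby_z = start_stop_unit_cdb(PC_STANDBY)                 # enter standby_z
spin_up = start_stop_unit_cdb(PC_START_VALID, start=True)   # classic "start unit"
```

The `spin_up` CDB is the traditional initializing command a drive reporting 04/02 would expect; whether a given drive in Standby_Z needs it, or a NOTIFY from the HBA, is exactly the per-model (and possibly per-controller) question this thread is trying to answer.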
SuperDan, October 7, 2020, replying to doron:
Hmm, this has given me the motivation I needed to replace the Dell controller with one flashed with LSI firmware. I've been meaning to do it for a while now. I just ordered it, and once it's installed I'll report back on whether anything changed.
SimonF, October 9, 2020:
[1000:0064] 07:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
[10:0:0:0] disk ATA ST3320310CS SC14 /dev/sdl 320GB
[10:0:1:0] disk SEAGATE ST4000NM0023 XMGJ /dev/sdm 4.00TB
Testing with the Seagate drive works as expected so far, but I have only added it as a standalone second pool drive. It spins up when accessed. Over the weekend I will test it on the other controller in my system (below), and may add it as the parity drive.
[1000:0086] 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
[3:0:0:0] disk HGST HUS724030ALS640 A1C4 /dev/sdd 3.00TB
[3:0:1:0] disk HITACHI HMRSK2000GBAS07K 3P02 /dev/sde 2.00TB
[3:0:2:0] disk HGST HUS724030ALS640 A1C4 /dev/sdf 3.00TB
[3:0:3:0] disk HGST HUS724030ALS640 A1C4 /dev/sdh 3.00TB
[3:0:4:0] disk HGST HUS724030ALS640 A1C4 /dev/sdj 3.00TB
[3:0:5:0] disk HITACHI HMRSK2000GBAS07K 3P02 /dev/sdk 2.00TB
Both controllers are in IT mode on P20 firmware.
doron (thread author), October 9, 2020:
SimonF said: "[10:0:1:0] disk SEAGATE ST4000NM0023 XMGJ /dev/sdm 4.00TB ... testing with SEAGATE drive works as expected so far, but I have only added as a standalone second pool drive. Spins up when accessed."
That's very interesting. Could it be that it just takes a very long time to spin back up? I'm trying to find a plausible explanation for why the same drive works as expected in your case but failed in the previous poster's case. I guess the next test needs to be in the array. Or it might, after all, be the controller. Or the firmware version, although I really doubt that.
I was just about to push out a new version of the plugin with some changes and improvements, including a mechanism to filter out certain HDDs by name/vendor. I'll hold off until we have some more insight.
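One way to picture a "filter out certain HDDs by name/vendor" mechanism is a list of vendor/model glob patterns checked against each drive's INQUIRY strings, which Linux exposes for SCSI disks as `/sys/block/<dev>/device/vendor` and `.../model`. This is a hypothetical Python sketch, not the plugin's implementation; the two sample rules merely reuse models reported as failing earlier in the thread, and later reports point at the controller instead, so treat them as placeholders.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List, Tuple

# Hypothetical exclusion rules: (vendor glob, model glob), compared case-insensitively.
EXCLUDE: List[Tuple[str, str]] = [
    ("HITACHI", "HUC106060CSS600*"),
    ("NETAPP", "X422_*"),
]


def is_excluded(vendor: str, model: str, rules=EXCLUDE) -> bool:
    """True if the (vendor, model) pair matches any exclusion rule."""
    v, m = vendor.strip().upper(), model.strip().upper()  # INQUIRY strings are space-padded
    return any(fnmatchcase(v, rv.upper()) and fnmatchcase(m, rm.upper()) for rv, rm in rules)


def sysfs_identity(dev: str) -> Tuple[str, str]:
    """Read vendor/model for a block device name such as "sdz" from sysfs."""
    base = Path("/sys/block") / dev / "device"
    return (base / "vendor").read_text(), (base / "model").read_text()
```

On a live system, `is_excluded(*sysfs_identity("sdz"))` would decide whether to skip spinning that device down.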
Spazhead, October 10, 2020:
FYI: not working for my 7 Seagate ST4000NM0023 drives on a Dell H200 in IT mode. Hoping for a solution... great job, thanks.
doron (thread author), October 10, 2020, replying to Spazhead:
Thanks for reporting. Can you elaborate on "not working" in this case? Are you seeing I/O errors? Did you get a red X? Can you share the drive firmware version? We're seeing different reports for this drive, so we're trying to dig further in (firmware, controller, etc.).
SuperDan, October 10, 2020:
Just a follow-up. I replaced the Dell H310 (LSI MegaRAID SAS 2008) controller with a Dell H710 (LSI 9207-8i) flashed with LSI IT-mode firmware P20 (20.00.07.00). After 18 hours of running and some other tests, the drives spin up/down as they should, with no errors and no red-X drives. So, at least in my case, replacing the controller fixed it.
SimonF, October 10, 2020, replying to SuperDan:
Thanks for the update; please report back if any errors happen in the future. Are all the drives you are using Hitachi?
Testing with the Seagate as the parity drive performs as expected with manual spindown via command on the 9201-16e. Tomorrow I will test on this controller:
"Serial Attached SCSI controller" "Broadcom / LSI" "SAS2308 PCI-Express Fusion-MPT SAS-2" -r05 "Super Micro Computer Inc" "Onboard SAS2308 PCI-Express Fusion-MPT SAS-2"
doron (thread author), October 10, 2020, replying to SuperDan's controller-swap report:
Thanks, great stuff! Other reports are also beginning to confirm that this is probably controller-dependent. Do you happen to have the PCI IDs of these two controllers? For the running one, you can obtain them via "lspci -vnnmm". For the one you removed... well, only if it's connected to another machine.
SuperDan, October 10, 2020, replying to SimonF:
Will do. Yes, all of the SAS drives are Hitachi.
SuperDan, October 10, 2020, replying to doron:
Sorry, I did not record the ID for the one I removed. Unfortunately these are the Mini Mono cards that require a proprietary PCI storage slot on Dell servers, and I don't have another server I can plug the old one into.
Output of lspci -vnnmm:
Slot: 03:00.0
Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2308 PCI-Express Fusion-MPT SAS-2 [0087]
SVendor: Dell [1028]
SDevice: SAS2308 PCI-Express Fusion-MPT SAS-2 [1f38]
Rev: 05
NUMANode: 0
IOMMUGroup: 18
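The bracketed hex values in `lspci -vnnmm` output are the numeric IDs, so a record like the one above reduces to the compact "vendor:device / subvendor:subdevice" form used when comparing controllers in this thread. A minimal Python sketch, assuming one record at a time in exactly this "Field: name [hex]" layout; the `pci_ids` name is hypothetical.

```python
import re

# "Field: human-readable name [hex id]"; lazy name so "[Meteor] [0064]" yields 0064.
FIELD_RE = re.compile(r"^(\w+):\s*(.*?)\s*\[([0-9a-fA-F]{4})\]\s*$")


def pci_ids(record: str) -> str:
    """Return "vvvv:dddd / ssss:tttt" for one lspci -vnnmm record."""
    ids = {}
    for line in record.splitlines():
        m = FIELD_RE.match(line.strip())
        if m:
            ids[m.group(1)] = m.group(3).lower()
    return f"{ids['Vendor']}:{ids['Device']} / {ids['SVendor']}:{ids['SDevice']}"


record = """Slot: 03:00.0
Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2308 PCI-Express Fusion-MPT SAS-2 [0087]
SVendor: Dell [1028]
SDevice: SAS2308 PCI-Express Fusion-MPT SAS-2 [1f38]
Rev: 05"""

print(pci_ids(record))  # 1000:0087 / 1028:1f38
```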
Spazhead, October 10, 2020, replying to doron:
I guess it just doesn't sleep. Log error:
Oct 10 04:29:19 TOWER kernel: md: do_drive_cmd: disk13: ATA_OP e0 ioctl error: -5
Oct 10 04:29:19 TOWER emhttpd: error: mdcmd, 2723: Input/output error (5): write
The drives still work.
doron (thread author), October 10, 2020:
SuperDan said: "Unfortunately these are the Mini Mono cards that require a proprietary PCI storage slot on Dell servers and I do not have another server I can plug the old one into."
Yes, I know these Dell Minis all too well. No worries, you are providing really helpful data. I'm going to assume the old one is 1000:0073 / 1028:1f51. If anyone can corroborate or correct this (Dell H310 Mini Mono), please do.
SimonF, October 11, 2020:
The Seagate disk works as expected on both controllers:
Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2308 PCI-Express Fusion-MPT SAS-2 [0086]
SVendor: Super Micro Computer Inc [15d9]
SDevice: Onboard SAS2308 PCI-Express Fusion-MPT SAS-2 [0691]
Rev: 05
and
Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [0064]
SVendor: Broadcom / LSI [1000]
SDevice: 9201-16e 6Gb/s SAS/SATA PCIe x8 External HBA [30d0]
Rev: 02
jenga201, October 12, 2020:
Thanks for making this a plugin. I hope I can help with some diagnostics.
HUS724030ALS640 Fail, with *Temp and the drive not spun down. Still operational, seemingly no effect.
Getting read errors after a reboot:
Oct 11 20:38:24 Beast kernel: mdcmd (47): spindown 7
Oct 11 20:38:24 Beast kernel: md: do_drive_cmd: disk7: ATA_OP e0 ioctl error: -5
Oct 11 20:38:24 Beast emhttpd: error: mdcmd, 2723: Input/output error (5): write
Oct 11 20:38:24 Beast SAS Assist v0.6[27532]: spinning down slot 7, device /dev/sde (/dev/sg4)
Slot: 06:00.0
Class: Serial Attached SCSI controller [0107]
Vendor: Broadcom / LSI [1000]
Device: SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] [0064]
SVendor: Broadcom / LSI [1000]
SDevice: SAS 9201-16i [30c0]
Rev: 02
NUMANode: 0
doron (thread author), October 12, 2020:
Thanks for taking the time.
jenga201 said: "HUS724030ALS640 Fail with *Temp and drive not spun down. Still operational, seemingly no effect."
Thanks. How did you determine that the drive is not spun down?
jenga201 said: "Getting read errors after a reboot."
Oct 11 20:38:24 Beast kernel: md: do_drive_cmd: disk7: ATA_OP e0 ioctl error: -5
Oct 11 20:38:24 Beast emhttpd: error: mdcmd, 2723: Input/output error (5): write
These two messages (the ATA_OP e0 error and the mdcmd error (5)) are not really read errors. They are the drive/controller's response to the ATA ops sent by Unraid, and are not an indication of a problem. I'm assuming you were receiving them before installing the plugin (and will keep receiving them if you remove the plugin); please correct me if I'm wrong. The next version of the plugin (not pushed out yet) includes a feature that filters out these syslog messages.
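For context, E0h is the ATA STANDBY IMMEDIATE opcode and -5 is -EIO, which fits the reading that these lines are the drive/controller rejecting an ATA command rather than a failed read. A minimal Python sketch of such a syslog filter, assuming the messages keep exactly the shape shown in these logs; the patterns and the `filter_benign` name are hypothetical, not the plugin's code.

```python
import re
from typing import Iterable, List

# Hypothetical patterns for the two benign messages (ATA STANDBY IMMEDIATE sent to a SAS drive).
BENIGN = [
    re.compile(r"md: do_drive_cmd: disk\d+: ATA_OP e0 ioctl error: -5$"),
    re.compile(r"emhttpd: error: mdcmd, \d+: Input/output error \(5\): write$"),
]


def filter_benign(lines: Iterable[str]) -> List[str]:
    """Drop syslog lines matching any benign pattern; keep everything else."""
    return [line for line in lines if not any(p.search(line.rstrip()) for p in BENIGN)]
```

Anchoring each pattern at the end of the line keeps the filter narrow, so a genuinely different error that merely mentions `mdcmd` still reaches the log.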