Titus Posted October 30, 2020
23 hours ago, doron said: Thanks for reporting that. Have you at any time tried the manual command to spin a drive down? Such as: sg_start -r --pc=3 /dev/sdX. I wonder whether it gets stuck the same way, and then gets a "task abort" a bit later. I haven't seen similar reports up until now.
killed one of my drives
doron Posted October 30, 2020 (Author)
Titus said: killed one of my drives
Define "killed"?
Titus Posted October 30, 2020
The drive cannot be restarted and will not come back even after a reboot of the server.
Titus Posted October 30, 2020 Share Posted October 30, 2020 (edited) 2020-10-30 09:34Unraid Disk 10 errorAlert [123456] - Disk 10 in error state (disk dsbl) (sdm)alert will no come back online Edited October 30, 2020 by Titus Quote Link to comment
odirneto Posted October 31, 2020 (edited)
Not working for me either. Installed the plugin from the CA app. The green signal is on, there is an error in the log, and the drive does not spin down. Also, when I enable automatic spin-down, I get spammed in the log.
Model: HUS723030ALS640_YVJR6J5K
doron Posted October 31, 2020 (Author)
23 hours ago, Titus said: Alert [123456] - Disk 10 in error state (disk dsbl) (sdm) - will not come back online
When a drive is marked disabled in the array (gets the red x), it will not come back up automatically. It needs to be rebuilt; there are a few guides as to how to do that. However, the drive is not physically disabled - it is probably in good shape - it just needs to be reintroduced into the array and rebuilt.
doron Posted October 31, 2020 (Author)
13 hours ago, odirneto said: I get spammed in the log. Model: HUS723030ALS640_YVJR6J5K
Can you share a sample of the spam you received in the log?
odirneto Posted November 3, 2020
On 10/31/2020 at 2:02 PM, doron said: Can you share a sample of the spam you received in the log?
It is the same message from trying to spin down, just many times in a row.
doron Posted November 4, 2020 (Author) (edited)
On 11/3/2020 at 8:04 PM, odirneto said: It is the same message from trying to spin down, just many times in a row.
This is weird - unless this reflects your pressing the green buttons for disks 2 and 3 repeatedly in quick succession. Is this what happens? BTW, these messages are generated by Unraid trying ATA spindown commands against SAS drives. The next version of the plugin (I've been sitting on it for a while, hoping to collect more data on drive/controller combos that fail) filters these messages out of your log.
Golfonauta Posted November 15, 2020 (edited)
My experience with my specific array is very good. I am running 0.6 without errors on any drives, and when I click on the drives I notice the slow spin-up to read the directories, so it is really working well. I have been moving files and even deleting folders, with no errors on the drives. My main issue is that even with this feature enabled my shelf still seems to be power-hungry... but I will need to investigate more.
I'm using this PCI card:
[1000:0087] 01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
with an EMC shelf from a VNX 5200 and the original 15 Hitachi SAS drives that came with it:
[7:0:0:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdd 3.00TB
[7:0:1:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdf 3.00TB
[7:0:2:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdg 3.00TB
[7:0:3:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdh 3.00TB
[7:0:4:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdi 3.00TB
[7:0:5:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdk 3.00TB
[7:0:6:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdl 3.00TB
[7:0:7:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdm 3.00TB
[7:0:8:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdn 3.00TB
[7:0:9:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdo 3.00TB
[7:0:10:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdp 3.00TB
[7:0:11:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdq 3.00TB
[7:0:12:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdr 3.00TB
[7:0:13:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sds 3.00TB
[7:0:14:0] disk HITACHI HUS72303CLAR3000 C442 /dev/sdt 3.00TB
aside from 3 other SATA drives connected to the board's integrated controller. The processor is a Xeon E3-1200 in an Intel motherboard.
Let me know if I can help by posting any more details.
stigs Posted November 16, 2020
After installing the plugin I tried spinning down drives and nothing seemed to be different from before the plugin was installed, i.e. the normal errors and the drives not actually spinning down. I uninstalled and reinstalled (as well as rebooted), but the plugin still doesn't seem to be working at all. There are no SAS Assist entries in syslog on either a manual spin-down or Unraid's spin-down, only the normal barrage of errors.
One thing to note is that command-line spin-down of the drives is working using:
sg_start -r --pc=3 /dev/sdX
and starting back up again with:
sg_start -r --pc=0 /dev/sdX
So I'm sure the issue is somewhere on the OS side and not in the communication with the disks or the disks themselves. I cannot think of a reason the plugin wouldn't even show up in any of the logs like all the other logs I've seen in this thread. I have done a couple of migrations in the life of my install, including about a week ago after the server had been off for about 150 days (had to update everything recently), if that is of any relevance. I can confirm the plugin is installed.
Please let me know of any other information that might be helpful!
doron Posted November 16, 2020 (Author)
4 hours ago, stigs said: There are no SAS Assist entries in syslog on either a manual spin-down or Unraid's spin-down, only the normal barrage of errors.
Thanks for reporting this. The next 0.7 version should(...) address your case. I'll hopefully push it out within the next couple of days. When you update to it, please report again.
doron Posted November 18, 2020 (Author)
Just pushed out version 0.7 of the plugin. There are many changes; a few notable ones are listed below. The main method of spinning a SAS drive down remains the same - meaning that if you had issues (or worse, red x's) following spindown attempts in previous versions, there's a good chance this version will not improve that particular situation, so please test with care.
- Adapt the syslog hook to various Unraid configs (between Unraid and Dynamix there are several different forms of syslog configs, which vary depending on whether you configure syslog settings, so there's now a mechanism that reconfigures the hook for the different situations and responds dynamically to changes in settings). @stigs, I'm guessing this might address your issue as well.
- Filter out syslog lines (aka "spam"...) from some SAS devices rejecting the ATA standby op (e0)
- Introduce an exclusion list, which should gradually come to contain drive/controller combinations that are known to not respond favorably to the spindown command
- More consistent log messages and tags
- Add new debug and testing tools
- Many other changes, major code reorg
Enjoy! Please report issues (or success).
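[As an aside, filtering such messages out of syslog generally boils down to an rsyslog drop rule along the following lines. This is only a generic sketch of the idea, not the plugin's actual mechanism, and the match string is a placeholder.]
# Hypothetical drop rule, e.g. placed under /etc/rsyslog.d/ - illustration only.
# Replace the quoted text with the exact message that floods your log.
:msg, contains, "REPLACE WITH THE OFFENDING LOG TEXT" stop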
clincher Posted November 18, 2020
37 minutes ago, doron said: Just pushed out version 0.7 of the plugin.
Woohoo! Finally getting syslog hook messages, and it's working.
Nov 18 09:22:25 teraserver kernel: mdcmd (34010): spindown 7
Nov 18 09:22:28 teraserver SAS Assist v0.7: spinning down slot 7, device /dev/sdh (/dev/sg7)
The green dot doesn't turn gray, but checking the disk status from the console, it's showing spun down.
root@teraserver:~# sdparm --command=sense /dev/sdh
/dev/sdh: HGST HUH721010AL4200 A21D
Additional sense: Standby condition activated by command
stigs Posted November 18, 2020 Share Posted November 18, 2020 (edited) 1 hour ago, doron said: Just pushed out version 0.7 of the plugin. 1 hour ago, doron said: @stigs, I'm guessing this might address your issue as well. Good work!! I can report all is well after updating. No more log spam: CHECK SAS ASSIST log entries showing: CHECK Drives spin down/up properly: CHECK Thanks a lot, this plugin fixes all my issues and will help save me hundreds of dollars this year undoubtedly. I'll keep you updated if any more issues arise. H7230AS60SUN3.0T Success ST33000SSSUN3.0T Success Edited November 18, 2020 by stigs Quote Link to comment
jowe Posted November 18, 2020
Really good work, @doron! I just tried the commands against my SAS drives after upgrading the firmware of the SAS controller. But "sg_start --readonly --pc=3 /dev/sgX" disables the drive, and I need a reboot to get it back. However, the plugin loops the "drive not supported" messages; see below.
Nov 18 20:13:45 Tower kernel: mdcmd (931): spindown 3
Nov 18 20:13:45 Tower kernel: mdcmd (932): spindown 4
Nov 18 20:13:45 Tower SAS Assist v0.7: disk 3 (/dev/sdh) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:46 Tower SAS Assist v0.7: disk 4 (/dev/sdi) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:46 Tower kernel: mdcmd (933): spindown 3
Nov 18 20:13:46 Tower kernel: mdcmd (934): spindown 4
Nov 18 20:13:47 Tower SAS Assist v0.7: disk 3 (/dev/sdh) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:47 Tower SAS Assist v0.7: disk 4 (/dev/sdi) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:47 Tower kernel: mdcmd (935): spindown 3
Nov 18 20:13:48 Tower kernel: mdcmd (936): spindown 4
Nov 18 20:13:48 Tower SAS Assist v0.7: disk 3 (/dev/sdh) not supported by SAS spindown plugin (excluded), not spun down
Nov 18 20:13:48 Tower SAS Assist v0.7: disk 4 (/dev/sdi) not supported by SAS spindown plugin (excluded), not spun down
doron Posted November 18, 2020 (Author) (edited)
6 minutes ago, jowe said: However, the plugin loops the "drive not supported" messages; see below.
Okay, that's actually a feature. Since your drives clearly can't be spun down properly, the plugin avoids them (they are on the exclude list). If it can't help, at least it avoids collateral damage...
Now, when you say "loops" - is there actually an endless loop of "spindown 3" - "spindown 4" - "spindown 3" - "spindown 4", or is this a result of your hitting the green button a few times in a row?
Actually, if all your SAS drives are on the exclude list, there's unfortunately little point for you to run this plugin at this time 😞
jowe Posted November 18, 2020
10 minutes ago, doron said: Okay, that's actually a feature. Since your drives clearly can't be spun down properly, the plugin avoids them (they are on the exclude list). If it can't help, at least it avoids collateral damage...
Yes, I understand that the message should be displayed when the timer reaches, for example, 15 min, or when hitting the spin-down button.
12 minutes ago, doron said: Now, when you say "loops" - is there actually an endless loop of "spindown 3" - "spindown 4" - "spindown 3" - "spindown 4", or is this a result of your hitting the green button a few times in a row?
I did not push the button at all, just waited 15 min, and the message loops every second. I could have provided a really long list of the same messages.
14 minutes ago, doron said: Actually, if all your SAS drives are on the exclude list, there's unfortunately little point for you to run this plugin at this time 😞
Yes, I know; I just wanted to give it a go with the latest FW, but unfortunately it didn't work. Nevertheless, it's a great project!
doron Posted November 18, 2020 (Author) (edited)
22 minutes ago, jowe said: I did not push the button at all, just waited 15 min, and the message loops every second. I could have provided a really long list of the same messages.
So this happens with the plugin installed; and when you remove the plugin, those "spindown 3"/"spindown 4" messages do not appear (or at least not in quick succession)?! If so, this is puzzling and I would like to try to get to the bottom of it.
EDIT: I seem to be unable to reproduce it, and also can't see right now how the plugin would cause repeated "mdcmd spindown" to happen.
jowe Posted November 19, 2020
13 hours ago, doron said: So this happens with the plugin installed; and when you remove the plugin, those "spindown 3"/"spindown 4" messages do not appear (or at least not in quick succession)?! If so, this is puzzling and I would like to try to get to the bottom of it. EDIT: I seem to be unable to reproduce it, and also can't see right now how the plugin would cause repeated "mdcmd spindown" to happen.
If I remove the plugin I get the errors instead, so it's more of an Unraid thing - it tries to spin down the drive again and again... Not a problem for me, as I have set the disks to never spin down now.
doron Posted November 19, 2020 (Author)
1 minute ago, jowe said: If I remove the plugin I get the errors instead, so it's more of an Unraid thing - it tries to spin down the drive again and again... Not a problem for me, as I have set the disks to never spin down now.
Thanks for confirming. (Still a weird thing that I have never seen elsewhere, but...)
limetech Posted November 21, 2020
Hi @doron - thank you for putting this plugin together. I wanted to give you an outline of some changes in the next 6.9-beta release that will make implementing SAS spin-down a little easier, or at least make experimentation easier.
What I did was rip out all the spin up/down logic in the md/unraid driver. Instead, all spin up/down handling will be done by the user-space emhttpd process. When emhttpd needs to perform a spin up, spin down, or read SMART attributes, it will invoke a script named after the operation to perform and the device transport. These scripts are located in /usr/local/sbin and named as follows:
emhttp_device_<transport>_<operation>
These scripts are provided:
emhttp_device_ata_spinup
emhttp_device_ata_spindown
emhttp_device_ata_smart
The scripts are passed the device name, such as "sdb", and in the case of the "smart" operation, an optional parameter (unused at the moment). For example, suppose emhttpd decides to spin down disk1, where disk1 corresponds to device /dev/sdb. In this case emhttpd will invoke:
/usr/local/sbin/emhttp_device_ata_spindown sdb
This command is actually executed in the background. At present there is no code which verifies that the operation succeeded. Here is the content of the above script:
#!/bin/bash
# $1 device name, eg, "sdb"
/usr/sbin/hdparm -y /dev/$1
Pretty simple. To use with SAS, your plugin would install these three scripts:
emhttp_device_scsi_spinup
emhttp_device_scsi_spindown
emhttp_device_scsi_smart
I expect to be publishing 6.9.0-beta36 "soon". Since the 'mdcmd' error messages are no longer present, your current plugin won't work.
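[For context, a SAS counterpart could plausibly be built around the sg_start commands already shown in this thread. The following is only a sketch under that assumption, not the script the plugin will actually ship.]
#!/bin/bash
# Hypothetical /usr/local/sbin/emhttp_device_scsi_spindown - a sketch, not the plugin's actual script.
# $1 device name, eg, "sdb"
# Send a SCSI START STOP UNIT with power condition 3 (standby), as in the manual commands above.
sg_start --readonly --pc=3 /dev/$1
[A matching emhttp_device_scsi_spinup would presumably issue --pc=0 instead, mirroring the manual spin-up command quoted earlier in the thread.]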
doron Posted November 21, 2020 (Author)
Hi @limetech - thanks for the heads up! (Hehe, I was just reading that other thread when your message came in.) This sounds cool and exactly the right approach (the plugin will shrink to a few lines, but hey - it was supposed to be a temporary stopgap anyway). Getting rid of the syslog dependency would be a blessing (btw, I bumped into a few issues with Unraid's handling of the rsyslog config but will deal with it in a separate thread - the plugin has an elaborate workaround).
One question: where exactly is the value of <transport> derived from for this exercise?
Thanks again for doing this.
limetech Posted November 21, 2020
9 minutes ago, doron said: One question: where exactly is the value of <transport> derived from for this exercise?
It looks at the entries in /dev/disk/by-id which correspond to the device name. Each entry has a prefix, e.g., "ata-" or "nvme-", etc. For SAS it should be "scsi-". That prefix, with the trailing '-' removed, is <transport>.
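[To illustrate that lookup, here is a rough shell sketch of the idea - not Unraid's actual code - assuming the device name (e.g. "sdb") is passed as the first argument.]
#!/bin/bash
# Rough illustration of deriving <transport> from /dev/disk/by-id - not Unraid's actual code.
# $1 device name, eg, "sdb"
for link in /dev/disk/by-id/*; do
  case "$(basename "$link")" in
    *-part*|wwn-*) continue ;;   # skip partition entries and wwn- style ids
  esac
  if [ "$(readlink -f "$link")" = "/dev/$1" ]; then
    id=$(basename "$link")
    echo "${id%%-*}"             # prints "ata", "scsi", "nvme", ...
    break
  fi
done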
doron Posted November 21, 2020 (Author)
7 minutes ago, limetech said: It looks at the entries in /dev/disk/by-id which correspond to the device name. Each entry has a prefix, e.g., "ata-" or "nvme-", etc. For SAS it should be "scsi-". That prefix, with the trailing '-' removed, is <transport>.
Thanks. I can work with that. Note, btw, that in this schema all SAS drives will be "scsi", but not all "scsi" drives will be SAS. The only dependable way I found to pinpoint a SAS drive is via smartctl -i, parsing out "Transport protocol" - a field which is returned only for SAS drives (an example is pasted below). I found nothing similar in either /sys or /dev. I'll use that as a filter in the script, ergo "can work with that".
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: HUH721212AL4200
Revision: A3D0
Compliance: SPC-4
User Capacity: 12,000,138,625,024 bytes [12.0 TB]
Logical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca2708b9bf8
Serial number: xxxxxxxx
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sun Nov 22 00:44:17 2020 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
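[A minimal shell sketch of such a filter, assuming only that smartctl -i prints a "Transport protocol: SAS ..." line for SAS devices as in the output above; the is_sas helper name is made up.]
#!/bin/bash
# Minimal sketch, not the plugin's actual code; the is_sas name is made up.
# Succeeds (exit 0) when smartctl -i reports a SAS transport for the given device.
is_sas() {
  smartctl -i "/dev/$1" 2>/dev/null | grep -q '^Transport protocol:.*SAS'
}
is_sas sdb && echo "sdb is a SAS drive"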