doron

Members
  • Content Count

    366
  • Days Won

    3

doron last won the day on September 20

doron had the most liked content!

Community Reputation

50 Good

1 Follower

About doron

Converted

  • Gender
    Undisclosed

  1. Could be some other plugins did. It was done out of necessity...
  2. Perfect. Thx. Indeed. BTW my plugin installs a wrapper for smartctl which reinstates the "-n standby" thingie (until smartmontools.org fix it).
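A minimal sketch of what such a wrapper could do, written in Python purely for illustration. The plugin's actual wrapper is not shown in this post, so the logic below is an assumption: it prepends `-n standby` unless the caller already passed a `-n`/`--nocheck` option. The path `/usr/sbin/smartctl.real` and the function name are hypothetical.

```python
from typing import List

# Hypothetical location of the real binary after the wrapper takes its place.
REAL_SMARTCTL = "/usr/sbin/smartctl.real"


def build_smartctl_argv(args: List[str], real_binary: str = REAL_SMARTCTL) -> List[str]:
    """Return the argv to execute, adding '-n standby' when no nocheck option is present."""
    has_nocheck = any(
        # Matches '-n', the glued form '-nstandby', and '--nocheck=...'.
        (a.startswith("-n") and not a.startswith("--")) or a.startswith("--nocheck")
        for a in args
    )
    if has_nocheck:
        return [real_binary, *args]
    return [real_binary, "-n", "standby", *args]
```

A real wrapper would then hand the resulting list to something like `os.execv(argv[0], argv)` so that exit codes and output pass through unchanged.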
  3. Well, this drive is a SATA SSD:

       /dev/disk/by-id/scsi-1ATA_TS120GSSD220S_<redacted>

     It is under ESXi though (RDM), so you may think of it as an exception.

     It does. Thanks.

     Sure, here you go. This is with -A:

       smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
       Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

       === START OF READ SMART DATA SECTION ===
       Current Drive Temperature:     26 C
       Drive Trip Temperature:        85 C

       Manufactured in week 36 of year 2018
       Specified cycle count over device lifetime:  50000
       Accumulated start-stop cycles:  739
       Specified load-unload count over device lifetime:  600000
       Accumulated load-unload cycles:  16374
       Elements in grown defect list: 0

     And this is with -a (little a):

       smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
       Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

       === START OF INFORMATION SECTION ===
       Vendor:               HGST
       Product:              HUH721212AL4200
       Revision:             A3D0
       Compliance:           SPC-4
       User Capacity:        12,000,138,625,024 bytes [12.0 TB]
       Logical block size:   4096 bytes
       LU is fully provisioned
       Rotation Rate:        7200 rpm
       Form Factor:          3.5 inches
       Logical Unit id:      0x5000cca2708b9bf8
       Serial number:        <redacted>
       Device type:          disk
       Transport protocol:   SAS (SPL-3)
       Local Time is:        Sun Nov 22 02:14:43 2020 IST
       SMART support is:     Available - device has SMART capability.
       SMART support is:     Enabled
       Temperature Warning:  Enabled

       === START OF READ SMART DATA SECTION ===
       SMART Health Status: OK

       Grown defects during certification <not available>
       Total blocks reassigned during format <not available>
       Total new blocks reassigned <not available>
       Power on minutes since format <not available>
       Current Drive Temperature:     28 C
       Drive Trip Temperature:        85 C

       Manufactured in week 36 of year 2018
       Specified cycle count over device lifetime:  50000
       Accumulated start-stop cycles:  740
       Specified load-unload count over device lifetime:  600000
       Accumulated load-unload cycles:  16375
       Elements in grown defect list: 0

       Vendor (Seagate Cache) information
         Blocks sent to initiator = 768214057353216

       Error counter log:
                  Errors Corrected by           Total   Correction     Gigabytes    Total
                      ECC          rereads/    errors   algorithm      processed    uncorrected
                  fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
       read:          0        0         0         0      60350      60070.756           0
       write:         0        0         0         0          7      24002.699           0
       verify:        0        0         0         0        811          0.000           0

       Non-medium error count:        0

       SMART Self-test log
       Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
            Description                              number   (hours)
       # 1  Background long   Completed                   -     148                 - [-   -    -]
       # 2  Background short  Completed                   -     128                 - [-   -    -]
       # 3  Background short  Completed                   -      22                 - [-   -    -]

       Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

     Note that both spin the drive up.
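In the two outputs above, the start-stop counter goes from 739 to 740 and the load-unload counter from 16374 to 16375, which is consistent with the drive having been spun up between the runs. As a rough sketch (function names are made up for illustration, and a counter increase could in principle have other causes), the comparison could be scripted like this:

```python
import re
from typing import Dict

# Matches the two cycle counters as they appear in the SCSI/SAS smartctl output above.
COUNTER_RE = re.compile(
    r"^\s*(Accumulated (?:start-stop|load-unload) cycles):\s*(\d+)\s*$",
    re.MULTILINE,
)


def parse_cycle_counters(smartctl_output: str) -> Dict[str, int]:
    """Extract the accumulated cycle counters from smartctl text output."""
    return {name: int(value) for name, value in COUNTER_RE.findall(smartctl_output)}


def counters_increased(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """True if any counter present in both snapshots went up."""
    return any(after[k] > v for k, v in before.items() if k in after)
```

Feeding it the `-A` output as `before` and the `-a` output as `after` flags the increase seen in this post.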
  4. Thanks. I can work with that.

     Note btw that in this schema, all SAS drives will be "scsi" but not all "scsi" will be SAS. The only dependable way I found to pinpoint a SAS drive is via smartctl -i, parsing out "Transport protocol" - a field which is returned only for SAS drives (an example is pasted below). I found nothing similar in either /sys or /dev. I'll use that as a filter in the script, ergo "can work with that".

       smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
       Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

       === START OF INFORMATION SECTION ===
       Vendor:               HGST
       Product:              HUH721212AL4200
       Revision:             A3D0
       Compliance:           SPC-4
       User Capacity:        12,000,138,625,024 bytes [12.0 TB]
       Logical block size:   4096 bytes
       LU is fully provisioned
       Rotation Rate:        7200 rpm
       Form Factor:          3.5 inches
       Logical Unit id:      0x5000cca2708b9bf8
       Serial number:        xxxxxxxx
       Device type:          disk
       Transport protocol:   SAS (SPL-3)
       Local Time is:        Sun Nov 22 00:44:17 2020 IST
       SMART support is:     Available - device has SMART capability.
       SMART support is:     Enabled
       Temperature Warning:  Enabled
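The filter described in this post can be sketched roughly as follows. This is an illustration in Python, not the plugin's script; the function names are made up, and the check assumes the field looks like the "Transport protocol" line in the output above.

```python
import re
import subprocess
from typing import Optional

# The field as it appears in `smartctl -i` output for the SAS drive above.
TRANSPORT_RE = re.compile(r"^\s*Transport protocol:\s*(.+?)\s*$", re.MULTILINE)


def transport_protocol(info_output: str) -> Optional[str]:
    """Return the 'Transport protocol' value, or None if the field is absent."""
    match = TRANSPORT_RE.search(info_output)
    return match.group(1) if match else None


def is_sas(info_output: str) -> bool:
    """Treat a drive as SAS only when the field exists and its value starts with 'SAS'."""
    proto = transport_protocol(info_output)
    return proto is not None and proto.startswith("SAS")


def smartctl_info(device: str) -> str:
    """Run `smartctl -i` on a device and return its text output (needs smartmontools)."""
    result = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Usage would be along the lines of `is_sas(smartctl_info("/dev/sdX"))`; drives without the field simply come back as not SAS.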
  5. Hi @limetech - thanks for the heads up! (hehe I was just reading that other thread when your message came in). This sounds cool and exactly the right approach (the plugin will shrink to a few lines but hey - it was supposed to be a temporary stopgap anyway). Getting rid of the syslog dependency would be a blessing (btw I bumped into a few issues with Unraid's handling of rsyslog config but will deal with it in a separate thread - the plugin has an elaborate workaround). One question: where exactly is the value of <transport> derived from for this exercise? Thanks again for doing this.
  6. Thanks for confirming. (still a weird thing that I have never seen elsewhere but...)
  7. So this happens with the plugin installed; and when you remove the plugin - those "spindown 3"/"spindown 4" messages do not appear (or at least not in quick succession)?! If so, this is puzzling and I would like to try to get to the bottom of it. EDIT: I seem to be unable to reproduce it, and also can't see right now how the plugin would cause repeated "mdcmd spindown" calls to happen.
  8. Okay that's actually a feature. Since your drives clearly can't be spun down properly, the plugin avoids them (they are on the exclude list). If it can't help, at least it avoids collateral damage... Now, when you say "loops" - is there actually an endless loop of "spindown 3" - "spindown 4" - "spindown 3" - "spindown 4" or is this a result of your hitting the green button a few times in a row? Actually, if all your SAS drives are on the exclude list, there's unfortunately little point for you to run this plugin at this time 😞
  9. Just pushed out version 0.7 of the plugin. There are many changes; a few notable ones are listed below. The main method of spinning a SAS drive down remains the same - meaning that if you had issues (or worse, red x's) following spindown attempts in previous versions, there's a good chance this version will not improve this particular situation, so please test with care.

     - Adapt the syslog hook to various Unraid configs (between Unraid and Dynamix there are several different forms of syslog configs, and they vary further depending on how you configure syslog settings, so there's now a mechanism that will reconfigure the hook per different situations and will dynamically respond to changes in settings). @stigs, I'm guessing this might address your issue as well.
     - Filter out syslog lines (aka "spam"...) from some SAS devices rejecting ATA standby op (e0)
     - Introduce an exclusion list, which should gradually contain drive/controller combinations that are known to not respond favorably to the spindown command
     - More consistent log messages and tags
     - Add new debug and testing tools
     - Many other changes, major code reorg

     Enjoy! Please report issues (or success).
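To illustrate the exclusion-list idea from the change list above, here is a small hypothetical sketch in Python. The plugin's real list format and matching rules are not described in this post, so the data structure, the names, and the placeholder entry are all assumptions.

```python
from typing import NamedTuple, Set


class Combo(NamedTuple):
    """A drive model paired with the controller it is attached to."""
    drive_model: str
    controller: str


# Placeholder entry only; real entries would come from user reports.
EXCLUDED: Set[Combo] = {
    Combo("EXAMPLE-DRIVE-MODEL", "EXAMPLE-CONTROLLER"),
}


def is_excluded(drive_model: str, controller: str, excluded: Set[Combo] = EXCLUDED) -> bool:
    """True if this drive/controller pair should be skipped for spindown."""
    return Combo(drive_model.strip(), controller.strip()) in excluded
```

Keying on the pair rather than on the drive alone reflects the post's wording ("drive/controller combinations"): the same drive model might behave well behind a different controller.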
  10. Thanks for reporting this. The next 0.7 version should(...) address your case. I'll hopefully push it within the next couple of days. When you update to it, please report again.
  11. This is weird - unless this reflects your pushing the green buttons for disk 2 and 3 repeatedly several times in quick succession. Is this what happens? BTW these messages are generated due to Unraid trying ATA spindown commands against SAS drives. The next version of the plugin (I've been sitting on it for a while, hoping to collect more data on drive/controller combos with which there are failures) filters these messages out from your log.
  12. Can you share a sample of the spam you received in the log?
  13. When a drive is marked disabled in the array (gets the red x), it will not come back up automatically. It needs to be rebuilt; there are a few guides as to how to do that. However the drive is not physically disabled - it is probably in good shape - just needs to be reintroduced into the array and rebuilt.
  14. Thanks for reporting that. Have you at any time tried the manual command to spin a drive down? Such as

        sg_start -r --pc=3 /dev/sdX

      I wonder whether it gets stuck the same way, and then gets a "task abort" a bit later. I haven't seen similar reports up until now.
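For anyone wanting to script the manual test above, here is a rough Python sketch that runs the same `sg_start` command with a time limit, so that a stuck command is reported rather than silently waited on. The function names and the 30-second default are assumptions made for illustration; note also that a process blocked inside the kernel may not terminate promptly even after the timeout fires.

```python
import subprocess
from typing import List


def spindown_command(device: str) -> List[str]:
    """Build the manual spindown command quoted in the post."""
    return ["sg_start", "-r", "--pc=3", device]


def try_spindown(device: str, timeout_s: float = 30.0) -> str:
    """Run the spindown command and summarize the outcome as a short string."""
    try:
        result = subprocess.run(
            spindown_command(device),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except FileNotFoundError:
        return "sg_start not found"
    if result.returncode == 0:
        return "ok"
    return f"failed (exit {result.returncode})"
```

Checking the syslog right after a "timeout" result would show whether a "task abort" follows, as asked in the post.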