This is the WebGUI of my backup NAS:
As you can see, it claims all disks are sleeping. But that's not true:
smartctl -n standby -i /dev/sdb
smartctl -n standby -i /dev/sdc
smartctl -n standby -i /dev/sdd
smartctl -n standby -i /dev/sdf
smartctl -n standby -i /dev/sdg
smartctl -n standby -i /dev/sdh
They all return:
Power mode is: ACTIVE or IDLE
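To avoid reading six outputs by eye, the checks above can be wrapped in a small helper. This is only a sketch: the function names `power_state` and `report_power_states` are made up for illustration, and the matching relies on the two message strings quoted in this post ("Power mode is: ACTIVE or IDLE" and "Device is in STANDBY mode"), which may differ in other smartctl versions.

```shell
#!/bin/sh
# Classify the text that `smartctl -n standby -i DEV` prints.
# Reads smartctl output on stdin and prints one word:
# standby, active, or unknown.
power_state() {
    awk '
        /Device is in STANDBY mode/      { state = "standby" }
        /Power mode is:.*ACTIVE or IDLE/ { state = "active" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Print "DEVICE STATE" for every device passed as an argument.
# Needs root and smartmontools; the device list is up to the caller.
report_power_states() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" \
            "$(smartctl -n standby -i "$dev" 2>&1 | power_state)"
    done
}

# Example (device names taken from this post, adjust to your system):
# report_power_states /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh
```

Parsing the text rather than the exit status is a deliberate choice here, since smartctl's exit status is a bit mask and the same value can also indicate other conditions.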
"/dev/sde" (the SSD) is the only sleeping disk:
smartctl -n standby -i /dev/sde
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
Device is in STANDBY mode, exit(2)
Just for testing, I executed this command:
mdcmd spindown 1
Now disk 1 sleeps:
smartctl -n standby -i /dev/sdh
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
Device is in STANDBY mode, exit(2)
Spin down delay is set to 30 minutes:
Fun fact:
It worked for 2 weeks without problems until I:
- opened this page: http://tower/Tools/SysDevs
- then this page: http://tower/Tools/HardwareProfile
- clicked on "Show Details", which spun up the disks
- this took a while, so I opened http://tower/Main
- again http://tower/Tools/HardwareProfile
- again clicked on "Show Details" and now I was able to see the hardware information
- as it did not contain the brand of my RAM, I executed "dmidecode -t memory", which did not display the brand name either, but at least showed the RAM part number
- I closed the WebGUI and that's it
It seems that clicking on "Show Details" killed the cron job that checks the status of the disks?!
Update, 2020-01-25
Now I'm having the same problem on a completely different Unraid server, too.
Manually running the smartctl command that Unraid uses to check the standby status returns the correct value:
"sdg" is in STANDBY mode:
root@Thoth:~# smartctl --nocheck standby -A /dev/sdg
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
Device is in STANDBY mode, exit(2)
"sdl" is spinning:
root@Thoth:~# smartctl --nocheck standby -A /dev/sdl
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   159   159   024    Pre-fail  Always       -       412 (Average 414)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       3195
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       6711
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       86
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   095   095   000    Old_age   Always       -       6414
193 Load_Cycle_Count        0x0012   095   095   000    Old_age   Always       -       6414
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 24/55)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
But the dashboard doesn't show it, AND sdl no longer spins down although it has been inactive for more than 30 minutes (my spin down setting):
So I started monitoring smartctl executions with this loop:
while true; do pid=$(pgrep 'smartctl' | head -1); if [[ -n "$pid" ]]; then ps -p "$pid" -o args && strace -v -t -p "$pid"; fi; done
I tested it by executing smartctl in a different terminal, and the loop catches it immediately.
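If only the question "which device was checked when" matters, a lighter variant that skips strace and just logs each smartctl command line might be enough. The following is a sketch under assumptions: it relies on Linux's /proc, the function names and the log path in the example are hypothetical, `sleep 0.2` needs a sleep that accepts fractions, and polling can still miss very short-lived processes.

```shell
#!/bin/sh
# Print "HH:MM:SS pid=PID ARGS" for one process.
# Fails if the process is already gone.
log_invocation() {
    [ -r "/proc/$1/cmdline" ] || return 1
    # /proc/PID/cmdline separates the arguments with NUL bytes.
    args=$(tr '\0' ' ' < "/proc/$1/cmdline")
    [ -n "$args" ] || return 1
    printf '%s pid=%s %s\n' "$(date +%T)" "$1" "$args"
}

# Poll for smartctl processes and log every PID once.
# The list of seen PIDs grows over time, which is fine for a short test run.
watch_smartctl() {
    seen=""
    while true; do
        for pid in $(pgrep -x smartctl); do
            case " $seen " in
                *" $pid "*) ;;  # already logged
                *) log_invocation "$pid" && seen="$seen $pid" ;;
            esac
        done
        sleep 0.2
    done
}

# Example (hypothetical log path):
# watch_smartctl >> /tmp/smartctl-calls.log
```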
Then I waited a long time and several errors were returned:
Broken Pipe errors (happen often)
strace: Process 8668 attached
13:53:01 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[16]=[85, 06, 2c, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, e5, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=60000, flags=0, status=02, masked_status=01, sb[22]=[72, 01, 00, 1d, 00, 00, 00, 0e, 09, 0c, 00, 00, 00, ff, 00, 00, 00, 00, 00, 00, 00, 50], host_status=0, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0
13:53:01 ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 0e, 00, 00, 00, 01, 00, 00, 00, 00, 00, 00, 00, ec, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=60000, flags=0, data[512]=[40, 00, ff, 3f, 37, c8, 10, 00, 00, 00, 00, 00, 3f, 00, 00, 00, 00, 00, 00, 00, 39, 31, 35, 30, 42, 41, 30, 38, 31, 32, 39, 31, ...], status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=1, info=0}) = 0
13:53:01 brk(0x584000) = 0x584000
13:53:01 brk(0x5a7000) = 0x5a7000
13:53:01 brk(0x5c8000) = 0x5c8000
13:53:01 ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 0e, 00, d0, 00, 01, 00, 00, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=60000, flags=0, data[512]=[04, 00, 05, 32, 00, 64, 64, 00, 00, 00, 00, 00, 00, 00, 09, 32, 00, 64, 64, 1a, 22, 00, 00, 00, 00, 00, 0c, 32, 00, 64, 64, 7f, ...], status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=8, info=0}) = 0
13:53:01 ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 0e, 00, d1, 00, 01, 00, 01, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=60000, flags=0, data[512]=[04, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, ...], status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=0, info=0}) = 0
13:53:01 write(1, "=== START OF READ SMART DATA SEC"..., 41) = 41
13:53:01 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[16]=[85, 06, 2c, 00, da, 00, 00, 00, 00, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=60000, flags=0, status=02, masked_status=01, sb[22]=[72, 01, 00, 1d, 00, 00, 00, 0e, 09, 0c, 00, 00, 00, 00, 00, 00, 00, 4f, 00, c2, 00, 50], host_status=0, driver_status=0x8, resid=0, duration=8, info=0x1}) = 0
13:53:01 write(1, "SMART overall-health self-assess"..., 57) = 57
13:53:01 write(1, "\n", 1) = -1 EPIPE (Broken pipe)
13:53:01 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=8668, si_uid=0} ---
13:53:01 close(3) = 0
13:53:01 exit_group(0) = ?
13:53:01 +++ exited with 0 +++
strace: Process 20585 attached
14:09:01 brk(0x5c7000) = 0x5c7000
14:09:01 ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 0e, 00, d0, 00, 01, 00, 00, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=60000, flags=0, data[512]=[10, 00, 01, 0b, 00, 64, 64, 00, 00, 00, 00, 00, 00, 00, 02, 05, 00, 84, 84, 60, 00, 00, 00, 00, 00, 00, 03, 07, 00, a0, a0, 9b, ...], status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=5, info=0}) = 0
14:09:01 ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 0e, 00, d1, 00, 01, 00, 01, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=60000, flags=0, data[512]=[10, 00, 01, 10, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 02, 36, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 03, 18, 00, 00, 00, 00, ...], status=00, masked_status=00, sb[0]=[], host_status=0, driver_status=0, resid=0, duration=3, info=0}) = 0
14:09:01 write(1, "=== START OF READ SMART DATA SEC"..., 41) = 41
14:09:01 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[16]=[85, 06, 2c, 00, da, 00, 00, 00, 00, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=60000, flags=0, status=02, masked_status=01, sb[22]=[72, 01, 00, 1d, 00, 00, 00, 0e, 09, 0c, 00, 00, 00, 00, 00, 00, 00, 4f, 00, c2, 00, 50], host_status=0, driver_status=0x8, resid=0, duration=1, info=0x1}) = 0
14:09:01 write(1, "SMART overall-health self-assess"..., 57) = 57
14:09:01 write(1, "\n", 1) = -1 EPIPE (Broken pipe)
14:09:01 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=20585, si_uid=0} ---
14:09:01 close(3) = 0
14:09:01 exit_group(0) = ?
14:09:01 +++ exited with 0 +++
Unable to detect device type (happens every now and then)
strace: Process 3649 attached
strace: [ Process PID=3649 runs in x32 mode. ]
strace: [ Process PID=3649 runs in 64 bit mode. ]
13:57:01 read(3, "8:0x61b6)\n \"ST(250|320|500|64"..., 4096) = 4096
13:57:01 read(3, " \"-v 12,raw48,Device_Power_Cyc"..., 4096) = 4096
13:57:01 read(3, " 9,minutes\"\n },\n { \"Maxtor Dia"..., 4096) = 4096
13:57:01 read(3, " // it might need"..., 4096) = 4096
13:57:01 read(3, "3[015]\",\n \"\", \"\", \"\"\n },\n {"..., 4096) = 4096
13:57:01 read(3, " \"(Hitachi )?HDT7210((16|25)SL"..., 4096) = 4096
13:57:01 read(3, " },\n { \"Toshiba 2.5\\\" HDD MK..4"..., 4096) = 4096
13:57:01 read(3, "04ACA500/FP1A\n \"TOSHIBA MD04A"..., 4096) = 4096
13:57:01 read(3, " \"\", \"\", \"\"\n },\n { \"Seagate"..., 4096) = 4096
13:57:01 read(3, "\"\", \"\"\n },\n { \"Seagate Barracu"..., 4096) = 4096
13:57:01 read(3, "T3000DM001\",\n \"\", \"\",\n \"-v"..., 4096) = 4096
13:57:01 read(3, "Constellation ES (SATA 6Gb/s)\", "..., 4096) = 4096
13:57:01 read(3, " \"\", \"\", \"\"\n },\n { \"Seagate M"..., 4096) = 4096
13:57:01 read(3, "\"-v 187,raw48,Uncorrectable_ECC_"..., 4096) = 4096
13:57:01 read(3, "bly explained by the WD firmware"..., 4096) = 4096
13:57:01 read(3, "viar Green\", // tested with WDC "..., 4096) = 4096
13:57:01 read(3, "/82.00A82,\n // WDC WD80EFAX"..., 4096) = 4096
13:57:01 read(3, "SB ID entries\n ////////////////"..., 4096) = 4096
13:57:01 read(3, " \"USB: Samsung; \",\n \"0x04e8:0"..., 4096) = 4096
13:57:01 read(3, "at\"\n },\n // Micron\n { \"USB: M"..., 4096) = 4096
13:57:01 read(3, " { \"USB: Maxtor OneTouch 4; \",\n"..., 4096) = 4096
13:57:01 read(3, " \"\", // 0x0114\n \"\", // 0x0"..., 4096) = 4096
13:57:01 read(3, " usbjmicron\"\n },\n { \"USB: Verb"..., 4096) = 3817
13:57:01 read(3, "", 4096) = 0
13:57:01 close(3) = 0
13:57:01 lstat("/dev/", {st_mode=S_IFDIR|0755, st_size=3920, ...}) = 0
13:57:01 write(1, "/dev/: Unable to detect device t"..., 36) = 36
13:57:01 write(1, "Please specify device type with "..., 47) = 47
13:57:01 write(1, "\nUse smartctl -h to get a usage "..., 41) = 41
13:57:01 exit_group(1) = ?
13:57:01 +++ exited with 1 +++
And these spin down / standby checks appeared (shortened):
14:40:01 openat(AT_FDCWD, "/dev/sdb", O_RDONLY|O_NONBLOCK) = 3
14:40:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
14:53:44 openat(AT_FDCWD, "/dev/sdg", O_RDONLY|O_NONBLOCK) = 4
14:53:44 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
14:55:01 openat(AT_FDCWD, "/dev/sdj", O_RDONLY|O_NONBLOCK) = 3
14:55:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
14:59:01 openat(AT_FDCWD, "/dev/sdd", O_RDONLY|O_NONBLOCK) = 3
14:59:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:02:01 openat(AT_FDCWD, "/dev/sdk", O_RDONLY|O_NONBLOCK) = 3
15:02:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:02:01 openat(AT_FDCWD, "/dev/sdh", O_RDONLY|O_NONBLOCK) = 3
15:02:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:11:01 openat(AT_FDCWD, "/dev/sdg", O_RDONLY|O_NONBLOCK) = 3
15:11:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:11:01 openat(AT_FDCWD, "/dev/sdi", O_RDONLY|O_NONBLOCK) = 3
15:11:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:11:01 openat(AT_FDCWD, "/dev/sdb", O_RDONLY|O_NONBLOCK) = 3
15:11:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:11:01 openat(AT_FDCWD, "/dev/sdf", O_RDONLY|O_NONBLOCK) = 3
15:11:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:30:01 openat(AT_FDCWD, "/dev/sdd", O_RDONLY|O_NONBLOCK) = 3
15:30:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
15:33:01 openat(AT_FDCWD, "/dev/sdk", O_RDONLY|O_NONBLOCK) = 3
15:33:01 write(1, "Device is in STANDBY mode, exit("..., 35) = 35
In the last hour it did not check sdc, sdl and sde. Why?
And why are these checks done in a random order and for a varying number of disks?
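To make it easier to spot unchecked disks in a longer capture, the openat lines can be counted per device. This is a rough sketch, assuming the strace output was saved to a file (the name `strace.log` in the example is hypothetical) and that `grep -o` is available; the function names are made up for illustration.

```shell
#!/bin/sh
# Count standby checks per device in an strace log read from stdin.
# Prints "DEVICE COUNT" lines, sorted by device name.
count_checks() {
    grep -o 'openat(AT_FDCWD, "/dev/sd[a-z]*"' |
        sed 's/.*"\/dev\/\(sd[a-z]*\)"/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Print every expected device (given as arguments) that never
# shows up in the log read from stdin.
unchecked_devices() {
    checked=$(count_checks | awk '{ print $1 }')
    for dev in "$@"; do
        echo "$checked" | grep -qx "$dev" || echo "$dev"
    done
}

# Example (hypothetical log file, device names as used in this post):
# count_checks < strace.log
# unchecked_devices sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl < strace.log
```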