
Array keeps spinning up



Hello again guys,

 

I have run into another problem that is probably not related to my previous one, but it may be connected to my hardware upgrade (new CPU, and an NVMe cache drive instead of a SATA SSD) or to my update to Unraid 6.12.4 (I recently went to it directly from 6.11.x). For the record, I've been running the same Unraid installation since late 2016 (version 6.2) without any major problem.

 

My 3 hard drives never stay spun down for more than a minute. As soon as they spin down (either after the 45-minute delay or after I spin them down manually), all 3 of them turn back on within a few seconds. I don't check my Unraid dashboard very often (say, once a week), so I can't say precisely when this started happening.

 

Here are my drives:

 

[Screenshot: array drives]

 

My logs are full of the following (always in the same order: sdd > sdb > sdc). I guess the SMART polling itself isn't the cause and only happens because the drives wake up (?) for some reason:

 

Oct  6 10:50:45 unRAID emhttpd: spinning down /dev/sdd
Oct  6 10:50:45 unRAID emhttpd: spinning down /dev/sdb
Oct  6 10:50:45 unRAID emhttpd: spinning down /dev/sdc
Oct  6 10:51:06 unRAID emhttpd: read SMART /dev/sdd
Oct  6 10:51:14 unRAID emhttpd: read SMART /dev/sdb
Oct  6 10:51:22 unRAID emhttpd: read SMART /dev/sdc
Oct  6 10:52:48 unRAID emhttpd: spinning down /dev/sdd
Oct  6 10:52:48 unRAID emhttpd: spinning down /dev/sdb
Oct  6 10:52:48 unRAID emhttpd: spinning down /dev/sdc
Oct  6 10:53:06 unRAID emhttpd: read SMART /dev/sdd
Oct  6 10:53:14 unRAID emhttpd: read SMART /dev/sdb
Oct  6 10:53:22 unRAID emhttpd: read SMART /dev/sdc
Oct  6 10:59:24 unRAID emhttpd: spinning down /dev/sdd
Oct  6 10:59:24 unRAID emhttpd: spinning down /dev/sdb
Oct  6 10:59:24 unRAID emhttpd: spinning down /dev/sdc
Oct  6 11:00:05 unRAID emhttpd: read SMART /dev/sdd
Oct  6 11:00:14 unRAID emhttpd: read SMART /dev/sdb
Oct  6 11:00:21 unRAID emhttpd: read SMART /dev/sdc

 

I have conducted the following tests:

  • I stopped all my Dockers and disabled Docker: no change
  • I deleted my Samba network drives on my Windows machine, just in case: no change
  • I removed the Disk Location plugin: no change
  • I restarted in safe mode (no plugins) and started the array: no change
  • I set the spin-down delay to 15 minutes, unplugged my server's network cable for 45 minutes, and when I plugged it back in I saw that the problem had occurred twice: no change
  • I have also read lots of topics here on the forum and on r/unRAID; I saw a few people having problems since 6.12, but nearly always with ZFS, which I don't use.
  • I installed the Open Files plugin, and the only open files don't seem problematic (here with Docker disabled):

 

auto_turbo.php     /usr/local/emhttp (working directory)
device_list        /usr/local/emhttp (working directory)
disk_load          /usr/local/emhttp (working directory)
file_manager       /usr/local/emhttp (working directory)
lsof               /usr/local/emhttp (working directory)
lsof               /usr/local/emhttp (working directory)
notify_poller      /usr/local/emhttp (working directory)
parity_list        /usr/local/emhttp (working directory)
php-fpm            /usr/local/emhttp (working directory)
php-fpm            /usr/local/emhttp (working directory)
php-fpm            /usr/local/emhttp (working directory)
php-fpm            /usr/local/emhttp (working directory)
php-fpm            /usr/local/emhttp (working directory)
run_cmd            /usr/local/emhttp (working directory)
session_check      /usr/local/emhttp (working directory)
sh                 /usr/local/emhttp (working directory)
sleep              /usr/local/emhttp (working directory)
tail               /usr/local/emhttp (working directory)
ttyd               /usr/local/emhttp (working directory)
ttyd               /usr/local/emhttp (working directory)
unraid-api         /usr/local/bin/unraid-api (working directory)
update_1           /usr/local/emhttp (working directory)
update_2           /usr/local/emhttp (working directory)
update_3           /usr/local/emhttp (working directory)
wg_poller          /usr/local/emhttp (working directory)

 

Here are my general disk settings:

 

[Screenshot: general disk settings]

 

Any ideas?

unraid-diagnostics-20231006-1855.zip

Link to comment

A common reason is having duplicate files under the same share name on multiple disks. The shares below are configured to write directly to the array, and if you have any duplicates in there, it will spin up all drives.

 

A------s                          shareUseCache="no"      # Share exists on disk1, disk2
B-----s                           shareUseCache="no"      # Share exists on disk1, disk2
E----s                            shareUseCache="no"      # Share exists on disk1, disk2
H--------a                        shareUseCache="no"      # Share exists on disk1, disk2
M-----e                           shareUseCache="no"      # Share exists on disk1, disk2
P-------s                         shareUseCache="no"      # Share exists on disk1, disk2
P----s                            shareUseCache="no"      # Share exists on disk1, disk2
V----s                            shareUseCache="no"      # Share exists on disk1, disk2
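
One quick way to see which disks a given share actually lives on is to list the share folder across the disk mounts. This is a generic check; "ShareName" is a placeholder for one of the shares above:

ls -d /mnt/disk*/ShareName

If more than one path comes back, that share is spread across those disks.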

 

 

Link to comment

Good point! I had never thought of that, but it now seems logical. I guess I always assumed that Unraid would handle that automatically for me.

 

Since I'm a bit tight on disk space right now (I only have ~5% free space left, so I'll have to add a disk), I was only able to consolidate a few shares onto a single disk, reducing the number of shares spanning two disks from 8 to 4:

 

H--------a                        shareUseCache="no"      # Share exists on disk1, disk2
M-----e                           shareUseCache="no"      # Share exists on disk1, disk2
P----s                            shareUseCache="no"      # Share exists on disk1, disk2
V----s                            shareUseCache="no"      # Share exists on disk1, disk2

 

Just for your information, I used the unBalance plugin and then modified the "Included disk(s)" value for each of those shares. By the way, I ran a "Docker Safe New Permissions" on all my shares following the error message I received when using unBalance.

 

I imagine that all of this probably has no connection to my issue, as nothing seems to be accessing my shares according to the tests I've conducted before. However, I agree that it would be progress to see only one disk wake up instead of three!

 

In the meantime, the problem still persists. I unplugged my unused SATA SSD just in case, but it didn't change anything. I also monitored the read/write counter on the main screen, and nothing changed in the minutes following the spin down.

 

My list of open files in the OpenFiles plugin is still quite small:

 

auto_turbo.php           /usr/local/emhttp (working directory)
device_list              /usr/local/emhttp (working directory)
disk_load                /usr/local/emhttp (working directory)
lsof                     /usr/local/emhttp (working directory)
lsof                     /usr/local/emhttp (working directory)
notify_poller            /usr/local/emhttp (working directory)
parity_list              /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
php-fpm                  /usr/local/emhttp (working directory)
session_check            /usr/local/emhttp (working directory)
sh                       /usr/local/emhttp (working directory)
sleep                    /usr/local/emhttp (working directory)
unraid-api               /usr/local/bin/unraid-api (working directory)
update_1                 /usr/local/emhttp (working directory)
update_2                 /usr/local/emhttp (working directory)
update_3                 /usr/local/emhttp (working directory)
wg_poller                /usr/local/emhttp (working directory)

 

I also noticed the following line in my /etc/cron.d/root. If I understand correctly, it's a script that runs every minute?

 

*/1 * * * * /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null

 


 

I've gone through the script's code, and it performs tests on the array's disks, including temperature readings, SMART data, and disk usage. I haven't delved deeper, but could this potentially be related to my issue?

unraid-diagnostics-20231007-1316.zip

Edited by Londinium
Link to comment
4 hours ago, Londinium said:

I also noticed the following line in my /etc/cron.d/root.

Normal stuff, not the issue.

 

The dupeGuru docker might be a good option for you.  It may or may not be easy to determine yourself whether you have duplicates or where they exist.  Considering you've used unBalance to shift files around, there's a good chance you have duplicates somewhere, and all it takes is one to cause this.  You can poke around your disks to compare: go to Shares > Disk Shares > diskX > Export = Yes to enable disk-level shares.  Just be careful not to copy files between disk and user shares, and perhaps set Export = No when you're done.  No file should exist at the same location on two disks.  You will need to decide which copy to keep, since these are duplicate file names but the binary contents could differ.

 

You can test how it works yourself.  Create a text file called dupetest.txt, set its contents to something like "This file is located on disk1", and save it to a share on disk1.  Copy the file to the same location on disk2 and change its contents to "This file is located on disk2".  Now browse to \\tower\share\dupetest.txt and open it: Unraid will serve you one of the two versions.  Delete it and it will seem as if the file was not deleted; open it again and you'll get the other version.  Delete it again and the file is gone.
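
The same test can be done from the Unraid console for anyone more comfortable with the command line. A minimal sketch, assuming a share called "share" that already exists on disk1 and disk2 (adjust the names to your setup):

# create two different files at the same relative path on two disks
echo "This file is located on disk1" > /mnt/disk1/share/dupetest.txt
echo "This file is located on disk2" > /mnt/disk2/share/dupetest.txt

# the user share shows a single dupetest.txt and serves one of the two copies
cat /mnt/user/share/dupetest.txt

# but both physical copies are still there
ls -l /mnt/disk*/share/dupetest.txt

Delete the file on both disks when you're done.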

Link to comment

Thank you for your answer! Before testing with the dupeGuru docker and searching for duplicate files across my 8TB (I imagine it will take some time), I first tried stopping the array, which I assume stops the shares. I also lowered the spin-down delay to 15 minutes, and within 30 minutes the drives stopped and instantly restarted twice:

 

Oct  7 18:44:37 unRAID emhttpd: spinning down /dev/sdd
Oct  7 18:44:37 unRAID emhttpd: spinning down /dev/sdb
Oct  7 18:44:37 unRAID emhttpd: spinning down /dev/sdc
Oct  7 18:45:05 unRAID emhttpd: read SMART /dev/sdd
Oct  7 18:45:14 unRAID emhttpd: read SMART /dev/sdb
Oct  7 18:45:21 unRAID emhttpd: read SMART /dev/sdc
Oct  7 19:00:02 unRAID emhttpd: spinning down /dev/sdd
Oct  7 19:00:07 unRAID emhttpd: spinning down /dev/sdb
Oct  7 19:00:16 unRAID emhttpd: spinning down /dev/sdc
Oct  7 19:01:05 unRAID emhttpd: read SMART /dev/sdd
Oct  7 19:01:14 unRAID emhttpd: read SMART /dev/sdb
Oct  7 19:01:21 unRAID emhttpd: read SMART /dev/sdc

 

To be honest, I'm having trouble understanding how a duplicate file on two drives can keep my array running. I imagine it's a bug and not a feature, but if it is a bug, do we know the logic behind why it keeps the drives spinning?

 

As for the unBalance plugin, I installed it following the earlier message about some of my shares being on multiple disks, so I'm pretty sure the plugin is not the cause. I had never used this plugin before, and I have never manually moved files from one disk to another. I only use Samba shares from my Windows machine.

Link to comment
2 hours ago, Londinium said:

To be honest, I'm having trouble understanding how a duplicate file on two drives can keep my array running.

 

Files amongst your shares can be spread across multiple disks.  It's normal to have the share folder on multiple disks, but the abnormal part is having the same filename in the same directory on two or more disks.  There are several reasons why this can happen; moving files from disk to disk is one of them.  If the files are copied to the destination but not removed from the source, you have a duplicate situation.

 

Unraid fetches your files from the array disks and shows them to you as if they were in one spot.  Your SMB share can only contain unique filenames, so if duplicates are encountered, Unraid picks one to return, probably the one on the lowest disk number.  It's the same reason you cannot have duplicate file names in the same directory on your Windows system.

 

Reading or writing to a duplicate file may also spin up those other disks and parity.  When one spins down another might spin up because the file exists there too.  It's weird behavior and the resolution is to make sure there are 0 duplicates across your array.

 

Link to comment

Thank you for your response! I understand that there is a possibility of having a duplicate file. I also understand that it's a delicate situation that shouldn't happen, and that Unraid normally prevents it. However, based on the tests I conducted earlier, such as disabling my Dockers, starting in safe mode (so with no plugins activated), using the OpenFiles plugin, and physically disconnecting my server from the network, I wonder what process would specifically try to access a duplicate file on one of these 4 shares, which contain only movies, music, family videos, and photos. I acknowledge that it's possible, but I imagine there would be some trace of it in a log somewhere, at least I hope so. I also assume that Unraid should be able to detect such a critical situation.

 

In the meantime, I compared the entire contents of my 4 shares with their respective contents on disk1/disk2. To do this, I used the 'find' command as follows:

 

find /mnt/disk1/music -type f > disk1_music.txt
find /mnt/disk2/music -type f > disk2_music.txt
find /mnt/user/music -type f > user_music.txt

 

This allowed me to create an exhaustive list of all file paths on my disks and shares instantly. I then imported this into VSCode, combined the disk1/disk2 files, removed any unnecessary beginning of each line, sorted everything alphabetically, and compared disk/user. Unfortunately, I apparently have no duplicates among the 202,982 files in these 4 shares spread across 2 disks. Given how fast the listing is (instantaneous), I imagine this could easily be integrated into a script and used in a plugin like Fix Common Problems; I'll work on a User Script for this when I have some time.
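
For reference, a rough shell equivalent of the comparison I did in VSCode would look something like this (using the music share from the example above; comm only prints the relative paths that appear in both lists):

# list the files of one share on each disk, strip the /mnt/diskN prefix, and sort
find /mnt/disk1/music -type f | sed 's|^/mnt/disk1||' | sort > /tmp/disk1_music.txt
find /mnt/disk2/music -type f | sed 's|^/mnt/disk2||' | sort > /tmp/disk2_music.txt

# print only the relative paths present on both disks, i.e. potential duplicates
comm -12 /tmp/disk1_music.txt /tmp/disk2_music.txt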

 

In the meantime, I also pursued another lead, because I noticed that my disks reactivate very precisely at the beginning of a new minute. For example, if I stop them at 12:31:01, they only wake up 59 seconds later, at 12:32:00. However, if I stop them at 12:32:45, they wake up only 15 seconds later, at exactly 12:33:00. I repeated this test several times with the same result every time, which makes me think it could be a cron job that runs every minute. I listed my cron jobs (using crontab -l) and checked /etc/cron.d/root, and the only job that runs at the beginning of every minute is the one I mentioned earlier:

 

*/1 * * * * /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
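
In case it's useful, a generic way to list every cron entry whose minute field is * or */1 (i.e. jobs that fire every minute) across the root crontab and /etc/cron.d is something like:

( crontab -l ; cat /etc/cron.d/* ) 2>/dev/null | grep -E '^(\*|\*/1)[[:space:]]'

In my case this should only return the monitor line above.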

 

So I investigated further and temporarily renamed the monitor script (/usr/local/emhttp/plugins/dynamix/scripts/monitor, the target of that cron entry) by adding a suffix to its name. I then manually spun down my disks and waited for 5 minutes. They only woke up after I renamed the script back (and at the beginning of the next minute). And when I say "they only woke up", I mean I physically heard them spinning up in my server, which sits right next to me. I mention this because I assume that disabling this cron job makes the disk activity display on the Main tab unreliable.

 

Oct  8 12:31:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:31:08 unRAID emhttpd: spinning down /dev/sdd
Oct  8 12:31:08 unRAID emhttpd: spinning down /dev/sdb
Oct  8 12:31:08 unRAID emhttpd: spinning down /dev/sdc
Oct  8 12:31:14 unRAID kernel: mdcmd (42): set md_write_method 0
Oct  8 12:32:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:33:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:34:02 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:35:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:36:05 unRAID emhttpd: read SMART /dev/sdd
Oct  8 12:36:14 unRAID emhttpd: read SMART /dev/sdb
Oct  8 12:36:14 unRAID kernel: mdcmd (43): set md_write_method 1
Oct  8 12:36:21 unRAID emhttpd: read SMART /dev/sdc

 

I still don't know if this cron job is the cause, but I think it's a step forward, since it's the first time I've been able to keep my disks spun down. As a side note, I compared the code of the monitor script between version 6.11.5 (the version I was on previously) and version 6.12.4, but aside from a few differences (in addition to the change in how variables are accessed), nothing caught my eye (though I'm definitely not very familiar with Unraid at that level).

 

Does anyone have any ideas?

Link to comment

The monitor task that runs every minute is standard on all Unraid systems and has been for many years. Quite what can make it log an error exit code I have no idea.

 

The read SMART messages will be logged when Unraid thinks a drive has just been spun up automatically due to an access occurring.

 

 

Link to comment
2 hours ago, itimpi said:

Quite what can make it log an error exit code I have no idea.

 

If you are referring to this...

 

Oct  8 12:32:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:33:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:34:02 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Oct  8 12:35:01 unRAID crond[1304]: exit status 127 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null

 

... then it's not a bug; it only appeared while I had the monitor disabled by renaming it, so cron simply couldn't find the script to run.
I deliberately included those lines for context.

 

I know correlation is not causation, but with all the tests I've conducted (including disabling Docker, rebooting in safe mode, and physically disconnecting the server from the network), it's the only way my drives have stayed down. So I hope that helps narrow down the causes.

Link to comment
1 hour ago, Londinium said:

This allowed me to create an exhaustive list of all file paths on my disks and shares instantly. I then imported this into VSCode, combined the disk1/disk2 files, removed any unnecessary beginning of each line, sorted everything alphabetically, and compared disk/user.

 

You want to compare the files on disk1 against disk2; there's no reason to compare the user share.  Try this.  Duplicates will be saved to /mnt/disk1/output2.txt.

find /mnt/disk* -type f > /mnt/disk1/output1.txt
sed -i 's/disk[0-9]/diskX/g' /mnt/disk1/output1.txt
sort /mnt/disk1/output1.txt | uniq -d --all-repeated=separate > /mnt/disk1/output2.txt

 

Link to comment

Thanks for the shortcut! That code snippet should definitely be included somewhere useful for everyone here 👍

 

Unfortunately, while I was comparing disk1+disk2 vs. user (which should give the same results), I had already tried disk1 vs. disk2 just in case, with no results.

I still tried your way, but same result: output2.txt is empty, sadly!

Link to comment
59 minutes ago, Londinium said:

useful for everyone here

It works for you because you have only two disks; it's lazily tailored for you.  As is, the output doesn't tell you which disk the dupe is on (with 2 disks it doesn't need to) and it doesn't properly handle disks 10+.
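
A rough, untested variant along the same lines that handles double-digit disk numbers and also reports which disks hold each duplicate (the /tmp/dupes.txt output path is arbitrary):

find /mnt/disk*/ -type f | awk -F/ '
  { disk = $3
    rel = ""
    for (i = 4; i <= NF; i++) rel = rel "/" $i
    count[rel]++
    disks[rel] = disks[rel] " " disk }
  END { for (r in count) if (count[r] > 1) print r " ->" disks[r] }
' > /tmp/dupes.txt

Each output line is a relative path followed by the disks it appears on.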

 

Bummer nothing was found.  I'm interested to find out the cause.

Link to comment

A new test conducted this morning, without success. I restarted in safe mode with my spin-down delay set to 15 minutes, and I didn't start the array. Despite that, every 15 minutes the disks stopped and then restarted at the beginning of the next minute after spinning down. Afterward, I even physically disconnected my server from the network (hence the 'Link is down' and following lines in the log); the disks still stopped after 15 minutes and spun up again at the beginning of the next minute after spinning down.

 

Oct  9 11:08:50 unRAID emhttpd: spinning down /dev/sdd
Oct  9 11:08:50 unRAID emhttpd: spinning down /dev/sdb
Oct  9 11:08:50 unRAID emhttpd: spinning down /dev/sdc
Oct  9 11:09:05 unRAID emhttpd: read SMART /dev/sdd
Oct  9 11:09:14 unRAID emhttpd: read SMART /dev/sdb
Oct  9 11:09:21 unRAID emhttpd: read SMART /dev/sdc
Oct  9 11:24:03 unRAID emhttpd: spinning down /dev/sdd
Oct  9 11:24:07 unRAID emhttpd: spinning down /dev/sdb
Oct  9 11:24:16 unRAID emhttpd: spinning down /dev/sdc
Oct  9 11:25:06 unRAID emhttpd: read SMART /dev/sdd
Oct  9 11:25:14 unRAID emhttpd: read SMART /dev/sdb
Oct  9 11:25:22 unRAID emhttpd: read SMART /dev/sdc
Oct  9 11:34:19 unRAID kernel: tg3 0000:03:00.0 eth0: Link is down
Oct  9 11:34:19 unRAID dhcpcd[1129]: eth0: carrier lost
Oct  9 11:34:19 unRAID avahi-daemon[2862]: Withdrawing address record for 192.168.1.6 on eth0.
Oct  9 11:34:19 unRAID avahi-daemon[2862]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.6.
Oct  9 11:34:19 unRAID avahi-daemon[2862]: Interface eth0.IPv4 no longer relevant for mDNS.
Oct  9 11:34:19 unRAID dhcpcd[1129]: eth0: deleting route to 192.168.1.0/24
Oct  9 11:34:19 unRAID dhcpcd[1129]: eth0: deleting default route via 192.168.1.1
Oct  9 11:34:21 unRAID monitor: Stop running nchan processes
Oct  9 11:34:22 unRAID ntpd[1285]: Deleting interface #1 eth0, 192.168.1.6#123, interface stats: received=37, sent=37, dropped=0, active_time=2436 secs
Oct  9 11:34:22 unRAID ntpd[1285]: 162.159.200.123 local addr 192.168.1.6 -> <null>
Oct  9 11:40:03 unRAID emhttpd: spinning down /dev/sdd
Oct  9 11:40:08 unRAID emhttpd: spinning down /dev/sdb
Oct  9 11:40:16 unRAID emhttpd: spinning down /dev/sdc
Oct  9 11:41:05 unRAID emhttpd: read SMART /dev/sdd
Oct  9 11:41:14 unRAID emhttpd: read SMART /dev/sdb
Oct  9 11:41:21 unRAID emhttpd: read SMART /dev/sdc
Oct  9 11:56:02 unRAID emhttpd: spinning down /dev/sdd
Oct  9 11:56:08 unRAID emhttpd: spinning down /dev/sdb
Oct  9 11:56:16 unRAID emhttpd: spinning down /dev/sdc
Oct  9 11:56:24 unRAID ntpd[1285]: no peer for too long, server running free now
Oct  9 11:57:05 unRAID emhttpd: read SMART /dev/sdd
Oct  9 11:57:13 unRAID emhttpd: read SMART /dev/sdb
Oct  9 11:57:21 unRAID emhttpd: read SMART /dev/sdc

 

What I understand, and this has been very consistent since the beginning of my tests, is that the disks do indeed stop when they are supposed to (based on the spin-down delay). If they stop within the expected time frame, it means no process is accessing them during that time. However, systematically at the beginning of the next minute, all my disks spin up again, which to me implies some kind of scheduled task is involved.
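
A simple way to watch this live while testing (generic log-watching, nothing specific to Unraid) is to follow the syslog and keep only the spin-down/spin-up lines, so the timestamps can be compared in real time:

tail -f /var/log/syslog | grep -E 'spinning down|read SMART'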

 

I'm not sure exactly when this problem started, but I imagine it's recent because I would have definitely noticed it. In the past few weeks, the only significant changes made to my server are:

  • The update to version 6.12.4 from 6.11.5 (I waited a long time to install 6.12, and it was only when I didn't see any more bug fixes for a few weeks after the release of 6.12.4 that I finally decided to install it).
  • The replacement of my CPU (Celeron G1610T > Xeon E3-1265L V2).
  • The replacement of a 4GB ECC RAM module with an 8GB one (followed by several hours of memtest).
  • The replacement of my SATA SSD cache drive with a much faster M.2 drive.

Having disabled as many things as possible on my server, having conducted all these tests, and having shown that stopping the cron monitor keeps my disks down, I'm running out of ideas.

Link to comment

I have finally decided to revert to version 6.11.5. My drives now remain stopped as they should. 
I suspect there may be an issue with version 6.12.x. 
I'll wait for either version 6.12.5 or 6.13.x to try an upgrade again. 
Anyway, thanks to everyone for trying to help me resolve this issue!

Link to comment
