Long array stop times, why?



I've noticed, since upgrading to unRAID 6, that stopping the array takes an awfully long time, and the sync command seems to be the offender. Checking /sys/block/sd[a-h]/stat gives me output like:

 

# for dev in /sys/block/sd[a-h]; do echo "$dev : $(awk '{print $9}' "$dev/stat")"; done
/sys/block/sda : 0
/sys/block/sdb : 0
/sys/block/sdc : 0
/sys/block/sdd : 0
/sys/block/sde : 0
/sys/block/sdf : 0
/sys/block/sdg : 0
/sys/block/sdh : 141

 

(The 9th column of the stat file is the number of in-flight I/O requests, as per https://www.kernel.org/doc/Documentation/block/stat.txt)

 

This shows that sdh is the only device with anything to do, and that is the drive being precleared. The stats for the actual array drives are not moving at all.
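For anyone wanting to repeat this check, here is a rough sketch of the two measurements described above: the in-flight count, and whether a device's counters change between two samples. The function names `in_flight` and `stat_moved` are made up for this example, and the 5 second default interval is an arbitrary choice.

```shell
#!/bin/sh
# Print the in-flight request count (9th field) from a block-device stat file.
# $1 = path to a stat file, e.g. /sys/block/sdh/stat
in_flight() {
    awk '{ print $9 }' "$1"
}

# Take two samples of a stat file and report whether anything changed.
# $1 = path to a stat file, $2 = seconds between samples (default 5)
stat_moved() {
    before=$(cat "$1")
    sleep "${2:-5}"
    after=$(cat "$1")
    if [ "$before" = "$after" ]; then
        echo idle
    else
        echo busy
    fi
}

# Example use (not run automatically):
#   for dev in /sys/block/sd[a-h]; do
#       echo "$dev : in_flight=$(in_flight "$dev/stat") $(stat_moved "$dev/stat" 5)"
#   done
```

A device that reports "idle" with 0 in flight is not doing any I/O during the sample window, which is what the array drives looked like here.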

 

Is preclear holding up the array stop? Or is something else going on?

 

To give an example, I triggered an array stop at 08:17 today and it's still going 50 minutes later. The web UI is unresponsive throughout this time, but the system is up and running and I can ssh in and look at what is going on.


It seems unfortunate that preclear affects stopping the array (and makes the web UI completely unresponsive to boot).

 

Is there any reason why preclear has to be run on the unRAID host as opposed to some random Linux box? I've read through the script and it seems to just make use of a few unRAID config files in a few places, but that would be easy enough to stub.


Preclear can be run on any system. It is common practice to boot a version of unRAID on another system for exactly this purpose. It can also be run on a vanilla Linux system, as long as you make sure any dependencies of the script are present.
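As a rough way to check for dependencies on a vanilla Linux box, something like the sketch below reports which commands are missing from PATH. The function name `missing_tools` is invented for this example, and the tool names in the usage comment are only illustrative guesses, not a list taken from the preclear script; read the script itself to see what it actually calls.

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example use (tool names are assumptions, not the script's real dependency list):
#   missing_tools dd awk smartctl
```

No output means every named command was found.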

OK thanks, I'll go that route in future then.


Sync is a bit of a pig. If the array is spun down, it can take a minute or more to come back, spinning up all the drives in the process (the newperms script calls sync, and that has been my main experience with this irritating behavior). I've never had a preclear prevent the array from being stopped (maybe I never tried), and I'm a little skeptical that it is the reason.

I have had open Windows Explorer sessions with array drives open, and telnet sessions with the current directory set to an array disk location, hold up array shutdown. You get a pretty unhelpful stream of messages at the bottom of the web GUI screen, which are at least a prompt to go find what is holding up the shutdown. If you don't find it, the array will never stop. I expect you'd also not be able to bring up a new web GUI session, although an existing session will continue to be updated; if that one were closed, a new one would likely just appear to hang, with the symptoms you describe.
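To hunt down a shell or telnet session sitting in an array directory, a sketch like this walks /proc and lists processes whose current directory is under a given path. The function name `procs_with_cwd_under` is made up for this example, and the /mnt/disk1 path in the usage comment assumes the usual unRAID mount layout. Where they are installed, lsof or fuser can report the same kind of information, including open files.

```shell
#!/bin/sh
# List "PID cwd" for every process whose current working directory
# is the given directory or somewhere below it.
# $1 = directory to check, e.g. /mnt/disk1
procs_with_cwd_under() {
    prefix=${1%/}
    for p in /proc/[0-9]*; do
        # Processes we cannot inspect (or that have exited) are skipped.
        cwd=$(readlink "$p/cwd" 2>/dev/null) || continue
        case "$cwd" in
            "$prefix"|"$prefix"/*) echo "${p#/proc/} $cwd" ;;
        esac
    done
}

# Example use (path is an assumption about the mount layout):
#   procs_with_cwd_under /mnt/disk1
```

Run it as root, otherwise the cwd links of other users' processes cannot be read and those processes are silently skipped.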


FWIW, I checked the logs this evening and can see that zeroing the drive completed at 09:50 this morning:

 

# stat /tmp/zerosdh
  File: ‘/tmp/zerosdh’
  Size: 231873          Blocks: 456        IO Block: 4096   regular file
Device: 2h/2d   Inode: 123856      Links: 1
Access: (0666/-rw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-01-06 20:43:32.097103305 +0000
Modify: 2016-01-06 09:50:50.200307361 +0000
Change: 2016-01-06 09:50:50.200307361 +0000

 

and at the same time in /var/log/syslog we see

 

Jan  6 09:50:49 zalaga-unraid emhttp: shcmd (122): rm -f /boot/config/plugins/dynamix/mover.cron
Jan  6 09:50:49 zalaga-unraid emhttp: shcmd (123): /usr/local/sbin/update_cron &> /dev/null
Jan  6 09:50:49 zalaga-unraid emhttp: Unmounting disks...
Jan  6 09:50:49 zalaga-unraid kernel: mdcmd (131): stop
Jan  6 09:50:49 zalaga-unraid kernel: md1: stopping
Jan  6 09:50:49 zalaga-unraid kernel: md2: stopping
Jan  6 09:50:49 zalaga-unraid kernel: md3: stopping
Jan  6 09:50:49 zalaga-unraid kernel: md4: stopping
Jan  6 09:50:49 zalaga-unraid kernel: md5: stopping
Jan  6 09:50:49 zalaga-unraid emhttp: shcmd (124): rmmod md-mod |& logger
Jan  6 09:50:49 zalaga-unraid kernel: md: unRAID driver removed
Jan  6 09:50:49 zalaga-unraid emhttp: shcmd (125): modprobe md-mod super=/boot/config/super.dat slots=24 |& logger

 

This looks pretty conclusive: the sync during array shutdown covers all disks in the system, not just the array disks.


If the current directory of the preclear command was on an array disk, that would explain it too. I've had shutdowns hang because I had an old screen session whose directory was on the array. Sync is a Linux command, and it works on all disks. I'm just not sure why it would hang on a disk under heavy I/O. We may need a Linux expert to weigh in.


Fair point, I was thinking of it syncing one disk at a time, which, as you say, it doesn't. Well, that would explain it then: preclear zeroing is constantly reading from urandom to generate data to write to the disk, so attempting to sync is doomed to sit there forever, i.e. sync is trying to flush memory to disk while another process is constantly generating data in memory to write to disk.
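One way to test this theory while a stop is hanging is to watch how much data the kernel still has queued for writing. A minimal sketch, assuming a Linux /proc/meminfo with the usual Dirty and Writeback lines; the function name `dirty_writeback` is invented for this example.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo)
dirty_writeback() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s\n", $1, $2 }' \
        "${1:-/proc/meminfo}"
}

# Example use (not run automatically): sample every 5 seconds.
#   while sleep 5; do dirty_writeback; done
```

If Dirty never drops toward zero while the preclear is writing, that would be consistent with sync never getting a chance to finish; it does not prove it on its own.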

 
