WD20EARS "Freezing" on File copies during rebuild?

March 12, 2010

I was previously running with a WD 2TB EADS parity drive and Seagate 1.5TB Data drives. I had picked up a WD 2TB EARS drive to try out. I put the jumper on 7-8 and it is starting the rebuild.

I'm not sure if this is valid or not, but I'm also trying to copy some files to the drive being rebuilt. It is hanging up and not completing when I try to do so.

Is that even possible, or do I have to wait for the array to finish rebuilding before copying more files onto it? I've been able to copy some files over to the drive being rebuilt, but other times it just hangs. Since the files are exported and the drive still has to be rebuilt, I'm not sure what I'm doing is even valid.

Can I move back to the 1.5TB drive? If so, how do I do that? When I tried to put the old drive back in it said I had to have a bigger drive than the failed drive.

March 12, 2010

It is hanging up and not completing when I try to do so...

Abeta, you should know betta'

When asking such a question, always attach a syslog.

March 12, 2010

Author

There wasn't anything in the syslog while the file was freezing: no new messages at all. The last messages were the startup ones, where it synchronized with the NTP server, etc. This has happened a few times.

TeraCopy eventually just gives up, without anything that looks wrong on unRAID's side as far as I can see. During the same rebuild it has sometimes worked and sometimes not.

Here are the last snippets, with the full log attached:

Mar 11 20:01:06 Serenity kernel: mdcmd (18): check
Mar 11 20:01:06 Serenity kernel: md: recovery thread woken up ...
Mar 11 20:01:06 Serenity kernel: md: recovery thread rebuilding disk3 ...
Mar 11 20:01:06 Serenity emhttp: shcmd (12): mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md3 /mnt/disk3 >/dev/null 2>&1
Mar 11 20:01:06 Serenity kernel: md: using 1152k window, over a total of 1953514552 blocks.
Mar 11 20:01:06 Serenity kernel: REISERFS (device md3): found reiserfs format "3.6" with standard journal
Mar 11 20:01:06 Serenity kernel: REISERFS (device md3): using ordered data mode
Mar 11 20:01:06 Serenity kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal
Mar 11 20:01:06 Serenity kernel: REISERFS (device md1): using ordered data mode
Mar 11 20:01:06 Serenity kernel: REISERFS (device md4): found reiserfs format "3.6" with standard journal
Mar 11 20:01:06 Serenity kernel: REISERFS (device md4): using ordered data mode
Mar 11 20:01:06 Serenity kernel: REISERFS (device md2): found reiserfs format "3.6" with standard journal
Mar 11 20:01:06 Serenity kernel: REISERFS (device md2): using ordered data mode
Mar 11 20:01:06 Serenity kernel: REISERFS (device md3): journal params: device md3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Mar 11 20:01:06 Serenity kernel: REISERFS (device md3): checking transaction log (md3)
Mar 11 20:01:06 Serenity kernel: REISERFS (device md4): journal params: device md4, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Mar 11 20:01:06 Serenity kernel: REISERFS (device md4): checking transaction log (md4)
Mar 11 20:01:06 Serenity kernel: REISERFS (device md2): journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Mar 11 20:01:06 Serenity kernel: REISERFS (device md2): checking transaction log (md2)
Mar 11 20:01:06 Serenity kernel: REISERFS (device md1): journal params: device md1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Mar 11 20:01:06 Serenity kernel: REISERFS (device md1): checking transaction log (md1)
Mar 11 20:01:06 Serenity sshd[2215]: Server listening on 0.0.0.0 port 22.
Mar 11 20:01:06 Serenity kernel: REISERFS (device md4): replayed 3 transactions in 0 seconds
Mar 11 20:01:06 Serenity kernel: REISERFS (device md1): replayed 2 transactions in 0 seconds
Mar 11 20:01:06 Serenity kernel: REISERFS (device md2): replayed 2 transactions in 0 seconds
Mar 11 20:01:06 Serenity init: Re-reading inittab
Mar 11 20:01:06 Serenity kernel: REISERFS (device md1): Using r5 hash to sort names
Mar 11 20:01:06 Serenity kernel: REISERFS (device md2): Using r5 hash to sort names
Mar 11 20:01:06 Serenity kernel: REISERFS (device md4): Using r5 hash to sort names
Mar 11 20:01:07 Serenity emhttp: shcmd (15): cp /var/spool/cron/crontabs/root- /var/spool/cron/crontabs/root
Mar 11 20:01:07 Serenity emhttp: shcmd (16): echo '# Generated mover schedule:' >>/var/spool/cron/crontabs/root
Mar 11 20:01:07 Serenity emhttp: shcmd (17): echo '40 3 * * * /usr/local/sbin/mover 2>&1 | logger' >>/var/spool/cron/crontabs/root
Mar 11 20:01:07 Serenity emhttp: shcmd (18): crontab /var/spool/cron/crontabs/root
Mar 11 20:01:09 Serenity apcupsd[1677]: NIS server startup succeeded
Mar 11 20:01:09 Serenity apcupsd[1677]: apcupsd 3.14.3 (20 January 2008) slackware startup succeeded
Mar 11 20:01:10 Serenity kernel: REISERFS (device md3): replayed 4 transactions in 4 seconds
Mar 11 20:01:10 Serenity kernel: REISERFS (device md3): Using r5 hash to sort names
Mar 11 20:01:10 Serenity emhttp: shcmd (20): rm /etc/samba/smb-shares.conf >/dev/null 2>&1
Mar 11 20:01:10 Serenity emhttp: _shcmd: shcmd (20): exit status: 1
Mar 11 20:01:10 Serenity emhttp: shcmd (21): cp /etc/exports- /etc/exports
Mar 11 20:01:10 Serenity emhttp: shcmd (22): mkdir /mnt/user
Mar 11 20:01:10 Serenity emhttp: shcmd (23): /usr/local/sbin/shfs /mnt/user -o noatime,big_writes,allow_other,default_permissions
Mar 11 20:01:11 Serenity emhttp: shcmd (24): killall -HUP smbd
Mar 11 20:01:11 Serenity emhttp: shcmd (25): /etc/rc.d/rc.nfsd restart | logger
Mar 11 20:01:36 Serenity ntpd[1430]: synchronized to 209.237.247.192, stratum 3
Mar 11 20:01:36 Serenity ntpd[1430]: time reset -0.276106 s
Mar 11 20:02:38 Serenity sshd[3589]: error: Could not get shadow information for root
Mar 11 20:02:38 Serenity sshd[3589]: Accepted password for root from 192.168.1.4 port 51713 ssh2
Mar 11 20:02:38 Serenity sshd[3593]: lastlog_filetype: Couldn't stat /var/log/lastlog: No such file or directory
Mar 11 20:02:38 Serenity sshd[3593]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Mar 11 20:02:38 Serenity sshd[3593]: lastlog_filetype: Couldn't stat /var/log/lastlog: No such file or directory
Mar 11 20:02:38 Serenity sshd[3593]: lastlog_openseek: /var/log/lastlog is not a file or directory!
Mar 11 20:09:57 Serenity ntpd[1430]: synchronized to 209.237.247.192, stratum 3
Mar 12 06:28:24 Serenity kernel: mdcmd (6537): spindown 4

Total lines: 773

unRAID-Syslog.txt
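If you want to double-check that nothing disk-related was logged during a hang, one option is to filter the syslog for typical libata/block-layer error wording. This is a minimal bash sketch, assuming the log lives at /var/log/syslog; the function name show_io_messages and the keyword list are my own guesses, not an exhaustive set of error strings.

```shell
# Hypothetical helper: print syslog lines that look like disk/controller trouble.
# The default log path and the keyword list are assumptions; adjust for your system.
show_io_messages() {
    local log="${1:-/var/log/syslog}"
    # -i: case-insensitive, -E: extended regex so | works as alternation,
    # \b keeps "ata3" from matching inside words such as "data3" (GNU grep)
    grep -iE '\bata[0-9]+|I/O error|timeout|exception' "$log"
}
```

Usage: show_io_messages /var/log/syslog | tail -n 50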

March 12, 2010

A quick glance at your syslog didn't reveal anything funny going on.

Three suggestions come to mind, though. First, switch to the CFQ scheduler. CFQ is much smarter than the default NOOP that you have there.

Second, stop disabling the NCQ. They fixed that bug many kernels ago. And third, try setting 'max_sectors_kb' for the disks to 128:

for i in /sys/block/[hs]d? ; do echo 128 > $i/queue/max_sectors_kb ; done 2>/dev/null

With the above three tweaks, I've successfully eliminated similar freezes on my system.

They may or may not fix your problem, but they're worth a try. Please report back on how it goes.
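The max_sectors_kb one-liner writes the value blindly and throws away any errors. Below is a slightly more talkative bash sketch of the same idea: it sets max_sectors_kb on each hdX/sdX disk and reads the value back, since the kernel refuses values larger than the disk's max_hw_sectors_kb. The function name and the optional root-directory argument (there only so it can be tried on a scratch directory) are my own additions.

```shell
# Hypothetical helper: set max_sectors_kb on every hdX/sdX disk and print what took effect.
# The kernel rejects values above the disk's max_hw_sectors_kb, so the read-back matters.
set_max_sectors() {
    local kb="${1:-128}" root="${2:-/sys/block}" dev
    for dev in "$root"/[hs]d?; do
        # Skip anything without a writable max_sectors_kb
        # (including the literal pattern when nothing matches)
        [ -w "$dev/queue/max_sectors_kb" ] || continue
        echo "$kb" > "$dev/queue/max_sectors_kb"
        printf '%s max_sectors_kb=%s\n' "${dev##*/}" "$(cat "$dev/queue/max_sectors_kb")"
    done
}
```

Usage, as root: set_max_sectors 128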

March 12, 2010

Author

These are the default settings in unRAID 4.5.3. How do I implement your suggestions? I think I can use the script in unmenu to do CFQ. How do I do the other two? I see a disk read-ahead setting of 256 in unmenu; is that related? Thx!

March 12, 2010

The read-ahead in unmenu has nothing to do with what I'm talking about.

For #1: Add the following boot parameter to the 'append' line of your 'syslinux.cfg' file: elevator=cfq

The unRAID section in your 'syslinux.cfg' should look something like this:

label unRAID OS
 menu default
 kernel bzimage
 append  elevator=cfq  initrd=bzroot

NOTE: Make sure that you use a text editor that can do Linux-style line endings! Read this:

http://lime-technology.com/wiki/index.php?title=FAQ#Why_do_my_scripts_have_problems_with_end-of-lines.3F
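A quick way to check for (and strip) Windows CRLF line endings from the server's own shell, as a bash sketch. It assumes the flash drive is mounted at /boot (so the file is /boot/syslinux.cfg) and that GNU sed is available; has_crlf and to_unix_endings are hypothetical helper names.

```shell
# Hypothetical helpers: detect and remove Windows (CRLF) line endings in a text file.
has_crlf() {
    # $'\r' is bash syntax for a literal carriage-return character
    grep -q $'\r' "$1"
}

to_unix_endings() {
    # GNU sed: strip a carriage return at the end of every line, editing in place
    sed -i 's/\r$//' "$1"
}
```

Usage, after making a backup copy: has_crlf /boot/syslinux.cfg && to_unix_endings /boot/syslinux.cfg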

For #2: Go to the unRAID management web page, go to tab "Settings", and near the bottom of that page, in "Disk Settings" set the "Force NCQ disabled" to No.
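To confirm what the setting actually did, you can look at each disk's queue depth in sysfs: with libata, a depth of 1 means NCQ is effectively off, while a larger value (often 31) means commands are being queued. I'm assuming this attribute is what the "Force NCQ disabled" option changes. The bash function below is a sketch with a hypothetical name; the optional root argument is only there for trying it on a scratch directory.

```shell
# Hypothetical helper: print the queue depth of each sdX disk.
# 1 = NCQ effectively off; a larger value (often 31) = NCQ in use.
show_queue_depth() {
    local root="${1:-/sys/block}" dev
    for dev in "$root"/sd?; do
        [ -r "$dev/device/queue_depth" ] || continue
        printf '%s queue_depth=%s\n' "${dev##*/}" "$(cat "$dev/device/queue_depth")"
    done
}
```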

For #3: Put that line in your 'go' script:

for i in /sys/block/[hs]d? ; do echo 128 > $i/queue/max_sectors_kb ; done 2>/dev/null
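If you'd rather add the line from a shell than from an editor, here is a bash sketch that appends it only if it isn't already present, so running it twice doesn't duplicate the line. It assumes the go script is /boot/config/go; add_to_go is a hypothetical helper name.

```shell
# Hypothetical helper: append LINE to FILE unless FILE already contains it as a whole line.
add_to_go() {
    local go_file="$1" line="$2"
    # -x: whole-line match, -F: fixed string (so [hs] and $i aren't treated as regex)
    if grep -qxF -- "$line" "$go_file" 2>/dev/null; then
        return 0
    fi
    # If the file doesn't end in a newline, add one so the new line isn't glued onto the last
    if [ -s "$go_file" ] && [ -n "$(tail -c 1 "$go_file")" ]; then
        printf '\n' >> "$go_file"
    fi
    printf '%s\n' "$line" >> "$go_file"
}
```

Usage: add_to_go /boot/config/go 'for i in /sys/block/[hs]d? ; do echo 128 > $i/queue/max_sectors_kb ; done 2>/dev/null'

The single quotes matter: they keep $i from being expanded when the line is written.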

After you've done #1, #2, and #3, reboot your server. Then see if you can reproduce the problem. Let us know how it goes.
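After the reboot, you can check that elevator=cfq took effect: each disk's scheduler file lists the available schedulers with the active one in brackets, something like "noop anticipatory deadline [cfq]". Here is a small bash sketch that prints the active one per disk; the function name is my own, and the root argument is only there for testing on a scratch directory.

```shell
# Hypothetical helper: print the active I/O scheduler of each hdX/sdX disk.
# The scheduler file lists all choices with the active one in [brackets].
show_schedulers() {
    local root="${1:-/sys/block}" dev active
    for dev in "$root"/[hs]d?; do
        [ -r "$dev/queue/scheduler" ] || continue
        # Keep only the bracketed name, e.g. "noop deadline [cfq]" -> "cfq"
        active=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$dev/queue/scheduler")
        printf '%s %s\n' "${dev##*/}" "$active"
    done
}
```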

March 12, 2010

Author

Thanks for the steps. I can definitely try them tonight... but my data rebuild finished this morning.

How do I "fake" out unRAID and convince it the drive is new again, so that it rebuilds and I can test? If this is a corner case that isn't really supported, I guess it won't matter; I'll just need to make sure I wait for a complete rebuild before trying to copy files to the drive being rebuilt.

March 12, 2010

How do I "fake" out unRAID and convince it the drive is new again and make it rebuild for me to test? For example, if this is a corner case and not really supported it won't matter I guess

It's not like I've observed your particular "corner case" before.

So I would rather describe it somewhat more generally.

Something like: unRAID freezing under high-volume I/O on your hardware.

How to simulate that? You could start a couple of preclear scripts, start a large copy with 'mc' between disks within the server, and then, on top of all that, start a large Samba copy from a Windows computer to the unRAID server. While all that is going on, see how well you can browse your shared disks. If all that doesn't choke it, you're good to go. But if it freezes on you, apply the tweaks mentioned above and see whether they make a difference.
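If you want something scriptable to add to that mix, one option is a handful of parallel dd writers, one per disk. This is only a bash sketch under my own assumptions: the directories (e.g. /mnt/disk1, /mnt/disk2) and the size are illustrative, each writer leaves a file named stress.tmp behind that you should delete afterwards, and it does not replace the preclear/mc/Samba combination described above.

```shell
# Hypothetical helper: write SIZE_MB MiB of zeros to <dir>/stress.tmp in every
# directory given, all at the same time, and wait for every writer to finish.
stress_writes() {
    local size_mb="$1"; shift
    local dir pid rc=0
    local -a pids=()
    for dir in "$@"; do
        # bs=1M count=N writes N MiB; conv=fsync flushes to disk before dd exits
        dd if=/dev/zero of="$dir/stress.tmp" bs=1M count="$size_mb" conv=fsync 2>/dev/null &
        pids+=("$!")
    done
    # Collect every writer's exit status; return non-zero if any failed
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

Usage: stress_writes 4096 /mnt/disk1 /mnt/disk2; rm -f /mnt/disk1/stress.tmp /mnt/disk2/stress.tmp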

March 12, 2010

Author

Thx. One last question on the editor: it didn't list MC as valid. I assume MC is OK, as I think I've used it before?

ETA: Never mind, there's an option to edit the syslinux.cfg file in unMENU

March 12, 2010

It is lower case "mc"

March 13, 2010

Second, stop disabling the NCQ. They fixed that bug many kernels ago.

The default in 4.5.3 is to have NCQ disabled. Should we be enabling this even if not seeing any issues? What benefits will we see with it enabled and are there any potential issues if using it?

March 13, 2010

Google NCQ, test it both ways, and figure it out.

March 13, 2010

I've been reading quite a bit about it, and it sounds like whether one would see benefits is not straightforward. I'm just curious whether there could be a benefit in an unRAID environment that mostly streams large files. I'll most certainly do some testing. Your earlier recommendation in this thread to stop disabling it got me wondering whether it's something I should be considering.

March 13, 2010

I've been reading quite a bit about it and it sounds like whether one would see benefits is not straightforward.

Exactly. That's why a little testing will give you your answer. You'll either see benefits, or you won't.
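One way to do that testing on a single disk, as a bash sketch: set the queue depth to 1 (NCQ effectively off), time a read, then set it back to a larger value and repeat. The device name sdb, the 2000 MiB read size and the helper names are illustrative assumptions; run it as root on an otherwise idle disk. A single sequential read is where NCQ tends to matter least, so repeating the comparison under a concurrent load may be more telling.

```shell
# Hypothetical helpers for an NCQ on/off comparison on one disk.
# Writing 1 to queue_depth disables NCQ for that disk; a larger value re-enables it,
# capped at what the drive supports. The read-back shows the value actually in effect.
set_queue_depth() {
    local dev="$1" depth="$2" root="${3:-/sys/block}"
    echo "$depth" > "$root/$dev/device/queue_depth" || return 1
    cat "$root/$dev/device/queue_depth"
}

read_test() {
    local dev="$1" mb="${2:-2000}"
    # iflag=direct bypasses the page cache so the second run isn't served from RAM;
    # dd prints its throughput summary as the last line on stderr
    dd if="/dev/$dev" of=/dev/null bs=1M count="$mb" iflag=direct 2>&1 | tail -n 1
}
```

Usage: set_queue_depth sdb 1; read_test sdb; then set_queue_depth sdb 31; read_test sdb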

March 14, 2010

Hello OP, your problem is not limited to your HDD. I am using a Hitachi Deskstar and getting the same problem during the initial parity resync: I start a copy, which slows down, then just stops and times out.

March 14, 2010

Author

Hmmmm.....so in your case it isn't restricted to the drive being rebuilt. I'm pretty sure I've done this before either on a parity resync or to a drive other than the one being rebuilt but possibly not with unRAID 4.5.3.

March 14, 2010

Hmmmm.....so in your case it isn't restricted to the drive being rebuilt.

As I suggested earlier, the freezings and the time-outs are not restricted to your particular "corner case", as you called it.

Another useful fix (in addition to the three tweaks in my earlier post) is to increase Samba's "max open files" to 16500 as described in this post:

http://lime-technology.com/forum/index.php?topic=5004.msg46112#msg46112

Also, when you need to copy large amounts of stuff from a Windows machine to your unRAID server, TeraCopy does a much better job than Windows Explorer.

March 14, 2010

Author

I've seen the Samba open-files problem, and it mostly goes away with TeraCopy, but in my specific case it's a single file that has been timing out while copying. I've put the changes you mentioned in but haven't been able to replicate the problem. I might start a parity check and try copying files at the same time to see how it behaves.

March 14, 2010

I've put the changes you mentioned in but haven't been able to replicate the problem.

Instead of 'but' you mean 'and', right?

March 14, 2010

Author

They mean the same thing to me, but to clarify: I have not been able to replicate the problem so far, but I also have not been able to repeat exactly what was causing it in the first place, namely copying files to the drive while it was being rebuilt.

May 11, 2010

I'm experiencing the exact same problem as described here: http://lime-technology.com/forum/index.php?topic=6347.0

Fresh install, initial rebuild, and I'm unable to copy files without them "hanging" halfway through and then timing out.

Any suggestions, or should I just wait for the build to finish?
