unRAID Server Release 5.0-rc2 Available


Recommended Posts

OK. here is an interesting bug or issue.

 

I decided to expand my RC2 array.

I added a new drive and hit the "yes I want to add the new drive" button, it decided to add the drive AND run a parity check at the same time.

Then on top of that, it looks like it tried to spin the drives down while running a parity check and formatting a drive.

 

I'll admit I am not running 100% clean unraid.

 

 

I have VM tools

plus the following unMenu packages: apcupsd, Monthly Parity Check, Clean Powerdown, overtemp powerdown, screen, SSMTP and mail alerts.

None of these should have caused this glitch..

running default gui.

 

Syslog attached.

Everything else seems to be OK.

LSI spin-up spin-down.

LSI on expander.

Server booted in both baremetal and as a VM.

 

Test speeds seem ok. about on par with beta12 so far.

 

From the system log it appears your new disk was "pre-cleared".  In this case the operation performed by the s/w is called "Protected Expansion".  The disk is marked as being set to "all zeros" (by the pre-clear), so it is simply added to the array (since data values of all zeros does not change parity.  However, to be safe, the s/w also kicks off a "Parity Check/Correct" operation just to be sure  ;)  If for some reason the newly added disk was not set to all zeros, then later if a disk failed, it would not be able to be rebuilt correctly.  You can Cancel the Parity Check if you want.

 

The spindowns are odd, perhaps one of your plugins is doing this, but I'll try to duplicate.

Link to comment
  • Replies 158
  • Created
  • Last Reply

Top Posters In This Topic

Top Posters In This Topic

Posted Images

Ok, issues with Realktek incoming!

 

8112L is giving me < 3MB/sec (writes) to the cache drive. Before with b12 it was 50MB/sec. Essentially, useless :(

 

May  6 20:05:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:05:07 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:05:38 Tower last message repeated 15 times
May  6 20:05:41 Tower last message repeated 7 times
May  6 20:05:42 Tower kernel: net_ratelimit: 2 callbacks suppressed
May  6 20:05:42 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:06:04 Tower last message repeated 32 times
May  6 20:06:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:06:14 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:06:48 Tower last message repeated 11 times
May  6 20:07:04 Tower last message repeated 17 times
May  6 20:07:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:07:14 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:07:50 Tower last message repeated 24 times
May  6 20:08:07 Tower last message repeated 13 times
May  6 20:08:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:08:08 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:08:15 Tower last message repeated 12 times
May  6 20:08:17 Tower kernel: net_ratelimit: 3 callbacks suppressed
May  6 20:08:17 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)

 

As can be seen, it is clearly b0rked.

 

Syslog is attached.

 

Ideas?

syslog-2012-05-06.txt

Link to comment

I have never noticed it do a parity check in the past while formatting a new drive. perhaps i just don't add them often enough to remember that.

and yes, it was precleared.

 

as far as the spin-downs, i would not worry to much unless it has been reported by others.

it is just another log to look at.

 

Thanks for getting back to me.

 

PS. speeds are pretty decent.

Write directly to array @ about 45MB/s

Write to SSD cache @ about 115MB/s

Read from array, 70-90MB/s

(Using a 20GB ISO. All number are slower for small clusters of files)

Parity check, about 100MB/s @ 25% complete (slower then last check with b12 but i did add a drive to expander. I might have lost bandwidth)

Edit: not sure what happened. parity check sped up to 110-122MB/s. something must have been running that I didnt see.

Link to comment

Hello, i have just upgraded my box from 4.7 to 5rc2 and everything went smooth from what i can see. i just reformatted my flash drive and copied over the config files and shares from the original (no plugins were added) that had the network/share settings.

 

the problem is with shares and file access. i don't have any special security setup just the default settings but for some reason i cannot copy or make directories to most of the disk shares directly and one of the file shares that is supposed to be on one disk will not let me access it via windows explorer (i can access via the disk share though) and i get a access denied message. i have tried running the new permisions option under the utils but even after a reboot still no access. any help is appreciated, attached is my syslog.

 

syslog-5-6-2012.txt

Link to comment

I just upgraded my 5.0b6a machine to RC2.

 

I'm getting errors similar to the following in the syslog that I've never seen before:

May  6 10:35:58 Hyperion kernel: ata11: sas eh calling libata port error handler (Errors)
May  6 10:35:58 Hyperion kernel: ata12: sas eh calling libata port error handler (Errors)
May  6 10:35:58 Hyperion kernel: ata13: sas eh calling libata port error handler (Errors)
May  6 10:35:58 Hyperion kernel: ata14: sas eh calling libata port error handler (Errors)

 

The array booted up and came online automatically, and the drives all list the correct alignment.  I do have a AOC-SASLP-MV8 SATA card, as well as a pair of Monoprice 2 port SATA cards, in addition to my 4 on-board ports.  I'm using 9 total drives at this point (7 in the array, 1 parity, 1 mounted but unaffiliated drive).

 

As far as I can tell, the data is all OK.  Parity check is running, but I notice it's at about 2/3 normal speed (44MB/s).  I was just getting parity check speeds between 66-68MB/s on Beta 6a on the same parity check location (about 0% - 2%).

 

Any ideas on what is going on?  Should I cancel my parity check and revert to 6a, which was stable for me?

 

Edit: Ooops, forgot to attach the full syslog.

Those are not errors.  They are informational messages to you describing how the "sas" driver for each of ata11,12,13 and 14 will use the libata port error handler.

 

Thanks, Joe.  My preclear is almost done, and it looks like I haven't seen any actual errors, so I appear to be fine.  I have a Realtek Ethernet port, so once the preclear is complete I'll test my throughput speeds.  Hopefully all is well there, as I know there's been some driver issues.

Link to comment

Hello, i have just upgraded my box from 4.7 to 5rc2 and everything went smooth from what i can see. i just reformatted my flash drive and copied over the config files and shares from the original (no plugins were added) that had the network/share settings.

 

the problem is with shares and file access. i don't have any special security setup just the default settings but for some reason i cannot copy or make directories to most of the disk shares directly and one of the file shares that is supposed to be on one disk will not let me access it via windows explorer (i can access via the disk share though) and i get a access denied message. i have tried running the new permisions option under the utils but even after a reboot still no access. any help is appreciated, attached is my syslog.

 

Re-boot the windows computer.

Link to comment

Hello, i have just upgraded my box from 4.7 to 5rc2 and everything went smooth from what i can see. i just reformatted my flash drive and copied over the config files and shares from the original (no plugins were added) that had the network/share settings.

 

the problem is with shares and file access. i don't have any special security setup just the default settings but for some reason i cannot copy or make directories to most of the disk shares directly and one of the file shares that is supposed to be on one disk will not let me access it via windows explorer (i can access via the disk share though) and i get a access denied message. i have tried running the new permisions option under the utils but even after a reboot still no access. any help is appreciated, attached is my syslog.

Did you run the new permissions script under the utilities tab?

Link to comment

What is the verdict on NFS performance and reliability in RC2 so far? Still broken?

If RC2 is indeed stable with everything else and write performance is acceptable (need some more tests including with cache drives) then I it must be time to just draw a line and call it 5.0 Final. Fixing NFS should be next priority and hence version 5.01 / 5.1.

 

So, you're advocating releasing a 'stable' version where one of the advertised features may not work reliably?

 

 

I want NFS to work as much as everyone else, and consider it essential for my needs. I'm saying that the focus should be NFS and getting a Final with NFS fixed in it, if not a Final followed by a release that fixes NFS ONLY.

Link to comment

Hello, i have just upgraded my box from 4.7 to 5rc2 and everything went smooth from what i can see. i just reformatted my flash drive and copied over the config files and shares from the original (no plugins were added) that had the network/share settings.

 

the problem is with shares and file access. i don't have any special security setup just the default settings but for some reason i cannot copy or make directories to most of the disk shares directly and one of the file shares that is supposed to be on one disk will not let me access it via windows explorer (i can access via the disk share though) and i get a access denied message. i have tried running the new permisions option under the utils but even after a reboot still no access. any help is appreciated, attached is my syslog.

 

Re-boot the windows computer.

 

thanks that did it.

 

 

Link to comment

And that might be something like another kernel change or cause some other unsuitability.

We are in the home stretch... Tom might as will get the bugs fixed to the best of his ability... then release RC3. if that is golden. then go final with it.

 

Releasing a final v5 that needs known fixes is not a final in my opinion. that is a beta dressed up in new clothes.

 

 

RC2 is still only a day old. it needs a little while for hidden bugs to pop up.  i rather we stick with RC's for a short time until everyone is sure it is stable..

 

 

that said.. second Norco upgraded to RC2.

It is looking pretty good other then the mail issue from simple features.  no other noticeable bugs or syslog errors. 

I did not do a full parity check yet. massive lightning storms here. i powered off all non-essential servers/PC's

 

Link to comment

Ok, issues with Realktek incoming!

 

8112L is giving me < 3MB/sec (writes) to the cache drive. Before with b12 it was 50MB/sec. Essentially, useless :(

 

As can be seen, it is clearly b0rked.

 

Syslog is attached.

 

Ideas?

Something is initiating these rate limits, are you using other plugins?

Link to comment

Ok, issues with Realktek incoming!

 

8112L is giving me < 3MB/sec (writes) to the cache drive. Before with b12 it was 50MB/sec. Essentially, useless :(

 

As can be seen, it is clearly b0rked.

 

Syslog is attached.

 

Ideas?

Something is initiating these rate limits, are you using other plugins?

 

PS3 Media Server (which isn't being accessed), cache_dirs (which wasn't running - I was looking at the read/write stats) and SimpleFeatures.

 

I will run it again without them running but it is clearly a driver error, we've been here before.

Link to comment

I will run it again without them running but it is clearly a driver error, we've been here before.

I have the same Realtek 8112L chip, using the same 8169 driver as indicated in your syslog, and with me it is running fine. I can achieve line rate in both directions.

 

 

Link to comment

 

Madburg: What did I do to deserve this "thank you for wasting time" remark ???

 

He said 'for all others', so not you ;)

 

But I agree with madburg, anyone who wants to checkout this RC (or any beta) should do so on a clean unread install without any plugin. Verify if everything is working fine, and only after that has been done you should install any needed plugins and see if it breaks anything.

 

Test results from installs with plugins offer no information for Tom as the plugin itself can be the culprit.

Thank you, I thought the statement was quite clear it was not towards him, but maybe he felt guilty about something... Lol, no worries.

 

I have seen in some of the beta versions where attempts of spin downs on drives that are in use as well, so i just pointed out one example for you to log that behavior quickly (after 5 mins) so you can provide Tom the logs, so it could be corrected.

 

One other point johnm mentioned. UnRAID never started a parity check on any drive I first pre cleared. It would just add it, format (quite quickly) and array was ready to go, no parity sync ever kicked off. So this may need some looking into.

 

 

 

 

Link to comment

Ok, issues with Realktek incoming!

 

8112L is giving me < 3MB/sec (writes) to the cache drive. Before with b12 it was 50MB/sec. Essentially, useless :(

 

As can be seen, it is clearly b0rked.

 

Syslog is attached.

 

Ideas?

Something is initiating these rate limits, are you using other plugins?

 

PS3 Media Server (which isn't being accessed), cache_dirs (which wasn't running - I was looking at the read/write stats) and SimpleFeatures.

 

I will run it again without them running but it is clearly a driver error, we've been here before.

 

No one will guarantee when a new version comes up (beta/Rc) that a plugin still works 100% or cause issues, so it could not possible be clear (only clear things is no one should be applyin plugins when testing these versions. Once your done testing and happy and feel no other bugs affect u, then go ahead and try some plugins and report them on the post the owner of the plugin started) so don't load any at all, try again and then let Tom know if we're there again. With syslog of coarse. Don't take it personal, I'm stating what Tom is tired of stating.

 

 

 

Link to comment

Ok, issues with Realktek incoming!

 

8112L is giving me < 3MB/sec (writes) to the cache drive. Before with b12 it was 50MB/sec. Essentially, useless :(

 

May  6 20:05:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:05:07 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:05:38 Tower last message repeated 15 times
May  6 20:05:41 Tower last message repeated 7 times
May  6 20:05:42 Tower kernel: net_ratelimit: 2 callbacks suppressed
May  6 20:05:42 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:06:04 Tower last message repeated 32 times
May  6 20:06:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:06:14 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:06:48 Tower last message repeated 11 times
May  6 20:07:04 Tower last message repeated 17 times
May  6 20:07:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:07:14 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:07:50 Tower last message repeated 24 times
May  6 20:08:07 Tower last message repeated 13 times
May  6 20:08:07 Tower logger: Sun May  6 10:23:10 BST 2012 - Hard Drives active, resetting counter
May  6 20:08:08 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)
May  6 20:08:15 Tower last message repeated 12 times
May  6 20:08:17 Tower kernel: net_ratelimit: 3 callbacks suppressed
May  6 20:08:17 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)

 

As can be seen, it is clearly b0rked.

 

Syslog is attached.

 

Ideas?

 

as can be seen, it clearly happens after an AFP error... (well, thats what it looks like to me)

 

May  6 19:05:19 Tower afpd[14140]: dsi_stream_read: len:0, unexpected EOF

 

The server appears to work for hours (~6) before the network card has that trouble.

 

Does it work with SMB/NFS? (make sure to use UDP protocal for NFS! TCP does not work!)

You would probably want a fresh boot up to test properly.

Link to comment

I'm seeing some unusual spinup/spindown behavior on RC2.

 

For example, recently from my log:

May  6 20:27:25 Hyperion kernel: mdcmd (43): spindown 0 (Routine)
May  6 20:27:26 Hyperion kernel: mdcmd (44): spindown 1 (Routine)
May  6 20:27:26 Hyperion kernel: mdcmd (45): spindown 2 (Routine)
May  6 20:27:27 Hyperion kernel: mdcmd (46): spindown 3 (Routine)
May  6 20:27:27 Hyperion kernel: mdcmd (47): spindown 4 (Routine)
May  6 20:27:28 Hyperion kernel: mdcmd (48): spindown 5 (Routine)
May  6 20:27:28 Hyperion kernel: mdcmd (49): spindown 6 (Routine)

 

No spinups follow, but if I examine my drives they're all still spinning.  It appears that they're not spinning down properly, even though it's indicated in the logs they're being spun down.  Any idea where I should start looking, or what I can test to try to figure out what is going on?

 

 

Link to comment

A report on 'spurious spindowns', with my apologies for its length.

 

I too have had seemingly random spindowns, some dating back to the version 4 series.  I believe there may be 2 issues, one dealing with a negative reset of the system clock, and the other occurring during parity checks.

 

I can easily illustrate the first with this excerpt from a v4.7 syslog:

Apr 21 13:19:12 JacoBack emhttp: unRAID System Management Utility version 4.7

...

Apr 21 13:19:16 JacoBack ntpd[1765]: frequency initialized -150.733 PPM from /etc/ntp/drift

Apr 21 13:19:16 JacoBack ifplugd(eth0)[1485]: Program executed successfully.

Apr 21 13:19:25 JacoBack ntpd[1765]: synchronized to 198.137.202.16, stratum 2

Apr 21 13:18:55 JacoBack ntpd[1765]: time reset -30.533802 s

Apr 21 13:19:03 JacoBack kernel: mdcmd (26): spindown 0

Apr 21 13:19:04 JacoBack kernel: mdcmd (27): spindown 1

Apr 21 13:19:04 JacoBack kernel: mdcmd (28): spindown 2

Apr 21 13:19:04 JacoBack kernel: mdcmd (29): spindown 3

Apr 21 13:19:04 JacoBack kernel: mdcmd (30): spindown 4

Apr 21 13:19:05 JacoBack kernel: mdcmd (31): spindown 5

Apr 21 13:19:05 JacoBack kernel: mdcmd (32): spindown 6

Apr 21 13:19:05 JacoBack kernel: mdcmd (33): spindown 7

Apr 21 13:19:05 JacoBack kernel: mdcmd (34): spindown 8

Apr 21 13:19:06 JacoBack kernel: mdcmd (35): spindown 9

I had not used the machine in a while, so a 30 second adjustment was needed.  Notice what happens to the timestamps.  I have seen the same behavior other times too, and I believe it is because the spin down conditional logic is not expecting the clock to be before its own timer.  This causes a negative time difference, and this apparently passes the conditional, causing spin downs to proceed.  I would suggest a surrounding conditional to test for time difference greater than zero, so that conditions like this will be skipped until clocks are in sync again.  This only occurs shortly after boot, if at all, and only for short amounts of time, such as the 30 seconds above.  Adding 30 seconds or at worst a few minutes to the spin down delay seems quite harmless.

 

The other issue seems related to the parity check, in my case and the few others reporting it.  In my opinion, it looks like the spin down clock variables may be overwritten/corrupted, but I have no direct evidence.  Perhaps the following will help you.  It's from a 5.0rc1 syslog, but I believe the code base is essentially the same.  I've attached the full syslog, plus a second clean run with no issues, for comparison.

 

I suspect the first set of spindowns below is because of the first issue mentioned above, in that the time stamp is set back one second on the first spin down, creating a negative time difference.  I cannot tell for sure, because the NTP events are no longer being logged (I miss them!).

 

In summary, server completes booting, with array started, at 17:26:58, then at 17:26:57 spins all the drives down (should not have, I did not do it!).  About 6 minutes later (17:33:09) and not knowing the drives were spun down, I started a parity check, during which more spin downs occurred.  I have one drive (Disk 2 on ata5) that on occasion raises a few exceptions, rarely more than three, and usually harmless (can't find anything wrong, can't figure them out, so I live with them).  In this case, you will see a lot of them, and for readability I have removed all but the first line of each.  I suspect that most if not all of them are because it is trying to read from a spun down drive.  Therefore, I don't personally think they are relevant to this issue, just FUD, but I left them for completeness, in case you see a connection.  See the full syslog for complete details.

 

When I discovered the spin downs and performance issues, I then discovered the Webgui was non-responsive, could not find a way to terminate the parity check.  At 17:48:48, I gave up and used Ctl-Alt-Del to call the Powerdown function (took over 3 minutes to finally shut down).  Hope something below is helpful, as I said before, looks to me like an overwritten or incorrectly re-initialized time variable.  Personally, my vote would be for releasing v5.0 (with or without NFS), then fixing this spindown issue and anything else in a 5.0.1, 5.0.2, etc.

 

May  5 17:26:58 JacoBack emhttp_event: disks_mounted

...

May  5 17:26:58 JacoBack emhttp_event: svcs_restarted

May  5 17:26:58 JacoBack kernel: skge 0000:01:04.0: eth0: Link is up at 1000 Mbps, full duplex, flow control both

May  5 17:26:57 JacoBack kernel: mdcmd (47): spindown 0

May  5 17:26:58 JacoBack kernel: mdcmd (48): spindown 1

May  5 17:26:58 JacoBack kernel: mdcmd (49): spindown 2

May  5 17:26:58 JacoBack kernel: mdcmd (50): spindown 3

May  5 17:26:58 JacoBack kernel: mdcmd (51): spindown 4

May  5 17:26:59 JacoBack kernel: mdcmd (52): spindown 5

May  5 17:26:59 JacoBack kernel: mdcmd (53): spindown 6

May  5 17:26:59 JacoBack kernel: mdcmd (54): spindown 7

May  5 17:26:59 JacoBack kernel: mdcmd (55): spindown 8

May  5 17:27:00 JacoBack kernel: mdcmd (56): spindown 9

May  5 17:27:06 JacoBack init: Re-reading inittab

May  5 17:31:21 JacoBack cache_dirs: command args=-w -m 3 -M 3 -a -noleaf -i Vid*, version=1.5r

May  5 17:31:21 JacoBack cache_dirs: max_seconds=3, min_seconds=3, max_depth=9999, command=find -noleaf

May  5 17:31:21 JacoBack cache_dirs: VidLib3,VidLib4,Videos,Videos1,Videos2,Videos3,Videos4,Videos5,Videos6,Videos7,Videos8,Videos9, root_dirs

May  5 17:31:21 JacoBack cache_dirs: cache_dirs process ID 2072 started, To terminate it, type: cache_dirs -q

May  5 17:33:09 JacoBack kernel: mdcmd (57): check CORRECT

May  5 17:33:09 JacoBack kernel: md: recovery thread woken up ...

May  5 17:33:09 JacoBack kernel: md: recovery thread checking parity...

May  5 17:33:10 JacoBack kernel: md: using 1152k window, over a total of 976762552 blocks.

May  5 17:33:43 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:38:47 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:38:49 JacoBack kernel: mdcmd (58): spindown 2

May  5 17:39:20 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:39:54 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:39:56 JacoBack kernel: mdcmd (59): spindown 3

May  5 17:39:56 JacoBack kernel: mdcmd (60): spindown 4

May  5 17:39:57 JacoBack kernel: mdcmd (61): spindown 5

May  5 17:40:06 JacoBack kernel: mdcmd (62): spindown 6

May  5 17:40:06 JacoBack kernel: mdcmd (63): spindown 7

May  5 17:40:06 JacoBack kernel: mdcmd (64): spindown 8

May  5 17:40:07 JacoBack kernel: mdcmd (65): spindown 9

May  5 17:40:37 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:41:11 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:41:49 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:42:03 JacoBack kernel: mdcmd (66): spindown 2

May  5 17:42:34 JacoBack kernel: ata5.00: limiting speed to UDMA/100:PIO4

May  5 17:42:34 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:42:50 JacoBack kernel: mdcmd (67): spindown 3

May  5 17:42:50 JacoBack kernel: mdcmd (68): spindown 4

May  5 17:42:51 JacoBack kernel: mdcmd (69): spindown 5

May  5 17:42:51 JacoBack kernel: mdcmd (70): spindown 6

May  5 17:42:51 JacoBack kernel: mdcmd (71): spindown 7

May  5 17:42:51 JacoBack kernel: mdcmd (72): spindown 8

May  5 17:42:52 JacoBack kernel: mdcmd (73): spindown 9

May  5 17:43:35 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:44:21 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:44:54 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:45:29 JacoBack kernel: ata5.00: limiting speed to UDMA/33:PIO4

May  5 17:45:29 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:46:04 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:46:44 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:47:18 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:48:02 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:48:48 JacoBack logger: Powerdown initiated

May  5 17:48:48 JacoBack logger: Shutting down Samba

May  5 17:48:48 JacoBack logger: Syncing the drives

May  5 17:48:49 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:49:33 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:50:06 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:50:41 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:50:49 JacoBack logger: Killing active pids on the array drives

May  5 17:50:49 JacoBack logger: Umounting the drives

May  5 17:51:26 JacoBack kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

May  5 17:52:01 JacoBack logger: Stopping the Array

May  5 17:52:01 JacoBack kernel: mdcmd (74): stop

May  5 17:52:02 JacoBack kernel: md: md_do_sync: got signal, exit...

May  5 17:52:03 JacoBack kernel: md: recovery thread sync completion status: -4

May  5 17:52:03 JacoBack kernel: md1: stopping

May  5 17:52:03 JacoBack kernel: md2: stopping

May  5 17:52:03 JacoBack kernel: md3: stopping

May  5 17:52:03 JacoBack kernel: md4: stopping

May  5 17:52:03 JacoBack kernel: md5: stopping

May  5 17:52:03 JacoBack kernel: md6: stopping

May  5 17:52:03 JacoBack kernel: md7: stopping

May  5 17:52:03 JacoBack kernel: md8: stopping

May  5 17:52:03 JacoBack kernel: md9: stopping

May  5 17:52:03 JacoBack logger: Saving current syslog to /boot/logs/syslog.txt

syslog_RobJ_2012-05-05_1st50rc1.zip

Link to comment

After wondering about my spindown issue above, I decided I would try a throughput test.

 

I have a Realtek RTL8111E in my MSI H61M-E33 (Sandy Bridge Core i3) motherboard.  I was able to achieve about a 40MBps read speed, and a 35MBps write speed, which are both about as high as I've ever seen this hardware perform.  So at least for my particular chipset, the included Realtek drivers are working well.

Link to comment

Ok, issues with Realktek incoming!

 

I took a look at your syslog.  It looks to me that your Realtek performed without issue for quite awhile, until May 6 19:05:45, and then began spewing out link up messages.  That looks to me as if either the driver or firmware had crashed, which would probably result in terrible performance!  There is no direct evidence about performance speed here, except the link up messages imply lost time with each, and therefore slowdowns.  When you recorded the extremely slow network speed, was that before or after 19:05:45?  Try testing again, and see if a freshly booted server is just as slow.

 

You have an intriguing line in your syslog:

May  6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

Perhaps you or someone else with more time can research this, and determine if there is a firmware patch for your chipset, and how it would be applied, if that is even possible.  Personally, I think you may be better off adding a PCIe network card to your system, and disabling the Realtek.

 

I noticed 2 other things in your syslog.  One was a minor kernel 'crash', look for the '--[ cut here ]--' 5 minutes after the parity check starts.  Seems harmless, but definitely not normal.  The other is that you have something added to your system that is spewing out useless and repeated syslog messages once every minute.  That's 60 an hour, 1440 a day.  Harmless but annoying.

 

Edit:  I should give credit to pantner, I just realized that I basically restated in different words what he had already said!

Link to comment

I have disabled all the addons, because I was seeing errors while running Parity sync, and once it does that it don't give options to change anything, it just stops, and you don't know until F5 is hit. It was stopping somewhere after 6%, after removing all addons, it went to atleast 20%. What can I do to correct this problem?

I am thinking it may have something to do with Plex, as I see a lot of info coming up on disc 5. (That is where I HAD Plex. ) I have long sense gotten rid of Plex, but I do not seem to have the correct permissions to delete some 2,000+ files and the master folder.

log_may_6.zip

Link to comment

I have disabled all the addons, because I was seeing errors while running Parity sync, and once it does that it don't give options to change anything, it just stops, and you don't know until F5 is hit. It was stopping somewhere after 6%, after removing all addons, it went to atleast 20%. What can I do to correct this problem?

I am thinking it may have something to do with Plex, as I see a lot of info coming up on disc 5. (That is where I HAD Plex. ) I have long sense gotten rid of Plex, but I do not seem to have the correct permissions to delete some 2,000+ files and the master folder.

 

have you looked through your syslog? Do you have any USB Drives?

 

May  6 22:31:03 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 89
May  6 22:31:04 The_Matrix kernel: usb 2-1.1: new low speed USB device number 90 using ehci_hcd
May  6 22:31:04 The_Matrix kernel: generic-usb 0003:0764:0501.01CD: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:31:25 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 90
May  6 22:31:26 The_Matrix kernel: usb 2-1.1: new low speed USB device number 91 using ehci_hcd
May  6 22:31:26 The_Matrix kernel: generic-usb 0003:0764:0501.01CE: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:31:47 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 91
May  6 22:31:48 The_Matrix kernel: usb 2-1.1: new low speed USB device number 92 using ehci_hcd
May  6 22:31:48 The_Matrix kernel: generic-usb 0003:0764:0501.01CF: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:32:09 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 92
May  6 22:32:10 The_Matrix kernel: usb 2-1.1: new low speed USB device number 93 using ehci_hcd
May  6 22:32:10 The_Matrix kernel: generic-usb 0003:0764:0501.01D0: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:32:31 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 93
May  6 22:32:32 The_Matrix kernel: usb 2-1.1: new low speed USB device number 94 using ehci_hcd
May  6 22:32:32 The_Matrix kernel: generic-usb 0003:0764:0501.01D1: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:32:53 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 94
May  6 22:32:54 The_Matrix kernel: usb 2-1.1: new low speed USB device number 95 using ehci_hcd
May  6 22:32:54 The_Matrix kernel: generic-usb 0003:0764:0501.01D2: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:33:15 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 95
May  6 22:33:16 The_Matrix kernel: usb 2-1.1: new low speed USB device number 96 using ehci_hcd
May  6 22:33:16 The_Matrix kernel: generic-usb 0003:0764:0501.01D3: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:33:37 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 96
May  6 22:33:38 The_Matrix kernel: usb 2-1.1: new low speed USB device number 97 using ehci_hcd
May  6 22:33:38 The_Matrix kernel: generic-usb 0003:0764:0501.01D4: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0
May  6 22:33:59 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 97
May  6 22:34:00 The_Matrix kernel: usb 2-1.1: new low speed USB device number 98 using ehci_hcd
May  6 22:34:00 The_Matrix kernel: generic-usb 0003:0764:0501.01D5: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0

 

repeats quite a lot.

 

You also may have a dieing disk

 

May  6 22:34:40 The_Matrix kernel: md: disk0 write error
May  6 22:34:40 The_Matrix kernel: handle_stripe write error: 1180213968/0, count: 1
May  6 22:34:40 The_Matrix kernel: md: disk0 write error
May  6 22:34:40 The_Matrix kernel: handle_stripe write error: 1180213976/0, count: 1
May  6 22:34:40 The_Matrix kernel: md: disk0 write error
May  6 22:34:40 The_Matrix kernel: handle_stripe write error: 1180213984/0, count: 1
May  6 22:34:40 The_Matrix kernel: md: disk0 write error
May  6 22:34:40 The_Matrix kernel: handle_stripe write error: 1180213992/0, count: 1
May  6 22:34:40 The_Matrix kernel: md: disk0 write error
May  6 22:34:40 The_Matrix kernel: handle_stripe write error: 1180214000/0, count: 1
May  6 22:34:40 The_Matrix kernel: md: disk0 write error
May  6 22:34:40 The_Matrix kernel: handle_stripe write error: 1180214008/0, count: 1
May  6 22:34:40 The_Matrix kernel: md: disk0 write error

 

though, if it is USB, it could be explained by the first error.

 

Have you had any of your disks reball? Can you run smart checks on them? If you do have USB Drives, make sure all the cables are secure and that the power (if any) is working correctly.

Almost looks like a USB Drive is starting to spin up, is either faulty and dies or doesn't have enough power to start correctly...then tries again, and again, and again.

Link to comment

You have an intriguing line in your syslog:

May  6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

I got exactly the same line in my syslog. Have seen this message also in previous unRAID versions (though can't remember from which version it started to appear).

 

Seems to me that a patch is available in the code which doesn't apply to this hardware. More an informative message than an error.

 

Anyway in my case the Realtek chip is working without issues (read: slowdowns). I am using SMB not AFP though.

Link to comment

I have never noticed it do a parity check in the past while formatting a new drive. perhaps i just don't add them often enough to remember that.

and yes, it was precleared.

 

as far as the spin-downs, i would not worry to much unless it has been reported by others.

it is just another log to look at.

 

Thanks for getting back to me.

 

PS. speeds are pretty decent.

Write directly to array @ about 45MB/s

Write to SSD cache @ about 115MB/s

Read from array, 70-90MB/s

(Using a 20GB ISO. All number are slower for small clusters of files)

Parity check, about 100MB/s @ 25% complete (slower then last check with b12 but i did add a drive to expander. I might have lost bandwidth)

Edit: not sure what happened. parity check sped up to 110-122MB/s. something must have been running that I didnt see.

 

I also noticed spindowns during a full parity check. Am also running some plugins though so I cannot be sure where it comes from. Worthwhile to notice as an "orange". Even if true it does not damage anything, parity check takes a bit longer.

Link to comment

I started a parity check last night and upon start the "spin down" messages begin immediately to appear with intervals between 2 and 15 minutes. This went on for almost an hour.

May  6 22:59:10 vesta kernel: mdcmd (53): check CORRECT
May  6 22:59:10 vesta kernel: md: recovery thread woken up ...
May  6 22:59:10 vesta kernel: md: recovery thread checking parity...
May  6 22:59:10 vesta kernel: md: using 12288k window, over a total of 1953514552 blocks.
May  6 22:59:21 vesta kernel: mdcmd (54): spindown 1
May  6 22:59:22 vesta kernel: mdcmd (55): spindown 2
May  6 22:59:22 vesta kernel: mdcmd (56): spindown 3
May  6 22:59:23 vesta kernel: mdcmd (57): spindown 4
May  6 22:59:24 vesta kernel: mdcmd (58): spindown 5
May  6 22:59:25 vesta kernel: mdcmd (59): spindown 6
May  6 22:59:25 vesta kernel: mdcmd (60): spindown 7
May  6 23:01:18 vesta kernel: mdcmd (61): spindown 0
May  6 23:01:19 vesta kernel: mdcmd (62): spindown 1
May  6 23:01:20 vesta kernel: mdcmd (63): spindown 2
May  6 23:01:21 vesta kernel: mdcmd (64): spindown 3
May  6 23:01:21 vesta kernel: mdcmd (65): spindown 4
May  6 23:01:22 vesta kernel: mdcmd (66): spindown 5
May  6 23:01:23 vesta kernel: mdcmd (67): spindown 6
May  6 23:01:23 vesta kernel: mdcmd (68): spindown 7
May  6 23:01:25 vesta kernel: mdcmd (69): spindown 5
May  6 23:01:26 vesta kernel: mdcmd (70): spindown 6
May  6 23:01:26 vesta kernel: mdcmd (71): spindown 7
...
...
...
May  6 23:52:11 vesta kernel: mdcmd (196): spindown 4
May  6 23:52:12 vesta kernel: mdcmd (197): spindown 5
May  6 23:52:13 vesta kernel: mdcmd (198): spindown 6
May  6 23:52:13 vesta kernel: mdcmd (199): spindown 7
May  6 23:52:15 vesta kernel: mdcmd (200): spindown 5
May  6 23:52:15 vesta kernel: mdcmd (201): spindown 6
May  6 23:52:16 vesta kernel: mdcmd (202): spindown 7
May  6 23:57:22 vesta kernel: mdcmd (203): spindown 4
May  6 23:57:23 vesta kernel: mdcmd (204): spindown 5
May  6 23:57:24 vesta kernel: mdcmd (205): spindown 6
May  6 23:57:24 vesta kernel: mdcmd (206): spindown 7
May  6 23:57:25 vesta kernel: mdcmd (207): spindown 5
May  6 23:57:26 vesta kernel: mdcmd (208): spindown 6
May  6 23:57:27 vesta kernel: mdcmd (209): spindown 7
May  7 06:19:28 vesta kernel: md: sync done. time=26418sec
May  7 06:19:28 vesta kernel: md: recovery thread sync completion status: 0
May  7 06:19:28 vesta kernel: mdcmd (210): spindown 5
May  7 06:19:28 vesta kernel: mdcmd (211): spindown 6
May  7 06:19:29 vesta kernel: mdcmd (212): spindown 7

After the parity check was completed three out of the seven disks spun down immediately (it appears the inactivity timer is not taken into consideration, it is set to 2 hours).

 

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.