unRAID Server Release 5.0-rc2 Available



Ok, issues with Realtek incoming!

 

I took a look at your syslog.  It looks to me as though your Realtek performed without issue for quite a while, until May 6 19:05:45, and then began spewing out link up messages.  That looks to me as if either the driver or firmware had crashed, which would probably result in terrible performance!  There is no direct evidence about performance speed here, except that the link up messages imply lost time with each one, and therefore slowdowns.  When you recorded the extremely slow network speed, was that before or after 19:05:45?  Try testing again, and see if a freshly booted server is just as slow.

 

You have an intriguing line in your syslog:

May  6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

Perhaps you or someone else with more time can research this, and determine if there is a firmware patch for your chipset, and how it would be applied, if that is even possible.  Personally, I think you may be better off adding a PCIe network card to your system, and disabling the Realtek.

 

I noticed 2 other things in your syslog.  One was a minor kernel 'crash', look for the '--[ cut here ]--' 5 minutes after the parity check starts.  Seems harmless, but definitely not normal.  The other is that you have something added to your system that is spewing out useless and repeated syslog messages once every minute.  That's 60 an hour, 1440 a day.  Harmless but annoying.

 

Edit:  I should give credit to pantner, I just realized that I basically restated in different words what he had already said!

 

I only started transferring files once those errors started.

 

I've disabled all the things I had running (inc additions by unmenu) but unmenu is still running so I can get at the syslog easily.

 

The network card crashes still happen.

 

The behaviour is as follows:

 

1. Transfers ok until crash @ 50MB/sec (wired) or @ 12MB/sec (wireless).

2. Error occurs, rate drops to zero and from there on in it alternates between 0 and max speed (but not max speed for long, so average transfer rate is <3 MB/sec).

 

Clearly, not working properly.

 

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

 

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use this chip. It isn't acceptable to have a product not work properly with a fairly popular network chip.

2. Added power, cost and loss of a PCI-E slot.

 

The kernel crash is down to the CPU, and I have ordered a replacement. This one tests fine for days at full load in Windows but Linux doesn't like it. I have never had a parity check error, so the fault is clearly not in a part of the CPU that matters, but I'm going to ditch it anyway.

 

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (can't remember which beta, unfortunately; b8?), so unless the network adapter has given up on me, I think it is the driver having issues!

Link to comment

I don't know whether this provides any clues, but here is a sequence of events on my Ubuntu desktop machine, after I encountered a 'Stale nfs file handle':

 

While I was browsing around my Photos share, Nautilus became unresponsive (but there was no error message).

 

I used telnet to connect to my unRAID server and performed 'ls -l' on both the Photos directory and the 'user' parent:

root@Tower:~# ls -l /mnt/user/Photos
total 161248
drwxrwx--- 1 nobody users      4360 2011-06-20 01:11 100OLYMP1/
drwxrwx--- 1 nobody users       496 2003-10-26 15:42 100OLYMP2/
drwxrwx--- 1 nobody users       120 2012-01-27 18:00 101029/
drwxr-xr-x 1 nobody users        72 2012-01-27 14:00 110324/
drwxr-xr-x 1 nobody users        72 2012-01-27 14:00 110830/
drwxr-xr-x 1 nobody users        72 2012-01-27 14:00 111004/
drwxr-xr-x 1 nobody users        72 2012-01-27 14:00 111007/
drwxr-xr-x 1 nobody users       336 2011-10-17 20:07 111007YNL/
drwxrwxr-x 1 nobody users       592 2011-10-25 00:29 111022Baking/
drwxrwxr-x 1 nobody users        96 2012-01-27 18:00 111106/
drwxrwxr-x 1 nobody users       136 2012-01-27 14:00 111107_MYF_CrocPark/
drwxrwxr-x 1 nobody users       104 2012-01-27 14:00 111209_Bukidnon/
drwxrwxr-x 1 nobody users       136 2012-01-27 18:00 120108_CDO/
drwxrwxr-x 1 nobody users        48 2012-05-07 18:20 120507/
drwx------ 1 nobody users       720 2012-01-27 14:00 Ai-Ai\ Graduation/
drwxr-xr-x 1 nobody users      2224 2012-01-27 14:00 Import/
drwxr-xr-x 1 nobody users       240 2012-01-27 14:00 Methodist\ Youth\ Surigao/
drwxr-xr-x 1 nobody users       208 2011-06-20 01:11 Ruby\ in\ UK/
-rw-r--r-- 1 nobody users   3072000 2012-01-27 16:06 digikam4.db
-rw-r--r-- 1 nobody users 161874944 2012-01-27 16:06 thumbnails-digikam.db
root@Tower:~# ls -l /mnt/user
total 13024031
drwxr-xr-x 1 nobody              users         48 2011-10-17 19:24 111007YNL/
drwxrwx--- 1 nobody              users        424 2010-09-08 21:42 Athlon/
drwxr-xr-x 1 nobody              users      20352 2011-12-08 08:30 Downloaded\ Files/
-rw-rw---- 1 nobody              users 6542697514 2010-02-05 18:20 LoveStory_DVD.mkv
drwxrwx--- 1 nobody              users        128 2012-03-25 08:01 Maildir/
drwxrwx--- 1 nobody              users       6912 2012-05-06 16:14 Movies/
drwxrwx--- 1 nobody              users        384 2012-04-08 18:00 Music/
-rw-rw---- 1 nobody              users 2088899096 2010-02-07 01:20 NOTTING\ HILL.mkv
-rw-rw---- 1 nobody              users 4691562496 2010-06-13 21:05 National\ Treasure\ 2004\ 720p.avi
drwxrwxr-x 1 nobody              users        504 2011-12-27 08:50 Pete's\ N97/
drwxrwxr-x 1 nobody              users         72 2012-05-07 18:20 Photos/
drwxrwx--- 1 nobody              users        296 2011-09-14 08:08 Series/
drwxrwx--- 1 nobody              users        176 2012-04-08 07:55 Squeeze/
drwxr-xr-x 1 logitechmediaserver users       1392 2012-04-08 16:23 Squeeze-7.7.2/
drwxr-xr-x 1 root                root         480 2012-03-30 12:32 Temporary/
drwxrwxr-x 1 nobody              users        520 2011-11-12 19:58 UMC/
drwxrwx--- 1 nobody              users        600 2010-09-04 20:22 UMC2.07/
drwxrwx--- 1 nobody              users        600 2010-10-16 09:13 UMC2.08.1/
drwxrwxrwx 1 nobody              users        600 2011-06-07 10:42 UMC2.08.1x/
drwxr-xr-x 1 nobody              users        496 2011-09-13 08:41 UMC2.10/
drwxrwxr-x 1 nobody              users        584 2012-02-15 22:54 UMC2.11/
drwxrwx--- 1 nobody              users        696 2011-06-06 08:34 UMCold/
drwxrwx--- 1 nobody              users        504 2012-01-02 09:01 Videos/
drwxrwxr-x 1 nobody              users        168 2011-11-14 10:32 Wii/
drwxrwx--- 1 nobody              users        576 2011-10-18 13:54 Work/
drwxr-xr-x 1 root                root         504 2012-04-08 22:45 XFarG7/
drwxrwx--- 1 nobody              users         72 2010-10-29 02:18 ZA30/
drwxrwx--- 1 nobody              users        864 2010-10-29 01:21 ZK10/
drwxrwx--- 1 nobody              users        120 2010-10-25 15:36 ZVideo/
drwxrwx--- 1 nobody              users        184 2010-10-29 02:14 ZVideo2/
drwxrwx--- 1 nobody              users         72 2010-10-19 23:43 mediaserver/
drwxrwx--- 1 nobody              users        112 2011-02-26 13:12 mp3/
-rw-r--r-- 1 root                root       18020 2012-03-30 12:32 nolimetangere.odt
-rw-r--r-- 1 root                root      342926 2012-03-30 12:30 output.pdf
-rw-r--r-- 1 nobody              users      30312 2012-01-11 00:35 pro.odt
drwxrwx--- 1 nobody              users         80 2011-09-15 07:19 series/
root@Tower:~# 

 

I then looked at the same directories from Ubuntu:

peter@desktop:~$ ls -l /net/tower/mnt/user
ls: cannot access /net/tower/mnt/user/Photos: Stale NFS file handle
total 0
drwxr-xr-x 2 root root 0 May  7 18:19 Movies
drwxr-xr-x 2 root root 0 May  7 18:19 Music
d??? ? ?    ?    ?            ? Photos
drwxr-xr-x 2 root root 0 May  7 18:19 series
drwxr-xr-x 2 root root 0 May  7 18:19 Series
drwxr-xr-x 2 root root 0 May  7 18:19 UMC
drwxr-xr-x 2 root root 0 May  7 18:19 Videos
peter@desktop:~$ ls -l /net/tower/mnt/user/Photos
ls: cannot access /net/tower/mnt/user/Photos: Stale NFS file handle
peter@desktop:~$ sudo umount -f /net/tower/mnt/user/Photos
[sudo] password for peter: 
peter@desktop:~$ ls -l /net/tower/mnt/user
total 0
drwxr-xr-x 2 root root   0 May  7 18:19 Movies
drwxr-xr-x 2 root root   0 May  7 18:19 Music
drwxrwxr-x 1   99 users 72 May  7 18:20 Photos
drwxr-xr-x 2 root root   0 May  7 18:19 series
drwxr-xr-x 2 root root   0 May  7 18:19 Series
drwxr-xr-x 2 root root   0 May  7 18:19 UMC
drwxr-xr-x 2 root root   0 May  7 18:19 Videos
peter@desktop:~$ ls -l /net/tower/mnt/user
total 8
drwxrwx--- 1   99 users 6912 May  6 16:14 Movies
drwxrwx--- 1   99 users  384 Apr  8 18:00 Music
drwxrwxr-x 1   99 users   72 May  7 18:20 Photos
drwxrwx--- 1   99 users  296 Sep 14  2011 series
drwxr-xr-x 2 root root     0 May  7 18:19 Series
drwxrwxr-x 1   99 users  520 Nov 12 19:58 UMC
drwxrwx--- 1   99 users  504 Jan  2 09:01 Videos
peter@desktop:~$ 

 

I use autofs to mount nfs shares automatically, hence I don't have to issue the mount command.  Between the last two 'ls -l /net/tower/mnt/user' I had opened the Photos share in Nautilus - note that ownership of most folders has changed from 'root' to '99'.

 

Here is the line showing details of the 'Photos' share from the output of 'mount' from Ubuntu.

tower:/mnt/user/Photos on /net/tower/mnt/user/Photos type nfs (rw,nosuid,nodev,vers=3,hard,intr,nolock,udp,sloppy,addr=10.2.0.100)

Link to comment

I use autofs to mount nfs shares automatically, hence I don't have to issue the mount command

 

Can you replicate the problem when they're permanently mounted, i.e. outwith autofs?

 

Autofs / automount can introduce an entire plethora of problems by themselves.

Link to comment

RC2 installed; no showstoppers. 

 

With 5.0b14, I was having an issue that disks would not stay in a spin down condition.  Shares were exported via AFP only; disks were exported via SMB only. 

 

After RC2 install, I returned to my desired configuration (Shares via AFP, disks via SMB) and the spin down issue has NOT returned as yet.  I had thought this was the fault of my Mac computers (NetBios polling SMB drives periodically, causing them to wake), but I guess it was an issue with the previous SMB implementations.  Whatever the problem, seems to have been fixed.

 

Still having an issue with spindown under RC2...turned off SMB completely at unRAID server, disks still won't spindown appropriately.  Even if I force them to spindown, they spin back up.  Curiously, it is only the first drive in every share.  For instance, I have two drives in the TV Shows share...only the first one won't spindown. 

 

Seems like the issue is with AFP.  I run a Mac Mini w/Lion Server in my basement that is 24/7 connected to the drives via AFP.  Perhaps there is some kind of periodic refresh query from AFP that goes out to the shares and wakes up the drives?  I don't know.  Not sure if this is something that can be addressed within unRAID at all, or if I need to alter configurations for my network.  It may just be the price I pay (literally, in terms of drive wear and energy usage) for using AFP.  AFP is so much easier than SMB - my preference is to stick with it.

 

Should not be a roadblock to a 5.0 final, but something for Limetech to take a look at in a subsequent release.

 

Phil

Link to comment

May  7 13:13:36 nas kernel: mdcmd (594): spindown 2

May  7 13:13:50 nas kernel: mdcmd (595): spindown 5

May  7 13:13:51 nas kernel: mdcmd (596): spindown 4

May  7 13:13:53 nas kernel: mdcmd (597): spindown 1

May  7 13:27:25 nas kernel: mdcmd (598): spindown 0

May  7 13:28:08 nas kernel: mdcmd (599): spindown 0

May  7 13:28:08 nas kernel: mdcmd (600): spindown 2

May  7 13:28:26 nas kernel: mdcmd (601): spindown 1

May  7 13:28:27 nas kernel: mdcmd (602): spindown 3

May  7 13:28:28 nas kernel: mdcmd (603): spindown 4

May  7 13:28:36 nas kernel: mdcmd (604): spindown 2

May  7 13:29:27 nas kernel: mdcmd (605): spindown 0

May  7 13:29:27 nas kernel: mdcmd (606): spindown 3

May  7 13:30:33 nas kernel: mdcmd (607): spindown 0

May  7 13:30:33 nas kernel: mdcmd (608): spindown 3

May  7 13:30:50 nas kernel: mdcmd (609): spindown 0

May  7 13:30:50 nas kernel: mdcmd (610): spindown 3

May  7 13:31:04 nas kernel: mdcmd (611): spindown 3

May  7 13:31:49 nas kernel: mdcmd (612): spindown 0

May  7 13:31:50 nas kernel: mdcmd (613): spindown 3

May  7 14:13:52 nas kernel: mdcmd (614): spindown 4

May  7 14:13:53 nas kernel: mdcmd (615): spindown 5

May  7 14:13:56 nas kernel: mdcmd (616): spindown 1

May  7 14:28:04 nas kernel: mdcmd (617): spindown 5

May  7 14:28:20 nas kernel: mdcmd (618): spindown 3

May  7 14:31:04 nas kernel: mdcmd (619): spindown 0

May  7 14:31:05 nas kernel: mdcmd (620): spindown 3

May  7 14:42:08 nas kernel: mdcmd (621): spindown 3

May  7 15:13:33 nas kernel: mdcmd (622): spindown 2

May  7 15:13:43 nas kernel: mdcmd (623): spindown 4

May  7 15:13:44 nas kernel: mdcmd (624): spindown 5

May  7 15:13:47 nas kernel: mdcmd (625): spindown 1

May  7 15:27:10 nas kernel: mdcmd (626): spindown 0

May  7 15:28:15 nas kernel: mdcmd (627): spindown 3

 

My spindown time is set to 30 min, so even if one process is waking up some drives, why do they spin down again 15 min after the last one?
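
To put numbers on that, here is a minimal sketch that measures the gap between consecutive spindown commands for each disk. It assumes the syslog line format shown above; the function name `spindown_gaps` and the default year are my own choices for illustration, since syslog timestamps carry no year.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches syslog lines such as:
#   May  7 13:13:36 nas kernel: mdcmd (594): spindown 2
SPINDOWN = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+\S+\s+kernel: mdcmd \(\d+\): spindown (\d+)"
)

def spindown_gaps(lines, year=2012):
    """Return {disk: [seconds between consecutive spindown commands]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        match = SPINDOWN.match(line.strip())
        if not match:
            continue  # not a spindown line
        month, day, clock, disk = match.groups()
        when = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        disk = int(disk)
        if disk in last_seen:
            gaps[disk].append(int((when - last_seen[disk]).total_seconds()))
        last_seen[disk] = when
    return dict(gaps)

# The four "spindown 2" lines between 13:13:36 and 15:13:33 from the log above.
sample = [
    "May  7 13:13:36 nas kernel: mdcmd (594): spindown 2",
    "May  7 13:28:08 nas kernel: mdcmd (600): spindown 2",
    "May  7 13:28:36 nas kernel: mdcmd (604): spindown 2",
    "May  7 15:13:33 nas kernel: mdcmd (622): spindown 2",
]
print(spindown_gaps(sample))
```

Any gap well under the configured spindown delay (1800 seconds for a 30 min setting) is one of the suspicious repeats.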

Link to comment

Hmm... I think I'm having intermittent NFS issues with RC2 that were not present under RC1.  Some folders appear empty after a while, but when I mount the specific disk a suspect folder resides on, it's fully intact and populated.

 

My syslog shows the following errors after a time with the share volume mounted on an OS X machine (the same NFS errors that plagued me under betas >b12):

 

May  7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200
May  7 07:27:44 UnRAID rpc.statd[1203]: STAT_FAIL to UnRAID for SM_MON of 10.0.1.200
May  7 07:27:44 UnRAID kernel: lockd: cannot monitor The Matrix
May  7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200
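
One possible reading of the "No canonical hostname found" lines is that the server cannot map the client address 10.0.1.200 back to a hostname. That is only a guess from the message text, not something confirmed in this thread. A minimal sketch for checking the reverse lookup from the server side follows; `reverse_lookup` is a hypothetical helper name, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname the resolver reports for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None

# On the server you would run something like:
#   print(reverse_lookup("10.0.1.200"))
# A result of None would be consistent with the rpc.statd message.
```

If the lookup does come back empty, giving the client a name the server can resolve would be the thing to try first.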

Link to comment

Hmm... I think I'm having intermittent NFS issues with RC2 that were not present under RC1.  Some folders appear empty after a while, but when I mount the specific disk a suspect folder resides on, it's fully intact and populated.

 

My syslog shows the following errors after a time with the share volume mounted on an OS X machine (the same NFS errors that plagued me under betas >b12):

 

May  7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200
May  7 07:27:44 UnRAID rpc.statd[1203]: STAT_FAIL to UnRAID for SM_MON of 10.0.1.200
May  7 07:27:44 UnRAID kernel: lockd: cannot monitor The Matrix
May  7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200

 

Probably best to post a syslog & state if you are running any addons; otherwise Tom isn't likely to get back to you in a hurry  ;D

Link to comment

Hi BRiT,

 

*** Update ***

 

I just found updated documentation for RC2 at http://lime-technology.com/wiki/index.php?title=Installing_unRAID_5.0_on_a_full_Slackware_Distro. I just have to find a way to get the latest version of slackware and I'll try this.

 

****

 

I've installed 5.0-rc2 on ESXi 5.0 on a board that doesn't support VT-d. As such, I use RDM to pass my disks to the VM. This used to work fine back on 5.0-b2 but, if I remember correctly, I had to recompile the kernel with the VMware paravirtual driver.

 

Right now, using the LSI SAS driver in ESXi, I'm seeing the drives in Unraid but without any label/path in id-path/id-label. As such, Unraid seems unable to make sure the drives are the same, so I can't assign them. Using the paravirtual driver, I see no drives at all, which is not good.

 

Following your post on this:

 

For those looking to install this into a full Slackware distro or do their own custom development, the command to unbzroot is now:

 

xzcat bzroot | cpio -m -i -d -H newc --no-absolute-filenames

 

 

I've looked at the wiki on this subject and the steps to compile a new kernel use Slackware as the basis. The problems I have now are 1) the Slackware website is not responding at all and 2) the latest and greatest kernel in Slackware is 2.6.xx, which is far from 3.0.30.

 

Can you help me (and probably others) with how you compile this latest release of Unraid at the moment? Is there a simple recipe to follow that is easier than modifying the procedure on the wiki at each step?

 

Thank you.

 

ehfortin

Link to comment

Quick question for all those running rc2, Are any of you running an "Atheros AR813X/AR815X" NIC or a "Realtek® 8111F" NIC?

 

Just looking at a new unRAID build and need to ensure I get a board with a supported NIC. From this webpage http://greenleaf-technology-hwandsw.blogspot.co.nz/2011/08/search-for-new-budget-board-continues.html it looks like 5.0b10 was flawless on the Atheros AR813X/AR815X chip.

 

My two boards that I'm looking at are the Gigabyte GA-Z77MX-D3H or Asus P8Z77-M PRO.

 

thanks and look forward to a reply.

Link to comment

I'm still having mover issues  :(

 

I manually copied everything from my cache drive to the relevant disks and that seemed to stop unRAID hanging when mover runs, but something is still not right... sys log attached.

 

The last entry in the system log was ten minutes ago and mover is still apparently running, but nothing is actually 'moving' off the cache drive  ???

sys_log.txt

Link to comment

Now I have these two entries...

 

May 8 01:28:36 unRAID kernel: INFO: rcu_sched_state detected stall on CPU 0 (t=6000 jiffies)
May 8 01:31:37 unRAID kernel: INFO: rcu_sched_state detected stall on CPU 0 (t=24030 jiffies)

 

and the processor is running at 50% because of a 'system process'.

 

No idea what is happening??  :(
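
For what it's worth, the two stall lines can be turned into rough timings. The sketch below assumes both messages refer to the same stall, so that the `t=` counter kept running between them; that assumption is mine and the log does not prove it. `estimate_hz` is a hypothetical helper name.

```python
from datetime import datetime

def estimate_hz(t1, jiffies1, t2, jiffies2):
    """Estimate kernel ticks per second from two (clock time, jiffies) readings."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)).total_seconds()
    return (jiffies2 - jiffies1) / elapsed

# Values copied from the two syslog lines above.
hz = estimate_hz("01:28:36", 6000, "01:31:37", 24030)
print(f"estimated ticks per second: {hz:.1f}")

# Rounded to the nearest hundred, the first report (t=6000 jiffies) converts to seconds.
nominal_hz = round(hz, -2)
print(f"first stall report after roughly {6000 / nominal_hz:.0f} s")
```

Under that assumption, CPU 0 had been stuck for about a minute at the first message and for several minutes by the second, which would fit with the busy 'system process'.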

Link to comment

...

You have an intriguing line in your syslog:

May  6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

 

Chasing this down here's what I discovered.  There are some classes of "firmware" (pre-compiled binaries) that are no longer kept with the rest of the linux source tree due to "licensing issues".  The Realtek f/w falls in this class.  So I had to chase down where this is kept now and found it here: http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=tree

 

This firmware is incorporated into the upcoming -rc3.  This will probably fix some issues, so... please hold off on further Realtek NIC reports until -rc3.

Link to comment

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

 

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use this chip. It isn't acceptable to have a product not work properly with a fairly popular network chip.

2. Added power, cost and loss of a PCI-E slot.

 

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (can't remember which beta, unfortunately; b8?), so unless the network adapter has given up on me, I think it is the driver having issues!

 

I do agree with your 2 points.  The question I have though is whether your particular Realtek chipset is faulty, or is a Realtek variant that is not supported well by the current Realtek driver/module, or just needs the firmware patch!  Hoping for success with -rc3...

 

I completely forgot (typical of me!) to mention one other thing in your syslog.  You too had spurious spindowns during the parity check.

May  6 10:29:57 Tower kernel: mdcmd (23): check CORRECT

May  6 10:29:57 Tower kernel: md: recovery thread woken up ...

May  6 10:29:57 Tower kernel: md: recovery thread checking parity...

May  6 10:29:57 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.

...

May  6 10:35:50 Tower kernel: mdcmd (25): spindown 1

May  6 10:35:51 Tower kernel: mdcmd (26): spindown 2

May  6 10:35:52 Tower kernel: mdcmd (27): spindown 3

May  6 10:35:52 Tower kernel: mdcmd (28): spindown 4

May  6 10:35:54 Tower kernel: mdcmd (29): spindown 2

May  6 10:35:57 Tower kernel: mdcmd (30): spindown 3

May  6 10:35:57 Tower kernel: mdcmd (31): spindown 4

Link to comment

...

You have an intriguing line in your syslog:

May  6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

 

Chasing this down here's what I discovered.  There are some classes of "firmware" (pre-compiled binaries) that are no longer kept with the rest of the linux source tree due to "licensing issues".  The Realtek f/w falls in this class.  So I had to chase down where this is kept now and found it here: http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=tree

 

This firmware is incorporated into the upcoming -rc3.  This will probably fix some issues, so... please hold off on further Realtek NIC reports until -rc3.

 

Sorry to hijack this response but would you know if either of these two NIC chipsets are supported with the current kernel/drivers?

 

Atheros AR813X/AR815X

Realtek 8111F

 

 

Link to comment

Hi,

 

To get the kernel version that unRAID 5.0 RC2 uses, get it from the official Linux kernel site, Kernel.org.

 

wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.30.tar.bz2

 

Thank you. I found it in the documentation as well. What I didn't find, however, is how to recreate a bzroot from what has been compiled and tested in Slackware (I'm using Salix 13.37 as Slackware still doesn't answer...). Do you have this documented somewhere?

 

Thanks.

 

ehfortin

Link to comment

I have disabled all the addons because I was seeing errors while running a parity sync. Once it hits them, it doesn't give any options to change anything; it just stops, and you don't know until you hit F5. It was stopping somewhere after 6%; after removing all addons, it got to at least 20%. What can I do to correct this problem?

I am thinking it may have something to do with Plex, as I see a lot of info coming up on disk 5 (that is where I HAD Plex). I have long since gotten rid of Plex, but I do not seem to have the correct permissions to delete some 2,000+ files and the master folder.

 

In addition to pantner's comments, here's what I noticed concerning your parity drive: (this is about 2.5 hours after the parity build began)

May  6 22:34:18 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 1922:find reserved error, why?

May  6 22:34:18 The_Matrix kernel: sas: sas_ata_task_done: SAS error 2

May  6 22:34:18 The_Matrix kernel: sas: Enter sas_scsi_recover_host

May  6 22:34:18 The_Matrix kernel: ata12: sas eh calling libata cmd error handler

May  6 22:34:21 The_Matrix kernel: ata12: sas eh calling libata port error handler

May  6 22:34:21 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 98

May  6 22:34:22 The_Matrix kernel: usb 2-1.1: new low speed USB device number 99 using ehci_hcd

May  6 22:34:22 The_Matrix kernel: generic-usb 0003:0764:0501.01D6: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0

May  6 22:34:23 The_Matrix kernel: ata12.00: qc timeout (cmd 0x2f)

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f7700000 task=f742be00 slot=f77115d8 slot_idx=x0

May  6 22:34:23 The_Matrix kernel: ata12: failed to read log page 10h (errno=-5)

May  6 22:34:23 The_Matrix kernel: ata12.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6 frozen

May  6 22:34:23 The_Matrix kernel: ata12.00: failed command: WRITE FPDMA QUEUED

May  6 22:34:23 The_Matrix kernel: ata12.00: cmd 61/00:00:50:8d:58/02:00:46:00:00/40 tag 0 ncq 262144 out

May  6 22:34:23 The_Matrix kernel:          res 01/04:00:50:8b:58/00:02:46:00:00/40 Emask 0x3 (HSM violation)

May  6 22:34:23 The_Matrix kernel: ata12.00: status: { ERR }

May  6 22:34:23 The_Matrix kernel: ata12.00: error: { ABRT }

May  6 22:34:23 The_Matrix kernel: ata12: hard resetting link

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x89800.

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1001001

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy0 Unplug Notice

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1011081

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[0]

May  6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy0 is gone

May  6 22:34:25 The_Matrix kernel: mvsas 0000:02:00.0: Phy0 : No sig fis

...

May  6 22:34:25 The_Matrix kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!

May  6 22:34:25 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0

May  6 22:34:25 The_Matrix kernel: sas: sas_ata_hard_reset: Found ATA device.

May  6 22:34:25 The_Matrix kernel: sas: sas_ata_task_done: SAS error 2

May  6 22:34:25 The_Matrix kernel: sas: sas_ata_task_done: SAS error 2

May  6 22:34:25 The_Matrix kernel: ata12.00: both IDENTIFYs aborted, assuming NODEV

May  6 22:34:25 The_Matrix kernel: ata12.00: revalidation failed (errno=-2)

May  6 22:34:30 The_Matrix kernel: ata12: hard resetting link

 

Notice the part in blue.  I only added the color, not the words.  Highly unusual!  Sounds like it is very confused!  I cannot offer any suggestions or explanation.

 

It does not appear to have ever been able to reestablish communications with your parity drive, which results in all the 'disk0 write errors', so you can ignore any further errors concerning disk 0 or ata12.  My only suggestion would be to try connecting your parity drive to a different controller.

Link to comment

Thank you. I found it in the documentation as well. What I didn't find, however, is how to recreate a bzroot from what has been compiled and tested in Slackware (I'm using Salix 13.37 as Slackware still doesn't answer...). Do you have this documented somewhere?

 

It's been talked about in the past in the forums and documented in the Wiki. Here's a starting point: http://lime-technology.com/wiki/index.php?title=Building_a_custom_kernel

 

One particular item to note, replace zcat with xzcat if using the newer compression format.
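
If you are not sure which format a given bzroot uses, the first few bytes of the file tell you. A minimal sketch, assuming the standard gzip and xz magic numbers; `pick_decompressor` and `decompressor_for_file` are hypothetical helper names, and the demo compresses throwaway data rather than touching a real bzroot.

```python
import gzip
import lzma

GZIP_MAGIC = b"\x1f\x8b"          # first two bytes of any gzip stream
XZ_MAGIC = b"\xfd7zXZ\x00"        # first six bytes of any .xz stream

def pick_decompressor(header):
    """Map the leading bytes of an image to the matching *cat tool, or None."""
    if header.startswith(XZ_MAGIC):
        return "xzcat"
    if header.startswith(GZIP_MAGIC):
        return "zcat"
    return None

def decompressor_for_file(path):
    """Read just enough of the file at path to identify its compression."""
    with open(path, "rb") as image:
        return pick_decompressor(image.read(len(XZ_MAGIC)))

# Demo on in-memory data instead of a real image file.
print(pick_decompressor(lzma.compress(b"demo")))
print(pick_decompressor(gzip.compress(b"demo")))
```

On the server itself you would call `decompressor_for_file` with the path to your bzroot and use whichever tool it names in the cpio pipeline.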

Link to comment

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

 

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use this chip. It isn't acceptable to have a product not work properly with a fairly popular network chip.

2. Added power, cost and loss of a PCI-E slot.

 

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (can't remember which beta, unfortunately; b8?), so unless the network adapter has given up on me, I think it is the driver having issues!

 

I do agree with your 2 points.  The question I have though is whether your particular Realtek chipset is faulty, or is a Realtek variant that is not supported well by the current Realtek driver/module, or just needs the firmware patch!  Hoping for success with -rc3...

 

I completely forgot (typical of me!) to mention one other thing in your syslog.  You too had spurious spindowns during the parity check.

May  6 10:29:57 Tower kernel: mdcmd (23): check CORRECT

May  6 10:29:57 Tower kernel: md: recovery thread woken up ...

May  6 10:29:57 Tower kernel: md: recovery thread checking parity...

May  6 10:29:57 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.

...

May  6 10:35:50 Tower kernel: mdcmd (25): spindown 1

May  6 10:35:51 Tower kernel: mdcmd (26): spindown 2

May  6 10:35:52 Tower kernel: mdcmd (27): spindown 3

May  6 10:35:52 Tower kernel: mdcmd (28): spindown 4

May  6 10:35:54 Tower kernel: mdcmd (29): spindown 2

May  6 10:35:57 Tower kernel: mdcmd (30): spindown 3

May  6 10:35:57 Tower kernel: mdcmd (31): spindown 4

 

Didn't seem to affect the end result: 75MB/sec average, 7.5 hours to complete, which is correct, give or take.
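
As a rough cross-check of those figures against the block count in the quoted syslog, here is the arithmetic. It assumes the md driver counts 1 KiB blocks and that "MB" means 10**6 bytes; both are my assumptions rather than anything stated in the log.

```python
BLOCKS = 1953514552        # from "over a total of 1953514552 blocks"
BLOCK_BYTES = 1024         # assumption: the md driver reports 1 KiB blocks
RATE_MB_PER_S = 75         # reported average, taken as 10**6 bytes per second

total_bytes = BLOCKS * BLOCK_BYTES
hours = total_bytes / (RATE_MB_PER_S * 10**6) / 3600
print(f"{total_bytes / 10**12:.2f} TB at {RATE_MB_PER_S} MB/s takes about {hours:.1f} hours")
```

That works out to a 2 TB parity drive checked in a little under seven and a half hours, so the reported time is in line with the reported speed.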

 

Clearly a common issue tho, I'm sure Tom is on the case!

 

PS: Tom, thanks for looking at the issues with the drivers. I'm sure we will crack it once and for all!

Link to comment

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

 

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use this chip. It isn't acceptable to have a product not work properly with a fairly popular network chip.

2. Added power, cost and loss of a PCI-E slot.

 

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (can't remember which beta, unfortunately; b8?), so unless the network adapter has given up on me, I think it is the driver having issues!

 

I do agree with your 2 points.  The question I have though is whether your particular Realtek chipset is faulty, or is a Realtek variant that is not supported well by the current Realtek driver/module, or just needs the firmware patch!  Hoping for success with -rc3...

 

I completely forgot (typical of me!) to mention one other thing in your syslog.  You too had spurious spindowns during the parity check.

May  6 10:29:57 Tower kernel: mdcmd (23): check CORRECT

May  6 10:29:57 Tower kernel: md: recovery thread woken up ...

May  6 10:29:57 Tower kernel: md: recovery thread checking parity...

May  6 10:29:57 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.

...

May  6 10:35:50 Tower kernel: mdcmd (25): spindown 1

May  6 10:35:51 Tower kernel: mdcmd (26): spindown 2

May  6 10:35:52 Tower kernel: mdcmd (27): spindown 3

May  6 10:35:52 Tower kernel: mdcmd (28): spindown 4

May  6 10:35:54 Tower kernel: mdcmd (29): spindown 2

May  6 10:35:57 Tower kernel: mdcmd (30): spindown 3

May  6 10:35:57 Tower kernel: mdcmd (31): spindown 4

 

Didn't seem to affect the end result: 75MB/sec average, 7.5 hours to complete, which is correct, give or take.

 

Clearly a common issue tho, I'm sure Tom is on the case!

 

PS: Tom, thanks for looking at the issues with the drivers. I'm sure we will crack it once and for all!

 

Unless all your drives are full, wouldn't each disk stop reading at a different time, then spin down for inactivity?

 

i.e. disk 1 is 50% full and disk 2 is 75% full; wouldn't it read both disks to parity up to 50%, and then only disk 2 continues?

 

 

Link to comment

Unless all your drives are full, wouldn't each disk stop reading at a different time, then spin down for inactivity?

 

i.e. disk 1 is 50% full and disk 2 is 75% full; wouldn't it read both disks to parity up to 50%, and then only disk 2 continues?

No, parity is calculated on the ENTIRE disk up to its full physical size... it has nothing to do with files, or usage.

 

You don't even need to have a file system to calculate parity.  (you can calculate it on drives not yet formatted)

 

Furthermore, the parity check was started only minutes prior to the repeated spin-downs.  It is a bug.
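
To illustrate why usage does not matter, here is a toy sketch of parity over raw bytes. It assumes plain XOR parity, which is how single-parity schemes are generally described; the thread itself does not spell out the algorithm, and the tiny byte strings merely stand in for whole disks.

```python
def xor_parity(disks):
    """XOR all disks byte by byte; shorter disks count as zeros past their end."""
    parity = bytearray(max(len(disk) for disk in disks))
    for disk in disks:
        for i, byte in enumerate(disk):
            parity[i] ^= byte
    return bytes(parity)

def rebuild(missing_len, surviving, parity):
    """Recover a lost disk by XORing the parity with every surviving disk."""
    return xor_parity(surviving + [parity])[:missing_len]

# Raw contents only: no file system, and "empty" areas are just more bytes.
disk1 = bytes([0x41, 0x42, 0x00, 0x00])               # smaller disk, half unused
disk2 = bytes([0x10, 0x20, 0x30, 0x00, 0x00, 0x00])   # larger disk
parity = xor_parity([disk1, disk2])

print(len(parity))                                    # as long as the largest disk
print(rebuild(len(disk1), [disk2], parity) == disk1)
```

Every byte position of every disk feeds into the parity, used or not, which is why a check has to read each disk to its end instead of stopping where the files stop.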

Link to comment
