Interstellar Posted May 7, 2012

Ok, issues with Realtek incoming!

I took a look at your syslog. It looks to me as though your Realtek performed without issue for quite a while, until May 6 19:05:45, and then began spewing out link up messages. That looks to me as if either the driver or the firmware had crashed, which would probably result in terrible performance! There is no direct evidence about performance speed here, except that each link up message implies lost time, and therefore slowdowns. When you recorded the extremely slow network speed, was that before or after 19:05:45? Try testing again, and see if a freshly booted server is just as slow.

You have an intriguing line in your syslog:

May 6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

Perhaps you or someone else with more time can research this, and determine whether there is a firmware patch for your chipset, and how it would be applied, if that is even possible. Personally, I think you may be better off adding a PCIe network card to your system, and disabling the Realtek.

I noticed 2 other things in your syslog. One was a minor kernel 'crash'; look for the '--[ cut here ]--' 5 minutes after the parity check starts. Seems harmless, but definitely not normal. The other is that you have something added to your system that is spewing out useless and repeated syslog messages once every minute. That's 60 an hour, 1440 a day. Harmless but annoying.

Edit: I should give credit to pantner, I just realized that I basically restated in different words what he had already said!

I only started transferring files once those errors had started. I've disabled all the things I had running (including additions by unmenu), but unmenu itself is still running so I can get at the syslog easily. The network card crashes still happen. This is the behaviour I see:

1. Transfers run fine at 50MB/sec (wired) or 12MB/sec (wireless) until the crash.
2. The error occurs, the rate drops to zero, and from then on it alternates between 0 and max speed (but not max speed for long, so the average transfer rate is <3 MB/sec).

Clearly, not working properly.

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use it. It isn't acceptable to have a product not work properly with a fairly popular network chip.
2. Added power, cost and loss of a PCI-E slot.

The kernel crash is down to the CPU, and I have ordered a replacement. This one tests for days at full load in Windows but Linux doesn't like it. I have never had a parity check error, so the fault is clearly not in the bit of the CPU that matters, but I'm going to ditch it anyway.

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (I can't remember which beta unfortunately, b8?). So unless the network adapter has given up on me, I think it is a driver having issues!
PeterB Posted May 7, 2012 Share Posted May 7, 2012 I don't know whether this provides any clues, but here is a sequence of events on my Ubuntu desktop machine, after I encountered a 'Stale nfs file handle': While I was browsing around my Photos share, Nautilus became unresponsive (but there was no error message). I used telnet to connect to my unRAID server and performed 'ls -l' on both the Photos directory and the 'user' parent: root@Tower:~# ls -l /mnt/user/Photos total 161248 drwxrwx--- 1 nobody users 4360 2011-06-20 01:11 100OLYMP1/ drwxrwx--- 1 nobody users 496 2003-10-26 15:42 100OLYMP2/ drwxrwx--- 1 nobody users 120 2012-01-27 18:00 101029/ drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 110324/ drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 110830/ drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 111004/ drwxr-xr-x 1 nobody users 72 2012-01-27 14:00 111007/ drwxr-xr-x 1 nobody users 336 2011-10-17 20:07 111007YNL/ drwxrwxr-x 1 nobody users 592 2011-10-25 00:29 111022Baking/ drwxrwxr-x 1 nobody users 96 2012-01-27 18:00 111106/ drwxrwxr-x 1 nobody users 136 2012-01-27 14:00 111107_MYF_CrocPark/ drwxrwxr-x 1 nobody users 104 2012-01-27 14:00 111209_Bukidnon/ drwxrwxr-x 1 nobody users 136 2012-01-27 18:00 120108_CDO/ drwxrwxr-x 1 nobody users 48 2012-05-07 18:20 120507/ drwx------ 1 nobody users 720 2012-01-27 14:00 Ai-Ai\ Graduation/ drwxr-xr-x 1 nobody users 2224 2012-01-27 14:00 Import/ drwxr-xr-x 1 nobody users 240 2012-01-27 14:00 Methodist\ Youth\ Surigao/ drwxr-xr-x 1 nobody users 208 2011-06-20 01:11 Ruby\ in\ UK/ -rw-r--r-- 1 nobody users 3072000 2012-01-27 16:06 digikam4.db -rw-r--r-- 1 nobody users 161874944 2012-01-27 16:06 thumbnails-digikam.db root@Tower:~# ls -l /mnt/user total 13024031 drwxr-xr-x 1 nobody users 48 2011-10-17 19:24 111007YNL/ drwxrwx--- 1 nobody users 424 2010-09-08 21:42 Athlon/ drwxr-xr-x 1 nobody users 20352 2011-12-08 08:30 Downloaded\ Files/ -rw-rw---- 1 nobody users 6542697514 2010-02-05 18:20 LoveStory_DVD.mkv drwxrwx--- 1 
nobody users 128 2012-03-25 08:01 Maildir/ drwxrwx--- 1 nobody users 6912 2012-05-06 16:14 Movies/ drwxrwx--- 1 nobody users 384 2012-04-08 18:00 Music/ -rw-rw---- 1 nobody users 2088899096 2010-02-07 01:20 NOTTING\ HILL.mkv -rw-rw---- 1 nobody users 4691562496 2010-06-13 21:05 National\ Treasure\ 2004\ 720p.avi drwxrwxr-x 1 nobody users 504 2011-12-27 08:50 Pete's\ N97/ drwxrwxr-x 1 nobody users 72 2012-05-07 18:20 Photos/ drwxrwx--- 1 nobody users 296 2011-09-14 08:08 Series/ drwxrwx--- 1 nobody users 176 2012-04-08 07:55 Squeeze/ drwxr-xr-x 1 logitechmediaserver users 1392 2012-04-08 16:23 Squeeze-7.7.2/ drwxr-xr-x 1 root root 480 2012-03-30 12:32 Temporary/ drwxrwxr-x 1 nobody users 520 2011-11-12 19:58 UMC/ drwxrwx--- 1 nobody users 600 2010-09-04 20:22 UMC2.07/ drwxrwx--- 1 nobody users 600 2010-10-16 09:13 UMC2.08.1/ drwxrwxrwx 1 nobody users 600 2011-06-07 10:42 UMC2.08.1x/ drwxr-xr-x 1 nobody users 496 2011-09-13 08:41 UMC2.10/ drwxrwxr-x 1 nobody users 584 2012-02-15 22:54 UMC2.11/ drwxrwx--- 1 nobody users 696 2011-06-06 08:34 UMCold/ drwxrwx--- 1 nobody users 504 2012-01-02 09:01 Videos/ drwxrwxr-x 1 nobody users 168 2011-11-14 10:32 Wii/ drwxrwx--- 1 nobody users 576 2011-10-18 13:54 Work/ drwxr-xr-x 1 root root 504 2012-04-08 22:45 XFarG7/ drwxrwx--- 1 nobody users 72 2010-10-29 02:18 ZA30/ drwxrwx--- 1 nobody users 864 2010-10-29 01:21 ZK10/ drwxrwx--- 1 nobody users 120 2010-10-25 15:36 ZVideo/ drwxrwx--- 1 nobody users 184 2010-10-29 02:14 ZVideo2/ drwxrwx--- 1 nobody users 72 2010-10-19 23:43 mediaserver/ drwxrwx--- 1 nobody users 112 2011-02-26 13:12 mp3/ -rw-r--r-- 1 root root 18020 2012-03-30 12:32 nolimetangere.odt -rw-r--r-- 1 root root 342926 2012-03-30 12:30 output.pdf -rw-r--r-- 1 nobody users 30312 2012-01-11 00:35 pro.odt drwxrwx--- 1 nobody users 80 2011-09-15 07:19 series/ root@Tower:~# I then looked at the same directories from Ubuntu: peter@desktop:~$ ls -l /net/tower/mnt/user ls: cannot access /net/tower/mnt/user/Photos: Stale NFS 
file handle total 0 drwxr-xr-x 2 root root 0 May 7 18:19 Movies drwxr-xr-x 2 root root 0 May 7 18:19 Music d??? ? ? ? ? ? Photos drwxr-xr-x 2 root root 0 May 7 18:19 series drwxr-xr-x 2 root root 0 May 7 18:19 Series drwxr-xr-x 2 root root 0 May 7 18:19 UMC drwxr-xr-x 2 root root 0 May 7 18:19 Videos peter@desktop:~$ ls -l /net/tower/mnt/user/Photos ls: cannot access /net/tower/mnt/user/Photos: Stale NFS file handle peter@desktop:~$ sudo umount -f /net/tower/mnt/user/Photos [sudo] password for peter: peter@desktop:~$ ls -l /net/tower/mnt/user total 0 drwxr-xr-x 2 root root 0 May 7 18:19 Movies drwxr-xr-x 2 root root 0 May 7 18:19 Music drwxrwxr-x 1 99 users 72 May 7 18:20 Photos drwxr-xr-x 2 root root 0 May 7 18:19 series drwxr-xr-x 2 root root 0 May 7 18:19 Series drwxr-xr-x 2 root root 0 May 7 18:19 UMC drwxr-xr-x 2 root root 0 May 7 18:19 Videos peter@desktop:~$ ls -l /net/tower/mnt/user total 8 drwxrwx--- 1 99 users 6912 May 6 16:14 Movies drwxrwx--- 1 99 users 384 Apr 8 18:00 Music drwxrwxr-x 1 99 users 72 May 7 18:20 Photos drwxrwx--- 1 99 users 296 Sep 14 2011 series drwxr-xr-x 2 root root 0 May 7 18:19 Series drwxrwxr-x 1 99 users 520 Nov 12 19:58 UMC drwxrwx--- 1 99 users 504 Jan 2 09:01 Videos peter@desktop:~$ I use autofs to mount nfs shares automatically, hence I don't have to issue the mount command. Between the last two 'ls -l /net/tower/mnt/user' I had opened the Photos share in Nautilus - note that ownership of most folders has changed from 'root' to '99'. Here is the line showing details of the 'Photos' share from the output of 'mount' from Ubuntu. tower:/mnt/user/Photos on /net/tower/mnt/user/Photos type nfs (rw,nosuid,nodev,vers=3,hard,intr,nolock,udp,sloppy,addr=10.2.0.100) Quote Link to comment
boof Posted May 7, 2012

I use autofs to mount nfs shares automatically, hence I don't have to issue the mount command

Can you replicate the problem when they're permanently mounted, i.e. outside of autofs? Autofs / automount can introduce a plethora of problems by themselves.
mejutty Posted May 7, 2012

Attached is my syslog. I too have the spindown oddities: drives spun down straight after reboot, and then there are spindowns in the syslog just after starting a parity check.

syslog.zip
Phil C. Posted May 7, 2012 Share Posted May 7, 2012 RC2 installed; no showstoppers. With 5.0b14, I was having an issue that disks would not stay in a spin down condition. Shares were exported via AFP only; disks were exported via SMB only. After RC2 install, I returned to my desired configuration (Shares via AFP, disks via SMB) and the spin down issue has NOT returned as yet. I had thought this was the fault of my Mac computers (NetBios polling SMB drives periodically, causing them to wake), but I guess it was an issue with the previous SMB implementations. Whatever the problem, seems to have been fixed. Still having an issue with spindown under RC2...turned off SMB completely at unRAID server, disks still won't spindown appropriately. Even if I force them to spindown, they spin back up. Curiously, it is only the first drive in every share. For instance, I have two drives in the TV Shows share...only the first one won't spindown. Seems like the issue is with AFP. I run a Mac Mini w/Lion Server in my basement that is 24/7 connected to the drives via AFP. Perhaps there is some kind of periodic refresh query from AFP that goes out to the shares and wakes up the drives? I don't know. Not sure if this is something that can be addressed within unRAID at all, or if I need to alter configurations for my network. It may just be the price I pay (literally, in terms of drive wear and energy usage) for using AFP. AFP is so much easier than SMB - my preference is to stick with it. Should not be a roadblock to a 5.0 final, but something for Limetech to take a look at in a subsequent release. Phil Quote Link to comment
spidi Posted May 7, 2012

May 7 13:13:36 nas kernel: mdcmd (594): spindown 2
May 7 13:13:50 nas kernel: mdcmd (595): spindown 5
May 7 13:13:51 nas kernel: mdcmd (596): spindown 4
May 7 13:13:53 nas kernel: mdcmd (597): spindown 1
May 7 13:27:25 nas kernel: mdcmd (598): spindown 0
May 7 13:28:08 nas kernel: mdcmd (599): spindown 0
May 7 13:28:08 nas kernel: mdcmd (600): spindown 2
May 7 13:28:26 nas kernel: mdcmd (601): spindown 1
May 7 13:28:27 nas kernel: mdcmd (602): spindown 3
May 7 13:28:28 nas kernel: mdcmd (603): spindown 4
May 7 13:28:36 nas kernel: mdcmd (604): spindown 2
May 7 13:29:27 nas kernel: mdcmd (605): spindown 0
May 7 13:29:27 nas kernel: mdcmd (606): spindown 3
May 7 13:30:33 nas kernel: mdcmd (607): spindown 0
May 7 13:30:33 nas kernel: mdcmd (608): spindown 3
May 7 13:30:50 nas kernel: mdcmd (609): spindown 0
May 7 13:30:50 nas kernel: mdcmd (610): spindown 3
May 7 13:31:04 nas kernel: mdcmd (611): spindown 3
May 7 13:31:49 nas kernel: mdcmd (612): spindown 0
May 7 13:31:50 nas kernel: mdcmd (613): spindown 3
May 7 14:13:52 nas kernel: mdcmd (614): spindown 4
May 7 14:13:53 nas kernel: mdcmd (615): spindown 5
May 7 14:13:56 nas kernel: mdcmd (616): spindown 1
May 7 14:28:04 nas kernel: mdcmd (617): spindown 5
May 7 14:28:20 nas kernel: mdcmd (618): spindown 3
May 7 14:31:04 nas kernel: mdcmd (619): spindown 0
May 7 14:31:05 nas kernel: mdcmd (620): spindown 3
May 7 14:42:08 nas kernel: mdcmd (621): spindown 3
May 7 15:13:33 nas kernel: mdcmd (622): spindown 2
May 7 15:13:43 nas kernel: mdcmd (623): spindown 4
May 7 15:13:44 nas kernel: mdcmd (624): spindown 5
May 7 15:13:47 nas kernel: mdcmd (625): spindown 1
May 7 15:27:10 nas kernel: mdcmd (626): spindown 0
May 7 15:28:15 nas kernel: mdcmd (627): spindown 3

My spindown time is set to 30 min, so even if some process is waking up some of the drives, why do they spin down again 15 min after the last spindown?
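For anyone wanting to check their own syslog for the same pattern, here is a minimal Python sketch that measures the time between consecutive spindown commands for each disk. It assumes the line format shown in the post above; the function name `spindown_gaps` and the `year` default are my own choices (syslog timestamps carry no year), not anything unRAID provides. The sample lines are the disk 2 entries from the log above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as: "May 7 13:13:36 nas kernel: mdcmd (594): spindown 2"
SPINDOWN = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"mdcmd \(\d+\): spindown (?P<disk>\d+)$"
)


def spindown_gaps(lines, year=2012):
    """Return {disk: [minutes between consecutive spindown commands]}.

    The year is an assumption, because syslog timestamps do not include one.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        m = SPINDOWN.match(line.strip())
        if not m:
            continue  # not a spindown entry
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        disk = int(m["disk"])
        if disk in last_seen:
            gaps[disk].append((when - last_seen[disk]).total_seconds() / 60)
        last_seen[disk] = when
    return dict(gaps)


sample = [
    "May 7 13:13:36 nas kernel: mdcmd (594): spindown 2",
    "May 7 13:28:08 nas kernel: mdcmd (600): spindown 2",
    "May 7 13:28:36 nas kernel: mdcmd (604): spindown 2",
    "May 7 15:13:33 nas kernel: mdcmd (622): spindown 2",
]

for disk, minutes in spindown_gaps(sample).items():
    print(f"disk {disk}:", [round(x, 1) for x in minutes])
```

Gaps much shorter than the configured spindown delay are the ones worth reporting, since a drive that was really idle for the whole delay should not need a second spindown command so soon.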
Auggie Posted May 7, 2012

Hmm... I think I'm having intermittent NFS issues with RC2 that were not present under RC1. Some folders appear empty after a while, but when I mount the specific disk a suspect folder resides on, it's fully intact and populated.

My syslog shows the following errors after a time with the share volume mounted on an OS X machine (the same NFS errors that plagued me under betas >b12):

May 7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200
May 7 07:27:44 UnRAID rpc.statd[1203]: STAT_FAIL to UnRAID for SM_MON of 10.0.1.200
May 7 07:27:44 UnRAID kernel: lockd: cannot monitor The Matrix
May 7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200
chickensoup Posted May 7, 2012

Hmm... I think I'm having intermittent NFS issues with RC2 that were not present under RC1. Some folders appear empty after a while, but when I mount the specific disk a suspect folder resides on, it's fully intact and populated.

My syslog shows the following errors after a time with the share volume mounted on an OS X machine (the same NFS errors that plagued me under betas >b12):

May 7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200
May 7 07:27:44 UnRAID rpc.statd[1203]: STAT_FAIL to UnRAID for SM_MON of 10.0.1.200
May 7 07:27:44 UnRAID kernel: lockd: cannot monitor The Matrix
May 7 07:27:44 UnRAID rpc.statd[1203]: No canonical hostname found for 10.0.1.200

Probably best to post a syslog and state whether you are running any addons; otherwise Tom isn't likely to get back to you in a hurry.
bonienl Posted May 7, 2012

It appears to be a name resolution problem. Do any of your applications need a hostname instead of an IP address?
ehfortin Posted May 7, 2012

Hi BRiT,

*** Update ***
I just found updated documentation for RC2 at http://lime-technology.com/wiki/index.php?title=Installing_unRAID_5.0_on_a_full_Slackware_Distro. I just have to find a way to get the latest version of Slackware and I'll try this.
****

I've installed 5.0-rc2 on ESXi 5.0 on a board that doesn't support VT-d. As such, I use RDM to pass my disks to the VM. This used to work fine back on 5.0-b2 but, if I remember correctly, I had to recompile the kernel with the VMware paravirtual driver.

Right now, using the LSI SAS driver in ESXi, I'm seeing the drives in unRAID but without any label/path in id-path/id-label. As such, unRAID seems unable to make sure the drives are the same, so I can't assign them. Using the paravirtual driver, I see no drives at all, which is not good.

Following your post on this:

For those looking to install this into a full Slackware distro or do their own custom development, the command to unbzroot is now:

xzcat bzroot | cpio -m -i -d -H newc --no-absolute-filenames

I've looked at the wiki on this subject, and the steps to compile a new kernel use Slackware as the basis. The problems I have now are 1) the Slackware website is not responding at all and 2) the latest and greatest kernel in Slackware is 2.6.xx, which is far from 3.0.30.

Can you help me (and probably others) with how you go about compiling this latest release of unRAID at the moment? Is there a simple recipe to follow that is easier than modifying the procedure on the wiki at each step?

Thank you.

ehfortin
WingmanNZ Posted May 7, 2012

Quick question for all those running rc2: are any of you running an "Atheros AR813X/AR815X" NIC or a "Realtek® 8111F" NIC?

I'm looking at a new unRAID build and need to make sure I get a board with a supported NIC. From this webpage, http://greenleaf-technology-hwandsw.blogspot.co.nz/2011/08/search-for-new-budget-board-continues.html, it looks like 5.0b10 was flawless on the Atheros AR813X/AR815X chip.

The two boards I'm looking at are the Gigabyte GA-Z77MX-D3H and the Asus P8Z77-M PRO.

Thanks, and I look forward to a reply.
BRiT Posted May 7, 2012

To get the kernel version that unRAID 5.0 RC2 uses, get it from the official Linux kernel site, Kernel.org.

wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.30.tar.bz2
Rich Posted May 8, 2012

I'm still having mover issues.

I manually copied everything from my cache drive to the relevant disks and that seemed to stop unRAID hanging when mover runs, but something is still not right... syslog attached.

The last entry in the system log was ten minutes ago and mover is still apparently running, but nothing is actually 'moving' off the cache drive.

sys_log.txt
Rich Posted May 8, 2012

Now I have these two entries...

May 8 01:28:36 unRAID kernel: INFO: rcu_sched_state detected stall on CPU 0 (t=6000 jiffies)
May 8 01:31:37 unRAID kernel: INFO: rcu_sched_state detected stall on CPU 0 (t=24030 jiffies)

and the processor is running at 50% because of a 'system process'. No idea what is happening?
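As a rough aid to reading those two entries, the sketch below estimates how long the stall had lasted at each message. It relies on two assumptions that the log itself does not state: that `t=` counts jiffies since the stall began, and that the kernel tick rate (HZ) is one of the common values. The tick rate is estimated from the two timestamps, which only have one-second resolution, so treat the result as approximate.

```python
from datetime import datetime

# Values copied from the two syslog entries above (the year is an assumption).
first = (datetime(2012, 5, 8, 1, 28, 36), 6000)
second = (datetime(2012, 5, 8, 1, 31, 37), 24030)


def jiffies_per_second(a, b):
    """Estimate the tick rate from two (timestamp, jiffies) samples."""
    elapsed = (b[0] - a[0]).total_seconds()
    return (b[1] - a[1]) / elapsed


rate = jiffies_per_second(first, second)
print(f"about {rate:.1f} jiffies per second")

# Assumption: HZ is whichever common kernel tick rate is closest to the estimate.
hz = min((100, 250, 300, 1000), key=lambda h: abs(h - rate))

for when, jiffies in (first, second):
    print(f"{when:%H:%M:%S}: stalled for roughly {jiffies / hz:.0f} s (assuming HZ={hz})")
```

If the assumptions hold, CPU 0 had been stuck for about a minute at the first message and about four minutes by the second, which would fit a single long stall rather than two separate ones.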
limetech (Author) Posted May 8, 2012

...
You have an intriguing line in your syslog:

May 6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

Chasing this down, here's what I discovered. There are some classes of "firmware" (pre-compiled binaries) that are no longer kept with the rest of the Linux source tree due to "licensing issues". The Realtek f/w falls in this class. So I had to chase down where this is kept now and found it here:

http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=tree

This firmware is incorporated into the upcoming -rc3. This will probably fix some issues, so... please hold off on further Realtek NIC reports until -rc3.
RobJ Posted May 8, 2012

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use it. It isn't acceptable to have a product not work properly with a fairly popular network chip.
2. Added power, cost and loss of a PCI-E slot.

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (I can't remember which beta unfortunately, b8?). So unless the network adapter has given up on me, I think it is a driver having issues!

I do agree with your 2 points. The question I have, though, is whether your particular Realtek chipset is faulty, or is a Realtek variant that is not well supported by the current Realtek driver/module, or just needs the firmware patch! Hoping for success with -rc3...

I completely forgot (typical of me!) to mention one other thing in your syslog. You too had spurious spindowns during the parity check.

May 6 10:29:57 Tower kernel: mdcmd (23): check CORRECT
May 6 10:29:57 Tower kernel: md: recovery thread woken up ...
May 6 10:29:57 Tower kernel: md: recovery thread checking parity...
May 6 10:29:57 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.
...
May 6 10:35:50 Tower kernel: mdcmd (25): spindown 1
May 6 10:35:51 Tower kernel: mdcmd (26): spindown 2
May 6 10:35:52 Tower kernel: mdcmd (27): spindown 3
May 6 10:35:52 Tower kernel: mdcmd (28): spindown 4
May 6 10:35:54 Tower kernel: mdcmd (29): spindown 2
May 6 10:35:57 Tower kernel: mdcmd (30): spindown 3
May 6 10:35:57 Tower kernel: mdcmd (31): spindown 4
WingmanNZ Posted May 8, 2012

...
You have an intriguing line in your syslog:

May 6 10:21:58 Tower kernel: r8169 0000:02:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168d-2.fw (-2)

Chasing this down, here's what I discovered. There are some classes of "firmware" (pre-compiled binaries) that are no longer kept with the rest of the Linux source tree due to "licensing issues". The Realtek f/w falls in this class. So I had to chase down where this is kept now and found it here:

http://git.kernel.org/?p=linux/kernel/git/dwmw2/linux-firmware.git;a=tree

This firmware is incorporated into the upcoming -rc3. This will probably fix some issues, so... please hold off on further Realtek NIC reports until -rc3.

Sorry to hijack this response, but would you know if either of these two NIC chipsets is supported by the current kernel/drivers?

Atheros AR813X/AR815X
Realtek 8111F
ehfortin Posted May 8, 2012

Hi,

To get the kernel version that unRAID 5.0 RC2 uses, get it from the official Linux kernel site, Kernel.org.

wget http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.30.tar.bz2

Thank you. I found it in the documentation as well. What I didn't find, however, is how to recreate a bzroot from what has been compiled and tested in Slackware (I'm using Salix 13.37, as the Slackware site still doesn't answer...). Do you have this documented somewhere?

Thanks.

ehfortin
RobJ Posted May 8, 2012 Share Posted May 8, 2012 I have disabled all the addons, because I was seeing errors while running Parity sync, and once it does that it don't give options to change anything, it just stops, and you don't know until F5 is hit. It was stopping somewhere after 6%, after removing all addons, it went to atleast 20%. What can I do to correct this problem? I am thinking it may have something to do with Plex, as I see a lot of info coming up on disc 5. (That is where I HAD Plex. ) I have long sense gotten rid of Plex, but I do not seem to have the correct permissions to delete some 2,000+ files and the master folder. In addition to pantner's comments, here's what I noticed concerning your parity drive: (this is about 2.5 hours after the parity build began) May 6 22:34:18 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 1922:find reserved error, why? May 6 22:34:18 The_Matrix kernel: sas: sas_ata_task_done: SAS error 2 May 6 22:34:18 The_Matrix kernel: sas: Enter sas_scsi_recover_host May 6 22:34:18 The_Matrix kernel: ata12: sas eh calling libata cmd error handler May 6 22:34:21 The_Matrix kernel: ata12: sas eh calling libata port error handler May 6 22:34:21 The_Matrix kernel: usb 2-1.1: USB disconnect, device number 98 May 6 22:34:22 The_Matrix kernel: usb 2-1.1: new low speed USB device number 99 using ehci_hcd May 6 22:34:22 The_Matrix kernel: generic-usb 0003:0764:0501.01D6: hiddev0,hidraw2: USB HID v1.10 Device [CPS CP550HG] on usb-0000:00:1d.0-1.1/input0 May 6 22:34:23 The_Matrix kernel: ata12.00: qc timeout (cmd 0x2f) May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f7700000 task=f742be00 slot=f77115d8 slot_idx=x0 May 6 22:34:23 The_Matrix kernel: ata12: failed to read log page 10h (errno=-5) May 6 22:34:23 The_Matrix kernel: ata12.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6 frozen May 6 22:34:23 The_Matrix kernel: ata12.00: failed command: WRITE FPDMA QUEUED May 6 22:34:23 The_Matrix kernel: 
ata12.00: cmd 61/00:00:50:8d:58/02:00:46:00:00/40 tag 0 ncq 262144 out May 6 22:34:23 The_Matrix kernel: res 01/04:00:50:8b:58/00:02:46:00:00/40 Emask 0x3 (HSM violation) May 6 22:34:23 The_Matrix kernel: ata12.00: status: { ERR } May 6 22:34:23 The_Matrix kernel: ata12.00: error: { ABRT } May 6 22:34:23 The_Matrix kernel: ata12: hard resetting link May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x89800. May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1001001 May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy0 Unplug Notice May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800. May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1011081 May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[0] May 6 22:34:23 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy0 is gone May 6 22:34:25 The_Matrix kernel: mvsas 0000:02:00.0: Phy0 : No sig fis ... May 6 22:34:25 The_Matrix kernel: sas: sas_form_port: phy0 belongs to port0 already(1)! May 6 22:34:25 The_Matrix kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 May 6 22:34:25 The_Matrix kernel: sas: sas_ata_hard_reset: Found ATA device. May 6 22:34:25 The_Matrix kernel: sas: sas_ata_task_done: SAS error 2 May 6 22:34:25 The_Matrix kernel: sas: sas_ata_task_done: SAS error 2 May 6 22:34:25 The_Matrix kernel: ata12.00: both IDENTIFYs aborted, assuming NODEV May 6 22:34:25 The_Matrix kernel: ata12.00: revalidation failed (errno=-2) May 6 22:34:30 The_Matrix kernel: ata12: hard resetting link Notice the part in blue. I only added the color, not the words. Highly unusual! Sounds like it is very confused! I cannot offer any suggestions or explanation. 
It does not appear to have ever been able to reestablish communications with your parity drive, which results in all the 'disk0 write errors', so you can ignore any further errors concerning disk 0 or ata12. My only suggestion would be to try connecting your parity drive to a different controller. Quote Link to comment
BRiT Posted May 8, 2012

Thank you. I found it in the documentation as well. What I didn't find, however, is how to recreate a bzroot from what has been compiled and tested in Slackware (I'm using Salix 13.37, as the Slackware site still doesn't answer...). Do you have this documented somewhere?

It's been talked about in the past in the forums and documented in the Wiki. Here's a starting point:

http://lime-technology.com/wiki/index.php?title=Building_a_custom_kernel

One particular item to note: replace zcat with xzcat if using the newer compression format.
Interstellar Posted May 8, 2012

As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix:

1. There are many Realtek-based motherboards on the compatibility list, and many motherboards on sale today still use it. It isn't acceptable to have a product not work properly with a fairly popular network chip.
2. Added power, cost and loss of a PCI-E slot.

Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (I can't remember which beta unfortunately, b8?). So unless the network adapter has given up on me, I think it is a driver having issues!

I do agree with your 2 points. The question I have, though, is whether your particular Realtek chipset is faulty, or is a Realtek variant that is not well supported by the current Realtek driver/module, or just needs the firmware patch! Hoping for success with -rc3...

I completely forgot (typical of me!) to mention one other thing in your syslog. You too had spurious spindowns during the parity check.

May 6 10:29:57 Tower kernel: mdcmd (23): check CORRECT
May 6 10:29:57 Tower kernel: md: recovery thread woken up ...
May 6 10:29:57 Tower kernel: md: recovery thread checking parity...
May 6 10:29:57 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.
...
May 6 10:35:50 Tower kernel: mdcmd (25): spindown 1
May 6 10:35:51 Tower kernel: mdcmd (26): spindown 2
May 6 10:35:52 Tower kernel: mdcmd (27): spindown 3
May 6 10:35:52 Tower kernel: mdcmd (28): spindown 4
May 6 10:35:54 Tower kernel: mdcmd (29): spindown 2
May 6 10:35:57 Tower kernel: mdcmd (30): spindown 3
May 6 10:35:57 Tower kernel: mdcmd (31): spindown 4

It didn't seem to affect the end result: 75MB/sec average and 7.5 hours to complete, which is about right. Clearly a common issue though; I'm sure Tom is on the case!

PS: Tom, thanks for looking at the issues with the drivers. I'm sure we will crack it once and for all!
Rich Posted May 8, 2012

Just ran mover again, after sab had downloaded overnight (the only thing on my cache drive), and the system hangs again. Syslog attached.

Any help would be appreciated, as I'm having to 'move' everything myself at the moment.

system_log.txt
generalz Posted May 8, 2012 Share Posted May 8, 2012 As for dropping the card and getting a NIC, there are two reasons why this shouldn't be the fix: 1. There are many Realktek based network mobs on the compatibility list and many motherboards on sale today still use it. It isn't acceptable to have a product not work properly with a fairly popular network chip. 2. Added power, cost and loss of a PCI-E slot. Just to confirm, with previous betas I have managed to get 110MB/sec *constant* writes direct to the cache drive via AFP (can't remember which unfortunately, b8?, so unless the network adapter has given up on me I think it is a driver having issues!) I do agree with your 2 points. The question I have though is whether your particular Realtek chipset is faulty, or is a Realtek variant that is not supported well by the current Realtek driver/module, or just needs the firmware patch! Hoping for success with -rc3... I completely forgot (typical of me!) to mention one other thing in your syslog. You too had spurious spindowns during the parity check. May 6 10:29:57 Tower kernel: mdcmd (23): check CORRECT May 6 10:29:57 Tower kernel: md: recovery thread woken up ... May 6 10:29:57 Tower kernel: md: recovery thread checking parity... May 6 10:29:57 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks. ... May 6 10:35:50 Tower kernel: mdcmd (25): spindown 1 May 6 10:35:51 Tower kernel: mdcmd (26): spindown 2 May 6 10:35:52 Tower kernel: mdcmd (27): spindown 3 May 6 10:35:52 Tower kernel: mdcmd (28): spindown 4 May 6 10:35:54 Tower kernel: mdcmd (29): spindown 2 May 6 10:35:57 Tower kernel: mdcmd (30): spindown 3 May 6 10:35:57 Tower kernel: mdcmd (31): spindown 4 Didn't seem to effect the end result, 75MB/sec average, 7.5 hours to complete, which is give or take correct. Clearly a common issue tho, I'm sure Tom is on the case! PS: Tom, thanks for looking at the issues with the drivers. I'm sure we will crack it once and for all! 
Unless all your drives are full, wouldn't each disk stop reading at a different time and then spin down for inactivity? I.e., if disk 1 is 50% full and disk 2 is 75% full, wouldn't it read both disks to parity up to 50%, after which only disk 2 continues?
Joe L. Posted May 8, 2012

Unless all your drives are full, wouldn't each disk stop reading at a different time and then spin down for inactivity? I.e., if disk 1 is 50% full and disk 2 is 75% full, wouldn't it read both disks to parity up to 50%, after which only disk 2 continues?

No, parity is calculated on the ENTIRE disk up to its full physical size... it has nothing to do with files or usage. You don't even need to have a file system to calculate parity (you can calculate it on drives not yet formatted).

Furthermore, the parity check was started only minutes prior to the repeated spin-downs. It is a bug.
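To illustrate why file usage is irrelevant, here is a toy Python sketch of single parity computed as a byte-wise XOR over raw disk contents. It is an illustration of the general technique, not unRAID's actual driver code; the function names are invented, and treating a shorter disk as zero-filled up to the size of the largest one is my assumption for the example.

```python
from functools import reduce
from itertools import zip_longest
from operator import xor


def xor_parity(disks):
    """Byte-wise XOR across raw disk images.

    Every byte position of every disk takes part, whether or not a file
    system or any file occupies it. Shorter disks are treated as if they
    were padded with zeros (an assumption made for this sketch).
    """
    return bytes(reduce(xor, column) for column in zip_longest(*disks, fillvalue=0))


def rebuild(surviving_disks, parity):
    """Recover one missing disk from the surviving disks plus parity."""
    return xor_parity(list(surviving_disks) + [parity])


# Toy "disks": raw bytes with no file system at all.
disk1 = bytes([0x00, 0xFF, 0x12, 0x34])
disk2 = bytes([0x00, 0x00, 0x00, 0x00])  # blank, never formatted
disk3 = bytes([0xA5, 0x5A])              # a smaller disk

parity = xor_parity([disk1, disk2, disk3])
print(rebuild([disk2, disk3], parity) == disk1)
```

Because the calculation works on byte positions rather than files, a check has to read every disk from start to end, so no data disk should become idle part-way through just because it holds less data.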
generalz Posted May 8, 2012

Ahh, so this has been a bug for a while then; I thought it was normal.