August 11, 2012
Both SMB and emhttp are still running, so the unresponsiveness is not due to the processes getting killed. Two ideas:
(1) Since everything else is also still running, I am wondering if there might be some kind of performance issue going on. Samba has spawned a couple of processes; are you doing something on the server at this moment? Parity check? Preclear? Watching a movie? What plugins do you have running? Possibly a plugin that is intensively scanning your shares? Plex? iTunes?
(2) The syslog shows that ata19 is having some trouble. Can you get a SMART report on that drive and post it, please? It's a WD30EZRX, a 3TB WD Green, so probably not that old. Did you recently add it?
August 11, 2012 (Author)
Thank you for taking the time to look it over. I think I was ripping a movie to the server when I ran the test, and it's used as a media streaming server for my TVs. I also stream my music off it. There's a decent chance my girlfriend was playing music through a Sonos while I ran the test. I haven't added any drives since I got it; they were all new. How can I find out which drive ata19 is? I can turn the server off via the power button if need be, but as it is still working in all respects other than browser access, that seems a bit needless.
August 12, 2012
Well, to make sure it isn't a performance issue, I would advise that the moment it happens again you stop everything you are doing on the server and see if the web interface becomes available again. So stop ripping (or wait until it is completed) and then stop all other activity (stop the movie and get your girlfriend a magazine ;-) If the webserver responds again, then it is a performance issue.
With respect to the drive, I noted the specific drive signature in my previous post; does that help? Alternatively, just run a short SMART test on all your drives (it cannot hurt and is always a good thing to do now and then), and check whether anything is marked "FAILING NOW". Specifically, check the values of the following attributes:
Current_Pending_Sector
Offline_Uncorrectable
If your drive has values other than 0 on either of those, it is worthwhile to check them out a bit further.
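
The attribute check described above can be scripted. This is a minimal sketch, not a definitive tool: the function name `smart_flag_nonzero` is made up for illustration, and it assumes the usual `smartctl -A` table layout, where the attribute name is the 2nd column, WHEN_FAILED the 9th, and RAW_VALUE the 10th.

```shell
# Hypothetical helper: reads `smartctl -A` style output on stdin and prints
# any attribute flagged FAILING_NOW, plus the two attributes named in the
# post when their raw value is not 0.
smart_flag_nonzero() {
    awk '
        # Column 9 is WHEN_FAILED in the assumed layout.
        $9 == "FAILING_NOW" { print $2 "=FAILING_NOW" }
        # Column 10 is RAW_VALUE in the assumed layout.
        ($2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable") && $10 + 0 != 0 {
            print $2 "=" $10
        }
    '
}
```

It would be used as `smartctl -A /dev/sdX | smart_flag_nonzero`, where `/dev/sdX` is a placeholder for each drive in turn; no output means neither condition was met. A short self-test is started beforehand with `smartctl -t short /dev/sdX`.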
August 12, 2012 (Author)
The web interface is unresponsive even with no attempts to access the server. Here is another screenshot of the output of the requested command. It repeats itself a few times each minute from about 9am until now; the attached picture is only a segment of that.
August 12, 2012
OK, if we assume that is correct, then you can rule out performance issues. That leaves the disk situation: have you run the SMART tests?
August 12, 2012 (Author)
OK, I've attached the SMART reports. I've taken a quick look, but they don't mean a lot to me yet. Thank you for taking the time to look.
Attachments: parity_smart.txt, disk_1_smart.txt, disk_2_smart.txt, disk_3_smart.txt
August 12, 2012 (Author)
I should point out these tests were done after a hard reboot and the server was responding correctly during the tests.
Attachments: disk_5_smart.txt, disk_6_smart.txt, disk_7_smart.txt, disk_8_smart.txt
August 12, 2012 (Author)
More.
Attachments: disk_9_smart.txt, disk_10_smart.txt, disk_11_smart.txt, disk_12_smart.txt
August 12, 2012
Parity drive: Fine!
Disk 1: Fine!
Disk 2: Fine!
Disk 3: Fine!
August 12, 2012
Disk 5: Fine!
Disk 6: Fine!
Disk 7: Fine!
Disk 8: One sector on this drive has been reallocated. That is not dramatic; it is part of the way a drive works. If that number increases over time, it is a sign the drive is deteriorating. This has not been the cause of the continuing issues on your system.
August 12, 2012
Disk 9: Fine!
Disk 10: Fine!
Disk 11: One UDMA CRC error. Nothing to worry about and not an issue; just like before, keep an eye on it to see that the number does not change. This does not point to an error on the drive but to some electrical or cable error. No need for any action if the number does not increase.
Disk 12: Three UDMA CRC errors, see above.
August 12, 2012
Basically I see nothing wrong with the drives. That also means I am kind of stumped:
- Your web interface becomes unresponsive;
- It also happens when your server is not doing anything;
- The drives are fine.
I had one additional idea, but I cannot get it to work. At the moment the webserver is unresponsive, you could telnet into your system and then give the following command in the console:
telnet <ip of unraid server> 80
You are then in dialogue with your webserver, and entering GET should get you the HTML code. Unfortunately it does not seem to work on my system (and my web interface is working). Does anyone have any idea?
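
One possible reason the bare GET gets no answer: an HTTP/1.0 request is a full request line (method, path, version) followed by an empty line, with each line ending in CR LF. The sketch below only builds such a request. The function name `http_request` is made up for illustration, and whether emhttp answers it while the web interface is hung is an assumption that has not been tested here.

```shell
# Hypothetical helper: print a minimal HTTP/1.0 request for the given
# path (default "/"). The request line is terminated by CR LF and followed
# by an empty line, which marks the end of the request headers.
http_request() {
    printf 'GET %s HTTP/1.0\r\n\r\n' "${1:-/}"
}
```

If a netcat binary is available (an assumption), the request could be sent with `http_request / | nc <ip of unraid server> 80`. In an interactive telnet session, the equivalent is typing `GET / HTTP/1.0` and pressing Enter twice.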
August 12, 2012
Quote: "The web interface is unresponsive even with no attempts to access the server. Here is another print screen of the requested command. It repeats itself a few times each minute from about 9am until now, the attached pic is only a segment of that."
This may be an issue. Why are there so many smbd processes? What add-ons are running?
August 12, 2012
The screenshot shows what appear to be hundreds of smbd processes. Typically, there are only a few. That is your issue. (Of course, I do not know the solution, but at least you now have a very distinctive symptom.)
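
Counting the smbd processes gives a number to track instead of a screenshot. A minimal sketch, assuming a `ps` that supports the POSIX `-e -o comm=` options; the function name `count_smbd` is made up for illustration.

```shell
# Hypothetical helper: count lines on stdin whose first field is exactly
# "smbd" (one command name per line, as printed by `ps -e -o comm=`).
count_smbd() {
    awk '$1 == "smbd" { n++ } END { print n + 0 }'
}
```

Used as `ps -e -o comm= | count_smbd`. Running it every few minutes during a parity check would show whether the count climbs steadily before the web interface stops responding.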
August 13, 2012
Quote: "The screen shot shows apparently hundreds of smbd processes. Typically, there are only a few. That is your issue.... (Of course, I do not know the solution, but at least you now have a symptom that is very unique)"
How do you have the share security configured? Are you using a domain controller? What does smbstatus show? This seems to be similar to your issue: http://samba.2283325.n4.nabble.com/Re-hundreds-of-smb-D-processes-td2435734.html
August 13, 2012 (Author)
That doesn't sound good. Would a command like that work for unRAID as well? I have attached a screenshot of the command output. What's odd is that it seems to happen whenever I run a parity check and not at any other time. My server is used to stream media only: a Sonos music library, plus TV shows and movies streamed to 2 Dune media players with the yaDIS jukebox. That's about it! The music library updates once a night, and yaDIS only updates the movie jukebox when I tell it to, usually every couple of days.
August 13, 2012
Quote: "That doesn't sound good. Would a command like that work for unraid as well?"
If you've not added any add-ons and not created an smb-extra.conf file in your config directory, you can try their possible solution easily. Apparently something on your LAN is creating Samba connections but not closing them. Eventually, it probably ends up with all the resources involved on your server being allocated, and things stop working.
Type:
echo "deadtime = 60" >>/boot/config/smb-extra.conf
That will create the smb-extra.conf file if it does not exist, and append the line if it does. You only need to do this once, as the file will remain when the array is rebooted. Then restart Samba. You can do that by typing:
/boot/samba restart
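
Because `>>` appends every time it is run, repeating the echo command leaves duplicate lines in smb-extra.conf. A minimal sketch of an append that is safe to repeat follows; the function name `ensure_line` is made up for illustration. For reference, Samba's `deadtime` is measured in minutes, and it only disconnects idle connections that have no open files.

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already present. Creates the file if it does not exist.
ensure_line() {
    file=$1
    line=$2
    # grep: -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Used as `ensure_line /boot/config/smb-extra.conf "deadtime = 60"`, followed by a Samba restart or a reboot so the setting is picked up.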
August 13, 2012 (Author)
Thank you for the advice, I'll keep this thread updated. I've entered the first command, but the second didn't work, so I rebooted the server as I thought that would do the same thing.
August 13, 2012 (Author)
The tower is unresponsive through the web interface again. It only happens when the server has been running the parity check for 20 minutes or so.
August 14, 2012 (Author)
As I can't run a parity check successfully, I have to assume I have no parity backup. Unless we can fix this quickly, I'm going to have to move away from unRAID, which isn't something I'd like to do.
August 14, 2012
I saw this in your syslog posted a while back:
Aug 10 08:35:17 Tower kernel: sas: command 0xf0517f00, task 0xf072e000, timed out: BLK_EH_NOT_HANDLED
Aug 10 08:35:17 Tower kernel: sas: Enter sas_scsi_recover_host
Aug 10 08:35:17 Tower kernel: sas: trying to find task 0xf072e000
Aug 10 08:35:17 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf072e000
Aug 10 08:35:17 Tower kernel: sas: sas_scsi_find_task: querying task 0xf072e000
Aug 10 08:35:17 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5
Aug 10 08:35:17 Tower kernel: sas: sas_scsi_find_task: task 0xf072e000 failed to abort
Aug 10 08:35:17 Tower kernel: sas: task 0xf072e000 is not at LU: I_T recover
Aug 10 08:35:17 Tower kernel: sas: I_T nexus reset for dev 0700000000000000
Aug 10 08:35:18 Tower kernel: sas: sas_form_port: phy7 belongs to port6 already(1)!
Aug 10 08:35:19 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[6]:rc= 0
Aug 10 08:35:19 Tower kernel: sas: I_T 0700000000000000 recovered
Aug 10 08:35:19 Tower kernel: sas: sas_ata_task_done: SAS error 8d
Aug 10 08:35:19 Tower kernel: ata15: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata16: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata17: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata18: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata19: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata20: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata21: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: sas: sas_ata_task_done: SAS error 2
Aug 10 08:35:19 Tower kernel: ata21: failed to read log page 10h (errno=-5)
Aug 10 08:35:19 Tower kernel: ata21.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6 t0
Aug 10 08:35:19 Tower kernel: ata21.00: failed command: READ FPDMA QUEUED
Aug 10 08:35:19 Tower kernel: ata21.00: cmd 60/00:00:38:0b:56/02:00:07:00:00/40 tag 0 ncq 262144 in
Aug 10 08:35:19 Tower kernel: res 01/04:04:38:09:56/00:00:07:00:00/40 Emask 0x3 (HSM violation)
Aug 10 08:35:19 Tower kernel: ata21.00: status: { ERR }
Aug 10 08:35:19 Tower kernel: ata21.00: error: { ABRT }
Aug 10 08:35:19 Tower kernel: ata21: hard resetting link
Aug 10 08:35:20 Tower kernel: sas: sas_form_port: phy7 belongs to port6 already(1)!
Aug 10 08:35:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[6]:rc= 0
Aug 10 08:35:22 Tower kernel: sas: sas_ata_hard_reset: Found ATA device.
Aug 10 08:35:22 Tower kernel: ata21.00: configured for UDMA/133
Aug 10 08:35:22 Tower kernel: ata21: EH complete
Aug 10 08:35:22 Tower kernel: sas: --- Exit sas_scsi_recover_host
What controller are these devices on (ata16-21)? What version of unRAID are you using presently?
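
In the excerpt above, every port from ata15 to ata21 enters error handling, but only ata21.00 reports a failed command. To see whether the same device keeps failing across a whole log, the failed-command lines can be tallied per device. A minimal sketch: the function name `ata_failures` is made up for illustration, and the `/var/log/syslog` path in the usage line is an assumption about where the log lives.

```shell
# Hypothetical helper: read kernel log lines on stdin and print each ATA
# device that logged "failed command:", with the number of occurrences.
ata_failures() {
    awk '
        /ata[0-9]+\.[0-9]+: failed command:/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                    dev = $i
                    sub(/:$/, "", dev)   # drop the trailing colon
                    count[dev]++
                }
            }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}
```

Used as `ata_failures < /var/log/syslog`. One device dominating the tally points at that drive or its cable; failures spread across all ports of one card point more towards the controller or its driver.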
August 14, 2012 (Author)
Hi, I am using version 5 RC5 and I have 2 SATA controller cards in my system. They are Supermicro AOC-SASLP-MV8 8-port SAS/SATA cards, found here: http://www.scan.co.uk/products/supermicro-aoc-saslp-mv8-8-port-sas-sata-300mb-s-pci-e-controller-card