Red Dot

August 11, 2012

Both SMB and emhttp are still running, which means the unresponsiveness is not due to those processes getting killed.

Two ideas:

(1)

Since everything else is also still running, I am wondering if there might be some kind of performance issue going on. Samba has spawned a couple of processes; are you doing something on the server at this moment?

Parity check? Preclear? Watching a movie? What plugins do you have running? Possibly a plugin that is intensively scanning your shares? Plex? iTunes?

(2)

The syslog shows that ata19 is having some trouble. Can you get a SMART report on that drive and post it, please? It's a WD30EZRX, a 3TB WD Green, so probably not that old; did you add it recently?

August 11, 2012

Author

Thank you for taking the time to look it over.

I think I was ripping a movie to the server when I ran the test, and it's used as a media streaming server for my TVs. I also stream my music off it. There's a decent chance my girlfriend was playing music through a Sonos while I ran the test.

I haven't added any drives since I got it; they were all new. How can I find out which drive ata19 is? I can turn it off with the power button if need be, but as the server is still working in all respects other than browser access, that seems a bit needless.
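A minimal sketch for the "which drive is ata19?" question, assuming the kernel wrote libata's usual identification line for that port at boot (of the form `ata19.00: ATA-8: <model>, <firmware>, ...`) and that the log is at `/var/log/syslog`. `ata_model` is a hypothetical helper name. It only recovers the model, so with several identical drives installed you would still need the serial numbers (visible in the `/dev/disk/by-id/` names) to tell them apart.

```shell
#!/bin/sh
# ata_model PORT LOGFILE
# Print the model/firmware string libata logged when it identified the
# drive on PORT (e.g. ata19). Hypothetical helper; assumes the standard
# "ataN.00: ATA-x: <model>, <firmware>, ..." boot message is present.
ata_model() {
  grep -E " $1\.00: ATA-[0-9]+: " "$2" | tail -n 1 | sed 's/.*ATA-[0-9]*: //'
}

# Example:
#   ata_model ata19 /var/log/syslog
#   ls -l /dev/disk/by-id/ | grep -i WD30EZRX   # which /dev/sdX has that model
```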

August 12, 2012

Well... to make sure it isn't a performance issue, I would advise that the next time it happens you stop all activity on the server and see if the web interface becomes available again. So stop ripping (or wait until it has completed) and then stop all other activity (stop the movie and get your girlfriend a magazine ;-)

If the web server responds again, then it is a performance issue.

With respect to the drive, I noted the specific drive model in my previous post; does that help? Alternatively, just run a short SMART test on all your drives (it cannot hurt, and it is a good thing to do now and then) and check whether anything is marked "FAILING_NOW". Specifically, check the values of the following attributes:

Current_Pending_Sector

Offline_Uncorrectable

If a drive has a value other than 0 for either of those, it is worth looking into further.
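As a sketch of that check, assuming smartmontools' `smartctl` is available on the server and the drives appear as `/dev/sd?`, the loop in the example runs `smartctl -A` on each drive and prints the two attributes only when their raw value is not 0. `check_pending` is a hypothetical helper, and it assumes the usual `smartctl -A` table layout in which RAW_VALUE is the 10th column. The short self-test itself is started with `smartctl -t short /dev/sdX`, and its result can be read afterwards with `smartctl -l selftest /dev/sdX`.

```shell
#!/bin/sh
# check_pending: read `smartctl -A` output on stdin and print
# Current_Pending_Sector / Offline_Uncorrectable when the raw value
# (10th column of the attribute table) is not 0.
# Exit status is 1 if anything was printed, 0 otherwise.
check_pending() {
  awk '($2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable") && $10 != 0 {
         print $2 "=" $10; bad = 1
       }
       END { exit (bad ? 1 : 0) }'
}

# Example (run as root on the server):
#   for dev in /dev/sd?; do
#     echo "== $dev"
#     smartctl -A "$dev" | check_pending
#   done
```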

August 12, 2012

Author

The web interface is unresponsive even with no attempts to access the server.

Here is another screenshot of the requested command. It repeats a few times each minute from about 9am until now; the attached picture is only a segment of that.

August 12, 2012

OK, if we assume that is correct, then you can rule out performance issues. That leaves the disks: have you run the SMART tests?

August 12, 2012

Author

OK, I've attached the SMART reports. I've taken a quick look, but they don't mean a lot to me yet.

Thank you for taking the time to look.

August 12, 2012

Author

I should point out these tests were done after a hard reboot and the server was responding correctly during the tests.

August 12, 2012

Author

more.

August 12, 2012

Author

Done!

disk_13_smart.txt

disk_14_smart.txt

disk_15_smart.txt

August 12, 2012

Parity drive: Fine!

Disk 1: Fine!

Disk 2: Fine!

Disk 3: Fine!

August 12, 2012

Disk5: Fine!

Disk6: Fine!

Disk7: Fine!

Disk8: One sector on this drive has been reallocated. That is not dramatic; it is part of the way a drive works. If that number increases over time, it is a sign that the drive is deteriorating. This is not what is causing the ongoing issues on your system.

August 12, 2012

Disk9: Fine!

Disk10: Fine!

Disk11: One UDMA CRC error. Nothing to worry about and, just like before, not an issue; just keep an eye on it to make sure it does not change. This does not point to a fault in the drive but to some electrical or cable problem. No action is needed if the number does not increase.

Disk12: Three UDMA CRC errors; see above.
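"Keep an eye on it" can be made concrete by saving a `smartctl -A` report per drive now and comparing it with a later one. A minimal sketch, assuming the same `smartctl -A` column layout (RAW_VALUE in the 10th column); the function name, the list of watched attributes and the file names in the example are illustrative choices, not something unRAID provides.

```shell
#!/bin/sh
# Attributes whose raw value should stay constant on a healthy setup.
WATCH='^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$'

# smart_delta OLD NEW: compare two saved `smartctl -A` reports and print
# each watched attribute whose raw value (10th column) went up.
smart_delta() {
  awk -v watch="$WATCH" '
    $2 !~ watch { next }                    # ignore other attributes
    FNR == NR   { old[$2] = $10; next }     # first file: remember old values
    ($2 in old) && $10 + 0 > old[$2] + 0 {  # second file: report increases
      print $2 ": " old[$2] " -> " $10
    }
  ' "$1" "$2"
}

# Example:
#   smartctl -A /dev/sdX > /boot/smart_sdX_old.txt
#   ... some weeks later ...
#   smartctl -A /dev/sdX > /tmp/smart_sdX_new.txt
#   smart_delta /boot/smart_sdX_old.txt /tmp/smart_sdX_new.txt
```

Saving the baseline on the flash drive keeps it across reboots; an empty result means none of the watched counters has grown.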

August 12, 2012

Disk13: Fine!

Disk14: Fine!

Disk15: Fine!

August 12, 2012

Basically, I see nothing wrong with the drives.

That also means I am kind of stumped...

- Your web interface becomes unresponsive;

- It also happens when your server is not doing anything;

- The drives are fine.

I had one additional idea but I cannot get it to work:

At the moment the web server is unresponsive, you could telnet into your system and then give the following command in the console:

telnet <ip of unraid server> 80

Then you are in dialogue with your web server, and entering GET should get you the HTML code. Unfortunately, it does not seem to work on my system (and my web interface is working).

Does anyone have any ideas?
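One likely reason the manual test fails is that a bare `GET` is not a complete HTTP request: an HTTP/1.0 server expects a request line such as `GET / HTTP/1.0` followed by an empty line before it answers. The sketch below sends such a request using bash's `/dev/tcp` redirection and gives up after 10 seconds, so a hung web server shows up as "no response" instead of a frozen console. The function name and the 10-second limit are illustrative assumptions, and it needs `bash` and the coreutils `timeout` command on the machine where it runs.

```shell
#!/bin/sh
# probe_web HOST [PORT]
# Send a minimal HTTP/1.0 request and print the server's status line
# (e.g. "HTTP/1.1 200 OK"). Prints "no response ..." and returns 1 if
# the connection fails or nothing arrives within 10 seconds.
probe_web() {
  host=$1
  port=${2:-80}
  reply=$(timeout 10 bash -c '
    exec 3<>"/dev/tcp/$1/$2" || exit 1
    printf "GET / HTTP/1.0\r\nHost: %s\r\n\r\n" "$1" >&3
    head -n 1 <&3
  ' probe "$host" "$port" 2>/dev/null)
  if [ -z "$reply" ]; then
    echo "no response from $host:$port"
    return 1
  fi
  printf '%s\n' "$reply" | tr -d '\r'
}

# Example, run on the server itself while the web interface is hung:
#   probe_web localhost
```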

August 12, 2012

This may be an issue. Why are there so many smbd processes? What add-ons are running?

August 12, 2012

Author

I haven't installed any add-ons on the server.

August 12, 2012

The screenshot apparently shows hundreds of smbd processes. Typically, there are only a few.

That is your issue... (Of course, I do not know the solution, but at least you now have a very distinctive symptom.)
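To confirm the symptom and see when it starts (for example, whether the count climbs during a parity check), the number of smbd processes can be logged once a minute. A minimal sketch, assuming a procps `ps` that supports `-e -o comm=`; the function name and the log path under /tmp are hypothetical choices.

```shell
#!/bin/sh
# count_smbd: print how many processes named exactly "smbd" are running
# (always prints a number, 0 if there are none).
count_smbd() {
  ps -e -o comm= | awk '$0 == "smbd" { n++ } END { print n + 0 }'
}

# Example: append a timestamped count every 60 seconds (Ctrl-C to stop).
#   while :; do
#     echo "$(date '+%Y-%m-%d %H:%M:%S') $(count_smbd)"
#     sleep 60
#   done >> /tmp/smbd_count.log
```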

August 13, 2012

How do you have the share security configured? Are you using a domain controller?

What does

smbstatus

show?

This seems to be similar to your issue:

http://samba.2283325.n4.nabble.com/Re-hundreds-of-smb-D-processes-td2435734.html
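To see which machine is holding all those connections, the process list from `smbstatus -p` can be grouped by client address. This is a sketch that assumes each process line starts with the PID and ends with the client address in parentheses, as in Samba 3.x output; other Samba versions may lay the columns out differently, so compare against your own `smbstatus -p` output before relying on it. `smb_clients` is a hypothetical helper.

```shell
#!/bin/sh
# smb_clients: read `smbstatus -p` output on stdin and print
# "<number of smbd processes> <client address>", busiest client first.
smb_clients() {
  awk '$1 ~ /^[0-9]+$/ && $NF ~ /^\(.*\)$/ {
         addr = substr($NF, 2, length($NF) - 2)   # strip the parentheses
         n[addr]++
       }
       END { for (a in n) print n[a], a }' | sort -rn
}

# Example:
#   smbstatus -p | smb_clients
```

A single address with a very large count would point at the device on the LAN that keeps opening connections without closing them.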

August 13, 2012

Author

That doesn't sound good. Would a command like that work for unRAID as well?

I have attached a screenshot of the command.

What's odd is that it seems to happen whenever I run a parity check, and not at any other time.

My server is used only to stream media: a Sonos music library, plus TV shows and movies streamed to 2 Dune media players with the yaDIS jukebox. That's about it! The music library updates once a night, and yaDIS only updates the movie jukebox when I tell it to, usually every couple of days.

August 13, 2012

If you've not added any add-ons and have not created an smb-extra.conf file in your config directory, you can try their suggested solution easily. Apparently something on your LAN is opening Samba connections but not closing them. Eventually that probably ends up with all the resources involved on your server being allocated, so things stop working.

Type:

echo "deadtime = 60" >>/boot/config/smb-extra.conf

That will create the smb-extra.conf file if it does not exist, and append the line if it does.

You only need to do this once, as the file will remain when the server is rebooted.

Then restart Samba. You can do that by typing

/boot/samba restart
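Running the `echo ... >>` line a second time would add a duplicate `deadtime` line. Below is a minimal sketch of an idempotent version (the helper name is hypothetical) that only appends the setting when an identical line is not already in the file. For reference, Samba's `deadtime` is given in minutes and only drops a connection when it has no open files, so `deadtime = 60` closes sessions that have been idle for an hour.

```shell
#!/bin/sh
# add_smb_option FILE LINE
# Append LINE to FILE unless FILE already contains exactly that line.
# Creates FILE if it does not exist.
add_smb_option() {
  touch "$1" || return 1
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example:
#   add_smb_option /boot/config/smb-extra.conf 'deadtime = 60'
```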

August 13, 2012

Author

Thank you for the advice; I'll keep this thread updated. I've entered the first command, but the second didn't work, so I rebooted the server as I thought that would do the same thing.

August 13, 2012

Author

The tower is unresponsive through the web interface again.

It only happens when the server has been running the parity check for 20 minutes or so.

August 14, 2012

Author

As I can't run a parity check successfully, I have to assume I have no parity backup. Unless we can fix this quickly, I'm going to have to move away from unRAID, which isn't something I'd like to do.

August 14, 2012

I saw this in the syslog you posted a while back:

Aug 10 08:35:17 Tower kernel: sas: command 0xf0517f00, task 0xf072e000, timed out: BLK_EH_NOT_HANDLED
Aug 10 08:35:17 Tower kernel: sas: Enter sas_scsi_recover_host
Aug 10 08:35:17 Tower kernel: sas: trying to find task 0xf072e000
Aug 10 08:35:17 Tower kernel: sas: sas_scsi_find_task: aborting task 0xf072e000
Aug 10 08:35:17 Tower kernel: sas: sas_scsi_find_task: querying task 0xf072e000
Aug 10 08:35:17 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5
Aug 10 08:35:17 Tower kernel: sas: sas_scsi_find_task: task 0xf072e000 failed to abort
Aug 10 08:35:17 Tower kernel: sas: task 0xf072e000 is not at LU: I_T recover
Aug 10 08:35:17 Tower kernel: sas: I_T nexus reset for dev 0700000000000000
Aug 10 08:35:18 Tower kernel: sas: sas_form_port: phy7 belongs to port6 already(1)!
Aug 10 08:35:19 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[6]:rc= 0
Aug 10 08:35:19 Tower kernel: sas: I_T 0700000000000000 recovered
Aug 10 08:35:19 Tower kernel: sas: sas_ata_task_done: SAS error 8d
Aug 10 08:35:19 Tower kernel: ata15: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata16: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata17: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata18: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata19: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata20: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: ata21: sas eh calling libata port error handler
Aug 10 08:35:19 Tower kernel: sas: sas_ata_task_done: SAS error 2
Aug 10 08:35:19 Tower kernel: ata21: failed to read log page 10h (errno=-5)
Aug 10 08:35:19 Tower kernel: ata21.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6 t0
Aug 10 08:35:19 Tower kernel: ata21.00: failed command: READ FPDMA QUEUED
Aug 10 08:35:19 Tower kernel: ata21.00: cmd 60/00:00:38:0b:56/02:00:07:00:00/40 tag 0 ncq 262144 in
Aug 10 08:35:19 Tower kernel:          res 01/04:04:38:09:56/00:00:07:00:00/40 Emask 0x3 (HSM violation)
Aug 10 08:35:19 Tower kernel: ata21.00: status: { ERR }
Aug 10 08:35:19 Tower kernel: ata21.00: error: { ABRT }
Aug 10 08:35:19 Tower kernel: ata21: hard resetting link
Aug 10 08:35:20 Tower kernel: sas: sas_form_port: phy7 belongs to port6 already(1)!
Aug 10 08:35:22 Tower kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[6]:rc= 0
Aug 10 08:35:22 Tower kernel: sas: sas_ata_hard_reset: Found ATA device.
Aug 10 08:35:22 Tower kernel: ata21.00: configured for UDMA/133
Aug 10 08:35:22 Tower kernel: ata21: EH complete
Aug 10 08:35:22 Tower kernel: sas: --- Exit sas_scsi_recover_host

What controller are these devices on (ata16-21)?

What version of unRAID are you using at present?
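To check whether errors like these cluster on particular ports (and therefore possibly on one controller), the libata error lines in the syslog can be tallied per ata port. A rough sketch with a hypothetical function name; it only counts the `exception`, `failed command` and `hard resetting link` messages of the kind shown in the excerpt above, so other error messages are not included.

```shell
#!/bin/sh
# ata_errors: read syslog text on stdin and print "<ataN> <count>" for
# every port that logged an exception, failed command or link reset.
ata_errors() {
  grep -oE 'ata[0-9]+(\.[0-9]+)?: (exception|failed command|hard resetting link)' |
    awk '{ port = $1; sub(/[.:].*/, "", port); n[port]++ }
         END { for (p in n) print p, n[p] }' | sort
}

# Example:
#   ata_errors < /var/log/syslog
```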

August 14, 2012

Author

Hi

I am using version 5 RC5, and I have 2 SATA controller cards in my system:

Supermicro AOC-SASLP-MV8, 8-port SAS/SATA card, found here:

http://www.scan.co.uk/products/supermicro-aoc-saslp-mv8-8-port-sas-sata-300mb-s-pci-e-controller-card
