April 1, 2018

Shortly after my monthly non-correcting parity check started this morning, Disk 5 dropped offline. Investigating, I see that I have a second disk problem: Disk 4 has file system errors, and there is also a batch of nginx error messages in my syslog. I have also seen the nginx error messages on another server recently, since upgrading to 6.5.0. I ran extended SMART tests on all this server's drives only a few days ago and they were all fine.

My plan is to abort the parity check, shut down and check Disk 5's cables. If it comes back online I'll run an extended SMART test on it while checking Disk 4's file system.

The thing about Disk 5 is that it's just got an empty file system at the moment. I use the fill-up strategy: Disks 1 to 3 are full, Disk 4 is filling and Disk 5 is the next one. So I have the choice of rebuilding Disk 5 (either onto a new disk if it has failed or onto itself if it is OK) or doing a new config without Disk 5 and adding it later. I think I'll go for rebuilding, even though it's an empty file system, because with dual parity my other disks will remain protected.

Longer term, I will replace the old SASLP controller with a Dell H310 and add more disks. There are currently four array disks (of which Disks 4 and 5 are two) and two unassigned disks (which only see occasional use) attached to it.

lapulapu-diagnostics-20180401-1140.zip
April 1, 2018

14 minutes ago, John_M said: "I will replace the old SASLP"

Yep, looks to me like the typical SASLP issue, i.e., dropping disks for no reason, assuming SMART for disk5 looks OK since it's not on the diags.
April 1, 2018 (Author)

2 minutes ago, johnnie.black said: "Yep, looks to me like the typical SASLP issue, i.e., dropping disks for no reason, assuming SMART for disk5 looks OK since it's not on the diags."

Like you, I have a small collection of these things and I just can't throw them away. It's a great shame because AFAIK they are the only 8-port controllers that work (well, are supposed to work!) out of the box. I take it they work OK with Windows?

Disk 5 came back after a reboot and SMART looks fine. Now running an extended test. Disk 4 file system is repaired. The damage was minor but I think I'll replace that HBA sooner rather than later.
April 1, 2018

3 minutes ago, John_M said: "I take it they work OK with Windows?"

Probably, though I never had reason to try them. I did try them on FreeNAS, but neither the SASLP nor the SAS2LP works there: no driver support.
April 4, 2018 (Author)

I replaced my SASLP-MV8 with a Dell H310 today and took the opportunity to replace the nasty stiff red SATA cables that connect disks to the motherboard ports with nicer flexible ones. I reset the disk tunables to the defaults. I'm not sure whether these are optimal for the new HBA but I should think they are more appropriate than the custom ones I had for the SASLP.
April 5, 2018

You can use these, better than default:

Tunable (md_num_stripes): 4096
Tunable (md_sync_window): 2048
Tunable (md_sync_thresh): 2000
April 5, 2018 (Author)

6 hours ago, johnnie.black said: "You can use these, better than default: Tunable (md_num_stripes): 4096, Tunable (md_sync_window): 2048, Tunable (md_sync_thresh): 2000"

Excellent, thank you. I know you've given this information before and I tried to search for it but hadn't been able to find it.
April 5, 2018

10 hours ago, johnnie.black said: "You can use these, better than default: Tunable (md_num_stripes): 4096, Tunable (md_sync_window): 2048, Tunable (md_sync_thresh): 2000"

Is there a summary somewhere of how these parameter values depend on different hardware?
April 5, 2018

13 minutes ago, pwm said: "Is there a summary somewhere of how these parameter values depend on different hardware?"

There are a couple of threads where Tom explains what they do.

The original tunables: https://lime-technology.com/forums/topic/4473-unraid-server-release-45-beta8-available/?page=2&tab=comments#comment-41691

And the thresh tunable, added later: https://lime-technology.com/forums/topic/42369-unraid-server-release-614-available/?do=findComment&comment=416597

As to those values, I use them because in my testing they work well with almost any hardware I tested (except SASLP/SAS2LP) and much better than the default values, especially for larger arrays. They will use a little more RAM, but nothing of consequence for anyone with 4GB+.
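To put a rough number on the "little more RAM" remark, here is a back-of-the-envelope sketch in Python. It assumes each stripe buffers one 4 KiB page per array device, which is an assumption on my part and not something stated in this thread. The function name and the device count of 8 are hypothetical, chosen only for illustration.

```python
PAGE_BYTES = 4096  # assumption: one 4 KiB page per device per stripe


def stripe_memory_mib(md_num_stripes: int, num_devices: int) -> float:
    """Rough stripe-buffer estimate in MiB (an assumption, not a documented formula)."""
    return md_num_stripes * PAGE_BYTES * num_devices / 2**20


# md_num_stripes value suggested in this thread, with a hypothetical 8-device array.
print(f"{stripe_memory_mib(4096, 8):.0f} MiB")
```

If that assumption holds, the suggested setting costs on the order of a hundred MiB for a small array, which would fit the "nothing of consequence with 4GB+" comment.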
April 5, 2018 (Author)

The new HBA and tunables have taken more than two hours off my parity check time (5 TB). Nice!

SASLP: 2018-03-01, 16:15:57   11 hr, 15 min, 56 sec   123.3 MB/s   OK   0
H310:  2018-04-05, 23:40:52    9 hr, 11 min, 49 sec   151.0 MB/s   OK   0
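The two history entries can be cross-checked with a little arithmetic. This is a minimal sketch, assuming "5 TB" means 5 × 10^12 bytes and "MB/s" means 10^6 bytes per second; the variable and function names are my own.

```python
from datetime import timedelta

# Durations copied from the parity history entries in the post above.
saslp = timedelta(hours=11, minutes=15, seconds=56)
h310 = timedelta(hours=9, minutes=11, seconds=49)

# Assumption: "5 TB" is 5 * 10**12 bytes, and MB/s uses 10**6 bytes.
PARITY_BYTES = 5 * 10**12


def avg_speed_mb_s(duration: timedelta) -> float:
    """Average speed in MB/s for one full pass over the parity disk."""
    return PARITY_BYTES / duration.total_seconds() / 10**6


saved = saslp - h310
print("time saved:", saved)
print(f"SASLP: {avg_speed_mb_s(saslp):.1f} MB/s")
print(f"H310:  {avg_speed_mb_s(h310):.1f} MB/s")
```

The difference comes out at a little over two hours, and the computed averages agree with the speeds reported in the history.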
April 6, 2018 (Author)

However... I'm still getting huge volumes of nginx-related errors like these:

Quote
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [crit] 7027#7027: ngx_slab_alloc() failed: no memory
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: shpool alloc failed
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: nchan: Out of shared memory while allocating message of size 5147. Increase nchan_max_reserved_memory.
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: *30243 nchan: error publishing message (HTTP status code 500), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: MEMSTORE:00: can't create shared message for channel /disks

and updating to 6.5.1-rc3 hasn't got rid of them.

lapulapu-diagnostics-20180406-0134.zip
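For anyone wanting to gauge how many of these messages a syslog contains, a small tally script can help. This is only a sketch: the regular expression is my guess at the line layout based on the excerpt above, the function name is hypothetical, and the sample holds four of the quoted lines.

```python
import re
from collections import Counter

# Four lines copied from the syslog excerpt quoted in the post above.
SAMPLE = """\
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [crit] 7027#7027: ngx_slab_alloc() failed: no memory
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: shpool alloc failed
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: nchan: Out of shared memory while allocating message of size 5147. Increase nchan_max_reserved_memory.
Apr 5 16:47:53 Lapulapu nginx: 2018/04/05 16:47:53 [error] 7027#7027: MEMSTORE:00: can't create shared message for channel /disks
"""

# Assumed layout: "nginx: <date> <time> [level] <pid>#<tid>: <message>"
NGINX_LINE = re.compile(r"nginx: \S+ \S+ \[(\w+)\] \d+#\d+: ")


def tally_nginx(lines):
    """Count nginx syslog entries by severity level."""
    levels = Counter()
    for line in lines:
        match = NGINX_LINE.search(line)
        if match:
            levels[match.group(1)] += 1
    return levels


counts = tally_nginx(SAMPLE.splitlines())
print(dict(counts))
```

To run it against a real log, pass an open file object instead of the sample, for example the syslog extracted from a diagnostics zip.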