Posts posted by JuliusZet

  1. Here is a temporary solution until limetech has implemented this feature:

    10 minutes ago, JuliusZet said:

    Hey everyone!

    I have found an easy way to set both a custom port and a password for VNC at the same time:

     

    1. Shut down the VM.
    2. Edit the VM and switch to XML View.
    3. Search for the <graphics type='vnc' ...> element.
    4. Set the autoport attribute to 'no' and change the attributes port and websocket to any unused ports.
      The line should now look something like this:
          <graphics type='vnc' port='5900' autoport='no' websocket='5700' listen='0.0.0.0' keymap='de'>
    5. Append the attribute named "passwd".
      The line should now look something like this:
          <graphics type='vnc' port='5900' autoport='no' websocket='5700' listen='0.0.0.0' keymap='de' passwd='Cd8B8fmCehbdaFLfCsfZyTL6'>
    6. Click the Update button at the bottom of the page to apply the changes.
    7. Start the VM.

    You might notice that the "passwd" attribute and value are no longer visible after you click the Update button.

    But looking at the XML file itself from the console reveals that they are actually there.

    (You can type "cat /etc/libvirt/qemu/your-vm-name.xml" into the console to verify that.)

    I assume that the unRAID Web GUI hides the "passwd" attribute and value for security reasons.

     

    This is, however, not an ideal solution, because every time you want to change a setting you would have to do this again.

    @limetech, it would be great if you could add an option to change the VNC port of a VM from within the Form View. This would fix the issue.

     

  2. Hey everyone!

    I have found an easy way to set both a custom port and a password for VNC at the same time:

     

    1. Shut down the VM.
    2. Edit the VM and switch to XML View.
    3. Search for the <graphics type='vnc' ...> element.
    4. Set the autoport attribute to 'no' and change the attributes port and websocket to any unused ports.
      The line should now look something like this:
          <graphics type='vnc' port='5900' autoport='no' websocket='5700' listen='0.0.0.0' keymap='de'>
    5. Append the attribute named "passwd".
      The line should now look something like this:
          <graphics type='vnc' port='5900' autoport='no' websocket='5700' listen='0.0.0.0' keymap='de' passwd='Cd8B8fmCehbdaFLfCsfZyTL6'>
    6. Click the Update button at the bottom of the page to apply the changes.
    7. Start the VM.

    You might notice that the "passwd" attribute and value are no longer visible after you click the Update button.

    But looking at the XML file itself from the console reveals that they are actually there.

    (You can type "cat /etc/libvirt/qemu/your-vm-name.xml" into the console to verify that.)
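    That check can also be scripted. A minimal sketch, with the sample `<graphics>` line from step 5 inlined so it runs anywhere; on the server itself you would point grep at /etc/libvirt/qemu/your-vm-name.xml instead:

```shell
# Pull the passwd attribute back out of the libvirt definition to confirm it
# survived the GUI update. The sample line mirrors step 5 above; on unRAID:
#   grep -o "passwd='[^']*'" /etc/libvirt/qemu/your-vm-name.xml
line="<graphics type='vnc' port='5900' autoport='no' websocket='5700' listen='0.0.0.0' keymap='de' passwd='Cd8B8fmCehbdaFLfCsfZyTL6'>"
echo "$line" | grep -o "passwd='[^']*'"
# prints: passwd='Cd8B8fmCehbdaFLfCsfZyTL6'
```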

    I assume that the unRAID Web GUI hides the "passwd" attribute and value for security reasons.

     

    This is, however, not an ideal solution, because every time you want to change a setting you would have to do this again.

    @limetech, it would be great if you could add an option to change the VNC port of a VM from within the Form View. This would fix the issue.

  3. 22 hours ago, ken-ji said:

    I can only guess here, but are all the VM interfaces on the same physical network?

    Yes, everything is on the same network. This is my network topology:

     

    979202288_NetworkTopology.thumb.png.d8a3c67c244506ad9b9171b74d06025a.png

     

    About my future plans to work around this issue:

     

    I bought myself a new router that can handle multiple networks. (Believe it or not, my current router cannot do this ...) I want every interface to have its own network. I will set everything up next weekend. Let's see if this resolves the issue.

     

    Thank you very much for your assistance, by the way! Much appreciated! :) Have a great day!

  4. 13 hours ago, ken-ji said:

    How are your interfaces configured? I have only two ports with a similar configuration - only one management IP on br0 and no other IPs on br1 - and my managed switch has never seen the wrong MAC address on the other interface.

    Thanks for your reply!

     

    I have attached my diagnostics zip file, which includes information about my interface configuration.

     

    unraid-server-diagnostics-20190221-0853.zip

     

    I assume the issue is related to a bug in unRAID or in the driver for the network cards.

  5. I have a similar issue, probably related to the same bug in unRAID. Have a look at the following trace captured from my router:

     

    Wireshark_jaljnEDqNt.thumb.png.a51ce5163537a94dc7a9b58064ec5687.png

     

    I have one management interface for unRAID itself (192.168.10.10) and 9 interfaces to be used by VMs (no IP addresses).

     

    However, every time my router sends out an ARP request to find the MAC address of the unRAID management interface, every interface responds. Only one of these responses contains the right MAC address, so at the moment I can only reach my unRAID web GUI and network shares if I'm lucky ...

     

    I have not been able to find out yet why this happens. Maybe someone else has a clue?
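    For what it's worth, answering ARP for any local IP on every interface is stock Linux behavior ("ARP flux"), not necessarily an unRAID bug. A sketch of the standard sysctl mitigation, assuming unRAID exposes these knobs (it is plain Linux underneath) - whether unRAID persists them across reboots is an assumption:

```shell
# ARP flux mitigation (standard Linux sysctls, run as root):
sysctl -w net.ipv4.conf.all.arp_ignore=1    # reply only if the target IP lives on the receiving interface
sysctl -w net.ipv4.conf.all.arp_announce=2  # prefer the best local source address when sending ARP
```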

  6. Hi there,

     

    I am having issues with my HP ProLiant DL380p Gen8 server. For more details you can quickly scan over this thread:

    Although I more or less solved the issue in the end by changing some Tunables, at least to the point that parity checks work now, some issues persist. For example:

     

    - High CPU usage (~25 % system) when the PCIe lanes are under load (transferring data through the HBA, downloading files through the network card)

    - Very high CPU usage (>40 % at idle) when starting VMs with many CPU cores passed through from different CPUs.

     

    My server basically works, but I want to solve this issue. My old server (HP ProLiant DL180 G6) did not have it.

     

    I thought maybe someone here also runs a ProLiant server and has / had the same issue?

     

    I will provide further details if needed.

  7. 1 hour ago, johnnie.black said:

    Not likely, but it won't hurt to try, use these:

     

    Tunable (md_num_stripes): 4096
    Tunable (md_sync_window): 2048
    Tunable (md_sync_thresh): 2000

     

    Hi johnnie.black,

     

    I just figured it out!

     

    What I just did: I applied the Tunables from your previous post and started a Parity Check. The NMIs started immediately. I canceled the Parity Check and started it again just to double-check ... Yep ... disk speeds are bad and CPU usage is high right from the start of the Parity Check.

    Then I reset the Tunable values to their defaults, started the Parity Check again, and ... it happened like before: at first everything looks normal (disk speeds as expected from 4 HDDs and CPU usage ~2 %), but after a few seconds the NMIs start.

     

    So I thought: Hmm ... when I increase the Tunables, it gets worse. What if I decreased them instead? Taking the default values and halving them seemed like a good start:

     

    Tunable (nr_requests): 64

    Tunable (md_num_stripes): 640
    Tunable (md_sync_window): 192
    Tunable (md_sync_thresh): 96
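    The values above are just the defaults divided by two; as a quick sanity check (the defaults nr_requests=128, md_num_stripes=1280, md_sync_window=384, md_sync_thresh=192 are assumed here, inferred from the halved values in this post):

```shell
# Halve each assumed default and print the resulting Tunable value.
for pair in nr_requests=128 md_num_stripes=1280 md_sync_window=384 md_sync_thresh=192; do
  echo "${pair%%=*}: $(( ${pair##*=} / 2 ))"
done
# prints: nr_requests: 64, md_num_stripes: 640, md_sync_window: 192, md_sync_thresh: 96
```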

     

    Wow! I did not think that this would work! Look at the system stats graphs:

    Screenshot_1.thumb.png.2f0da8065c2ac1bc47a6fc20c536db3a.png

     

    The only thing I need to figure out now is what these strange "spikes" are. The syslog shows no NMIs, no errors, nothing! Don't get me wrong, I'm really glad that it is finally working! But these spikes are not normal.

     

    I think it would help if I had some up-to-date information about the Tunables: what exactly do they stand for, and what do they do? I'm sure this was already discussed somewhere on the forum, but I can't find it. Could you link a post where it is explained? That would be great!

     

    Thank you very much!

  8. @johnnie.black Do you think this issue could be related to the Tunables? I did not change them because I do not understand what they do, and the wiki and forum posts are outdated (http://lime-technology.com/wiki/Improving_unRAID_Performance#User_Tunables and

    ). So mine are all set to their default values. Are there Tunables I could change to try and fix this issue? I am sorry if I am bothering you, but I really want to fix this issue and I don't know how to get started or who else I could talk to.

  9. When I put some real load on the server, like starting many VMs simultaneously, I see this in my syslog:

     

    Jul 26 21:09:36 unRAID-Server kernel: perf: interrupt took too long (4229 > 2500), lowering kernel.perf_event_max_sample_rate to 47000
    Jul 26 21:09:37 unRAID-Server kernel: perf: interrupt took too long (6320 > 5286), lowering kernel.perf_event_max_sample_rate to 31000
    Jul 26 21:09:46 unRAID-Server kernel: perf: interrupt took too long (8617 > 7900), lowering kernel.perf_event_max_sample_rate to 23000
    Jul 26 21:09:51 unRAID-Server kernel: perf: interrupt took too long (12258 > 10771), lowering kernel.perf_event_max_sample_rate to 16000
    Jul 26 21:09:56 unRAID-Server kernel: perf: interrupt took too long (16051 > 15322), lowering kernel.perf_event_max_sample_rate to 12000
    Jul 26 21:10:08 unRAID-Server kernel: perf: interrupt took too long (21657 > 20063), lowering kernel.perf_event_max_sample_rate to 9000
    Jul 26 21:10:25 unRAID-Server kernel: perf: interrupt took too long (27495 > 27071), lowering kernel.perf_event_max_sample_rate to 7000
    Jul 26 21:11:06 unRAID-Server kernel: perf: interrupt took too long (35995 > 34368), lowering kernel.perf_event_max_sample_rate to 5000
    Jul 26 21:12:45 unRAID-Server kernel: perf: interrupt took too long (46427 > 44993), lowering kernel.perf_event_max_sample_rate to 4000
    Jul 26 21:15:29 unRAID-Server kernel: perf: interrupt took too long (58952 > 58033), lowering kernel.perf_event_max_sample_rate to 3000
    Jul 26 21:19:25 unRAID-Server kernel: perf: interrupt took too long (76236 > 73690), lowering kernel.perf_event_max_sample_rate to 2000

     

    Maybe that has something to do with my NMIs?
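    For context, these messages are the kernel throttling its own perf sampling because interrupt handling took too long, not errors in themselves. The last field of each line is the new value of kernel.perf_event_max_sample_rate; a sketch for pulling it out (sample line inlined here; on the server you would grep the actual syslog):

```shell
# Each "perf: interrupt took too long" line ends with the lowered sample-rate limit.
log="Jul 26 21:19:25 unRAID-Server kernel: perf: interrupt took too long (76236 > 73690), lowering kernel.perf_event_max_sample_rate to 2000"
echo "$log" | awk '{print $NF}'
# prints: 2000
```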

  10. 30 minutes ago, johnnie.black said:

    It's likely the iLO Event log, I wouldn't expect NMIs to be logged, but there could be some other hardware issue logged.

    Nope, there is nothing unusual there.

     

    1 hour ago, pwm said:

    Have you tried to move the card to another slot? If maybe you get an interrupt collision where the wrong driver gets activated and starts looking at hardware not even involved in the disk copy operation.

     

    Edit: And do you have hardware on the motherboard that you don't need and can turn off in the BIOS - audio? serial ports? Additional SATA controller? ...

    What I did in the meantime:

    - BIOS Reset

    - Deactivated all unnecessary devices (on-board SATA Controller + on-board RAID Controller)

    - Re-created the unRAID-USB-flash device

     

    But that didn't seem to help at all.

     

    Screenshot_1.thumb.png.7d6ec152358daed592917e80d2e09f8b.png

     

    An excerpt from my syslog during the parity check (started at 19:41:20):

    Jul 26 19:41:20 unRAID-Server emhttpd: req (2): startState=STARTED&file=&cmdCheck=Check&optionCorrect=correct&csrf_token=****************
    Jul 26 19:41:20 unRAID-Server kernel: mdcmd (40): check correct
    Jul 26 19:41:20 unRAID-Server kernel: md: recovery thread: check P ...
    Jul 26 19:41:20 unRAID-Server kernel: md: using 1536k window, over a total of 1953514552 blocks.
    Jul 26 19:42:12 unRAID-Server sSMTP[5210]: Creating SSL connection to host
    Jul 26 19:42:12 unRAID-Server sSMTP[5210]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
    Jul 26 19:42:14 unRAID-Server sSMTP[5210]: Sent mail for [email protected] (221 2.0.0 fwd26.t-online.de closing. / Closing.) uid=0 username=root outbytes=760
    Jul 26 19:42:46 unRAID-Server kernel: INFO: rcu_sched self-detected stall on CPU
    Jul 26 19:42:46 unRAID-Server kernel: 	21-...: (59999 ticks this GP) idle=1ee/140000000000001/0 softirq=1792/1792 fqs=13455 
    Jul 26 19:42:46 unRAID-Server kernel: 	 (t=60001 jiffies g=6532 c=6531 q=24126)
    Jul 26 19:42:46 unRAID-Server kernel: NMI backtrace for cpu 21
    Jul 26 19:42:46 unRAID-Server kernel: CPU: 21 PID: 3954 Comm: unraidd Not tainted 4.14.49-unRAID #1
    Jul 26 19:42:46 unRAID-Server kernel: Hardware name: HP ProLiant DL380p Gen8, BIOS P70 05/21/2018
    Jul 26 19:42:46 unRAID-Server kernel: Call Trace:
    Jul 26 19:42:46 unRAID-Server kernel: <IRQ>
    Jul 26 19:42:46 unRAID-Server kernel: dump_stack+0x5d/0x79
    Jul 26 19:42:46 unRAID-Server kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
    Jul 26 19:42:46 unRAID-Server kernel: nmi_cpu_backtrace+0x9b/0xba
    Jul 26 19:42:46 unRAID-Server kernel: ? irq_force_complete_move+0xf3/0xf3
    Jul 26 19:42:46 unRAID-Server kernel: nmi_trigger_cpumask_backtrace+0x56/0xd4
    Jul 26 19:42:46 unRAID-Server kernel: rcu_dump_cpu_stacks+0x8e/0xb8
    Jul 26 19:42:46 unRAID-Server kernel: rcu_check_callbacks+0x212/0x5f0
    Jul 26 19:42:46 unRAID-Server kernel: update_process_times+0x23/0x45
    Jul 26 19:42:46 unRAID-Server kernel: tick_sched_timer+0x33/0x61
    Jul 26 19:42:46 unRAID-Server kernel: __hrtimer_run_queues+0x78/0xc1
    Jul 26 19:42:46 unRAID-Server kernel: hrtimer_interrupt+0x87/0x157
    Jul 26 19:42:46 unRAID-Server kernel: smp_apic_timer_interrupt+0x75/0x85
    Jul 26 19:42:46 unRAID-Server kernel: apic_timer_interrupt+0x7d/0x90
    Jul 26 19:42:46 unRAID-Server kernel: </IRQ>
    Jul 26 19:42:46 unRAID-Server kernel: RIP: 0010:memcmp+0x2/0x1d
    Jul 26 19:42:46 unRAID-Server kernel: RSP: 0018:ffffc9000727bcd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
    Jul 26 19:42:46 unRAID-Server kernel: RAX: 0000000000000000 RBX: ffff88080578bd20 RCX: 0000000000000409
    Jul 26 19:42:46 unRAID-Server kernel: RDX: 0000000000000ff8 RSI: ffff8808057cc008 RDI: ffff8808057cc000
    Jul 26 19:42:46 unRAID-Server kernel: RBP: 0000000000000258 R08: 0000000000000000 R09: ffff8808057cc000
    Jul 26 19:42:46 unRAID-Server kernel: R10: ffff8808057cb000 R11: ffff8808057ca000 R12: ffff88081a045800
    Jul 26 19:42:46 unRAID-Server kernel: R13: 0000000000000001 R14: 0000000000000003 R15: ffff8808057cc000
    Jul 26 19:42:46 unRAID-Server kernel: check_parity+0x14f/0x30b [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: handle_stripe+0xefc/0x1293 [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: unraidd+0xb8/0x111 [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? md_open+0x2c/0x2c [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? md_thread+0xbc/0xcc [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? handle_stripe+0x1293/0x1293 [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: md_thread+0xbc/0xcc [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? wait_woken+0x68/0x68
    Jul 26 19:42:46 unRAID-Server kernel: kthread+0x111/0x119
    Jul 26 19:42:46 unRAID-Server kernel: ? kthread_create_on_node+0x3a/0x3a
    Jul 26 19:42:46 unRAID-Server kernel: ret_from_fork+0x35/0x40
    Jul 26 19:42:46 unRAID-Server kernel: 	21-...: (59999 ticks this GP) idle=1ee/140000000000001/0 softirq=1792/1792 fqs=13456 
    Jul 26 19:42:46 unRAID-Server kernel: 	(detected by 26, t=60005 jiffies, g=6532, c=6531, q=24126)
    Jul 26 19:42:46 unRAID-Server kernel: Sending NMI from CPU 26 to CPUs 21:
    Jul 26 19:42:46 unRAID-Server kernel: NMI backtrace for cpu 21
    Jul 26 19:42:46 unRAID-Server kernel: CPU: 21 PID: 3954 Comm: unraidd Not tainted 4.14.49-unRAID #1
    Jul 26 19:42:46 unRAID-Server kernel: Hardware name: HP ProLiant DL380p Gen8, BIOS P70 05/21/2018
    Jul 26 19:42:46 unRAID-Server kernel: task: ffff88081ad53600 task.stack: ffffc90007278000
    Jul 26 19:42:46 unRAID-Server kernel: RIP: 0010:memcmp+0x2/0x1d
    Jul 26 19:42:46 unRAID-Server kernel: RSP: 0018:ffffc9000727bcd0 EFLAGS: 00000246
    Jul 26 19:42:46 unRAID-Server kernel: RAX: 0000000000000000 RBX: ffff88080578bd20 RCX: 0000000000000fba
    Jul 26 19:42:46 unRAID-Server kernel: RDX: 0000000000000ff8 RSI: ffff8808057cc008 RDI: ffff8808057cc000
    Jul 26 19:42:46 unRAID-Server kernel: RBP: 0000000000000258 R08: 0000000000000000 R09: ffff8808057cc000
    Jul 26 19:42:46 unRAID-Server kernel: R10: ffff8808057cb000 R11: ffff8808057ca000 R12: ffff88081a045800
    Jul 26 19:42:46 unRAID-Server kernel: R13: 0000000000000001 R14: 0000000000000003 R15: ffff8808057cc000
    Jul 26 19:42:46 unRAID-Server kernel: FS:  0000000000000000(0000) GS:ffff88081f8c0000(0000) knlGS:0000000000000000
    Jul 26 19:42:46 unRAID-Server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jul 26 19:42:46 unRAID-Server kernel: CR2: 0000151a721d77a0 CR3: 0000000001c0a001 CR4: 00000000001606e0
    Jul 26 19:42:46 unRAID-Server kernel: Call Trace:
    Jul 26 19:42:46 unRAID-Server kernel: check_parity+0x14f/0x30b [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: handle_stripe+0xefc/0x1293 [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: unraidd+0xb8/0x111 [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? md_open+0x2c/0x2c [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? md_thread+0xbc/0xcc [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? handle_stripe+0x1293/0x1293 [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: md_thread+0xbc/0xcc [md_mod]
    Jul 26 19:42:46 unRAID-Server kernel: ? wait_woken+0x68/0x68
    Jul 26 19:42:46 unRAID-Server kernel: kthread+0x111/0x119
    Jul 26 19:42:46 unRAID-Server kernel: ? kthread_create_on_node+0x3a/0x3a
    Jul 26 19:42:46 unRAID-Server kernel: ret_from_fork+0x35/0x40
    Jul 26 19:42:46 unRAID-Server kernel: Code: 48 63 c1 4c 39 c0 73 19 49 8b 3c c1 48 85 ff 74 10 4c 89 d6 e8 71 ff ff ff 84 c0 75 09 ff c1 eb df b9 ea ff ff ff 89 c8 c3 31 c9 <48> 39 d1 74 13 0f b6 04 0f 44 0f b6 04 0e 48 ff c1 44 29 c0 74 
    Jul 26 19:43:14 unRAID-Server emhttpd: req (3): startState=STARTED&file=&csrf_token=****************&cmdNoCheck=Cancel
    Jul 26 19:43:14 unRAID-Server kernel: mdcmd (41): nocheck 
    Jul 26 19:43:15 unRAID-Server kernel: md: md_do_sync: got signal, exit...
    Jul 26 19:43:15 unRAID-Server kernel: md: recovery thread: completion status: -4
    Jul 26 19:44:01 unRAID-Server sSMTP[5754]: Creating SSL connection to host
    Jul 26 19:44:01 unRAID-Server sSMTP[5754]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
    Jul 26 19:44:02 unRAID-Server sSMTP[5754]: Sent mail for [email protected] (221 2.0.0 fwd14.t-online.de closing. / Closing.) uid=0 username=root outbytes=78
    

     

    Edit: I saw some "Advanced CPU Settings" in the BIOS, but I did not touch them. Maybe one of these settings is causing those errors?
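    The repeating pattern in the trace above (an rcu_sched stall, then an NMI backtrace of the unraidd thread) is the thing worth tracking across runs. A small sketch for counting stall reports in a syslog; the excerpt is inlined here so it is self-contained, on the server you would grep the saved syslog file instead:

```shell
# Count RCU stall reports in a syslog excerpt. On unRAID, something like:
#   grep -c 'rcu_sched self-detected stall' /var/log/syslog
log_excerpt="Jul 26 19:42:46 unRAID-Server kernel: INFO: rcu_sched self-detected stall on CPU
Jul 26 19:42:46 unRAID-Server kernel: NMI backtrace for cpu 21"
printf '%s\n' "$log_excerpt" | grep -c 'rcu_sched self-detected stall'
# prints: 1
```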

  11. 2 hours ago, johnnie.black said:

    That's bad news, I would guess then it's a problem with the server/board, check to see if there's a system event log, usually server boards have them, there might be more info there.

    I can only find two kinds of logs:

    - The "iLO Event Log" which shows me stuff like "Server reset." or "Power on request received by: Automatic Power Recovery.".

    - The "Integrated Management Log" which shows me stuff like "Firmware flashed (ProLiant System BIOS - P70 05/21/2018)" or "Maintenance note: Intelligent Provisioning was loaded."

     

    I cannot find anything unusual related to NMIs there.

    Where would I find the system event log you mentioned earlier?

  12. 9 minutes ago, pwm said:

    Have you tried to move the card to another slot? If maybe you get an interrupt collision where the wrong driver gets activated and starts looking at hardware not even involved in the disk copy operation.

     

    Yes, I did this with my HP H240 before. Same results. :(

     

    10 minutes ago, pwm said:

    Edit: And do you have hardware on the motherboard that you don't need and can turn off in the BIOS - audio? serial ports? Additional SATA controller? ...

    Yes, that's a good idea! I will give this a try now.

     

    Thank you for your participation and have a great day!

  13. On 7/17/2018 at 8:58 AM, johnnie.black said:

    NMIs are still happening during the check, but the current controller uses the same driver as the previous one, LSI will use a different driver, so if the issues are related to the controller the LSI should work without problems, but can't say for sure.

    Hello again,

     

    My LSI SAS 9207-8i just arrived. I have installed it and it works perfectly, except during Parity Checks ... the NMIs are still there.

     

    Screenshot_1.thumb.png.b551c082f1d26edfb1e314837f762c36.png

     

    Even the system.stats plugin is outputting impossible values because of the "system overload".

     

    I don't even know what to do anymore. I'm very frustrated right now.

    unraid-server-diagnostics-20180726-1558.zip

    unraid-server-syslog-20180726-1558.zip

  14. 12 hours ago, johnnie.black said:

    I would recommend getting one of the recommended LSI HBA models: any LSI with a SAS2008/2308/3008 chipset in IT mode, e.g. 9201-8i, 9211-8i, 9207-8i, 9300-8i, etc., and clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.

    Yesterday I uninstalled the HP H240 HBA and connected the SFF-8087 cables from the backplane to the embedded HP P420i (which I previously configured to operate in HBA mode).

     

    Overnight I successfully completed a Parity Check, however I noticed some strange "spikes":

     

    1676768952_Unbenannt2.thumb.PNG.c98dc5c3baf40b87b1bf4466f0c53d02.PNG

     

    It's the same thing happening here as with the HP H240 HBA: the disks' read speeds suddenly drop while the CPU usage rises. The only difference is that with the HP H240 HBA things stayed bad, whereas with the HP P420i it looks like it can sort of "recover" somehow.

     

    I suspect the new SFF-8087 cables, since I had to swap the original ones, which worked perfectly with the embedded controller, for the ones that came with the new HP H240 HBA. (The original cables had angled connectors, so they didn't fit in the ports of the HBA.)

     

    Now I have two questions:

    - Could this issue be related to defective cables / loose connections?

    - I'm thinking about buying an LSI SAS 9207-8i. Is there a chance that this issue will still persist with the LSI HBA?

  15. 5 hours ago, johnnie.black said:

    Various NMI events, these are usually hardware related, you can try looking for a bios update, using the controller in a different slot or replacing the controller by a different model.

    Updating the BIOS and changing the PCIe slot of the controller did not fix the issue. The problem still persists.

     

    Could someone explain what I am experiencing here? Could this be a driver issue?

  16. 32 minutes ago, johnnie.black said:

    Various NMI events, these are usually hardware related, you can try looking for a bios update, using the controller in a different slot or replacing the controller by a different model.

    Thank you very much for your reply!

     

    When I get home, I am going to

    - update my BIOS

    - see if that fixes the issue, and if not:

    - put the controller in a different slot

     

    Have a great day!

  17. Hello everyone,

     

    I am running an HP ProLiant DL380p Gen8 server with the following hardware configuration:

    - CPU: 2x Intel® Xeon® CPU E5-2680 v2 @ 2.80GHz (10 cores / 20 threads per CPU)

    - RAM: 64 GB Single-bit ECC (8x 8 GB DDR3-1333)

    - Storage-Controllers:

      - HP Smart Array P420i Controller (embedded, not in use)

      - HP Smart HBA H240 (in PCIe x8 Slot Number 6/6, in use)

    - Storage:

      - 4x Seagate IronWolf Pro 2 TB (Server-HDDs)

      - 2x Samsung SM863 480 GB (Server-SSDs)

     

    Last weekend I got myself a new storage controller, the HP Smart HBA H240. Previously I was using the embedded controller (HP Smart Array P420i) in HBA mode, but speeds were not what I expected, which is why I bought a plain HBA. With its firmware updated to the latest version it just works great!

     

    But today, along with my first scheduled Parity Check at 4:00 CEST, I ran into a problem: disk speeds were < 10 MB/s and CPU usage was very high. The WebGUI, the terminal, and all VMs were therefore very unresponsive.

     

    This is what it looks like when I start a Parity Check manually:

     

    Unbenannt.thumb.PNG.37c77cef7fb08d01d09ce7c9221becf7.PNG

     

    At first everything looks normal, but not even a minute into the Parity Check the disk speeds suddenly drop below 10 MB/s while at the exact same time the CPU usage rises.

     

    Here is an excerpt of my syslog from the time I started the Parity Check:

     

    Jul 16 12:03:18 unRAID-Server emhttpd: req (7): startState=STARTED&file=&cmdCheck=Check&optionCorrect=correct&csrf_token=****************
    Jul 16 12:03:18 unRAID-Server kernel: mdcmd (52): check correct
    Jul 16 12:03:18 unRAID-Server kernel: md: recovery thread: check P ...
    Jul 16 12:03:18 unRAID-Server kernel: md: using 1536k window, over a total of 1953514552 blocks.
    Jul 16 12:04:01 unRAID-Server sSMTP[25194]: Creating SSL connection to host
    Jul 16 12:04:01 unRAID-Server sSMTP[25194]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
    Jul 16 12:04:03 unRAID-Server sSMTP[25194]: Sent mail for [email protected] (221 2.0.0 fwd30.t-online.de closing. / Closing.) uid=0 username=root outbytes=760
    Jul 16 12:06:01 unRAID-Server kernel: INFO: rcu_sched self-detected stall on CPU
    Jul 16 12:06:01 unRAID-Server kernel: 	35-...: (59999 ticks this GP) idle=216/140000000000001/0 softirq=24868/24868 fqs=13592 
    Jul 16 12:06:01 unRAID-Server kernel: 	 (t=60001 jiffies g=118852 c=118851 q=22580)
    Jul 16 12:06:01 unRAID-Server kernel: NMI backtrace for cpu 35
    Jul 16 12:06:01 unRAID-Server kernel: CPU: 35 PID: 4029 Comm: unraidd Not tainted 4.14.49-unRAID #1
    Jul 16 12:06:01 unRAID-Server kernel: Hardware name: HP ProLiant DL380p Gen8, BIOS P70 01/22/2018
    Jul 16 12:06:01 unRAID-Server kernel: Call Trace:
    Jul 16 12:06:01 unRAID-Server kernel: <IRQ>
    Jul 16 12:06:01 unRAID-Server kernel: dump_stack+0x5d/0x79
    Jul 16 12:06:01 unRAID-Server kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
    Jul 16 12:06:01 unRAID-Server kernel: nmi_cpu_backtrace+0x9b/0xba
    Jul 16 12:06:01 unRAID-Server kernel: ? irq_force_complete_move+0xf3/0xf3
    Jul 16 12:06:01 unRAID-Server kernel: nmi_trigger_cpumask_backtrace+0x56/0xd4
    Jul 16 12:06:01 unRAID-Server kernel: rcu_dump_cpu_stacks+0x8e/0xb8
    Jul 16 12:06:01 unRAID-Server kernel: rcu_check_callbacks+0x212/0x5f0
    Jul 16 12:06:01 unRAID-Server kernel: update_process_times+0x23/0x45
    Jul 16 12:06:01 unRAID-Server kernel: tick_sched_timer+0x33/0x61
    Jul 16 12:06:01 unRAID-Server kernel: __hrtimer_run_queues+0x78/0xc1
    Jul 16 12:06:01 unRAID-Server kernel: hrtimer_interrupt+0x87/0x157
    Jul 16 12:06:01 unRAID-Server kernel: smp_apic_timer_interrupt+0x75/0x85
    Jul 16 12:06:01 unRAID-Server kernel: apic_timer_interrupt+0x7d/0x90
    Jul 16 12:06:01 unRAID-Server kernel: </IRQ>
    Jul 16 12:06:01 unRAID-Server kernel: RIP: 0010:xor_avx_4+0x53/0x2d8
    Jul 16 12:06:01 unRAID-Server kernel: RSP: 0018:ffffc9000909bca0 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff10
    Jul 16 12:06:01 unRAID-Server kernel: RAX: ffff880809239000 RBX: 0000000000000000 RCX: ffff880809237000
    Jul 16 12:06:01 unRAID-Server kernel: RDX: ffff880809236000 RSI: ffff880809239000 RDI: 0000000000001000
    Jul 16 12:06:01 unRAID-Server kernel: RBP: ffff880809237000 R08: ffff880809238000 R09: ffff880809238000
    Jul 16 12:06:01 unRAID-Server kernel: R10: ffff880809237000 R11: ffff880809236000 R12: ffff880809236000
    Jul 16 12:06:01 unRAID-Server kernel: R13: ffff880809239000 R14: 0000000000000003 R15: ffff880809239000
    Jul 16 12:06:01 unRAID-Server kernel: check_parity+0x125/0x30b [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: handle_stripe+0xefc/0x1293 [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: unraidd+0xb8/0x111 [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: ? md_open+0x2c/0x2c [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: ? md_thread+0xbc/0xcc [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: ? handle_stripe+0x1293/0x1293 [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: md_thread+0xbc/0xcc [md_mod]
    Jul 16 12:06:01 unRAID-Server kernel: ? wait_woken+0x68/0x68
    Jul 16 12:06:01 unRAID-Server kernel: kthread+0x111/0x119
    Jul 16 12:06:01 unRAID-Server kernel: ? kthread_create_on_node+0x3a/0x3a
    Jul 16 12:06:01 unRAID-Server kernel: ? SyS_exit_group+0xb/0xb
    Jul 16 12:06:01 unRAID-Server kernel: ret_from_fork+0x35/0x40
    Jul 16 12:06:01 unRAID-Server kernel: 	35-...: (59999 ticks this GP) idle=216/140000000000001/0 softirq=24868/24868 fqs=13593 
    Jul 16 12:06:01 unRAID-Server kernel: 	(detected by 3, t=60011 jiffies, g=118852, c=118851, q=22604)
    Jul 16 12:06:01 unRAID-Server kernel: Sending NMI from CPU 3 to CPUs 35:
    Jul 16 12:06:01 unRAID-Server kernel: NMI backtrace for cpu 35
    Jul 16 12:06:01 unRAID-Server kernel: CPU: 35 PID: 4029 Comm: unraidd Not tainted 4.14.49-unRAID #1
    Jul 16 12:06:01 unRAID-Server kernel: Hardware name: HP ProLiant DL380p Gen8, BIOS P70 01/22/2018
    Jul 16 12:06:01 unRAID-Server kernel: task: ffff88081b485100 task.stack: ffffc90009098000
    Jul 16 12:06:01 unRAID-Server kernel: RIP: 0010:memcmp+0x7/0x1d
    Jul 16 12:06:01 unRAID-Server kernel: RSP: 0018:ffffc9000909bcd0 EFLAGS: 00000287
    Jul 16 12:06:01 unRAID-Server kernel: RAX: 0000000000000000 RBX: ffff88080a1fcc68 RCX: 00000000000000eb
    Jul 16 12:06:01 unRAID-Server kernel: RDX: 0000000000000ff8 RSI: ffff880809239008 RDI: ffff880809239000
    Jul 16 12:06:01 unRAID-Server kernel: RBP: 0000000000000258 R08: 0000000000000000 R09: ffff880809239000
    Jul 16 12:06:01 unRAID-Server kernel: R10: ffff880809238000 R11: ffff880809237000 R12: ffff880819073c00
    Jul 16 12:06:01 unRAID-Server kernel: R13: 0000000000000001 R14: 0000000000000003 R15: ffff880809239000
    Jul 16 12:06:01 unRAID-Server kernel: FS:  0000000000000000(0000) GS:ffff88103f7c0000(0000) knlGS:0000000000000000
    Jul 16 12:06:01 unRAID-Server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jul 16 12:06:01 unRAID-Server kernel: CR2: 000014fa5b793000 CR3: 0000000001c0a004 CR4: 00000000001606e0
    

     

    I have also attached my diagnostics as well as my full syslog. If you need further details, please let me know.

     

    I would be very thankful to everybody helping me out here!

     

    Best regards

    JuliusZet

    unraid-server-diagnostics-20180716-1232.zip

    unraid-server-syslog-20180716-1228.zip
