Call Traces on new 6.3.5 install


snowmirage


I'm starting a new build using an EVGA SR-2 motherboard, and I plan to pass an LSI controller and a couple of video cards through to VMs. After the install, I set up the "Fix Common Problems" plugin, and it's reporting an error:

Call Traces found on your server

Your server has issued one or more call traces. This could be caused by a Kernel Issue, Bad Memory, etc. You should post your diagnostics and ask for assistance on the unRaid forums

tower-diagnostics-20170910-1742.zip

I've attached the diagnostics above. If someone could give me an idea of what may be going on, I'd greatly appreciate it.

The first VM I'm trying to set up in unRAID (FreeNAS) fails to boot its OS installer, and the errors seem to point to possible memory problems. On the off chance I have some bad memory, I'll run MemTest overnight and see whether it finds any errors.

I see you have the PCIe ACS override enabled. Do you get the same error with it disabled? You really should try with it disabled unless there is no other way to get the IOMMU groups set up the way you want. On my living-room server, for instance, I can pass through the iGPU without using the PCIe ACS override.

But the first call trace in your log is immediately after this:

"WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 64MB of RAM."

Since it says "BIOS bug", I would look for a BIOS update for your motherboard.

15 hours ago, BobPhoenix said:

Do you get the same error with PCIe ACS overrides disabled since I see you have it enabled?

If you are referring to the message alerting me there are Call Traces then YES. I saw that before enabling the PCIe ACS override.
If you mean the errors in the VM, I don't know; I only tried a couple of times, and the ACS override was always on. I will turn it off and see if it changes anything.

15 hours ago, BobPhoenix said:

But the first call trace in your log is immediately after this:

"WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 64MB of RAM."

Since it says "BIOS bug", I would look for a BIOS update for your motherboard.

Oh no :(  Now I am scared. I'm fairly sure one of the first things I did when I got this beauty of a motherboard was update the BIOS. If that's the case, I'm afraid the root of the problem may be beyond my control, and I'll be one very... very... very sad panda. But I will check the BIOS version as soon as I can *crossing all fingers and toes*


*EDIT*
BTW, I ran PassMark's Memtest for about 12 hours with no errors, so the memory appears to be OK.

5 hours ago, snowmirage said:

If you are referring to the message alerting me there are Call Traces then YES.

That's kind of what I figured, since many people run with it enabled without problems, but I had to ask because that is what I would have tried first if you hadn't already seen the same error with it off.

If you don't have the most current BIOS, I would upgrade it.

If you do have the most current one, I would contact the manufacturer and see if they have any ideas.

I had to DOWNGRADE the BIOS on my Tyan S5512 motherboards to get a TV tuner card recognized. However, I once tried to downgrade a SuperMicro X9SCM without asking for help first and bricked the board. So when I ran into the same problem after switching to the Tyan boards, I emailed Tyan tech support, and they gave me instructions on how to properly downgrade the BIOS. I would suggest you do the same if you decide to try downgrading your BIOS. Upgrades usually don't require any help from the manufacturer, since the upgrade procedure usually accounts for any problems you might run into.

1 hour ago, LordShad0w said:

I just got the same thing when I checked today. Any help would be appreciated as I have not changed anything in my configuration and things have been running well.

tower-diagnostics-20170911-0813.zip

Yours had the following error at the call trace:

WARNING: CPU: 2 PID: 3434 at fs/btrfs/extent-tree.c:134 btrfs_put_block_group+0x42/0x59

That leads me to think it is a file system problem. I do NOT know, however, whether it is your Docker image or a disk you might have formatted with btrfs. If your cache drive is in a cache pool, I would try checking the file system to see if any errors show up. If your cache drive and array drives are all XFS, I would delete your Docker image and recreate your containers.

Someone who is better at reading call traces may have a better idea.


I upgraded the BIOS to the latest version yesterday, and at first I thought I was still seeing the call trace error. Then it occurred to me that what I was seeing may have been the leftover "we saw an error at one point" flag from the previous error.

I acknowledged that error, waited a while, and didn't see anything come up (no error in that plugin complaining about call traces). Before the BIOS upgrade, I also disabled the PCIe ACS override.

After that I was able to boot the FreeNAS VM and install it without the error I had before. Now, however, when I reboot the FreeNAS VM after installing its OS, it just hangs, and I've noticed the "you have call traces" warning seems to come back right about when that happens.

It got late last night and I had to give up, but later today I'll grab the diagnostics again.

@BobPhoenix
I'll have to figure out how you found that call trace in the diagnostics, or which log I should be looking in. Then I can at least watch that log while the VM boots and look for related errors.


Right near the top (line 50) of your log I saw this:

Sep 10 17:31:02 Tower kernel: WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 64MB of RAM.
Sep 10 17:31:02 Tower kernel: ------------[ cut here ]------------
Sep 10 17:31:02 Tower kernel: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mtrr/cleanup.c:978 mtrr_trim_uncached_memory+0x38c/0x3b3
Sep 10 17:31:02 Tower kernel: Modules linked in:
Sep 10 17:31:02 Tower kernel: CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.30-unRAID #1
Sep 10 17:31:02 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EVGA Classified SR-2, BIOS 080016  07/05/2011
Sep 10 17:31:02 Tower kernel: ffffffff81c03e58 ffffffff813a4a1b 0000000000000000 ffffffff81931211
Sep 10 17:31:02 Tower kernel: ffffffff81c03e98 ffffffff8104d0d9 000003d200c40000 0000000004000000
Sep 10 17:31:02 Tower kernel: 0000000000c64000 ffffffff81d678d0 0000000000000001 000000000009b800
Sep 10 17:31:02 Tower kernel: Call Trace:
Sep 10 17:31:02 Tower kernel: [<ffffffff813a4a1b>] dump_stack+0x61/0x7e
Sep 10 17:31:02 Tower kernel: [<ffffffff8104d0d9>] __warn+0xb8/0xd3
Sep 10 17:31:02 Tower kernel: [<ffffffff8104d1a1>] warn_slowpath_null+0x18/0x1a
Sep 10 17:31:02 Tower kernel: [<ffffffff81cdb489>] mtrr_trim_uncached_memory+0x38c/0x3b3
Sep 10 17:31:02 Tower kernel: [<ffffffff81cd63bb>] setup_arch+0x499/0x8b9
Sep 10 17:31:02 Tower kernel: [<ffffffff81ccbb1e>] start_kernel+0x5f/0x3dc
Sep 10 17:31:02 Tower kernel: [<ffffffff81ccb120>] ? early_idt_handler_array+0x120/0x120
Sep 10 17:31:02 Tower kernel: [<ffffffff81ccb2d6>] x86_64_start_reservations+0x2a/0x2c
Sep 10 17:31:02 Tower kernel: [<ffffffff81ccb3be>] x86_64_start_kernel+0xe6/0xf3
Sep 10 17:31:02 Tower kernel: ---[ end trace 0000000000000000 ]---

In the lines under "Call Trace:" I saw mtrr_trim... mentioned, so I looked above the call trace for a WARNING line that mentioned MTRR. That led me to the first line above, along with the "BIOS bug:" in that same warning.
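That search (find each "Call Trace:", read the function names under it, then look just above it for WARNING lines) can be scripted. Below is a minimal Python sketch, assuming you have extracted the syslog from the diagnostics zip to a plain text file. The script name find_traces.py, the 15-line look-back window, and the frame pattern (written for the 4.9 kernel format shown above; newer kernels print frames without the [<address>] part) are my own assumptions, not anything provided by unRAID or the Fix Common Problems plugin.

```python
import re
import sys

# ASSUMPTION: frame lines look like the 4.9-era format in the log above, e.g.
#   "... kernel: [<ffffffff81cdb489>] mtrr_trim_uncached_memory+0x38c/0x3b3"
# A "? " before the name marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z0-9_.]+)\+0x")

# ASSUMPTION: how many lines above "Call Trace:" to search for WARNING lines.
LOOKBACK = 15


def find_call_traces(lines):
    """Return (warning_lines, function_names) for every 'Call Trace:' block."""
    results = []
    for i, line in enumerate(lines):
        if "Call Trace:" not in line:
            continue
        # Walk backwards collecting WARNING lines, stopping at the end of the
        # previous trace so its warnings are not attributed to this one.
        warnings = []
        for j in range(i - 1, max(i - 1 - LOOKBACK, -1), -1):
            if "end trace" in lines[j]:
                break
            if "WARNING:" in lines[j]:
                warnings.append(lines[j].strip())
        warnings.reverse()  # restore log order (oldest first)
        # Collect function names from the frame lines that follow the header.
        funcs = []
        for frame in lines[i + 1:]:
            match = FRAME_RE.search(frame)
            if match is None:
                break
            funcs.append(match.group(1))
        results.append((warnings, funcs))
    return results


def main(path):
    with open(path, errors="replace") as f:
        lines = f.read().splitlines()
    for n, (warnings, funcs) in enumerate(find_call_traces(lines), 1):
        print(f"Call trace #{n}")
        for w in warnings:
            print("   " + w)
        print("   functions: " + " <- ".join(funcs))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: find_traces.py /path/to/syslog", file=sys.stderr)
```

For the excerpt above, the first call trace should be reported together with both WARNING lines, the "BIOS bug" one first, followed by the function names starting at dump_stack. Collecting every WARNING line in a short window, rather than only the nearest one, is deliberate: the nearest WARNING is usually the generic "WARNING: CPU: ... at file:line" header, while the more descriptive message (here the MTRR one) tends to sit just above the "cut here" marker.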


Thanks for the tip... for some reason it never occurred to me that it could literally be a matter of "find the text 'call trace' in all these logs." It's too late to look at it tonight; hopefully I'll have time this weekend to get back at it. I'll post my findings in case someone else runs into the same issue and I manage to find a solution.
