
[SOLVED] [6.9.2] Kernel panic on rebuild



Hi! My backup server threw a 2TB disk (an Enterprise drive, of course; I still have 10-year-old WD20EARS drives in this thing with no problems), so I replaced it with a 4TB Seagate. But every time I start the array and try to rebuild the data, it gets to some random point and dies, crashing the server and requiring a hard reboot. The new drive is fine as far as I can tell, with no SMART errors. Memtest ran for a day with no errors. I've attached diagnostics from before array start and after the crash, plus tails of the syslog and the kernel log, though nothing in there looks useful. I've also attached a screenshot of the error that finally crashes the server (loudly today, with the beeping). I have searched and found other threads mentioning something similar, but none of them turned up a fixable cause. I hope someone out there can help!

 

Thank you.

 

-P

archive-diagnostics-20220316-1144.zip archive-diagnostics-20220317-1748.zip juststarted.txt kerneljuststarted.txt

Archive_crashes.png

6 hours ago, JorgeB said:

Post the output of:

 

cat /proc/interrupts

 

SASLP tends to like IRQ16; if it's getting disabled during high load, that's a problem.

 

Well, it's definitely mvsas...how do I fix that??

 

root@Archive:~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:   11987036          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          4          0          0          0          0          0          0   IO-APIC   1-edge      i8042
  8:          0          0          4          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:          6          0          0          0          0          0          0          0   IO-APIC  12-edge      i8042
 16:          0          0          0          0     142514          0          0          0   IO-APIC  16-fasteoi   mvsas
 18:          0          0          0          4          0          0          0          0   IO-APIC  18-fasteoi   i801_smbus
 19:          0          0          0          0          0      77385          0          0   IO-APIC  19-fasteoi   ata_piix, ata_piix
 21:          0          0          0          0          0          0         72          0   IO-APIC  21-fasteoi   ehci_hcd:usb1
 23:          0          0          0          0          0          0          0    2654154   IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:    7115066          0          0          0          0          0          0          0  HPET-MSI   3-edge      hpet3
 25:          0    6391924          0          0          0          0          0          0  HPET-MSI   4-edge      hpet4
 26:          0          0    6033243          0          0          0          0          0  HPET-MSI   5-edge      hpet5
 27:          0          0          0    5389967          0          0          0          0  HPET-MSI   6-edge      hpet6
 28:          0          0          0          0    4701097          0          0          0  HPET-MSI   7-edge      hpet7
 29:          0          0          0          0          0          0          0          0  DMAR-MSI   0-edge      dmar0
 30:          0          0          0          0          0          0          0          0   PCI-MSI 49152-edge      PCIe PME, aerdrv
 31:          0          0          0          0          0          0          0          0   PCI-MSI 81920-edge      PCIe PME, aerdrv
 32:          0          0          0          0          0          0          0          0   PCI-MSI 458752-edge      PCIe PME
 33:          0          0          0          0          0          0          0          0   PCI-MSI 466944-edge      PCIe PME
 34:          0          0          0          0          0          0          0          0   PCI-MSI 468992-edge      PCIe PME
 35:          0          0          0          0          0     555414          0          0   PCI-MSI 2097152-edge      eth0-rx-0
 36:          0          0          0          0          0          0     209403          0   PCI-MSI 2097153-edge      eth0-tx-0
 37:          0          0          0          0          0          0          0          2   PCI-MSI 2097154-edge      eth0
 38:          0          0          0          0          0          0          0      86345   PCI-MSI 524288-edge      mpt2sas0-msix0
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:        181        190        187        184        181    5772395    5427425    8262557   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:    1070909     857156     782246     731932     645449     944730     664880    1390869   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:      74066      29253      25872      22715      22290      30056      26554      29858   Rescheduling interrupts
CAL:     266472      84006      63573      55369      70468      22849      16713      11208   Function call interrupts
TLB:       2859       3516       3542       3300       3399       3428       3186       2658   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:        217        218        218        218        218        218        218        218   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
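
To pull just the relevant line out of a dump like the one above, something along these lines may help. It is a minimal sketch, not part of the original thread: `irq_totals` is a hypothetical helper name, and it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ with the driver names at the end).

```shell
# Print "<irq> <total>" for every IRQ row whose handler list mentions a driver.
# Reads /proc/interrupts-style text on stdin; the driver name is the first argument.
# irq_totals is a hypothetical helper, not a standard tool.
irq_totals() {
  awk -v drv="$1" '
    NR == 1 { ncpu = NF; next }            # header row: one field per CPU
    index($0, drv) {
      irq = $1; sub(/:$/, "", irq)         # "16:" -> "16"
      total = 0
      for (i = 2; i <= ncpu + 1; i++) total += $i   # sum the per-CPU counts
      printf "%s %.0f\n", irq, total
    }
  '
}

# Usage on a live system:
#   irq_totals mvsas < /proc/interrupts
```

Running it a few times while the array is under load shows whether the count for the mvsas IRQ keeps climbing or stalls.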

 

Thanks!

 

-P

 


Thanks to @JorgeB for pointing me to the controller. For those who might still have one chugging away: what appears to be happening is that, under load, the Marvell-based controller (an AOC-SASLP-MV8 in this case) will freak out over IRQ16 (or 13, sometimes) and crash the server. The disk being rebuilt wasn't even on that controller, but a rebuild needs all the disks to participate, so...

 

Anyway, I went and found a different post (also featuring @JorgeB and @saarg in starring roles) that recommended working around it by appending "iommu=pt" to syslinux.cfg (strictly speaking, that puts the IOMMU in passthrough mode rather than disabling it).
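
For anyone following along, here is a rough sketch of what that change could look like and how to check that it took effect. The file path and the stock boot entry shown in the comments are assumptions about a typical Unraid install, not something quoted from this thread, and `has_iommu_pt` is a hypothetical helper name; compare against your own syslinux.cfg before editing.

```shell
# Assumed stock boot entry in /boot/syslinux/syslinux.cfg, with the option added
# to the append line (path and contents are assumptions; check your own file):
#
#   label Unraid OS
#     menu default
#     kernel /bzimage
#     append iommu=pt initrd=/bzroot

# Succeeds if the given kernel command line contains the exact token iommu=pt.
# has_iommu_pt is a hypothetical helper, not a standard tool.
has_iommu_pt() {
  case " $1 " in
    *" iommu=pt "*) return 0 ;;
    *) return 1 ;;
  esac
}

# After rebooting, confirm the running kernel actually received the option:
#   has_iommu_pt "$(cat /proc/cmdline)" && echo "iommu=pt is active"
```

Checking /proc/cmdline after the reboot is a cheap way to rule out having edited the wrong boot entry.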

 

And now my drive is rebuilt. I have some more LSI controllers on order to replace the SASLP so I can have IOMMU back, but this works for now!

 

Thanks again.

 

-P
