
Periodic reboot every ~90 minutes with no trace in syslog


Solved by generalkenobi


TL;DR: My Unraid server keeps restarting every ~90 minutes. Disabling VMs and Docker, rolling back Unraid versions, and running memory tests have not fixed it. I swapped the CPU and changed from a PCI SATA card to an HBA adapter a month ago; the system was stable, but it is now unstable again. Hardware: i9-13900K, Asus ProArt Z690, 64 GB RAM. The rsyslog logs show no clues, and there are no kernel panics. I'm open to any suggestions or insights from the diagnostics.

I have had issues with Unraid restarting roughly every 90 minutes. I have attached the diagnostics.

Things I've tried:

  • Disable VM Manager and Docker and leave disabled - no fix.
  • Change Docker to ipvlan - no fix.
  • Roll back from 6.12.4 to 6.12.2 - no fix.
  • Roll back from 6.12.2 to 6.11.5 - no fix.
  • I pulled the memory sticks and ran memtest for 24 hours straight; every pass succeeded.
  • I had a cache SSD ("cache_ssd") with some bad sectors. I removed it from the configuration just in case, though it is still physically connected because I haven't had time to open the case.
  • A month ago, I swapped in a new CPU of the same model, thinking the CPU was causing my kernel panics.

 

I previously had issues, as described in my previous post. However, after I changed from a PCI SATA card to an HBA adapter, the system was stable for a month straight with no issues.

I have an i9-13900K on an Asus ProArt Z690-CREATOR WIFI with 64 GB of RAM. No GPU is installed. I'm using the onboard 10G port as the primary connection. I also have a 10GbE PCI card that gives me a direct connection to my backup Unraid server for rsyncing data back and forth, but that link has been offline for a while. I have (8) 18TB disks with (1) parity. I also have an NVMe drive and an SSD marked as cache, but I'm really just using them to serve Docker containers so they don't stress the array.

I set up an rsyslog server and attached its log as well. It shows no trace of anything unusual before a reboot, and I don't see any kernel panics like the ones I had a month ago.
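
Since the restarts leave nothing in the remote log, one cheap check is whether they really are periodic. The sketch below scans a saved syslog for boot markers and prints the time between consecutive boots, along with the last line logged before each boot. It assumes each boot writes a kernel line containing "kernel: Linux version" (the usual first kernel message on Linux), that timestamps use the classic "Sep 13 12:40:20" format with no year, and that all entries fall in 2023. BOOT_MARKER and ASSUMED_YEAR are illustrative names; adjust them if your log differs.

```python
from datetime import datetime

# Assumption: every boot logs a kernel line containing this text.
BOOT_MARKER = "kernel: Linux version"
# Assumption: classic syslog stamps ("Sep 13 12:40:20") carry no year, so one is supplied.
ASSUMED_YEAR = 2023


def parse_stamp(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def boot_report(lines):
    """Yield (boot_time, minutes_since_previous_boot, last_line_before_boot) per boot."""
    previous_boot = None
    last_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER in line:
            boot_time = parse_stamp(line)
            gap = None
            if previous_boot is not None:
                gap = (boot_time - previous_boot).total_seconds() / 60
            yield boot_time, gap, last_line
            previous_boot = boot_time
        # Remember the most recent non-empty line so it can be reported at the next boot.
        if line.strip():
            last_line = line


def print_boot_report(path):
    """Print one summary block per boot found in the syslog file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for boot_time, gap, last_line in boot_report(log):
            gap_text = "n/a" if gap is None else f"{gap:.1f} min"
            print(f"{boot_time:%b %d %H:%M:%S}  since previous boot: {gap_text}")
            print(f"    last line before boot: {last_line}")
```

Calling print_boot_report("syslog") on a saved copy of the log lists every boot. Tightly clustered gaps would confirm the ~90 minute pattern, while widely scattered gaps would suggest the interval is only approximate.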

 

I am at a loss for what to do next. Does anyone have any wild ideas, or see something in the diagnostics that I don't?

tower-diagnostics-20230912-2120.zip messages

Thanks. Those are good points. 

I turned on syslog to flash and got these logs. The last restart was around 12:40 today; the server booted back up at "Sep 13 12:40:20". This restart happened after I had stopped the array and left it sitting idle.

The lines preceding that are:

Sep 13 12:37:03 Tower  emhttpd: spinning down /dev/sdf
Sep 13 12:37:12 Tower  emhttpd: spinning down /dev/sdd
Sep 13 12:37:20 Tower  emhttpd: spinning down /dev/sdk
Sep 13 12:37:20 Tower  emhttpd: spinning down /dev/sdi
Sep 13 12:37:37 Tower  emhttpd: spinning down /dev/sdg
Sep 13 12:37:46 Tower  emhttpd: spinning down /dev/sde
Sep 13 12:37:46 Tower  emhttpd: spinning down /dev/sdl
Sep 13 12:38:09 Tower  emhttpd: read SMART /dev/sdf
Sep 13 12:38:09 Tower  emhttpd: spinning down /dev/sdj
Sep 13 12:38:09 Tower  emhttpd: spinning down /dev/sdc
Sep 13 12:38:18 Tower  emhttpd: read SMART /dev/sdd
Sep 13 12:38:18 Tower  emhttpd: spinning down /dev/sdh

 

That doesn't look bad to me. I'll try a power supply swap next, once I work out whether I have enough SATA power cables.

syslog

Hmm. Another restart after the power supply swap. Any other ideas?

 

Since my last post, the parity check has finished. I moved the server to another outlet on a 1500W UPS, so I don't think it's the outlet. My girlfriend overloaded the breaker with her hair dryer, but NUT did its job. I doubt the move to a different outlet matters, but maybe the UPS is faulty?

Sep 15 10:22:34 Tower upsmon[5986]: UPS [email protected] on battery
... (NUT is configured to shut down after 5 min on battery) ...
Sep 15 10:27:35 Tower upsmon[5986]: Signal 10: User requested FSD
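
As a quick sanity check on the UPS side, the two upsmon timestamps above can be subtracted to see whether the shutdown fired on the configured 5-minute timer. This is only arithmetic on the logged stamps; the year is an assumption because classic syslog stamps omit it, and seconds_between is a hypothetical helper name.

```python
from datetime import datetime


def seconds_between(start, end, year=2023):
    """Seconds between two classic syslog stamps such as 'Sep 15 10:22:34'.

    Assumption: syslog stamps omit the year, so both stamps are given `year`.
    """
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return (t1 - t0).total_seconds()


# "on battery" stamp and "User requested FSD" stamp from the upsmon lines above.
on_battery_to_fsd = seconds_between("Sep 15 10:22:34", "Sep 15 10:27:35")
print(f"on battery for {on_battery_to_fsd:.0f} s")  # prints "on battery for 301 s"
```

301 s is the 300 s timer plus about a second, and that shutdown left a clear "User requested FSD" line, which suggests NUT behaved as configured here. The unexplained restarts, by contrast, leave no such line.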

 

The syslog is similar, with nothing I can see. The restart occurs at Sep 15 12:47:31.

I ran sensors to get the current CPU temperatures (with Docker and a parity check running at ~5% CPU load). I don't know what they were before the restart, but I wondered whether the CPU was overheating. Those readings seem fine too.

root@Tower:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +44.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 4:        +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 8:        +41.0°C  (high = +80.0°C, crit = +100.0°C)
Core 12:       +43.0°C  (high = +80.0°C, crit = +100.0°C)
Core 16:       +44.0°C  (high = +80.0°C, crit = +100.0°C)
Core 20:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 24:       +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 28:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 32:       +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 33:       +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 34:       +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 35:       +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 36:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 37:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 38:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 39:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 40:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 41:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 42:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 43:       +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 44:       +38.0°C  (high = +80.0°C, crit = +100.0°C)
Core 45:       +38.0°C  (high = +80.0°C, crit = +100.0°C)
Core 46:       +38.0°C  (high = +80.0°C, crit = +100.0°C)
Core 47:       +38.0°C  (high = +80.0°C, crit = +100.0°C)

nvme-pci-0300
Adapter: PCI adapter
Composite:    +49.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +49.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +52.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +105.0°C)

eth0-pci-0600
Adapter: PCI adapter
PHY Temperature:  +62.0°C  
MAC Temperature:  +62.0°C  

eth2-pci-0200
Adapter: PCI adapter
PHY Temperature:  +53.0°C  
MAC Temperature:  +53.0°C  
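
Because the temperatures right before a restart are unknown, one option is to log them periodically so they reach the rsyslog server. The sketch below parses coretemp lines in the format shown above and sends the hottest reading to syslog with logger. It assumes sensors prints the degree sign as "°C" (a UTF-8 locale), and the tag "tempwatch" is a hypothetical name. How it gets scheduled (cron, a shell loop, or for example Unraid's User Scripts plugin) is left open.

```python
import re
import subprocess

# Matches coretemp lines like "Core 0:        +42.0°C  (high = +80.0°C, crit = +100.0°C)".
# Assumption: sensors prints "°C" (UTF-8 locale); other locales may differ.
CORE_LINE = re.compile(
    r"^(Package id \d+|Core \d+):\s+([+-]?\d+(?:\.\d+)?)°C\s+\(high = ([+-]?\d+(?:\.\d+)?)°C"
)


def hottest(sensors_text):
    """Return (label, temp_c, high_c) for the hottest coretemp reading, or None."""
    best = None
    for line in sensors_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if match:
            label = match.group(1)
            temp, high = float(match.group(2)), float(match.group(3))
            # Strict ">" keeps the first reading when several share the maximum.
            if best is None or temp > best[1]:
                best = (label, temp, high)
    return best


def log_hottest(tag="tempwatch"):
    """Run `sensors` and write the hottest reading to syslog via `logger` (tag is hypothetical)."""
    output = subprocess.run(["sensors"], capture_output=True, text=True, check=True).stdout
    reading = hottest(output)
    if reading is not None:
        label, temp, high = reading
        message = f"{label} {temp:.1f}C (high {high:.1f}C)"
        subprocess.run(["logger", "-t", tag, message], check=True)
```

On the output above, hottest() would report the package reading of 44.0 °C against an 80.0 °C high threshold. Calling log_hottest() every minute or so would leave a temperature line in the remote log shortly before each restart.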

 

syslog
