vojtagrec

Members
  • Posts

    18
Converted

  • Gender
    Male
  • Location
    Prague, CZ

  1. @JorgeB Testing with 6.12.6 (kernel 6.1.64) right now, uptime 30+ minutes and so far without crash. So it looks like this revert helped.
  2. The error message sounds like the contents of i915.conf are still wrong (“line starting with ‘enable_dc=0’”). Are you sure the line in that file really is options i915 enable_dc=0 ?
  3. Removed the i915 option from kernel cmdline in Syslinux Config, kept the option in modprobe.d, updated to rc8 and so far everything looks fine, cat /sys/module/i915/parameters/enable_dc returns 0 and there are no crashes.
  4. @Craig Dennis Eh sorry, I just noticed you posted before rc7 was released, so my comment is probably irrelevant to your case...
  5. @Craig Dennis On which RC are you? I just upgraded to rc7 and got a crash too. It looks like there is some regression, I had the enable_dc=0 applied via /boot/config/modprobe.d/i915.conf and it worked perfectly fine with rc6 but it seems to not work with rc7. I booted rc7 after the crash and checked /sys/module/i915/parameters/enable_dc and indeed it was "-1" (auto). When I added the kernel param to "Syslinux Configuration" it seems to work (I just tested with my server, current uptime 30+ min and it always crashed around ~20 min after boot for me). FYI @ich777 @JorgeB the workaround proposed in release notes (via modprobe.d) does not work with rc7, see above.
  6. @ich777 @JorgeB Is there an updated guide on how to build a kernel for Unraid? I only found this outdated one. I think the bug might be caused by the same commit as this one (+ on FreeDesktop) and would like to try a kernel with that commit reverted, or at least try bisecting the issue if it turns out to be something else. I would probably also report it back to mainline, given that 6.1 is an LTS release and will live on for years. I’m a software developer with basic working knowledge of C and modest experience with Linux, so just pointing out the Unraid peculiarities might help (but ofc a ready-made script/VM/Docker image would be ideal). Thanks!
  7. @ich777 I also tried the different i915 flags. With i915.disable_power_well=1 it crashes in the same manner (diagnostics attached). With i915.enable_dc=0, it seems to not crash (uptime over 1 hour now without crash, hope I don't jinx it). I purposefully kept PiKVM display open & active (so that I can make sure the display is not asleep etc.). nibbler-diagnostics-20230504-2305-disable_power_well.zip
  8. @menos @ich777 Just tested on rc5, still crashing. Diagnostics after the crash attached. Will try some of the i915 module flags. To me it looks like it must be some regression in kernels newer than 6.1.20 (that was in rc2). nibbler-diagnostics-20230504-1730.zip
  9. @ich777 Updated to rc4 tonight, crashed again, diagnostics attached. In the meantime, I uninstalled most of my plugins etc. because I was trying to find the cause of my disks not staying spun down. In the end, I think it's the known issue with earlier rc builds that occurs when somebody does New config. I didn't do that, but one of the crashes clearly corrupted some files on my USB drive, so I had to reassign the drives to the pools after a reboot, and I guess the effect is similar to doing New config; under rc3 or rc4 the disks stay spun down. For now I'll try to keep running rc4 (just with the HDMI cable disconnected), first to see if it crashes eventually, second because I want the disks to be spun down most of the time (and don't want to downgrade back to 6.11 just to make it work). So far the server's been up for 45 minutes and there are no i915-related errors. If I have some time I might dig into the EDID used in PiKVM: the first error in the Unraid syslog seems to be related to DPLL (which, if I understand correctly, has to do with video signal timing), and given that it seems to run fine with no (or another) monitor connected, it might be something specific to PiKVM/the HDMI-CSI bridge used (although it worked with no issues up until rc3, so I wouldn't say it's something fundamentally broken in PiKVM, more like some very edge-case configuration). Attaching the currently used EDID for the sake of completeness. nibbler-diagnostics-20230427-2125.zip tc358743-edid.hex
  10. @ich777 I tried some more. I had been using a different (higher quality) HDMI cable for the real monitor, so I used that cable to connect to PiKVM; it didn't help. I also tried using a different EDID in PiKVM (I was using 1920x1080, tried 1280x1024 instead), still crashed with the same error. Reverted to rc2 for now, will try again when rc4 is out (let me know should you want me to run some more tests on rc3).
  11. @ich777 OK so I tried with the HDMI cable disconnected and rc3 seemed to be running normally. After 4 hours (much more than it took to crash each time so far) I reconnected the HDMI cable to my PiKVM and the system crashed practically instantly. I had the Diagnostics page open in a tab so I started diagnostics right away. This time the web UI became unresponsive and I got kicked out of SSH (and got "Host is down" when trying to reconnect). Somehow the diagnostics ZIP still downloaded (attached, but I cannot guarantee it's not corrupted). Then I reset the server (via HW reset button) with a real monitor connected, so far it's been running for over 20 minutes with no issue... nibbler-diagnostics-20230424-1726.zip
  12. @ich777 I'm on the latest BIOS available. Didn't manage to run more tests last week but I should be able to run some tomorrow.
  13. @ich777 The diagnostics are from my first try (I understood from your comment that I should try just with ipvlan & intel-gpu-top disabled, and if it still crashed, try the i915 flag too). For the second try I added it (confirmed in /proc/cmdline after boot), but it still crashed the same way. Will try with a real monitor tomorrow (or later). Should I keep the i915 flag? Let me know if I should try some more flags/changes.
  14. @ich777 Crashed again. First line in syslog is at Apr 17 21:07:35, first i915-related error at Apr 17 21:23:10 (lasted ~16 minutes this time). I was better prepared this time and these are my findings: Clearly the GPU driver really crashes: I lose all video output in PiKVM (it looks like a connected monitor, I get the info about resolution, but I get no image, just a blank screen). Processor usage goes haywire (1 min load gets to 30-50 on 6 cores/12 threads). I tried to find out what's going on, but because of the load it was so choppy I wasn't able to really tell anything. Processes with the highest CPU usage were emhttp, avahi, monitor_nchan and sometimes docker. The "reboot" command behaves weirdly. It looked like a normal reboot at first, but it looks to me like at least some disks were unmounted/mounted read-only (including, importantly, the flash drive, /dev/sda1 in my case), but then mdadm tried to start the array (?) and it failed horribly because /boot/ was not writable etc. This time I was more patient with the web GUI and managed to get diagnostics (yay!), although gathering them took ~6 minutes. Also, pressing the "reset" button did reset the computer just fine. Rebooted with the i915 flag at 21:45 my time, while writing this post it crashed on me again at 22:00 😔 Enough playing for today, I'll revert to rc2 again and can try more tomorrow or some other day. nibbler-diagnostics-20230417-2129.zip
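
Several of the posts above come down to the same check after boot: read /sys/module/i915/parameters/enable_dc to see whether the workaround really took effect (0 means applied, -1 is the "auto" default seen on rc7), and look in /proc/cmdline for the kernel parameter. A minimal sketch of that check in Python, for illustration only: the two paths and the i915.enable_dc=0 option string are taken from the posts, while the function names and the describe() helper are hypothetical and not part of Unraid or the i915 driver.

```python
from pathlib import Path

# Paths quoted in the posts above.
PARAM_PATH = Path("/sys/module/i915/parameters/enable_dc")
CMDLINE_PATH = Path("/proc/cmdline")


def read_enable_dc(param_path=PARAM_PATH):
    """Return the live enable_dc value as an int, or None if it cannot be read."""
    try:
        return int(Path(param_path).read_text().strip())
    except (FileNotFoundError, ValueError):
        # File missing (i915 not loaded?) or unexpected contents.
        return None


def cmdline_has_option(option, cmdline_path=CMDLINE_PATH):
    """Return True if `option` appears as a whole token on the kernel command line."""
    return option in Path(cmdline_path).read_text().split()


def describe(value):
    """Hypothetical helper: turn the raw value into a short status message."""
    if value is None:
        return "i915 parameter not found (module not loaded?)"
    if value == 0:
        return "enable_dc=0 is active"
    return f"enable_dc={value}, workaround NOT applied"


if __name__ == "__main__":
    print(describe(read_enable_dc()))
    print("on cmdline:", cmdline_has_option("i915.enable_dc=0"))
```

Running this right after each upgrade would have flagged the rc7 case described above, where the modprobe.d file was in place but the live value was still -1.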