Sak Posted August 1
I updated my motherboard BIOS to the latest version from the manufacturer's website a couple of days ago, and everything was working fine. Today both SSDs in my cache pool crashed and are no longer recognized by Unraid. Any advice?
Attached: tower-diagnostics-20240801-1850.zip
JorgeB Posted August 1
Do they show up in the board BIOS?
Sak (Author) Posted August 2
Yes, both SSDs show up in the BIOS, and I ran SMART short tests from both the BIOS and Unraid; all of them passed.

After the initial failure yesterday I restarted Unraid a few times, but the SSDs still would not show up. I shut the machine down and left it off for a couple of hours. When I started it again, both SSDs were recognized, but Docker failed to start, I'm guessing because of a docker.img failure. The server ran like that for a couple of hours before it became unreachable.

Today I had a chance to get to the machine and check the BIOS settings again. I disabled the VMD controller, enabled native ASPM, and switched the CPU configuration from the ASUS OC profile to Intel Defaults. Unraid and Docker are now running fine again with no loss of data. I did not back up the previous BIOS settings before the update, so I am not sure whether the VMD controller was enabled before it. Hopefully this fixed the issue; I will keep monitoring it.

I had read about the Intel 14th-gen CPU problem and updated the BIOS because its changelog included this entry (my CPU is an i5-14500):
1. Updated with microcode 0x125 to ensure eTVB operates within Intel specifications.
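A quick way to confirm that the updated BIOS actually loaded the new microcode is to read the revision the Linux kernel reports in /proc/cpuinfo. The sketch below is a hypothetical helper, not part of Unraid; it assumes the usual x86 `microcode : 0x...` field in that file and only compares the reported revision against 0x125 from the changelog.

```python
from pathlib import Path

TARGET_REVISION = 0x125  # revision named in the BIOS changelog entry


def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the distinct microcode revisions listed in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # int(..., 0) accepts the "0x" prefix the kernel prints
            revisions.add(int(value.strip(), 0))
    return revisions


def microcode_at_least(cpuinfo_text: str, target: int = TARGET_REVISION) -> bool:
    """True only if at least one core is listed and every core reports >= target."""
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= target


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # Linux only
        text = cpuinfo.read_text()
        print(sorted(hex(r) for r in microcode_revisions(text)))
        print("microcode >= 0x125:", microcode_at_least(text))
```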
Currently seeing these messages in the log:

Aug 2 14:51:21 Tower kernel: python3[26366]: segfault at 7c92 ip 0000153bb66fd556 sp 0000153b745f0c68 error 4 in ld-musl-x86_64.so.1[153bb66b8000+54000] likely on CPU 6 (core 12, socket 0)
Aug 2 14:53:31 Tower kernel: python3[7600]: segfault at 7c92 ip 00001476e52ab556 sp 00001476a33d4c68 error 4 in ld-musl-x86_64.so.1[1476e5266000+54000] likely on CPU 6 (core 12, socket 0)
Aug 2 15:01:12 Tower kernel: pcieport 0000:00:1a.0: AER: Multiple Corrected error message received from 0000:00:1a.0
Aug 2 15:01:12 Tower kernel: pcieport 0000:00:1a.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 2 15:01:12 Tower kernel: pcieport 0000:00:1a.0: device [8086:7a48] error status/mask=00000001/00002000

Attached: tower-diagnostics-20240802-1521.zip
(Edited August 2 by Sak)
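Both kinds of kernel messages above can be decoded by hand. The sketch below (hypothetical helper names, written against the exact line formats shown) names the bits in the AER `error status/mask=` field using the PCIe Correctable Error Status register layout, and decodes the segfault `error` field as an x86 page-fault error code. For these lines it reports a Receiver Error (a physical-layer error, consistent with `type=Physical Layer`) with Advisory Non-Fatal Error masked, and for both python3 crashes a user-mode read of a not-present page at 0x7c92, at the same offset (0x45556) inside ld-musl-x86_64.so.1. A matching offset only says the same code faulted each time; this decoding alone cannot tell a software bug from hardware instability.

```python
import re

# Correctable Error Status register bits as defined for PCIe AER
# (the same conditions Linux names when it reports corrected errors).
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

AER_RE = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")
SEGV_RE = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp [0-9a-f]+ "
    r"error ([0-9a-f]+) in ([^\[\s]+)\[([0-9a-f]+)\+[0-9a-f]+\]"
)


def decode_aer(line: str) -> dict:
    """Name the bits set in the status and mask of an AER 'error status/mask=' line."""
    m = AER_RE.search(line)
    if m is None:
        raise ValueError("no 'error status/mask=' field")
    status, mask = (int(g, 16) for g in m.groups())

    def names(value: int) -> list:
        return [n for bit, n in AER_CORRECTABLE_BITS.items() if (value >> bit) & 1]

    return {"status": names(status), "masked": names(mask)}


def decode_segfault(line: str) -> dict:
    """Decode a kernel 'segfault at ...' line; 'error' is the x86 page-fault code in hex."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line with a mapped library")
    addr, ip, code, lib, base = m.groups()
    code = int(code, 16)
    return {
        "fault_address": int(addr, 16),
        "library": lib,
        "offset_in_library": int(ip, 16) - int(base, 16),
        "protection_violation": bool(code & 0x1),  # False: the page was not present
        "write": bool(code & 0x2),                 # False: the access was a read
        "user_mode": bool(code & 0x4),
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    aer = "pcieport 0000:00:1a.0: device [8086:7a48] error status/mask=00000001/00002000"
    segv = ("python3[26366]: segfault at 7c92 ip 0000153bb66fd556 sp 0000153b745f0c68 "
            "error 4 in ld-musl-x86_64.so.1[153bb66b8000+54000] likely on CPU 6")
    print(decode_aer(aer))
    # {'status': ['Receiver Error'], 'masked': ['Advisory Non-Fatal Error']}
    info = decode_segfault(segv)
    print(info["library"], hex(info["offset_in_library"]))
    # ld-musl-x86_64.so.1 0x45556
```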
JorgeB Posted August 2
See if this helps with the PCIe errors: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
Sak (Author) Posted August 2
Thanks! Setting pcie_aspm=off seems to have helped with the PCIe errors.
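After a reboot, /proc/cmdline lists the parameters the running kernel actually booted with, so it is a simple way to confirm that pcie_aspm=off took effect. On Unraid the parameter is usually added to the `append` line of the flash drive's syslinux configuration. The helper names below are hypothetical, and the parser is a simplified sketch that splits on whitespace and does not handle quoted values containing spaces.

```python
from pathlib import Path


def kernel_params(cmdline: str) -> dict:
    """Parse a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so "root=UUID=abc" keeps "UUID=abc" as the value
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aspm_disabled(cmdline: str) -> bool:
    """True if the command line sets pcie_aspm=off."""
    return kernel_params(cmdline).get("pcie_aspm") == "off"


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():  # Linux only
        print("pcie_aspm=off active:", aspm_disabled(cmdline_path.read_text()))
```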