JSuchovsky Posted March 21, 2022
Unraid recently (about 4 days ago) started crashing multiple times a day. Can someone help me troubleshoot what is going on? I have the logs writing to the flash drive right now, so I have captured a few cycles of it happening. I think I have narrowed it down to this part of the log:
Mar 20 04:00:16 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 20 04:33:07 Home kernel: md: recovery thread: P incorrect, sector=977896384
Mar 20 04:33:07 Home kernel: md: recovery thread: PQ incorrect, sector=977948176
Mar 20 04:33:07 Home kernel: md: recovery thread: PQ incorrect, sector=977948184
Mar 20 04:33:07 Home kernel: md: recovery thread: PQ incorrect, sector=977948192
Mar 20 04:33:07 Home kernel: md: recovery thread: PQ incorrect, sector=977948200
Mar 20 04:33:07 Home kernel: md: recovery thread: PQ incorrect, sector=977966696
Mar 20 04:33:07 Home kernel: md: recovery thread: P incorrect, sector=977977880
Mar 20 04:33:07 Home kernel: md: recovery thread: P incorrect, sector=977986616
Mar 20 04:33:07 Home kernel: md: recovery thread: P incorrect, sector=978003624
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992184
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992192
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992200
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992208
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992216
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992224
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992232
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992240
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992248
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992256
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992264
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992272
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992280
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992288
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992296
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992304
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992312
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992320
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992328
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992336
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992344
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992352
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992360
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992368
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992376
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992384
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992392
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992400
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992408
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992416
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992424
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992432
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992440
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992448
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992456
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992464
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992472
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992480
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992488
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992496
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992504
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992512
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992520
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992528
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992536
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992544
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992552
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992560
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992568
Mar 20 04:33:20 Home kernel: md: recovery thread: PQ incorrect, sector=980992576
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103440
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103448
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103456
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103464
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103472
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103480
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103488
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103496
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103504
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103512
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103520
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103528
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103536
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103544
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103552
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103560
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103568
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103576
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103584
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103592
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103600
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103608
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103616
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103624
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103632
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103640
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103648
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103656
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103664
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103672
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103680
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103688
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103696
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103704
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103712
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103720
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103728
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103736
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103744
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103752
Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103760
Mar 20 04:33:25 Home kernel: md: recovery thread: stopped logging
Mar 20 04:54:51 Home kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Mar 20 04:54:51 Home kernel: caller _nv000651rm+0x1ad/0x200 [nvidia] mapping multiple BARs
Mar 20 05:00:14 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 20 06:00:16 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 20 07:00:16 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 20 08:00:16 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 20 09:00:15 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Mar 20 09:33:11 Home kernel: microcode: microcode updated early to revision 0x42e, date = 2019-03-14
Mar 20 09:33:11 Home kernel: Linux version 5.10.28-Unraid (root@Develop) (gcc (GCC) 9.3.0, GNU ld version 2.33.1-slack15) #1 SMP Wed Apr 7 08:23:18 PDT 2021
Mar 20 09:33:11 Home kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
Attachment: syslog.txt
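The parity mismatch lines in the log above are easier to reason about once they are tallied. The following is a minimal sketch, not an Unraid tool: the function name `summarize_parity_errors` and the sample list are made up for illustration, and the regular expression only assumes the line shape visible in the posted log. Because the md driver prints "stopped logging" after a burst, any count taken from the syslog is a lower bound.

```python
import re
from collections import Counter

# Matches md driver lines shaped like the ones posted above, e.g.
#   "md: recovery thread: PQ incorrect, sector=977948176"
PATTERN = re.compile(r"md: recovery thread: (PQ|P|Q) incorrect, sector=(\d+)")


def summarize_parity_errors(lines):
    """Count mismatches per parity type and return the sector range seen.

    Hypothetical helper for illustration; the counts are a lower bound
    because the driver stops logging after a number of errors.
    """
    counts = Counter()
    sectors = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
            sectors.append(int(match.group(2)))
    span = (min(sectors), max(sectors)) if sectors else None
    return counts, span


# A few lines copied from the posted log, plus one unrelated line.
sample = [
    "Mar 20 04:00:16 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null",
    "Mar 20 04:33:07 Home kernel: md: recovery thread: P incorrect, sector=977896384",
    "Mar 20 04:33:07 Home kernel: md: recovery thread: PQ incorrect, sector=977948176",
    "Mar 20 04:33:25 Home kernel: md: recovery thread: Q incorrect, sector=982103440",
]

counts, span = summarize_parity_errors(sample)
print(dict(counts), span)
```

To run it against a saved log, read the file into a list first, for example `open("syslog.txt").read().splitlines()`, where `syslog.txt` is the attachment name used in this thread.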
trurl Posted March 21, 2022
Attach Diagnostics to your NEXT post in this thread. Have you done memtest?
JSuchovsky Posted March 21, 2022
I attached the diagnostics. I will start a memtest today.
Attachment: home-diagnostics-20220321-0836.zip
trurl Posted March 21, 2022
4 hours ago, JSuchovsky said: start a memtest today
before using your server for anything else. Any errors found by memtest are unacceptable. You shouldn't attempt to run a computer with bad RAM.
JSuchovsky Posted March 22, 2022
One pass finally completed with zero errors. I will let it run overnight to complete a second pass.
JSuchovsky Posted March 23, 2022
It finished two passes of memtest with zero errors.
itimpi Posted March 23, 2022
The version of memtest included with Unraid will not detect errors being corrected by ECC. I believe the version you can download from memtest86.com can do this.
JSuchovsky Posted June 2, 2022
I ran memtest with the ECC option checked, found one bad stick of RAM, and replaced it. I also replaced a drive that had started to fail. The server is still rebooting continuously and can't finish rebuilding parity.
Attachment: home-diagnostics-20220602-0920.zip
trurl Posted June 3, 2022
Set up the syslog server and post that syslog after a crash.
trurl Posted June 3, 2022
Unrelated, but your appdata and system shares have files on the array.
JSuchovsky Posted June 3, 2022
I have the syslog mirrored to the flash drive for now. Syslog2 is the latest. I believe it has two or three crashes in it, but the server has also been up for 15 hours now without crashing.
1 hour ago, trurl said: unrelated, but your appdata and system shares have files on the array.
As for the appdata, I moved the files to my SSD hard drives in the pool when Plex was having the database issue. Is it recommended to move it back to the cache? Thank you for all the help!
Attachments: syslog2.txt, syslog
trurl Posted June 4, 2022
10 hours ago, JSuchovsky said: SSD hard drives in the pool
You mean the array. SSDs in the array cannot be trimmed, and can only be written at parity speed. The appdata and system shares always have open files, so even if these are on SSDs in the array, parity will also be involved and can't spin down.
JSuchovsky Posted June 8, 2022
I made some progress and moved all the system and appdata files off the pool and onto the cache drives. The system seems more stable, except that it now seems to crash at 7:26am local time every morning. It has done this 3 days in a row. I cleared the syslog before I went to bed and looked at it this morning, and it doesn't look like it logged anything until after the crash. Any other tips?
Attachment: home-diagnostics-20220608-0741.zip
trurl Posted June 8, 2022
28 minutes ago, JSuchovsky said: cleared the syslog before I went to bed and looked at it this morning and it doesn't look like it logged anything until after the crash
The current syslog is in RAM, just like the rest of the OS, so it starts over on every boot, and that is the same log included in diagnostics. You have to get the log from the syslog server to see what happened before the crash.
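Once a persisted syslog is available, the lines written just before each reboot are the interesting ones. The sketch below is an illustration only: `lines_before_each_boot` is a hypothetical helper, and it assumes that a kernel "Linux version" line marks the start of a boot, as it does in the log posted at the top of this thread. Early-boot lines that precede the marker (such as the microcode line) will show up in the context too.

```python
def lines_before_each_boot(lines, context=5, marker="kernel: Linux version"):
    """Return, for every boot marker found, the `context` lines logged before it.

    Hypothetical helper; the marker string is an assumption based on the
    "Linux version 5.10.28-Unraid" line in the posted log.
    """
    results = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            results.append(lines[start:index])
    return results


# Lines copied from the log posted earlier in the thread.
sample = [
    "Mar 20 08:00:16 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null",
    "Mar 20 09:00:15 Home crond[2794]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null",
    "Mar 20 09:33:11 Home kernel: microcode: microcode updated early to revision 0x42e, date = 2019-03-14",
    "Mar 20 09:33:11 Home kernel: Linux version 5.10.28-Unraid (root@Develop) (gcc (GCC) 9.3.0, GNU ld version 2.33.1-slack15) #1 SMP Wed Apr 7 08:23:18 PDT 2021",
]

before = lines_before_each_boot(sample, context=2)
for block in before:
    print("\n".join(block))
    print("--- reboot ---")
```

In this sample the last entry before the reboot is the 09:00:15 mover line, a gap of over half an hour with nothing logged, which is the kind of silence before a crash that this thread keeps running into.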
JSuchovsky Posted June 8, 2022
Sorry, here is the full one.
Attachment: syslog
JSuchovsky Posted June 24, 2022
Is there a way to turn on more verbose logging? This issue is still happening randomly, anywhere from 5 minutes after boot to a little over 3 days of uptime.
JorgeB Posted June 24, 2022
Crashing with nothing logged suggests a hardware problem. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning the other services back on one by one.