mkono87 Posted September 2, 2021

Been dealing with Unraid crashes for a while where there is no access to the GUI or SSH, so I have no choice but to hard reboot the machine. Sometimes it takes days, other times months. I just can't seem to figure out the cause. This time I have been able to catch the logs closer to the time it happened. Do you see anything I should be aware of? https://pastebin.com/ZanJhvkh
trurl Posted September 2, 2021

Go to Tools - Diagnostics and attach the complete diagnostics ZIP file to your NEXT post in this thread.
mkono87 Posted September 2, 2021

nas-diagnostics-20210901-2203.zip
trurl Posted September 2, 2021

12 hours ago, mkono87 said: This time I have been able to catch the logs closer to the time it happened.

Attach those logs as a zipped plain text file.
mkono87 Posted September 2, 2021

2 hours ago, trurl said: Attach those logs as a zipped plain text file

ZanJhvkh.zip
mkono87 Posted September 3, 2021

Anyone? I'm really stuck here and not sure what to do. Any help is appreciated.
Tristankin Posted September 3, 2021

Turn on "save syslog to flash" so we can see the reason the server is crashing. Check C-states, turn off RAM overclocking (anything above the spec on Intel ARK is overclocking), and make sure the PSU sleep state is turned off in the BIOS. Failing those, look at downgrading to 6.8.3; that fixed mine. I was hard locking every 2 days on 6.9.x, and today I'm at 34 days of uptime.
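For reference, a minimal sketch of those two suggestions. The kernel parameters and the flash log path below are common Unraid conventions and my assumptions, not something confirmed in this thread:

# Limit deep C-states by adding kernel parameters to the append line in
# /boot/syslinux/syslinux.cfg (reboot required; exact parameters depend on CPU):
#   append initrd=/bzroot intel_idle.max_cstate=1 processor.max_cstate=1

# With Settings > Syslog Server > "Mirror syslog to flash" enabled, the log
# should survive a hard crash on the flash drive (path assumed):
cat /boot/logs/syslog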
mkono87 Posted September 3, 2021

5 hours ago, Tristankin said: Turn on "save syslog to flash" so we can see the reason the server is crashing.

Syslog to flash is on; that is where the log from that pastebin came from. Downgrading I have not tried, but I will give that a shot.
trurl Posted September 3, 2021

On 9/2/2021 at 9:39 AM, trurl said: Attach those logs as a zipped plain text file

I meant for you to attach the syslog server logs as a zipped plain text file.
mkono87 Posted September 6, 2021

On 9/3/2021 at 12:14 PM, trurl said: I meant for you to attach the syslog server logs as a zipped plain text file.

I thought that's what I gave you, or do you mean the entire log? I can get that uploaded later today.
DieFalse Posted September 7, 2021

We are awaiting the full syslog; however, you are running version 1 of the ASRock Rack motherboard BIOS, from 2015. You need to update to at least BIOS 2.6, as that fixed a lot of known issues. 2.7 is current, and I know of no reason not to update to 2.7, so I suggest it.
mkono87 Posted September 8, 2021

On 9/7/2021 at 10:46 AM, fmp4m said: You need to update to at least BIOS 2.6, as that fixed a lot of known issues.

Okay, I have now updated to the latest BIOS and attached the latest logs from when it went down yesterday. Please note: these logs are from before I updated the BIOS.

syslog.zip nas-diagnostics-20210908-1252.zip
DieFalse Posted September 8, 2021

@mkono87 you have XFS corruption; hopefully @JorgeB can assist with that. That's the last entry before the crash (the identical block repeats at 16:13:34 and 17:25:13):

Sep 7 14:22:55 NAS kernel: XFS (sdb1): Metadata corruption detected at xfs_dinode_verify+0xa7/0x567 [xfs], inode 0xe997421 dinode
Sep 7 14:22:55 NAS kernel: XFS (sdb1): Unmount and run xfs_repair
Sep 7 14:22:55 NAS kernel: XFS (sdb1): First 128 bytes of corrupted metadata buffer:
Sep 7 14:22:55 NAS kernel: 00000000: 49 4e 81 a4 03 02 00 00 00 00 00 63 00 00 00 64  IN.........c...d
Sep 7 14:22:55 NAS kernel: 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
Sep 7 14:22:55 NAS kernel: 00000020: f0 be 68 03 81 88 ff ff 60 d6 04 ab 27 03 36 41  ..h.....`...'.6A
Sep 7 14:22:55 NAS kernel: 00000030: 60 d6 04 ab 27 03 36 41 00 00 00 00 00 08 de c1  `...'.6A........
Sep 7 14:22:55 NAS kernel: 00000040: 00 00 00 00 00 00 00 8e 00 00 00 00 00 00 00 01  ................
Sep 7 14:22:55 NAS kernel: 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 5f 57 70 8c  ............_Wp.
Sep 7 14:22:55 NAS kernel: 00000060: ff ff ff ff be bc 40 17 00 00 00 00 00 00 00 06  ......@.........
Sep 7 14:22:55 NAS kernel: 00000070: 00 00 1d dd 00 00 f5 f6 00 00 00 00 00 00 00 00  ................
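For reference, a hedged sketch of the repair step the kernel is asking for. On Unraid the usual convention is to start the array in Maintenance mode and repair array disks via their /dev/mdX device so parity stays in sync; the sdb1 below is simply whatever device the log flagged:

xfs_repair -n /dev/sdb1   # -n = no-modify mode: report problems without changing anything
xfs_repair /dev/sdb1      # actual repair, once the -n report has been reviewed
# If xfs_repair refuses to run because of a dirty log, mount and cleanly
# unmount the filesystem first; -L (zero the log) is a last resort that can
# lose the most recent metadata changes.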
mkono87 Posted September 8, 2021

24 minutes ago, fmp4m said: @mkono87 you have XFS corruption; hopefully @JorgeB can assist with that.

Hmm, that's no good. I don't even see an sdb1 entry when using lsblk or ls /dev. I'm not sure what that would be.
DieFalse Posted September 8, 2021

sdb is usually the cache; sdb1 is usually cache disk one.
trurl Posted September 8, 2021

sdb1 is the first partition on /dev/sdb. Most disks would be listed that way in the syslog unless they were array devices. Those entries were from before the reboot, and no sdb is attached currently. My guess is it referred to the cache also.
mkono87 Posted September 8, 2021

1 hour ago, fmp4m said: sdb is usually the cache; sdb1 is usually cache disk one.

49 minutes ago, trurl said: My guess is it referred to the cache also.

Yes, this is what I have now. Just to let you know, my Docker containers and VMs are on their own separate drive. Yes, it's small, but I have VMs on it at the moment.

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  13.5M  1 loop /lib/modules
loop1    7:1    0 101.1M  1 loop /lib/firmware
loop2    7:2    0    40G  0 loop /var/lib/docker/btrfs
                                 /var/lib/docker
sda      8:0    1   7.3G  0 disk
└─sda1   8:1    1   7.3G  0 part /boot
sdc      8:32   0 223.6G  0 disk
└─sdc1   8:33   0 223.6G  0 part /mnt/disks/docker-vm
sdd      8:48   0   3.6T  0 disk
└─sdd1   8:49   0   3.6T  0 part
sde      8:64   0   3.6T  0 disk
└─sde1   8:65   0   3.6T  0 part
sdf      8:80   0 465.8G  0 disk
└─sdf1   8:81   0 465.8G  0 part /mnt/cache
sdg      8:96   0   3.6T  0 disk
└─sdg1   8:97   0   3.6T  0 part
sdh      8:112  0   3.6T  0 disk
└─sdh1   8:113  0   3.6T  0 part
md1      9:1    0   3.6T  0 md   /mnt/disk1
md2      9:2    0   3.6T  0 md   /mnt/disk2
md3      9:3    0   3.6T  0 md   /mnt/disk3
itimpi Posted September 9, 2021

There is a chance that 'sdb' is referring to one of the array drives, since it is not showing up in the 'df' command. You can see the 'sdX' designations for the array drives on the Main tab.
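As a hedged aside (plain Linux commands, nothing Unraid-specific), the sdX-to-drive mapping can also be read from the shell:

ls -l /dev/disk/by-id/ | grep -v -- -part          # model/serial symlinks -> sdX names
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,MODEL,SERIAL    # one row per device, with model and serial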
mkono87 Posted September 9, 2021

I did receive an error notification for the docker drive with an uncorrected-error count of 1. I ran a SMART test after the parity check finished (I know, not related). The extended SMART test completed successfully; is that not to be trusted in this case?
DieFalse Posted September 9, 2021

32 minutes ago, mkono87 said: The extended SMART test completed successfully; is that not to be trusted in this case?

SMART checks drive health, not data health. You have corruption and will need to repair the corruption.
mkono87 Posted September 9, 2021

Quoting DieFalse: SMART checks drive health, not data health. You have corruption and will need to repair the corruption.

Will it probably throw the error notification again in the near future?
trurl Posted September 9, 2021

38 minutes ago, mkono87 said: Will it probably throw the error notification again in the near future?

If you don't repair the corrupt filesystem, you will get those log entries again when it is accessed.
mkono87 Posted September 16, 2021

On 9/9/2021 at 10:27 AM, trurl said: If you don't repair the corrupt filesystem, you will get those log entries again when it is accessed.

It did crash again, and I finally got around to running xfs_repair on my appdata drive. It seems to have completed correctly.

root@NAS:~# xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
bad CRC for inode 244937761
bad CRC for inode 244937761, will rewrite
Bad atime nsec 2173239295 on inode 244937761, resetting to zero
cleared inode 244937761
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 3
        - agno = 1
        - agno = 0
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

No lost+found folder was created. So I guess at this point I'll reset the syslog and see what happens.
mkono87 Posted September 17, 2021

It's crashed again. I will need to post another log when I get the chance. I'm totally lost at this point.
turnipisum Posted September 17, 2021

Have you run a memory test? Might be worth doing.
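A hedged pointer on that suggestion: Memtest86+ is included on the Unraid boot menu (legacy/BIOS boot only, as far as I know), and in the meantime the kernel ring buffer can be scanned for machine-check or memory errors:

dmesg | grep -iE 'mce|machine check|edac|memory failure'   # look for hardware/memory error reports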