Helmonder, posted February 19, 2020 (edited)

System just crashed again... System totally unavailable: no shares, no web GUI, no telnet. I can connect with IPMI; herewith what I see on the screen (see capture). Through the console I was able to copy the syslog to /boot. Herewith attached.

I then gave a shutdown -r now. After the graceful shutdown time expires the system forces a shutdown; this also does not succeed and the system "hangs" for several minutes. At that point I use the physical off switch to make the system reboot. I will reboot again...

This has been going on for several weeks... Sometimes it keeps running for a few days, sometimes a week... And then again a crash... The system has been stable for -years-. This has been happening since the last two versions (I am running the most current at the moment). It happens in and out of safe mode, no difference. To be honest, I do not know if I was running in safe mode during this last crash; I might not have been. I am now again.

System has booted up and is running again.

syslog

Edited February 27, 2020 by Helmonder
Helmonder (Author), posted February 19, 2020 (edited)

I am seeing some BTRFS errors... Running a scrub... I see the following in the console:

The BTRFS errors have not always been there, though. I expect them to be more a result of the constant crashes than a cause. I think I will need to run a fsck. Did a dev stats before:

Doing a --readonly first:

[1/7] checking root items
[2/7] checking extents
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
bad block 368558080
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
incorrect offsets 6286 3566737550
root 5 inode 28307375 errors 500, file extent discount, nbytes wrong
Found file extent holes:
    start: 0, len: 4096
root 5 inode 28307406 errors 2001, no inode item, link count wrong
    unresolved ref dir 85889 index 23729 namelen 32 name dcb5f3abeccef1ddf67afb89f3041dee filetype 1 errors 4, no inode ref
root 5 inode 28307407 errors 2001, no inode item, link count wrong
    unresolved ref dir 85889 index 23730 namelen 32 name a1ed8f397213b5bf29bcd65f83c83487 filetype 1 errors 4, no inode ref
root 5 inode 28307408 errors 2001, no inode item, link count wrong
    unresolved ref dir 85889 index 23731 namelen 36 name a1ed8f397213b5bf29bcd65f83c83487.txt filetype 1 errors 4, no inode ref
root 5 inode 28307409 errors 2001, no inode item, link count wrong
    unresolved ref dir 85889 index 23732 namelen 32 name 655a811d730b9a00c3603353a409b145 filetype 1 errors 4, no inode ref
root 5 inode 28307410 errors 2001, no inode item, link count wrong
    unresolved ref dir 85889 index 23733 namelen 32 name 212a89a0586ff984eafa34bdfdca96dd filetype 1 errors 4, no inode ref
root 5 inode 28307411 errors 2001, no inode item, link count wrong
    unresolved ref dir 85889 index 23734 namelen 36 name 212a89a0586ff984eafa34bdfdca96dd.txt filetype 1 errors 4, no inode ref
root 5 inode 28307412 errors 2001, no inode item, link count wrong

And a LOT more lines of the same...

Running without --readonly and with --repair now. The following is the result. It does not seem that it can fix it:

Do not use --repair unless you are advised to do so by a developer or an experienced user, and then only after having accepted that no fsck can successfully repair all types of filesystem corruption. Eg. some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
items overlap, can't fix
check/main.c:4333: fix_item_offset: BUG_ON `ret` triggered, value -5
/sbin/btrfs[0x42f1fd]
/sbin/btrfs[0x438339]
/sbin/btrfs[0x4387f0]
/sbin/btrfs[0x43937c]
/sbin/btrfs[0x43d255]
/sbin/btrfs(main+0x90)[0x40ecc0]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x14f743d22e5b]
/sbin/btrfs(_start+0x2a)[0x40ef4a]

Repeating the --repair cycle three times gives exactly the same result. Please advise?

I am now attempting to copy the complete contents of the cache drive to a folder in the array, with the intention to reformat the cache drive and copy the data back after that (doing this with dockers and VMs disabled, so the array should be quiet). Errors in the log are rolling though... The same BTRFS errors. Hoping the copy will work without a crash... I still think the BTRFS issue is not causing the crash, so it should work.

I will also try to change the FS type of the cache drive to XFS. I am not using a cache pool, just one 1TB M.2 SSD. Without using a pool I do not quite see the advantage of BTRFS.

There is not a lot of info on file system fixes out there.

Edited February 19, 2020 by Helmonder
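Since the check output above runs to a great many near-identical lines, a short script can help gauge how widespread the damage is before deciding what to do. The following is only a minimal sketch, assuming the output of the read-only check has been saved as text with one message per line; the function name `summarize_check_output` and the `SAMPLE` string are made up for illustration, and the regular expression only covers the "root N inode N errors ..." lines quoted in this post, not every message the tool can print.

```python
import re
from collections import Counter

# Matches lines of the form seen in the post, e.g.:
#   root 5 inode 28307406 errors 2001, no inode item, link count wrong
INODE_ERR = re.compile(r"root (\d+) inode (\d+) errors ([0-9a-f]+), (.+)")

# A few lines copied from the output quoted above (hypothetical sample file content).
SAMPLE = """\
root 5 inode 28307375 errors 500, file extent discount, nbytes wrong
root 5 inode 28307406 errors 2001, no inode item, link count wrong
root 5 inode 28307407 errors 2001, no inode item, link count wrong
"""


def summarize_check_output(text):
    """Return (number of distinct affected inodes, Counter of error reasons)."""
    inodes = set()
    reasons = Counter()
    for line in text.splitlines():
        match = INODE_ERR.search(line)
        if match is None:
            continue  # skip progress lines, "unresolved ref" lines, etc.
        root, inode, _code, description = match.groups()
        inodes.add((int(root), int(inode)))
        # One line can list several comma-separated reasons.
        for reason in description.split(","):
            reasons[reason.strip()] += 1
    return len(inodes), reasons


if __name__ == "__main__":
    count, reasons = summarize_check_output(SAMPLE)
    print(f"{count} affected inodes")
    for reason, n in reasons.most_common():
        print(f"{n:6d}  {reason}")
```

A summary like this does not repair anything; it only makes it easier to see whether the errors are confined to a handful of files or spread across the whole filesystem.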
JorgeB, posted February 19, 2020

1 hour ago, Helmonder said: Do not use --repair unless you are advised to do so by a developer

--repair can only fix a small number of issues, and sometimes it makes things even worse. You should back up any data on the cache and re-format the pool; there are some recovery options in the FAQ if needed.
Helmonder (Author), posted February 19, 2020

I am trying to find the best solution by searching in the forum. This was the best I could find.
Helmonder (Author), posted February 20, 2020

Copied over what I could from the cache drive (my god, Plex has a LOT of files and folders). I have now formatted the cache drive (with XFS) and am copying everything back...
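The thread does not say which tool was used for the copy. As an illustration only, "copy what can be copied" from a damaged filesystem could be sketched like this: walk the tree, copy file by file, and record every path that fails instead of stopping at the first error. The function name `salvage_tree` and the paths in the usage note are hypothetical.

```python
import os
import shutil


def salvage_tree(src, dst):
    """Copy src into dst file by file; return the paths that could not be copied."""
    failed = []

    def on_walk_error(error):
        # Called by os.walk when a directory cannot be listed.
        failed.append(error.filename)

    for dirpath, _dirnames, filenames in os.walk(src, onerror=on_walk_error):
        relative = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            source = os.path.join(dirpath, name)
            try:
                # copy2 also tries to preserve timestamps and permissions.
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError:
                # Unreadable or damaged file: note it and carry on.
                failed.append(source)
    return failed
```

Called as, say, `salvage_tree("/mnt/cache", "/mnt/user/cache_backup")` (both paths are assumptions, not taken from the thread), it returns a list of files to look at afterwards. Note that this sketch does not follow symlinked directories and does not preserve ownership, so for application data such as a Plex library a dedicated copy tool may be the better choice.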
Helmonder (Author), posted February 21, 2020

Well... Plex could not be saved; I am reinstalling it. The rest is up again. Without BTRFS.
Helmonder (Author), posted February 27, 2020

The issue appears to be related to a bug in the system that causes specific docker IP assignments to be unstable. More explanation in the following thread: https://forums.unraid.net/topic/89038-fatal-system-crash/?tab=comments#comment-826221