Max Posted July 21

Hey guys, I noticed some corruption errors, BTRFS csum errors, and input/output errors in my server's syslog today. All were on the cache drive, and I think both times I was running tdarr. Please suggest how I can fix these errors. Could input/output errors cause corruption, or vice versa?

Almost a week ago, similar input/output errors occurred on the cache drive. When I ran a scrub on it, it showed 2-3 Plex metadata files as corrupt, so I just deleted those files. At that time my APC UPS was also having some battery-related issues, so my server did go through one or two unclean shutdowns, and I didn't think much of it. But now the errors are back again. Could my cache drive be going bad?

unraid-diagnostics-20240721-1326.zip
JorgeB Posted July 21

Run a correcting scrub and post the results.
Max Posted July 21

23 minutes ago, JorgeB said: Run a correcting scrub and post the results.

UUID: 8206daa0-8850-4ac1-8149-65d3d6d92f27
Scrub started: Sun Jul 21 15:41:00 2024
Status: finished
Duration: 0:11:29
Total to scrub: 239.14GiB
Rate: 355.41MiB/s
Error summary: csum=1
  Corrected: 0
  Uncorrectable: 1
  Unverified: 0

☝️ Scrub results

Jul 21 15:41:00 Unraid kernel: BTRFS info (device sdd1): scrub: started on devid 1
Jul 21 15:49:17 Unraid kernel: BTRFS warning (device sdd1): checksum error at logical 4283911495680 on dev /dev/sdd1, physical 248789721088, root 5, inode 46457306, offset 11439251456, length 4096, links 1 (path: Media/All Movies/Movies/X2 (2003)/X2 (2003) Bluray-2160p.mkv)
Jul 21 15:49:17 Unraid kernel: BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
Jul 21 15:49:17 Unraid kernel: BTRFS error (device sdd1): unable to fixup (regular) error at logical 4283911495680 on dev /dev/sdd1
Jul 21 15:52:29 Unraid kernel: BTRFS info (device sdd1): scrub: finished on devid 1 with status: 0

This is the same file that tdarr was using earlier.
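For anyone following along, here is a minimal sketch of how the affected file paths could be pulled out of scrub warnings like the one above. It assumes the syslog line format shown in this thread. The function name `corrupt_files` and the regular expression are illustrative, not part of Unraid or btrfs-progs.

```python
import re

# Matches the "checksum error at logical ..." scrub warning format seen in
# this thread. The layout is an assumption based on these log lines only.
SCRUB_WARNING = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"physical (?P<physical>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links \d+ "
    r"\(path: (?P<path>.*)\)\s*$"
)


def corrupt_files(syslog_lines):
    """Return the unique file paths named in scrub checksum warnings."""
    paths = []
    for line in syslog_lines:
        match = SCRUB_WARNING.search(line)
        if match and match.group("path") not in paths:
            paths.append(match.group("path"))
    return paths


EXAMPLE_LINES = [
    "Jul 21 15:41:00 Unraid kernel: BTRFS info (device sdd1): scrub: started on devid 1",
    "Jul 21 15:49:17 Unraid kernel: BTRFS warning (device sdd1): checksum error at "
    "logical 4283911495680 on dev /dev/sdd1, physical 248789721088, root 5, "
    "inode 46457306, offset 11439251456, length 4096, links 1 "
    "(path: Media/All Movies/Movies/X2 (2003)/X2 (2003) Bluray-2160p.mkv)",
]

if __name__ == "__main__":
    for path in corrupt_files(EXAMPLE_LINES):
        print(path)
```

The greedy match on the path is deliberate: file names such as the one here contain parentheses, so the pattern anchors on the final closing parenthesis at the end of the line.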
JorgeB Posted July 21

Delete that movie or restore it from a backup, then run another scrub to confirm there are no more errors.
Max Posted July 21

32 minutes ago, JorgeB said: Delete/restore that movie from a backup and run another scrub to confirm no more errors.

I deleted that file, and the scrub ran without any errors this time:

UUID: 8206daa0-8850-4ac1-8149-65d3d6d92f27
Scrub started: Sun Jul 21 16:52:22 2024
Status: finished
Duration: 0:11:20
Total to scrub: 216.22GiB
Rate: 325.61MiB/s
Error summary: no errors found

But why does this corruption keep happening? I forgot to mention this earlier, but when I first noticed these errors a week ago, I was installing a Docker container through Community Applications, and the install failed with an input/output error. That's how I noticed the errors in the first place. Any thoughts on why these errors could be occurring?
JorgeB Posted July 21

If it's not a one-time thing, there may be an underlying hardware issue. The most common cause of this is bad RAM, so start by running memtest.
Max Posted July 22

14 hours ago, JorgeB said: If it's not a one time thing there may be an underlying hardware issue, most often for this is bad RAM, start by running memtest.

I have tested my RAM using the Live mem tester plugin multiple times, and it always comes out without any errors. Actually, the last time I had these kinds of errors, you suggested that RAM is generally the first thing to test in these scenarios, so I replaced all my RAM sticks, since these Kingston HyperX sticks come with a lifetime warranty so...😅

Shortly after that I got an APC UPS for my server and an LSI 9207-8i, and I also replaced my SMPS. After that it had been running quite stably, so I thought maybe it was the RAM after all, even though memtest didn't find any errors last time either. But now the errors are back again. Surely I can't be that unlucky when it comes to RAM 😅

This morning the appdata backup failed again on tar verification of the Plex appdata, and I noticed that the syslog is again showing similar errors:

Jul 22 03:01:26 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 03:01:26 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 12, gen 0
Jul 22 03:12:59 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 03:12:59 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 13, gen 0
Jul 22 03:43:30 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 03:43:30 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
Jul 22 04:14:00 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 04:14:00 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 15, gen 0
Jul 22 11:19:27 Unraid nginx: 2024/07/22 11:19:27 [error] 19942#19942: *1010208 open() "/usr/local/emhttp/plugins/dynamix.file.manager/javascript/ace/mode-log.js" failed (2: No such file or directory) while sending to client, client: 10.0.0.23, server: , request: "GET /plugins/dynamix.file.manager/javascript/ace/mode-log.js HTTP/1.1", host: "10.0.0.3", referrer: "http://10.0.0.3/Shares/Browse?dir=%2Fmnt%2Fuser%2FBackup%2FAppdata%2Fab_20240722_021002-failed"

Could this also be caused by the same hardware issue?
JorgeB Posted July 22

1 hour ago, Max said: Could this also happen due same hardware issue here.

Yes. Memtest is only definitive if it finds errors. If you have multiple sticks, try running the server with just one. If the errors persist, try a different stick. That will basically rule out bad RAM.
Max Posted July 22

28 minutes ago, JorgeB said: Yes, memtest is only definitive if it finds errors, if you have multiple sticks try using the server with just one, if the same try with a different one, that will basically rule out bad RAM.

😭 That's pretty much what I did last time around. I bought a new 8 GB stick, used it for a week, and it ran stable, so afterwards I replaced the older 2x8 GB sticks I had been using. So I guess a year later I'm back to square one then 😭

But thanks, as always you have been a great help. I just hope it lasts until I can upgrade. (I have been thinking of upgrading my CPU ever since the Ryzen 5000 series came out, and Ryzen 9000 is almost out 😅.)
JorgeB Posted July 22

If it's not the RAM, the board/CPU would be the next suspects.
Max Posted July 22

@JorgeB hey, I booted my system with just 1 stick to see how it goes, but now the syslog is showing these errors:

Jul 22 15:08:02 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:08:02 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 16, gen 0
Jul 22 15:08:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:08:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
Jul 22 15:08:44 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:08:44 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 18, gen 0
Jul 22 15:09:15 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:09:15 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 19, gen 0

Any idea which drive device loop2 is? Is it the cache?
JorgeB Posted July 22

Loop2 is likely the docker image, but I need the diags to confirm.
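As a side note for readers, a loop device's backing file can usually be read from the Linux sysfs attribute `/sys/block/<loopN>/loop/backing_file`. The sketch below assumes that attribute is exposed on the system in question. The helper name `loop_backing_file` and its `sysfs_root` parameter are hypothetical and exist only to make the function easy to try out.

```python
from pathlib import Path


def loop_backing_file(device="loop2", sysfs_root="/sys/block"):
    """Return the file backing a loop device, or None if it cannot be read.

    Reads <sysfs_root>/<device>/loop/backing_file, which the Linux loop
    driver exposes for attached loop devices.
    """
    attr = Path(sysfs_root) / device / "loop" / "backing_file"
    try:
        return attr.read_text().strip()
    except OSError:
        # Device missing, not attached, or sysfs not available.
        return None


if __name__ == "__main__":
    backing = loop_backing_file("loop2")
    print(backing if backing else "loop2 has no readable backing file")
```

If the printed path points at the docker image file, the csum errors on loop2 are inside that image rather than directly on a physical drive.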
Max Posted July 22

1 minute ago, JorgeB said: Loop2 is likely the docker image, but need the diags to confirm.

unraid-diagnostics-20240722-1528.zip
JorgeB Posted July 22

It is. Recreate it and see if new issues come up with just that stick:
https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file

Then:
https://docs.unraid.net/unraid-os/manual/docker-management/#re-installing-docker-applications

Also see below if you have any custom docker networks:
https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks
Max Posted July 22

@JorgeB So with just one stick, I deleted the docker image and slowly started reinstalling my dockers through the Community Applications plugin. It was going fine up until the 18th docker. The 19th docker failed with the following error:

Error: failed to register layer: read /var/lib/docker/tmp/GetImageBlob3662119572: input/output error

And the syslog is showing the same BTRFS errors again:

Jul 22 19:35:18 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:18 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Jul 22 19:35:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jul 22 19:35:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Jul 22 19:35:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0

So I guess it's time to swap in another stick and start the whole process again 😭
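When swapping sticks like this, it can help to track whether the `corrupt` counter in the `errs:` lines keeps climbing. The following is a rough sketch that assumes the syslog format shown in this thread. The name `latest_error_counters` and the regular expression are illustrative only.

```python
import re

# Matches the per-device error counter lines seen in this thread, e.g.
# "bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0".
ERRS_LINE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_error_counters(syslog_lines):
    """Return the most recent error counters per device, keyed by device path."""
    latest = {}
    for line in syslog_lines:
        match = ERRS_LINE.search(line)
        if match:
            fields = match.groupdict()
            device = fields.pop("dev")
            # Later lines overwrite earlier ones, so the last reading wins.
            latest[device] = {name: int(value) for name, value in fields.items()}
    return latest


EXAMPLE_LINES = [
    "Jul 22 19:35:18 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0",
    "Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0",
]

if __name__ == "__main__":
    for device, counters in latest_error_counters(EXAMPLE_LINES).items():
        print(device, counters)
```

Comparing the result before and after a hardware change gives a quick indication of whether new corruption is still being detected.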
JorgeB Posted July 22

40 minutes ago, Max said: So i guess its to replace with another stick and start the whole process again

Yep
Max Posted July 31

@JorgeB update... After switching to stick 2, the server has been running stable, with no more BTRFS or input/output errors. After 5-6 days of stable running I added the last RAM stick, and it's still running stable, with no BTRFS or I/O errors. The only exception is that I got segfault errors when I tried to play an episode of a TV series over Plex. So far it has only happened with that one particular file, so I just replaced that file.

Jul 28 22:41:35 Unraid kernel: PMS GTP[10402]: segfault at 77 ip 00001500349a0e6c sp 000015002b925720 error 4 in Plex Media Server[1500347fd000+d08000] likely on CPU 7 (core 3, socket 0)
Jul 28 22:41:57 Unraid kernel: PMS GTP[14390]: segfault at 77 ip 0000148ea43a0e6c sp 0000148e9d38c720 error 4 in Plex Media Server[148ea41fd000+d08000] likely on CPU 4 (core 0, socket 0)
Jul 28 22:44:38 Unraid kernel: PMS GTP[15524]: segfault at 77 ip 000015190f7a0e6c sp 0000151908fd4720 error 4 in Plex Media Server[15190f5fd000+d08000] likely on CPU 2 (core 2, socket 0)
Jul 28 22:45:00 Unraid kernel: PMS GTP[22766]: segfault at 77 ip 000014b912da0e6c sp 000014b908ce6720 error 4 in Plex Media Server[14b912bfd000+d08000] likely on CPU 5 (core 1, socket 0)

So my thinking is that the 1st of the three sticks was causing the BTRFS corruption, and at some point that file may have become corrupt, which then caused these segfault errors when I tried to play it. But you are the expert here, not me..😅