Input Output and some BTRFS Errors on cache drive


Hey guys, I noticed some corruption errors, BTRFS csum errors, and input/output errors in my server's syslog today. All of them were on the cache drive, and I think both times I was running Tdarr.
Please suggest how I can fix these errors. Could input/output errors cause corruption, or vice versa?
About a week ago, similar input/output errors occurred on the cache drive. When I ran a scrub on it, it showed 2-3 Plex metadata files as corrupt, so I just deleted those files. At that time my APC UPS was also having battery-related issues and the server went through one or two unclean shutdowns, so I didn't think much of it, but now the errors are back.
Could my cache drive be going bad?

unraid-diagnostics-20240721-1326.zip

23 minutes ago, JorgeB said:

Run a correcting scrub and post the results.

UUID:             8206daa0-8850-4ac1-8149-65d3d6d92f27
Scrub started:    Sun Jul 21 15:41:00 2024
Status:           finished
Duration:         0:11:29
Total to scrub:   239.14GiB
Rate:             355.41MiB/s
Error summary:    csum=1
  Corrected:      0
  Uncorrectable:  1
  Unverified:     0

☝️ Scrub results

Jul 21 15:41:00 Unraid kernel: BTRFS info (device sdd1): scrub: started on devid 1
Jul 21 15:49:17 Unraid kernel: BTRFS warning (device sdd1): checksum error at logical 4283911495680 on dev /dev/sdd1, physical 248789721088, root 5, inode 46457306, offset 11439251456, length 4096, links 1 (path: Media/All Movies/Movies/X2 (2003)/X2 (2003) Bluray-2160p.mkv)
Jul 21 15:49:17 Unraid kernel: BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
Jul 21 15:49:17 Unraid kernel: BTRFS error (device sdd1): unable to fixup (regular) error at logical 4283911495680 on dev /dev/sdd1
Jul 21 15:52:29 Unraid kernel: BTRFS info (device sdd1): scrub: finished on devid 1 with status: 0

Same file that tdarr was using earlier.
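
For anyone comparing scrub warnings against what a job like Tdarr was touching, here is a minimal sketch that pulls the affected path and offsets out of a warning line like the one quoted above. The regular expression is written against that single line only; it is an assumption that other kernel versions word the message the same way, and `parse_scrub_warning` is a hypothetical helper name.

```python
import re

# Pattern modelled on the one warning line quoted in this thread.
# Assumption: other kernel versions may format this message differently.
CSUM_WARNING = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev (?P<dev>\S+), "
    r"physical (?P<physical>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links \d+ "
    r"\(path: (?P<path>.*)\)$"
)

NUMERIC_FIELDS = ("logical", "physical", "root", "inode", "offset", "length")


def parse_scrub_warning(line):
    """Return the fields of a scrub checksum warning as a dict, or None."""
    match = CSUM_WARNING.search(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    for key in NUMERIC_FIELDS:
        info[key] = int(info[key])
    return info


# The warning line from the scrub above, copied verbatim.
sample = (
    "Jul 21 15:49:17 Unraid kernel: BTRFS warning (device sdd1): "
    "checksum error at logical 4283911495680 on dev /dev/sdd1, "
    "physical 248789721088, root 5, inode 46457306, offset 11439251456, "
    "length 4096, links 1 (path: Media/All Movies/Movies/X2 (2003)/"
    "X2 (2003) Bluray-2160p.mkv)"
)

info = parse_scrub_warning(sample)
```

The path is relative to the filesystem that was scrubbed, so it still has to be joined with the pool's mount point to find the file on disk.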

32 minutes ago, JorgeB said:

Delete/restore that movie from a backup and run another scrub to confirm no more errors.

I deleted that file, and the scrub ran without any errors this time:

UUID:             8206daa0-8850-4ac1-8149-65d3d6d92f27
Scrub started:    Sun Jul 21 16:52:22 2024
Status:           finished
Duration:         0:11:20
Total to scrub:   216.22GiB
Rate:             325.61MiB/s
Error summary:    no errors found

But why does this corruption keep happening? I forgot to mention this earlier, but a week ago, when I first noticed these errors, I was installing a Docker container through Community Applications, and the install failed with an input/output error. That's how I noticed the errors in the first place.
Any thoughts on why these errors could be occurring?

14 hours ago, JorgeB said:

If it's not a one-time thing there may be an underlying hardware issue. Most often the cause is bad RAM, so start by running memtest.

I have tested my RAM with the Live mem tester plugin multiple times, and it always comes back without any errors. The last time I had these kinds of errors, you suggested that RAM is generally the first thing to test in these scenarios, so I replaced all my RAM sticks, since these Kingston HyperX sticks come with a lifetime warranty, so...😅
Shortly after that I got an APC UPS and an LSI 9207-8i for my server and also replaced my SMPS, and it had been running quite stably since. So I figured it probably was the RAM, even though memtest didn't find any errors back then either. But now the errors are back again. Surely I can't be that unlucky when it comes to RAM😅


And this morning the appdata backup failed again, this time on tar verification of the Plex appdata, and I noticed the syslog is showing similar errors again:

Jul 22 03:01:26 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 03:01:26 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 12, gen 0
Jul 22 03:12:59 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 03:12:59 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 13, gen 0
Jul 22 03:43:30 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 03:43:30 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
Jul 22 04:14:00 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 04:14:00 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 15, gen 0
Jul 22 11:19:27 Unraid nginx: 2024/07/22 11:19:27 [error] 19942#19942: *1010208 open() "/usr/local/emhttp/plugins/dynamix.file.manager/javascript/ace/mode-log.js" failed (2: No such file or directory) while sending to client, client: 10.0.0.23, server: , request: "GET /plugins/dynamix.file.manager/javascript/ace/mode-log.js HTTP/1.1", host: "10.0.0.3", referrer: "http://10.0.0.3/Shares/Browse?dir=%2Fmnt%2Fuser%2FBackup%2FAppdata%2Fab_20240722_021002-failed"

Could this also be caused by the same hardware issue?

1 hour ago, Max said:

Could this also be caused by the same hardware issue?

Yes. Memtest is only definitive if it finds errors. If you have multiple sticks, try running the server with just one; if the errors persist, try a different one. That will basically rule out bad RAM.

28 minutes ago, JorgeB said:

Yes. Memtest is only definitive if it finds errors. If you have multiple sticks, try running the server with just one; if the errors persist, try a different one. That will basically rule out bad RAM.

😭 That's pretty much what I did last time around: I bought a new 8 GB stick, used it for a week, and it ran stable, so afterwards I replaced the older 2x8 GB sticks I had been using.

So I guess a year later I'm back to square one then😭
But thanks, as always you have been a great help. I just hope it lasts until I can upgrade. (I have been thinking of upgrading my CPU ever since the Ryzen 5000 series came out, and Ryzen 9000 is almost out 😅.)


@JorgeB hey, I booted my system with just one stick to see how it goes, but now the syslog is showing these errors:

Jul 22 15:08:02 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:08:02 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 16, gen 0
Jul 22 15:08:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:08:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
Jul 22 15:08:44 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:08:44 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 18, gen 0
Jul 22 15:09:15 Unraid kernel: BTRFS warning (device loop2): csum failed root 26983 ino 3290 off 5459968 csum 0x851cd069 expected csum 0xa5ae22ba mirror 1
Jul 22 15:09:15 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 19, gen 0

Any idea which drive device loop2 is? Is it the cache?
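
A loop device is backed by a file rather than a physical drive, and on Linux the kernel exposes that file under `/sys/block/<name>/loop/backing_file`. Below is a minimal sketch for reading it. `loop_backing_file` is a hypothetical helper name, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory. What loop2 maps to on any given server is not something this thread confirms, so treat whatever it prints as the answer rather than assuming.

```python
from pathlib import Path


def loop_backing_file(name, sysfs_root="/sys/block"):
    """Return the path of the file backing a loop device such as 'loop2'.

    Returns None if the device does not exist or has no backing file.
    """
    entry = Path(sysfs_root) / name / "loop" / "backing_file"
    try:
        return entry.read_text().strip()
    except OSError:
        # Covers a missing device as well as permission problems.
        return None
```

Calling `loop_backing_file("loop2")` on the server returns the image file behind the device. The share that image lives on then shows which pool the errors ultimately come from.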


@JorgeB So with just one stick, I deleted the Docker image and slowly started reinstalling containers through the Community Applications plugin. It was going fine up until the 18th container.

The 19th container failed with the following error:

Error: failed to register layer: read /var/lib/docker/tmp/GetImageBlob3662119572: input/output error


And the syslog is showing the same BTRFS errors again:

Jul 22 19:35:18 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:18 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Jul 22 19:35:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Jul 22 19:35:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Jul 22 19:35:19 Unraid kernel: BTRFS warning (device loop2): csum failed root 5 ino 2846 off 250462208 csum 0x59363698 expected csum 0x508941a2 mirror 1
Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0

So I guess it's time to swap in another stick and start the whole process again😭
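
The `errs:` lines in these excerpts carry the per-device BTRFS counters: write, read, and flush I/O errors, plus corruption and generation errors. Here only `corrupt` is climbing while `wr`, `rd`, and `flush` stay at 0. A minimal sketch for tracking the latest counters per device from syslog lines follows. `latest_counters` is a hypothetical helper name, and the pattern is assumed from the lines quoted in this thread.

```python
import re

# Pattern modelled on the "errs:" lines quoted in this thread (assumption:
# the field order wr, rd, flush, corrupt, gen is the same on other kernels).
ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def latest_counters(lines):
    """Map each device to the most recent counters seen in the given lines."""
    latest = {}
    for line in lines:
        match = ERRS.search(line)
        if match:
            latest[match["dev"]] = {key: int(match[key]) for key in FIELDS}
    return latest


# Three lines copied from the excerpts in this thread.
sample = [
    "Jul 22 15:09:15 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 19, gen 0",
    "Jul 22 19:35:18 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0",
    "Jul 22 19:35:19 Unraid kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0",
]

counters = latest_counters(sample)
```

In the logs above the `corrupt` counter goes from 19 back to 1 after the Docker image was deleted and recreated, which is consistent with the counter belonging to the filesystem inside the image rather than to the drive itself.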

  • 2 weeks later...

@JorgeB update...
So far, after switching to stick 2, it's running stable with no more BTRFS or input/output errors. After 5-6 days of stable running I added the last RAM stick, and it's still stable, with no BTRFS or I/O errors. The one exception: I got segfault errors when I tried to play an episode of a TV series in Plex. So far it has only happened with that one particular file, so I just replaced the file.

 

Jul 28 22:41:35 Unraid kernel: PMS GTP[10402]: segfault at 77 ip 00001500349a0e6c sp 000015002b925720 error 4 in Plex Media Server[1500347fd000+d08000] likely on CPU 7 (core 3, socket 0)
Jul 28 22:41:57 Unraid kernel: PMS GTP[14390]: segfault at 77 ip 0000148ea43a0e6c sp 0000148e9d38c720 error 4 in Plex Media Server[148ea41fd000+d08000] likely on CPU 4 (core 0, socket 0)
Jul 28 22:44:38 Unraid kernel: PMS GTP[15524]: segfault at 77 ip 000015190f7a0e6c sp 0000151908fd4720 error 4 in Plex Media Server[15190f5fd000+d08000] likely on CPU 2 (core 2, socket 0)
Jul 28 22:45:00 Unraid kernel: PMS GTP[22766]: segfault at 77 ip 000014b912da0e6c sp 000014b908ce6720 error 4 in Plex Media Server[14b912bfd000+d08000] likely on CPU 5 (core 1, socket 0)


So my thinking so far is that the 1st of the three sticks was causing the BTRFS corruption, and at some point that file may have been corrupted, which then caused these segfault errors when I tried to read it. But you are the expert here, I'm not..😅

