
BTRFS error (device dm-7): - RAM or SSD related?



Hey!

 

I upgraded my Unraid system recently, and after that I started getting a lot of errors.

The upgrade included new RAM, CPU, and motherboard, and I also added an M.2 SSD cache drive.

 

My system:

ASUSTeK COMPUTER INC. ROG STRIX Z370-H GAMING (latest BIOS, which supports 128GB RAM)

Corsair 128GB (4x32GB) DDR4 3600MHz

Intel® Core™ i7-8700K CPU @ 3.70GHz

 

I have attached my diagnostics and a picture of my memtest run (no, I didn't let it finish, but since it was already showing errors I gave up after 17 hours).

 

The problem started when I noticed the cache wasn't transferring all my files to the array, which got me wondering why. After checking the syslog, I noticed a lot of errors like this:

 

May  4 17:13:58 Slave kernel: BTRFS warning (device dm-7): csum failed root 5 ino 617557 off 90546176 csum 0xb6988775 expected csum 0xce5d6bad mirror 1
May  4 17:13:58 Slave kernel: BTRFS error (device dm-7): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 589, gen 0
May  4 17:13:58 Slave kernel: BTRFS warning (device dm-7): csum failed root 5 ino 617557 off 90546176 csum 0xb6988775 expected csum 0xce5d6bad mirror 1
May  4 17:13:58 Slave kernel: BTRFS error (device dm-7): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 590, gen 0
May  4 17:13:58 Slave kernel: BTRFS warning (device dm-7): csum failed root 5 ino 617557 off 90546176 csum 0xb6988775 expected csum 0xce5d6bad mirror 1
May  4 17:13:58 Slave kernel: BTRFS error (device dm-7): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 591, gen 0
May  4 17:13:58 Slave kernel: BTRFS warning (device dm-7): csum failed root 5 ino 617557 off 90546176 csum 0xb6988775 expected csum 0xce5d6bad mirror 1
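
For reference, the per-device error counters shown in those lines (wr/rd/flush/corrupt/gen) can also be read directly from the pool. A quick sketch, assuming the cache pool is mounted at /mnt/cache:

# show the cumulative btrfs error counters for every device in the pool
btrfs device stats /mnt/cache

# the counters can be reset once the underlying problem is fixed
btrfs device stats -z /mnt/cache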

 

So I googled the problem a bit and found that people were getting this error when their RAM was failing. Since my RAM is overclocked (it's rated 3600MHz), I first tried dropping it back to stock settings, 1.2V and 2133MHz, but the errors remained.

That's when I started the memtest, which gave me the errors shown in the attached picture.

 

So yes, the memory is only two weeks old and could be DOA, but what confuses me is that when I run the mover (i.e. trigger the cache to move files to the array), I get the errors below. Is that normal when RAM is failing, or should I also consider the possibility of a broken M.2 drive?

May  4 17:13:58 Slave kernel: BTRFS error (device dm-7): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 591, gen 0
May  4 17:13:58 Slave kernel: BTRFS warning (device dm-7): csum failed root 5 ino 617557 off 90546176 csum 0xb6988775 expected csum 0xce5d6bad mirror 1
May  4 17:13:58 Slave kernel: BTRFS error (device dm-7): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 592, gen 0
May  4 17:13:58 Slave  shfs: copy_file: /PATHTOTHEFILE /PATHTOTHEFILEAGAIN.partial (5) Input/output error
May  4 17:13:58 Slave kernel: BTRFS warning (device dm-7): csum failed root 5 ino 617557 off 90546176 csum 0xb6988775 expected csum 0xce5d6bad mirror 1
May  4 17:13:58 Slave kernel: BTRFS error (device dm-7): bdev /dev/mapper/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 593, gen 0
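
To help rule out the M.2 drive itself, the NVMe's own health data can be checked from the console as well. A sketch, assuming the underlying device is /dev/nvme0n1 (the log shows the encrypted mapping /dev/mapper/nvme0n1p1 on top of it) and that smartctl is available:

# dump the NVMe SMART/health data (media errors, percentage used, available spare)
smartctl -a /dev/nvme0n1

# if the drive supports NVMe self-tests, a short one can be triggered too
smartctl -t short /dev/nvme0n1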

 

I'm thankful for any help I can get!

 

EDIT
I also want to add that on my last parity check I got over 500 errors. Maybe that's related!
 

image0.jpeg

slave-diagnostics-20230503-1928.zip

Edited by DjJoakim
21 minutes ago, JonathanM said:

Replace the failing memory. Don't try to use the machine until memtest completes at least one full pass with no errors; preferably let it run for 24 hours.

 

Do you think all my problems are related to the memory?
I have already contacted the supplier where I bought the memory, and they will replace it under warranty. The machine seems to be working fine (as long as you don't read the syslog). Should I be worried about keeping it running until I replace the RAM?

 

Thanks

On 5/4/2023 at 8:37 PM, itimpi said:

You will have to correct any current corruption after the RAM is fixed, but then hopefully it will stop happening.

 

So I set the memory to 2133MHz again, and I'm not getting any errors while the server is just running, but as soon as I run the mover I get BTRFS errors. Does that indicate that some files were corrupted earlier by the bad memory, or is my memory still bad?
I will run another memtest at 2133MHz and leave it for 24 hours just to be sure...

26 minutes ago, DjJoakim said:

 

So I set the memory to 2133MHz again, and I'm not getting any errors while the server is just running, but as soon as I run the mover I get BTRFS errors. Does that indicate that some files were corrupted earlier by the bad memory, or is my memory still bad?
I will run another memtest at 2133MHz and leave it for 24 hours just to be sure...

Simply fixing the RAM will not repair the existing corruption, so that needs to be fixed separately. Only after that has been done should you worry about any new errors.

6 minutes ago, itimpi said:

Simply fixing the RAM will not repair the existing corruption, so that needs to be fixed separately. Only after that has been done should you worry about any new errors.

 

Maybe a stupid question, but how do I fix the existing errors? Do I just remove the files I think are corrupted, or is there some other way?

 

Thanks.

8 hours ago, JorgeB said:

Run a scrub; any corrupt files will be listed in the syslog. Then delete those files or restore them from a backup.

 

Sorry for being a total noob, but this isn't the same as a parity check, right? I did some googling about it and couldn't find much information on how to do it, and some people talk about a scrub being done during the parity check, but I don't really know what to believe...


You can run a scrub by clicking on the drive (or first member if a pool) and selecting the scrub option.

 

A scrub is completely independent of Unraid's parity system. A btrfs-formatted drive has internal block checksums, so it can check its own data integrity.
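
For anyone who prefers the command line, the same thing can be done from a console. A sketch, assuming the cache pool is mounted at /mnt/cache:

# start a scrub on the cache pool and check its progress / results
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache

# corrupted files are reported in the syslog as checksum / "unable to fixup" errors, including the file path
grep -iE 'checksum error|unable to fixup' /var/log/syslog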

3 minutes ago, itimpi said:

You can run a scrub by clicking on the drive (or first member if a pool) and selecting the scrub option.

 

A scrub is completely independent of Unraid's parity system. A btrfs-formatted drive has internal block checksums, so it can check its own data integrity.

 

Oh okay, I see, but hold on a minute... I hadn't put that together before: my array is formatted XFS, but my cache is BTRFS. Could this mean that the BTRFS errors I'm getting are related to the cache drive?
But then I shouldn't be getting errors when I run memtest, right?

 

It feels like I don't need to run a scrub on my whole array, only on the cache. Am I right?

 

Thanks for clearing things up for me.

3 minutes ago, DjJoakim said:

Could this mean that the BTRFS errors I'm getting are related to the cache drive?

Yes.

 

3 minutes ago, DjJoakim said:

But then I shouldn't be getting errors when I run memtest, right?

Bad RAM will corrupt data, and btrfs detects corrupt data, so the problems are related.
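
As a toy illustration of why a single corrupted byte is enough to trip a checksum (nothing btrfs-specific here, just a generic checksum over a scratch file):

# make a 1 MiB scratch file full of zeros and take a copy
dd if=/dev/zero of=/tmp/sample bs=1M count=1
cp /tmp/sample /tmp/sample.flipped

# overwrite a single byte in the copy, the way a bit flip from bad RAM might
printf '\xff' | dd of=/tmp/sample.flipped bs=1 seek=4242 count=1 conv=notrunc

# the checksums no longer match - the same kind of mismatch btrfs flags when it reads a corrupted block
md5sum /tmp/sample /tmp/sample.flipped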

Just now, JorgeB said:

Yes.

 

Bad RAM will corrupt data, and btrfs detects corrupt data, so the problems are related.

 

Alright, then it sounds like I have bad RAM (since memtest gave me errors), it corrupted data on my cache drive, and that corruption is still there.
I'm getting new RAM from the retailer. As soon as it arrives I will run a scrub on the cache, and hopefully I don't have a lot of data corruption in the array...
