I need support :( - BTRFS critical (device md2)


dezai


Hi guys,

first of all, sorry for my bad English :).

I was really happy with my Unraid build until last week.

It has been online for about 1-2 years on a socket 1150 platform with no problems.

I upgraded to a Ryzen platform last week and used the tutorial from SpaceInvader One (thanks to you, man, the videos are really amazing).

I've added the lines

"kernel /bzimage
append rcu_nocbs=0-15 initrd=/bzroot"

to the USB stick for my Ryzen 7 1700.
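
For anyone adapting this to another CPU: the 0-15 range matches the 16 threads of the Ryzen 7 1700 (8 cores, 16 threads, numbered from 0). Below is a minimal sketch of how the append line could be built for a different thread count. The function name `rcu_nocbs_append` is made up for illustration and is not part of Unraid.

```python
def rcu_nocbs_append(logical_cpus: int, initrd: str = "/bzroot") -> str:
    """Build a syslinux 'append' line listing every logical CPU in rcu_nocbs.

    Hypothetical helper for illustration only.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    # Logical CPUs are numbered from 0, so 16 threads gives the range 0-15.
    cpu_range = "0" if logical_cpus == 1 else f"0-{logical_cpus - 1}"
    return f"append rcu_nocbs={cpu_range} initrd={initrd}"


# Ryzen 7 1700: 8 cores / 16 threads
print(rcu_nocbs_append(16))
```

With 16 threads this reproduces the line quoted above.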

 

Since last weekend my Unraid build has been spamming "BTRFS critical (device md2)" all over the screen whenever the array is started.

After about 20 minutes it reaches a dead end where the server shuts down (hangs),

with "end trace e162fe777f177cd9".

 

At first I thought it was the parity drive, so I removed it, but the problem stayed the same.

I also disabled the cache array: same problem.

 

Don't worry, I have a backup of all the data.

Sooo... what do you need from me to help with this problem?

Syslog attached.

For now I turn off the array as soon as the errors appear.

And if I put the parity drive back into the array (at the moment I'm preclearing it), these errors appear without turning off the array.

At this point I don't know what to do anymore...

syslog.txt

server-diagnostics-20200217-2154.zip

Edited by dezai
Link to comment

Oh OK, that could be the problem.

Today I thought it would be a good idea to start from zero, so I formatted all the drives and created a new cache and drive pool.

But the server is now offline after 2 hours.

I know that I've set the RAM (4x 8GB DDR4-3200) to 3200 MHz with XMP.

So yes... I think that is the problem.

I'm currently at the office. I will check this today and read your link, but going by the first information,

RAM is an obvious problem in my configuration.

Link to comment
17 hours ago, dezai said:

BTRFS critical (device md2) ...

At first I thought it was the parity drive, so I removed it, but the problem stayed the same.

I also disabled the cache array: same problem.

For future reference, md2 is referring specifically to the disk assigned as disk2 in the parity array, so there was no reason to think doing anything with parity or cache would help.
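
To make that mapping concrete, here is a small sketch that pulls the array disk name out of a log message. It assumes the message contains "(device mdN)" exactly as quoted in this thread, and the function name is only illustrative.

```python
import re

# Matches "(device md2)" and captures the number after "md".
MD_DEVICE = re.compile(r"\(device md(\d+)\)")


def array_disk_from_log(line: str):
    """Return 'diskN' for a log line mentioning '(device mdN)', else None."""
    match = MD_DEVICE.search(line)
    return f"disk{match.group(1)}" if match else None


print(array_disk_from_log("BTRFS critical (device md2)"))
```

Lines that mention no md device return None, so parity or cache messages are not mapped to an array disk.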

Link to comment
6 hours ago, trurl said:

For future reference, md2 is referring specifically to the disk assigned as disk2 in the parity array, so there was no reason to think doing anything with parity or cache would help.

Ahhhhh OK, I didn't know that, but I also got information that sdj (the parity disk) was spitting out errors, so I wanted to start from zero :)

I have a backup of all the data, so that is no big problem.

Just a matter of time :)

12 hours ago, johnnie.black said:

Disk2 should be backed up and re-formatted. A btrfs filesystem will quickly corrupt with bad hardware, like bad RAM; check this to make sure your RAM isn't being overclocked, which is a known cause of data corruption with some Ryzen systems.

OK, I really think this solved the problem for me.

With everything on auto in the BIOS the RAM runs at 1866 MHz, so I set it manually to 1866 MHz, and so far everything is fine after 2 hours.

I'm starting to copy the files from the backup onto the array and will see tomorrow what the logs look like.

But at this moment: zero errors.

Thank you very, very much.

Link to comment

Hi guys, update time...

The PC doesn't crash anymore and the BTRFS errors are gone. Now I have the same problems as described in your link:

I set the option for typical current idle but it did not fix my problems.

New diagnostics file attached.

The problem is: my array is working fine, but I can't open my Docker containers, and there are new errors in the diagnostics:

Feb 19 19:18:31 Server kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 19 19:18:31 Server kernel: ata10.00: failed command: WRITE DMA
Feb 19 19:18:31 Server kernel: ata10: hard resetting link
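
When the syslog is full of these entries, it helps to see which ata port they belong to. This is a rough sketch that counts kernel ata messages per port; the sample lines are the three entries above, and the helper name is made up for this example.

```python
import re
from collections import Counter

# Captures the port name ("ata10") from entries like "kernel: ata10.00: ..." or "kernel: ata10: ...".
ATA_EVENT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: ")


def ata_events_by_port(lines):
    """Count kernel ata messages per port in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Feb 19 19:18:31 Server kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Feb 19 19:18:31 Server kernel: ata10.00: failed command: WRITE DMA",
    "Feb 19 19:18:31 Server kernel: ata10: hard resetting link",
]
print(dict(ata_events_by_port(sample)))
```

All three entries here point at ata10, so only the drive and cable on that one port and the controller it hangs off need checking.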

 

Is there a known fix for that?

 

I'm really sad about the current situation because I had everything running one week ago...

Could it be a problem with the SATA controller I installed?

https://www.amazon.de/gp/product/B07THFN3Q2/ref=ppx_yo_dt_b_asin_title_o07_s00?ie=UTF8&psc=1

 

server-diagnostics-20200219-1915.zip

 

Edited by dezai
Link to comment
14 minutes ago, dezai said:

I'm currently using a 4-port SATA controller

Can you post a link to the model you're using? I know there are 6- and 10-port models, but they all look the same in the syslog: they appear as 12 ports with 2 dummy ports.


 

Feb 19 19:08:51 Server kernel: ahci 0000:03:00.0: AHCI 0001.0301 32 slots 12 ports 6 Gbps 0xff3 impl SATA mode
...
Feb 19 19:08:51 Server kernel: ata9: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580100 irq 49
Feb 19 19:08:51 Server kernel: ata10: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580180 irq 49
Feb 19 19:08:51 Server kernel: ata11: DUMMY
Feb 19 19:08:51 Server kernel: ata12: DUMMY
Feb 19 19:08:51 Server kernel: ata13: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580300 irq 49
Feb 19 19:08:51 Server kernel: ata14: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580380 irq 49
Feb 19 19:08:51 Server kernel: ata15: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580400 irq 49
Feb 19 19:08:51 Server kernel: ata16: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580480 irq 49
Feb 19 19:08:51 Server kernel: ata17: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580500 irq 49
Feb 19 19:08:51 Server kernel: ata18: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580580 irq 49
Feb 19 19:08:51 Server kernel: ata19: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580600 irq 49
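
The "12 ports with 2 dummy ports" can also be read from the `0xff3 impl` value in the ahci line. As far as I know, that value is the bitmask of implemented ports, and the kernel lists the unimplemented ones as DUMMY. A quick sketch of the decoding follows. It assumes bit 0 corresponds to ata9, as the excerpt above suggests, and the function name is just for illustration.

```python
def decode_port_mask(total_ports: int, impl_mask: int):
    """Split port indices into implemented and dummy, based on the 'impl' bitmask."""
    implemented = [i for i in range(total_ports) if (impl_mask >> i) & 1]
    dummy = [i for i in range(total_ports) if not (impl_mask >> i) & 1]
    return implemented, dummy


implemented, dummy = decode_port_mask(12, 0xFF3)
first_ata = 9  # assumption: the controller's first port shows up as ata9
print(len(implemented), [f"ata{first_ata + i}" for i in dummy])
```

The mask only shows what the chip reports, not how many connectors a given card actually has, which is why the 6- and 10-port models look identical here.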

 

 

Link to comment

The first controller was a 6-port, that was correct. I thought I had a functional 4-port one, but with that one I get read errors on the connected SSD cache.

So I'll shut the box down for a couple of days and wait for the 9211-8i, which is already flashed to IT mode.

I hope that will fix the problems...

Product links to the old cards:

 

https://www.amazon.de/dp/B00AZ9T3OU/ref=cm_sw_r_other_apa_i_vVItEbNVE12EZ

 

https://www.amazon.de/dp/B07THFN3Q2/ref=cm_sw_r_other_apa_i_nUItEb0X2P25H

 

 

 

 

 

Edited by dezai
Link to comment

Thank you for your help, johnnie.black.

The server has been up and running since yesterday :)

No more bad logs, and all the Dockers and VMs are back up and running.

Everything is fine with the LSI HBA :)

Right now I'm only waiting for a good price on 8-14 TB HDDs to upgrade my server and build a backup server with the 3TB WD drives.

Link to comment
