dezai, posted February 17, 2020 (edited February 17, 2020)

Hi guys, first of all, sorry for my bad English :). Until last week I was really happy with my Unraid build. It had been online for about 1-2 years without problems on a socket 1150 platform. Last week I upgraded to a Ryzen platform, following the tutorial from Spaceinvader One (thanks, man, the videos are really amazing). For my Ryzen 7 1700 I added the line "kernel /bzimage append rcu_nocbs=0-15 initrd=/bzroot" to the USB stick.

Since last weekend, my Unraid build has been spamming "BTRFS critical (device md2)" all over the screen whenever the array is started. After about 20 minutes it reaches a dead end where the server hangs and shuts down, with "end trace e162fe777f177cd9".

At first I thought it was the parity drive, so I removed the parity drive, but the problem stayed the same. I also disabled the cache array: same problem. Don't worry, I have a backup of all the data.

So, what do you need from me to help with this problem? Syslog attached. For now, I turn off the array as soon as the errors appear. If I put the parity drive into the array (I am currently preclearing it), these errors appear without me turning off the array. At this point I don't know what to do anymore.

Attachments: syslog.txt, server-diagnostics-20200217-2154.zip
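Editor's note: the rcu_nocbs=0-15 value above covers the 16 logical CPUs (8 cores, 16 threads) of a Ryzen 7 1700, numbered 0 to 15. A minimal sketch of how that CPU list relates to the thread count follows; the function name rcu_nocbs_arg is made up for illustration and is not part of Unraid or the tutorial.

```python
def rcu_nocbs_arg(logical_cpus):
    """Build an rcu_nocbs= kernel argument covering CPUs 0..logical_cpus-1.

    Hypothetical helper for illustration; it only formats the CPU list.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    last = logical_cpus - 1
    cpu_list = "0" if last == 0 else f"0-{last}"
    return f"rcu_nocbs={cpu_list}"


# Ryzen 7 1700: 8 cores with SMT, so 16 logical CPUs.
print(rcu_nocbs_arg(16))  # prints "rcu_nocbs=0-15"
```

On a CPU with a different thread count, the upper bound of the range would change accordingly.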
JorgeB, posted February 18, 2020

Disk2 should be backed up and re-formatted. A btrfs filesystem will quickly become corrupted on bad hardware, such as bad RAM. Check this to make sure your RAM isn't being overclocked, which is a known cause of data corruption on some Ryzen systems.
dezai (Author), posted February 18, 2020

Oh OK, that could be the problem. Today I thought it would be a good idea to start from zero, so I formatted all the drives and created a new cache and drive pool. But the server went offline again after 2 hours. I know that I set the RAM (4x 8 GB DDR4-3200) to 3200 MHz with XMP. So yes, I think that is the problem. I'm currently at the office. I will check this today and read your link, but from what I've read so far, RAM is an obvious problem in my configuration.
trurl, posted February 18, 2020

17 hours ago, dezai said: "BTRFS critical (device md2) ... At first I thought it was the parity drive, so I removed the parity drive, but the problem stayed the same. I also disabled the cache array: same problem."

For future reference, md2 refers specifically to the disk assigned as disk2 in the parity array, so there was no reason to think that doing anything with parity or cache would help.
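Editor's note: the mdN-to-diskN relationship described above can be read straight out of a syslog. The following is a minimal sketch, assuming the kernel messages use the "BTRFS <level> (device mdN)" form quoted in the thread; the sample lines and their message text are made up for illustration and are not taken from the attached syslog.

```python
import re

# Matches e.g. "BTRFS critical (device md2)" and captures the md number.
MD_DEVICE = re.compile(r"BTRFS \w+ \(device md(\d+)\)")


def affected_disks(syslog_lines):
    """Return the sorted array disk numbers named in BTRFS messages."""
    disks = set()
    for line in syslog_lines:
        match = MD_DEVICE.search(line)
        if match:
            # On Unraid, mdN is the array slot diskN.
            disks.add(int(match.group(1)))
    return sorted(disks)


# Illustrative lines only; the text after the device name is a placeholder.
sample = [
    "Feb 17 21:50:01 Server kernel: BTRFS critical (device md2): (message elided)",
    "Feb 17 21:50:02 Server kernel: BTRFS critical (device md2): (message elided)",
    "Feb 17 21:50:03 Server kernel: some unrelated message",
]
print(affected_disks(sample))  # prints "[2]"
```

A result of [2] points at disk2 only, which is why changes to parity or cache would not be expected to make a difference.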
dezai (Author), posted February 18, 2020

6 hours ago, trurl said: "For future reference, md2 is referring specifically to the disk assigned as disk2 in the parity array, so there was no reason to think doing anything with parity or cache would help."

Ahh, OK, I didn't know that. But I had also seen messages that sdj (the parity disk) was reporting errors, so I wanted to start from zero. I have a backup of all the data, so that is no big problem. Just a matter of time.

12 hours ago, johnnie.black said: "Disk2 should be backed up and re-formatted, a btrfs filesystem will quickly corrupt with bad hardware, like bad RAM, check this to make sure your RAM isn't being overclocked, which is a known cause of data corruption with some Ryzen systems."

OK, I really think this solved the problem for me. With everything set to auto, the BIOS sets the RAM to 1866 MHz, so I set it manually to 1866 MHz, and so far everything has been fine for 2 hours. I'm starting to copy the files from the backup onto the array and will see tomorrow what the logs look like. At the moment: zero errors. Thank you very, very much.
dezai (Author), posted February 19, 2020 (edited February 19, 2020)

Hi guys, update time. The PC doesn't crash anymore and the BTRFS errors are gone. Now I have the same problems described in your link: I set the option to "Typical Current Idle", but it did not fix my problems. New diagnostics file attached. The array is working fine, but I can't open my Docker containers and there are new errors in the diagnostics:

Feb 19 19:18:31 Server kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 19 19:18:31 Server kernel: ata10.00: failed command: WRITE DMA
Feb 19 19:18:31 Server kernel: ata10: hard resetting link

Is there a known fix for that? I'm really sad about the current situation, because I had everything running one week ago. Could it be a problem with the SATA controller I installed?
https://www.amazon.de/gp/product/B07THFN3Q2/ref=ppx_yo_dt_b_asin_title_o07_s00?ie=UTF8&psc=1

Attachment: server-diagnostics-20200219-1915.zip
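Editor's note: when a syslog contains many such messages, it can help to tally them per ATA port to see whether the errors cluster on one controller. The following is a rough sketch, assuming the "kernel: ataN[.NN]: message" layout of the three lines quoted above; the function name summarize_ata_events is made up for illustration.

```python
import re
from collections import Counter

# Captures the port ("ata10") and the message after an optional ".00" device suffix.
ATA_EVENT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")


def summarize_ata_events(syslog_lines):
    """Count link resets and failed commands per ATA port."""
    resets = Counter()
    failed = Counter()
    for line in syslog_lines:
        match = ATA_EVENT.search(line)
        if not match:
            continue
        port, message = match.groups()
        if message.startswith("hard resetting link"):
            resets[port] += 1
        elif message.startswith("failed command:"):
            failed[port] += 1
    return resets, failed


# The three lines quoted in the post above.
lines = [
    "Feb 19 19:18:31 Server kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Feb 19 19:18:31 Server kernel: ata10.00: failed command: WRITE DMA",
    "Feb 19 19:18:31 Server kernel: ata10: hard resetting link",
]
resets, failed = summarize_ata_events(lines)
print(dict(resets), dict(failed))  # prints "{'ata10': 1} {'ata10': 1}"
```

The port number can then be matched against the controller's port listing at boot to see which card the affected drive hangs off.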
JorgeB, posted February 19, 2020

You're using a 6 or 10 port Asmedia controller with SATA port multipliers. They are a known problem.
dezai (Author), posted February 19, 2020

So switching to an LSI 9211-8i flashed to IT mode could solve the problem?
JorgeB, posted February 19, 2020

At least that problem, yes. Note that the LSI won't trim the SSDs, so you should connect them to the onboard ports and use the LSI for the HDDs.
dezai (Author), posted February 19, 2020

OK, I'm done with that card and have now ordered an HBA. I hope it arrives ASAP. I'm currently using a 4-port SATA controller and at the moment everything is fine. I hope everything will be fixed with the HBA.
JorgeB, posted February 19, 2020

14 minutes ago, dezai said: "I'm currently using a 4-port SATA controller"

Can you post a link to the model you're using? I know there are 6-port and 10-port models, but they all look the same in the syslog: they appear as 12 ports with 2 dummy ports.

Feb 19 19:08:51 Server kernel: ahci 0000:03:00.0: AHCI 0001.0301 32 slots 12 ports 6 Gbps 0xff3 impl SATA mode
...
Feb 19 19:08:51 Server kernel: ata9: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580100 irq 49
Feb 19 19:08:51 Server kernel: ata10: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580180 irq 49
Feb 19 19:08:51 Server kernel: ata11: DUMMY
Feb 19 19:08:51 Server kernel: ata12: DUMMY
Feb 19 19:08:51 Server kernel: ata13: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580300 irq 49
Feb 19 19:08:51 Server kernel: ata14: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580380 irq 49
Feb 19 19:08:51 Server kernel: ata15: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580400 irq 49
Feb 19 19:08:51 Server kernel: ata16: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580480 irq 49
Feb 19 19:08:51 Server kernel: ata17: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580500 irq 49
Feb 19 19:08:51 Server kernel: ata18: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580580 irq 49
Feb 19 19:08:51 Server kernel: ata19: SATA max UDMA/133 abar m8192@0xf6580000 port 0xf6580600 irq 49
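Editor's note: the "12 ports with 2 dummy ports" pattern can be read off the "0xff3 impl" field in the ahci line, which is a bitmask of implemented ports. Below is a minimal sketch that decodes it. It assumes that ata9 is this controller's first port, which is inferred from the listing above rather than stated in the log, and the function name decode_port_mask is made up for illustration.

```python
def decode_port_mask(mask, port_count, first_ata):
    """Split an AHCI 'impl' bitmask into implemented and dummy ataN names.

    Bit i of the mask describes the controller's port i; first_ata is the
    ataN number the kernel gave to port 0 (an assumption for this log).
    """
    implemented, dummy = [], []
    for bit in range(port_count):
        name = f"ata{first_ata + bit}"
        if (mask >> bit) & 1:
            implemented.append(name)
        else:
            dummy.append(name)
    return implemented, dummy


# Values from the ahci line: 12 ports, port mask 0xff3.
implemented, dummy = decode_port_mask(0xFF3, 12, first_ata=9)
print(len(implemented), dummy)  # prints "10 ['ata11', 'ata12']"
```

The two cleared bits (2 and 3) line up with the ata11 and ata12 DUMMY entries in the log, which supports the assumed numbering.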
dezai (Author), posted February 20, 2020 (edited February 20, 2020)

JorgeB said: "Can you post a link to the model you're using? I knew there are 6 and 10 ports models, but they all look the same on the syslog, they appear as 12 ports with 2 dummy ports."

The first controller was the 6-port one, that was correct. I thought I had a functional 4-port card, but with that one I get read errors on the connected SSD cache. So I'm shutting the box down for a couple of days and waiting for the 9211-8i, which is already flashed to IT mode. I hope that will fix the problems.

Product links to the old cards:
https://www.amazon.de/dp/B00AZ9T3OU/ref=cm_sw_r_other_apa_i_vVItEbNVE12EZ
https://www.amazon.de/dp/B07THFN3Q2/ref=cm_sw_r_other_apa_i_nUItEb0X2P25H
dezai (Author), posted February 20, 2020

Lame... the HBA delivery date is next Monday.
dezai (Author), posted February 28, 2020

Thank you for your help, johnnie.black. The server has been up and running since yesterday. No more bad logs, and all the Dockers and VMs are back up and running. Everything is fine with the LSI HBA. Now I'm only waiting for a good price on 8-14 TB HDDs to upgrade my server and build a backup server with the 3 TB WD drives.