gdeyoung Posted January 15, 2019 (edited)

I keep getting recurring errors in my system log:

Jan 15 10:06:19 NASserver kernel: BTRFS critical (device md6): corrupt leaf, slot offset bad: block=1187753492480, root=1, slot=159
Jan 15 10:06:19 NASserver kernel: BTRFS critical (device md6): corrupt leaf, slot offset bad: block=1187753492480, root=1, slot=159

And this one repeats every 10 seconds:

Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: AER: Corrected error received: id=0000
Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=001a(Transmitter ID)
Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: device [1022:1453] error status/mask=00001000/00006000
Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: [12] Replay Timer Timeout

I'm not sure whether a flaky disk (md6) has caused the stability issues, and I need help interpreting what is happening with the system.

Before I noticed these errors in the log this morning, I had attempted to upgrade the system to 6.6.6 last night. After the upgrade, the system stopped disk 6 (md6) in the middle of the night and made it read-only, which really screwed up the system. I rolled back to 6.5.3 this morning and everything is stable again.

This is not the first issue with this system and 6.6.x. On previous 6.6.x upgrades the system couldn't stay up: the shares disappeared after 24 hours of uptime. On 6.5.x I get months of stable uptime.

Edited January 15, 2019 by gdeyoung
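To see how often each of the errors above occurs and on which device, a minimal shell sketch like the following could tally them from the syslog. It assumes the exact message formats quoted above; the function name summarize_errors is hypothetical and not part of Unraid.

```shell
#!/bin/sh
# Hypothetical helper: tally the two recurring kernel messages from this post.
# Reads syslog text on stdin; assumes the message formats quoted above.
summarize_errors() {
  awk '
    # "BTRFS critical (device md6): corrupt leaf, ..." -> count per device
    /BTRFS critical \(device / && /corrupt leaf/ {
      dev = $0
      sub(/.*\(device /, "", dev)   # drop everything up to the device name
      sub(/\).*/, "", dev)          # drop everything after it
      btrfs[dev]++
    }
    # "pcieport 0000:00:03.2: AER: Corrected error received" -> count per port
    /pcieport .*AER: Corrected error received/ {
      port = $0
      sub(/.*pcieport /, "", port)  # drop everything up to the port address
      sub(/: AER:.*/, "", port)     # drop the message text after it
      aer[port]++
    }
    END {
      for (d in btrfs) printf "btrfs corrupt leaf on %s: %d\n", d, btrfs[d]
      for (p in aer) printf "AER corrected errors on %s: %d\n", p, aer[p]
    }
  '
}
```

For example, summarize_errors < /var/log/syslog (the syslog path is an assumption). The reported address is the PCIe port, not the card itself; lspci -tv prints the device tree, which shows what is connected below that port.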
trurl Posted January 15, 2019

Syslog snippets are seldom sufficient 😉 Go to Tools - Diagnostics and attach the complete diagnostics zip file to your next post.
gdeyoung (Author) Posted January 15, 2019

As requested: nasserver-diagnostics-20190115-1127.zip
JorgeB Posted January 15, 2019

Disk6 has metadata corruption. The best way to resolve it is to back up or move the data off, format that disk, and move the data back.

As for what caused the corruption: I see you're using Ryzen. Make sure the RAM isn't overclocked; there have been several cases on the forum where Ryzen corrupted data with overclocked RAM. Better yet, use a server with ECC RAM.

Also, btrfs metadata on HDDs should use the dup (duplicate) profile so it can recover from some corruptions. This is going to be the default in the future, but it isn't currently. You can convert each disk with:

btrfs balance start -mconvert=dup /mnt/diskX

The PCIe errors are usually a board compatibility problem: look for a BIOS update, move the offending card to a different PCIe slot, or replace the board with a different model.
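A minimal sketch of doing that conversion for several disks at once, assuming the array disks are mounted at /mnt/diskN. It reads the current metadata profile from btrfs filesystem df and, unless DRY_RUN=0 is set, only prints the balance command. The names metadata_profile, convert_metadata_to_dup and DRY_RUN are hypothetical, and the parsing assumes the usual "Metadata, <profile>: total=..., used=..." line format.

```shell
#!/bin/sh
# Sketch: convert btrfs metadata to the DUP profile on the given mount points.
# Hypothetical helper names; DRY_RUN defaults to 1 (print only, change nothing).

# Read `btrfs filesystem df` output on stdin and print the metadata profile,
# e.g. "single" or "DUP".
metadata_profile() {
  awk -F'[,:]' '/^Metadata,/ { sub(/^ +/, "", $2); print $2; exit }'
}

convert_metadata_to_dup() {
  for mnt in "$@"; do
    profile=$(btrfs filesystem df "$mnt" | metadata_profile)
    if [ "$profile" = "DUP" ]; then
      # Already converted: nothing to do for this disk.
      echo "$mnt: metadata already DUP"
    elif [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would run: btrfs balance start -mconvert=dup $mnt"
    else
      # The conversion command from the post above.
      btrfs balance start -mconvert=dup "$mnt"
    fi
  done
}
```

For example, DRY_RUN=0 convert_metadata_to_dup /mnt/disk1 /mnt/disk2. Checking the profile first makes it harmless to re-run the script on disks that were already converted.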
gdeyoung (Author) Posted January 18, 2019

Thanks for the reply. I have been working on this and running into a few errors. I'm trying to empty the md6 drive (the one with the BTRFS errors) so I can reformat it, but the metadata problem is so bad that the copy fails, so I can't empty the drive. I have two copies of the data on other servers; I just want to reformat the drive and sync the data back. How do I reformat a drive I can't empty without triggering a rebuild?
JorgeB Posted January 18, 2019

Quoting gdeyoung: "How do I reformat and not have a rebuild on a drive I can't empty?"

Stop the array, click on that disk on the Main page and change its filesystem to a different one, then start the array and format the disk. Repeat the process to switch back to the original filesystem.