Runaround Posted March 21, 2022 (edited) Hello, I'm trying to replace my existing 10 TB parity drive with a 12 TB drive while moving the 10 TB drive down to replace a smaller drive. I've done this a few times before (to get up to 10 TB), and I'm following the guide. Once I swap the drives around in the GUI and start the copy, the server locks up. I tracked the progress this time, and the last report I saw in the GUI was 86% done. I have attempted the copy 3 times now. Any ideas on what I should look at to find the cause? nas4-diagnostics-20220322-0829.zip Edited March 31, 2022 by Runaround: Issue resolved
ChatNoir Posted March 21, 2022 3 hours ago, Runaround said: I'm trying to replace my existing 10TB parity drive with a 12 TB while moving the 10TB down to replace a smaller drive. I've done this previously a few times (to get up to the 10 TB) and I'm following the guide. Once I swap the drive around in the GUI and start the copy the server is locking up. I tried tracking the progress this time and the last report I saw in the GUI was 86% done. I have attempted the copy 3 times now. Any ideas on what I should look at to find the cause? You would get better-informed suggestions if you attached your diagnostics to your next post.
Runaround Posted March 22, 2022 Author 15 hours ago, ChatNoir said: You could have better informed suggestion if you attach your diagnostics to your next post. Thanks - I added the diagnostics to the initial post to make it easier to find. I started a preclear pass on the disk too. It's 94% done with the pre-read phase currently.
JorgeB Posted March 22, 2022 Mar 21 21:28:25 NAS4 kernel: BTRFS error (device sdi1): block=892079783936 write time tree block corruption detected Write-time corruption detected by btrfs is due to bad RAM 9 times out of 10. Ryzen with overclocked RAM, like you have, is known to corrupt data, see here. Also make sure power supply idle control is correctly set.
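For anyone who wants to check their own syslog for the same message, here is a minimal sketch in Python. It assumes the kernel line always has the layout of the single line quoted above; the function name `find_write_time_corruption` and the second sample line are made up for illustration and are not part of Unraid or btrfs.

```python
import re

# Pattern inferred from the one kernel line quoted in this thread (an assumption,
# not a documented format): device name in parentheses, then block=<number>.
BTRFS_WRITE_TIME = re.compile(
    r"kernel: BTRFS error \(device (?P<device>[^)]+)\): "
    r"block=(?P<block>\d+) write time tree block corruption detected"
)


def find_write_time_corruption(lines):
    """Return a (device, block) tuple for every matching syslog line."""
    hits = []
    for line in lines:
        match = BTRFS_WRITE_TIME.search(line)
        if match:
            hits.append((match.group("device"), int(match.group("block"))))
    return hits


sample = [
    # Line quoted from the diagnostics in this thread.
    "Mar 21 21:28:25 NAS4 kernel: BTRFS error (device sdi1): "
    "block=892079783936 write time tree block corruption detected",
    # Hypothetical unrelated line, included only to show it is skipped.
    "Mar 21 21:28:26 NAS4 kernel: some unrelated message",
]

print(find_write_time_corruption(sample))
```

To run it against a real log, pass the open file instead of `sample`, for example `find_write_time_corruption(open("syslog"))`, where the file path is whatever your saved syslog is called. Any hit only tells you the message was logged; it does not by itself say whether RAM or the drive is at fault.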
Runaround Posted March 22, 2022 Author 1 minute ago, JorgeB said: Mar 21 21:28:25 NAS4 kernel: BTRFS error (device sdi1): block=892079783936 write time tree block corruption detected Write time corruption detected by btrfs is 9 out of 10 times due to bad RAM. Ryzen with overclocked RAM like you have is known to corrupt data, see here, also make sure power supply idle control is correctly set. Thanks for the info and the detailed post you linked. I'll be adjusting the memory timings today. I moved my old desktop CPU/motherboard over 4 months ago and didn't think about the memory timings. I only noticed this round of btrfs issues today. The cache got corrupted last week, and I was able to recover the data and re-format the disk. I thought it was working normally after that. I have had a bit of a history of issues with this particular SSD I'm using for cache. I was getting tons of CRC errors from the firmware on my HBA card when I initially installed it, but it had been working well until it had an issue last week. I'm thinking of replacing it just in case. Do you think that is contributing to the server locking up on the parity copy as well? This server hasn't locked up in several years.
JorgeB Posted March 22, 2022 1 hour ago, Runaround said: Do you think that is contributing to the server locking up on the Parity copy as well? Could be, difficult to say, locking up with Ryzen is usually more related to the other issue I mentioned.
Runaround Posted March 22, 2022 Author 4 hours ago, JorgeB said: Could be, difficult to say, locking up with Ryzen is usually more related to the other issue I mentioned. I set the CPU and memory speeds to "auto" in the BIOS. The memory is now running at the speeds from the post you linked earlier. The BTRFS errors are gone from the logs now. Sadly, my server is acting more unstable now. I see errors showing read issues on sda, so maybe the flash drive hasn't liked the lock-ups and power cycles? I had the syslog viewer up on the screen when it crashed, so I saved it, and there is a screenshot from a monitor I've connected. Not sure if it's related, but there was a pre-clear running every time it has crashed. It was in the zero-writing phase. System Log 2.html
Runaround Posted March 23, 2022 Author (edited) Next update: I powered off the server and took out the flash drive. I ran a check disk on the flash drive on my PC and booted the server back up. There are no more flash drive errors in the log, and the server is no longer crashing within an hour. A curious number of settings in my server were changed or old after booting, though. I've corrected them all now. I also changed the power supply idle control setting (I think, it's named a little oddly). I picked up a smaller drive so I can try to replace my bad disk without having to copy parity first. It's clearing now, so hopefully I'll have more progress tomorrow. Thanks again for the help provided so far. Edited March 23, 2022 by Runaround
Runaround Posted March 24, 2022 Author The server was stable yesterday and I've started a drive replacement. It's about 60% complete so hopefully it finishes. Once it does, I'll try to swap out the parity drive again.
Runaround Posted March 26, 2022 Author Well, I thought things were better, but the server crashed again last night. I was doing another parity copy. The last status I saw was at 96%. I’ve attached what was on the screen this morning. Any ideas?
JorgeB Posted March 26, 2022 You can enable the syslog server and then post that after a crash to see if there's anything logged, but if it's a hardware issue there probably won't be.
Runaround Posted March 27, 2022 Author On 3/26/2022 at 7:47 AM, JorgeB said: You can enable the syslog server and then post that after a crash to see if there's anything logged, but if it's a hardware issue there probably won't be. Here is the syslog from the flash drive and a fresh copy of diagnostics. I didn't really see anything in the syslog though. It's really strange that this is only happening for the cache drive copy. I was able to rebuild a disk just fine. syslog nas4-diagnostics-20220327-1118.zip
JorgeB Posted March 28, 2022 16 hours ago, Runaround said: I didn't really see anything in the syslog though. Same, this suggests a hardware issue.
Runaround Posted March 28, 2022 Author 5 hours ago, JorgeB said: Same, this suggests a hardware issue. I really appreciate your assistance with this. I have already corrected a lot of configuration issues that I didn't know about. I ran memtest and there was an error, so I'll go through each of the DIMMs to figure out what's wrong. For now, I tested one stick and it passed. I have it installed alone and I'm trying to pre-clear one of these 12 TB disks. It's so odd to me that I only see an issue when dealing with these 12 TB disks, though.
Runaround Posted March 30, 2022 Author To hurry along the 12 TB drive issue, I confirmed a single stick was good with MemTest and tried a pre-clear of a 12 TB disk with only that stick installed. It completed yesterday for the first time. Thanks again @JorgeB!