soysauce10 Posted August 11, 2022

I think I made a huge mistake in replacing 3 drives at once in an array of 6 disks and 1 parity drive. Here are the steps I took:

1. Took a screenshot of the disks in the functioning array
2. Used Unbalance to move the data off the 3 disks that were to be replaced
3. Powered down the array and the server
4. Replaced the 3 disks in the array
5. Powered the server back on
6. Ran New Config, which wiped out the parity disk
7. Assigned the 3 new disks
8. Started the array; it began rebuilding parity, but the server crashed

I wanted to get some advice on next steps. Based on some research on these forums and reddit, I think my next steps are to unassign the 3 new drives and let the parity rebuild. While parity is rebuilding, prepare the new disks by preclearing them. When the parity sync and preclears are complete, add and assign the new drives. Is that right?
JorgeB Posted August 11, 2022

Doing a parity sync should not crash the server; please post the diagnostics.
trurl Posted August 11, 2022

23 minutes ago, soysauce10 said: "Ran New Config, which wiped out the parity disk"

So you don't care about any of the data that was on the original disks?
trurl Posted August 11, 2022

24 minutes ago, soysauce10 said: "I think my next steps are to unassign the 3 new drives and let the parity rebuild. While parity is rebuilding, prepare the new disks by preclearing them. When the parity sync and preclears are complete, add and assign the new drives. Is that right?"

None of that should matter. What about the original disks? What were your plans for that data?
soysauce10 Posted August 11, 2022

57 minutes ago, JorgeB said: "Doing a parity sync should not crash the server; please post the diagnostics."

Sorry for not providing them earlier.

53 minutes ago, trurl said: "None of that should matter. What about the original disks? What were your plans for that data?"

I had used Unbalance to move all the data off those 3 disks and onto the other disks in the array before I swapped in the 3 new drives.

wmlunraid-diagnostics-20220811-1107.zip
JorgeB Posted August 11, 2022 (Solution)

Aug 11 11:09:48 WMLUnraid kernel: ata1: COMRESET failed (errno=-16)
Aug 11 11:09:48 WMLUnraid kernel: ata1: reset failed, giving up
Aug 11 11:09:48 WMLUnraid kernel: ata1.00: disabled

The SSD cache dropped offline. That's not a reason for the server crashing, but you still need to fix it: check/replace the cables, then enable the syslog server, start the parity sync, and post the log if it crashes again.
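If it helps to sift a saved syslog for just these ATA link errors, here is a minimal Python sketch; the log path is an assumption, so point it at wherever your syslog server actually writes.

```python
import re

# Matches the kinds of kernel lines quoted above: link resets,
# speed downgrades, and devices being disabled.
ATA_ERRORS = re.compile(
    r"ata\d+(\.\d+)?: (COMRESET failed|reset failed|limiting SATA link speed|disabled)"
)

# /boot/logs/syslog is an assumed location; adjust to your syslog server settings.
with open("/boot/logs/syslog") as log:
    for line in log:
        if ATA_ERRORS.search(line):
            print(line.rstrip())
```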
soysauce10 Posted August 11, 2022

4 minutes ago, JorgeB said: "… check/replace the cables, then enable the syslog server, start the parity sync, and post the log if it crashes again."

Interesting. Okay, I'll work on that. Thanks.
trurl Posted August 11, 2022

35 minutes ago, soysauce10 said: "used Unbalance to move all the data off those 3 disks and onto the other disks in the array before I swapped in the 3 new drives"

It would have been much simpler, and probably faster, to simply replace/rebuild them one at a time; no need to move anything.
soysauce10 Posted August 11, 2022

4 minutes ago, trurl said: "It would have been much simpler, and probably faster, to simply replace/rebuild them one at a time; no need to move anything."

Do you mean add the three new drives, move the data onto the new ones, then rebuild? The server is already maxed out on drives, which is why I didn't add the new drives and move the data before removing the old ones.
trurl Posted August 11, 2022

Nothing like that. Do you know how to rebuild a disk if one of them fails? Exactly the same thing. Replace/rebuild is the normal way to do what you were trying to do. Everyone does it all the time when they want to replace a disk with a larger one, or when they think a drive is too old or beginning to show problems, or whatever. Shut down, replace one disk, boot up, assign the new disk to the slot of the replaced disk, and start the array to rebuild. Repeat as necessary, one disk at a time. Replacing/rebuilding disks is the whole point of parity.
soysauce10 Posted August 11, 2022

Just now, trurl said: "Replace/rebuild is the normal way to do what you were trying to do. …"

Got it. I should have done that. Yeah, I understood where I went wrong before I posted this topic. Thank you.
trurl Posted August 11, 2022

Perhaps you thought the whole point of parity was to be a backup. Parity contains none of your data and is in no way a substitute for backups. Do you have backups of anything important and irreplaceable?

Parity can only rebuild a single disk, in combination with all the other disks. Dual parity allows 2 simultaneous rebuilds. So, in a sense, you did remove too many at a time, if you were going to rebuild. Parity is just an extra bit that allows a missing bit to be calculated from all the other bits. It's a very simple calculation, easy to understand.

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array
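To make that calculation concrete, here is a minimal Python sketch of single-parity XOR, with a few made-up bytes standing in for whole disks; the values are purely illustrative.

```python
# Three "data disks", each reduced to two bytes for illustration.
disk1 = bytes([0b10110100, 0b01100001])
disk2 = bytes([0b11001010, 0b00011110])
disk3 = bytes([0b00101101, 0b11110000])

# The parity disk stores the XOR of the bytes at each position.
parity = bytes(a ^ b ^ c for a, b, c in zip(disk1, disk2, disk3))

# If disk2 fails, XOR-ing parity with every surviving disk recovers it.
rebuilt = bytes(p ^ a ^ c for p, a, c in zip(parity, disk1, disk3))
assert rebuilt == disk2  # the missing disk is reconstructed exactly

# Remove two (or three) disks at once, though, and one parity byte per
# position is no longer enough information to solve for the missing data.
```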
soysauce10 Posted August 11, 2022

1 minute ago, trurl said: "Perhaps you thought the whole point of parity was to be a backup. …"

I think I sort of ignored the parity part of it and removed the disks after using Unbalance to move the files, because I wasn't worried about the disks failing on me right then and there.
soysauce10 Posted August 11, 2022

On the server, I'm seeing these disk errors; see the attached picture. A quick search on these forums suggests it's an issue with the Docker image or with the cache disk being full. I think the docker image is at the 40GB max, but the cache disk is not full. It does correlate with the issue of the cache disk dropping out, like JorgeB said.
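A quick way to double-check whether the cache is actually full is Python's shutil; /mnt/cache is assumed here as the usual Unraid cache mount point, so adjust if yours differs.

```python
import shutil

# /mnt/cache is an assumed mount point for the cache pool.
total, used, free = shutil.disk_usage("/mnt/cache")
print(f"cache: {used / total:.0%} used, {free / 2**30:.1f} GiB free")
```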
soysauce10 Posted August 11, 2022

I exported more logs in case it helps.

wmlunraid-diagnostics-20220811-1225.zip
trurl Posted August 11, 2022

Can't tell anything about that, since the array was not started in those diagnostics, so nothing was mounted. Start the array and attach new diagnostics to your NEXT post in this thread.
JorgeB Posted August 11, 2022

The SSD is on a different port but it dropped again:

Aug 11 12:27:15 WMLUnraid kernel: ata2: COMRESET failed (errno=-16)
Aug 11 12:27:15 WMLUnraid kernel: ata2: limiting SATA link speed to 3.0 Gbps
Aug 11 12:27:20 WMLUnraid kernel: ata2: COMRESET failed (errno=-16)
Aug 11 12:27:20 WMLUnraid kernel: ata2: reset failed, giving up
Aug 11 12:27:20 WMLUnraid kernel: ata2.00: disabled
soysauce10 Posted August 11, 2022

6 minutes ago, trurl said: "Start the array and attach new diagnostics to your NEXT post in this thread."

Sorry about that. I started the array, it began a parity sync, and then I exported the logs.

1 minute ago, JorgeB said: "The SSD is on a different port but it dropped again …"

I just swapped out the SATA cable to troubleshoot. Thank you both for your help!

wmlunraid-diagnostics-20220811-1239.zip
JorgeB Posted August 11, 2022

2 minutes ago, soysauce10 said: "I just swapped out the SATA cable to troubleshoot."

Before or after these last diags? It's still dropping. Also, the docker image is corrupt and needs to be re-created.
soysauce10 Posted August 11, 2022

1 minute ago, JorgeB said: "Before or after these last diags? It's still dropping. Also, the docker image is corrupt and needs to be re-created."

I swapped the cable before exporting the diags. Yeah, I can still see the errors on the server. Does recreating the image mean losing all the settings on the containers? Is there a way to restore them?
trurl Posted August 11, 2022

Before doing anything else, fix the connection problems.

Then: the appdata and system shares have files on the array. Also, domains has files on its designated pool (cache) and on the other pool; maybe this is intentional? And a share named important, apparently related to VMs, is cache:no but has files on cache. You also have a cache:no share B-----s with files on cache.

1 minute ago, soysauce10 said: "Does recreating the image mean losing all the settings on the containers? Is there a way to restore them?"

It is easy to restore dockers, but we can get into that after things are working well and we clean up those shares.
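As a rough Python sketch of spotting shares like those, with files on the cache despite being set cache:no: this assumes the share settings live in /boot/config/shares/<share>.cfg with a shareUseCache key and that the cache pool is mounted at /mnt/cache, both of which may differ on your setup.

```python
import os
import re

SHARE_CFG_DIR = "/boot/config/shares"  # assumed location of share settings
CACHE_MOUNT = "/mnt/cache"             # assumed cache pool mount point

# Every top-level folder on the cache is a share's data; cross-check each
# against its config to flag cache:no shares that still have files there.
for name in sorted(os.listdir(CACHE_MOUNT)):
    cfg_path = os.path.join(SHARE_CFG_DIR, name + ".cfg")
    if not os.path.isfile(cfg_path):
        continue
    with open(cfg_path) as cfg:
        match = re.search(r'shareUseCache="(\w+)"', cfg.read())
    if match and match.group(1) == "no":
        print(f"share '{name}' is cache:no but has files on {CACHE_MOUNT}")
```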
soysauce10 Posted August 13, 2022

On 8/11/2022 at 1:05 PM, trurl said: "Before doing anything else, fix the connection problems. …"

I think the connectivity issues are resolved. I spent the last day or so testing all the RAM and slots and letting parity sync back up. When I used Unbalance to move data from disk to disk, it left empty folders in those appdata and share names, which is why the domains folder appears on cache and on the other pool. No, it is not intentional. If you could point me in the right direction for the restore, that would be great; I'll start researching the process as well. Thanks for the help!
trurl Posted August 13, 2022

Since you are on 6.10, you should install the Dynamix File Manager plugin; it will let you delete those folders, or move them if needed. Just note that nothing can move open files, and dockers will often have open files. Since you will be deleting docker.img, that would be a perfect time to take care of that.

https://wiki.unraid.net/Manual/Docker_Management#Re-Create_the_Docker_image_file
https://wiki.unraid.net/Manual/Docker_Management#Re-Installing_Docker_Applications
soysauce10 Posted August 13, 2022

16 hours ago, trurl said: "Since you are on 6.10, you should install the Dynamix File Manager plugin … that would be a perfect time to take care of that."

Restored! I appreciate the help!!