migueldias Posted September 14, 2021 (edited)

So I replaced a very old 2TB WD Green drive in my array. It had around 6 years of power-on hours. I replaced it with a 12TB WD white label. The rebuild has just started (progress is at 1.4%, and it should take a day to complete).

The problem is that shortly after starting the array, I started getting these notifications: https://i.imgur.com/tK5qhFs.png

and then, a few minutes later: [image]

Checking the Attributes for both: [image]

Suffice it to say I am a bit nervous now. I don't really know how to proceed. I'm guessing I should just leave it and wait until the process ends? I still have the old drive that I replaced. It had 100+ read errors, which is why I replaced it with a 12TB drive I had.

The Array screen is now looking like this: [image]

Any tips would be really appreciated.

tower-diagnostics-20210914-1114.zip

Edited September 14, 2021 by migueldias
JorgeB Posted September 14, 2021

Both disks appear to be failing; you can run an extended SMART test on both to confirm.
migueldias Posted September 14, 2021 Author

10 minutes ago, JorgeB said: Both disks appear to be failing; you can run an extended SMART test on both to confirm.

Doing that now, but on just one of them, as I really want to avoid stressing the drives even more while a rebuild is going on. If both are failing, I'll have to replace them one at a time, which just increases the risk of more data loss. What an unfortunate event this is. I really can't afford to lose some of the data stored on this array.

I think I will just leave the array untouched until the rebuild finishes (with a bunch of errors, I suspect). Once that is done, I'll stop the array, re-check the SATA connections and run a Parity Check.
JorgeB Posted September 14, 2021

The rebuilt disk will be corrupt, but if the disks are really failing there will always be some data loss. You can also use ddrescue on all the failing disks; that way you can at least find out which files are corrupt.
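As a rough illustration of the ddrescue suggestion: when ddrescue is given a mapfile argument, it records which regions of the disk it could and could not read. The Python sketch below sums the regions marked as bad. It assumes the GNU ddrescue mapfile layout (comment lines starting with `#`, one current-status line, then `pos size status` rows in hexadecimal, with `-` marking bad-sector blocks). The sample mapfile text is made up for illustration and is not taken from this thread, and mapping the bad byte ranges back to file names needs filesystem tools that are not shown here.

```python
# Hypothetical sample of a ddrescue mapfile (illustrative values only).
SAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00120000     +               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00001000  -
0x00101000  0x0001F000  +
"""


def bad_regions(mapfile_text):
    """Return (start, size) byte ranges whose status is '-' (bad sector)."""
    regions = []
    seen_status_line = False
    for raw in mapfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not seen_status_line:
            # The first non-comment line is the current-position status line,
            # not a data block, so it is skipped.
            seen_status_line = True
            continue
        fields = line.split()
        pos, size, status = fields[0], fields[1], fields[2]
        if status == "-":
            regions.append((int(pos, 16), int(size, 16)))
    return regions


regions = bad_regions(SAMPLE_MAPFILE)
total_bad_bytes = sum(size for _, size in regions)
print(regions, total_bad_bytes)
```

The byte offsets are relative to the start of whatever device or partition ddrescue copied, so any later lookup of affected files has to use the same reference point.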
migueldias Posted September 14, 2021 Author

7 minutes ago, JorgeB said: The rebuilt disk will be corrupt, but if the disks are really failing there will always be some data loss. You can also use ddrescue on all the failing disks; that way you can at least find out which files are corrupt.

Thanks. I'll let the rebuild run to completion for now, and try to deal with the damage later. I'm still hopeful that this is a false positive, as it is very weird that two drives, connected to different controllers, started showing errors just as I started to rebuild another drive.
JorgeB Posted September 14, 2021

Even if the disks aren't failing, and it really looks like they are, the rebuilt disk will still be corrupt due to the read errors.
migueldias Posted September 14, 2021 Author (edited)

40 minutes ago, JorgeB said: Even if the disks aren't failing, and it really looks like they are, the rebuilt disk will still be corrupt due to the read errors.

Certainly, but at 36 errors (assuming the count stays around 100 by the end), the corruption should be minimal. Furthermore, if they are indeed failing and I am still rebuilding one disk, I won't ever be able to reconstruct the corrupt data.

I think the best plan now is to let the reconstruction finish, and once that is done I'll use unBALANCE to move all the contents of the two failing 4TB drives onto the new 12TB drive (it will have ~10TB free after the rebuild). Once that is done I'll remove both drives from the array.

Edited September 14, 2021 by migueldias
trurl Posted September 14, 2021

4 hours ago, migueldias said: replaced a very old 2TB WD Green drive

Did you check the health of the other disks before deciding to replace that one? Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

4 hours ago, migueldias said: can't afford to lose some of the data

You must always have another copy of everything important and irreplaceable. There are lots of ways to lose data that parity can't help with.
migueldias Posted September 14, 2021 Author

So I had to stop the rebuild operation and shut down the array for an unrelated reason. When I turned it back on, the disk was seen as unmountable. I ran xfs_repair on it, and now it mounts fine and is rebuilding again.

My problem is that now some of my dockers are not working, specifically binhex-plex. When I try to start it I have no GUI access and the logs say this:

(...)
2021-09-14 22:13:41,817 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:41,818 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:42,820 INFO spawned: 'plexmediaserver' with pid 67
2021-09-14 22:13:43,074 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22622312476576 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stdout)>
2021-09-14 22:13:43,074 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 22622312476624 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stderr)>
2021-09-14 22:13:43,074 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:43,074 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:45,076 INFO spawned: 'plexmediaserver' with pid 72
2021-09-14 22:13:45,215 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22622312917216 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stdout)>
2021-09-14 22:13:45,215 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 22622312476384 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stderr)>
2021-09-14 22:13:45,216 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:45,216 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:48,219 INFO spawned: 'plexmediaserver' with pid 77
2021-09-14 22:13:48,334 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22622312476480 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stdout)>
2021-09-14 22:13:48,334 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 22622312476432 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stderr)>
2021-09-14 22:13:48,334 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:48,334 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:49,334 INFO gave up: plexmediaserver entered FATAL state, too many start retries too quickly

When I tried to force update it, it couldn't remove the image, so I installed it again, but the same thing happens when I run it. Not sure if the xfs_repair corrupted something.
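For readers skimming a log like the one above, a small Python sketch can count how often each process exited unexpectedly and flag any that ended in the FATAL state. The two patterns are written against the exact wording of the lines quoted in this post; other log formats are not covered, and the function and variable names are made up for this example. It summarizes the symptom only and says nothing about the underlying cause.

```python
import re

# Patterns match the wording of the log lines quoted in the post above.
EXIT_RE = re.compile(r"INFO exited: (\S+) \(exit status (\d+); not expected\)")
FATAL_RE = re.compile(r"INFO gave up: (\S+) entered FATAL state")

# A few lines copied from the quoted log.
SAMPLE_LOG = """\
2021-09-14 22:13:41,817 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:42,820 INFO spawned: 'plexmediaserver' with pid 67
2021-09-14 22:13:43,074 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:49,334 INFO gave up: plexmediaserver entered FATAL state, too many start retries too quickly
"""


def summarize(log_text):
    """Return (unexpected exit counts keyed by (name, status), names that went FATAL)."""
    exits = {}
    fatal = set()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            exits[key] = exits.get(key, 0) + 1
        match = FATAL_RE.search(line)
        if match:
            fatal.add(match.group(1))
    return exits, fatal


exits, fatal = summarize(SAMPLE_LOG)
print(exits, fatal)
```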
trurl Posted September 14, 2021

Post new diagnostics.
migueldias Posted September 15, 2021 Author (edited)

8 hours ago, trurl said: Post new diagnostics.

Hi trurl,

Here they are.

tower-diagnostics-20210915-0920.zip

Edit: I do have a CA Backup tar file of the appdata folder from 2 days ago.

Edited September 15, 2021 by migueldias
trurl Posted September 15, 2021

The repair put some files in the lost+found share because it couldn't figure out what they were.

You are having connection problems while trying to rebuild disk1.

You should go to Settings and disable Docker until you get your array stable again.