While reconstructing an old drive, multiple others started showing errors.


migueldias


So I replaced a very old 2TB WD Green drive in my array. It had around 6 years of power-on hours. I replaced it with a 12TB WD white label.

 

The reconstruction has just started (progress is at 1.4%; it should take a day to complete). The problem is that shortly after starting the array, I started getting these notifications:

 

https://i.imgur.com/tK5qhFs.png

and then, a few minutes later:

 

L2nMO7U.png

 

Checking the Attributes for both:

 

AgeCg04.png

 

VC42ntE.png

 

Suffice it to say, I am a bit nervous now. I don't really know how to proceed. I'm guessing I should just leave it and wait until the process ends?

 

I still have the old drive that I replaced. It had 100+ read errors which is why I replaced it with a 12TB I had.

 

The Array screen is now looking like this:

 

ofIKOMA.png

 

Any tips would be really appreciated.

tower-diagnostics-20210914-1114.zip

10 minutes ago, JorgeB said:

Both disks appear to be failing, you can run an extended SMART test on both to confirm.

Doing that now, but on just one of them, as I really want to avoid stressing the drives even more now that there is a rebuild going on.

 

If they are failing, then since there are two drives I'll have to replace them one at a time, which just increases the risk of more data loss.

 

What an unfortunate event this is. I really can't afford to lose some of the data stored on this array.

 

I think I will just leave the array untouched until the rebuild finishes (with a bunch of errors, I suspect). Once that is done, I'll stop the array, re-check the SATA connections and run a Parity Check.
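For the extended SMART test suggested above, the test can be started with `smartctl -t long /dev/sdX` and the attribute table read back with `smartctl -A /dev/sdX`. The following is a minimal sketch, not an official tool, of how the attributes that usually matter for a failing disk could be pulled out of that table. The function name, the watched attribute list, and the sample output are all assumptions made up for illustration; they are not taken from the diagnostics in this thread.

```python
# Hypothetical helper: extract raw values of a few SMART attributes from the
# text printed by `smartctl -A`. The attribute names below are the common ATA
# ones; some drives report different names, so treat the list as an assumption.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def watched_raw_values(smartctl_output):
    """Return {attribute_name: raw_value} for the watched attributes."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                values[fields[1]] = int(raw)
    return values


# Made-up sample output, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   040   040   000    Old_age   Always       -       44012
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       24
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
"""

if __name__ == "__main__":
    for name, raw in watched_raw_values(SAMPLE).items():
        print(f"{name}: {raw}")
```

Non-zero raw values for pending or uncorrectable sectors are generally a reason to distrust a disk, but the extended test result is still the better confirmation.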

7 minutes ago, JorgeB said:

Rebuilt disk will be corrupt, but if the disks are really failing there will always be some data loss, you can also use ddrescue on all the failing disks, this way at least you can know which files are corrupt.

Thanks.

 

I'll let the rebuild go on until completion for now, and try to deal with the damage later. I'm still hopeful that this is a false positive, as it is very weird that two drives, connected to different controllers, started showing errors just as I started to rebuild another drive.
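On the ddrescue suggestion quoted above: GNU ddrescue records what it could and could not read in a mapfile, and that file is what makes it possible to work out which areas (and eventually which files) are affected. Below is a rough sketch of reading such a mapfile and listing the regions that were not recovered. The function name and the sample mapfile are hypothetical, and the sketch assumes positions and sizes are written in hex or decimal, as ddrescue normally writes them. Mapping the byte ranges back to file names is a separate step that is not shown here.

```python
# Hypothetical helper: list regions of a GNU ddrescue mapfile that are not
# marked as finished ('+'). Status characters used by ddrescue:
#   '?' non-tried, '*' non-trimmed, '/' non-scraped, '-' bad sector, '+' finished
def unrecovered_regions(mapfile_text):
    """Return a list of (position, size, status) for every non-'+' block."""
    regions = []
    seen_status_line = False
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not seen_status_line:
            # The first non-comment line is the overall status line, not a block.
            seen_status_line = True
            continue
        pos, size, status = line.split()[:3]
        if status != "+":
            # int(x, 0) accepts both "0x..." hex and plain decimal.
            regions.append((int(pos, 0), int(size, 0), status))
    return regions


# Made-up mapfile, for illustration only.
SAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00120000     +               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00001000  -
0x00101000  0x0001F000  +
"""

if __name__ == "__main__":
    bad = unrecovered_regions(SAMPLE_MAPFILE)
    total = sum(size for _, size, _ in bad)
    for pos, size, status in bad:
        print(f"offset {pos:#x}, {size} bytes, status '{status}'")
    print(f"total unrecovered: {total} bytes")
```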

40 minutes ago, JorgeB said:

Even if the disks aren't failing, and it really looks like they are, the rebuilt disk will still be corrupt due to the read errors.

Certainly, but at 36 errors (assuming it stays around the 100 error count by the end), the corruption should be minimal.

 

Furthermore, if they are indeed corrupted and I am still rebuilding one disk, I won't ever be able to reconstruct the corrupt data.

 

I think the best plan now is to let the reconstruction finish, and once that is done I'll use unBALANCE to move all the contents of the two failing 4TB drives onto the new 12TB drive (it will have ~10TB free after the rebuild). After that I'll remove both drives from the array.

4 hours ago, migueldias said:

replaced a very old 2TB WD Green drive

Did you check the health of the other disks before deciding to replace that one? Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?

 

4 hours ago, migueldias said:

can't afford to lose some of the data

You must always have another copy of everything important and irreplaceable. Lots of ways to lose data that parity can't help with.


So I had to stop the rebuild operation and shut off the array due to an unrelated reason.

 

When I turned it back on, the disk was seen as unmountable. I ran xfs_repair on it and now it mounts fine and it is rebuilding again.

 

My problem is that now some of my dockers are not working, more specifically binhex-plex.

 

When I try to start it I have no GUI access and the logs say this:

 

(...)
2021-09-14 22:13:41,817 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:41,818 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:42,820 INFO spawned: 'plexmediaserver' with pid 67
2021-09-14 22:13:43,074 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22622312476576 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stdout)>
2021-09-14 22:13:43,074 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 22622312476624 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stderr)>
2021-09-14 22:13:43,074 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:43,074 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:45,076 INFO spawned: 'plexmediaserver' with pid 72
2021-09-14 22:13:45,215 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22622312917216 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stdout)>
2021-09-14 22:13:45,215 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 22622312476384 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stderr)>
2021-09-14 22:13:45,216 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:45,216 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:48,219 INFO spawned: 'plexmediaserver' with pid 77
2021-09-14 22:13:48,334 DEBG fd 8 closed, stopped monitoring <POutputDispatcher at 22622312476480 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stdout)>
2021-09-14 22:13:48,334 DEBG fd 12 closed, stopped monitoring <POutputDispatcher at 22622312476432 for <Subprocess at 22622312917696 with name plexmediaserver in state STARTING> (stderr)>
2021-09-14 22:13:48,334 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:48,334 DEBG received SIGCHLD indicating a child quit
2021-09-14 22:13:49,334 INFO gave up: plexmediaserver entered FATAL state, too many start retries too quickly
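The log above is a restart loop: the process supervisor keeps spawning plexmediaserver, it exits immediately with status 255, and the supervisor eventually gives up. As a small sketch of how such a log could be summarized, the snippet below counts unexpected exits per process and notes which processes ended in the FATAL state. The function name is hypothetical, the regular expressions are written only against the line shapes seen in this log, and the sample is a shortened excerpt of it.

```python
import re
from collections import Counter

# Patterns matched against the line shapes in the log above (an assumption:
# other supervisor versions may word these messages differently).
EXIT_RE = re.compile(r"INFO exited: (\S+) \(exit status (\d+); not expected\)")
FATAL_RE = re.compile(r"INFO gave up: (\S+) entered FATAL state")


def summarize(log_text):
    """Return (Counter of (process, exit_status), set of processes in FATAL)."""
    exits = Counter()
    fatal = set()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            exits[(match.group(1), int(match.group(2)))] += 1
            continue
        match = FATAL_RE.search(line)
        if match:
            fatal.add(match.group(1))
    return exits, fatal


# Shortened excerpt of the log posted above.
SAMPLE_LOG = """\
2021-09-14 22:13:41,817 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:42,820 INFO spawned: 'plexmediaserver' with pid 67
2021-09-14 22:13:48,334 INFO exited: plexmediaserver (exit status 255; not expected)
2021-09-14 22:13:49,334 INFO gave up: plexmediaserver entered FATAL state, too many start retries too quickly
"""

if __name__ == "__main__":
    exits, fatal = summarize(SAMPLE_LOG)
    for (name, status), count in exits.items():
        print(f"{name}: {count} unexpected exits with status {status}")
    print("gave up on:", ", ".join(sorted(fatal)) or "none")
```

The exit status alone does not say why the server is failing; the supervisor log only shows that it dies right after each start.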

 

When I tried to force update it, it couldn't remove the image, so I installed it again, but the same thing happens when I run it.

 

Not sure if the xfs_repair corrupted something.

