Replace Disk


Why did you do New Config? Did you let New Config rebuild parity?

 

You really should have asked before doing anything. New Config is exactly the wrong thing to do when you need to rebuild a disk.

 

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

21 hours ago, Brucey7 said:

I did not allow New Config to rebuild parity because I know it will initialise the new drive

You seem to know some things that aren't true. New Config will not do anything to any disks, except (optionally and by default) rebuild parity.

 

Whether or not you rebuilt parity, though, New Config is still exactly the wrong thing to have done, as explained in that warning from the New Config page I quoted.

 

You are going to have to jump through some hoops now to get it to think you need to rebuild a disk, and you can't allow anything to write to your server until you are ready to begin jumping.

21 hours ago, Brucey7 said:

Parity is still ok.

I have my doubts. Have you written or allowed anything at all to be written to your server since New Config? Simply starting the array in Normal mode is going to mount the disks read/write and update parity slightly, making it out-of-sync with the disk you need to rebuild.
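
To illustrate why even a small parity update matters, here is a toy Python sketch. It is a simplified model only: it assumes a single XOR parity disk, ignores the second (Q) parity seen in the logs, and uses made-up disk names and byte values. It is not how the Unraid md driver is implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte stripes on three data disks.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"
disk3 = b"\xff\x00\xff\x00"  # the disk that will need to be rebuilt

# Parity while everything is in sync.
parity = xor_blocks([disk1, disk2, disk3])

# In sync: the missing disk is recovered from parity plus the surviving disks.
rebuilt_in_sync = xor_blocks([parity, disk1, disk2])

# Assumption for illustration: one byte of parity changes in a way that does
# not correspond to the contents disk3 is supposed to have.
parity_touched = bytearray(parity)
parity_touched[0] ^= 0x01
rebuilt_out_of_sync = xor_blocks([bytes(parity_touched), disk1, disk2])

print(rebuilt_in_sync == disk3)
print(rebuilt_out_of_sync == disk3)
```

In this model, every parity byte that no longer matches the data turns into a wrong byte on the rebuilt disk, which is why nothing should be written before the rebuild.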

 

21 hours ago, Brucey7 said:

new drive (which is on order)

You should shut down and wait.


Yes, array has not been restarted.

 

My plan was to assign the new disk only, format it, start the array with all the disks (new disk included) after clicking "Parity is OK", shut down, reboot and rebuild parity.

 

I have a few servers, and this particular one has issues. Every few months I get UDMA errors, sometimes resulting in a disk dropping off the array. A New Config retaining all disks corrects it (it didn't this time), and I then do a correcting parity check. I've replaced disk backplanes, cables, and disk controllers: everything except the motherboard, which is too big and expensive a job.

1 hour ago, Brucey7 said:

My plan was to assign the new disk only, format it, start the array with all the disks (new disk included) after clicking "Parity is OK", shut down, reboot and rebuild parity.

This plan will result in the complete loss of the data that was on the failed drive.
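
A toy illustration of why, assuming simplified single XOR parity and made-up byte values (the variable names are hypothetical, not anything Unraid exposes): once parity is rebuilt from the disks as they currently are, it describes the freshly formatted disk, and the old contents can no longer be derived from it.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"
failed_disk = b"DATA"  # contents that exist only in parity once the disk is gone

old_parity = xor_blocks([disk1, disk2, failed_disk])

# Rebuilding the disk: old parity plus the surviving disks gives the data back.
recovered = xor_blocks([old_parity, disk1, disk2])

# The proposed plan: put a freshly formatted disk in the slot, then rebuild
# parity from the disks as they are now. All-zero bytes stand in for
# "formatted, no user data" in this sketch.
formatted_disk = b"\x00\x00\x00\x00"
new_parity = xor_blocks([disk1, disk2, formatted_disk])

# Anything derived from the new parity is just the formatted disk.
after_plan = xor_blocks([new_parity, disk1, disk2])

print(recovered == failed_disk)
print(after_plan == failed_disk)
```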

 

Please don't do anything without explicit instructions.

16 hours ago, Brucey7 said:

Yes, array has not been restarted.

Here is what I propose. Don't do it until we get some others (@JorgeB, @JonathanM, @itimpi) to take a look and see if they agree with my idea.

  1. Assign all disks exactly as before, with the replacement drive assigned to the slot of the drive you are replacing.
  2. Check the box for Parity Valid and check the box for Maintenance mode, then start the array.
  3. Stop the array, unassign the replacement disk, then start the array in normal mode (not Maintenance mode). That should get us to a place where that slot is emulated by parity.
  4. Then with new Diagnostics and a screenshot we can decide how to proceed.

 

 


I have an update.

 

I reseated all the drives and rebooted the server. 

 

It saw the failed disk. I ran a parity check, and it ran OK for about an hour before I went to bed. This morning I found the disk had been dropped again sometime overnight with 2048 disk errors. The parity check hasn't finished yet.

 

So I will shortly be in a position where the disk is being emulated and I can add the new disk when it arrives next week.

 

I have attached the diagnostics.  I'd be grateful for confirmation the disk is shot.

tower2-diagnostics-20210821-0646.zip


It looks like you assigned all the disks including the original disk10, and didn't rebuild parity. OK

Aug 20 12:17:27 Tower2 kernel: mdcmd (36): start NEW_ARRAY
Aug 20 12:17:27 Tower2 kernel: md: invalidslota=99
Aug 20 12:17:27 Tower2 kernel: md: invalidslotb=99

 

Then you started a CORRECTING parity check ???  

Aug 20 12:20:37 Tower2 kernel: mdcmd (40): check 
Aug 20 12:20:37 Tower2 kernel: md: recovery thread: check P Q ...
Aug 20 12:20:37 Tower2 kernel: md: recovery thread: PQ corrected, sector=0
Aug 20 12:20:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=3519136
Aug 20 12:20:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=3519144

which eventually corrected so many that it quit logging them.

Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433872
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433880
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433888
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: stopped logging

 

And well into it, disk10 started giving read errors, probably disconnected

Aug 20 21:22:19 Tower2 kernel: md: disk10 read error, sector=8532144800

then Unraid tried to write the calculated data back to it, which failed and the disk was disabled.

Aug 20 21:22:19 Tower2 kernel: md: disk10 write error, sector=8532144800
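
If you want to tally these events yourself, a small script along these lines could summarize a syslog. It is a rough sketch that assumes the message formats are exactly those in the excerpts quoted above; the function name `summarize` and the regular expressions are my own, not part of any Unraid tool.

```python
import re
from collections import Counter

# Lines copied from the syslog excerpts quoted in this post.
SAMPLE_LOG = """\
Aug 20 12:20:37 Tower2 kernel: md: recovery thread: PQ corrected, sector=0
Aug 20 12:20:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=3519136
Aug 20 12:20:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=3519144
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433872
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433880
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433888
Aug 20 13:04:51 Tower2 kernel: md: recovery thread: stopped logging
Aug 20 21:22:19 Tower2 kernel: md: disk10 read error, sector=8532144800
Aug 20 21:22:19 Tower2 kernel: md: disk10 write error, sector=8532144800
"""

CORRECTED = re.compile(r"md: recovery thread: PQ corrected, sector=(\d+)")
DISK_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")

def summarize(log_text):
    """Return (corrected sectors, error counts per (disk, kind), logging-stopped flag)."""
    corrected = []
    errors = Counter()
    logging_stopped = False
    for line in log_text.splitlines():
        match = CORRECTED.search(line)
        if match:
            corrected.append(int(match.group(1)))
            continue
        match = DISK_ERROR.search(line)
        if match:
            errors[(match.group(1), match.group(2))] += 1
            continue
        if "recovery thread: stopped logging" in line:
            logging_stopped = True
    return corrected, errors, logging_stopped

corrected, errors, logging_stopped = summarize(SAMPLE_LOG)
print(len(corrected), "corrections logged; logging stopped:", logging_stopped)
for (disk, kind), count in sorted(errors.items()):
    print(disk, kind, "errors:", count)
```

Because the recovery thread stops logging after a while, the number of logged corrections is only a lower bound on how many sectors were actually corrected.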

 

The disk isn't showing up in the SMART reports, so I can't tell from those diagnostics whether it is good or not.

 

It does look like disk10 is currently being emulated though

    [disk10] => Array
            [id] => WDC_WD60EFRX-68MYMN0_WD-WX11D4446368
            [size] => 5860522532
            [status] => DISK_DSBL
            [fsType] => xfs
            [fsStatus] => Mounted

and the emulated disk is mounted and full.

Filesystem      Size  Used Avail Use% Mounted on
/dev/md10       5.5T  5.5T   51G 100% /mnt/disk10
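
The same reading of the two excerpts above can be sketched in a few lines of Python. This assumes the dump uses the `[key] => value` layout shown and that the `df` line has the usual six columns; treating "disabled but still mounted" as "emulated" is the interpretation given in this post, not an official status flag.

```python
import re

# Copied from the diagnostics excerpts quoted above.
DISK_DUMP = """\
[disk10] => Array
        [id] => WDC_WD60EFRX-68MYMN0_WD-WX11D4446368
        [size] => 5860522532
        [status] => DISK_DSBL
        [fsType] => xfs
        [fsStatus] => Mounted
"""
DF_LINE = "/dev/md10       5.5T  5.5T   51G 100% /mnt/disk10"

# Turn each "[key] => value" line into a dictionary entry.
fields = dict(re.findall(r"\[(\w+)\] => (\S+)", DISK_DUMP))

# Disabled in the array but with a mounted filesystem: the slot is emulated.
looks_emulated = fields["status"] == "DISK_DSBL" and fields["fsStatus"] == "Mounted"

# Fifth column of df output is the Use% figure.
use_percent = int(DF_LINE.split()[4].rstrip("%"))

print("emulated:", looks_emulated)
print("use percent:", use_percent)
```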

 

So maybe you got lucky despite doing everything wrong and not following directions.

 

It probably doesn't even matter whether or not you finish the parity check, since disk10 is no longer involved. Anything you might have done to parity before is already done, and we'll just have to deal with whatever the consequences are.

 

I guess you will have to check connections again, maybe change cables, to see if we can get a look at disk10 SMART. Wouldn't be surprised if there was never anything wrong with the disk itself.

 

Don't do any more parity checks!!!

 

Or New Configs!!! 

 

 

