Brucey7 Posted August 19, 2021
I have a failed disk. I did a New Config and kept all disks in the new config; it now shows one disk missing but doesn't show details of the disk. I want to replace it with a larger disk and rebuild from parity. What do I do?
trurl Posted August 19, 2021
Why did you do New Config? Did you let New Config rebuild parity? You really should have asked before doing anything. New Config is exactly the wrong thing to do when you need to rebuild a disk. Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
trurl Posted August 19, 2021
From the New Config page in the webUI:
"DO NOT USE THIS UTILITY THINKING IT WILL REBUILD A FAILED DRIVE - it will have the opposite effect of making it impossible to rebuild an existing failed drive - you have been warned!"
Brucey7 Posted August 19, 2021 Author
I did not allow New Config to rebuild parity because I know it will initialise the new drive (which is on order). Parity is still ok.
Brucey7 Posted August 19, 2021 Author
diagnostics
tower2-diagnostics-20210819-1248.zip
trurl Posted August 19, 2021
21 hours ago, Brucey7 said: "I did not allow New Config to rebuild parity because I know it will initialise the new drive"
You seem to know some things that aren't true. New Config will not do anything to any disks, except (optionally and by default) rebuild parity. Whether or not you rebuilt parity though, New Config is still exactly the wrong thing to have done, as explained in that warning from the New Config page I quoted. You are going to have to jump through some hoops now to get it to think you need to rebuild a disk, and you can't allow anything to write to your server until you are ready to begin jumping.
21 hours ago, Brucey7 said: "Parity is still ok."
I have my doubts. Have you written or allowed anything at all to be written to your server since New Config? Simply starting the array in Normal mode is going to mount the disks read/write and update parity slightly, making it out-of-sync with the disk you need to rebuild.
21 hours ago, Brucey7 said: "new drive (which is on order)"
You should shut down and wait.
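For readers wondering why the New Config warning is so blunt, here is a minimal Python sketch of single-parity reconstruction. It assumes a simplified model in which parity is the byte-wise XOR of all data disks; the three tiny "disks" and the helper names `xor_parity` and `rebuild_missing` are made up for illustration and are not Unraid code. It shows that a missing disk can be rebuilt only while parity still matches that disk's old contents.

```python
from functools import reduce

def xor_parity(disks):
    """Byte-wise XOR across equal-length byte strings (simplified parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def rebuild_missing(surviving_disks, parity):
    """Reconstruct the single missing disk from the survivors plus parity."""
    return xor_parity(list(surviving_disks) + [parity])

# Hypothetical three-disk array; disk3 is the one that fails.
disk1 = bytes([0x10, 0x20, 0x30])
disk2 = bytes([0x0A, 0x0B, 0x0C])
disk3 = bytes([0xFF, 0x00, 0x55])

# Parity computed while all three disks were present.
parity = xor_parity([disk1, disk2, disk3])

# Case 1: parity untouched since the failure, so the rebuild is exact.
rebuilt_ok = rebuild_missing([disk1, disk2], parity)
print(rebuilt_ok == disk3)  # True

# Case 2: parity recomputed from the remaining disks only. The failed
# disk's contribution is gone and the "rebuild" yields all zeros.
reparity = xor_parity([disk1, disk2])
rebuilt_after_reparity = rebuild_missing([disk1, disk2], reparity)
print(rebuilt_after_reparity == bytes(3))  # True

# Case 3: parity drifts out of sync by a single byte. The rebuilt disk
# is wrong at exactly that position.
stale_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
rebuilt_stale = rebuild_missing([disk1, disk2], stale_parity)
print(rebuilt_stale == disk3)  # False
```

Case 2 corresponds to letting parity be rebuilt without the failed disk, and case 3 to any change that leaves parity out of step with the disk being rebuilt. In both cases the original contents cannot be recovered from parity alone.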
trurl Posted August 19, 2021
Just reviewed your diagnostics and according to syslog, no disks were assigned and the array hadn't been started yet. Is that still true?
Brucey7 Posted August 20, 2021 Author
Yes, array has not been restarted. My plan was to assign the new disk only, format it, start the array with all the disks (new disks included) after clicking "Parity is OK", shut down, reboot and rebuild parity.
I have a few servers, and this particular server has issues. Every few months I get UDMA errors, sometimes resulting in a disk dropping off the array. A new config retaining all disks corrects it (it didn't this time); I then do a correcting parity check. I've replaced disk backplanes, cables, disk controllers, everything except the motherboard, which is too big/expensive a job.
DivideBy0 Posted August 20, 2021
I agree with the guidance here. If you don't know, stop and ASK. I am impressed with the answers and timing on this forum, and a BIG THANK YOU to "trurl" as you keep helping us.
JonathanM Posted August 20, 2021
1 hour ago, Brucey7 said: "My plan was to assign the new disk only, format it, start the array with all the disks (new disks included) after clicking "Parity is OK", shut down, reboot and rebuild parity."
This plan will result in the complete loss of the data that was on the failed drive. Please don't do anything without explicit instructions.
trurl Posted August 20, 2021
Do you still have the original disk? How did you decide it was a bad disk?
Brucey7 Posted August 20, 2021 Author
Yes, I still have it. After a "new config" retaining all disks, the server wouldn't see it.
trurl Posted August 20, 2021
16 hours ago, Brucey7 said: "Yes, array has not been restarted."
Here is what I propose. Don't do it until we get some others (@JorgeB, @JonathanM, @itimpi) to take a look and see if they agree with my idea.
1. Assign all disks exactly as before, with the replacement drive assigned to the slot of the drive you are replacing.
2. Check the box for Parity Valid and check the box for Maintenance mode, then start the array.
3. Stop the array, unassign the replacement disk, then start the array in normal mode (not Maintenance mode).
That should get us to a place where that slot is emulated by parity. Then with new Diagnostics and a screenshot we can decide how to proceed.
JorgeB Posted August 20, 2021
4 minutes ago, trurl said: "if they agree with my idea."
👍
Brucey7 Posted August 20, 2021 Author
I have an update. I reseated all the drives and rebooted the server. It saw the failed disk, so I ran a parity check, which ran OK for about an hour before I went to bed. This morning I found the disk had been dropped again sometime overnight with 2048 disk errors; the parity check hasn't finished yet. So I will shortly be in a position where the disk is being emulated and I can add the new disk when it arrives next week. I have attached the diagnostics. I'd be grateful for confirmation that the disk is shot.
tower2-diagnostics-20210821-0646.zip
trurl Posted August 21, 2021
It looks like you assigned all the disks including the original disk10, and didn't rebuild parity. OK

    Aug 20 12:17:27 Tower2 kernel: mdcmd (36): start NEW_ARRAY
    Aug 20 12:17:27 Tower2 kernel: md: invalidslota=99
    Aug 20 12:17:27 Tower2 kernel: md: invalidslotb=99

Then you started a CORRECTING parity check

    Aug 20 12:20:37 Tower2 kernel: mdcmd (40): check
    Aug 20 12:20:37 Tower2 kernel: md: recovery thread: check P Q ...
    Aug 20 12:20:37 Tower2 kernel: md: recovery thread: PQ corrected, sector=0
    Aug 20 12:20:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=3519136
    Aug 20 12:20:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=3519144

which eventually corrected so many that it quit logging them.

    Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433872
    Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433880
    Aug 20 13:04:51 Tower2 kernel: md: recovery thread: PQ corrected, sector=722433888
    Aug 20 13:04:51 Tower2 kernel: md: recovery thread: stopped logging

And well into it, disk10 started giving read errors, probably disconnected

    Aug 20 21:22:19 Tower2 kernel: md: disk10 read error, sector=8532144800

then Unraid tried to write the calculated data back to it, which failed and the disk was disabled.

    Aug 20 21:22:19 Tower2 kernel: md: disk10 write error, sector=8532144800

It isn't showing in SMART so can't tell if it is good or not from those diagnostics. It does look like disk10 is currently being emulated though

    [disk10] => Array
        [id] => WDC_WD60EFRX-68MYMN0_WD-WX11D4446368
        [size] => 5860522532
        [status] => DISK_DSBL
        [fsType] => xfs
        [fsStatus] => Mounted

and the emulated disk is mounted and full.

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/md10       5.5T  5.5T   51G 100% /mnt/disk10

So maybe you got lucky despite doing everything wrong and not following directions.

Probably it doesn't even matter whether or not you finish the parity check since disk10 is no longer involved and anything you might have done to parity before is already done and we'll just have to deal with whatever consequences.

I guess you will have to check connections again, maybe change cables, to see if we can get a look at disk10 SMART. Wouldn't be surprised if there was never anything wrong with the disk itself.

Don't do any more parity checks!!! Or New Configs!!!
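The kind of syslog review described above can be roughed out with a short script. This is only a sketch: the regular expression is inferred from the two disk10 error lines quoted in this thread, the function name `summarize_md_errors` is hypothetical, and the sample lines are copied from the post rather than read from a real diagnostics ZIP. Other Unraid versions may word these kernel messages differently.

```python
import re
from collections import Counter

# Pattern inferred from the quoted lines, e.g.
# "md: disk10 read error, sector=8532144800" (an assumption, not a spec).
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")

def summarize_md_errors(lines):
    """Count md read/write errors per disk and record the first failing sector."""
    counts = Counter()
    first_sector = {}
    for line in lines:
        match = MD_ERROR.search(line)
        if match is None:
            continue
        key = (match.group(1), match.group(2))
        counts[key] += 1
        # Keep only the first sector seen for each (disk, kind) pair.
        first_sector.setdefault(key, int(match.group(3)))
    return counts, first_sector

# Sample lines taken from the syslog excerpts quoted in the thread.
sample = [
    "Aug 20 12:20:37 Tower2 kernel: md: recovery thread: PQ corrected, sector=0",
    "Aug 20 21:22:19 Tower2 kernel: md: disk10 read error, sector=8532144800",
    "Aug 20 21:22:19 Tower2 kernel: md: disk10 write error, sector=8532144800",
]

counts, first_sector = summarize_md_errors(sample)
for (disk, kind), n in sorted(counts.items()):
    print(f"{disk}: {n} {kind} error(s), first at sector {first_sector[(disk, kind)]}")
```

To run it against a real log, pass the lines of the syslog file from an extracted diagnostics ZIP instead of `sample`. A read error followed by a write error at the same sector matches the sequence described in the post: the read fails, the write-back fails, and the disk is disabled.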
Brucey7 Posted August 24, 2021 Author
Thank you to all those that helped, especially trurl.
The new disk is fitted and rebuilt successfully. I'm not sure whether the old disk is OK or not; at some point I'll try to preclear it and see what happens.
trurl Posted August 24, 2021
You should not preclear the old disk until you are sure you have all your files. Post new diagnostics if you want further advice.
Brucey7 Posted August 24, 2021 Author
Thanks, I do have all my old files.