caplam Posted October 23, 2020
Today I think I made a big mistake. I was playing around with powertop and I think I did something I shouldn't have. Three minutes after playing with it, I had 2 disks with read errors that were disconnected from the array. Before that all was fine. A third disk had read errors but was removed. I tried to stop the array without success. I also couldn't get a diagnostics file. The server was unresponsive and I had to do a cold reboot. I started the rebuild procedure for one disk, but one of my parity drives now has read errors and the rebuild is slow (350 KB/s). I don't know what to do next. I suspect the disk that is offline is good. Do you have any suggestions?
godzilla-diagnostics-20201023-1530.zip
Edited October 23, 2020 by caplam
JorgeB Posted October 23, 2020
You have two disabled disks and parity2 is failing, so not a good spot. Disks 2 and 4 look healthy, but disk2 is now corrupt. You could re-enable disk4, assuming it's OK, and try again to rebuild disk2, though some sync errors will still exist.
caplam Posted October 23, 2020
How can you re-enable a disk?
JorgeB Posted October 23, 2020
You can try the invalid slot command. Follow the instructions below carefully and ask if there's any doubt.
-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed. Don't assign parity2. If you have a spare to rebuild disk2 (same size or larger), use it, since it will leave you with more options if this doesn't work.
-Important - After checking the assignments leave the browser on that page, the "Main" page.
-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):
    mdcmd set invalidslot 2 29
-Back on the GUI and without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on parity disk(s) will be overwritten; this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done. Disk2 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format: wait for the rebuild to finish (or cancel it) and then run a filesystem check.
caplam Posted October 23, 2020
I'm not sure I understand. My present situation is:
-disks 1 & 3: OK
-disk4: disabled
-disk2: rebuilding (paused at 2%)
-parity1: OK
-parity2: failing, lots of errors
If I understand correctly:
8 minutes ago, JorgeB said: -Tools -> New Config -> Retain current configuration: All -> Apply
This is for reassigning drives.
9 minutes ago, JorgeB said: -Check all assignments and assign any missing disk(s) if needed, don't assign parity2, if you have a spare to rebuild disk2 (same size or larger) use it since it will leave you with more options if this doesn't work
I unassign parity2 (as it's failing, I suppose it's useless for rebuilding), then I unplug disk2 and replace it with a precleared one.
11 minutes ago, JorgeB said: -Important - After checking the assignments leave the browser on that page, the "Main" page.
Clear enough.
11 minutes ago, JorgeB said: -Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters): mdcmd set invalidslot 2 29
It seems mdcmd has no help.
15 minutes ago, JorgeB said: Back on the GUI and without refreshing the page, just start the array, do not check the "parity is already valid" box (GUI will still show that data on parity disk(s) will be overwritten,
At this stage I think I have only one parity disk in the array (parity2 has been unassigned).
16 minutes ago, JorgeB said: this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done), disk2 will start rebuilding, disk should mount immediately but if it's unmountable don't format, wait for the rebuild to finish (or cancel it) and then run a filesystem check.
For this step I have a new precleared disk as my disk2.
So I suppose that all these steps are for getting disk4 back in the array.
JorgeB Posted October 23, 2020
That command enables all disks except the ones listed: disk2, which needs to be rebuilt, and disk29 (parity2), since there won't be one.
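As a reading aid, the two numbers in that command can be annotated as below. This is only a sketch: `invalidslot_cmd` is a hypothetical helper, not part of Unraid, and it just prints the command so the arguments can be checked before typing it at the console. The slot meanings (2 for disk2, 29 for parity2) are the ones given in this thread.

```shell
# Hypothetical helper: prints the invalid slot command instead of running it.
#   $1 = slot to rebuild (2 = disk2 in this thread)
#   $2 = second slot to treat as invalid (29 = parity2, which stays unassigned)
invalidslot_cmd() {
    echo "mdcmd set invalidslot $1 $2"
}

invalidslot_cmd 2 29   # prints: mdcmd set invalidslot 2 29
```

The real command still has to be typed by hand, with the browser left on the "Main" page as described above.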
caplam Posted October 23, 2020
It took quite a while, as I wasn't able to find the key for the tray 😁 So now the rebuild is running. I hope it finishes fine. Thank you JorgeB. 👍 I have the former disk2 unplugged.
JorgeB Posted October 23, 2020
Looks like it didn't mount, which is not that surprising since, as mentioned, parity wouldn't be 100% in sync. You'll need to run xfs_repair on it after it's done (or now, after canceling the rebuild).
caplam Posted October 23, 2020
I think I'll let the rebuild finish. As you said, the disk was corrupted when the rebuild was started. I stopped it around 2%. I have a 6TB disk to order. I have spare ones in 4TB but not in 6TB.
Edited October 23, 2020 by caplam
JorgeB Posted October 23, 2020
Old disk2 was corrupt. Now the problem is that parity isn't 100% valid because you are using the old disk4, so there's some filesystem corruption on the emulated disk. This is expected in this case, and if it's the only issue it should be easily fixed by xfs_repair.
caplam Posted October 23, 2020
OK, so I guess that when the rebuild is finished I stop the array and restart it in maintenance mode. In the GUI I can run xfs_repair. I have to do it for both disk 2 and 4.
caplam Posted October 24, 2020
I ran xfs_repair on disk2 in maintenance mode. I then restarted the array but no luck: disk2 is disabled.
itimpi Posted October 24, 2020
28 minutes ago, caplam said: i ran xfs_repair on disk2 in maintenance mode. I then restarted the array but no luck : disk2 is disabled.
xfs_repair will not stop a disk being disabled. It is intended to fix a disk being unmountable. If the drive is disabled, then it is the emulated disk that is being fixed. The standard way to clear the disabled state is to rebuild the disk.
caplam Posted October 24, 2020
I've just realised that xfs_repair had been run (from the GUI) on /dev/md2, which is the emulated disk, so it's logical that the drive is still disabled. I shouldn't have let the rebuild run. It was useless. So a new rebuild is running; next stage in 7 hours. In the meantime, can I re-enable docker and VMs?
itimpi Posted October 24, 2020
1 hour ago, caplam said: i've just realised that xfs_repair had been run (from gui) on /dev/md2 which is the emulated disk;
If it had not been run against the mdX device, it would have invalidated parity, which would not be a good idea. What are you rebuilding? If it is the disabled disk, you will end up with whatever showed on the emulated drive before the rebuild.
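For anyone running the check from the console instead of the GUI, the point about the mdX device can be sketched as follows. Assumptions: the array is started in maintenance mode, array disk N is exposed as /dev/mdN (as with /dev/md2 in this thread), and `xfs_repair_cmds` is a hypothetical helper that only prints the commands rather than running them.

```shell
# Sketch only: print the check and repair commands for an array disk number.
# Targeting /dev/mdN (not the raw /dev/sdX device) keeps parity in step
# with whatever the repair changes.
xfs_repair_cmds() {
    dev="/dev/md$1"
    echo "xfs_repair -n $dev"   # -n: check only, make no changes
    echo "xfs_repair $dev"      # actual repair, run after reviewing the check
}

xfs_repair_cmds 2
```

Running the `-n` pass first shows what would be changed before anything is written.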
caplam Posted October 24, 2020
I followed the re-enable procedure:
-Stop array
-Unassign disk2
-Start array
-Stop array
-Assign disk2
-Start array
At this point the rebuild starts. Before that I ran xfs_repair within the GUI (so on the emulated disk). The rebuild is on its way. I think I'll start VMs and dockers. Normally, it should be good. If not, I'll have to find out how to start with a blank disk2 and restore from backup, I guess. But I also have the old disk2. From what I saw, the corrupted data weren't important (temporary download files).
itimpi Posted October 24, 2020
You should be able to look at disk2 while the rebuild is in progress. Whatever you see there is what you will end up with when the rebuild completes. Starting VMs/dockers should not affect the rebuild, but may have a performance impact if they use array drives.
caplam Posted October 24, 2020
During the rebuild I was able to browse disk2. Now the rebuild is done, but it ended with disk2 disabled again. What can I do now?
Edit: there is something strange. I see the disk as disk2, which is disabled (device sdr), and I also see it as an unassigned device (device sdq).
Edited October 24, 2020 by caplam
itimpi Posted October 24, 2020
6 minutes ago, caplam said: during rebuilding i was able to browse disk2. Now rebuilding is done but end with disk2 disabled again. What can i do now ?
I can only suggest that you post your diagnostics again. If the drive is disabled after the rebuild, that suggests a write to it failed during the rebuild. There might be something in the diagnostics to give a clue as to what exactly happened.
caplam Posted October 24, 2020
Here it is.
godzilla-diagnostics-20201024-1855.zip
caplam Posted October 24, 2020
In the log I see many SAS errors. I have no SAS disk in the array. All array disks are in my main case on SATA ports. On the SAS controller I have disks in an external case; for now they are all unassigned. I also see write errors and, before that, "link is slow to respond" on a SATA port. What does it mean? A bad cable? (It would be bad luck.)
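To pull just those messages out of a long log, a small filter along these lines may help. It is a sketch: `link_errors` is a hypothetical helper, the two patterns are simply the messages mentioned above, and the log path in the usage comment is an assumption.

```shell
# Sketch: keep only the log lines that point at link or write problems.
# Reads log text on stdin; matching is case-insensitive.
link_errors() {
    grep -iE 'link is slow to respond|write error'
}

# Usage (path is an assumption; adjust to where the syslog actually lives):
#   link_errors < /var/log/syslog
```

If the matching lines all name the same ataN port, that narrows the problem to one cable, port, or drive rather than the whole controller.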
trurl Posted October 24, 2020
In addition to the SMART attributes monitored by Unraid by default, you should set it to add attributes 1 and 200 for WD Red disks, such as disk2. Run an extended SMART test on disk2.
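The extended test can be started from the GUI; from the console the equivalent would look roughly like this. It is a sketch: `smart_long_test_cmds` is a hypothetical helper that only prints the smartctl commands, and /dev/sdr is the device name disk2 had in this thread, which can change between boots.

```shell
# Sketch: print the smartctl commands for an extended self-test on one device.
smart_long_test_cmds() {
    echo "smartctl -t long $1"       # start the extended (long) self-test
    echo "smartctl -l selftest $1"   # show the self-test log once it has finished
}

smart_long_test_cmds /dev/sdr
```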
caplam Posted October 24, 2020
I can't for now: the disk is spun down and I can't spin it up.
caplam Posted October 24, 2020
When I try to spin it up I get:
Unraid Disk 2 SMART health [1]: 24-10-2020 19:16 Warning [GODZILLA] - raw read error rate is 132 WDC_WD40EFRX-68WT0N0_WD-WCC4E1KN5L9R (sdr)
Unraid Disk 2 SMART health [200]: 24-10-2020 19:16 Warning [GODZILLA] - multi zone error rate is 1 WDC_WD40EFRX-68WT0N0_WD-WCC4E1KN5L9R (sdr)
Can I rebuild onto another disk? I have others as spares.
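The same two attributes can be read directly from the drive once it responds. A minimal sketch, assuming the usual `smartctl -A` attribute table layout (ID in the first column, raw value in the last); `smart_attrs_1_200` is a hypothetical helper name.

```shell
# Sketch: from `smartctl -A` output on stdin, print the ID, name and raw
# value of attributes 1 (raw read error rate) and 200 (multi zone error rate).
smart_attrs_1_200() {
    awk '$1 == 1 || $1 == 200 { print $1, $2, $NF }'
}

# Usage (device name as seen in this thread; it may differ after a reboot):
#   smartctl -A /dev/sdr | smart_attrs_1_200
```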
caplam Posted October 24, 2020
Is it the disk itself or a connection problem? The disk is used, but I had it precleared with 3 passes.