justintas Posted November 29, 2021

Hoping for some direction on an error I've encountered. The array was running out of space, so I started the process of replacing the parity disk with a larger one. All was going OK until I struck an error during the rebuild: a read error on one of the disks, see below. The current rebuild has another 3 hours to go. Prior to the parity upgrade there were no errors, and the last parity rebuild was all good.

Questions: Is this something to worry about, and will it lead to some data loss? When it finishes, will a reboot correct the error? I have new data disks to install and the old parity disk is still available.

Any help or suggestions appreciated. Thanks in advance, Justintas

Parity - ST12000VN0008-2PH103_ZS802V5R (sdb) - active 33 C [DISK INVALID] (new parity disk being rebuilt)
Disk 1 - WDC_WD40EFRX-68N32N0_WD-WCC7K2PF6VZX (sdc) - active 31 C (disk has read errors) [NOK]
Disk 2 - WDC_WD40EFRX-68N32N0_WD-WCC7K3EN931N (sdd) - active 31 C [OK]
Disk 3 - WDC_WD40EFRX-68N32N0_WD-WCC7K5FREU23 (sde) - active 31 C [OK]

Parity sync / Data rebuild in progress.
Total size: 12 TB
Elapsed time: 8 hours, 12 minutes
Current position: 4.01 TB (33.4 %)
Estimated speed: 216.3 MB/sec
Estimated finish: 10 hours, 16 minutes
Sync errors corrected: 2689
trurl Posted November 29, 2021

Attach diagnostics to your NEXT post in this thread.
justintas Posted November 29, 2021

Diagnostics attached: hptower-diagnostics-20211130-1009.zip
trurl Posted November 29, 2021

Does look like disk1 has problems:

Serial Number: WD-WCC7K2PF6VZX
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K  200   189   051    -    3
197 Current_Pending_Sector  -O--CK  200   200   000    -    5

Attribute 1 isn't monitored by default; click on each of your WD disks to get to its page and add attributes 1 and 200.

The pending sectors are monitored by default and should have warned you, but they might have just been discovered, since the rebuild is going to access all sectors.

Did you get any notifications about disk1? No doubt it has a SMART warning on the Dashboard page now (unless you acknowledged it). Was that warning there when you decided to replace parity?

Have you written anything to your server since the parity rebuild began? Do you still have the original parity disk? Do you have another copy of anything important and irreplaceable?
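For anyone following along from the console, the same raw values can be read with smartmontools. This is a sketch only; the /dev/sdc name is an assumption carried over from the post above, so confirm it against the drive serial on the Main page first:

```shell
# Dump the full SMART attribute table for the suspect drive
smartctl -A /dev/sdc

# Pull out just attribute 1 (Raw_Read_Error_Rate) and 197
# (Current_Pending_Sector); a non-zero pending count means sectors
# the drive has so far failed to read back
smartctl -A /dev/sdc | awk '$1 == 1 || $1 == 197 {print $2 "=" $NF}'
```

The awk filter matches on the numeric ID# column, so it works regardless of how the attribute names are spaced in the table.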
justintas Posted November 30, 2021

46 minutes ago, trurl said:
Did you get any notifications about disk1? [...]

Thanks for your help. To answer the questions:
- No notification about disk 1.
- Yes, it has a notification on the dashboard now about the error.
- No, that wasn't the reason I decided to replace parity; I need to expand the array as I'm running out of disk, so I'm replacing parity first.
- No, nothing has been written to the array since the rebuild started.
- Yes, I still have the original parity disk.
- Yes, I have copies of most of the important data; it's just my movie collection on there that is not backed up.

Options?
trurl Posted November 30, 2021

You need to replace disk1 instead of parity. It should be possible to rebuild disk1 from the original parity, but it will require jumping through a few hoops now that parity has been replaced and is invalid. Of course, you need a replacement for disk1 that is at least as large as disk1 but no larger than the original parity.
justintas Posted November 30, 2021 Author Share Posted November 30, 2021 ok , so I have a replacement ready brand new but is same size as original parity 8tb is that ok ? or an older 4tb drive which one to use? So what steps do I follow , assume let parity rebuild follow first Quote Link to comment
trurl Posted November 30, 2021

I would go with the 8TB to get the extra capacity, which is what you wanted anyway. Looks like you already have autostart disabled.

1. Shutdown, replace new parity with original parity, leave disk1 installed for now, then reboot.
2. Tools - New Config - Retain All - Apply.
3. Assign original parity, check the box saying parity is already valid, then start the array.
4. Shutdown, replace disk1, reboot.
5. Assign new disk1 and start the array to begin rebuild of disk1.
justintas Posted November 30, 2021 Author Share Posted November 30, 2021 Thanks Trurl , will wait to current parity option finishes, about an hour then follow the steps you have outlined. Assume once data disk 1 is rebuilt and all boots ok can go ahead again and replace the parity disk ? then start gradual upgrade of each data disk. Really appreciate your help and advice. Quote Link to comment
justintas Posted December 15, 2021 (edited)

Just struck a problem with the fix above. I had a delay due to a damaged drive cage, so I had to switch hardware. I have inserted the new disk 1 and did a rebuild. On reboot, 2 things are happening:

1. It is saying disk 1 "Unmountable disk present". Not sure what I did wrong here?
2. All my dockers are not visible; I assume this could be related to 1 above.

Updated diagnostics attached. I have tried 2 rebuilds but can't work out what I have done wrong.

hptower-diagnostics-20211215-1157.zip
trurl Posted December 15, 2021

Probably slightly too out-of-sync for a clean rebuild. You will have to repair the filesystem on disk1.

https://wiki.unraid.net/Manual/Storage_Management#Drive_shows_as_unmountable
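For reference, a rough command-line sketch of the usual repair sequence, assuming disk1 (hence /dev/md1) and the array started in Maintenance mode; the webUI check/repair buttons run the equivalent of this for you:

```shell
# Dry run first: report problems without changing anything on disk
# (this is what the default -n option in the webUI does)
xfs_repair -n /dev/md1

# Then the actual repair (the blank-options run in the webUI)
xfs_repair /dev/md1

# Only if it refuses to run because of a dirty log, and as a last resort,
# zero the log; this can discard the most recent metadata changes:
# xfs_repair -L /dev/md1
```

Running against the md device rather than the sdX device is what keeps parity updated as repairs are written.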
justintas Posted December 15, 2021 Author Share Posted December 15, 2021 Ok ran check then repair output doesn't look to good ? Here is last lines of process.. Metadata corruption detected at 0x44d778, xfs_bmbt block 0xec37d798/0x1000 libxfs_bwrite: write verifier failed on xfs_bmbt bno 0xec37d798/0x1000 Maximum metadata LSN (2146145896:-2144772351) is ahead of log (22:71999). Format log to cycle 2146145899. xfs_repair: Releasing dirty buffer to free list! cache_purge: shake on cache 0x5021c0 left 3 nodes!? xfs_repair: Refusing to write a corrupt buffer to the data device! xfs_repair: Lost a write to the data device! fatal error -- File system metadata writeout failed, err=117. Re-run xfs_repair. Options ? above run in mainteance mode only way to activate check option I do have another disk available to try as a replacement for SDC ? Original SDC is still available but jammed in drive cage Quote Link to comment
trurl Posted December 15, 2021

2 hours ago, justintas said:
Original SDC

Do you mean original disk1? The sdX designations aren't very useful since they can change with hardware changes or even just reboots.
trurl Posted December 15, 2021

2 hours ago, justintas said:
ran check then repair

Since you mentioned SDC, I have to wonder: how exactly did you do the check and repair? Best is to run it from the webUI so the correct designation gets used automatically. If you did it from the command line, what was the exact command you used?
justintas Posted December 15, 2021 Author Share Posted December 15, 2021 yes original disk is jammed in disk cage hence swapped to new cage and put new disk in as disk 1 (sdc) Are errors recoverable ? Quote Link to comment
justintas Posted December 15, 2021 Author Share Posted December 15, 2021 check and repair was from gui check first with default settings -n had to put array into mainteance mode to run then ran check again with a blank option was that the correct steps ? Quote Link to comment
trurl Posted December 16, 2021

28 minutes ago, justintas said:
original disk is jammed in disk cage

By "jammed" do you mean it can't be removed for some reason?

28 minutes ago, justintas said:
new disk in as disk 1 (sdc)

Might as well forget about that sdc designation. If you want to identify a specific drive assignment, disk1 is the way to go. If you want to identify a specific drive, some unique portion of the serial number is most useful; often the last 4 characters will work for many models.

26 minutes ago, justintas said:
check and repair was from gui

So it would have used the correct designation, which in this case would be /dev/md1. Specifying the md device is necessary to get parity updated with repair so it remains valid.

Might be useful to try to get the data from the original disk. Can you mount it as an Unassigned Device?
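If the old disk can be freed from the cage later, here is a rough console sketch for identifying and reading it without the Unassigned Devices plugin. The /dev/sdX1 device name and the mount point are placeholders, not the actual names on this system:

```shell
# List drives by serial number; these by-id names are stable across
# reboots and hardware changes, unlike the sdX letters
ls -l /dev/disk/by-id/ | grep -v part

# Mount the old disk's data partition read-only so nothing on it changes
mkdir -p /mnt/olddisk1
mount -o ro /dev/sdX1 /mnt/olddisk1
```

Filtering out the "part" entries leaves one line per physical drive, which makes matching serials against the webUI easier.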
justintas Posted December 16, 2021

2 minutes ago, trurl said:
By "jammed" do you mean it can't be removed for some reason?

Yes, it's jammed in the cage and can't be removed; a screw must have moved.

2 minutes ago, trurl said:
Can you mount it as an Unassigned Device?

Disk 1 is green but showing as unmountable. Mounting the existing drive will be hard as the cage is damaged. I will see if I can cut the drive out of the cage. Any other options? Pictures attached if any help.
justintas Posted December 16, 2021 Author Share Posted December 16, 2021 Ok tried check process again here are the results below, and guess what it is fixed !!! Thanks for your guidance much appreciated re ran check -n Results xfs_repair status: Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan (but don't clear) agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 0 - agno = 1 - agno = 2 - agno = 3 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... would have reset inode 4328816385 nlinks from 1 to 2 would have reset inode 4328816389 nlinks from 1 to 2 would have reset inode 4328816398 nlinks from 1 to 2 No modify flag set, skipping filesystem flush and exiting. Then re ran check (blank) to do a repair Results: Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 0 - agno = 1 - agno = 2 - agno = 3 Phase 5 - rebuild AG headers and trees... - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... - traversal finished ... 
- moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... resetting inode 4328816385 nlinks from 1 to 2 resetting inode 4328816389 nlinks from 1 to 2 resetting inode 4328816398 nlinks from 1 to 2 done Quote Link to comment
justintas Posted December 16, 2021

Should I do another parity rebuild before changing the parity disk to the larger disk?
trurl Posted December 16, 2021 (Solution)

13 hours ago, justintas said:
Should I do another parity rebuild before changing the parity disk to the larger disk?

No point, unless you just want to exercise your hardware. Parity will be built to the new larger disk whether your current parity is valid or not, or even if you had no parity disk before.