polishprocessors Posted April 12, 2024
Hey all! I had a drive (I think) go bad, so I pulled it from the array after moving all the data off the emulated version. I went into New Config, set up a new config without that drive, and started a Parity Build, which finished after ~24h for 14TB. So far so good.
While the build was running I was also downloading, so my cache drive nearly filled up and I had to pause downloads until parity was finished, because the mover apparently won't run while a parity operation is going.
Fast forward 24h and now I have an array where the mover seems to freeze at some point while moving files, perhaps only when writing to disk3, though that might be a coincidence. Apart from the possible link to disk3 (which tests out fine) I can see no pattern in when the mover goes wrong, but it keeps stalling, and then there seems to be no way to kill it besides restarting Unraid.
Does anyone have any idea where to look or what might be up? FWIW, generating this diagnostics file took 2+ minutes, but apart from files not moving properly, my array and dockers appear to be functioning fine...
unraid-diagnostics-20240412-1108.zip
polishprocessors Posted April 12, 2024 Author (edited)
I should note: the mover worked fine before the drive removal, shares are set to Cache > Array, and nothing changed on the config side besides the removal of a drive...
I'm also now running New Permissions, because I moved at least one set of files from /mnt/cache/media/movies to /mnt/disk1/media/movies from the CLI and they came through owned by root. That took the better part of 20 minutes but eventually finished. Unraid still thinks the mover is running even though nothing is happening, so I've no idea what to do besides another reboot...
polishprocessors Posted April 12, 2024 Author (edited)
Ok, I realise I'm going at this on my own, but after the reboot I tried excluding disk3 from the share and re-ran the mover. Got some early errors:
Apr 12 12:05:07 unRaid shfs: copy_file: /mnt/cache/media/movies/Defiance (2008)/Defiance (2008) [1080p].mp4 /mnt/disk1/media/movies/Defiance (2008)/Defiance (2008) [1080p].mp4.partial (17) File exists
Apr 12 12:05:07 unRaid move: move_object: /mnt/cache/media/movies/Defiance (2008)/Defiance (2008) [1080p].mp4 Connection refused
But otherwise it looks to be going just fine. Is it possible that drive is bad despite showing green and passing all self-tests? Parity built just fine with that drive, but I did notice that all writes to it were INCREDIBLY slow when I was manually copying (1-4MB/s), while other drives were fine (150MB/s). Those issues only showed up when I was moving files to the drive, not while parity was running, which makes me think the drive may have some sort of logical error rather than a physical one?
JorgeB Posted April 12, 2024
polishprocessors said: "File exists"
This means the file already exists on the destination; the mover won't move any duplicate files.
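For anyone hitting the same "File exists" message, here is a minimal sketch of one way to list files that sit on the cache and also already exist (or have a leftover .partial copy) on an array disk. It is not how the mover itself works; `find_duplicates` is a made-up helper name, and the /mnt/cache/media and /mnt/disk1/media paths are simply the ones from the log in this thread, so adjust them to your own shares.

```python
import os
from pathlib import Path


def find_duplicates(cache_share, disk_shares):
    """Return (cache_file, array_file) pairs where the same relative path
    exists in the cache share and in one of the array disk shares, either
    under the same name or with a '.partial' suffix."""
    cache_share = Path(cache_share)
    hits = []
    for root, _dirs, files in os.walk(cache_share):
        for name in files:
            src = Path(root) / name
            rel = src.relative_to(cache_share)
            for disk_share in disk_shares:
                base = Path(disk_share)
                # Check both the final name and a leftover partial copy.
                for candidate in (base / rel, base / (str(rel) + ".partial")):
                    if candidate.exists():
                        hits.append((src, candidate))
    return hits


if __name__ == "__main__":
    # Example paths taken from the log in this thread (assumption: your
    # share layout matches); nothing is printed if the paths do not exist.
    for src, dup in find_duplicates("/mnt/cache/media", ["/mnt/disk1/media"]):
        print(f"{src} is already present as {dup}")
```

The script only reads directory listings and never deletes anything, so deciding which copy to keep is still up to you.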
polishprocessors Posted April 12, 2024 Author
JorgeB said: "This means the file already exists, mover won't move any duplicate files"
Well fine, but what about the "Connection refused" error? Again, this run is going to other drives (not disk3) without issues, so I don't expect any problems if the issue is just with disk3... I'm going to let the mover complete now and then take the array down to maintenance mode to run an xfs check on disk3...
JorgeB Posted April 12, 2024
Make sure the mover logging is still enabled, run the mover, post new diags.
polishprocessors Posted April 12, 2024 Author (edited)
See attached. The mover finished with no issues with disk3 excluded from shares (just excluded, not explicitly removed from the array). I think I'm going to flip to maintenance mode and check the filesystem on disk3.
unraid-diagnostics-20240412-1357.zip
polishprocessors Posted April 12, 2024 Author
Ok, ran the xfs check and it came back with this:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 6
        - agno = 4
        - agno = 7
        - agno = 5
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
polishprocessors Posted April 12, 2024 Author
On restarting the array, disk1 and disk2 come up immediately, but disk3 takes ages to mount. No errors that I can see; it just takes 3+ minutes to mount where disk1/2/4 took fractions of a second. Attaching another diagnostics file...
unraid-diagnostics-20240412-1318.zip
JorgeB Posted April 12, 2024
Mover appears to be working now, or is it still not?
It's strange that disk3 takes so long to mount, but I don't see any errors. It could be a disk problem; the disk may not be performing normally. You can try running the diskspeed docker to test.
polishprocessors Posted April 12, 2024 Author (edited)
I did, and it all looks good... Really mysterious, these issues. I'm nervous about adding disk3 back into the shares but might have to at this point...
polishprocessors Posted April 12, 2024 Author
Hmm... perhaps of some note: I turned on alerting for SMART command timeouts and immediately got the following alert:
188 Command timeout 0x0032 100 099 000 Old age Always Never 4 4 5
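A note on the "4 4 5" at the end of that SMART line. On some drives the raw value of attribute 188 is a single 48-bit number that gets displayed as three 16-bit fields, which would explain seeing three small numbers instead of one. The sketch below only shows that packing arithmetic. It assumes the three numbers are 16-bit fields listed most significant first, which the thread does not confirm, and it says nothing about what each field means, since that is vendor-specific. `pack_raw48` and `unpack_raw48` are hypothetical helper names.

```python
def pack_raw48(words):
    """Combine three 16-bit fields (most significant first, an assumption)
    into one 48-bit integer."""
    if len(words) != 3 or any(not 0 <= w <= 0xFFFF for w in words):
        raise ValueError("expected three values in the range 0..65535")
    high, mid, low = words
    return (high << 32) | (mid << 16) | low


def unpack_raw48(value):
    """Split a 48-bit integer back into three 16-bit fields."""
    return ((value >> 32) & 0xFFFF, (value >> 16) & 0xFFFF, value & 0xFFFF)


# The three raw numbers from the alert in this thread.
raw_words = [int(x) for x in "4 4 5".split()]
packed = pack_raw48(raw_words)
print(packed)                # 17180131333
print(unpack_raw48(packed))  # (4, 4, 5)
```

This is why a tool that shows the raw value as one number can report something in the billions for a drive that has only logged a handful of timeouts.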
polishprocessors Posted April 12, 2024 Author
Ok, I'll add one more thing for now and then try to stop clogging this thread until I get more info or someone has better ideas on what to do. All data was already moved off this drive, but I tried moving a sample of data between the working disks and disk3. disk1 and disk2 are fine: a copy from disk1 > disk2 runs at 50MB/s. disk1 > disk3, however, shows 0.2MB/s in unbalanced, and the Unraid dashboard shows 0MB/s with bursts of 200K-1M. Reading seems to work fine; it's just writing to disk3 that appears not to work.
JorgeB Posted April 12, 2024
Could be a problem with the disk and writes; diskspeed only tests reads.
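Since the read-only benchmark can't catch a write problem, a rough write test is one way to compare disks. The following is a minimal sketch, not a replacement for a proper benchmark: `write_speed_mb_s` is a hypothetical helper that writes a temporary file into a directory, forces it to disk with fsync, times it, and deletes the file afterwards. The /mnt/disk1 and /mnt/disk3 paths are the disk mounts mentioned in this thread; writing there directly bypasses the user shares, and the default 256 MB test size is an arbitrary choice.

```python
import os
import time


def write_speed_mb_s(directory, size_mb=256, chunk_mb=4):
    """Write about size_mb of random data into directory, fsync it, and
    return the average write speed in MB/s. The test file is removed."""
    path = os.path.join(directory, "write_test.tmp")
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    written = 0
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            while written < size_mb:
                f.write(chunk)
                written += chunk_mb
            f.flush()
            os.fsync(f.fileno())  # make sure the data really hit the disk
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written / elapsed


if __name__ == "__main__":
    # Disk mount points from this thread; skipped if they do not exist.
    for disk in ("/mnt/disk1", "/mnt/disk3"):
        if os.path.isdir(disk):
            print(f"{disk}: {write_speed_mb_s(disk):.1f} MB/s")
```

On a disk behaving like disk3 in this thread the test could take a very long time, so starting with a small `size_mb` is safer.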
polishprocessors Posted April 12, 2024 Author
Sucks, but yeah, looks like it might be... Had another drive with reallocated sectors I took out of the array, though, so if I'm down 18TB in a week that's gonna suck...
polishprocessors Posted April 15, 2024 Author
Can confirm, there is clearly something hokey with that drive. Even reads (after I copied some files over at 0.2MB/s using unbalanced and then wanted to move them back) were sometimes up to 50MB/s, but sometimes more like 0.2MB/s as well. I was eventually able to copy all the files off that drive, then excluded it from shares on the array, and a replacement drive is on the way.
In the meantime I added another (smaller) drive, and that wrote zeros, formatted, and joined the array with no issues. So yes, I'm pretty sure I just got extremely unlucky and had 2 drives fail at once. Neither was a hard failure, just the beginnings of a slow one, so I'm not out any data, but it does make me wonder if dual parity is warranted...