
Weird issues after removing drive from array



Posted

Hey all! I had a drive that (I think) was going bad, so I moved all the data off the emulated disk and then pulled it from the array. I went into New Config, set up a new config without that drive and fired up a parity build, which finished after ~24h for 14TB. So far everything's good. But while the parity build was running I was also downloading things, so my cache drive nearly filled up and I had to pause downloads until parity was finished, because the mover apparently won't run while parity is going. Fast forward 24h and now I have an array where the mover seems to freeze at some point while moving files, perhaps only when writing to disk3, but that also might be a coincidence?

Other than perhaps a correlation with disk3 (which tests out fine), I can see no pattern to when the mover goes wrong, but it keeps stalling and then there seems to be no way to kill it besides restarting Unraid. Does anyone have any idea where to look and/or what might be up? FWIW, generating this diagnostics file also took 2+ minutes, but, besides files not moving about properly, my array/dockers appear to be functioning fine...

unraid-diagnostics-20240412-1108.zip

Posted (edited)

I should note: mover worked fine before the drive removal, shares are set to Cache > Array, and nothing on the config side changed besides removal of a drive...

I'm also now trying to run a New Permissions because I tried to move at least one set of files from /mnt/cache/media/movies > /mnt/disk1/media/movies and, because I did it from the CLI, they came through as owned by root. This took the better part of 20m but eventually finished. Unraid still thinks the mover is running even though nothing's happening, so I've no idea what to do besides another reboot...
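For anyone wanting to check how much ended up owned by root before running New Permissions over everything, here is a minimal sketch in Python. The function name `find_owned_by` is made up for illustration, and it only lists paths, it does not change ownership.

```python
import os


def find_owned_by(root_dir, uid=0):
    """Yield files and directories under root_dir owned by the given uid.

    uid=0 is root. This helper is a hypothetical illustration: it only
    reports paths and never modifies ownership or permissions.
    """
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are inspected, not followed
                if os.lstat(path).st_uid == uid:
                    yield path
            except OSError:
                # The entry vanished or is unreadable; skip it
                continue


if __name__ == "__main__":
    # Path taken from the post above; adjust to the share you copied into
    for path in find_owned_by("/mnt/disk1/media/movies"):
        print(path)
```

If the list is short, it may be quicker to fix just those paths than to wait for a full New Permissions run.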

Edited by polishprocessors
Posted (edited)

Ok, I realise I'm going at this on my own, but on reboot I tried excluding disk3 from the share and re-ran the mover. Got some early errors:

Apr 12 12:05:07 unRaid shfs: copy_file: /mnt/cache/media/movies/Defiance (2008)/Defiance (2008) [1080p].mp4 /mnt/disk1/media/movies/Defiance (2008)/Defiance (2008) [1080p].mp4.partial (17) File exists
Apr 12 12:05:07 unRaid move: move_object: /mnt/cache/media/movies/Defiance (2008)/Defiance (2008) [1080p].mp4 Connection refused

But it's looking to otherwise go just fine. Is it possible that drive is bad despite showing green and passing all self-tests? Parity built just fine with that drive, but I did notice all writes to that drive were going INCREDIBLY slow when I was manually copying (1-4MB/s) versus other drives being fine (150MB/s). But those issues were only when I was moving files to the drive, not when it was running parity, leading me to perhaps believe there's some sort of logical error with the drive, not physical?

Edited by polishprocessors
Posted
2 minutes ago, JorgeB said:

This means the file already exists, mover won't move any duplicate files

Well fine, but what about the Connection Refused error? Again, this is going to other drives (not disk3) without issues, so I don't expect any problems if the issue is just with disk3... I'm going to let the mover complete now and then take the array down to maintenance mode to run an xfs check on disk3...
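To see which files the mover would trip over, a rough sketch like the one below compares the cache copy of a share with the copy on an array disk. It lists files present on both sides and any leftover `.partial` files on the destination (the log line above names a `.partial` path, so a leftover from an interrupted run is one possibility, though that is a guess). The function names are hypothetical and nothing is deleted or moved.

```python
import os


def relative_files(base):
    """Return the set of file paths under base, relative to base."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, base))
    return found


def mover_conflicts(cache_share, disk_share):
    """Report files that exist on both sides, plus leftover .partial files.

    Hypothetical helper for illustration only; it reads directory
    listings and does not touch any file.
    """
    cache_files = relative_files(cache_share)
    disk_files = relative_files(disk_share)
    duplicates = sorted(cache_files & disk_files)
    partials = sorted(f for f in disk_files if f.endswith(".partial"))
    return duplicates, partials


if __name__ == "__main__":
    # Paths taken from the log lines above
    dups, parts = mover_conflicts("/mnt/cache/media/movies",
                                  "/mnt/disk1/media/movies")
    print("On both cache and disk:", *dups, sep="\n  ")
    print("Leftover .partial files:", *parts, sep="\n  ")
```

Anything it reports still needs checking by hand (size, checksum) before deciding which copy to keep.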

Posted

Ok, ran the xfs check on disk3 and it came back with this:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 6
        - agno = 4
        - agno = 7
        - agno = 5
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

 

Posted

Mover appears to be working, or it's still not?

 

Strange about disk3 taking so much time to mount, but don't see any errors, could be a disk problem, it may not be performing normally, you can try running the diskspeed docker to test.

Posted

Ok, I'll add one more thing for now but try to stop clogging this up until I get more info or someone's got better ideas on what to do. All data was already moved off this drive, but I tried moving a sample of data around between the working disks and disk3. disk1 & 2 are fine: a copy from disk1 > disk2 runs at 50MB/s. disk1 > disk3, however, shows .2MB/s in unbalanced, while the Unraid dashboard shows 0MB/s with occasional bursts of 200KB/s-1MB/s. Reading seems to work fine, it's just writing to disk3 that appears not to work.
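A quick way to put a number on this without going through unbalanced is to time a sequential write straight to the disk's mount point. The sketch below is a rough check under a few assumptions: the function name is made up, `/mnt/disk3` is assumed by analogy with the `/mnt/disk1` path earlier in the thread, and a single short write is no substitute for a proper benchmark.

```python
import os
import time


def write_throughput_mb_s(directory, size_mb=64, chunk_mb=1):
    """Write a temporary file into directory and return the speed in MB/s.

    Hypothetical helper for illustration. The file is fsync'ed so the
    timing covers data reaching the disk, and it is removed afterwards.
    """
    path = os.path.join(directory, "throughput_test.tmp")
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    written = 0
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            while written < size_mb * 1024 * 1024:
                written += f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
    finally:
        # Clean up the test file even if the write failed partway
        if os.path.exists(path):
            os.remove(path)
    megabytes = written / (1024 * 1024)
    return megabytes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Mount points assumed to follow the /mnt/diskN pattern
    for mount in ("/mnt/disk1", "/mnt/disk2", "/mnt/disk3"):
        print(mount, round(write_throughput_mb_s(mount), 1), "MB/s")
```

Running it against each disk in turn gives a like-for-like comparison; with the array's parity in play the absolute numbers will be lower than the raw drive speed, so it is the difference between disks that matters.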

Posted

Can confirm, there is clearly something hokey with that drive. Even reads (after I copied some files over at .2MB/s using unbalanced and then wanted to move them back) were sometimes up to 50MB/s, but sometimes more like .2MB/s as well. I eventually was able to copy all the files off that drive, then excluded it from shares on the array, and have a replacement drive on the way. In the meantime I added another (smaller) drive, and that was zeroed, formatted and joined to the array with no issues, so yes, I'm pretty sure I just got extremely unlucky and had 2 drives fail at once. Neither was a hard failure, just the beginnings of a slow one, so I'm not out any data, but it does make me wonder if dual parity is warranted...
