Having two issues. Mover hangs and the server won't reboot



I started noticing some issues with my Plex server, things like transcoder timeouts. Fast forward to now.

 

Running 6.9.1.

 

I tried to shut down the server the other day. It hung forever, but the UI was still up. I noticed the mover was running, so I issued the 'mover stop' command, which worked fine. However, the server refused to shut down cleanly (and still does). I just noticed the mover has hung again, so I'm hoping you guys can help.
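For anyone hitting the same hang, a quick way to check whether the mover is what's keeping the server busy before shutting down. This is only a sketch; the 'status' argument is an assumption about the stock /usr/local/sbin/mover script, and the pgrep line is the fallback that doesn't rely on it:

mover status          # assumed subcommand of the stock mover script; ignore if unsupported
pgrep -fl mover       # fallback: list any running mover process directly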

truffle-diagnostics-20210322-1650.zip

Link to comment
57 minutes ago, flyize said:

Right. This has suddenly happened to a bunch of files. I'd like to know why, and if possible, a way to fix it without having to compare directories manually.

 

Suppose you have a Cache: Yes user share called, say, "Media". If you have an application that writes new files to /mnt/cache/Media, they will be moved to the array when the Mover runs. However, if a file of the same name already exists on the array, you will have a duplicate that can't be moved. The solution would be to make sure the application writes to /mnt/user/Media instead. Then, when a file with the same name as an existing one is stored, the old file will be overwritten. I'm not saying that's necessarily what's happening in your case, but it would explain what you're seeing.
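As a rough illustration (not necessarily what happened here), something like this would list files that exist on both the cache and the array for a share named "Media"; the share name is just an example, and /mnt/user0 is assumed to be the array-only view of user shares:

cd /mnt/cache/Media
find . -type f | while read -r f; do
    # /mnt/user0 excludes the cache, so a hit here means the array already has a copy
    [ -e "/mnt/user0/Media/$f" ] && echo "duplicate: $f"
done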

 

Link to comment

I *think* I got it working.

 

Not sure if this was required, but I booted it in safe mode, brought the array up in maintenance mode, and ran xfs_repair on all the drives. Everything seems to shut down and reboot normally now.
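For anyone following along, a sketch of that check, assuming the array is started in Maintenance mode and that Unraid presents the array disks as /dev/md1, /dev/md2, and so on (repairing through those devices keeps parity in sync):

xfs_repair -n /dev/md1    # -n reports problems without changing anything; repeat for each data disk
xfs_repair /dev/md1       # drop -n to actually repair once the output has been reviewed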

 

I was also able to use a 'finddupes' script here to locate all the duplicate files and remove them from the cache drive.
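For reference, a cautious way to clear cache-side duplicates by hand, assuming the array copy is the one to keep and using the same hypothetical "Media" share as above; the leading 'echo' keeps it a dry run:

cd /mnt/cache/Media
find . -type f | while read -r f; do
    # only touch the cache copy when the array already has a file of the same name
    [ -f "/mnt/user0/Media/$f" ] && echo rm -v "$f"    # remove the 'echo' to actually delete
done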

Link to comment

Did xfs_repair find any file system corruption? I see no evidence of it in your diagnostics, but I do see a lot of this:

 

Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e79
Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e6a
Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e6b

 

I don't know if it's bad or benign (either way, it's verbose) and a quick search didn't reveal much. I know that EDAC is usually associated with ECC RAM, but this seems to be scanning the PCI bus. The 8086 vendor ID is Intel's, but the device IDs don't match anything in your lspci.txt. Maybe someone else can shed some light.
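One way to confirm whether those devices are actually present on the box (the IDs below are taken straight from the log lines above):

lspci -nn -d 8086:0e79
lspci -nn -d 8086:0e6a
lspci -nn -d 8086:0e6b
# no output means the device isn't in the system, which would make the messages noisy but harmless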

 

Link to comment
  • 2 weeks later...
3 hours ago, flyize said:

It's a shingled drive. I think when the Mover runs, it very quickly overwhelms the cache on the SMR drive (the cache drive is 1TB). Does that seem logical?

 

The Mover makes sequential writes to the array, not random ones, so the persistent cache should see little use. If the drive can write directly to a shingled band it will do so. SMR disks work rather better in Unraid than in typical RAID applications, which is fortunate because you don't have just one, you have several. Parity 2 is one of them so maybe that's the problem. I don't mind using them as data disks (where there's an advantage to doing so) but I'm less enthusiastic about using them for parity.

 

Link to comment
30 minutes ago, John_M said:

 

The Mover makes sequential writes to the array, not random ones, so the persistent cache should see little use. If the drive can write directly to a shingled band it will do so. SMR disks work rather better in Unraid than in typical RAID applications, which is fortunate because you don't have just one, you have several. Parity 2 is one of them so maybe that's the problem. I don't mind using them as data disks (where there's an advantage to doing so) but I'm less enthusiastic about using them for parity.

 

Well, that's interesting. Is it worth moving disks around so that I don't have an SMR drive in parity?

 

edit: Mover is still running from this morning. It looks like it's taking about 1.5 hours to move a ~2GB file.

Edited by flyize
Link to comment

That's too slow even for SMR, but that specific model has been known to suffer from very bad performance in some cases, so you'd need to do some tests to see if it's a specific disk. But yes, if possible avoid SMR for parity; at the very least it will make it easier to compare write performance between SMR and CMR array disks.
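A rough way to run those tests, writing a few gigabytes directly to individual disk shares so SMR and CMR data disks can be compared under normal parity-protected writes (the disk numbers are placeholders; pick disks with enough free space):

dd if=/dev/zero of=/mnt/disk1/speedtest.tmp bs=1M count=4096 oflag=direct status=progress
dd if=/dev/zero of=/mnt/disk2/speedtest.tmp bs=1M count=4096 oflag=direct status=progress
rm /mnt/disk1/speedtest.tmp /mnt/disk2/speedtest.tmp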

Link to comment

Can I just remove the 2nd parity drive without having to rebuild parity?

 

edit: Also, I just disabled Docker and VMs, and now the Mover is flying again. Any chance that would offer clues as to what is going on?

Edited by flyize
updates
Link to comment
12 minutes ago, flyize said:

I just disabled Docker and VMs, and now the Mover is flying again.

Mover can't move open files.
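If you want to see what was holding them open, something like this, run while Docker and the VMs are up, would list the culprits (lsof can take a while on a large cache):

lsof +D /mnt/cache 2>/dev/null | awk 'NR>1 {print $1, $2, $9}' | sort -u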

 

14 minutes ago, flyize said:

Can I just remove the 2nd parity drive without having to rebuild parity?

If the problem disk is parity2, then stop the array, unassign parity2, and start the array.

Link to comment

As for just ripping parity2 out, I get this message:

 

"Start will disable the missing disk and then bring the array on-line. Install a replacement disk as soon as possible."

 

Can someone confirm that parity will be maintained by the other parity drive?

Link to comment
34 minutes ago, trurl said:

yes

 

If you aren't planning to replace parity2, then do a New Config without it and check the box saying parity is valid.

My plan was to move parity2 into the array, use unBALANCE to move data off a non-SMR drive, then add that drive as parity2. Will that work?

Link to comment
