flyize Posted March 22, 2021

I started noticing some issues with my Plex server, things like transcoder timeouts. Fast forward to now, running 6.9.1. I tried to shut down the server the other day. It hung forever, but the UI was still up. I noticed the mover was running, so I issued the 'mover stop' command, which worked fine. However, the server refused to shut down cleanly (and still does). I just noticed the mover hung again, so I'm hoping you guys can help.

truffle-diagnostics-20210322-1650.zip
flyize (Author) Posted March 23, 2021

Okay, it seems the Mover isn't working because the files were somehow moved to the array but never removed from the cache drive. Is there a way to clean this up without resorting to doing it manually?
JorgeB Posted March 23, 2021

Mover won't move existing files, and if files exist on both source and destination you need to manually delete one of them.
flyize (Author) Posted March 23, 2021

Right. This has suddenly happened to a bunch of files. I'd like to know why, and if possible, a way to fix it without having to compare directories manually.
John_M Posted March 23, 2021

57 minutes ago, flyize said:
"Right. This has suddenly happened to a bunch of files. I'd like to know why, and if possible, a way to fix it without having to compare directories manually."

Suppose you have a Cache:yes User share called, say, "Media". If you have an application that writes new files to /mnt/cache/Media, they will be moved to the array when the Mover runs. However, if a file of the same name already exists on the array, you will have a duplicate that can't be moved. The solution would be to make sure the application writes to /mnt/user/Media instead. Then, when a file with the same name as an existing one is stored, the old file will be overwritten. I'm not saying that's necessarily what's happening in your case, but it would explain what you're seeing.
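A quick way to spot such duplicates from the Unraid shell is to compare the file list of the cache copy of a share with the file list of the array copy. A minimal sketch (the share name "Media" and the helper name `find_dupes` are illustrative assumptions; /mnt/user0 is the array-only view of user shares):

```shell
# find_dupes: print relative paths that exist in BOTH trees.
# On an Unraid server you might compare, e.g.:
#   find_dupes /mnt/cache/Media /mnt/user0/Media
# ("Media" is a hypothetical share name.)
find_dupes() {
    src="$1"; dst="$2"
    tmp1=$(mktemp); tmp2=$(mktemp)
    (cd "$src" && find . -type f | sort) > "$tmp1"
    (cd "$dst" && find . -type f | sort) > "$tmp2"
    comm -12 "$tmp1" "$tmp2"   # lines common to both sorted lists
    rm -f "$tmp1" "$tmp2"
}
```

Once the list looks right, the cache-side copies could be removed by feeding it to rm, but spot-check a few entries by hand first.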
flyize Posted March 23, 2021 Author Share Posted March 23, 2021 (edited) I think I understand all that. With the exception of the last couple of weeks, everything has worked well for months. (And I haven't changed anything that I'm aware of.) Edited March 23, 2021 by flyize Quote Link to comment
ChatNoir Posted March 23, 2021

Did you have unclean shutdowns or cache FS issues? That might have caused copied files not being deleted from the source.
flyize (Author) Posted March 23, 2021

Yeah, the server won't shut down cleanly. I'm not sure if these items are related, which is why I mentioned it.
John_M Posted March 23, 2021

If the server fails to shut down cleanly it should leave a log on the boot flash device, in the logs directory (i.e. /boot/logs from the Unraid command line).
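For example, to grab the most recent capture from that directory (the `newest_log` helper name is just an illustrative wrapper; the actual file names under /boot/logs vary):

```shell
# newest_log: print the most recently modified .zip in a log
# directory. Unraid saves diagnostics from unclean shutdowns
# under /boot/logs on the flash device.
newest_log() {
    ls -t "$1"/*.zip 2>/dev/null | head -n 1
}
# On the server: newest_log /boot/logs
```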
flyize (Author) Posted March 23, 2021

I *think* I got it working. Not sure if this was required, but I booted it in safe mode, brought the array up in maintenance mode, and ran xfs_repair on all the drives. Everything seems to shut down and reboot normally now. I was also able to use a 'finddupes' script here to locate all the duplicate files and remove them from the cache drive.
John_M Posted March 23, 2021

Did xfs_repair find any file system corruption? I see no evidence of it in your diagnostics, but I do see a lot of this:

Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e79
Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e6a
Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e6b

I don't know if it's bad or benign (either way, it's verbose) and a quick search didn't reveal much. I know that EDAC is usually associated with ECC RAM, but this seems to be scanning the PCI bus. The 8086 manufacturer IDs suggest Intel, but the unit IDs don't match anything in your lspci.txt. Maybe someone else can shed some light.
flyize Posted March 24, 2021 Author Share Posted March 24, 2021 (edited) I didn't see any obvious errors to suggest file system corruption, no. edit: This suggests that I have ECC enabled in the BIOS. The server doesn't have KVM, should I drag it down and check to see if ECC is enabled? http://forums.debian.net/viewtopic.php?f=5&t=141551 Edited March 24, 2021 by flyize Quote Link to comment
flyize (Author) Posted April 6, 2021

Okay, I'm still seeing this issue. I was unable to reboot last night and had to power the server off and back on. Now I turned on the Mover, and it's hung. Can anyone help?

truffle-diagnostics-20210406-0841.zip
flyize (Author) Posted April 6, 2021

It appears that it took the Mover about 3 hours to move a 2GB file. What could be going on here?
trurl Posted April 6, 2021

Unrelated, but your appdata share has files on the array.

1 hour ago, flyize said:
"It appears that it took the Mover about 3 hours to move a 2GB file. What could be going on here?"

Don't see anything like that in your previous diagnostics. Post new diagnostics.
flyize (Author) Posted April 6, 2021

Attached. Although after some thought, I have an idea what the issue might be. I added a new drive. Unfortunately, it's a shingled drive. I think when the Mover runs, it very quickly overwhelms the cache on the SMR drive (cache drive is 1TB). Does that seem logical? If so, would that somehow prevent the machine from properly rebooting?

truffle-diagnostics-20210406-1522.zip
John_M Posted April 6, 2021

3 hours ago, flyize said:
"its a shingled drive. I think when Mover runs, it very quickly overwhelms the cache on the SMR drive (cache drive is 1TB). Does that seem logical?"

The Mover makes sequential writes to the array, not random ones, so the persistent cache should see little use. If the drive can write directly to a shingled band it will do so. SMR disks work rather better in Unraid than in typical RAID applications, which is fortunate because you don't have just one, you have several. Parity 2 is one of them, so maybe that's the problem. I don't mind using them as data disks (where there's an advantage to doing so) but I'm less enthusiastic about using them for parity.
flyize Posted April 7, 2021 Author Share Posted April 7, 2021 (edited) 30 minutes ago, John_M said: The Mover makes sequential writes to the array, not random ones, so the persistent cache should see little use. If the drive can write directly to a shingled band it will do so. SMR disks work rather better in Unraid than in typical RAID applications, which is fortunate because you don't have just one, you have several. Parity 2 is one of them so maybe that's the problem. I don't mind using them as data disks (where there's an advantage to doing so) but I'm less enthusiastic about using them for parity. Well that's interesting. Is it worth moving disks around so that I don't an SMR drive in parity? edit: Mover is still running from this morning. It looks like its taking about 1.5 hours to move a ~2GB file. Edited April 7, 2021 by flyize Quote Link to comment
JorgeB Posted April 7, 2021

That's too slow even for SMR, but that specific model has been known to suffer from very bad performance in some cases. You'd need to do some tests to see if it's a specific disk. But yes, if possible avoid SMR for parity; at the least, it will make it easier to compare write performance between SMR and CMR array disks.
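One way to run that comparison is a crude sequential-write timing against each disk share. A sketch (`write_speed` is a hypothetical helper and the disk numbers are examples; a few GB is needed for a meaningful result on spinning disks):

```shell
# write_speed DIR SIZE_MB: time a sequential, fsync'd write of
# SIZE_MB megabytes into DIR, then print the rate in MB/s.
write_speed() {
    dir="$1"; size_mb="$2"
    start=$(date +%s.%N)
    dd if=/dev/zero of="$dir/speedtest.bin" bs=1M count="$size_mb" conv=fsync 2>/dev/null
    end=$(date +%s.%N)
    rm -f "$dir/speedtest.bin"
    echo "$size_mb $start $end" | awk '{printf "%.1f MB/s\n", $1/($3-$2)}'
}
# On the server, compare a suspect SMR disk against a CMR one, e.g.:
#   write_speed /mnt/disk2 2048
#   write_speed /mnt/disk5 2048
```

Writes to /mnt/diskN still update parity, so this measures array write speed including the parity drives, which is what matters here.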
flyize Posted April 7, 2021 Author Share Posted April 7, 2021 (edited) Can I just remove the 2nd parity drive without having to rebuild parity? edit: Also, I just disabled Docker and VMs, and now the Mover is flying again. Any chance that would offer clues as to what is going on? Edited April 7, 2021 by flyize updates Quote Link to comment
trurl Posted April 7, 2021

12 minutes ago, flyize said:
"I just disabled Docker and VMs, and now the Mover is flying again."

Mover can't move open files.

14 minutes ago, flyize said:
"Can I just remove the 2nd parity drive without having to rebuild parity?"

If the problem disk is parity2, then stop the array, unassign parity2, start the array.
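To see exactly which processes are holding files open on the cache before running the Mover, lsof can help. A sketch (`open_files` is a hypothetical wrapper, not a built-in):

```shell
# open_files DIR: list process name, PID and path for every file
# currently held open somewhere under DIR.
open_files() {
    lsof +D "$1" 2>/dev/null | awk 'NR>1 {print $1, $2, $NF}' | sort -u
}
# On the server: open_files /mnt/cache
# With Docker and VMs stopped, this list should be (near) empty.
```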
flyize (Author) Posted April 7, 2021

It *was* moving files, just very slowly. With Docker/VMs disabled, it's moving them much faster.
flyize (Author) Posted April 7, 2021

As for just ripping parity2 out, I get this message:

"Start will disable the missing disk and then bring the array on-line. Install a replacement disk as soon as possible."

Can someone confirm that parity will be maintained by the other parity drive?
trurl Posted April 7, 2021

26 minutes ago, flyize said:
"Can someone confirm that parity will be maintained by the other parity drive?"

Yes. If you aren't planning to replace parity2, then New Config without it and check the box saying parity is valid.
flyize (Author) Posted April 7, 2021

34 minutes ago, trurl said:
"If you aren't planning to replace parity2 then New Config without it and check the box saying parity is valid."

My plan was going to be to move parity2 into the array, use unBALANCE to move the data off a non-SMR drive, then add that drive as parity2. Will that work?