flyize Posted March 22, 2021

I started noticing some issues with my Plex server, things like transcoder timeouts. Fast forward to now, running 6.9.1. I tried to shut down the server the other day. It hung forever, but the UI was still up. I noticed the mover was running, so I issued the 'mover stop' command, which worked fine. However, the server refused to shut down cleanly (and still does). I just noticed the mover hung again, so I'm hoping you guys can help.

truffle-diagnostics-20210322-1650.zip
flyize (Author) Posted March 23, 2021

Okay, it seems the Mover isn't working because the files were somehow moved to the array but never removed from the cache drive. Is there a way to clean this up without resorting to doing it manually?
JorgeB Posted March 23, 2021

Mover won't move existing files, and if files exist on both source and destination you need to manually delete one of them.
flyize (Author) Posted March 23, 2021

Right. This has suddenly happened to a bunch of files. I'd like to know why, and if possible, a way to fix it without having to compare directories manually.
John_M Posted March 23, 2021

57 minutes ago, flyize said:
"Right. This has suddenly happened to a bunch of files. I'd like to know why, and if possible, a way to fix it without having to compare directories manually."

Suppose you have a Cache:yes User share called, say, "Media". If you have an application that writes new files to /mnt/cache/Media, they will be moved to the array when the Mover runs. However, if a file of the same name already exists on the array, you will have a duplicate that can't be moved. The solution would be to make sure the application writes to /mnt/user/Media instead. Then, when a file with the same name as an existing one is stored, the old file will be overwritten. I'm not saying that's necessarily what's happening in your case, but it would explain what you're seeing.
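A quick way to spot such duplicates from the Unraid shell is to compare the file list of the cache copy of a share with the file list of the array copy. A minimal sketch (the share name "Media" and the helper name `find_dupes` are illustrative assumptions; /mnt/user0 is the array-only view of user shares):

```shell
# find_dupes: print relative paths that exist in BOTH trees.
# On an Unraid server you might compare, e.g.:
#   find_dupes /mnt/cache/Media /mnt/user0/Media
# ("Media" is a hypothetical share name.)
find_dupes() {
    src="$1"; dst="$2"
    tmp1=$(mktemp); tmp2=$(mktemp)
    (cd "$src" && find . -type f | sort) > "$tmp1"
    (cd "$dst" && find . -type f | sort) > "$tmp2"
    comm -12 "$tmp1" "$tmp2"   # lines common to both sorted lists
    rm -f "$tmp1" "$tmp2"
}
```

Once the list looks right, the cache-side copies could be removed by feeding it to rm, but spot-check a few entries by hand first.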
flyize Posted March 23, 2021 Author Share Posted March 23, 2021 (edited) I think I understand all that. With the exception of the last couple of weeks, everything has worked well for months. (And I haven't changed anything that I'm aware of.) Edited March 23, 2021 by flyize Quote Link to comment
ChatNoir Posted March 23, 2021

Did you have unclean shutdowns or cache FS issues? That might have caused copied files not being deleted from the source.
flyize (Author) Posted March 23, 2021

Yeah, the server won't shut down cleanly. I'm not sure if these items are related, which is why I mentioned it.
John_M Posted March 23, 2021

If the server fails to shut down cleanly it should leave a log on the boot flash device, in the logs directory (i.e. /boot/logs from the Unraid command line).
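For example, to grab the most recent capture from that directory (the `newest_log` helper name is just an illustrative wrapper; the actual file names under /boot/logs vary):

```shell
# newest_log: print the most recently modified .zip in a log
# directory. Unraid saves diagnostics from unclean shutdowns
# under /boot/logs on the flash device.
newest_log() {
    ls -t "$1"/*.zip 2>/dev/null | head -n 1
}
# On the server: newest_log /boot/logs
```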
flyize (Author) Posted March 23, 2021

I *think* I got it working. Not sure if this was required, but I booted it in safe mode, brought the array up in maintenance mode, and ran xfs_repair on all the drives. Everything seems to shut down and reboot normally now. I was also able to use a 'finddupes' script here to locate all the duplicate files and remove them from the cache drive.
John_M Posted March 23, 2021

Did xfs_repair find any file system corruption? I see no evidence of it in your diagnostics, but I do see a lot of this:

Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e79
Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e6a
Mar 19 17:30:36 Truffle kernel: EDAC sbridge: Seeking for: PCI ID 8086:0e6b

I don't know if it's bad or benign (either way, it's verbose) and a quick search didn't reveal much. I know that EDAC is usually associated with ECC RAM, but this seems to be scanning the PCI bus. The 8086 manufacturer IDs suggest Intel, but the unit IDs don't match anything in your lspci.txt. Maybe someone else can shed some light.
flyize Posted March 24, 2021 Author Share Posted March 24, 2021 (edited) I didn't see any obvious errors to suggest file system corruption, no. edit: This suggests that I have ECC enabled in the BIOS. The server doesn't have KVM, should I drag it down and check to see if ECC is enabled? http://forums.debian.net/viewtopic.php?f=5&t=141551 Edited March 24, 2021 by flyize Quote Link to comment
flyize (Author) Posted April 6, 2021

Okay, I'm still seeing this issue. I was unable to reboot last night and had to power the server off and back on. Now I turned on the Mover, and it's hung. Can anyone help?

truffle-diagnostics-20210406-0841.zip
flyize (Author) Posted April 6, 2021

It appears that it took the Mover about 3 hours to move a 2GB file. What could be going on here?
trurl Posted April 6, 2021

Unrelated, but your appdata share has files on the array.

1 hour ago, flyize said:
"It appears that it took the Mover about 3 hours to move a 2GB file. What could be going on here?"

Don't see anything like that in your previous diagnostics. Post new diagnostics.
flyize (Author) Posted April 6, 2021

Attached. Although after some thought, I have an idea what the issue might be. I added a new drive. Unfortunately, it's a shingled drive. I think when the Mover runs, it very quickly overwhelms the cache on the SMR drive (cache drive is 1TB). Does that seem logical? If so, would that somehow prevent the machine from properly rebooting?

truffle-diagnostics-20210406-1522.zip
John_M Posted April 6, 2021

3 hours ago, flyize said:
"its a shingled drive. I think when Mover runs, it very quickly overwhelms the cache on the SMR drive (cache drive is 1TB). Does that seem logical?"

The Mover makes sequential writes to the array, not random ones, so the persistent cache should see little use. If the drive can write directly to a shingled band it will do so. SMR disks work rather better in Unraid than in typical RAID applications, which is fortunate because you don't have just one, you have several. Parity 2 is one of them, so maybe that's the problem. I don't mind using them as data disks (where there's an advantage to doing so) but I'm less enthusiastic about using them for parity.
flyize Posted April 7, 2021 Author Share Posted April 7, 2021 (edited) 30 minutes ago, John_M said: The Mover makes sequential writes to the array, not random ones, so the persistent cache should see little use. If the drive can write directly to a shingled band it will do so. SMR disks work rather better in Unraid than in typical RAID applications, which is fortunate because you don't have just one, you have several. Parity 2 is one of them so maybe that's the problem. I don't mind using them as data disks (where there's an advantage to doing so) but I'm less enthusiastic about using them for parity. Well that's interesting. Is it worth moving disks around so that I don't an SMR drive in parity? edit: Mover is still running from this morning. It looks like its taking about 1.5 hours to move a ~2GB file. Edited April 7, 2021 by flyize Quote Link to comment
JorgeB Posted April 7, 2021

That's too slow even for SMR, but that specific model has been known to suffer from very bad performance in some cases. You'd need to do some tests to see if it's a specific disk. But yes, if possible avoid SMR for parity; at the least, it will make it easier to compare write performance between SMR and CMR array disks.
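One way to run that comparison is a crude sequential-write timing against each disk share. A sketch (`write_speed` is a hypothetical helper and the disk numbers are examples; a few GB is needed for a meaningful result on spinning disks):

```shell
# write_speed DIR SIZE_MB: time a sequential, fsync'd write of
# SIZE_MB megabytes into DIR, then print the rate in MB/s.
write_speed() {
    dir="$1"; size_mb="$2"
    start=$(date +%s.%N)
    dd if=/dev/zero of="$dir/speedtest.bin" bs=1M count="$size_mb" conv=fsync 2>/dev/null
    end=$(date +%s.%N)
    rm -f "$dir/speedtest.bin"
    echo "$size_mb $start $end" | awk '{printf "%.1f MB/s\n", $1/($3-$2)}'
}
# On the server, compare a suspect SMR disk against a CMR one, e.g.:
#   write_speed /mnt/disk2 2048
#   write_speed /mnt/disk5 2048
```

Writes to /mnt/diskN still update parity, so this measures array write speed including the parity drives, which is what matters here.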
flyize Posted April 7, 2021 Author Share Posted April 7, 2021 (edited) Can I just remove the 2nd parity drive without having to rebuild parity? edit: Also, I just disabled Docker and VMs, and now the Mover is flying again. Any chance that would offer clues as to what is going on? Edited April 7, 2021 by flyize updates Quote Link to comment
trurl Posted April 7, 2021

12 minutes ago, flyize said:
"I just disabled Docker and VMs, and now the Mover is flying again."

Mover can't move open files.

14 minutes ago, flyize said:
"Can I just remove the 2nd parity drive without having to rebuild parity?"

If the problem disk is parity2, then stop the array, unassign parity2, start the array.
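To see exactly which processes are holding files open on the cache before running the Mover, lsof can help. A sketch (`open_files` is a hypothetical wrapper, not a built-in):

```shell
# open_files DIR: list process name, PID and path for every file
# currently held open somewhere under DIR.
open_files() {
    lsof +D "$1" 2>/dev/null | awk 'NR>1 {print $1, $2, $NF}' | sort -u
}
# On the server: open_files /mnt/cache
# With Docker and VMs stopped, this list should be (near) empty.
```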
flyize (Author) Posted April 7, 2021

It *was* moving files, just very slowly. With Docker/VMs disabled, it's moving them much faster.
flyize (Author) Posted April 7, 2021

As for just ripping parity2 out, I get this message:

"Start will disable the missing disk and then bring the array on-line. Install a replacement disk as soon as possible."

Can someone confirm that parity will be maintained by the other parity drive?
trurl Posted April 7, 2021

26 minutes ago, flyize said:
"Can someone confirm that parity will be maintained by the other parity drive?"

Yes. If you aren't planning to replace parity2, then New Config without it and check the box saying parity is valid.
flyize (Author) Posted April 7, 2021

34 minutes ago, trurl said:
"If you aren't planning to replace parity2 then New Config without it and check the box saying parity is valid."

My plan was going to be to move parity2 into the array, use unBALANCE to move the data off a non-SMR drive, then add that drive as parity2. Will that work?