Mover skipping files, lost a VM because of failed cache drive

Farvneho · April 28, 2021

I have been using Unraid for about 2 years, with 4 platter drives and 1 SSD cache drive. On Sunday the 10-year-old SSD failed and took with it an entire VM, whose vdisk was never being moved to the array as it should have been.

I am not using Mover Helper. The share settings are as follows:

And here are the logs from the last mover run before the drive failed:

Apr 25 03:40:07 holly move: move: file /mnt/cache/appdata/swag/keys/letsencrypt
Apr 25 03:40:07 holly move: move_object: /mnt/cache/appdata/swag/keys/letsencrypt No such file or directory
Apr 25 03:40:07 holly move: move: file /mnt/cache/appdata/swag/etc/letsencrypt/keys/0015_key-certbot.pem
Apr 25 03:40:07 holly move: move: file /mnt/cache/appdata/swag/etc/letsencrypt/csr/0015_csr-certbot.pem
Apr 25 03:40:07 holly move: move: skip /mnt/cache/domains/Debian/vdisk1.img
Apr 25 03:40:07 holly root: mover: finished
Apr 25 14:16:54 holly emhttpd: read SMART /dev/sdb
Apr 25 14:30:30 holly kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 25 14:30:30 holly kernel: ata4.00: failed command: FLUSH CACHE
Apr 25 14:30:30 holly kernel: ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 17
Apr 25 14:30:30 holly kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 25 14:30:30 holly kernel: ata4.00: status: { DRDY }
Apr 25 14:30:30 holly kernel: ata4: hard resetting link
Apr 25 14:30:35 holly kernel: ata4: link is slow to respond, please be patient (ready=0)
Apr 25 14:30:40 holly kernel: ata4: COMRESET failed (errno=-16)
Apr 25 14:30:40 holly kernel: ata4: hard resetting link
Apr 25 14:30:45 holly kernel: ata4: link is slow to respond, please be patient (ready=0)
Apr 25 14:30:50 holly kernel: ata4: COMRESET failed (errno=-16)
Apr 25 14:30:50 holly kernel: ata4: hard resetting link
Apr 25 14:30:55 holly kernel: ata4: link is slow to respond, please be patient (ready=0)
Apr 25 14:31:25 holly kernel: ata4: COMRESET failed (errno=-16)
Apr 25 14:31:25 holly kernel: ata4: limiting SATA link speed to 3.0 Gbps
Apr 25 14:31:25 holly kernel: ata4: hard resetting link

Every time the mover reaches that vdisk, it seems to skip it.
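To check how often the mover skipped the same file, the syslog can be scanned for its "skip" lines. A rough Python sketch, assuming the "move: move: skip <path>" format shown in the excerpt above (other Unraid versions may log differently) and a syslog at /var/log/syslog:

```python
import re
import sys
from collections import Counter

# Matches mover lines in the format seen in the excerpt above, e.g.
# "Apr 25 03:40:07 holly move: move: skip /mnt/cache/domains/Debian/vdisk1.img"
# The format is an assumption taken from that excerpt.
SKIP_RE = re.compile(r"\bmove: move: skip (?P<path>/\S.*)$")


def skipped_files(log_lines):
    """Count how often each path was reported as skipped by the mover."""
    counts = Counter()
    for line in log_lines:
        match = SKIP_RE.search(line.rstrip("\n"))
        if match:
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python mover_skips.py /var/log/syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for path, n in skipped_files(fh).most_common():
            print(f"{n:4d}  {path}")
```

A file that shows up on every mover run is one the mover never managed to move.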

I'm planning to install a new cache drive this weekend, and I don't want this to happen again. What could have caused this?

Thanks for your help.

trurl · April 28, 2021

The screenshot of your user share is not for the domains share, which, as you say, is being skipped. Usually you want that share to stay on cache, along with the appdata and system shares. Keeping the appdata, domains, and system shares on cache allows the array to spin down, since these files are always open, and can improve Docker and VM performance, since they are on faster disks not affected by parity updates.
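Which way the mover moves a share depends on that share's "Use cache" setting. A hedged Python sketch that explains the setting from a share's config file. The assumptions (not confirmed in this thread) are that Unraid 6.x stores each share's settings as key="value" lines in a file such as /boot/config/shares/domains.cfg, and that the cache choice is stored under the key shareUseCache; check your own file first:

```python
import sys

# Assumption: per-share .cfg files contain key="value" lines and the
# "Use cache" choice is stored under shareUseCache. Verify on your system.
MOVER_ACTION = {
    "no": "new files are written to the array; mover leaves the share alone",
    "yes": "new files are written to cache; mover moves them cache -> array",
    "only": "new files are written to cache; mover leaves the share alone",
    "prefer": "new files are written to cache; mover moves them array -> cache",
}


def parse_share_cfg(text):
    """Parse key="value" lines into a dict, ignoring blank lines and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def describe_cache_mode(cfg_text):
    """Explain what the mover does for a share, based on shareUseCache."""
    mode = parse_share_cfg(cfg_text).get("shareUseCache")
    if mode is None:
        return "shareUseCache not set"
    return MOVER_ACTION.get(mode.lower(), f"unrecognised setting {mode!r}")


if __name__ == "__main__":
    # Usage (hypothetical path): python share_mode.py /boot/config/shares/domains.cfg
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(describe_cache_mode(fh.read()))
```

For a domains share you want on the array, "yes" is the setting that asks the mover to move it there; "prefer" and "only" keep it on cache.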

The ata4 errors look like a bad connection on one of your disks, but I can't say for sure which one.
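One way to narrow down which disk sits on a port like ata4 on Linux is that a SATA disk's resolved /sys/block/sdX path normally contains its ataN port. A rough Python sketch under that assumption, which also counts "COMRESET failed" lines per port in a syslog like the one above (run it on the server itself, since /sys is only meaningful there):

```python
import os
import re
import sys
from collections import Counter

# Matches kernel lines like "ata4: COMRESET failed (errno=-16)".
COMRESET_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: COMRESET failed")
# A libata disk's sysfs path contains its port, e.g. .../ata4/host3/.../block/sdb
PORT_RE = re.compile(r"/(ata\d+)/")


def comreset_failures(log_lines):
    """Count 'COMRESET failed' messages per ATA port."""
    counts = Counter()
    for line in log_lines:
        match = COMRESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def ata_port_of(sysfs_path):
    """Extract the ataN component from a resolved /sys/block/sdX path."""
    match = PORT_RE.search(sysfs_path)
    return match.group(1) if match else None


def disks_by_ata_port(sys_block="/sys/block"):
    """Map ataN -> [sdX, ...] by resolving the /sys/block symlinks (Linux only)."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue
        port = ata_port_of(os.path.realpath(os.path.join(sys_block, name)))
        if port:
            mapping.setdefault(port, []).append(name)
    return mapping


if __name__ == "__main__":
    # Usage: python ata_errors.py /var/log/syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        failures = comreset_failures(fh)
    ports = disks_by_ata_port()
    for port, n in failures.most_common():
        print(f"{port}: {n} COMRESET failures -> {ports.get(port, ['unknown'])}")
```

Port numbering can change after a reboot or recabling, so the mapping is only valid for the boot the log came from.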

All the information we need to figure out your configuration and situation can easily be given to us, and it is usually better than any screenshots or syslog snippets, which are seldom sufficient.

If possible, before rebooting and preferably with the array started, go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

Doridian · April 28, 2021

Keep in mind one thing about unRAID and cache: Files exist either only in cache, or only in array (they get moved between them, not copied).

Also, files that are actively in use are not touched by the mover. For example, if your VM is running, its disk will be mounted and therefore deemed "untouchable".
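To illustrate what "in use" means here: on Linux, a running VM's QEMU process holds the vdisk open, and that can be detected by walking /proc/<pid>/fd, roughly what tools like fuser or lsof do. This Python sketch is not Unraid's mover code (I haven't checked how the mover detects open files); it only shows the idea:

```python
import os
import sys


def open_by_any_process(path, proc="/proc"):
    """Return True if any process we are allowed to inspect has `path` open.

    Linux only: walks /proc/<pid>/fd and compares each descriptor's target
    with the resolved path. Run as root to see every process.
    """
    target = os.path.realpath(path)
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return True
            except OSError:  # descriptor closed while we were looking
                continue
    return False


if __name__ == "__main__":
    # Usage: python in_use.py /mnt/cache/domains/Debian/vdisk1.img
    state = "in use" if open_by_any_process(sys.argv[1]) else "not open"
    print(f"{sys.argv[1]}: {state}")
```

With the VM running, the vdisk would report "in use" for as long as the VM is up, which matches it being skipped on every mover run.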

So, you pretty much have the choice (if your VM is running 24/7 and without external backups):

- Keep your VM disk on cache only and either accept the risk of data loss or make it a RAID1 cache pool

- Keep your VM on the array only and deal with the speed ramifications this may cause
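If you go the RAID1 cache pool route, keep in mind that mirroring halves the space. A small Python sketch of the usual btrfs raid1 capacity estimate, assuming the pool uses btrfs raid1 (typically the default for a multi-device Unraid cache pool): every chunk is stored on two different devices, so usable space is half the total, unless one device is larger than all the others combined:

```python
def btrfs_raid1_usable(sizes_gb):
    """Approximate usable capacity of a btrfs raid1 pool, in the same unit as the inputs.

    Each chunk is written to two different devices, so usable space is
    total / 2, except that space on a device larger than all the others
    combined cannot be mirrored (total - largest).
    """
    if len(sizes_gb) < 2:
        raise ValueError("raid1 needs at least two devices")
    total = sum(sizes_gb)
    return min(total / 2, total - max(sizes_gb))


if __name__ == "__main__":
    print(btrfs_raid1_usable([500, 500]))   # two equal SSDs
    print(btrfs_raid1_usable([500, 1000]))  # mismatched pair
```

So pairing a 500 GB and a 1 TB SSD still gives about 500 GB usable. RAID1 protects against a drive failing, but not against a vdisk being deleted or corrupted, since that change is mirrored too.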

trurl · April 29, 2021

Or keep these things on cache and have them backed up to the array. The CA Backup plugin will take care of appdata; I think there is also a VM Backup plugin, but I haven't used it.
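The plugins mentioned above are the simpler route. Purely to show the idea of a cache-to-array backup with rotation, here is a hypothetical Python sketch. The function name and the paths in the usage line are illustrative, not from any plugin, and it assumes the VM has been shut down before the copy:

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_vdisk(vdisk, dest_dir, keep=3, now=None):
    """Copy `vdisk` into `dest_dir` under a timestamped name, keeping the newest `keep` copies.

    Shut the VM down first: copying a vdisk while the guest is writing to it
    can produce an inconsistent image. shutil.copy2 may also not preserve
    sparseness, so each copy can take the vdisk's full provisioned size.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    vdisk, dest_dir = Path(vdisk), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"{vdisk.stem}-{stamp}{vdisk.suffix}"
    shutil.copy2(vdisk, target)
    # Fixed-width timestamps sort chronologically, so the oldest copies come first.
    copies = sorted(dest_dir.glob(f"{vdisk.stem}-*{vdisk.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target


if __name__ == "__main__":
    # Hypothetical paths: the vdisk on cache and a backup folder on an array disk.
    backup_vdisk("/mnt/cache/domains/Debian/vdisk1.img", "/mnt/disk1/backups/Debian")
```

Keeping several dated copies means a copy taken after something went wrong inside the guest does not overwrite the last good one.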

To get your Docker containers going again exactly as they were, you just need appdata and the saved templates, which are on flash. You should always have a flash backup, of course.

Farvneho · May 18, 2021

On 4/28/2021 at 3:47 PM, Doridian said:

Keep in mind one thing about unRAID and cache: Files exist either only in cache, or only in array (they get moved between them, not copied).

Also, files that are actively in use are not touched by the mover. For example, if your VM is running, its disk will be mounted and therefore deemed "untouchable".

So, you pretty much have the choice (if your VM is running 24/7 and without external backups):

- Keep your VM disk on cache only and either accept the risk of data loss or make it a RAID1 cache pool

- Keep your VM on the array only and deal with the speed ramifications this may cause

Thank you for this. I think this is why my VM was lost. From now on I am going to use a cache pool, so there is at least some redundancy even when a VM is running from the cache. I could go to the trouble of shutting the VM down before the mover starts, but it's a Minecraft server and people like to AFK.
