Mover skipping files, lost a VM because of failed cache drive


I have been using Unraid for about 2 years, with 4 platter drives and 1 SSD cache. On Sunday, the 10-year-old SSD failed and took an entire VM with it, because that VM's vdisk was never being moved to the array.

I am not using Mover Helper. The share settings are as follows: [screenshot: share settings]

The log from the last mover run before the drive failure shows:

Apr 25 03:40:07 holly move: move: file /mnt/cache/appdata/swag/keys/letsencrypt
Apr 25 03:40:07 holly move: move_object: /mnt/cache/appdata/swag/keys/letsencrypt No such file or directory
Apr 25 03:40:07 holly move: move: file /mnt/cache/appdata/swag/etc/letsencrypt/keys/0015_key-certbot.pem
Apr 25 03:40:07 holly move: move: file /mnt/cache/appdata/swag/etc/letsencrypt/csr/0015_csr-certbot.pem
Apr 25 03:40:07 holly move: move: skip /mnt/cache/domains/Debian/vdisk1.img
Apr 25 03:40:07 holly root: mover: finished
Apr 25 14:16:54 holly emhttpd: read SMART /dev/sdb
Apr 25 14:30:30 holly kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 25 14:30:30 holly kernel: ata4.00: failed command: FLUSH CACHE
Apr 25 14:30:30 holly kernel: ata4.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 17
Apr 25 14:30:30 holly kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 25 14:30:30 holly kernel: ata4.00: status: { DRDY }
Apr 25 14:30:30 holly kernel: ata4: hard resetting link
Apr 25 14:30:35 holly kernel: ata4: link is slow to respond, please be patient (ready=0)
Apr 25 14:30:40 holly kernel: ata4: COMRESET failed (errno=-16)
Apr 25 14:30:40 holly kernel: ata4: hard resetting link
Apr 25 14:30:45 holly kernel: ata4: link is slow to respond, please be patient (ready=0)
Apr 25 14:30:50 holly kernel: ata4: COMRESET failed (errno=-16)
Apr 25 14:30:50 holly kernel: ata4: hard resetting link
Apr 25 14:30:55 holly kernel: ata4: link is slow to respond, please be patient (ready=0)
Apr 25 14:31:25 holly kernel: ata4: COMRESET failed (errno=-16)
Apr 25 14:31:25 holly kernel: ata4: limiting SATA link speed to 3.0 Gbps
Apr 25 14:31:25 holly kernel: ata4: hard resetting link

Every time the mover runs, it seems to skip that vdisk.
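
A minimal Python sketch for confirming that pattern across a longer log, assuming every skip is written in the same "move: move: skip <path>" form as the excerpt above (other Unraid versions may word it differently). The function name count_skipped is hypothetical, not part of Unraid.

```python
import re
from collections import Counter

# Matches mover lines like the one in the excerpt:
#   "Apr 25 03:40:07 holly move: move: skip /mnt/cache/domains/Debian/vdisk1.img"
# ".*\S" keeps paths that contain spaces and drops trailing whitespace/newlines.
SKIP_RE = re.compile(r"move: move: skip (?P<path>/.*\S)")


def count_skipped(lines):
    """Map each path the mover skipped to the number of times it was skipped."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

It accepts any iterable of lines, so an open syslog file (for example /var/log/syslog) can be passed in directly. A path that is skipped on every nightly run is one that never reaches the array.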

I'm planning to install a new cache drive this weekend, and I don't want this to happen again. What could have caused this?

Thanks for your help.

The screenshot of your user share settings is not for the domains share, which, as you say, is the one being skipped. Usually you want that share to stay on cache along with the appdata and system shares. Keeping the appdata, domains, and system shares on cache allows the array disks to spin down, since these files are always open, and can improve Docker/VM performance, since the cache is a faster disk that is not slowed down by parity updates.

The ata4 errors in your log look like a bad connection on one of your disks, but I can't say for sure which one.
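
A minimal Python sketch for narrowing that down, assuming the syslog keeps the libata message format shown in the excerpt above and that, on Linux, a SATA disk's /sys/block/sdX link resolves to a sysfs path that runs through its ataN port. link_errors and port_to_device are hypothetical helpers, and the mapping is best effort only.

```python
import os
import re
from collections import Counter

# libata lines from the excerpt look like:
#   "Apr 25 14:30:40 holly kernel: ata4: COMRESET failed (errno=-16)"
#   "Apr 25 14:30:30 holly kernel: ata4.00: failed command: FLUSH CACHE"
PORT_RE = re.compile(r"kernel: (?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)")

# Messages that often point at the link (cable, port, power),
# although a dying drive can produce them too.
LINK_TROUBLE = ("hard resetting link", "COMRESET failed", "limiting SATA link speed")


def link_errors(lines):
    """Count link-trouble messages per ATA port, e.g. Counter({'ata4': 8})."""
    counts = Counter()
    for line in lines:
        match = PORT_RE.search(line)
        if match and any(key in match.group("msg") for key in LINK_TROUBLE):
            counts[match.group("port")] += 1
    return counts


def port_to_device(port, sys_block="/sys/block"):
    """Best effort: return /dev/sdX whose sysfs path runs through /<port>/, else None."""
    try:
        names = os.listdir(sys_block)
    except OSError:
        return None
    for name in sorted(names):
        real = os.path.realpath(os.path.join(sys_block, name))
        if f"/{port}/" in real:
            return "/dev/" + name
    return None
```

Fed the full excerpt, link_errors reports 8 messages for ata4 (4 hard resets, 3 COMRESET failures, 1 speed downgrade); port_to_device("ata4") then names the disk to check cables on, if the sysfs layout matches the assumption.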

The diagnostics give us all the information we need about your configuration and situation. They are easy to collect and usually better than screenshots or syslog snippets, which are seldom sufficient.

If possible, before rebooting and preferably with the array started, go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

Keep one thing in mind about unRAID and cache: files exist either only on the cache or only on the array (the mover moves them between the two; it doesn't copy them).

Also, the mover does not touch files that are actively in use. For example, if your VM is running, its vdisk is in use, so the mover deems it "untouchable" and skips it.
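
A minimal Python sketch of that skip-if-in-use behaviour, assuming Linux's /proc/<pid>/fd links are an acceptable way to detect open files. This is not Unraid's actual mover; move_share and open_paths are hypothetical names, and the source and destination roots are placeholders.

```python
import os
import shutil


def open_paths():
    """Paths of all files currently open by processes we may inspect (Linux /proc only)."""
    paths = set()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or we lack permission (run as root to see all)
            continue
        for fd in fds:
            try:
                paths.add(os.readlink(os.path.join(fd_dir, fd)))
            except OSError:  # fd closed while we were looking
                continue
    return paths


def move_share(src_root, dst_root, log=print):
    """Move each file under src_root to the same relative path under dst_root,
    skipping any file that is open (e.g. a running VM's vdisk)."""
    in_use = open_paths()
    for dirpath, _dirs, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.realpath(src) in in_use:
                log(f"skip {src}")  # stays on the source, unprotected if that is the cache
                continue
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            # Across filesystems shutil.move copies then deletes, so only one copy remains.
            shutil.move(src, dst)
            log(f"file {src}")
```

A VM that never stops keeps its vdisk open, so a loop like this skips it on every run, which matches the repeated "skip" line in the log. The check and the move are also not atomic: a file opened just after the check could still be moved.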

So, if your VM runs 24/7 and has no external backups, you pretty much have two choices:

- Keep your VM disk on the cache only, and either accept the risk of data loss or make the cache a RAID1 pool

- Keep your VM on the array only, and accept whatever slowdown that may cause
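
To see which files the first choice leaves exposed, here is a minimal Python sketch that lists a share's files sitting on the cache with no copy on any array disk. It assumes the usual Unraid mount points, /mnt/cache for the cache and /mnt/disk1, /mnt/disk2, ... for the array; cache_only_files is a hypothetical helper, and the roots should be adjusted if your pool is named differently.

```python
import glob
import os


def files_under(root):
    """Set of file paths relative to root (empty if root does not exist)."""
    found = set()
    for dirpath, _dirs, filenames in os.walk(root):  # os.walk yields nothing for a missing root
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def cache_only_files(share, cache_root="/mnt/cache", array_glob="/mnt/disk[0-9]*"):
    """Sorted relative paths of `share` files that are on the cache and on no array disk."""
    on_cache = files_under(os.path.join(cache_root, share))
    on_array = set()
    for disk in glob.glob(array_glob):
        on_array |= files_under(os.path.join(disk, share))
    return sorted(on_cache - on_array)
```

For example, cache_only_files("domains") would have listed Debian/vdisk1.img before the SSD failed: anything it reports is lost if the cache dies, unless the pool is redundant or backed up.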

Or keep these things on cache and back them up to the array. The CA Backup plugin will take care of appdata. I think there is also a VM Backup plugin, but I haven't used it.
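
A minimal Python sketch of that backup idea, for anyone who would rather script it than use a plugin: it copies (rather than moves) a vdisk into a folder on the array and keeps only the newest few copies. backup_vdisk and the example paths are hypothetical. Copying a running VM's disk can produce an inconsistent image, so shut the VM down or pause it first if you can; shutil.copy2 may also not preserve a sparse vdisk's holes, so a backup can take up the image's full provisioned size.

```python
import os
import shutil
import time


def backup_vdisk(vdisk, backup_dir, keep=3, stamp=None):
    """Copy `vdisk` into `backup_dir` as <stamp>_<name>, then keep only the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    os.makedirs(backup_dir, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(vdisk)
    target = os.path.join(backup_dir, f"{stamp}_{name}")
    # Copy, not move: the original stays on the cache and the VM keeps using it.
    shutil.copy2(vdisk, target)
    # Stamps in YYYYmmdd-HHMMSS form sort chronologically as plain strings.
    backups = sorted(f for f in os.listdir(backup_dir) if f.endswith("_" + name))
    for old in backups[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return target
```

A call such as backup_vdisk("/mnt/cache/domains/Debian/vdisk1.img", "/mnt/disk1/backups/Debian") run on a schedule would leave the live vdisk on the fast cache while a recent copy sits on the parity-protected array.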

To get your Docker containers going again exactly as they were, you just need appdata and the saved templates, which are on the flash drive. Of course, you should always have a flash backup.

  • 3 weeks later...
On 4/28/2021 at 3:47 PM, Doridian said:

Keep one thing in mind about unRAID and cache: files exist either only on the cache or only on the array (the mover moves them between the two; it doesn't copy them).

Also, the mover does not touch files that are actively in use. For example, if your VM is running, its vdisk is in use, so the mover deems it "untouchable" and skips it.

So, if your VM runs 24/7 and has no external backups, you pretty much have two choices:

- Keep your VM disk on the cache only, and either accept the risk of data loss or make the cache a RAID1 pool

- Keep your VM on the array only, and accept whatever slowdown that may cause

Thank you for this. I think this is why my VM was lost. From now on, I am going to use a cache pool so there is at least some redundancy even while a VM runs from the cache. I could go through the trouble of shutting the VM down before the mover starts, but it's a Minecraft server and people like to AFK.
