zaraki1311 Posted December 21, 2018

Hello,

Just posting to see if anyone has any thoughts as to what happened, or how to prevent this issue in the future. Last night, just before midnight, I got the errors below on my cache pool. The pool consists of four 250GB Samsung 860 EVOs that are fairly well aged, somewhere between 2-3 years of power-on time each, and it is set up as btrfs RAID 1. When I looked at my Unraid server this morning there was an error on the cache pool saying there was no file system. I rebooted the server and it still did not come back. The only thing I see with the disks is that one of them has some CRC errors in its SMART data. I decided to format the pool so I could get up and running again, but I need to know if there is something I can do to protect myself in the future, or if this might just be a drive failing and I should get new ones and move the cache to a different RAID level.

Dec 20 23:37:26 ArlongPark kernel: BTRFS critical (device sdg1): corrupt leaf: root=2 block=723569819648 slot=105, unexpected item end, have 12754 expect 12818
Dec 20 23:37:26 ArlongPark kernel: BTRFS: error (device sdg1) in __btrfs_free_extent:6953: errno=-5 IO failure
Dec 20 23:37:26 ArlongPark kernel: BTRFS info (device sdg1): forced readonly
Dec 20 23:37:26 ArlongPark kernel: BTRFS: error (device sdg1) in btrfs_run_delayed_refs:3058: errno=-5 IO failure
Dec 21 03:40:01 ArlongPark crond[2324]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
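For reference, the CRC counter mentioned above is SMART attribute 199 (UDMA_CRC_Error_Count); a rising count there usually points at a bad SATA cable or backplane connection rather than the drive itself. A minimal sketch of how to check both that counter and btrfs's own per-device error counters, assuming `/dev/sdg` is the suspect SSD and `/mnt/cache` is the pool mount (substitute your own names):

```shell
#!/bin/sh
# Assumed names: /dev/sdg is the suspect cache SSD, /mnt/cache the pool mount.
DEV=/dev/sdg
POOL=/mnt/cache

check_pool_health() {
    # SMART attribute 199: interface CRC errors (often cabling, not the drive)
    smartctl -A "$DEV" | grep -i crc
    # btrfs per-device error counters; nonzero write/flush errors are serious
    btrfs device stats "$POOL"
}

# Uncomment once DEV/POOL match your system:
# check_pool_health
```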
trurl Posted December 21, 2018

Syslog snippets are seldom sufficient. Did you get Diagnostics before rebooting?
zaraki1311 Posted December 21, 2018 (Author)

I did not specifically grab them at the time, but I do have a diagnostics file on the flash drive that contains logs from Dec 17th up to 9:20 this morning.

arlongpark-diagnostics-20181221-0920.zip
JorgeB Posted December 22, 2018

The error points to metadata corruption, but nothing jumps out; the pool appeared to be working normally until the error. Sorry, not much help.
zaraki1311 Posted December 22, 2018 (Author)

Is there anything I should try in order to protect myself better? I have been thinking about adding an NVMe drive or three to the mix and retiring the old drives. In that case, would RAID 5 be a better choice, since it almost seems the RAID 1 had no redundancy at all? Also, is there a good way to back up the cache? I am running a few VMs entirely on the cache, so I am in a bit of a bind, as the Windows backups may or may not have been working. I currently have CA Appdata Backup configured to back up my appdata, but that doesn't cover my VMs.
JorgeB Posted December 22, 2018

4 hours ago, zaraki1311 said:
Is there anything I should try in order to protect myself better?

You should back up frequently. You can either snapshot and use send/receive to another device (this is what I do), or use rsync, for example; both can be scripted to run daily. An example of how to set up snapshots with send/receive is here: https://forums.unraid.net/topic/51703-vm-faq/?do=findComment&comment=523800
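The snapshot plus send/receive approach described above can be sketched roughly as follows. All paths here are assumptions for illustration: `/mnt/cache/domains` is taken to be the VM share (and must be a btrfs subvolume, not a plain directory, for `btrfs send` to work) and `/mnt/disks/backup` a btrfs-formatted backup disk:

```shell
#!/bin/sh
# Rough sketch of a daily cache backup via btrfs snapshot + send/receive.
# Assumed paths: /mnt/cache/domains (a btrfs subvolume holding the VMs),
# /mnt/disks/backup (a btrfs-formatted backup disk).
SRC=/mnt/cache/domains
DEST=/mnt/disks/backup
SNAP="/mnt/cache/domains_snap_$(date +%Y%m%d)"

backup_vms() {
    # btrfs send requires a read-only (-r) snapshot
    btrfs subvolume snapshot -r "$SRC" "$SNAP" || return 1
    btrfs send "$SNAP" | btrfs receive "$DEST" || return 1
}

# Schedule via cron or the User Scripts plugin once the paths exist:
# backup_vms
```

Incremental runs can later pass `-p <previous-snapshot>` to `btrfs send` so only changed blocks are transferred.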