Reviving this thread since it happened again, this time with a hard drive, a Seagate ST1000LM035. The disk already had some reallocated sectors when I began using it for this function, but it passed an extended SMART test and worked fine without any issues for a few weeks. This morning I had several emails from last night, the first ones about new reallocated sectors:
18-04-2021 01:21 Unraid Wbackups disk SMART health [5] Warning [TOWER1] - reallocated sector ct is 984 ST1000LM035-1RK172_ZDE5AFTA (sdd)
18-04-2021 01:18 Unraid Wbackups disk SMART health [5] Warning [TOWER1] - reallocated sector ct is 968 ST1000LM035-1RK172_ZDE5AFTA (sdd)
18-04-2021 00:17 Unraid Wbackups disk SMART health [5] Warning [TOWER1] - reallocated sector ct is 960 ST1000LM035-1RK172_ZDE5AFTA (sdd)
Then, because I have a script monitoring all btrfs pools for errors, I also got an email about that:
18-04-2021 01:47 Unraid Status ERRORS on wbackups pool
In this case the errors detected were corruption errors:
root@Tower1:~# btrfs dev stats /mnt/wbackups/
[/dev/sdd1].write_io_errs 0
[/dev/sdd1].read_io_errs 0
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 20
[/dev/sdd1].generation_errs 0
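For anyone wanting a similar check, here is a minimal sketch of how such monitoring could be done. This is not the actual script I use; the function name nonzero_btrfs_stats is made up for illustration. It reads `btrfs dev stats` output on stdin and prints only the counters that are not zero:

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): reads the output of
# `btrfs dev stats <mountpoint>` on stdin, prints every counter line whose
# value is not zero, and returns 1 if at least one such line was found.
nonzero_btrfs_stats() {
    awk 'NF >= 2 && $2 + 0 != 0 { print; bad = 1 } END { exit bad + 0 }'
}

# Example use (assumes the pool is mounted at /mnt/wbackups):
#   btrfs dev stats /mnt/wbackups/ | nonzero_btrfs_stats \
#       || echo "ERRORS on wbackups pool"
```

With the output above it would print only the corruption_errs line and return a non-zero status, which a cron job could use to trigger a notification. Recent btrfs-progs should also have a `--check` option for `btrfs dev stats` that returns non-zero when any counter is not zero, so check your version's man page before parsing the text yourself.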
A scrub this morning confirmed the data corruption:
root@Tower1:~# btrfs scrub status /mnt/wbackups/
UUID: 9c12f50a-ad56-4a61-934a-4b1ee064cae9
Scrub started: Sun Apr 18 12:26:01 2021
Status: finished
Duration: 1:21:57
Total to scrub: 545.20GiB
Rate: 113.59MiB/s
Error summary: csum=650
Corrected: 0
Uncorrectable: 650
Unverified: 0
Note that this is a single-device btrfs filesystem, so there's no redundancy to fix the data corruption (it's only used as another backup destination). Looking at the syslog, it identifies the corrupt file, one of several zip files from a remote backup that are synced to that disk:
Apr 18 12:30:47 Tower1 kernel: BTRFS warning (device sdd1): checksum error at logical 271712112640 on dev /dev/sdd1, physical 38710136832, root 5, inode 5879, offset 162140160, length 4096, links 1 (path: SageBackups/.stversions/503951269202104071502.zip)
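With many checksum errors the syslog gets long, so a small filter helps to list just the affected files. This is a rough sketch that assumes the warning lines look like the one above (the format may differ between kernel versions); the function name btrfs_corrupt_paths is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): reads kernel log
# lines on stdin and prints each distinct file path named in a btrfs
# "checksum error" warning. Assumes the line ends with "(path: <file>)".
btrfs_corrupt_paths() {
    sed -n 's/.*BTRFS warning.*checksum error at logical.*(path: \(.*\))$/\1/p' \
        | sort -u
}

# Example use (assumes the log is at /var/log/syslog):
#   btrfs_corrupt_paths < /var/log/syslog
```

Each file is listed once no matter how many of its blocks failed, which gives a short list of what needs to be restored from another backup.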
So yeah, while this should never happen, i.e., devices should be able to reallocate sectors without corrupting data, here is proof once again that it can happen.