File system I/O errors on cache after switch to ZFS - Suggested Next Steps?

February 2

I've had a rash of file system access issues recently where files become inaccessible; a restart or some fiddling usually resolves them. My cache pool is a pair of Samsung SSD 970 EVO Plus 1TB drives mirrored in ZFS (I recently switched from BTRFS as an experiment). Other than installing ZFS Master, I have not configured automatic snapshots or any other ZFS-specific setup.

Diagnostics and SMART reports are attached. I presume it's a hardware issue, but I'm unsure of the exact next steps to troubleshoot and triage. Perhaps I should remove one of the cache drives?

Errors in ZFS pool

root@UNRAID:/mnt/user/Scratch# zpool status -v
  pool: cache
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:09:21 with 16 errors on Mon Feb  2 11:03:53 2026
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0 27.1K
            nvme0n1p1  ONLINE       0     0 27.1K

errors: Permanent errors have been detected in the following files:

        /mnt/cache/Scratch/vm/isos/haos_ova-13.2.qcow2
        /mnt/cache/Scratch/docker/duplicacy/bin/duplicacy_linux_x64_3.2.5
        /mnt/cache/Scratch/docker/containers/overlay2/89b8086d04d95fd0f21f9937170ae437b3287cd9bdaef19a60ba7728d7d339ed/diff/app/radarr/bin/Radarr.Core.pdb
        /mnt/cache/Scratch/docker/containers/overlay2/59fc7b9bf7f256b860a21f55c080c242ede729a1426482d8e250b2ed86678ee4/diff/app/sonarr/bin/Sonarr.Core.pdb
        /mnt/cache/Scratch/docker/containers/overlay2/8e7e76832d6dc8145f04dcbaf143fb7fa07bbf092a861d7f2e7d2cb1d95daab1/diff/lib/ld-musl-x86_64.so.1
        /mnt/cache/Scratch/docker/containers/overlay2/1e6272eee2a99e4e6e1c44367080fad9cdbc4e85e15a2b03d274f222e2cef33a/diff/ffmpeg/lib/libavformat.so.59
        /mnt/cache/Scratch/docker/containers/overlay2/567a58c5ce269d7b47c3a2fb6feeefa5439f5967713a6d7e2b70ff7ecbb60f24/diff/usr/local/bin/duplicacy_web
        /mnt/cache/Scratch/docker/containers/overlay2/1d414267a5606678ba2e5e1b07b9f90758de45e4af5cf35414add951a77a8658/diff/usr/local/bin/gosu
        /mnt/cache/Scratch/docker/profilarr/db/.git/objects/pack/pack-18d9e5ff64883fdd7223068fa2c8356d0866cd90.pack
        /mnt/cache/Scratch/docker/containers/overlay2/aaa312ab84c321ccb41621197c069613bfc182270621dd1864e5f053ae6a4b04/diff/app/prowlarr/bin/System.Private.Xml.dll
        /mnt/cache/Scratch/docker/containers/overlay2/89b8086d04d95fd0f21f9937170ae437b3287cd9bdaef19a60ba7728d7d339ed/diff/app/radarr/bin/System.Private.CoreLib.dll

Cache 1 Smart

smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      S59ANM0R546074E
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            544,146,423,808 [544 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 551191b229
Local Time is:                      Mon Feb  2 13:14:22 2026 EST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    22%
Data Units Read:                    298,745,209 [152 TB]
Data Units Written:                 843,580,978 [431 TB]
Host Read Commands:                 1,829,510,180
Host Write Commands:                6,662,348,226
Controller Busy Time:               45,227
Power Cycles:                       53
Power On Hours:                     26,285
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    0
Error Information Log Entries:      159
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               46 Celsius
Temperature Sensor 2:               60 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0        159     0  0x0018  0x4004      -            0     0     -  Invalid Field in Command

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed without error               26285            -     -   -   -    -

Cache 2 Smart

smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      S4P4NF0M600551L
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            544,159,617,024 [544 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5691b3268d
Local Time is:                      Mon Feb  2 13:14:33 2026 EST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        38 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    22%
Data Units Read:                    332,652,519 [170 TB]
Data Units Written:                 837,685,747 [428 TB]
Host Read Commands:                 1,982,650,828
Host Write Commands:                6,638,082,258
Controller Busy Time:               42,500
Power Cycles:                       49
Power On Hours:                     25,973
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    0
Error Information Log Entries:      149
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               38 Celsius
Temperature Sensor 2:               41 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0        149     0  0x0008  0x4004      -            0     0     -  Invalid Field in Command

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed without error               25972            -     -   -   -    -
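
For triage, the few fields in the two SMART reports above that usually point at a failing NVMe drive can be checked mechanically. The following is only a rough sketch: the function names and the choice of fields are my own assumptions, not anything smartctl or Unraid provides, and the sample is a shortened excerpt of the Cache 1 report.

```python
import re

# Shortened excerpt of the Cache 1 report above.
SAMPLE = """\
SMART overall-health self-assessment test result: PASSED
Critical Warning:                   0x00
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    22%
Media and Data Integrity Errors:    0
Error Information Log Entries:      159
"""


def parse_smart_health(report: str) -> dict:
    """Collect 'Key: value' pairs from smartctl text output."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value: str) -> int:
    """Read the leading integer of a value such as '100%' or '1,234'."""
    match = re.match(r"[\d,]+", value)
    return int(match.group().replace(",", "")) if match else 0


def health_flags(fields: dict) -> list:
    """Return warnings for the fields checked here (a hypothetical selection)."""
    flags = []
    if int(fields.get("Critical Warning", "0x00"), 16) != 0:
        flags.append("critical warning bits set")
    if leading_int(fields.get("Media and Data Integrity Errors", "0")) > 0:
        flags.append("media/data integrity errors reported")
    spare = leading_int(fields.get("Available Spare", "100%"))
    threshold = leading_int(fields.get("Available Spare Threshold", "0%"))
    if spare <= threshold:
        flags.append("available spare at or below threshold")
    return flags


print(health_flags(parse_smart_health(SAMPLE)))
```

Neither report raises a flag by these checks, so the SMART data alone does not single out either drive. The "Error Information Log Entries" count is deliberately not checked here, on the assumption that "Invalid Field in Command" entries reflect a rejected command rather than a media fault.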

unraid-diagnostics-20260202-1107.zip unraid-smart-20260202-1314-cache2.zip unraid-smart-20260202-1313-cache1.zip

Edited February 2 by SirKelsALot

February 3

Community Expert
Solution

Checksum errors are often the result of bad RAM; start by running Memtest.
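
One hint in the posted `zpool status` output is that both mirror members show the same checksum error count (27.1K each). Two drives failing in lockstep is unlikely, so matching counts tend to point at something the drives share, such as RAM. A minimal sketch of that check follows; the function names are made up for illustration, the parsing assumes the table layout shown in the post, and it is a heuristic rather than a diagnosis.

```python
# Config table copied from the zpool status output in the first post.
SAMPLE_STATUS = """\
config:

        NAME           STATE     READ WRITE CKSUM
        cache          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0 27.1K
            nvme0n1p1  ONLINE       0     0 27.1K

errors: Permanent errors have been detected in the following files:
"""


def leaf_cksums(status_text: str) -> dict:
    """Map each leaf device (deepest indentation) to its CKSUM column text."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        parts = line.split()
        if not in_table:
            # The table starts right after the NAME/STATE header row.
            in_table = parts[:2] == ["NAME", "STATE"]
            continue
        if len(parts) != 5:
            break  # a blank line or any other text ends the table
        indent = len(line) - len(line.lstrip())
        rows.append((indent, parts[0], parts[4]))
    if not rows:
        return {}
    deepest = max(indent for indent, _, _ in rows)
    return {name: cksum for indent, name, cksum in rows if indent == deepest}


def same_error_count(cksums: dict) -> bool:
    """True when several devices report the same non-zero CKSUM value.

    zpool rounds large counts (e.g. 27.1K), so "same" is approximate.
    """
    counts = set(cksums.values())
    return len(cksums) > 1 and len(counts) == 1 and counts != {"0"}


cksums = leaf_cksums(SAMPLE_STATUS)
print(cksums, same_error_count(cksums))
```

On the posted output this reports matching counts for nvme1n1p1 and nvme0n1p1, which fits the bad-RAM explanation better than a single failing SSD.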

February 4

Author

Thanks for the tip. It seems 3 out of 4 of my 16GB sticks are failing. Good thing RAM is so cheap now....

February 4

Community Expert

You must never attempt to run any computer unless RAM is working perfectly. Everything goes through RAM. The OS and other executable code, your DATA. EVERYTHING! The CPU can't do anything with anything until it is loaded into RAM.

February 4

Author

Well, certainly. This server is about 8 years old and only recently began having issues, which coincided with switching my cache pool to ZFS. Since the stability problems are so recent, it was a surprise to find 3 of 4 sticks with errors. One of them is particularly bad, but that one is less than a year old, so I'm starting an RMA process for the set.

Solved by JorgeB
