February 2

I've had a rash of file system access issues recently where files become inaccessible, usually resolved by a restart or messing around. I have a pair of Samsung SSD 970 EVO Plus 1TBs mirrored as my cache pool in ZFS (recently switched from BTRFS to experiment with). Other than installing ZFS Master, I have not configured automatic snapshots or any other additional setup for ZFS specifically.

Diagnostics and SMART reports attached. I presume it's a hardware issue, but I'm unsure on exact next steps to troubleshoot/triage. Perhaps remove one of the cache drives.

Errors in ZFS pool

    root@UNRAID:/mnt/user/Scratch# zpool status -v
      pool: cache
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 0B in 00:09:21 with 16 errors on Mon Feb 2 11:03:53 2026
    config:

            NAME           STATE     READ WRITE CKSUM
            cache          ONLINE       0     0     0
              mirror-0     ONLINE       0     0     0
                nvme1n1p1  ONLINE       0     0 27.1K
                nvme0n1p1  ONLINE       0     0 27.1K

    errors: Permanent errors have been detected in the following files:

            /mnt/cache/Scratch/vm/isos/haos_ova-13.2.qcow2
            /mnt/cache/Scratch/docker/duplicacy/bin/duplicacy_linux_x64_3.2.5
            /mnt/cache/Scratch/docker/containers/overlay2/89b8086d04d95fd0f21f9937170ae437b3287cd9bdaef19a60ba7728d7d339ed/diff/app/radarr/bin/Radarr.Core.pdb
            /mnt/cache/Scratch/docker/containers/overlay2/59fc7b9bf7f256b860a21f55c080c242ede729a1426482d8e250b2ed86678ee4/diff/app/sonarr/bin/Sonarr.Core.pdb
            /mnt/cache/Scratch/docker/containers/overlay2/8e7e76832d6dc8145f04dcbaf143fb7fa07bbf092a861d7f2e7d2cb1d95daab1/diff/lib/ld-musl-x86_64.so.1
            /mnt/cache/Scratch/docker/containers/overlay2/1e6272eee2a99e4e6e1c44367080fad9cdbc4e85e15a2b03d274f222e2cef33a/diff/ffmpeg/lib/libavformat.so.59
            /mnt/cache/Scratch/docker/containers/overlay2/567a58c5ce269d7b47c3a2fb6feeefa5439f5967713a6d7e2b70ff7ecbb60f24/diff/usr/local/bin/duplicacy_web
            /mnt/cache/Scratch/docker/containers/overlay2/1d414267a5606678ba2e5e1b07b9f90758de45e4af5cf35414add951a77a8658/diff/usr/local/bin/gosu
            /mnt/cache/Scratch/docker/profilarr/db/.git/objects/pack/pack-18d9e5ff64883fdd7223068fa2c8356d0866cd90.pack
            /mnt/cache/Scratch/docker/containers/overlay2/aaa312ab84c321ccb41621197c069613bfc182270621dd1864e5f053ae6a4b04/diff/app/prowlarr/bin/System.Private.Xml.dll
            /mnt/cache/Scratch/docker/containers/overlay2/89b8086d04d95fd0f21f9937170ae437b3287cd9bdaef19a60ba7728d7d339ed/diff/app/radarr/bin/System.Private.CoreLib.dll

Cache 1 SMART

    smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
    Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: Samsung SSD 970 EVO Plus 1TB
    Serial Number: S59ANM0R546074E
    Firmware Version: 2B2QEXM7
    PCI Vendor/Subsystem ID: 0x144d
    IEEE OUI Identifier: 0x002538
    Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
    Unallocated NVM Capacity: 0
    Controller ID: 4
    NVMe Version: 1.3
    Number of Namespaces: 1
    Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
    Namespace 1 Utilization: 544,146,423,808 [544 GB]
    Namespace 1 Formatted LBA Size: 512
    Namespace 1 IEEE EUI-64: 002538 551191b229
    Local Time is: Mon Feb 2 13:14:22 2026 EST
    Firmware Updates (0x16): 3 Slots, no Reset required
    Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
    Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
    Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
    Maximum Data Transfer Size: 512 Pages
    Warning Comp. Temp. Threshold: 85 Celsius
    Critical Comp. Temp. Threshold: 85 Celsius

    Supported Power States
    St Op      Max  Active  Idle  RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     7.80W       -     -   0  0  0  0        0       0
     1 +     6.00W       -     -   1  1  1  1        0       0
     2 +     3.40W       -     -   2  2  2  2        0       0
     3 -   0.0700W       -     -   3  3  3  3      210    1200
     4 -   0.0100W       -     -   4  4  4  4     2000    8000

    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
    Critical Warning: 0x00
    Temperature: 46 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 22%
    Data Units Read: 298,745,209 [152 TB]
    Data Units Written: 843,580,978 [431 TB]
    Host Read Commands: 1,829,510,180
    Host Write Commands: 6,662,348,226
    Controller Busy Time: 45,227
    Power Cycles: 53
    Power On Hours: 26,285
    Unsafe Shutdowns: 16
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 159
    Warning Comp. Temperature Time: 0
    Critical Comp. Temperature Time: 0
    Temperature Sensor 1: 46 Celsius
    Temperature Sensor 2: 60 Celsius

    Error Information (NVMe Log 0x01, 16 of 64 entries)
    Num  ErrCount  SQId   CmdId  Status  PELoc  LBA  NSID  VS  Message
      0       159     0  0x0018  0x4004      -    0     0   -  Invalid Field in Command

    Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
    Self-test status: No self-test in progress
    Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
      0  Extended          Completed without error           26285            -     -    -    -     -

Cache 2 SMART

    smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
    Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: Samsung SSD 970 EVO Plus 1TB
    Serial Number: S4P4NF0M600551L
    Firmware Version: 2B2QEXM7
    PCI Vendor/Subsystem ID: 0x144d
    IEEE OUI Identifier: 0x002538
    Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
    Unallocated NVM Capacity: 0
    Controller ID: 4
    NVMe Version: 1.3
    Number of Namespaces: 1
    Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
    Namespace 1 Utilization: 544,159,617,024 [544 GB]
    Namespace 1 Formatted LBA Size: 512
    Namespace 1 IEEE EUI-64: 002538 5691b3268d
    Local Time is: Mon Feb 2 13:14:33 2026 EST
    Firmware Updates (0x16): 3 Slots, no Reset required
    Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
    Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
    Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
    Maximum Data Transfer Size: 512 Pages
    Warning Comp. Temp. Threshold: 85 Celsius
    Critical Comp. Temp. Threshold: 85 Celsius

    Supported Power States
    St Op      Max  Active  Idle  RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     7.80W       -     -   0  0  0  0        0       0
     1 +     6.00W       -     -   1  1  1  1        0       0
     2 +     3.40W       -     -   2  2  2  2        0       0
     3 -   0.0700W       -     -   3  3  3  3      210    1200
     4 -   0.0100W       -     -   4  4  4  4     2000    8000

    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
    Critical Warning: 0x00
    Temperature: 38 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 22%
    Data Units Read: 332,652,519 [170 TB]
    Data Units Written: 837,685,747 [428 TB]
    Host Read Commands: 1,982,650,828
    Host Write Commands: 6,638,082,258
    Controller Busy Time: 42,500
    Power Cycles: 49
    Power On Hours: 25,973
    Unsafe Shutdowns: 16
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 149
    Warning Comp. Temperature Time: 0
    Critical Comp. Temperature Time: 0
    Temperature Sensor 1: 38 Celsius
    Temperature Sensor 2: 41 Celsius

    Error Information (NVMe Log 0x01, 16 of 64 entries)
    Num  ErrCount  SQId   CmdId  Status  PELoc  LBA  NSID  VS  Message
      0       149     0  0x0008  0x4004      -    0     0   -  Invalid Field in Command

    Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
    Self-test status: No self-test in progress
    Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
      0  Extended          Completed without error           25972            -     -    -    -     -

Attachments:
unraid-diagnostics-20260202-1107.zip
unraid-smart-20260202-1314-cache2.zip
unraid-smart-20260202-1313-cache1.zip

Edited February 2 by SirKelsALot
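
For anyone comparing the two SMART reports, the handful of fields that describe the drive's own view of its health can be pulled out mechanically. This is only a rough Python sketch: it assumes the "Key: value" layout smartctl prints for NVMe devices, and the helper names (`parse_smart_fields`, `drive_self_reports_healthy`) and the choice of which fields to check are hypothetical, not an official health test. The sample values are copied from the Cache 1 report.

```python
# Sample lines copied from the Cache 1 smartctl report in this post.
SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 22%
Media and Data Integrity Errors: 0
Error Information Log Entries: 159
"""


def parse_smart_fields(text):
    """Split each 'Key: value' line at the first colon into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def drive_self_reports_healthy(fields):
    """Hypothetical check: no critical warning, no media errors, spare above threshold."""
    spare = int(fields["Available Spare"].rstrip("%"))
    threshold = int(fields["Available Spare Threshold"].rstrip("%"))
    media_errors = int(fields["Media and Data Integrity Errors"].replace(",", ""))
    critical = int(fields["Critical Warning"], 16)
    return critical == 0 and media_errors == 0 and spare > threshold


if __name__ == "__main__":
    print(drive_self_reports_healthy(parse_smart_fields(SAMPLE)))
```

A drive that passes a check like this can still sit behind faulty memory or a bad slot, so it narrows the search rather than clearing the hardware.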
February 3 Community Expert Solution
Checksum errors are often the result of bad RAM. Start by running Memtest.
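
One hint in the `zpool status` output above is that both mirror members show the same non-zero CKSUM count (27.1K each). Matching counts on independent drives are consistent with a fault upstream of the drives, such as RAM, though that is a rule of thumb and not proof. A minimal Python sketch of that comparison follows; the function names are hypothetical, and it assumes device rows have the five columns shown in the post and that leaf devices start with `nvme`.

```python
# Config table copied from the zpool status output in the first post.
SAMPLE_CONFIG = """\
NAME           STATE   READ WRITE CKSUM
cache          ONLINE     0     0     0
  mirror-0     ONLINE     0     0     0
    nvme1n1p1  ONLINE     0     0 27.1K
    nvme0n1p1  ONLINE     0     0 27.1K
"""


def leaf_checksum_counts(config_text):
    """Map each nvme* device row to its CKSUM column (kept as the printed string)."""
    counts = {}
    for line in config_text.splitlines():
        parts = line.split()
        # Assumption: NAME STATE READ WRITE CKSUM, and leaf names start with 'nvme'.
        if len(parts) == 5 and parts[0].startswith("nvme"):
            counts[parts[0]] = parts[4]
    return counts


def errors_mirror_each_other(counts):
    """True when more than one device is listed and all share one non-zero count."""
    values = set(counts.values())
    return len(counts) > 1 and len(values) == 1 and values != {"0"}


if __name__ == "__main__":
    counts = leaf_checksum_counts(SAMPLE_CONFIG)
    print(counts, errors_mirror_each_other(counts))
```

If only one device showed checksum errors, that drive, its cable, or its slot would be the more likely suspect.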
February 4 Author
Thanks for the tip. It seems 3 out of 4 of my 16GB sticks are failing. Good thing RAM is so cheap now...
February 4 Community Expert
You must never attempt to run any computer unless RAM is working perfectly. Everything goes through RAM. The OS and other executable code, your DATA. EVERYTHING! The CPU can't do anything with anything until it is loaded into RAM.
February 4 Author
Well, certainly. This server is about 8 years old and only recently began having issues, which coincided with switching my cache pool to ZFS. Since the stability problems only started recently, it was a surprise to find 3 of 4 sticks with errors. One of them is particularly bad, but that one is less than a year old, and I'm starting an RMA process for the set.