August 5, 2025
Hello everyone, I would like some help with a problem I have been having for the last few weeks.

First of all, before describing the main topic: something odd happened to my server. Its name (host.local) changed to host-2.local. It is still reachable at 192.168.1.xxx, but its name got a -2 suffix. Sometimes it is reachable as host.local, and after a few minutes it is no longer available and host-2.local is the one that works. I don't know if that is relevant to the main topic.

My configuration is a 14-disk array with 2 parity, XFS, plus 2 cache NVMe drives (Samsung 970 EVO), btrfs.

One day I noticed that the disks were full, which surprised me, and running the mover did not empty them. So I opened their log files and saw this kind of output:

The main NVMe:
Aug 5 11:49:23 VEDA emhttpd: part nvme1n1p1 2048 2000397885440
Aug 5 11:49:23 VEDA emhttpd: device nvme1n1 partition: nvme1n1p1 type: dos start: 2048 size: 1953513560, code: 0x83 (4)
Aug 5 11:49:23 VEDA emhttpd: import 30 pool device: (nvme1n1) Samsung_SSD_970_EVO_Plus_2TB_S6P1NS0T218097N
Aug 5 11:49:24 VEDA emhttpd: read SMART /dev/nvme1n1
Aug 5 11:49:31 VEDA emhttpd: /bin/lsblk -lnbo TYPE,NAME,START,SIZE /dev/nvme1n1 2>&1
Aug 5 11:49:31 VEDA emhttpd: disk nvme1n1 2000398934016
Aug 5 11:49:31 VEDA emhttpd: part nvme1n1p1 2048 2000397885440
Aug 5 11:49:31 VEDA emhttpd: device nvme1n1 partition: nvme1n1p1 type: dos start: 2048 size: 1953513560, code: 0x83 (4)
Aug 5 11:49:31 VEDA emhttpd: import 30 pool device: (nvme1n1) Samsung_SSD_970_EVO_Plus_2TB_S6P1NS0T218097N
Aug 5 11:50:29 VEDA emhttpd: shcmd (159): /sbin/wipefs -af --lock /dev/nvme1n1p1
Aug 5 11:50:29 VEDA root: /dev/nvme1n1p1: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
Aug 5 11:50:29 VEDA emhttpd: shcmd (160): /sbin/blkdiscard /dev/nvme1n1p1
Aug 5 11:50:31 VEDA emhttpd: /sbin/mkfs.btrfs -K -f -d raid1 -m raid1 /dev/nvme1n1p1 /dev/nvme0n1p1
Aug 5 11:50:31 VEDA emhttpd: 1 1.82TiB /dev/nvme1n1p1
Aug 5 11:50:31 VEDA kernel: BTRFS: device fsid 808ce4b7-939f-4fb4-9803-c88631a31031 devid 1 transid 8 /dev/nvme1n1p1 (259:1) scanned by mkfs.btrfs (17504)
Aug 5 11:50:31 VEDA emhttpd: devid 1 size 1.82TiB used 2.01GiB path /dev/nvme1n1p1
Aug 5 11:50:31 VEDA kernel: BTRFS info (device nvme1n1p1): first mount of filesystem 808ce4b7-939f-4fb4-9803-c88631a31031
Aug 5 11:50:31 VEDA kernel: BTRFS info (device nvme1n1p1): using crc32c (crc32c-intel) checksum algorithm
Aug 5 11:50:31 VEDA kernel: BTRFS info (device nvme1n1p1): using free-space-tree
Aug 5 11:50:31 VEDA kernel: BTRFS info (device nvme1n1p1): checking UUID tree
Aug 5 11:50:31 VEDA kernel: BTRFS info (device nvme1n1p1 state M): turning on async discard
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23114084352 csum 0xc9ffeae6 expected csum 0x79285d08 mirror 1
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23116075008 csum 0x42991881 expected csum 0xb673d817 mirror 1
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23116079104 csum 0x2e7e8ea0 expected csum 0x7e21569d mirror 1
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23116075008 csum 0x9910ff6c expected csum 0xb673d817 mirror 2
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23116079104 csum 0x87f407ef expected csum 0x7e21569d mirror 2
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23114084352 csum 0xdfdb5394 expected csum 0x79285d08 mirror 2
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23114084352 csum 0xcb5b98df expected csum 0x79285d08 mirror 1
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23114084352 csum 0xbfbd1aee expected csum 0x79285d08 mirror 2
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23114084352 csum 0x04415b26 expected csum 0x79285d08 mirror 1
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS warning (device nvme1n1p1): csum failed root 5 ino 264 off 23114084352 csum 0x8c1d12ce expected csum 0x79285d08 mirror 2
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0

The second NVMe:
Aug 5 11:48:46 VEDA kernel: nvme0n1: p1
Aug 5 11:49:23 VEDA emhttpd: online: Samsung_SSD_970_EVO_Plus_2TB_S4J4NX0RA34479X (nvme0n1) 512 3907029168
Aug 5 11:49:23 VEDA emhttpd: /bin/lsblk -lnbo TYPE,NAME,START,SIZE /dev/nvme0n1 2>&1
Aug 5 11:49:23 VEDA emhttpd: disk nvme0n1 2000398934016
Aug 5 11:49:23 VEDA emhttpd: part nvme0n1p1 2048 2000397885440
Aug 5 11:49:23 VEDA emhttpd: device nvme0n1 partition: nvme0n1p1 type: dos start: 2048 size: 1953513560, code: 0x83 (4)
Aug 5 11:49:23 VEDA emhttpd: import 31 pool device: (nvme0n1) Samsung_SSD_970_EVO_Plus_2TB_S4J4NX0RA34479X
Aug 5 11:49:24 VEDA emhttpd: read SMART /dev/nvme0n1
Aug 5 11:49:31 VEDA emhttpd: /bin/lsblk -lnbo TYPE,NAME,START,SIZE /dev/nvme0n1 2>&1
Aug 5 11:49:31 VEDA emhttpd: disk nvme0n1 2000398934016
Aug 5 11:49:31 VEDA emhttpd: part nvme0n1p1 2048 2000397885440
Aug 5 11:49:31 VEDA emhttpd: device nvme0n1 partition: nvme0n1p1 type: dos start: 2048 size: 1953513560, code: 0x83 (4)
Aug 5 11:49:31 VEDA emhttpd: import 31 pool device: (nvme0n1) Samsung_SSD_970_EVO_Plus_2TB_S4J4NX0RA34479X
Aug 5 11:50:30 VEDA emhttpd: shcmd (161): /sbin/wipefs -af --lock /dev/nvme0n1p1
Aug 5 11:50:30 VEDA root: /dev/nvme0n1p1: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
Aug 5 11:50:30 VEDA emhttpd: shcmd (162): /sbin/blkdiscard /dev/nvme0n1p1
Aug 5 11:50:31 VEDA emhttpd: /sbin/mkfs.btrfs -K -f -d raid1 -m raid1 /dev/nvme1n1p1 /dev/nvme0n1p1
Aug 5 11:50:31 VEDA emhttpd: 2 1.82TiB /dev/nvme0n1p1
Aug 5 11:50:31 VEDA kernel: BTRFS: device fsid 808ce4b7-939f-4fb4-9803-c88631a31031 devid 2 transid 8 /dev/nvme0n1p1 (259:3) scanned by mkfs.btrfs (17504)
Aug 5 11:50:31 VEDA emhttpd: devid 2 size 1.82TiB used 2.01GiB path /dev/nvme0n1p1
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0

These outputs are from today, because I finally decided to format the two NVMe drives and put them back in the pool. I installed JDownloader again and it fetched the file from the web, but again there is corruption everywhere, as you can see. I am puzzled how 2 NVMe drives could go bad at the same time.

If you can help me, let me know what info you need. I am not an expert, but I can follow instructions. I am running a parity check right now. The last one had lots of errors, 900 or so, so I got worried, and that is why I am creating this ticket.
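As a reading aid for the logs above, here is a minimal Python sketch that tallies the highest btrfs error counters reported per device. The function name `latest_btrfs_counters` and the two-line `sample` (copied from the last error lines above) are illustrative choices, not part of any Unraid tool.

```python
import re
from collections import defaultdict

# Matches the per-device counter lines that btrfs logs after a checksum failure,
# e.g. "bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0".
ERR_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def latest_btrfs_counters(log_text):
    """Return the highest counter values seen for each device in log_text."""
    counters = defaultdict(
        lambda: {"wr": 0, "rd": 0, "flush": 0, "corrupt": 0, "gen": 0}
    )
    for match in ERR_RE.finditer(log_text):
        device = counters[match.group("dev")]
        for key in device:
            device[key] = max(device[key], int(match.group(key)))
    return dict(counters)

# Two lines taken from the syslog excerpt in this post.
sample = """\
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Aug 5 12:25:46 VEDA kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
"""

for dev, values in sorted(latest_btrfs_counters(sample).items()):
    print(dev, values)
```

In the excerpt, both pool members report the same corruption count and no read, write, or flush errors.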
August 5, 2025, Author
First SMART extended test:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.24-Unraid] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 2TB
Serial Number: S6P1NS0T218097N
Firmware Version: 4B2QEXM7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 6
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 59,547,701,248 [59.5 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5221415def
Local Time is: Tue Aug 5 15:18:55 2025 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057): Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius

Supported Power States
St Op     Max   Active  Idle  RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    7.59W       -     -    0  0  0  0        0       0
 1 +    7.59W       -     -    1  1  1  1        0     200
 2 +    7.59W       -     -    2  2  2  2        0    1000
 3 -  0.0500W       -     -    3  3  3  3     2000    1200
 4 -  0.0050W       -     -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 105,082,200 [53.8 TB]
Data Units Written: 166,882,781 [85.4 TB]
Host Read Commands: 167,266,757
Host Write Commands: 306,469,559
Controller Busy Time: 4,656
Power Cycles: 1,413
Power On Hours: 1,364
Unsafe Shutdowns: 68
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 45 Celsius
Temperature Sensor 2: 51 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
 0   Extended          Completed without error            1364            -     -    -    -     -
 1   Extended          Completed without error            1347            -     -    -    -     -
 2   Short             Completed without error            1329            -     -    -    -     -
August 5, 2025, Author
Second NVMe extended SMART test:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.24-Unraid] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 2TB
Serial Number: S4J4NX0RA34479X
Firmware Version: 2B2QEXM7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization: 59,546,669,056 [59.5 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5a11b29dbd
Local Time is: Tue Aug 5 15:55:03 2025 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 85 Celsius

Supported Power States
St Op     Max   Active  Idle  RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    7.50W       -     -    0  0  0  0        0       0
 1 +    5.90W       -     -    1  1  1  1        0       0
 2 +    3.60W       -     -    2  2  2  2        0       0
 3 -  0.0700W       -     -    3  3  3  3      210    1200
 4 -  0.0050W       -     -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 55 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 3%
Data Units Read: 151,303,560 [77.4 TB]
Data Units Written: 229,018,503 [117 TB]
Host Read Commands: 507,030,412
Host Write Commands: 844,078,201
Controller Busy Time: 3,307
Power Cycles: 1,573
Power On Hours: 3,347
Unsafe Shutdowns: 106
Media and Data Integrity Errors: 0
Error Information Log Entries: 9,747
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 55 Celsius
Temperature Sensor 2: 59 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num  ErrCount  SQId  CmdId   Status  PELoc  LBA  NSID  VS  Message
  0      9747     0  0x4004  0x4004      -    0     0   -  Invalid Field in Command

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                   Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
 0   Extended          Completed without error            3347            -     -    -    -     -
 1   Short             Completed without error            3290            -     -    -    -     -
August 5, 2025, Author
From the two SMART tests, can I conclude that the NVMe with serial number S4J4NX0RA34479X needs to be replaced?
August 5, 2025, Community Expert
Btrfs detecting data corruption is not typically a device problem, but please post the diagnostics.
August 5, 2025, Community Expert, Solution
I would recommend running memtest first. If nothing is found, scrub the pool and post the results from the GUI. The balance failed because data corruption was found, not because of a device issue.
August 6, 2025, Author
It looks like the memory is failing a lot. With this information, is the next step to change the memory sticks? Or are there other tests to run?
August 6, 2025, Author
Also, do you have any idea about the hostname change, from veda.local to veda-2.local? It seems benign, but how does it happen? Is it due to the network config, for example using a second network interface (10G)?
August 6, 2025, Community Expert
1 hour ago, Daaadou said: "next step is to change the memory sticks?"
Yep, you can try to test one at a time to find the bad DIMM; typically there's only one.
August 6, 2025, Author
45 minutes ago, JorgeB said: "Yep, you can try to test one at a time to find the bad DIMM, typically there's only one."
Thank you Jorge, you've been very helpful. Is there a way to find the corrupted files on my array disks? Some shares are set to copy directly to the array without going through the cache. How can I check these files?
August 6, 2025, Community Expert
55 minutes ago, Daaadou said: "Is there a way to find the corrupted files on my array disks?"
XFS doesn't have checksums for data, so only if you had preexisting external checksums.
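To illustrate what "external checksums" can look like in practice, here is a minimal Python sketch that records a SHA-256 hash per file and later reports files whose content no longer matches. The function names are made up for this example, and it only helps for files that were hashed while they were still known to be good.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest

def changed_files(root, manifest):
    """Return relative paths that are missing or whose hash differs from the manifest."""
    mismatches = []
    for rel_path, expected in manifest.items():
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or sha256_of(full_path) != expected:
            mismatches.append(rel_path)
    return sorted(mismatches)
```

A manifest built with `build_manifest` would need to be saved somewhere safe (for example as JSON) and compared later with `changed_files`. Running either step on a machine with faulty RAM could itself produce wrong hashes, so it makes sense to do this only after the memory problem is fixed.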
August 7, 2025, Author
Hello, it is solved: I had one bad RAM stick, as you thought. I needed to plug in a monitor to troubleshoot properly, but I was able to identify the culprit thanks to the built-in memtest plugin, which is very useful.
I have a question: how can I access all the past warnings? The corruption started when I got the first message about the cache getting full, because the mover was not able to move the corrupted files. I would like to find these messages so that I can delete all files copied to the NAS since then. How can I find that info?
August 7, 2025, Community Expert
You can use the script below to monitor pools; if there are any errors, you get a notification:
https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-700582
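The linked script is not reproduced here. As a rough illustration of the idea only, the Python sketch below scans text in the format printed by `btrfs device stats` and reports any non-zero counter. The function name `nonzero_stats` and the sample values are assumptions made for this example (the corruption count of 5 mirrors the syslog earlier in the thread).

```python
import re

# One counter per line, e.g. "[/dev/nvme1n1p1].corruption_errs  5".
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")

def nonzero_stats(stats_text):
    """Return {device: {counter_name: value}} for every counter above zero."""
    problems = {}
    for line in stats_text.splitlines():
        match = STAT_RE.match(line.strip())
        if match and int(match.group("value")) > 0:
            device = problems.setdefault(match.group("dev"), {})
            device[match.group("name")] = int(match.group("value"))
    return problems

# Illustrative input only; a real run would capture the command's output.
sample = """\
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  5
[/dev/nvme1n1p1].generation_errs  0
"""

problems = nonzero_stats(sample)
if problems:
    print("btrfs errors detected:", problems)
else:
    print("no btrfs errors")
```

On a live system the text could come from running `btrfs device stats` against the pool's mount point (the mount path depends on the pool name) on a schedule, with the print replaced by whatever notification mechanism is preferred.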