June 1, 2025 (Morris0): I have a multi-disk ZFS pool with shares exported via SMB. I run daily incremental and monthly full backups. The daily incrementals have never failed, but a few monthly full backups have failed when Unraid SMB stops responding and I see at least one CPU core pegged at 100%. The console is still responsive, yet I've been unable to stop the array because that task never completes. I've also tried powerdown -r from the console, and that never completes either. This is what I see on the console screen:
June 2, 2025 (Community Expert): Enable the syslog server and post the log after it happens again, together with fresh diagnostics.
June 2, 2025 (Author): Thank you Jorge, I set up logging to disk yesterday and also found that the ZFS pool needed to be updated, which I did. Neither change should matter, yet a full backup completed last night and I have another one running now. We will see what happens. Morris
June 2, 2025 (Author): It happened again. By the way, this is on V 1.1.1. You may notice a time discrepancy in the log; this is because the server sleeps, as I use the Dynamix S3 Sleep plugin. I'm attaching a screenshot of the console and the log: syslog-NAS.log. Also interesting: I see 3 cores at 100% on the dashboard, yet top from the console shows no process above 2%. Thank you in advance for looking into this, Morris
June 2, 2025 (Community Expert): There's a ZFS-related crash, suggesting a problem with the pool or a hardware issue making ZFS crash. Post the diagnostics, mostly to see the hardware used, in case there are any known issues.
June 2, 2025 (Author): Thank you Jorge. For my education, what told you it's ZFS? I'm attaching the diags: nas-diagnostics-20250602-1104.zip
June 2, 2025 (Author): Hi Jorge, I have two pools, and the one I'm not writing the backup to has a corrupted file:

      pool: ssdpool
     state: ONLINE
    status: One or more devices has experienced an error resulting in data corruption.
            Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 0B in 00:20:48 with 0 errors on Thu May 1 04:20:49 2025
    config:

            NAME        STATE     READ WRITE CKSUM
            ssdpool     ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                sdb1    ONLINE       0     0     0
                sdf1    ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

            /mnt/ssdpool/system/docker/docker.img

Possibly this is the issue. I can delete /mnt/ssdpool/system/docker/docker.img. Is there a way for me to get a replacement? Can you tell which pool is the cause of the hang? Thank you, Morris
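When `zpool status -v` reports permanent errors, the affected entries can be pulled out of its text output, which is handy when checking several pools. This is a minimal Python sketch that assumes the layout quoted above: an `errors:` line followed by one entry per line. The function name `permanent_error_files` is hypothetical. Entries that ZFS cannot resolve to a path may appear as object identifiers rather than file names.

```python
def permanent_error_files(status_text: str) -> list[str]:
    """Return the entries listed under 'Permanent errors' in `zpool status -v` text."""
    entries = []
    in_error_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            # Only the "Permanent errors ..." form is followed by a list;
            # "errors: No known data errors" leaves the result empty.
            in_error_list = "Permanent errors" in stripped
            continue
        if in_error_list and stripped:
            entries.append(stripped)
    return entries


if __name__ == "__main__":
    sample = """\
  pool: ssdpool
 state: ONLINE
errors: Permanent errors have been detected in the following files:

        /mnt/ssdpool/system/docker/docker.img
"""
    print(permanent_error_files(sample))  # ['/mnt/ssdpool/system/docker/docker.img']
```

Parsing the human-readable output keeps the sketch dependency-free. It does not call `zpool` itself, so you would paste in, or pipe in, the text `zpool status -v` prints.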
June 2, 2025 (Community Expert):
Morris0 said: "For my education, what told you it's ZFS?"
The call trace mentions zfs:

    Jun 2 08:12:41 NAS kernel: ddt_enter+0xe/0x20 [zfs]
    Jun 2 08:12:41 NAS kernel: zio_ddt_write+0x62/0x420 [zfs]
    Jun 2 08:12:41 NAS kernel: ? zio_push_transform+0x34/0x80 [zfs]
    Jun 2 08:12:41 NAS kernel: zio_execute+0xba/0xf0 [zfs]

I missed that on the screenshot before, since it's less visible.

Morris0 said: "data corruption"
This typically indicates a hardware issue, most often bad RAM, and it can be the cause of the other issue. Start by running memtest.
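The reasoning above can be automated. In a kernel call trace, each frame from a loadable module ends with the module name in brackets, so counting those tags shows which module the crash went through. This is a minimal Python sketch that assumes the syslog frame format quoted above. The names `FRAME_RE` and `modules_in_trace` are hypothetical. A module that dominates the trace shows where the fault surfaced, not necessarily its root cause, which in this thread turned out to be RAM.

```python
import re
from collections import Counter

# A call-trace frame as printed in syslog, e.g.
#   "Jun 2 08:12:41 NAS kernel: zio_execute+0xba/0xf0 [zfs]"
# The bracketed name is the loadable module the function lives in; frames in
# built-in kernel code have no brackets and are not counted. A leading "?"
# (an address the unwinder could not confirm) is accepted, as is an optional
# build ID after the module name.
FRAME_RE = re.compile(
    r"kernel:\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"\s+\[(?P<module>\w+)(?:\s+[0-9a-f]+)?\]"
)


def modules_in_trace(log_lines):
    """Count call-trace frames per kernel module in an iterable of syslog lines."""
    counts = Counter()
    for line in log_lines:
        match = FRAME_RE.search(line)
        if match:
            counts[match.group("module")] += 1
    return counts


if __name__ == "__main__":
    trace = [
        "Jun 2 08:12:41 NAS kernel: ddt_enter+0xe/0x20 [zfs]",
        "Jun 2 08:12:41 NAS kernel: zio_ddt_write+0x62/0x420 [zfs]",
        "Jun 2 08:12:41 NAS kernel: ? zio_push_transform+0x34/0x80 [zfs]",
        "Jun 2 08:12:41 NAS kernel: zio_execute+0xba/0xf0 [zfs]",
    ]
    print(modules_in_trace(trace))  # Counter({'zfs': 4})
```

You could feed it the lines of the syslog file posted after the hang, for example `modules_in_trace(open("syslog-NAS.log"))`.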
June 2, 2025 (Author): Thank you Jorge, I'm going to run the RAM test. That will take some time with 32 GB. When I started the shutdown, I observed that 3 of my HDDs did not spin up while the other two did. Interesting that 3 CPU cores are pegged. I have two different SATA controllers in the system: one with 4 ports on the motherboard, and an M.2 SATA adapter with 6 ports. When I get the system up, I will check which controller the disks that did not spin up are on. Morris
June 2, 2025 (Author, marked as solution): Hi Jorge, I'm confirming it's a memory issue, easy enough to address. What can I do to replace the damaged file in my system share?

    Permanent errors have been detected in the following files:
    /mnt/ssdpool/system/docker/docker.img

Thank you, Morris
June 2, 2025 (Community Expert):
Morris0 said: "What can I do to replace the damaged file in my system share?"
Since it's the docker image, you can just delete and recreate it:
https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file
Then:
https://docs.unraid.net/unraid-os/manual/docker-management/#re-installing-docker-applications
Also see below if you have any custom docker networks:
https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks
June 3, 2025 (Author): The memory issue is corrected. I was able to replace the bad file on the ssdpool, and that pool appears healthy. I now have ZFS corruption in the hddpool, which is causing the same symptoms I had earlier, with one or more CPU cores at 100%. I don't recognize the file being reported as bad, yet it resembles one of my share names. I can access one of the subdirectories fine; the other returns nothing when I traverse the directory from the Shares tab. Accessing everything except this directory via SMB is fine, but if I access it, SMB hangs and I can no longer connect to the Unraid server via SMB. I've tried a scrub and see no disk I/O on the Main tab, and the disk activity light is off. I'd like to wipe the ZFS pool. From the command line I tried zpool destroy -f hddpool. The command prompt never comes back, and again no disk I/O is observed. How can I get rid of this pool? Thank you, Morris
June 3, 2025 (Community Expert): You can try to import the pool in read-only mode, back up the data, and then recreate the pool:

    zpool import -o readonly=on <pool name>
June 3, 2025 (Author): I have the data. I'm thinking I need to destroy the pool and recreate it along with the shares. I tried the command you suggested and got:

    root@NAS:~# zpool import -o readonly=on hddpool
    cannot import 'hddpool': a pool with that name already exists
    use the form 'zpool import <pool | id> <newpool>' to give it a new name
    root@NAS:~#

I can't stop the array; that never finishes.
June 3, 2025 (Author): How can I destroy this pool when I cannot stop the array? I've tried:

    zpool destroy -f hddpool
    zfs unmount -f hddpool

Both fail to complete. Is there a way for me to start Unraid without mounting the pools, or do I need to wipe the disks outside of Unraid? Thank you, Morris
June 3, 2025 (Community Expert): If the data is already backed up, you can just export the pool with zpool export hddpool, then erase the pool using the GUI: click on the first pool device and choose "erase pool", then start the array and reformat.
June 3, 2025 (Author): How long should zpool export take on a 20 TB pool? It's not returning to the command prompt, and I cannot stop the array because Stop Array stays at "syncing file systems".
June 3, 2025 (Community Expert): Typically it's instant, or a few seconds at most, unless there are open files preventing the export. Since the data is recovered, you can force a reboot if needed.
June 3, 2025 (Author): Even using zpool export -f hddpool, I don't get the command prompt back. I restarted the system, the pool is still there, and I can't stop the array. I see an error on the console, and I'm providing a screenshot of it along with the error log.
June 3, 2025 (Author): I decided to boot in GUI safe mode, and that let me stop the array. I've recreated the pool and restarted. I no longer see CPU cores at high load. I'm about to create shares. I wanted to let you know so you don't waste time looking at the log and screenshot. I'm probably good now; I'll confirm in a little while.
June 3, 2025 (Author): I had not started the array. As soon as I did, 2 cores went to 100%. I was able to access a share on the SSD pool. I have not created shares on the HDD pool. I'm going to scrub the SSD pool.
June 3, 2025 (Community Expert): Destroy the pool the way I mentioned above before starting the array; the pool won't be mounted before then, at least not on a stock install.
June 3, 2025 (Author): Correct, I forgot to erase the pool. I've done that and started the array, and so far things look good. The system is restarting now. This has been quite a learning experience, frustrating but good. I have lots of experience, including with TrueNAS; it's the differences that have been biting me. I would never have thought of a RAM issue; your help there has been critical.