SMB stops responding during large backup

June 1, 2025

I have a multi-disk ZFS pool with shares via SMB. I run daily incremental and monthly full backups. The daily incrementals have never failed. I've had a few monthly full backups fail when Unraid SMB stops responding and I see at least one core pegged at 100%. The console is responsive, yet I've been unable to stop the array as that task never completes. I've tried powerdown -r from the console and that also never completes. This is what I see on the console screen:

Edited June 1, 2025 by Morris0

June 2, 2025

Community Expert

Enable the syslog server and post that after it happens again, together with fresh diagnostics.

June 2, 2025

Author

Thank you Jorge,

I set up logging to disk yesterday and also found that the ZFS pool needed to be updated, which I did. Neither should matter, yet I had a full backup complete last night and I have another one running now. We will see what happens.

Morris

June 2, 2025

Author

It happened again. By the way this is on V 1.1.1

You may notice a time discrepancy in the log. This is the result of the server sleeping; I use the Dynamix S3 Sleep plugin. I'm attaching a screenshot of the console and the log: syslog-NAS.log

Also interesting: I see 3 cores at 100% on the dashboard, yet top from the console shows no process reaching 2%.

Thank you in advance for looking into this,

Morris
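A note on the dashboard-versus-top discrepancy: one possible explanation (an assumption, not something confirmed in this thread) is that the busy time is I/O wait or kernel time rather than a userspace process. A minimal sketch for checking that from the console, reading the standard per-core counters in /proc/stat. The helper name cpu_breakdown is made up for illustration.

```shell
# Sketch: per-core share of iowait and system time since boot.
# /proc/stat "cpuN" lines list jiffies as: user nice system idle iowait irq softirq steal ...
# cpu_breakdown is a hypothetical helper name, not an Unraid tool.
cpu_breakdown() {
    awk '/^cpu[0-9]+ / {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user..steal
        if (total > 0)
            printf "%s iowait=%.0f%% system=%.0f%%\n", $1, 100 * $6 / total, 100 * $4 / total
    }' "${1:-/proc/stat}"
}
```

Calling cpu_breakdown with no argument reads /proc/stat. The counters are cumulative since boot, so comparing two runs a few seconds apart gives a better picture of what the cores are doing right now.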

Edited June 2, 2025 by Morris0

June 2, 2025

Community Expert

There's a ZFS-related crash, suggesting a problem with the pool or a hardware issue making ZFS crash. Post the diags, mostly to see the hardware used, in case there are any known issues.

June 2, 2025

Author

Thank you Jorge,

For my education, what told you it's ZFS? I'm attaching the diags: nas-diagnostics-20250602-1104.zip

June 2, 2025

Author

Hi Jorge,

I have two pools, and the one I'm not writing the backup to has a corrupted file:

  pool: ssdpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:20:48 with 0 errors on Thu May 1 04:20:49 2025
config:

        NAME        STATE     READ WRITE CKSUM
        ssdpool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb1    ONLINE       0     0     0
            sdf1    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/ssdpool/system/docker/docker.img

Possibly this is the issue. I can delete /mnt/ssdpool/system/docker/docker.img

Is there a way for me to get a replacement?

Can you tell which pool is the cause of the hang?

Thank you,

Morris
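For anyone who wants to pull the damaged-file list out of status output like the one in this post, here is a minimal sketch. It only filters text on stdin and assumes the usual layout of `zpool status -v`, where the entries follow the "errors: Permanent errors" line. The helper name zpool_error_files is hypothetical.

```shell
# Sketch: list the entries under "errors: Permanent errors ..." in `zpool status -v` output.
# Reads the status text on stdin; zpool_error_files is a hypothetical helper name.
zpool_error_files() {
    awk '
        /^[ \t]*pool:/ { in_errors = 0 }                        # a new pool section starts
        /^[ \t]*errors: Permanent errors/ { in_errors = 1; next }
        in_errors && NF { sub(/^[ \t]+/, ""); print }           # strip indentation, skip blanks
    '
}
```

Typical use would be `zpool status -v ssdpool | zpool_error_files`, which for the output above should print the single docker.img path.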

June 2, 2025

Community Expert

41 minutes ago, Morris0 said:
For my education, what told you it's ZFS?

The call trace mentions zfs:

Jun 2 08:12:41 NAS kernel: ddt_enter+0xe/0x20 [zfs]

Jun 2 08:12:41 NAS kernel: zio_ddt_write+0x62/0x420 [zfs]

Jun 2 08:12:41 NAS kernel: ? zio_push_transform+0x34/0x80 [zfs]

Jun 2 08:12:41 NAS kernel: zio_execute+0xba/0xf0 [zfs]

I missed that on the screenshot before, since it's less visible.
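The check described above (looking for a module name in square brackets at the end of call-trace frames) can be approximated with a small filter over the saved syslog. This is only a sketch: it assumes frames look like the lines quoted here (symbol+0xoffset/0xsize [module]), and trace_modules is a made-up name.

```shell
# Sketch: print the kernel modules named in call-trace frames of a syslog file.
# A frame such as "zio_execute+0xba/0xf0 [zfs]" ends with the owning module in brackets.
# trace_modules is a hypothetical helper name.
trace_modules() {
    awk '/kernel:/ && match($0, /\+0x[0-9a-f]+\/0x[0-9a-f]+ \[[A-Za-z0-9_]+\]/) {
        frame = substr($0, RSTART, RLENGTH)
        sub(/^.*\[/, "", frame)    # drop everything up to the opening bracket
        sub(/\]$/, "", frame)      # drop the closing bracket
        print frame
    }' "$1" | sort -u
}
```

For example, `trace_modules syslog-NAS.log` would list each module once. Frames with no bracketed suffix belong to the core kernel and are skipped.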

28 minutes ago, Morris0 said:
data corruption

This typically indicates a hardware issue, most often bad RAM, and it can be the cause of the other issue. Start by running memtest.

June 2, 2025

Author

Thank you Jorge,

I'm going to run the RAM test. That will take some time with 32 GB. When I started the shutdown, I observed that 3 of my HDDs did not spin up while the other two did. Interesting that 3 CPU cores are pegged. I have two different SATA controllers in the system: one with 4 ports on the motherboard, and the other an M.2 SATA adapter with 6 ports. When I get the system up, I will look at the relationship between the disks that did not spin up and the controller they are on.

Morris

June 2, 2025

Author
Solution

Hi Jorge,

I'm confirming it's a memory issue, easy enough to address.

What can I do to replace the damaged file in my system share?

Permanent errors have been detected in the following files:

/mnt/ssdpool/system/docker/docker.img

Thank you,

Morris

Edited June 2, 2025 by Morris0

June 2, 2025

Community Expert

26 minutes ago, Morris0 said:
What can I do to replace the damaged file in my system share?

Since it's the docker image, you can just delete and recreate:

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file

Then:

https://docs.unraid.net/unraid-os/manual/docker-management/#re-installing-docker-applications

Also see below if you have any custom docker networks:

https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks

June 2, 2025

Author

Thank you, easy procedure to follow.

June 3, 2025

Author

The memory issue is corrected.

I was able to replace the bad file on the ssdpool and that pool appears healthy. I now have ZFS corruption in the hddpool. This is causing the same symptoms I had earlier, with one or more CPU cores at 100%. I don't recognize the file being reported as bad, yet it resembles one of my share names. I can access one of the subdirectories fine; the other returns nothing when I traverse the directory from the Shares tab. Accessing everything but this directory via SMB is fine, yet if I access it, SMB hangs and I can no longer connect to the Unraid server via SMB. I've tried a scrub and observe no disk I/O from the Main tab, and the disk access light is off. I'd like to wipe the ZFS pool. From the command line I tried zpool destroy -f hddpool. The command prompt does not come back and again no disk I/O is observed. How can I get rid of this pool?

Thank you,

Morris

June 3, 2025

Community Expert

You can try to import the pool in read-only mode, then back up what you can and recreate it:

zpool import -o readonly=on <pool name>
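A rough sketch of how that read-only rescue could be sequenced. zpool import refuses a pool that is already imported, so the sketch exports first. The assumptions are loud ones: the pool is taken to mount at /mnt/<pool> (as /mnt/ssdpool does in this thread), the destination path is whatever backup location you have, and the names run, readonly_rescue and DRY_RUN are invented for illustration. By default it only prints the commands.

```shell
# Sketch: export, re-import read-only, then copy data off.
# Assumptions: the pool mounts at /mnt/<pool>; the destination is your own backup path.
# run, readonly_rescue and DRY_RUN are hypothetical names.
run() {
    # Print the command unless DRY_RUN=0 is set, in which case execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

readonly_rescue() {
    local pool="$1" dest="$2"
    run zpool export "$pool" &&
    run zpool import -o readonly=on "$pool" &&
    run rsync -a "/mnt/$pool/" "$dest/"
}
```

`readonly_rescue hddpool /mnt/backup` prints the three commands for review, and `DRY_RUN=0 readonly_rescue hddpool /mnt/backup` would actually run them, stopping at the first failure.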

June 3, 2025

Author

I have the data. I'm thinking I need to destroy the pool and recreate it along with the shares. I tried the command you suggested and got:

root@NAS:~# zpool import -o readonly=on hddpool

cannot import 'hddpool': a pool with that name already exists

use the form 'zpool import <pool | id> <newpool>' to give it a new name

root@NAS:~#

I can't stop the array; that never finishes.

June 3, 2025

Author

How can I destroy this pool, as I cannot stop the array?

I've tried

zpool destroy -f hddpool

zfs unmount -f hddpool

Both fail to complete.

Is there a way for me to start Unraid without mounting the pools?

Or do I need to wipe the disks external to Unraid?

Thank you,

Morris

June 3, 2025

Community Expert

If the data is already backed up, you can just export the pool with zpool export hddpool, then erase the pool using the GUI, click on the first pool device and "erase pool", then start the array and reformat.
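Before erasing the pool in the GUI, it can help to confirm that the export actually took effect. A minimal sketch, assuming you feed it the output of `zpool list -H -o name` (one imported pool name per line, no header). The helper name pool_in_list is hypothetical.

```shell
# Sketch: succeed if the named pool appears in a list of imported pool names on stdin.
# Intended input: `zpool list -H -o name`. pool_in_list is a hypothetical helper name.
pool_in_list() {
    grep -qxF -- "$1"    # exact, whole-line, literal match
}
```

For example, `zpool list -H -o name | pool_in_list hddpool && echo "hddpool is still imported"` prints the message only while the pool is still imported.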

June 3, 2025

Author

How long should the zpool export take on a 20 TB pool? It's not coming back to the command prompt and I cannot stop the array, as the stop array operation stays at syncing file systems.

Edited June 3, 2025 by Morris0

June 3, 2025

Community Expert

Typically it's instant, or a few seconds at most, unless there are open files preventing the export. Since the data is recovered, you can force a reboot if needed.
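If open files are the suspect, one way to look for them is to walk /proc and report which processes have their working directory or a file descriptor under the pool's mount point. This is only a sketch with a made-up helper name (open_under). Tools such as lsof or fuser do the same job where installed, and a hang inside ZFS itself would not show up this way.

```shell
# Sketch: print "PID target" for every process whose cwd or open file descriptor
# points under a mount point. proc_root defaults to /proc; open_under is a
# hypothetical helper name. Run as root to see other users' processes.
open_under() {
    local mnt="${1%/}" proc_root="${2:-/proc}" link target pid
    for link in "$proc_root"/[0-9]*/cwd "$proc_root"/[0-9]*/fd/*; do
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            "$mnt"|"$mnt"/*)
                pid=${link#"$proc_root"/}
                echo "${pid%%/*} $target"
                ;;
        esac
    done | sort -u
}
```

`open_under /mnt/hddpool` would list any holders. No output means nothing visible in /proc has a file open under that path.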

June 3, 2025

Author

Even using zpool export -f hddpool, I don't get the command prompt back. I restarted the system and the pool is still there, and I can't stop the array. I observe an error on the console and I'm providing that as a screenshot, along with the error log.

June 3, 2025

Author

nas-syslog-20250603-1538.zip

June 3, 2025

Author

I decided to come up in safe GUI mode and that stopped the array. I've recreated the pool and restarted. I no longer see the high CPU cores. I'm about to create shares. I wanted to let you know so you don't waste time looking at the log and screenshot. I'm probably good now. Will confirm in a little while.

June 3, 2025

Author

I had not started the array. As soon as I did, 2 cores went to 100%. I was able to access a share on the SSD pool. I have not created shares on the HDD pool. I'm going to scrub the SSD pool.

June 3, 2025

Community Expert

Destroy the pool the way I mentioned above before starting the array; the pool won't mount before then, at least not on stock Unraid.

June 3, 2025

Author

Correct, I forgot to erase the pool. I've done that and started the array, and so far things look good. The system is restarting now.

This has been quite a learning experience, and while frustrating, it's good. I've got lots of experience, including with TrueNAS. It's the differences that have been biting me. I would never have thought of a RAM issue; your help there has been critical.
