d3m3zs Posted May 25

Hi! Today I ran into a very weird issue:

  pool: backup_zfs
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Sat May 25 12:13:28 2024
config:

        NAME           STATE     READ WRITE CKSUM
        backup_zfs     ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            /dev/sdd1  ONLINE       0     0     0
            /dev/sdb1  ONLINE       0   490     0

errors: No known data errors

Another disk in the array:

  pool: disk2
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Sat May 25 12:03:21 2024
        1.65T scanned at 1.28G/s, 255G issued at 197M/s, 5.63T total
        0B repaired, 4.41% done, 07:57:55 to go
config:

        NAME          STATE     READ WRITE CKSUM
        disk2         ONLINE       0     0     0
          /dev/md2p1  ONLINE       0     0     0

errors: 5 data errors, use '-v' for a list

Despite verifying the health of my disks and recreating the pool, I continue to encounter errors. Surprisingly, Unraid notifications stayed silent; I only stumbled upon the issue while inspecting the pool status. I have also ruled out data cable problems on both the motherboard connectors and the LSI controller. Given what looks like a persistent ZFS problem, I'm considering creating a Btrfs pool as an alternative. What should I do?
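As an aside, nonzero READ/WRITE/CKSUM counters like the 490 writes on /dev/sdb1 above are easy to spot programmatically. A minimal sketch: it parses a saved sample of the status table from this post rather than calling `zpool` directly (so it runs anywhere); on a live system you would pipe `zpool status` into the same awk program. Awk's numeric coercion also handles suffixed counters like `6.63K`, which it reads as 6.63.

```shell
#!/bin/sh
# Flag any vdev row in a `zpool status` table whose READ/WRITE/CKSUM
# counters are nonzero. Sample table copied from the post above.
cat > /tmp/zpool_status_sample.txt <<'EOF'
        NAME           STATE     READ WRITE CKSUM
        backup_zfs     ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            /dev/sdd1  ONLINE       0     0     0
            /dev/sdb1  ONLINE       0   490     0
EOF

# A vdev row has exactly 5 fields; skip the header, report nonzero rows.
awk 'NF == 5 && $2 != "STATE" && ($3 + $4 + $5) > 0 {
    printf "%s: read=%s write=%s cksum=%s\n", $1, $3, $4, $5
}' /tmp/zpool_status_sample.txt | tee /tmp/flagged_vdevs.txt
```

On a healthy pool the script prints nothing; here it reports only /dev/sdb1.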
bmartino1 Posted May 25

Unraid version? Diagnostics file, please, and a picture of the Main tab to show the disk layout. Did you install all the ZFS Community Apps plugins?

It looks like you made a single-disk pool, and it uses a partition instead of the disk itself... Review: https://docs.oracle.com/cd/E19253-01/819-5461/gamml/index.html
d3m3zs Posted May 26 Author

21 hours ago, bmartino1 said: it looks like you make a single disk pool with 1 disk. and uses a partition instead of the disk itself...

Yesterday I executed a ZFS scrub on disk2, and after 14 hours it shows no issues:

  pool: disk2
 state: ONLINE
  scan: scrub repaired 0B in 14:56:12 with 0 errors on Sun May 26 02:59:33 2024
config:

        NAME          STATE     READ WRITE CKSUM
        disk2         ONLINE       0     0     0
          /dev/md2p1  ONLINE       0     0     0

errors: No known data errors

Today I got errors on disk1 during the parity sync (it is still running); according to the screenshots, its size shows as 0 and I cannot open any files on the disk:

  pool: disk1
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
  scan: scrub repaired 0B in 13:30:27 with 0 errors on Wed May 1 12:30:38 2024
config:

        NAME          STATE     READ WRITE CKSUM
        disk1         UNAVAIL      0     0     0  insufficient replicas
          /dev/md1p1  FAULTED      3     0     0  too many errors

errors: 16 data errors, use '-v' for a list

The ZFS mirror "backup_zfs" still has the same status:

  pool: backup_zfs
 state: DEGRADED
status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Sat May 25 12:13:28 2024
config:

        NAME           STATE     READ WRITE CKSUM
        backup_zfs     DEGRADED     0     0     0
          mirror-0     DEGRADED    19 6.63K     0
            /dev/sdd1  ONLINE      21 7.53K     0
            /dev/sdb1  UNAVAIL     21 21.7K     0

errors: No known data errors

I switched to ZFS a few months ago and this only started happening yesterday. What I also don't like is that Unraid notified me only about the errors on disk1 during the parity sync, nothing about the ZFS DEGRADED or SUSPENDED statuses.
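Kicking off a scrub and polling its progress can be scripted. A minimal sketch, with the `zpool` calls shown only as comments since they need a live pool; the parsing step runs against the scan line quoted earlier in this thread:

```shell
#!/bin/sh
# On a live system you would run:
#   zpool scrub disk2
#   zpool status disk2      # repeat until the scrub finishes
# Below we extract the completion percentage from a saved scan line,
# so the parsing itself can run anywhere.
line='0B repaired, 4.41% done, 07:57:55 to go'
pct=$(printf '%s\n' "$line" | sed -n 's/.*, \([0-9.]*\)% done.*/\1/p')
echo "scrub ${pct}% complete"
```

Wrapped in a loop with `sleep`, the same extraction could drive a notification once the percentage reaches 100 or the "scrub repaired" summary line appears.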
I was trying to download the diagnostics file, but it failed in the Chrome tab. I will try to get it from /boot/logs if it was created there.
d3m3zs Posted May 26 Author

11 minutes ago, bmartino1 said: diagnostic diagnosticS

Thanks! blackbox-diagnostics-20240526-2353.7z
bmartino1 Posted May 26

Thank you. A mod may be of more assistance here. It looks like you used ZFS as the array file system, per the Main picture... I've had terrible experiences with Unraid doing that: the array doesn't act or work right with ZFS in an Unraid array setup.

See docs: https://docs.unraid.net/unraid-os/manual/storage-management/

If you're doing ZFS, it should be pool devices only; at least, the recommendation is to keep the array formatted as XFS/BTRFS. Some of the weirdness can be explained by the array and parity not talking/working as they should, given the file system type.
d3m3zs Posted May 26 Author

15 minutes ago, bmartino1 said: It looks like you used zfs as the array file format per the main pciture...

Exactly, and it worked smoothly before...

16 minutes ago, bmartino1 said: I've had terrible experience with unraid doing that.

Looks like it is now my turn to get this experience 👌

17 minutes ago, bmartino1 said: if you're doing ZFS it should be pool devices only.

I thought I'd leave at least one ZFS drive for snapshots from the NVMe pool.
JorgeB Posted May 27

First thing is to try to fix the disk1 issues; it looks more like a power/connection issue. Replace both cables for that disk and post new diags after array start.
d3m3zs Posted May 28 Author

On 5/27/2024 at 10:42 AM, JorgeB said: First thing is to try and fix the disk1 issues, looks more like a power/connection issue, replace both cable for that disk and post new diags after array start.

Tried it: rebooted and changed the cables. After the first reboot I couldn't reach the GUI, but I was able to log in via SSH, removed the network config, rebooted from SSH again, and then the GUI came back. But the array didn't start; the status was stuck on "Starting", so I created a diagnostics archive (attached) and rebooted the server again from the console. And now, again, I have no GUI but can log in via SSH.

Any ideas what I should do? I already copied the whole config folder off the server; maybe I need to remove all files from the config folder and reboot one more time?

Also, I checked the logs from the diagnostics archive and found a few errors on the sda drive, so maybe a USB issue?

May 28 13:42:12 BlackBox kernel: critical medium error, dev sda, sector 2057 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 2
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 9, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 10, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 11, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 12, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 13, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 14, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 15, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 8, async page read

blackbox-diagnostics-20240528-1405.zip
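When several devices are throwing errors at once, a per-device tally of the kernel I/O error lines makes the pattern obvious at a glance. A minimal sketch, run against a small sample of lines from this thread (on a real box you would point the grep at the syslog file in the diagnostics archive instead):

```shell
#!/bin/sh
# Count kernel I/O error lines per device in a syslog excerpt.
cat > /tmp/syslog_sample.txt <<'EOF'
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 9, async page read
May 28 13:42:12 BlackBox kernel: Buffer I/O error on dev sda1, logical block 10, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 24 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
EOF

# Pull out the "dev sdXN" token from each error line and tally it.
grep -oE 'dev sd[a-z][0-9]*' /tmp/syslog_sample.txt \
  | sort | uniq -c | sort -rn | tee /tmp/io_error_counts.txt
```

A count that is spread across many unrelated devices (as later posts in this thread show) points at shared hardware such as the PSU, cabling, or the HBA rather than any single disk.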
d3m3zs Posted May 28 Author

Oh, maybe this is helpful: right now all 8 HDDs are connected via the LSI controller; previously 4 were connected to the LSI and 4 to the motherboard. The lsblk command can still return the list of disks, though.
JorgeB Posted May 28

sda is the flash drive; if it's logging errors you need to replace it, or you can first try to reformat it to see if that helps.
d3m3zs Posted May 28 Author

Just now, JorgeB said: sda is the flash drive, if it's logging errors you need to replace it, or you can also try to reformat it first to see if it helps.

I will try to format it, but what happens with the license, and how can I avoid losing all my settings? Copy the /config folder back to the drive?
JorgeB Posted May 28

Backup the complete /config folder, then restore it.
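The backup-then-restore flow above can be sketched in a few lines. This is a sketch only: the real folder lives at /boot/config on the flash drive, and `Pro.key` is a made-up placeholder for whatever license key file is actually in your /config; here temp paths stand in for the flash drive so the script can run anywhere.

```shell
#!/bin/sh
set -e
SRC=/tmp/demo_boot/config      # stands in for /boot/config on the flash drive
BACKUP=/tmp/config_backup

# Set up a fake /config with a placeholder key file for the demo.
mkdir -p "$SRC"
echo 'demo-key' > "$SRC/Pro.key"

# 1. Back up the complete /config folder before touching the drive.
rm -rf "$BACKUP"
cp -a "$SRC" "$BACKUP"

# 2. ...reformat/reflash the flash drive here (simulated by deleting)...
rm -rf "$SRC"

# 3. Restore the backup onto the fresh drive.
mkdir -p "$(dirname "$SRC")"
cp -a "$BACKUP" "$SRC"

ls "$SRC"
```

`cp -a` preserves the folder structure and file attributes, which is what you want for config files; the license key travels along with the rest of /config.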
d3m3zs Posted May 28 Author

1 hour ago, JorgeB said: Backup the complete /config folder, then restore it.

I formatted the drive, reflashed it, and copied the /config folder back, and now the array has been starting for 30 minutes...
d3m3zs Posted May 28 Author

The array has been trying to start for almost an hour. sdb = disk1, and sde and sdi are both in the "backup_zfs" pool. What should I do?

  pool: disk1
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 348M in 1 days 03:05:31 with 7992612 errors on Tue May 28 10:56:43 2024
config:

        NAME          STATE     READ WRITE CKSUM
        disk1         ONLINE       0     0     0
          /dev/md1p1  ONLINE       0     0 15.3M

errors: 7992612 data errors, use '-v' for a list

May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 24 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 24 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb, logical block 3, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770560 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770560 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770496, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770561 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770497, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770562 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770498, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770563 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770499, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770564 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770500, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770565 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770501, async page read
May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770566 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770502, async page read
May 28 16:06:27 BlackBox kernel: Buffer I/O error on dev sdb1, logical block 23437770503, async page read
May 28 16:07:58 BlackBox emhttpd: status: One or more devices has experienced an error resulting in data
May 28 16:09:54 BlackBox kernel: I/O error, dev sdi, sector 592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:09:54 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sdi1 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 28 16:09:54 BlackBox kernel: I/O error, dev sdi, sector 35156654672 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:09:54 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sdi1 error=5 type=1 offset=18000207159296 size=8192 flags=b08c1
May 28 16:09:54 BlackBox kernel: I/O error, dev sdi, sector 35156655184 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:09:54 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sdi1 error=5 type=1 offset=18000207421440 size=8192 flags=b08c1
May 28 16:09:54 BlackBox kernel: I/O error, dev sdi, sector 592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:09:54 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sdi1 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 28 16:09:54 BlackBox kernel: I/O error, dev sdi, sector 35156654672 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:09:54 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sdi1 error=5 type=1 offset=18000207159296 size=8192 flags=b08c1
May 28 16:09:54 BlackBox kernel: I/O error, dev sdi, sector 35156655184 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:09:54 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sdi1 error=5 type=1 offset=18000207421440 size=8192 flags=b08c1
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 3597621920 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1841982390272 size=4096 flags=1808a0
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 3597607896 op 0x0:(READ) flags 0x700 phys_seg 3 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1841975209984 size=12288 flags=40080ca0
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 3597616616 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1841979674624 size=4096 flags=1808a0
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 3597627120 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1841985052672 size=4096 flags=1808a0
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 35156654672 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=18000207159296 size=8192 flags=b08c1
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 35156655184 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=18000207421440 size=8192 flags=b08c1
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 2062909672 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1056209719296 size=4096 flags=1808a0
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 2062950344 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1056230543360 size=8192 flags=1808a0
May 28 16:10:05 BlackBox kernel: I/O error, dev sde, sector 2062952920 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
May 28 16:10:05 BlackBox kernel: zio pool=backup_zfs vdev=/dev/sde1 error=5 type=1 offset=1056231862272 size=4096 flags=1808a0
JorgeB Posted May 28

I assume disk1 is still disabled and being emulated? If so, post the complete syslog, or the diags; there could be errors with other disks causing those.
d3m3zs Posted May 28 Author

Just now, JorgeB said: I assume disk1 is still disabled and being emulated? If yes post the complete syslog, or the diags, there could be errors with other disks causing those.

You are absolutely right. So, I just copied the syslog: syslog.txt
JorgeB Posted May 28

I don't see errors with disk1 in that log, but I do see errors on both pool devices, causing this:

May 28 16:10:05 BlackBox kernel: WARNING: Pool 'backup_zfs' has encountered an uncorrectable I/O failure and has been suspended.

A pool is suspended when ZFS can no longer access it. Because there were errors with both disks, this will make Linux get stuck, and Unraid will never continue without a reboot. You will need to force a shutdown, then check/replace the cables for both pool disks, and post new diags/syslog after array start.
d3m3zs Posted May 28 Author

13 minutes ago, JorgeB said: I don't see errors with disk1 on that log

There are many records like these:

May 28 16:06:27 BlackBox kernel: I/O error, dev sdb, sector 23437770560 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
May 28 16:06:27 BlackBox kernel: sd 7:0:0:0: [sdb] tag#851 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK

sdb is disk1.

15 minutes ago, JorgeB said: You will need to force a shutdown, then check/replace cables for both pool disks and post new diags/syslog after array start.

Thank you, on my way.
JorgeB Posted May 28

54 minutes ago, d3m3zs said: Many these records

I missed those, since they are from during boot, before array start. But they are not the main issue for now, since disk1 is disabled.
d3m3zs Posted May 28 Author (edited)

Replaced the cables and I don't see errors for disk1 anymore, but now I can see errors on the parity drive... I decided to delete the backup_zfs pool. I can't see the log.

Edited May 28 by d3m3zs
JorgeB Posted May 29

You are having constant errors with multiple disks, suggesting some underlying issue, like a bad PSU or a bad HBA, for example. If you have some spares, try swapping some parts around; you need to fix those errors first.
Solution d3m3zs Posted May 31 Author

On 5/29/2024 at 11:04 AM, JorgeB said: You are having constant errors with multiple disks, suggesting some underlying issue, like bad PSU, or bad HBA for example, if you have some spares try swapping some parts around, you need to fix those errors first.

I want to say thank you; it seems the issue is resolved, and now I know how to back up the flash drive and which logs are important. The solution was to add active cooling to the LSI adapter. I will keep an eye on my drives for a few days to be sure.
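Since an overheating HBA showed up here as error bursts across many disks at once, drive temperatures are a cheap signal to keep watching. A minimal sketch that pulls the temperature out of a `smartctl -A`-style attribute table; the sample line and the value 38 are made up for illustration, and the 45°C threshold is an arbitrary example, not a vendor limit (on a live box you would feed it the real output of `smartctl -A /dev/sdX`):

```shell
#!/bin/sh
# Extract the drive temperature from a SMART attribute table and
# warn if it crosses a threshold. Sample line mimics smartctl output.
cat > /tmp/smart_sample.txt <<'EOF'
194 Temperature_Celsius     0x0022   112   099   000    Old_age   Always       -       38
EOF

# Field 2 is the attribute name, field 10 the raw value.
temp=$(awk '$2 == "Temperature_Celsius" { print $10 }' /tmp/smart_sample.txt)
if [ "$temp" -gt 45 ]; then
    echo "WARN: drive at ${temp}C"
else
    echo "OK: drive at ${temp}C"
fi
```

Run from cron across all drives, a sudden simultaneous rise on every disk behind the controller would point back at HBA cooling rather than the disks themselves.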