December 14, 2023

Hello all,

A bit of background: for the past 2-3 months I've been running Unraid (6.12.6) on my server with three 16 TB Seagate drives in an array, one of them used as parity. The drives are all formatted with ZFS. One of the drives is used solely for storing Nextcloud data (Nextcloud itself runs in Docker, using the LinuxServer image, updated regularly). A bash script runs every night and uses `rclone` to sync the contents of the individual users' folders to a remote NAS server.

A couple of days ago, I woke up to an email from the server saying:

Quote:
Event: Unraid Disk 2 error
Subject: Alert [VALINOR] - Disk 2 in error state (disk dsbl)
Description: ST16000NM001G-2KK103_ZL2LSB9A (sdc)
Importance: alert

I asked on Discord and was told to try rebuilding the drive onto itself, which I promptly did. Last night, a day and a half later, the process completed, I rebooted the server, and everything looked in order once again.

This morning I was greeted with the exact same email. The drive is marked with a red X and its contents are reported as emulated.

Here's an excerpt from the system log, which I thought might be relevant: https://pastebin.com/YSWCfj6F. I'm also attaching a diagnostics archive.

For the time being, I've stopped Nextcloud altogether so that nothing writes to the drive, and I've left it to perform a long SMART test (results pending).

Could someone please help me understand what's causing this problem and how to avoid it in the future? If you need additional information, I'd be happy to provide it.

Thank you!

valinor-diagnostics-20231214-1345.zip

Edited December 14, 2023 by zkvvoob
December 14, 2023 · Community Expert

The disk was already disabled at boot, so we can't see what happened. Rebuild, and if it happens again, post new diags before rebooting. I would recommend replacing/swapping cables before the rebuild, to rule that out.
December 14, 2023 · Author

All right, I'll come back in a couple of days... 😔

EDIT: Wait, I'm fairly certain I didn't reboot the server after the drive failed this morning. Here's what I did: last night I waited for the rebuild process to complete, then rebooted, made sure all Docker containers were up and running, and went to sleep. This morning I noticed the drive had failed and stopped Nextcloud, then initiated a SMART test...

Edited December 14, 2023 by zkvvoob
December 14, 2023 · Community Expert

5 minutes ago, zkvvoob said:
EDIT: Wait, I'm fairly certain I didn't reboot the server after the drive failed this morning.

The diags posted cover Dec 14 09:45:32 to Dec 14 13:38:46, and they show no rebuilds or disks getting disabled. An unclean shutdown was detected, though, so possibly the server rebooted on its own, which is not a good sign and is likely unrelated to the disk issue.
December 18, 2023 · Author

On 12/14/2023 at 2:08 PM, JorgeB said:
Diags posted cover from Dec 14 09:45:32 to Dec 14 13:38:46 and there are no rebuilds or disks getting disabled, an unclean shutdown detected though, so possibly the server rebooted on its own, which is not a good sign and likely unrelated to the disk issue.

So, I followed @JorgeB's advice and initiated a new data-rebuild process. That was 4 days ago.

Three days in, the server started behaving strangely: there was no sound of the drives reading or writing, even though the process was supposed to be under way. Then I lost external access to all the services I was running (Docker containers). I couldn't even reboot the server, neither from the button on the Dashboard nor from the terminal. That led me to force a shutdown.

Afterwards, I mounted the array and let the data-rebuild process start again. I can hear the drives running now, and when I last checked 3 hours ago, it showed 25.1% complete. However, 3 hours later, the numbers haven't changed (even though the disks are being read from and written to).

Furthermore, I can't even download the diagnostics report. The process is stuck at:

    sed -ri 's/^(share(Comment|ReadList|WriteList)=")[^"]+/\1.../' '/valinor-diagnostics-20231218-1420/shares/appdata.cfg' 2>/dev/null
    sed -ri 's/^(share(Comment|ReadList|WriteList)=")[^"]+/\1.../' '/valinor-diagnostics-20231218-1420/shares/b-----s.cfg' 2>/dev/null
    sed -ri 's/^(share(Comment|ReadList|WriteList)=")[^"]+/\1.../' '/valinor-diagnostics-20231218-1420/shares/d----r.cfg' 2>/dev/null

I've replaced all SATA cables. What should I do here, please? Thank you!

EDIT: The problems with the Docker services began again: the drives stopped making a sound and all of my external applications are no longer accessible, the error being 504 gateway timeout.

Edited December 18, 2023 by zkvvoob
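For readers wondering what the `sed` lines in the post above are doing: judging only from the expression itself, they appear to blank out the values of `shareComment`, `shareReadList` and `shareWriteList` in each share's `.cfg` file inside the diagnostics folder. Below is a minimal Python sketch of the same substitution. The function names and the sample values are made up for illustration and are not taken from the poster's files or from Unraid's own tooling.

```python
import re

# Mirrors the sed expression shown in the post:
#   s/^(share(Comment|ReadList|WriteList)=")[^"]+/\1.../
# The inner group is non-capturing here, so \1 is still the key plus
# the opening quote, as in the sed version.
PATTERN = re.compile(r'^(share(?:Comment|ReadList|WriteList)=")[^"]+')


def redact_line(line: str) -> str:
    """Replace a non-empty quoted value of the three keys with '...'."""
    return PATTERN.sub(r"\1...", line, count=1)


def redact_config(text: str) -> str:
    """Apply the redaction to every line of a config file's text."""
    return "\n".join(redact_line(line) for line in text.splitlines())


if __name__ == "__main__":
    # Hypothetical sample content, for illustration only.
    sample = "\n".join([
        'shareComment="family photos"',
        'shareReadList="alice,bob"',
        'shareWriteList=""',
        'someOtherKey="untouched"',
    ])
    print(redact_config(sample))
```

Empty values are left as they are, because `[^"]+` needs at least one character to match, and lines with any other key pass through unchanged. The substitution itself is trivial work, so a hang at this step would more plausibly come from the file access underneath it than from the expression, though the thread does not confirm this.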