February 27, 2025

I need help with the rebuild process of my ZFS array. One of my hard drives (Disk 2) failed and was replaced. Since I only had a 12TB drive available and no 10TB drive, I had to use the 12TB drive as the new parity disk. To do this, I swapped the old 10TB parity disk with the 12TB drive. This process completed successfully. After that, I used the previous 10TB parity disk as a replacement for Disk 2, which started the data rebuild process.

However, at around 30%, the server suddenly became unreachable on the network. A connected USB keyboard stopped responding, and there was no display output. The hard drives were still running but made no audible data transfer sounds. I had no choice but to cut the power and restart the server.

Once the server booted up again, I restarted the rebuild process. It ran stably for a while, but I found it strange that the hard drives were active without any audible write operations. Only at around 40% did the drives start making noticeable write noises, but at the same time the data transfer speed kept dropping, from an initial 150 Mbit/s down to 50 Mbit/s, until it stopped completely at 46%.

I tried pausing the process, but when I click the "Pause" button, the Unraid loading animation appears briefly, and then the button remains as "Pause" without any effect. In the Unraid web UI under "Main," no hard drives are listed anymore. However, they are still visible in the dashboard and currently in standby mode. Through the dashboard, I can access each disk and wake them up. I ran a short SMART test on all disks, and they all passed without errors. Even though the disks audibly performed the self-test, they are still displayed as "Standby" or "Rebuilding" in the dashboard, which cannot be correct.

How can I restart the rebuild process or find out why it is stuck? Which logs should I check? I'm worried that if I restart the server now, I'll lose all progress and get stuck in a loop.
Disk configuration:

Before Disk 2 failure:
Parity: 10TB
Disk 1: 10TB
Disk 2: 10TB (failed)
Disk 3: 10TB

After replacing Disk 2 with a new 12TB drive:
Parity: 12TB (swapped with the previous 10TB parity disk)
Disk 1: 10TB
Disk 2: 10TB (former parity disk)
Disk 3: 10TB

EDIT: I just turned on the connected monitor and, to my surprise, there's a huge error message which I'm too dumb to read. Can anyone figure something out here? I also switched on the syslog.

Edited February 27, 2025 by VanillaThunder (Additional Info)
February 27, 2025 Author

3 minutes ago, JorgeB said: Please post the diagnostics.

There you go, thanks for the reply.

arrakis-diagnostics-20250227-1027.zip
February 27, 2025 Community Expert Solution

The Unraid driver crashed. This is typically a hardware issue. You will need to reboot; I then recommend upgrading to 7.0.1 and trying again. If the same thing happens with the newer kernel, it's almost certainly hardware.
February 27, 2025 Author

2 hours ago, JorgeB said: The Unraid driver crashed. This is typically a hardware issue. You will need to reboot; I then recommend upgrading to 7.0.1 and trying again. If the same thing happens with the newer kernel, it's almost certainly hardware.

Alright, thanks. I did exactly that and the rebuild is currently running again. Hopefully it will succeed. Do you have any other advice on what I could look for in the syslog to investigate further if this problem occurs again, so that I could focus on a specific hardware component or BIOS setting?
February 27, 2025 Community Expert

If there are more crashes, see if they are related to the Unraid driver (md_mod):

Feb 26 22:23:47 Arrakis kernel: ? copy_data+0x16/0x219 [md_mod]
Feb 26 22:23:47 Arrakis kernel: ? kernel_fpu_end+0x24/0x39
Feb 26 22:23:47 Arrakis kernel: ? raid5_generate_d+0xce/0x109 [md_mod]
Feb 26 22:23:47 Arrakis kernel: copy_write_data+0x48/0x8f [md_mod]
Feb 26 22:23:47 Arrakis kernel: unraidd+0xbf7/0x1140 [md_mod]
Feb 26 22:23:47 Arrakis kernel: md_thread+0xf4/0x122 [md_mod]
Feb 26 22:23:47 Arrakis kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Feb 26 22:23:47 Arrakis kernel: ? signal_pending+0x1d/0x1d [md_mod]

If yes, RAM would be the first suspect, CPU/board the next ones.
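
For anyone who would rather not scan the syslog by eye, checking a trace for md_mod frames could be scripted roughly as below. This is only a minimal sketch, not an official Unraid tool: the function name `frames_for_module` and the regular expression are made up for illustration, and it assumes stack-trace lines look like the ones quoted above (`symbol+0xOFFSET/0xSIZE [module]`).

```python
import re

# Assumed frame format, taken from the trace quoted in this thread:
#   "... kernel: ? copy_data+0x16/0x219 [md_mod]"
# Frames without a trailing "[module]" belong to the core kernel and are skipped.
FRAME_RE = re.compile(
    r"kernel:\s+(?:\?\s+)?(?P<symbol>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[(?P<module>\w+)\]"
)

def frames_for_module(lines, module="md_mod"):
    """Return the symbol names of stack frames attributed to `module`.

    `lines` can be any iterable of strings, e.g. an open syslog file.
    (Hypothetical helper, written for this example only.)
    """
    symbols = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match and match.group("module") == module:
            symbols.append(match.group("symbol"))
    return symbols
```

It could be pointed at a saved copy of the syslog, for example `frames_for_module(open("syslog.txt", errors="replace"))`, where `syslog.txt` is a placeholder file name. A non-empty result only means md_mod appears in a trace; it does not say which hardware component is at fault.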