
System becomes unresponsive during data rebuild of 2 disabled drives


Solved by FearlessAttempt


I ran out of free ports, so I recently added an Intel RES2SV240 SAS-2 expander to my LSI 9207-8i in order to add a new 18TB drive to the array as Parity 2 and move the current 14TB Parity 2 drive to a data slot. The expander has a dual-link SAS connection to the HBA and new SAS-to-SATA breakout cables to the drives, and it is plugged into a PCIe slot for power. After setting everything up and booting the system, all drives were recognized, but 1 drive was disabled and there were CRC errors. I changed out the SATA breakout cables and checked all connections, and then 2 drives were disabled. SMART tests completed without error for both drives. I ended up running Check Filesystem on both drives with -L. The drives are now mountable, but their contents are emulated.

 

When attempting to rebuild the drives from parity, the WebGUI eventually becomes unavailable and the server no longer appears to be on the network. I also have a PiKVM hooked up to the server, and the Unraid console there is locked up as well. I let it run for the expected rebuild time hoping it would recover, but it did not, and I eventually did an unclean shutdown.

 

Replaced the PSU with a brand new larger unit, thinking the drives did not have enough power.

 

I have since removed the new hard drive and SAS expander and returned to the original data cables, but the same issue still happens during the rebuild.

 

Attempted various combinations of booting in safe mode and running the rebuild in maintenance mode or regular mode. Sometimes the rebuild will run for many hours before the system locks up and sometimes it happens in less than 30 minutes.

 

Ran memtest with no errors after one pass.

 

Not sure what to do now.

 

Diagnostics attached.

 

Unraid Version: 6.12.10

CPU: Intel I5-9600K

Motherboard: ASRock Z390 Extreme4

RAM: 64GB G.Skill F4-3200C16-16GUK (4x 16GB) DDR4-3200

PSU: Seasonic TX-1300

Cache: 2x SAMSUNG 2TB 870 EVO (zfs)

HDDs: Various size/age shucked WD (xfs)

HBA: LSI 9207-8i

 


 

apollo-diagnostics-20240417-1916.zip


Did not spot anything obvious in the diagnostics.

 

The syslog in the diagnostics is the RAM version that starts afresh every time the system is booted.  You should enable the syslog server (probably with the option to Mirror to Flash set) to get a syslog that survives a reboot so we can see what leads up to a crash.  The mirror to flash option is the easiest to set up (and if used the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive you can put your server's address into the remote server field. 
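Once a persistent syslog exists, the part worth reading is whatever was logged just before each reboot. As a rough sketch of that idea (not an Unraid tool), the Python below collects the last few lines that precede every boot marker in a log file. It assumes each boot starts with a kernel line containing "Linux version"; the function name and the `keep` parameter are made up for this illustration, and the path to the mirrored file is left for you to supply.

```python
from collections import deque
from typing import Iterable, List

# Assumption: the kernel logs a line containing this text at the start of each boot.
BOOT_MARKER = "Linux version"


def lines_before_each_boot(lines: Iterable[str], keep: int = 20) -> List[List[str]]:
    """For every boot marker that has earlier lines before it,
    return the last `keep` lines logged before that marker."""
    recent = deque(maxlen=keep)   # rolling window of the most recent lines
    chunks: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER in line and recent:
            # A new boot started: whatever is in the window led up to it.
            chunks.append(list(recent))
            recent.clear()
        recent.append(line)
    return chunks


# Example use (the path is an assumption; point it at your saved syslog):
# with open("/path/to/saved/syslog", errors="replace") as fh:
#     for chunk in lines_before_each_boot(fh, keep=30):
#         print("\n".join(chunk))
#         print("-" * 40)
```

If the window before a crash shows nothing unusual, as happened in this thread, that in itself hints at a hard lockup rather than a software fault.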

 


There's nothing relevant logged, and the server still crashing in maintenance mode points more to a hardware issue. It could just be overheating or power related, since a rebuild will use more power and cause more heat.

 

15 minutes ago, FearlessAttempt said:

Should I try reformatting the 2 emulated drives and then doing a rebuild?

Reformatting the emulated drives will delete all data there, and IMHO it's unlikely that the disks are the problem. I would check CPU temps and/or try again with a different PSU, if available.
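For checking temperatures while a rebuild is running, one option is to read the sensors the Linux kernel exposes under /sys/class/hwmon, where each `temp*_input` file holds a value in millidegrees Celsius. The Python below is only a sketch under that assumption: the function names and the 80 °C limit are illustrative choices, not Unraid defaults, and add-in cards such as an HBA may not show up there at all.

```python
from pathlib import Path
from typing import Dict


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Read every temp*_input file under `root` and return degrees Celsius,
    keyed by 'hwmonN:chipname/tempX_input'."""
    temps: Dict[str, float] = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else "unknown"
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable or not numeric; skip it
            temps[f"{chip.name}:{chip_name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


def over_threshold(temps: Dict[str, float], limit_c: float = 80.0) -> Dict[str, float]:
    """Return only the sensors at or above `limit_c` (an illustrative limit)."""
    return {key: value for key, value in temps.items() if value >= limit_c}


# Example use on the server itself:
# for sensor, celsius in read_hwmon_temps().items():
#     print(f"{sensor}: {celsius:.1f} C")
```

Running this every few seconds during a rebuild and noting the last values seen before a lockup would help separate a CPU overheating problem from a fault elsewhere.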

 

  • Solution

So it ended up being the HBA causing the issue. Not sure if it's overheating or what; it did have a fan on the heatsink. I replaced it temporarily with some PCIe SATA cards until I can get a replacement. Thanks to itimpi and JorgeB for taking the time to check my logs and point me in the right direction.
