FearlessAttempt Posted April 18

I ran out of free ports, so I recently added an Intel RES2SV240 SAS-2 expander to my LSI 9207-8i in order to add a new 18TB drive to the array as Parity 2 and reuse the current 14TB Parity 2 drive as a data drive. The expander has a dual-link SAS connection to the HBA and new SAS-to-SATA breakout cables, and it is plugged into a PCIe slot for power.

After setting everything up and booting, all drives were recognized, but 1 drive was disabled and there were CRC errors. I changed out the SATA breakout cables and checked all connections, and then 2 drives were disabled. SMART tests completed without error for both drives. I ended up running the filesystem check on both drives with -L. The drives are now mountable, but their contents are emulated.

When I attempt to rebuild the drives from parity, the WebGUI eventually becomes unavailable and the server no longer appears to be on the network. I also have a PiKVM hooked up to the server, and the Unraid console there is locked up as well. I let it run for the expected rebuild time hoping it would recover, but it did not, and I eventually did an unclean shutdown.

I replaced the PSU with a brand-new, larger unit, thinking the drives did not have enough power. I have since removed the new hard drive and the SAS expander and returned to the original data cables, with the same issue happening during the rebuild. I have tried various combinations of booting in safe mode and running the rebuild in maintenance mode or regular mode. Sometimes the rebuild runs for many hours before the system locks up, and sometimes it happens in less than 30 minutes. Memtest ran with no errors after one pass. Not sure what to do now. Diagnostics attached.
Unraid Version: 6.12.10
CPU: Intel i5-9600K
Motherboard: ASRock Z390 Extreme4
RAM: 64GB G.Skill F4-3200C16-16GUK (4x 16GB) DDR4-3200
PSU: Seasonic TX-1300
Cache: 2x Samsung 2TB 870 EVO (zfs)
HDDs: Various size/age shucked WD (xfs)
HBA: LSI 9207-8i

Attachment: apollo-diagnostics-20240417-1916.zip
itimpi Posted April 18

I did not spot anything obvious in the diagnostics. The syslog in the diagnostics is the RAM version, which starts afresh every time the system is booted. You should enable the syslog server (probably with the Mirror to Flash option set) to get a syslog that survives a reboot, so we can see what leads up to a crash. The Mirror to Flash option is the easiest to set up (and if it is used, the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive you can put your server's address into the remote server field.
FearlessAttempt Posted April 18

I have enabled the syslog server and set it to mirror to flash. Screenshot with the settings used below. I attempted the rebuild again in maintenance mode, with the same outcome and an eventual unclean shutdown. New diagnostics file attached.

Attachment: apollo-diagnostics-20240418-1222.zip
FearlessAttempt Posted April 19

I looked at the previous syslog and didn't see anything obvious, but I don't know exactly what to look for. Any thoughts on next steps?
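For anyone in the same position of not knowing what to look for, a rough first pass is to filter the persisted syslog for lines containing common trouble keywords. The snippet below is only a sketch: the keyword list is an assumption (it is not derived from the diagnostics in this thread), and the file path is whatever you pass on the command line. A clean result does not rule out a hardware fault, since a hard lockup can happen without anything being written to the log.

```python
import re
import sys
from pathlib import Path

# Assumed keyword list for a first-pass filter; adjust to taste.
# None of these strings are taken from the diagnostics in this thread.
KEYWORDS = re.compile(
    r"(error|fail|timeout|reset|crc|call trace|out of memory)",
    re.IGNORECASE,
)


def suspicious_lines(text):
    """Return (line_number, line) pairs whose text matches any keyword."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if KEYWORDS.search(line)
    ]


def scan_file(path):
    """Read a syslog file and return its suspicious lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    return suspicious_lines(Path(path).read_text(errors="replace"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_syslog.py <path-to-syslog>
    for number, line in scan_file(sys.argv[1]):
        print(f"{number}: {line}")
```

The lines closest to the end of the file, just before the crash, are usually the most interesting ones to read in full.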
FearlessAttempt Posted April 19

Should I try reformatting the 2 emulated drives and then doing a rebuild? Or maybe replace them with new drives? I would be limited to 12TB or 14TB replacements because of the parity size, and would of course prefer to go larger. Is it possible I need a new flash drive?
JorgeB Posted April 19

There's nothing relevant logged, and the server still crashing in maintenance mode points more to a hardware issue. It could just be overheating, or be power related, since a rebuild will use more power and cause more heat.

FearlessAttempt said: "Should I try reformatting the 2 emulated drives and then doing a rebuild?"

Reformatting the emulated drives will delete all the data on them, and IMHO it's unlikely that the disks are the problem. I would check CPU temps and/or try again with a different PSU, if available.
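One way to keep an eye on temperatures from a terminal during a rebuild is to read the standard Linux thermal sysfs entries. This is a minimal sketch, assuming the kernel exposes `/sys/class/thermal/thermal_zone*` on the server (not confirmed anywhere in this thread); the function name is made up for illustration. It reports whatever zones the kernel provides, which may or may not include the CPU package, and it will not show the temperature of an HBA.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone label: degrees Celsius} for each readable thermal zone.

    Assumes the standard Linux sysfs layout, where each thermal_zone*
    directory has a 'type' file (sensor name) and a 'temp' file holding
    the temperature in millidegrees Celsius.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones that are missing files or report unreadable values.
            continue
        readings[f"{zone.name} ({name})"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for label, celsius in read_thermal_zones().items():
        print(f"{label}: {celsius:.1f} C")
```

Running it every minute or so while the rebuild is in progress, and noting the last values before a lockup, would show whether temperatures climb before the crash.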
FearlessAttempt Posted April 21 (Solution)

So it ended up being the HBA causing the issue. I'm not sure if it's overheating or something else; it did have a fan on the heatsink. I have replaced it temporarily with some PCIe SATA cards until I can get a replacement. Thanks to itimpi and JorgeB for taking the time to check my logs and point me in the right direction.