Hi, so probably not related to this issue, but last week I made major changes to my system to add an SSD cache and a GPU for a VM. For this I added 2x LSI 9207 cards via an ASUS Hyper M.2 expander (I only have two PCIe slots with 4 or more lanes, and one of those was already in use). A little complex, but it seems to work(ish), except I'm having problems with the LSI cards either dropping out and then crashing.

The array is 6x ST16000NM001G, with 4 data drives and dual parity, and this issue has so far caused 4 disks to become disabled. On the first parity check after the changes, data 2 and 3 were disabled at around 5% completion; the second attempt worked and data 2 and 3 rebuilt successfully. At that point I made a full backup and ran another parity check to confirm it was running OK, and it crashed at approx 10% with parity 1 disabled. I then swapped the controllers around, so controller 2 had the HDDs connected and controller 1 had the SSDs, and ran a parity rebuild for the parity 1 disk; at around 4% it crashed and data 3 was disabled again.

Unfortunately I'd only enabled logging on the second crash, and I didn't realise the logs were only saved in RAM, so I have no logs.
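Since the logs vanished because /var/log lives in RAM, one workaround is to periodically copy the syslog to persistent storage (Unraid also has a built-in "Mirror syslog to flash" option under Settings > Syslog Server). A minimal sketch, assuming Unraid's default layout; the `persist_syslog` helper name is my own:

```shell
# Sketch of a workaround, assuming /var/log is in RAM and /boot is the
# USB flash drive (persistent). persist_syslog is a made-up helper name.
persist_syslog() {
    local src="${1:-/var/log/syslog}"   # in-RAM syslog
    local dest="${2:-/boot/logs}"       # survives a crash/reboot
    mkdir -p "$dest" || return 1
    cp -f "$src" "$dest/"
}
# Run it every minute from cron so the last copy is never stale, e.g.:
#   * * * * * /boot/persist_syslog.sh
```

Writing to the flash drive every minute is a small wear trade-off, but it means the last minute before a hard crash is captured.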
I'm aware the ST16000NM001G is not an IronWolf, but I've read that these are very similar to their 16TB IronWolf drives, so they may be affected. I originally thought this was due to bent pins on the CPU, which happened during this rebuild: I dropped the CPU after it stuck to the underside of the cooler, and crushed it against the case while trying to catch it. This completely flattened 8 pins, but according to the diagram on WikiChip these are for memory channel A and GND (pin 2 from the corner broke off, but that one is only power). The CPU ran happily during a stress test and is currently 2 hours into a memtest with 0 errors. So if that isn't the issue, I can only assume it's the signal integrity between the CPU and the 9207s, which I'll test by dropping the link speed down to Gen 2 and hoping that doesn't affect my 10Gb NIC.
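One quick way to check whether the 9207s (or the NIC) have already down-trained their PCIe links is `lspci -vv`, which reports the rated speed (LnkCap) versus the currently negotiated speed (LnkSta) per device. A rough sketch that flags mismatches; `check_downtrained` is a name I made up, and it only distinguishes Gen 1/2/3 speeds:

```shell
# Rough check for PCIe links that trained below their rated speed (a possible
# symptom of signal-integrity trouble). LnkCap/LnkSta are standard fields in
# `lspci -vv` output; check_downtrained is a made-up helper name.
check_downtrained() {
    awk '
        function speed(s) {
            if (s ~ /2\.5GT\/s/) return 2.5   # Gen 1
            if (s ~ /5GT\/s/)    return 5     # Gen 2
            if (s ~ /8GT\/s/)    return 8     # Gen 3
            return 0
        }
        /LnkCap:/ { cap = speed($0) }
        /LnkSta:/ {
            sta = speed($0)
            if (cap && sta && sta < cap)
                print "WARN: link at " sta "GT/s, rated for " cap "GT/s"
        }'
}
# Usage (as root): lspci -vv | check_downtrained
```

A healthy Gen 3 card like the 9207 should show 8GT/s in both fields; a link that keeps renegotiating downward would point at the riser/expander wiring rather than the drives.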
Full system spec before:
DATA: ST16000NM001G x6
Cache: none
VM data: Samsung 860 1TB via Unassigned Devices
Docker data: SanDisk Ultra 3D 960GB via Unassigned Devices
These were connected via the mobo ports and a cheap SATA card I had lying around in pciex1_1.
GPU: 1660 Super for Plex (in pciex16_1)
CPU: 3950X
Mobo: ASUS B550-M
RAM: 64GB Corsair Vengeance (non-ECC) @ 3600MHz
PSU: Corsair 850W RMx
Case: Fractal Design Node 804
With an APC 700VA UPS
Damaged pin details:
According to WikiChip (link to pic):
The damaged pins were C39 to K39 (C39 to K38 fully flattened), and AP1 to AU1 were slightly bent. After the repair, B39 fell off, as it was not only flattened but had actually folded in half, and A39, C39, E39 and J39 still had a thin section at the top of the pin, right where it was bent. The system booted and passed a CPU stress test etc. (I didn't consider doing a memtest at this time.)
Full system spec after:
DATA: ST16000NM001G x6
Cache: 2x MX500 2TB
VM data: 2x Samsung 860 1TB via pools
Docker data: SanDisk Ultra 3D 960GB and Samsung 860 1TB via pools
These are connected via the 2x LSI 9207 in pciex16_1 (Hyper M.2 slots 2 and 3), with the HDDs on one card and the SSDs on the other.
NIC: ASUS XG-C100C (in pciex16_1 via Hyper M.2 slot 4)
GPU: 1660 Super for Plex (in pciex16_1 via Hyper M.2 slot 1)
GPU2: RX570 (intended for a Win 10 VM, currently unused, in pciex16_2)
CPU: 3950X (now with bent and missing pins)
RAM: 64GB Corsair Vengeance (non-ECC) @ 3600MHz
Mobo: ASUS B550-M
PSU: Corsair 850W RMx
With an APC 700VA UPS
Case: Fractal Design Node 804 (yeah, it's a very tight build)
I'll update if I find the issue (or get logs of it, now that I have those set up), but there's a slim chance it's related (still got at least 22 hours of memtest to go though).
Sorry for the long comment, but hopefully more detail helps.