WoRie Posted June 18, 2023

Hi, I ran the 6.12-rc5 before updating to the final release, and I now encounter errors with my LSI 9300 HBA. If everything is freshly booted and spun up, all is fine. However, when disks spin down, the GUI still shows a green dot for the disks connected to the HBA, while the ones connected to my mainboard SATA controller show the correct grey dot. This wouldn't bother me, but the HBA disks also suddenly report read errors after some time, which are again fixed by rebooting the host, until the disks enter standby. Could this be due to power management of the PCIe devices? Or is my HBA on the way out?
JorgeB Posted June 18, 2023

Please post the diagnostics.
WoRie Posted June 19, 2023 (Author)

There you are. I've since disabled anything ASPM-related in the BIOS, but now again all disks are spun down and won't come up. wonas-diagnostics-20230619-2019.zip
JorgeB Posted June 19, 2023

Jun 19 19:50:26 WoNas kernel: mpt3sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!

You are having HBA issues; make sure it's sufficiently cooled and well seated, or try a different PCIe slot if available.
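As an aside, faults like the one quoted above can be spotted by filtering the kernel log for mpt3sas fault events. A minimal sketch, using sample lines modelled on the output in this thread (on a live Unraid box you would point the grep at /var/log/syslog instead):

```shell
# Write sample syslog lines (stand-ins for the real log on a live system)
cat <<'EOF' > /tmp/sample-syslog
Jun 19 19:50:26 WoNas kernel: mpt3sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!
Jun 19 19:50:27 WoNas kernel: md: disk10 read error, sector=128
Jun 19 19:50:28 WoNas kernel: usb 1-1: new high-speed USB device
EOF

# Pull out only mpt3sas controller fault/reset events
grep -E 'mpt3sas.*(fault|dead_ioc)' /tmp/sample-syslog
```

Only the first sample line matches, which is exactly the "dead IOC" signature that indicates the controller dropped off the bus.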
WoRie Posted June 21, 2023 (Author)

Hi JorgeB, I think I f*cked up... I pulled the HBA and repasted the heat spreader on the chip. The old thermal compound was completely dry and solid, and oozed a solidified liquid that looked like tree sap. After reassembling, I booted Unraid and started the rebuild of disk 10 (as shown in the screenshot). During this, the connection broke down again. I believe the card is toast and dies when it gets too warm (it's been pretty warm here the last few days), even with new thermal compound and a directly attached fan.

The issue now is that the rebuild of disk 10 hasn't finished, and disk 8 also suddenly showed as "disabled - content emulated". And this is where I think I made a mistake... I stopped the array, set the failed disk 8 to "disabled", started the array in maintenance mode, stopped it again, and tried reassigning the disk to slot 8. But now it shows as a new device... I can only start the array when I set disk 8 to unassigned; otherwise too many disks are missing/changed.

I don't want to carry on with the rebuild of disk 10 with this shot HBA; a new one should arrive tomorrow. However, will I be able to fix this situation at all, and what would be the best course of action? Will I be able to correctly reassign disk 8 after disk 10 has been rebuilt, or is the data on disk 8 gone, so that I have to add it as a new device? The partition is still there in Unassigned Devices and my array is only 30% full, so if I can save the files somehow, that would be great. The files are not irreplaceable, but would nevertheless be a hassle to acquire again.
WoRie Posted June 23, 2023 (Author)

There you are. I've just installed a new HBA; it says "Data Rebuild", but the two disks affected by the outage show up as unmountable. wonas-diagnostics-20230623-1637.zip
JorgeB Posted June 23, 2023

Unraid cannot emulate two disks with single parity; what happened to disk8? It's not even assigned.
WoRie Posted June 23, 2023 (Author)

JorgeB said: "Unraid cannot emulate two disks with single parity"

I know, that's why I'm a bit scared.

JorgeB said: "[...] what happened to disk8? It's not even assigned."

It's showing up as new. The data on it is still accessible when mounted through Unassigned Devices, but I cannot reintroduce it into the array like this. And I also can't check the filesystem, because if I assign both, I cannot start the array due to 2 missing/new disks with single parity.
JorgeB Posted June 23, 2023

SMART looks OK. I assume disk8 was the second one to get disabled? If yes, we can force-enable it to try and rebuild disk10, assuming parity is still valid.
WoRie Posted June 23, 2023 (Author)

Yes, disk8 suddenly showed up as dead. How can I force it back into the array? I believe parity should still be valid and the disks should be fine. The issue was the HBA: in my old case it was directly cooled by a nearby case fan; in the new case that fan is missing, and it has been 30°C here the last few days. I believe that was the culprit and the HBA died. I zip-tied a small Noctua fan to the new HBA to be safe in the future.
JorgeB Posted June 23, 2023

This will only work if parity is still valid, but if nothing else it should re-enable disk8 and its data:

- Tools -> New Config -> Retain current configuration: All -> Apply
- Check all assignments and assign any missing disk(s) if needed, including the old disk8 and the current disk10
- IMPORTANT - Check both "Parity is already valid" and "Maintenance mode" and start the array (note that the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the checkbox, but nothing will be overwritten as long as it's checked)
- Stop the array
- Unassign disk10
- Start the array (in normal mode now) and post new diags
WoRie Posted June 23, 2023 (Author)

Here are the new diagnostics. How would I go about rebuilding disk10 now, if (hopefully) everything is fine again? wonas-diagnostics-20230623-1834.zip
JorgeB Posted June 23, 2023

I assume disk10 was xfs? If yes, stop the array, click on disk10, change the fs from auto to xfs, and post new diags after array start. Also, this is not very good:

Jun 23 18:35:00 WoNas kernel: md: disk2 read error, sector=128
WoRie Posted June 23, 2023 (Author)

I changed disk10 to xfs and started the array. Disk10 shows an unsupported file system. I don't know about the read errors; when the HBA became unstable I saw reported write errors, which were then cleaned up after a reboot. wonas-diagnostics-20230623-2212.zip
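For context, the step that usually follows an "unsupported file system" report on an emulated disk is a read-only xfs_repair run against the array's md device while the array is started in maintenance mode. A hedged sketch that just composes the command for a given slot (the slot number 10 is taken from this thread; the device node is assumed to be /dev/mdN, though newer Unraid releases may use /dev/mdNp1):

```shell
# Compose the read-only filesystem check command for a given array slot.
# Actually running it requires the array started in maintenance mode.
xfs_check_cmd() {
  # -n = no-modify mode: report problems without changing the filesystem
  echo "xfs_repair -n /dev/md$1"
}

xfs_check_cmd 10
```

Running the printed command reports any filesystem damage; dropping the `-n` would then attempt the actual repair, which is only advisable once the underlying hardware problem is resolved.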
JorgeB Posted June 24, 2023

Jun 23 22:10:47 WoNas kernel: md: disk2 read error, sector=8589934608
Jun 23 22:10:47 WoNas kernel: md: disk3 read error, sector=8589934608

There are read errors on multiple disks while trying to emulate disk10; run an extended SMART test on both.
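For reference, the extended test is typically started and read back with smartctl. A sketch below; the device name /dev/sdb is a placeholder, and since a real drive is needed to run the test, a canned excerpt stands in for the self-test log output:

```shell
# On a live system you would start the test and, hours later, read the log:
#   smartctl -t long /dev/sdb      # start extended (long) self-test
#   smartctl -l selftest /dev/sdb  # show the self-test result log
# A canned excerpt stands in for that output here:
cat <<'EOF' > /tmp/smart-selftest
Num  Test_Description    Status                  Remaining  LifeTime(hours)
# 1  Extended offline    Completed without error       00%      43200
EOF

# A healthy drive shows "Completed without error" for the extended test
grep 'Extended offline' /tmp/smart-selftest
```

If the status instead reads something like "Completed: read failure", the drive itself is failing rather than the cabling or controller.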
WoRie Posted June 25, 2023 (Author)

Both disks completed the extended test without error. wonas-diagnostics-20230625-1128.zip
JorgeB Posted June 25, 2023

Replace cables for both disks and post new diags after array start.
WoRie Posted June 25, 2023 (Author)

Here you go. Started the array in maintenance mode with new cables fresh out of the box. wonas-diagnostics-20230625-1733.zip
WoRie Posted June 25, 2023 (Author)

Disk 5 had problems negotiating a link, even with the new cables. Now, after some up and down, the array is up. Disk 10 reports as being emulated, and I can only perform a read check, not a rebuild of disk 10... I think, if I'm able to restore the array in full, I should immediately move all files from these old disks (some from 2011) to the newer 18TB drives... Can I rebuild disk10, which currently appears empty in the array, or should I wipe it and re-add it? wonas-diagnostics-20230625-1811.zip
JorgeB Posted June 26, 2023

Still having issues with multiple disks, including one dropping offline; this could be a power problem.