liujason Posted June 8, 2022 (edited) I have not experienced this in the past 10 years or so with Unraid. The console boots and looks normal (see attached), but I can't get to the web GUI. The server's IP address (192.168.1.3) is unreachable via ping. The iLO IP address on the same machine responds fine, which suggests the NIC hardware is OK. I checked the client list on the router, and there should be no IP conflict. How should I diagnose further? (diagnostics attached) tower-diagnostics-20220608-1442.zip Edited June 14, 2022 by liujason: adding diagnostics
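For the "how should I diagnose further" question, one check that can be run from the local console is whether the kernel exposes the NIC at all. The sketch below is a minimal illustration, not an Unraid tool: `list_interfaces` is a hypothetical helper name, and it assumes a Python interpreter is available on the server (a stock install may not have one). It reads the standard Linux `/sys/class/net` directory and reports each interface's `operstate`.

```python
from pathlib import Path


def list_interfaces(sys_net="/sys/class/net"):
    """Return {interface_name: operstate} for each interface the kernel exposes.

    sys_net defaults to the standard Linux sysfs location; it is a parameter
    only so the function can be pointed at a test directory.
    """
    states = {}
    for entry in sorted(Path(sys_net).iterdir()):
        # Plain files such as bonding_masters are not interfaces; skip them.
        if not entry.is_dir():
            continue
        try:
            states[entry.name] = (entry / "operstate").read_text().strip()
        except OSError:
            # operstate missing or unreadable for this entry
            states[entry.name] = "unknown"
    return states
```

If the result contains only `lo` (and possibly an empty bond or bridge) with no `eth0`, the driver for the physical NIC was probably never loaded, which points at a driver or blacklist problem rather than an IP conflict.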
ChatNoir Posted June 8, 2022 Would that apply to you? It seems that you are running HP hardware.
SimonF Posted June 8, 2022 (edited) 7 minutes ago, ChatNoir said: "Would that apply to you? It seems that you are running HP hardware." I think so, as it is running NICs that need the tg3 driver. Edited June 8, 2022 by SimonF
liujason (Author) Posted June 8, 2022 (edited) Thanks for the prompt reply! Yes, it is an HP. Is the next step to create the empty file config/modprobe.d/tg3.conf? Is there any way to verify that tg3 will work with this build? (I don't want to wait until data corruption happens.) Edit: I read through this post, and it seems the recommendation for now is to disable VT-d. Edited June 8, 2022 by liujason
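As a rough illustration of the two things asked about here, the sketch below creates the empty `config/modprobe.d/tg3.conf` file and checks whether the `tg3` module is currently loaded. Several things are assumptions: the helper names `allow_tg3` and `module_loaded` are hypothetical, the flash drive is assumed to be mounted at `/boot`, and a Python interpreter is assumed to be available. Note that a loaded module only shows the driver is in use; it does not show that running tg3 with VT-d enabled is safe, which is why the thread settled on disabling VT-d instead.

```python
from pathlib import Path


def allow_tg3(flash_root="/boot"):
    """Create an empty config/modprobe.d/tg3.conf under flash_root.

    flash_root="/boot" is an assumption about where the flash drive is
    mounted. An existing file is left untouched rather than overwritten.
    """
    conf = Path(flash_root) / "config" / "modprobe.d" / "tg3.conf"
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.touch(exist_ok=True)
    return conf


def module_loaded(name, proc_modules="/proc/modules"):
    """Return True if a kernel module called `name` is listed as loaded.

    /proc/modules is the standard Linux list of loaded modules; the first
    whitespace-separated field on each line is the module name.
    """
    with open(proc_modules) as fh:
        return any(line.split()[0] == name for line in fh if line.strip())
```

After a reboot, `module_loaded("tg3")` returning `False` would be consistent with the driver still being blocked.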
liujason (Author, Solution) Posted June 9, 2022 Disabled VT-d, and the webGUI is back! Thanks!
JorgeB Posted June 9, 2022 A couple of questions, if you don't mind: did you upgrade from v6.9.2 (or older) directly to v6.10.2? And was your cache unmountable before doing that?
liujason (Author) Posted June 9, 2022 It was 6.9.x; sorry, I don't remember the minor version. The cache was not unmountable. I restarted the server after the update as usual.
JorgeB Posted June 9, 2022 11 minutes ago, liujason said: "Cache was not unmountable." Thanks, so the cache was mounting before updating, correct? And are the diags from the first boot with v6.10.2, or did you do any previous boots with this release?
liujason (Author) Posted June 9, 2022 The cache was mounting before updating, correct. I was doing a big cache flush (600GB or so) before the reboot following the update. The diags are unfortunately not from the first boot. I rebooted several times wondering why I couldn't connect to the webGUI.
JorgeB Posted June 9, 2022 1 minute ago, liujason said: "Diags are unfortunately not from the first boot. I rebooted it several times wondering why I couldn't connect to the webGUI." OK, thanks. It looks like your server is affected by the possible corruption issue. As long as you leave VT-d disabled for now there shouldn't be any more issues, and hopefully a fix will come soon.
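To confirm from the running system that VT-d really is off after changing the BIOS setting, one option is to count the IOMMU groups the kernel has created. This is a sketch under assumptions: `iommu_group_count` is a hypothetical helper name, a Python interpreter is assumed to be available, and the check relies on the usual Linux behaviour that `/sys/kernel/iommu_groups` contains one numbered directory per group when an IOMMU is active and is empty or absent otherwise.

```python
from pathlib import Path


def iommu_group_count(groups_dir="/sys/kernel/iommu_groups"):
    """Return the number of IOMMU groups found under groups_dir.

    A result of 0 suggests the IOMMU (VT-d on Intel platforms) is not
    active; any positive number suggests it is.
    """
    path = Path(groups_dir)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.iterdir() if entry.is_dir())
```

A return value of 0 after the BIOS change is what one would expect on a server with VT-d disabled.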
liujason (Author) Posted June 9, 2022 Just now, JorgeB said: "OK, thanks. It looks like your server is affected by the possible corruption issue. As long as you leave VT-d disabled for now there shouldn't be any more issues, and hopefully a fix will come soon." Thank you for all the support! Best of luck finding the fix!
liujason (Author) Posted June 12, 2022 (edited) On 6/9/2022 at 12:05 AM, JorgeB said: "A couple of questions, if you don't mind: did you upgrade from v6.9.2 (or older) directly to v6.10.2? And was your cache unmountable before doing that?" Now I'm seeing my cache as unmountable (attaching diags). I have not restarted the system. The message is "Unmountable: Wrong or no file sysem" ("system" is misspelled, BTW). tower-diagnostics-20220611-2205.zip Edited June 12, 2022 by liujason
JorgeB Posted June 12, 2022 3 hours ago, liujason said: "Now I'm seeing my cache unmountable" Yes, that was already visible in your first diags. There are some recovery options here, but based on the error I'm not very optimistic they will work. You might need to re-format and restore from backups, if available.
liujason (Author) Posted June 13, 2022 Ah... bonkers. All Docker apps/libraries run on the cache drive, so I may have lost all the appconfig. What caused the drive to become unmountable? Is this the data corruption that happened? I thought the update blocked the NIC, so data corruption wouldn't have occurred.
JorgeB Posted June 13, 2022 2 hours ago, liujason said: "is this the data corruption that happened?" Unfortunately that's what most likely happened. Just starting the array with VT-d enabled for a few minutes can be enough to corrupt some data, and usually anything btrfs is the first to go, since it's much more susceptible to kernel memory corruption. Note that if you need VT-d you can upgrade to v6.10.3-rc1; I'm confident that the corruption issue, which mostly affects HP servers of the same vintage, is resolved there.
liujason (Author) Posted June 14, 2022 22 hours ago, JorgeB said: "Unfortunately that's what most likely happened. Just starting the array with VT-d enabled for a few minutes can be enough to corrupt some data, and usually anything btrfs is the first to go, since it's much more susceptible to kernel memory corruption. Note that if you need VT-d you can upgrade to v6.10.3-rc1; I'm confident that the corruption issue, which mostly affects HP servers of the same vintage, is resolved there." Got it. Alright, I rebuilt the cache file system as XFS. Thanks for your support.