LuckJury Posted August 7, 2022 (edited) I'm about ready to pull my hair out with this one. A couple of days ago I shut down my server to install a new 2TB NVMe drive to replace the cheap SSD that I had been using as my cache drive. I thought it would be a simple process, but when I booted the server back up, all hell broke loose. I got an error about btrfs file system issues on the cache drive, and 3 of my 5 data drives were showing up both in the array and under Unassigned Devices. Some googling at this point suggested various causes, including issues with the cables, so I went through a couple of rounds of shutting down and powering back on, during which I reseated the SATA cables and also updated the motherboard's BIOS, as one thread I read suggested a potential problem with the controller. Eventually I wound up with no drives erroneously showing under Unassigned Devices, but Parity 2 and Disk 5 were both disabled in an error state. I did a rebuild on Disk 5, which seemed to complete without issue, and I've gone through the process of moving all of my cache files onto the array, reassigning the cache to the new NVMe, and moving the files back onto the cache. The old cache drive has been disconnected from SATA and power. Where I'm running into a further issue is with trying to reassign Parity 2. I've gone through the process of stopping the array, assigning no device, starting the array, stopping the array, reassigning the hard drive, and then starting the array again, but both times I've done that the parity check pauses almost immediately, reporting read errors on Parity 2 (?), and I can neither cancel nor resume the parity check from the GUI. From this state, shutdown and reboot are my only working options. I am now booted into safe mode rebuilding the data on Disk 5 for the second time. After that finishes sometime tomorrow, I'll of course want to get Parity 2 rebuilt as well.
Other potentially pertinent details: At one point the USB drive was not seen by the BIOS, and the machine booted from the Windows boot manager on the other NVMe that is passed through to my Windows 10 VM. I plugged the USB drive into my desktop and checked it for errors, finding none, then plugged it back in, and it has booted fine since. Is my flash drive dying? Checking the system log during times when I had drives showing in both the array and Unassigned Devices showed an error to the tune of "disk with the ID <WDC SERIAL NUMBER> is not set to auto mount", along with other errors saying that the disk labeled sde is now sdi (for example; I don't remember the actual drive labels in question), with the drive labels corresponding to the ones that were showing up in the array and UD simultaneously. Additionally, I was unable to uninstall the Unassigned Devices plugin, getting either a blank screen or a 502 Bad Gateway from nginx where the uninstall progress should have been. The Parity 2 disk is my newest disk, having been installed only a few months ago. Other hardware installed includes an RTX 2060 and a USB hub card, which are passed through to a VM, a Quadro P400 for transcoding, and a PCIe SATA card (which is just hosting an optical drive, none of my actual array or pool drives). Anyone have any advice or experience? Please let me know if there is any additional information that I should provide. Edit - Should have mentioned, this system is running a Ryzen 9 3950X in an Asus TUF Gaming X570-Plus motherboard. Maybe a PCIe lane issue? Edited August 7, 2022 by LuckJury
JorgeB Posted August 8, 2022 Please post the diagnostics.
LuckJury Posted August 8, 2022 I should have thought ahead to do that; my apologies for creating an extra step. See attached. ironhide-syslog-20220807-1929.zip
JorgeB Posted August 8, 2022 That is just the syslog, and it's being heavily spammed. Reboot and post the complete diags after array start.
trurl Posted August 8, 2022 18 minutes ago, JorgeB said: "That is just the syslog" 7 hours ago, JorgeB said: "Please post the diagnostics." The word diagnostics in that post requesting diagnostics, and wherever it occurs in this post (or any other), is a link explaining how to get diagnostics.
LuckJury Posted August 8, 2022 I beg your pardon, this has me frazzled and I was moving too fast. I downloaded the diagnostics from the server and then selected the wrong file to upload. I've attached the diagnostics from earlier this morning to THIS post, and once the running data rebuild finishes in a couple of hours, I'll reboot out of safe mode and will gladly pull a fresh diagnostic and upload it if that will be helpful. ironhide-diagnostics-20220808-0908.zip
JorgeB Posted August 8, 2022 Everything looks good in the beginning, then there's nothing else due to log spam:
Aug 7 14:27:17 Ironhide kernel: vfio-pci 0000:0c:00.0: BAR 1: can't reserve [mem 0xb0000000-0xbfffffff 64bit pref]
### [PREVIOUS LINE REPEATED 1082469 TIMES] ###
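For anyone landing here with a similarly flooded syslog, a rough way to see which message is doing the spamming is to strip the timestamp and hostname from each line and count what is left. The following is only a minimal sketch in Python: the function name `top_messages`, the prefix pattern, and the file path passed on the command line are all assumptions made for illustration, and this is not how the Unraid diagnostics tool itself summarizes repeated lines.

```python
import re
import sys
from collections import Counter

# Assumed classic syslog prefix, e.g. "Aug  7 14:27:17 Ironhide ".
# Other log formats would need a different pattern.
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")


def top_messages(lines, n=5):
    """Return the n most frequent messages, ignoring timestamp and hostname."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Remove the leading "date time host " so identical messages group together.
        counts[PREFIX.sub("", line, count=1)] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is whatever syslog file you extracted; none is assumed here.
    with open(sys.argv[1], errors="replace") as fh:
        for message, count in top_messages(fh):
            print(f"{count:>8}  {message}")
```

Run against an extracted syslog file, the message at the top of the list is the likely culprit; in this thread that would presumably be the vfio-pci "BAR 1: can't reserve" line.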
LuckJury Posted August 9, 2022 Alright, after the Disk 5 rebuild completed in safe mode, I shut down the server. I removed the PCIe x1 2-port SATA card that was installed to host an optical drive. I booted up, started the array, and ran a diagnostic. I then corrected the VFIO settings in System Devices that had been changed due to the hardware changes, rebooted, assigned the Parity 2 drive, and started the array (while holding my breath). Everything seems to be working as expected at this point, making me think that I did in fact have a PCIe lane issue. For the sake of making sure we're good, and in case someone googling in the future runs across this thread, I've attached both diagnostics to this post for review. ironhide-diagnostics-20220808-1859.zip ironhide-diagnostics-20220808-1838.zip
JorgeB Posted August 9, 2022 Everything looks good so far.