November 4, 2023 Hey guys, I keep getting an odd one. Two drives disabled themselves: one is the second parity drive, the other is a data disk. I have SMART-checked both drives and both come up fine, but every time I attempt to rebuild the data drive I get "i/o error dev sde sector xxxxx". I'm assuming a certain area on the drive is failing or has failed? Unraid 6.12.4. After the I/O error, all my drives are disabled until I reboot. Could it be that the SAS card is failing? I have replaced both SAS cables and reflashed my SAS card, which is an HP H220 flashed to IT mode with the latest firmware and BIOS. I've also attached my syslog. Cheers nas-syslog-20231104-0310.zip Edited November 4, 2023 by Epicslayer2
November 4, 2023 Community Expert Solution
Nov 4 11:02:03 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Nov 4 11:02:04 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Nov 4 11:02:05 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Nov 4 11:02:06 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Nov 4 11:02:07 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
Nov 4 11:02:08 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
HBA problem: make sure it's well seated and sufficiently cooled, or try a different PCIe slot if possible.
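For anyone wanting to check their own syslog for the same symptom, here is a minimal Python sketch that counts the "SAS host is non-operational" lines per controller. The function name `count_hba_faults` and the file path in the usage comment are illustrative assumptions, not part of Unraid; the regular expression is based only on the kernel lines quoted in this thread.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Nov 4 11:02:03 NAS kernel: mpt2sas_cm0: SAS host is non-operational !!!!
# The mpt2sas/mpt3sas prefixes are the two driver names seen in this thread.
PATTERN = re.compile(r"kernel: (mpt[23]sas_cm\d+): SAS host is non-operational")


def count_hba_faults(lines):
    """Return a Counter mapping controller name to number of fault lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage (the path is an assumption; point it at your own syslog):
#   with open("/path/to/syslog.txt", errors="replace") as handle:
#       for controller, n in count_hba_faults(handle).items():
#           print(f"{controller}: {n} non-operational messages")
```

A non-zero count only confirms that the driver reported the controller as unresponsive. It does not say whether heat, seating, the slot, or something else caused it.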
November 5, 2023 Author Thinking my HBA was overheating, I installed a fan on it and will monitor it to see how it goes. Thanks
November 7, 2023 Author Yep, it ended up being the HBA overheating. I reapplied thermal paste and installed a fan on it, and both drives have rebuilt fine with no issues. Thanks mate.
February 29, 2024 I'm having a similar issue where a couple of drives get disabled every few days. I rebuild (which takes a couple of days), then everything is fine, till it isn't. Things I've seen in other posts and tried, none of which solved the problem:
Replace HBA with the specific make and model listed
Remove HBA heatsink, apply thermal paste, reinstall, add fan to heatsink
Replace HBA-to-SATA cables
Replace power cables
Reseat everything
Replace power supply
I'm at a loss and at my wit's end. Any help would be appreciated. Diagnostics attached. theark-diagnostics-20240229-0642.zip
February 29, 2024 Community Expert
49 minutes ago, disposable-alleviation3423 said: I'm having a similar issue where a couple of drives get disabled every few days.
HBA problem:
Feb 28 23:44:43 TheArk kernel: mpt3sas_cm0: SAS host is non-operational !!!!
Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot.
February 29, 2024 Thanks for the reply. The HBA has new thermal paste and a 40mm Noctua fan attached to the heatsink, and I left the case open with a box fan blowing on it to rule out a heat issue. I have reseated it several times. If the connection were the issue, would it work for several days and then fail? I've used 2 HBAs now with the same result. Both HBAs were tried in the same slot. I'll try the other slot tonight, but my GPU only fits in the slot it's in now, so if this works I'll have to figure something out. I've noticed these events always occur overnight. I put a UPS on the server to clean the power and prevent any blips. Is it possible that the card or the port has some sleep setting where it powers down due to inactivity? Would that explain why only 1 or 2 of the drives fail instead of all 7?
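The "always overnight" observation can be checked against the syslog rather than memory. This is a rough Python sketch, assuming the classic syslog timestamp prefix seen in the log lines quoted in this thread (for example `Feb 28 23:44:43`); `fault_hours` is a hypothetical helper name, not an existing tool.

```python
import re
from collections import Counter

# Classic syslog prefix, e.g. "Feb 28 23:44:43 TheArk ..."; capture the hour.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s")
FAULT = "SAS host is non-operational"


def fault_hours(lines):
    """Count HBA fault lines per hour of day (0-23)."""
    hours = Counter()
    for line in lines:
        if FAULT not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            hours[int(match.group(1))] += 1
    return hours
```

Because the kernel can repeat the message once per second during a single fault, the counts reflect log lines, not distinct incidents. The useful part is which hours appear at all.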
February 29, 2024 Community Expert
6 minutes ago, disposable-alleviation3423 said: Is it possible that the card or the port has some sleep setting where it powers down due to inactivity?
Seems unlikely, but I won't say it's not possible.
7 minutes ago, disposable-alleviation3423 said: Would that explain why only 1 or 2 of the drives fail instead of all 7?
Unraid only disables as many data disks as there are parity disks.
March 9, 2024 Been having the same issue. Different disks dropping out randomly, some days apart, others weeks apart. Have two disks disabled now. Found this thread. It's been driving me mental for the last 3-4 months. I moved the HBA up a slot to open up airflow and put an extra fan blowing directly toward it, hoping it was an overheating issue like others have faced. It was a little cramped in its former slot and pretty hot to the touch. Rebuilding both disks now.
March 9, 2024 Community Expert
12 minutes ago, mmagl said: Found this thread
This thread is marked solved. Probably better to start your own thread with your diagnostics.
March 9, 2024 Nah, I was just commenting on how it helped me realize the likely issue. I was appreciating the find.
March 13, 2024 Okay, sorry for the wait. Here is my report. Previously, I had a GPU (3070) in the PCIe x8 slot and the HBA in the PCIe x16 slot. To eliminate variables, I removed the GPU altogether, relocated the HBA to the x8 slot, and ran the server for a full week. Not a single drive was disabled and everything was great. Tonight, I reinstalled the GPU, changing nothing else. The server booted up and ran fine for an hour, then a drive was disabled. Either the GPU is the issue or the motherboard does not like having 2 cards installed. I checked the manual and it states that if cards are installed in both the x16 and x8 slots, both slots will run at x8. Diagnostics attached. Do I just have to buy another motherboard that will support 2 cards? Very confused. theark-diagnostics-20240312-1915.zip
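One way to see what the slots are actually doing with both cards installed is to compare the HBA's advertised and negotiated PCIe link width. The sketch below parses text in the format `lspci -vv` typically prints for one device (`LnkCap:` for capability, `LnkSta:` for current status). The helper name `link_widths` and the sample text in the usage note are assumptions for illustration, not output from the systems in this thread.

```python
import re

# Matches "LnkCap: ... Width x8" and "LnkSta: ... Width x4" on a single line.
# The colon right after Cap/Sta keeps LnkCap2/LnkSta2 lines from matching.
LNK = re.compile(r"Lnk(Cap|Sta):.*?Width x(\d+)")


def link_widths(lspci_text):
    """Return {'Cap': advertised_width, 'Sta': negotiated_width} for one device.

    Expects the `lspci -vv` text for a single device; only the first
    LnkCap and first LnkSta entries are kept.
    """
    widths = {}
    for kind, width in LNK.findall(lspci_text):
        widths.setdefault(kind, int(width))
    return widths
```

With illustrative text containing `LnkCap: Port #0, Speed 8GT/s, Width x8` and `LnkSta: Speed 8GT/s (ok), Width x4 (downgraded)`, the function returns `{'Cap': 8, 'Sta': 4}`. A narrower negotiated width would not by itself explain a controller going non-operational, but it would show whether adding the GPU changes how the HBA's slot is running.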
March 13, 2024 Community Expert
Mar 12 17:10:10 TheArk kernel: mpt3sas_cm0: SAS host is non-operational !!!!
Make sure the HBA is well seated and sufficiently cooled; you can also try a different PCIe slot.