rcmpayne Posted August 19, 2018 Hello All, I've been trying to troubleshoot an issue that started after I switched to a 1U HP server: random HDDs start reporting errors. If I leave it long enough, the drive gets disabled because it can't be written to. However, if I restart the server as soon as I start to see errors, they clear and I am good for a week or two.

I was using an internal LSI SAS controller with two SAS-to-SATA cables for the drives. I've since switched to an external LSI controller with new external SAS-to-SATA cables and still have the same issue. I thought the external HDD case was the issue, so I removed it from the picture and powered the drives directly from a separate PC power supply (with a jumper), but that doesn't help. (Note: my friend is doing this with zero issues.)

I read that memory might be an issue, but because I have dual CPUs I don't know if I can remove sticks (I have a lot of them). Thoughts? Can memory trigger these types of issues?

Before people start suggesting the external drive setup is the cause: I have four internal bays with direct onboard SATA cables, and these have issues too. Which drive starts throwing errors seems completely random. If you search my name, you will see a few other posts about issues rebuilding what I thought were bad drives, but I now know replacing drives with newer ones is not fixing anything either. When I get the email error alert, sometimes I see one drive with a single error, and sometimes when I check it's 5 of 7 drives with errors. Again, a reboot fixes it for some time.

I have attached a new diag report taken after a restart, but I can start grabbing them when I see errors again, before I reboot. Some of my old posts with errors also have diag reports; I assume they all show the same issue and not what I thought were bad drives.

server-diagnostics-20180819-0613.zip

Edited August 19, 2018 by rcmpayne
JorgeB Posted August 19, 2018 If I understand correctly, you already replaced the HBA and are using a different PSU with the drives. My next suspect would be the motherboard; I would consider it very unlikely that the CPU or RAM is causing these errors.
rcmpayne Posted August 19, 2018 9 hours ago, johnnie.black said: If I understand correctly you already replaced the HBA and are using a different PSU with the drives, my next suspect would be the motherboard, I would consider very unlikely CPU and RAM causing these. I have a Dell R415 1U I can try. Can I just move the LSI controller, the USB boot drive and the HDDs over and boot in the new system?
JorgeB Posted September 1, 2018 On 8/20/2018 at 12:22 AM, rcmpayne said: Can I just swap the lsi controller, usb boot drive + the hdds and boot in the new system? It should work.
rcmpayne Posted September 1, 2018 OK, I moved everything to the Dell server and started getting the errors again. A restart fixed it. Last night one drive went disabled. I'm trying to fix that now, but none of the xfs switches I've tried set it back to a green icon. It's disk7, which is a 2-month-old WD Red. server-diagnostics-20180901-0325.zip
JorgeB Posted September 1, 2018 The diags were taken after rebooting, so they're not much help. But if you already replaced the HBA and PSU, and the same thing now happens with a different server, it's very odd. SMART for the disabled disk looks fine.
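Because diagnostics taken after a reboot no longer contain the log from when the errors occurred, one option is to copy the live log somewhere persistent while the errors are still showing. The following is only a minimal sketch: the `snapshot_log` helper is hypothetical, and the example paths in the comment (`/var/log/syslog` as the live log, `/boot/logs` as a folder on the USB flash drive) are assumptions to adjust for your own system.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(log_path, dest_dir):
    """Copy log_path into dest_dir under a timestamped name; return the new path.

    Hypothetical helper: both paths are supplied by the caller.
    """
    src = Path(log_path)
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    # Timestamp the copy so repeated snapshots do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"{src.name}-{stamp}"
    # copyfile copies contents only, which avoids permission-metadata
    # problems on filesystems such as FAT.
    shutil.copyfile(src, target)
    return target


# Example call (assumed paths, run it while the errors are visible):
# snapshot_log("/var/log/syslog", "/boot/logs")
```

Running it before each reboot would leave a timestamped copy of the log that can be attached alongside the diagnostics.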
rcmpayne Posted September 1, 2018 Yeah, I am at a loss. New server, new LSI controller, new cables, new PSU, new RAM, new HDDs. So how can I get this disabled drive back? It's brand new.
JorgeB Posted September 1, 2018 You can rebuild or do a new config (the latter only if no new data was written to that disk since it got emulated). I would recommend a rebuild only if using a new disk: since the server is unstable, any issues during the rebuild might leave it worse off than it is now.
rcmpayne Posted September 1, 2018 If I do a new config, will it rebuild the missing data from that drive?
bonienl Posted September 1, 2018 A new config can NEVER rebuild a disk. See also the text in the GUI.
JorgeB Posted September 2, 2018 12 hours ago, rcmpayne said: If I do a new config, will it rebuild the missing data from that drive? No, it would re-enable the disabled disk as it was before it got disabled. Any data saved to the emulated disk, if there was any, would be lost.
rcmpayne Posted September 3, 2018 OK, I backed up the drive to a spare, then removed it and re-added it to the array. After it rebuilt, everything was back and matched the backup I took.