Random Disk Errors


Posted

Hi guys,

I have recently been having issues with my Unraid server. Every couple of weeks or so my parity drive becomes disabled. I have replaced the HDD and the SATA connectors, and even switched entirely to an HBA. For reference, I have:

AMD Ryzen 7 2700
MSI X570-A PRO Motherboard
NVIDIA GeForce GT 710 2GB
32GB RAM
LSI IBM SAS 9200-8i in IT Mode
8TB WD Red (whitelabel) Parity1
8TB Seagate Exos Parity2 - Not Installed
8TB Seagate Exos Data Drive
4TB Seagate Data Drive
2TB Dell Enterprise Data Drive
2TB Dell Enterprise Data Drive
120GB Kingston SSD Cache Drive
Rosewill RSV-L4312 (used for its hot-swap drive bays/backplane)

After replacing my parity drive the first time (swapping the Seagate Exos 8TB for a shucked WD Red 8TB white label, with new SATA cables), I ran tests on the Exos with Seagate's SeaTools and found no HDD errors; the drive is perfectly healthy. About two weeks went by, then the WD drive appeared disabled, showing read errors and CRC errors. At that point (today) I switched from the new SATA connectors to an HBA and installed the old Exos 8TB in the server as a second parity drive.

I brought the server online and it detected all the drives, so I removed the disabled parity drive from the array and rebooted. When it came back up, I reassigned the WD to Parity 1 and the Exos to Parity 2 and started the array. Once the GUI refreshed, it showed my 8TB data drive as missing. I panicked and stopped the array. I had a hunch that my power supply might be causing these issues, so I powered down, removed the Exos Parity 2 drive, and booted back up. Now it detects all the drives again. To be safe, I decided to start rebuilding the WD parity drive so that at least I'm protected by that, since right now nothing has valid parity and I'm vulnerable to another drive failure.

This server was originally built from spare parts I had lying around, so my power supply is a 650-watt Rosewill ARC-M650. I thought that should be enough to power all these devices, but I may be wrong.

What I'm really asking is whether this all makes sense, and which power supplies would be recommended to support all this and a total of 14 drives. My case can house 12 drives, plus I can have two extra SSDs for cache. The case also uses Molex to power the drives, so I would need something with 6 Molex connectors to power 12 drives. For a better understanding of the case, here is the link: https://www.newegg.com/out-surface-painting-black-only-rosewill-rsv-l4312/p/N82E16811147316?Item=N82E16811147316
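As a rough sanity check on the power question, here is a back-of-the-envelope sketch of the 12 V load from the drives alone. The per-drive figures (2 A at spin-up, 8 W steady) are placeholder assumptions, not measurements or datasheet values for these drives, and the names `DriveLoad`, `spinup_watts_12v`, and `steady_watts` are made up for illustration. It ignores the 5 V rail, the CPU, the GPU, and the HBA.

```python
from dataclasses import dataclass


@dataclass
class DriveLoad:
    count: int               # number of 3.5" drives in the case
    spinup_amps_12v: float   # ASSUMED peak 12 V current per drive at spin-up
    steady_watts: float      # ASSUMED steady-state draw per drive


def spinup_watts_12v(load: DriveLoad) -> float:
    """Worst-case 12 V draw if every drive spins up at the same moment."""
    return load.count * load.spinup_amps_12v * 12.0


def steady_watts(load: DriveLoad) -> float:
    """Combined steady-state draw once all drives are spinning."""
    return load.count * load.steady_watts


# Placeholder numbers for a fully populated 12-bay case.
hdds = DriveLoad(count=12, spinup_amps_12v=2.0, steady_watts=8.0)
print("spin-up 12 V load (W):", spinup_watts_12v(hdds))
print("steady-state load (W):", steady_watts(hdds))
```

The point of separating the two numbers is that the spin-up peak, not the steady-state draw, is the figure to compare against what the supply can deliver on 12 V. Substitute the values from each drive's datasheet before drawing any conclusion.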

After all of this, and after switching to the HBA, everything was fine for a couple of weeks again. BUT today I received errors that my main data disk is disabled after encountering errors. At this point I have no idea what is causing the issues, since they have now affected 3 different drives in different bays. Can someone help me troubleshoot this? I attached some screenshots and log files. Please let me know if there is anything else I can do.

2020-05-09 20_51_43-Tower_Main.png

diskloginfo.png

tower-diagnostics-20200509-2103.zip tower-syslog-20200510-0104.zip

Posted

I swapped the drive into a different bay and was able to rebuild the disk in Unraid with no problems. So far, recovering from these disabled disks has been fine, but I don't want this to keep happening roughly every two weeks. I read that it may help to clean all the contact points for power and my SAS breakout cables with isopropyl alcohol. I will do that this week and hope for the best.
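One way to judge whether the cleaning helps is to record each drive's CRC error counter before and after, since SMART attribute 199 (UDMA_CRC_Error_Count) generally points at the link between drive and controller rather than the disk surface. Below is a minimal sketch that pulls that raw value out of text in the table layout printed by `smartctl -A`. The function name `crc_error_count` and the `SAMPLE` text are made up for illustration; the sample is not taken from the attached diagnostics.

```python
from typing import Optional

# Made-up sample in the column layout that `smartctl -A` prints.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # A full attribute row has 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None  # raw value not a plain integer on this drive
    return None


print(crc_error_count(SAMPLE))
```

The raw counter normally only goes up, so what matters is whether it keeps increasing after the cables and contacts have been reseated or cleaned, not its absolute value.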
