
Brand new drives are failing left and right, not sure what is causing the issue


sloob


I've been plagued with drive failures for a while now. The only component I haven't changed yet is my case (it uses a SATA backplane), but before pulling the trigger on an expensive rackmount case I'd like to get some input and maybe some pointers. This isn't my first time trying my luck on the Unraid forums with this issue, but unfortunately I never got clear answers on the following points. So here goes:

 

Right now the server is shut off because 2 drives are disabled (I have 2 parity drives) and I don't trust the system to rebuild a drive without another one going bad.

 

If worse comes to worst and another drive fails because of write errors, can I force Unraid to trust it? I understand that write errors occurred, so some files are incomplete or corrupted, but most of the files on the drive are still good, no?

 

I felt pretty secure with 2 parity drives, but it sounds like if a SAS controller or backplane goes bad and causes write errors, every single drive in the system can fail within a few minutes or even seconds?

 

Thank you for reading this.

 

I am slowly going insane.

unraid-diagnostics-20221125-2105.zip

51 minutes ago, sloob said:

the only component I haven't changed yet is my case

Did you change out SATA power splitters and data cables?   Is it always the same drives?  Are these drives in the same slots?  

 

53 minutes ago, sloob said:

Right now the server is shut off because 2 drives are disabled

I had a look at the SMART folder and noticed that you only had one disk disabled (disk 5) at the time when you captured the Diagnostics.  Did a second disk fail after you did that?  (There is another disk (sdj) that appears to be unassigned.  Is that the missing parity 1 disk?)

 

I also had a quick look at the SMART data for that disk and it looks fine.  I suspect that you have another problem.  

 

The PSU can sometimes be a source of hardware-related issues.  What is the wattage of that unit and does it have a single 12V rail?

 

There have been backplane failures in the past.     Have you looked up your case and looked at the reviews to see what other users have reported?
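As a side note to the SMART check mentioned above: on SATA drives, the raw value of attribute 199 (UDMA_CRC_Error_Count) tends to climb when there are cabling or backplane signal problems, even while the disk itself looks healthy. Below is a minimal Python sketch that pulls that counter out of the text printed by `smartctl -A`. The helper names and the sample text are hypothetical and only for illustration; real output varies by drive model, and SAS drives report differently.

```python
# Minimal sketch: extract raw SMART attribute values from `smartctl -A` text.
# SAMPLE is made-up illustrative output, not taken from the diagnostics above.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                # Some raw values are not plain integers; skip those rows.
                continue
    return attrs


def crc_errors(text):
    """Return the raw UDMA_CRC_Error_Count, or None if it is not reported."""
    return parse_smart_attributes(text).get("UDMA_CRC_Error_Count")


if __name__ == "__main__":
    print("CRC errors:", crc_errors(SAMPLE))
```

On a live system the text would come from running `smartctl -A /dev/sdX` for each drive. A counter that keeps increasing between checks points toward the connection path rather than the drive.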


Hi! Thanks for the answer,

 

Did you change out SATA power splitters and data cables?   Is it always the same drives?  Are these drives in the same slots?

 

Yes, I changed the power supply, SATA cables, SAS controller, and drives, and it's not always the same slot (although recently it's been happening more with disk 5).

 

 

I had a look at the SMART folder and noticed that you only had one disk disabled (disk 5) at the time when you captured the Diagnostics.  Did a second disk fail after you did that?  (There is another disk (sdj) that appears to be unassigned.  Is that the missing parity 1 disk?)

 

I'm not sure why the logs don't show it, but here is what happened. Yesterday I upgraded one of my drives from 2TB to 4TB. During the upgrade, disk 5 failed (write error). I let Unraid finish the data rebuild on the upgraded disk, and then today I started my usual procedure for when a drive fails due to a write error: stop the array, assign the failed disk to "none", start the array, stop the array, assign the now-empty slot to the old drive, and restart the array. Unraid then rebuilds the drive from parity. Except this time disk 5 (the failed drive) failed again almost immediately while rebuilding. A few minutes later one of my parity drives also showed the red X, so I immediately took a diagnostic and stopped the system before a 3rd drive failed.

 

What is the wattage of that unit and does it have a single 12V rail?   

I believe it's a 650W hot swappable redundant server power supply. I'm not sure if it's a single 12V rail or not.

 

Quote

There have been backplane failures in the past.     Have you looked up your case and looked at the reviews to see what other users have reported?

It's an old Chenbro RM31408, and I couldn't find anyone else having this issue.

2 hours ago, JorgeB said:

Disk5 was already disabled in the diags and it looks healthy. Parity dropped offline, and because of that there's no SMART, but this is usually a power/connection problem. If you've already changed everything else, the backplane is a good possibility.

I understand. I think I will buy another case then. If another drive fails while rebuilding after that, can I force Unraid to trust the drive that is mostly good but has a few read errors, like I explained above, or is there absolutely no other way and I will lose that drive?
