Disk turns bad during parity check


Go to solution Solved by mfarlow,

Okay, I finally surrender.  I have an ongoing issue where every time I run a parity check one of the disks goes bad (red X).  This has been going on for close to a year now.  I run my parity check at the beginning of each month, and each month the parity check starts, runs for a short time, then the array goes bad and I am left with a bad disk.  I should mention this started happening after I added my 10th data drive to the array.  I should also mention that it is not the same disk going bad each time.  One month it is disk2, the next month it might be disk8 or disk6.  When I run a SMART check on the disk, it comes back with no errors.

 

I am currently running Unraid 6.12.6

 

Right now I am sitting with the array stopped so the issue doesn't progress any further.  In the past I would perform a new config and reassign the drives, and the array would run for a short time before the disk went bad and became unmountable.  At that point I would run the built-in File System Check tool, which would tell me that something was corrupted and try to rebuild it. (Sorry, I forget what gets corrupted in the file system, probably one of the nodes.)  Then I would pull the drive, replace it with a new one, and everything would work again until the next month.

 

Initially I thought it might have been the drives, and just kept replacing them with new drives.  But then it started happening with the new drives as well.  So I turned my attention to the HBA.  My parity and cache drives plug directly into the motherboard SATA ports.  My HBA supports 8 data drives (the other 2 are on the mobo).  I figured maybe the HBA was struggling, so I replaced it.  I was able to get 1 parity check out of it before the issue returned.  Next I decided to run 2 HBAs and split the load across them, 4 drives each.  Again the issue returned.  At this point I thought maybe the temps in the case were too high (they were), so I added cooling fans on top of the HBAs, which drastically reduced the temps.  The issue still occurred.

 

At this point I thought perhaps it was a power draw issue.  My data drives are connected to SilverStone CP06-E4 power splitters, with 4 drives on each splitter.  All together I am running 3 splitters.  I decided to add 2 more splitters for the data drives so there are only 2 drives per splitter.  I also added another SATA power cable to my power supply so that I have more power connections to spread around.  Again, none of these changes seemed to help.

 

I am not worried too much about data loss as I have backups, but it is getting to be a PITA having to restore backups every single month.  We're talking about 10-14 TB of data to restore every month.  

 

So at this point I am tired of banging my head against the wall and was hoping someone from this forum might have a suggestion or idea that I can try.  

 

 

tower-syslog-20240203-1851.zip tower-diagnostics-20240203-1350.zip

The disk looks fine and this isn't logged as a disk problem, so it's most likely a power or connection issue.  Since it's happening to multiple disks, power would be a good place to start: see if you can test with a different PSU, and make sure no power splitters are in use, or at least keep them to an acceptable number.
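
As a rough way to sanity-check the splitter side, here is a minimal sketch of the arithmetic. Both figures in it are assumptions for illustration, not measurements from this server: a 12 V peak draw of about 2 A per 3.5" drive, and about 4.5 A available on the 12 V pins of the single SATA power plug feeding each splitter. The function name `splitter_load` is made up for the example. Check the drive datasheets and the PSU cable ratings for real numbers.

```python
# Rough 12 V load estimate for drives sharing one SATA power splitter.
# Both constants are ASSUMPTIONS for illustration, not measured values;
# check the drive datasheet and the PSU/cable ratings for real numbers.
ASSUMED_PEAK_AMPS_PER_DRIVE = 2.0   # assumed 12 V peak draw of one 3.5" drive
ASSUMED_CONNECTOR_LIMIT_AMPS = 4.5  # assumed 12 V limit of the plug feeding the splitter


def splitter_load(drives,
                  amps_per_drive=ASSUMED_PEAK_AMPS_PER_DRIVE,
                  limit=ASSUMED_CONNECTOR_LIMIT_AMPS):
    """Return (total 12 V amps, True if the total is within the assumed limit)."""
    total = drives * amps_per_drive
    return total, total <= limit


# Compare 4 drives per splitter with 2 drives per splitter.
for drives in (4, 2):
    total, ok = splitter_load(drives)
    status = "within" if ok else "over"
    print(f"{drives} drives per splitter: {total:.1f} A, "
          f"{status} the assumed {ASSUMED_CONNECTOR_LIMIT_AMPS} A limit")
```

With these assumed figures, 4 drives on one splitter exceed the limit while 2 drives stay under it. Real drives draw less than their peak once spun up, so treat this as a worst-case estimate only.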

I currently have 2 SilverStone splitters, each running 4 drives (the type with the capacitors).  I have actually replaced those already, just to be safe.  I'm going to try rewiring so I only have 2 drives per splitter; maybe that will help.

 

In the meantime I will try to acquire another power supply and see if that helps.  Do you think my current power supply might be underpowered?

 

I appreciate all the help!

  • 1 month later...

I was finally able to replace my power supply.  I ordered a 650W EVGA that was on the A-Tier of the PSU list.  The first one took 2 weeks to arrive from Amazon.  Then, due to work, I was unable to rewire my Unraid server for a while.  I finally got the replacement PSU in, but one of the SATA ports on the PSU was bad, which limited me to powering only 4 drives, so off I went to order another replacement.  It turns out they had sent me a returned PSU.  The 2nd replacement arrived and I was able to quickly get it installed.  I was able to start the array, perform a new config to get rid of the red X on my "bad" drive, and run a full parity check, which took a couple of days.

 

The parity check completed, but oddly I received an error message that there were x number of errors (it was a lot).  But after running a SMART report I found no errors with any of the drives.  So far the drives have not turned bad.  I assume the error I saw very briefly had to do with the parity drive being in an error state during the parity check.

I think it is too early to say for certain it was the power supply, but for now it is working fine.  I plan on running another parity check in a week or so to see if the issue returns.  For now, I am backing up the data just in case it happens again.  I want a full backup before I run the parity check.

 

I wanted to thank everyone who chimed in on this.  I was getting pretty frustrated with Unraid, and was considering switching to another storage solution.  Glad I didn't have to switch. :)
