
Disk in error state - looking for some much needed guidance


Hi all - first off, here is the overview of my server:

Unraid version: 6.9.2 (trial)

Plugins: CA, FCP, My Servers, Nerd Tools, Tips and Tweaks, Unassigned Devices, Dynamix System Buttons

No dockers or VMs

Hardware: ASUS P8B-E/4L mobo, Xeon CPU E3-1220 V2 @ 3.10GHz, 16 GB ECC RAM, 6 total disks all connected through onboard Intel C204 chipset controller

Array:

Parity: (1) WD60EFZX - 6TB

Data disks: (4) WD30EFRX - 3TB

Cache: (1) Samsung 870 EVO 500GB

Background: I originally set this hardware up about 9 years ago with Win Server 2012 Essentials and it pretty much ran fine until recently.  The OS drive (Samsung 840 Pro) seemed to have a disk error and became unbootable.  I restored the backup to a new 870 EVO, but then started having a problem with one of the disks in the storage pool (a WD30EFRX).  Since I knew I was overdue to move to a new OS (that wasn't 10 years old!), I decided to scrap the whole thing and try Unraid instead of another version of Win Server.

I set the Unraid server up last week with a new WD60EFZX (6TB) as the parity drive and the 870 EVO (basically brand new) as the cache drive. I used 4 WD30EFRX's for the data disks. During the first parity check, one of my WD30EFRX's had read errors. I checked all the cables and replaced it with another (I have 6 total) - not the one that had the errors previously in the Win Server setup. This time around the parity check was all good, so I started up the array and began restoring all my data.

Problem: After restoring all my data, things seemed to be running well. Then I started setting up backups to iDrive using their Linux scripts. I got it pretty much figured out and ran two selective backups (as tests) successfully. On the third one, it hung and then I lost connectivity to the WebGUI. I logged into the console directly but could not get the server to shut down cleanly. I read a lot of pages explaining how to do this, but apparently I didn't get all the commands to run successfully before rebooting.

After it rebooted, a parity check immediately started.  During that check, Disk 3 started showing a ton of read errors.  When it got to something like 48k read errors, Unraid disabled the disk and the rest of the parity check finished with no other errors (all other disks show healthy).  I stopped the array, downloaded diagnostics, then shut down the server and checked all cables.  I restarted the server and tried to run an extended SMART test on disk 3.  The test hardly started before coming back with errors and stopping.

Unfortunately I did not grab diagnostics before the unclean restart, but I attached the diagnostics from after the parity check along with the SMART test report for Disk 3.

Questions:

1) Based on the attached reports, is disk 3 definitely toast? (I assume it is)

2) Assuming yes to 1, then is this the proper procedure to follow to replace it: https://wiki.unraid.net/Manual/Storage_Management#Replacing_failed.2Fdisabled_disk.28s.29

3) Should I assume the other old WD30EFRX's are likely to fail soon also?  Essentially, I have lost 3 of 6 so far.  I am running extended SMART tests on them currently.

4) My plan is to add a second parity disk (another WD60EFZX 6TB), a second cache disk (another 870 EVO 500GB) and replace Disk 3 with a new WD60EFZX 6TB.  But I am wondering if the other 3 WD30EFRX's are ticking time bombs and I should just decommission and replace all of them?
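To sanity-check the sizing in question 4, here is a minimal Python sketch of the arithmetic. It assumes the usual Unraid rules, which are not stated in this thread: parity disks hold no user data (so usable array space is the sum of the data disks), and every parity disk must be at least as large as the largest data disk. The cache SSDs sit outside the parity-protected array and are not counted. The function name check_plan is hypothetical.

```python
from typing import List


def check_plan(parity_tb: List[float], data_tb: List[float]) -> float:
    """Return usable array capacity (TB) for a planned layout.

    Assumed rules (not taken from this thread): parity disks store no user
    data, and every parity disk must be at least as large as the largest
    data disk. Cache/pool SSDs are not part of this calculation.
    """
    largest_data = max(data_tb, default=0)
    for size in parity_tb:
        if size < largest_data:
            raise ValueError(
                f"{size} TB parity disk is smaller than the "
                f"{largest_data} TB data disk"
            )
    # Only data disks contribute usable space.
    return sum(data_tb)


if __name__ == "__main__":
    # Planned layout from question 4: two 6 TB parity disks, three 3 TB
    # data disks, and a 6 TB replacement for Disk 3.
    print(check_plan([6, 6], [3, 3, 3, 6]))
```

With that layout the sketch reports 15 TB of usable space, and the 6 TB replacement for Disk 3 is allowed because it is no larger than either parity disk.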


That's all I can think of for now. I really appreciate any help and insights. I am new to Unraid and currently drinking from a fire hose of manuals, guides, etc., so I'm still very green at this point!
Attachments: bnt-unraid-smart-20220223-0909.zip, bnt-unraid-diagnostics-20220223-0908.zip

  • Community Expert

It's logged as a disk issue and the SMART test failed, so yes, the disk needs to be replaced.

  • Community Expert
BlkNTan said:

Assuming yes to 1, then is this the proper procedure to follow to replace it: https://wiki.unraid.net/Manual/Storage_Management#Replacing_failed.2Fdisabled_disk.28s.29

Yes.

BlkNTan said:

Should I assume the other old WD30EFRX's are likely to fail soon also?  Essentially, I have lost 3 of 6 so far.  I am running extended SMART tests on them currently.

Wait for the extended test results, but if they just did a parity check they should be OK for now.

  • Author

Jorge - thank you so much for the quick response!!

Quick follow-up question: is there anything specific in the SMART test report that I should be focusing on, or is it just the fact that the test couldn't finish and had errors? In other words, assuming the other 3 disks pass their extended tests, is there anything I should look for in those results that might tell me a drive is OK now but likely to have trouble soon?

Thanks again!

  • Community Expert

With WD disks it's good practice to monitor these attributes:

  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    997
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0

A non-zero value is never a good sign, especially if it keeps climbing. These values are from the failed disk; on the other disks both attributes are still at 0, so they should be good for now.
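For anyone who wants to automate this check, here is a minimal Python sketch. It assumes the attribute rows are in the brief smartctl layout quoted above (ID, name, flags, value, worst, threshold, fail, raw value); rows in another layout, or with a non-numeric raw value, are simply skipped. The function names are hypothetical. It only flags attributes 1 and 200 when their raw value is above zero; it does not track whether the value keeps climbing between runs.

```python
from typing import Dict, List

# Attributes recommended in this thread for WD disks.
WATCHED = {1: "Raw_Read_Error_Rate", 200: "Multi_Zone_Error_Rate"}


def parse_raw_values(text: str) -> Dict[int, int]:
    """Map attribute ID -> raw value from brief-format smartctl rows.

    Assumed row layout: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE.
    Header lines and rows with a non-numeric raw value are skipped.
    """
    values: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        if fields[7].isdigit():
            values[int(fields[0])] = int(fields[7])
    return values


def watched_warnings(text: str) -> List[str]:
    """Describe each watched attribute whose raw value is non-zero."""
    values = parse_raw_values(text)
    return [
        f"{attr_id} {name}: raw value {values[attr_id]}"
        for attr_id, name in WATCHED.items()
        if values.get(attr_id, 0) > 0
    ]


if __name__ == "__main__":
    # The two rows quoted above, from the failed disk.
    sample = (
        "  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    997\n"
        "200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0\n"
    )
    print(watched_warnings(sample))
```

Run against the two rows from the failed disk, it flags only attribute 1.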


  • Community Expert

Passing the extended tests is a MUST for assuming a disk is healthy.

Other signs of potential problems, even if the extended tests complete successfully, are the reallocated sector count starting to increase or the pending sector count being non-zero.
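These rules of thumb can be written down as a small Python sketch. It assumes you have already collected raw SMART values into dictionaries keyed by attribute ID from two runs taken some time apart; the function name and input format are hypothetical. It uses the standard IDs 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector), and it flags reallocations only when the count grows, not merely when it is non-zero.

```python
from typing import Dict, List

REALLOCATED_SECTOR_CT = 5
CURRENT_PENDING_SECTOR = 197


def health_concerns(
    extended_test_passed: bool,
    before: Dict[int, int],
    after: Dict[int, int],
) -> List[str]:
    """List reasons to distrust a disk, following the advice above.

    before/after map SMART attribute ID -> raw value from two snapshots
    taken some time apart (a hypothetical input format). Missing IDs are
    treated as 0.
    """
    concerns: List[str] = []
    if not extended_test_passed:
        concerns.append("extended SMART test did not complete successfully")
    old = before.get(REALLOCATED_SECTOR_CT, 0)
    new = after.get(REALLOCATED_SECTOR_CT, 0)
    if new > old:
        concerns.append(f"reallocated sectors increased from {old} to {new}")
    pending = after.get(CURRENT_PENDING_SECTOR, 0)
    if pending > 0:
        concerns.append(f"{pending} pending sector(s)")
    return concerns


if __name__ == "__main__":
    # Hypothetical readings: a stable reallocated count is not flagged.
    print(health_concerns(True, {5: 8, 197: 0}, {5: 8, 197: 0}))
```

An empty list means none of the warning signs mentioned here were seen; it is not proof that the disk is healthy.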

  • Author

Thanks both for the replies! I did add attributes 1 and 200 to SMART monitoring based on another post - thanks for confirming.

As an update, disk 4 passed the extended test. Disk 1 is in process.

Reallocated and pending sectors are attribute numbers 5 and 197 - correct?

Thanks again for all the help!!

  • Author

Update: the other 3 disks passed their extended tests. I added a replacement disk and the data rebuild finished with zero errors - yay!

Next up is adding a second parity disk and a second cache disk.

In case anyone else reads this and is looking to learn more about SMART attributes, etc., I found these two articles helpful:

https://wiki.unraid.net/Understanding_SMART_Reports

https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/

Thanks again Jorge and itimpi!
