
Verify Failing Drive, added Smart Report

Last night my server ran its monthly full array check. Three hours into the check, the error log started filling with errors related to one drive.

 

Jan  1 03:00:13 Tower kernel: sd 3:0:3:0: [sdi] Unhandled error code (Errors)
Jan  1 03:00:13 Tower kernel: sd 3:0:3:0: [sdi]  (Drive related)
Jan  1 03:00:13 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00 (System)
Jan  1 03:00:13 Tower kernel: sd 3:0:3:0: [sdi] CDB:  (Drive related)
Jan  1 03:00:13 Tower kernel: cdb[0]=0x2a: 2a 00 56 be 44 a7 00 00 08 00
Jan  1 03:00:13 Tower kernel: sd 3:0:3:0: [sdi] Unhandled error code (Errors)
Jan  1 03:00:13 Tower kernel: sd 3:0:3:0: [sdi]  (Drive related)
Jan  1 03:00:13 Tower kernel: Result: hostbyte=0x04 driverbyte=0x00 (System)
Jan  1 03:00:13 Tower kernel: sd 3:0:3:0: [sdi] CDB:  (Drive related)
Jan  1 03:00:13 Tower kernel: cdb[0]=0x2a: 2a 00 56 be 44 9f 00 00 08 00
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310784 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310776 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310768 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310760 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310752 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310744 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310736 (Errors)
Jan  1 03:00:13 Tower kernel: md: disk4 write error, sector=1455310728 (Errors)
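For readers trying to interpret log lines like these, here is a minimal Python sketch that decodes the 10-byte CDB and collects the md write-error sectors. It assumes the standard SCSI WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and the Linux host-byte numbering in which 0x04 is DID_BAD_TARGET. The function names are hypothetical and not part of any Unraid tool.

```python
import re

# Subset of Linux SCSI host byte codes; 0x04 is the value seen in the log above.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
}

def decode_cdb10(hex_bytes):
    """Decode a 10-byte READ(10)/WRITE(10) style CDB given as hex text."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "opcode": raw[0],                          # 0x2a is WRITE(10)
        "lba": int.from_bytes(raw[2:6], "big"),    # starting logical block
        "blocks": int.from_bytes(raw[7:9], "big"), # number of blocks to transfer
    }

def md_error_sectors(lines):
    """Group sectors from 'md: diskN write error, sector=S' lines by disk."""
    pattern = re.compile(r"md: (disk\d+) write error, sector=(\d+)")
    result = {}
    for line in lines:
        match = pattern.search(line)
        if match:
            result.setdefault(match.group(1), []).append(int(match.group(2)))
    return result

if __name__ == "__main__":
    print(decode_cdb10("2a 00 56 be 44 a7 00 00 08 00"))
    print(HOST_BYTES[0x04])
```

Decoded this way, both failing commands are 8-block writes, which fits the md layer reporting write errors on disk4 rather than read errors.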

 

When I try to run HDParm info against the drive, I get no information. The Smart Status Report gives me the following:

Smartctl: Device Read Identity Failed: Input/output error

 

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
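The hint in that message refers to smartctl's `-T permissive` tolerance option, which may be given more than once. A minimal sketch of building such a command in Python follows. The device path `/dev/sdi` is taken from the syslog above and is an assumption (device letters can change after a reboot), and `smartctl_command` is a hypothetical helper. If the drive has dropped off the bus entirely, this may still return nothing useful.

```python
import shlex
from typing import List

def smartctl_command(device: str, permissive: int = 1) -> List[str]:
    """Build a smartctl invocation that tolerates failed mandatory commands."""
    cmd = ["smartctl", "-a"]
    # Each repetition of '-T permissive' lets smartctl ignore one more failure.
    for _ in range(permissive):
        cmd += ["-T", "permissive"]
    cmd.append(device)
    return cmd

if __name__ == "__main__":
    # Print the command line instead of running it, so it can be reviewed first.
    print(shlex.join(smartctl_command("/dev/sdi")))
```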

 

So is disk4 dead/dying? What are my next steps in verifying the failure and fixing it?

 

Is this the correct replacement procedure?

1. Buy new drive

2. Preclear / test new drive on another machine. Stop array and shutdown.

3. Replace drive, bring up system

4. Add new drive into disk 4's slot

5. Rebuild array while crossing fingers

 

On a related note, right now my drives are all 2 TB or smaller. Can I replace the failing drive with a larger drive and then also replace the parity with a larger drive? Or am I stuck since I need the parity drive to be same/larger to rebuild the array?

syslog-2016-01-01.zip

No, you cannot add a larger data disk, as it violates the requirement that the parity disk be as large as or larger than the biggest data disk.
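The size rule can be stated as a one-line check. This is only an illustrative sketch: the function name is hypothetical, and the 4 TB replacement size is an assumed example, not something from the thread.

```python
def parity_is_large_enough(parity_tb, data_tb):
    """Return True if the parity disk is at least as large as every data disk."""
    return parity_tb >= max(data_tb)

if __name__ == "__main__":
    # Current layout from the thread: all drives 2 TB or smaller.
    print(parity_is_large_enough(2, [2, 2, 1.5, 2]))
    # Swapping one data disk for a hypothetical 4 TB drive breaks the rule.
    print(parity_is_large_enough(2, [2, 2, 1.5, 4]))
```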

 

However, there is a special procedure known as swap/disable which replaces the parity disk with a larger parity disk and then rebuilds the failed data disk onto what was previously the old parity disk.

 

Read the instructions carefully as they must be correctly followed to ensure that the process is successful.

 

Once the rebuild of the failed data drive is complete, you can then replace the rebuilt data drive with a larger disk.

  • Author

So it sounds like my best bet would be to order a new drive the same size, as I don't want to add complications.

 

Will my array work normally (but degraded) until I can replace the failed disk 4? I know normal raid5 would still function in this scenario but in a degraded state as it is using the parity to calculate what should be on disk 4 whenever I access information. Does unraid follow the same principle?

However, you may wish to wait for advice from people who are experienced with handling a data disk failure during a parity check, as I am not sure whether the failure might have marked the parity disk as invalid, in which case you may need to perform an additional step first.

Yes, the failed disk will be emulated by all other disks + parity, performance will be degraded.
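As a rough illustration of how emulation works with single parity, the sketch below assumes plain XOR parity across the data disks. The byte values are made up, and real arrays operate on whole sectors rather than two-byte blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

if __name__ == "__main__":
    # Three hypothetical data disks, two bytes each.
    data = [b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
    parity = xor_blocks(data)

    # Pretend disk index 1 has failed: rebuild it from the survivors plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

Every read from the emulated disk has to touch all the other disks, which is why performance drops and why a second failure during this window is unrecoverable with single parity.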

I would strongly suggest rebooting the system to see if the problem drive now comes online so that a SMART report can be obtained.

 

More often than not, when a disk starts throwing errors it is caused by an external factor (such as a loose cable) and the drive itself is fine.  If that is the case and the disk that was reporting errors is actually fine, then there may be valid alternative approaches going forward.  For example:

  • Stop the array and back up the contents of the USB drive somewhere (e.g. your PC)
  • Do a New Config assigning all the current data disks, and assign a new larger parity disk.  In this case you should select for the parity disk the new maximum disk size you intend to use going forward as the parity drive can never be smaller than the data drives.
  • Start the array to build new parity from the current set of data disks.  Keep the old parity disk intact in case anything goes wrong and you need to revert to the current configuration.  If at the end everything goes well it can be added to the array as an additional data disk if so desired

  • Author

I will try rebooting the system later today to see if the drive comes back online so I can get the Smart report from it. Would it be better to try running diagnostics from the server using the disk management tools or pulling the drive to another machine to run the diagnostics?

 

Edit: I also ordered a new 2TB drive as a direct replacement; it should be here on Sunday.

The SMART reports go with the drive, so it is up to you whether you obtain them on the server or plug it elsewhere to get the report.  Do whatever is the most convenient.

  • Author

So I rebooted the server, and afterwards the drive still shows red status. I was able to run HDParm info, the Smart status report, and a short SMART test.

 

ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           85
  3 Spin_Up_Time            0x0027 164   161   021    Pre-fail Always  -           6800
  4 Start_Stop_Count        0x0032 096   096   000    Old_age  Always  -           4352
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 042   042   000    Old_age  Always  -           42720
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           69
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           37
193 Load_Cycle_Count        0x0032 159   159   000    Old_age  Always  -           125327
194 Temperature_Celsius     0x0022 123   107   000    Old_age  Always  -           27
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           2
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   199   000    Old_age  Offline -           80

Full report is attached.

 

So on one hand, I don't see anything major in the Smart report, but it is odd that the drive still shows red status and that it does not show the temp.

 

I have a drive that should be delivered tomorrow, but it will take 2-3 days to preclear/check the disk. What should I do in the meantime? I have not written any new data to the server since the error. Is it okay to start the array? And if so, should I change disk 4 to no device in the device status?

HDPArm_Info.txt

Smart_Status.txt

syslog-2016-01-02.zip

  • Author

So I am guessing the 2 pending sectors do mean the drive is failing and it should probably be replaced, correct?

 

So I can just start the array and Unraid will ignore the drive for now and I can run the array in a degraded state for the next 2-3 days until I add the new precleared drive? The array is only in a home, so I will not write any new data, and can keep the reading from the "emulated" drive to a minimum.

 

Or would it be crucial to turn off the server until I add the new precleared drive and can rebuild?

 

Edit: I found the Current_pending_sector in: http://lime-technology.com/wiki/index.php/Troubleshooting#What_do_I_do_if_I_get_a_red_X_next_to_a_hard_disk.3F

and it does sound pretty bad.

"An equally important attribute is the "Current_Pending_Sector", the RAW_VALUE is a count of suspect sectors pending reallocation. It should ALWAYS be zero and must be zero if the drive is to be used to reconstruct another. If it's not zero, then you will probably (but not always) see the Reallocated Sector Count increase in the future, when this does return to zero. Before remapping a suspect sector, it tests it one last time, and *may* pass it and not remap it. (There are good reasons why it is designed to work this way.)"

I think all of this is answered at the link I gave if you just keep reading, including pending sectors. I wouldn't necessarily give up on that drive.
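Since the wiki passage quoted earlier says Current_Pending_Sector should always be zero, a small script can screen a saved report for it. The sketch below assumes the ten-column attribute layout shown in the report posted in this thread. The watch list and function names are hypothetical choices, not an official health test.

```python
# Attributes whose raw value should normally be zero (an assumed watch list).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def raw_values(report):
    """Map attribute name to integer RAW_VALUE for each attribute row."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return values

def flagged(report):
    """Return watched attributes whose raw value is non-zero."""
    values = raw_values(report)
    return {name: values[name] for name in WATCHED if values.get(name, 0) != 0}

if __name__ == "__main__":
    sample = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 200 200 000 Old_age  Always  - 2
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 0
"""
    print(flagged(sample))
```

Run against the report above, only Current_Pending_Sector is flagged, with a raw value of 2.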
  • Author

Yeah I saw the section in the wiki of:

 

Resolving a Pending Sector

Pending sectors occur as a result of a read failures. An unreadable sector will interfere with the reconstruction of a failed drive. Pending sectors need to be cleared as soon as possible because 2 drives with unreadable sectors will most likely be unrecoverable within unRAID. Data disks with a small number of pending sectors should be fairly easy to recover with utilities in Linux or Windows and If anyone knows of a Mac utility that can recover Reiserfs please update this entry.

 

The safest procedure is to replace the drive with a pre-cleared spare. The original drive can then be pre-cleared and the pending sector count should go to zero. The original drive can then be used as a spare. Multiple pre-clear cycles should not be required and the disk should be RMAed if 1 cycle doesn't work. If the drive cannot be returned then multiple cycles may restore the drive to a usable state.

 

If no spare is available then follow the next procedure to re-enable the drive. The pending sector count should be zero after rebuilding. If not then replace.

 

So I think I might replace the drive with a precleared drive and then run the preclear on the old drive to see if the drive is actually failing. Given the relatively cheap cost of drives, though, I don't know if I would ever really trust that drive again. 80 bucks for the new drive is probably worth the peace of mind. This whole thing is really making me question how to do better backups of my Unraid system. Alas, I don't think there really is a very good backup for an Unraid system besides a 2nd system.

Sounds reasonable and probably what I would do too since I can afford it.

 

As for backups, some do have a 2nd system, others just choose what is important and back that up. unRAID parity is not a backup in any sense. The only thing that counts as a backup of a file is another copy of the file. It makes sense to have priorities if you aren't able/willing to have another copy of everything. 1st priority is anything that is truly irreplaceable, like personal documents, photos, videos, anything like that which you have created and so has no other source.

Archived

This topic is now archived and is closed to further replies.
