Main page showing errors

Hey guys,

So I need some help. I set up unRAID almost a year ago, and since then everything has been running smoothly and I haven't had to touch it at all.

 

Today I logged in for my monthly parity check and noticed that on the main page my parity drive is showing

1,666 errors!

 

Actually, I believe that number was closer to 900 a few days ago, and I ran a parity check. The parity check shows "0" errors and the drive still has a green ball next to it, but it is showing 1,666 errors in the row next to the parity drive.

 

Can someone help me out with this? I guess this is the downside of unRAID (the lack of being intuitive and user friendly). So now I'm concerned.

 

Would appreciate your help

Version 5.0.3

 

 

See SYSTEM LOG in my sig. Also, paste a SMART report.

  • Author

Syslog for a couple of days attached

Syslog_copy.doc

  • Author

Smart report

Model Family:    Seagate Desktop HDD.15

Device Model:    ST4000DM000-1F2168

Serial Number:    Z300ASTX

LU WWN Device Id: 5 000c50 050381f70

Firmware Version: CC51

User Capacity:    4,000,787,030,016 bytes [4.00 TB]

Sector Sizes:    512 bytes logical, 4096 bytes physical

Rotation Rate:    5900 rpm

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  ATA8-ACS T13/1699-D revision 4

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)

Local Time is:    Mon Apr  7 17:04:28 2014 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

was never started.

Auto Offline Data Collection: Disabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (  623) seconds.

Offline data collection

capabilities: (0x73) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

No Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  1) minutes.

Extended self-test routine

recommended polling time: ( 553) minutes.

Conveyance self-test routine

recommended polling time: (  2) minutes.

SCT capabilities:       (0x1085) SCT Status supported.

 

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       218486096
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       563
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       8043421
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7361
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   074   074   000    Old_age   Always       -       26
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   093   093   000    Old_age   Always       -       7
190 Airflow_Temperature_Cel 0x0022   074   060   045    Old_age   Always       -       26 (Min/Max 7/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2474
194 Temperature_Celsius     0x0022   026   040   000    Old_age   Always       -       26 (0 7 0 0 0)
197 Current_Pending_Sector  0x0012   100   098   000    Old_age   Always       -       72
198 Offline_Uncorrectable   0x0010   100   098   000    Old_age   Offline      -       72
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       464h+08m+22.617s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       29559569616
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       117867593215

 

SMART Error Log Version: 1

ATA Error Count: 26 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

 

Error 26 occurred at disk power-on lifetime: 7282 hours (303 days + 10 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 00 ff ff ff ef 00  43d+04:21:26.618  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:23.234  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:23.169  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:21.691  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:21.627  READ DMA EXT

 

Error 25 occurred at disk power-on lifetime: 7282 hours (303 days + 10 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 00 ff ff ff ef 00  43d+04:21:05.694  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:05.656  READ DMA EXT

  35 00 a8 ff ff ff ef 00  43d+04:21:05.014  WRITE DMA EXT

  25 00 58 ff ff ff ef 00  43d+04:21:04.933  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:04.817  READ DMA EXT

 

Error 24 occurred at disk power-on lifetime: 7282 hours (303 days + 10 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 00 ff ff ff ef 00  43d+04:21:00.683  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:21:00.613  READ DMA EXT

  35 00 38 ff ff ff ef 00  43d+04:21:00.292  WRITE DMA EXT

  25 00 c8 ff ff ff ef 00  43d+04:21:00.181  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:20:56.331  READ DMA EXT

 

Error 23 occurred at disk power-on lifetime: 7282 hours (303 days + 10 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 00 ff ff ff ef 00  43d+04:20:52.589  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:20:52.531  READ DMA EXT

  35 00 68 ff ff ff ef 00  43d+04:20:51.589  WRITE DMA EXT

  25 00 98 ff ff ff ef 00  43d+04:20:51.468  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:20:48.375  READ DMA EXT

 

Error 22 occurred at disk power-on lifetime: 7282 hours (303 days + 10 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 00 ff ff ff ef 00  43d+04:20:42.833  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:20:42.786  READ DMA EXT

  35 00 78 ff ff ff ef 00  43d+04:20:41.132  WRITE DMA EXT

  25 00 e8 ff ff ff ef 00  43d+04:20:38.853  READ DMA EXT

  25 00 00 ff ff ff ef 00  43d+04:20:38.786  READ DMA EXT

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute del

  • Author

Sorry. I don't have the time to read right now at work. That's why I made this post.

 

So is my drive failing? I don't know how to interpret these things... The server hasn't moved at all, so I don't think it's a loose cable...

 

But also, I thought that if there were an actual error there would be a red circle next to the drive, not a green one, which indicates everything is fine?

 

 


Your drive has pending sectors.
197 Current_Pending_Sector  0x0012  100  098  000    Old_age  Always      -      72

So you need to follow the directions at the link he posted. Do you have a specific question about what you need to do?

  • Author

I couldn't read everything from my phone.

 

Can you explain what pending sectors are and why they didn't red ball the drive? So this doesn't mean the hard drive is bad?

 

Thanks

 

 


  • Author

Thanks so much guys! You are the best.

 

I stopped the array, unassigned the disk, started, stopped, then re-assigned the disk, and it is rebuilding parity now.

 

Am I correct that the instructions say that if there are more errors after this, the drive needs to be replaced?

 

Also, could someone please explain why this still showed a green ball, signaling a functional drive, next to it? I thought that unRAID would place a red ball next to a drive whenever it detects that it is failing?

 

I almost didn't notice the "errors" because I usually just look at the green balls next to the drives and go right into a parity check once a month...

Until a write operation fails, the drive will be green. Read errors result in the rest of the array being spun up, and the data that should be at that location is calculated from the rest of the drives and written back to the drive. If the write succeeds, then the drive stays green, and the error counter is incremented.

 

Stock unraid does not monitor smart statistics, or try to determine whether or not a drive is healthy. It simply keeps track of how many times a read or write operation failed, and disables the drive and red balls it if a write fails.
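
For anyone curious about the arithmetic behind "calculated from the rest of the drives", here is a minimal Python sketch. It assumes simple XOR (even) parity across the disks; the sector contents and the `xor_blocks` helper are made up for illustration and are not unRAID's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of the same sector on three data disks (made-up values).
data_disks = [bytes([0x11, 0x22]), bytes([0x0F, 0xF0]), bytes([0xAA, 0x55])]

# The parity disk stores the XOR of the corresponding sectors on every data disk.
parity = xor_blocks(data_disks)

# Suppose disk 1 reports a read error for this sector. Its contents can be
# recomputed from the parity sector plus the same sector on the other disks,
# and the result is what would be written back to the drive.
failed = 1
survivors = [d for i, d in enumerate(data_disks) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_disks[failed])
```

The same calculation works in the other direction: if the parity drive itself returns a read error, XOR-ing the data disks regenerates the parity sector.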

  • Author

Got it. Thanks.

 

Now I used putty before with screen so I could close the computer.

 

I have since got a MacBook. Is there anything like screen for it? I only have a laptop so it will eventually be disconnected

 

Also, I see there is a new webgui dominix? Does it have preclear as an add-on from the webgui by any chance?

 

Thanks again

 

 


Screen is running at the unRAID end so the fact a Mac is being used is irrelevant.

 

DYNAMIX ... It does NOT include the preclear script as best I can tell. It is however quite nice for overall management.

  • Author

I'll have to research it. I have simple features installed now but I don't remember which files I actually need to erase to remove it. But I guess that's another post.

 

So I can log in to unRAID using "terminal" on the MacBook right?

 

 


Your questions about this drive are good ones. The short answer is that the drive has not failed. But it has detected 72 internal read errors, and it looks like 26 of them have been reported back to the OS. A pending sector means that an attempt to read a specific sector has failed or triggered error recovery sufficient for the drive to put that sector in a "pending" state. On the next write, that sector will be reevaluated and likely be "reallocated" meaning a spare sector will replace this bad sector. Drives have a limited number of spare sectors to do this type of remapping. The reallocated sector smart attribute, currently zero on this drive, tells you how many times this has happened.
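
If you want to pull those particular counters out of a pasted report without reading the whole thing, a small Python sketch like the following would do it. The `parse_smart_attributes` helper is hypothetical, it assumes the ten-column attribute layout shown in the report in this thread, and the sample rows are copied from that report.

```python
def parse_smart_attributes(report):
    """Return {name: (value, worst, thresh, raw)} for each attribute row."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID, carry a 0x.... flag in the
        # third column, and have at least ten whitespace-separated columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            value, worst, thresh = (int(f) for f in fields[3:6])
            raw = " ".join(fields[9:])
            attrs[fields[1]] = (value, worst, thresh, raw)
    return attrs

sample = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  010    Pre-fail  Always      -      0
187 Reported_Uncorrect      0x0032  074  074  000    Old_age  Always      -      26
197 Current_Pending_Sector  0x0012  100  098  000    Old_age  Always      -      72
198 Offline_Uncorrectable  0x0010  100  098  000    Old_age  Offline      -      72
"""

attrs = parse_smart_attributes(sample)
for name in ("Current_Pending_Sector", "Reported_Uncorrect", "Reallocated_Sector_Ct"):
    print(name, "raw =", attrs[name][3])
```

For this drive that gives 72 pending, 26 reported uncorrectable and 0 reallocated, which are the three numbers discussed above.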

 

I have occasionally seen pending sectors simply go away. Can't explain it but suddenly all the pending sectors are gone. Maybe a bug or something. But when this has happened there seem to be no ill effects. But once sectors actually start to remap, my experience is that they continue to remap and the drive's days are numbered.

 

In your situation I like to run parity checks and monitor the results. If there is a read error unRaid will handle it and use parity to figure out the data and issue the magic write. That write will trigger the reallocation. You'll start to see the smart attributes reflect this type of activity. If you can run 3 straight parity checks and the pending / reallocated sectors hold steady, I would tend to trust the disk and continue to monitor. But if they get worse on every parity check or two, even if only by a few each time, and don't stabilize, I would look to RMA the drive. I think of it like a pot hole. Every time a car goes by a little more pavement is affected and it is just a matter of time before the road is unusable.

 

But everyone has their own tolerance, and some are willing to keep driving and monitor the pothole and only replace the drive when spare sectors are in short supply. The closer the actual value of the smart attribute gets to zero the worse it is, and when it passes the threshold it is considered failed. UnRaid is not monitoring these values to red ball the disk though. It is looking for errors writing to the disk. And frequently write errors are caused by loose cables and not bad disks, so a red ball is actually a poor indicator of bad drive health. 
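
As a rough illustration of the value-versus-threshold rule, here is a short Python sketch. It follows the smartctl convention that an attribute counts as failed once its normalized VALUE is less than or equal to THRESH; the `attribute_failed` helper is hypothetical, and the numbers are the Pre-fail rows from the report in this thread.

```python
def attribute_failed(value, thresh):
    """smartctl convention: failed once the normalized VALUE <= THRESH."""
    return value <= thresh

# (VALUE, THRESH) pairs copied from the Pre-fail rows of the posted report.
prefail = {
    "Raw_Read_Error_Rate": (119, 6),
    "Spin_Up_Time": (91, 0),
    "Reallocated_Sector_Ct": (100, 10),
    "Seek_Error_Rate": (69, 30),
    "Spin_Retry_Count": (100, 97),
}

failed = [name for name, (value, thresh) in prefail.items()
          if attribute_failed(value, thresh)]
print(failed or "no Pre-fail attribute has reached its threshold")
```

None of the attributes on this drive has reached its threshold, which matches the PASSED overall-health result even though 72 sectors are pending.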

 

So run a few parity checks and monitor the results. If things get worse every time, RMA the drive. If it stabilizes, keep it but continue to monitor it over time. Hope this helps.
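
The rule of thumb above could be written down roughly like this in Python. It is only a sketch of one person's heuristic: the `assess` function, its labels, and every snapshot other than the initial (72, 0) from the posted report are made up for illustration.

```python
def assess(history, window=3):
    """history: (pending, reallocated) raw counts, one per parity check, oldest first."""
    if len(history) < window:
        return "not enough data"
    recent = history[-window:]
    # Three straight checks with identical counts: trust the disk, keep watching.
    if all(snapshot == recent[0] for snapshot in recent):
        return "steady"
    first, last = recent[0], recent[-1]
    # Either counter climbing across the window: look at an RMA.
    if last[0] > first[0] or last[1] > first[1]:
        return "worsening"
    # Counts moved but did not grow (e.g. pending sectors cleared on rewrite).
    return "changing"

print(assess([(72, 0), (72, 0), (72, 0)]))
print(assess([(72, 0), (60, 14), (55, 30)]))
print(assess([(72, 0), (10, 0), (0, 0)]))
```

The second case is the pothole scenario: pending sectors are being converted to reallocated ones and the reallocated count keeps growing from check to check.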

Any pending sector on any array disk will interfere with rebuilding a different disk which has failed. An array with any pending sectors on any array disk is NOT protected. If another array disk fails, the sectors on it that correspond to the pending sectors cannot be reconstructed. The entire disk has not failed, but the parts of it that cannot be read are a partial failure. The source of any read errors needs to be corrected.

 

See here: http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector

I disagree with the concept that ANY pending reallocation WILL interfere with a rebuild. I would agree that they raise the risk of a problem, but I have actually never seen a rebuild negatively impacted by a pending sector. I used to see many more read errors coming back in the IDE drive days, but I believe drives now have sophisticated error-correcting features built into the hardware, and even if the drive has read issues, it is able to return correct data.

Tom did not respond when I asked him how unRAID handles an unreadable sector during a rebuild. It appears not to halt the rebuild but the corresponding data cannot be computed if the sector cannot be read. A sector is marked as unreadable (pending) after the error correction on the disk has failed. A pending sector cannot return correct data. The disk is reporting a read failure has occurred and the integrated error correction has NOT been able to recover.  The corresponding sectors on the other disks may not be important, i.e. the space may be unallocated or the error may appear as static in video or audio. However, they may be very important. Do you want the numbers in a spreadsheet to be modified during a rebuild? How about your tax information? Do you feel lucky?

 

Give any pending sector the same attention that is given a failed disk. The situations are matter of degree; the pending sector represents a partial disk failure.
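
To make the concern concrete, here is a minimal Python sketch of a single-parity rebuild where one of the surviving disks has an unreadable sector. It assumes plain XOR parity and uses made-up sector contents; `rebuild_sector` is a hypothetical helper that shows only the arithmetic, not what unRAID actually does when it hits such a sector during a rebuild.

```python
from functools import reduce

def rebuild_sector(surviving_sectors):
    """XOR the surviving sectors (data and parity) to recover the missing one.

    None marks a sector that could not be read; in that case the missing
    sector cannot be computed and None is returned.
    """
    if any(sector is None for sector in surviving_sectors):
        return None
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*surviving_sectors))

# Two-data-disk array: disk1 has failed outright and is being rebuilt from
# disk0 and parity. disk0 has one pending (unreadable) sector at index 1.
disk0 = [bytes([0x01, 0x02]), None, bytes([0x05, 0x06])]
parity = [bytes([0x0A, 0x0B]), bytes([0x0C, 0x0D]), bytes([0x0E, 0x0F])]

rebuilt_disk1 = [rebuild_sector([d, p]) for d, p in zip(disk0, parity)]
for index, sector in enumerate(rebuilt_disk1):
    print(index, "unrecoverable" if sector is None else sector.hex())
```

Sectors 0 and 2 come back, but sector 1 of the rebuilt disk has no defined contents, which is the partial loss of protection described above.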

  • Author

Wow. Thanks guys. You are the best! I didn't even know it was not compatible. It seems to be working fine for me.

 

Guys this whole thing made me realize that I need to somehow have a backup of my cache drive! I configured everything a year ago, but man if I had to do it all over again I would probably put money together for a huge Synology.

 

I guess it's for another thread but I would like my cache drive to be mirrored and be truly plug and play. Drive fails, all I have to do is replace...

 

 


  • Author

Just re-read everything. Wow this seems so complex. I need to run that smart test in terminal every time?

 

Is there really no way to have this be more user friendly?

 

How is Synology able to do it so well and keep things simple?

Drive fails, take it out, replace, rebuild, all done.

 

Are there any plugins that show the smart errors on the GUI? Or is it enough for me to just watch for errors again on the main page?

 

 


I rarely run SMART...only if a disk is failing.

I *do* have 'DYNAMIX' installed. It has a page that queries summary SMART data from all drives automatically when opened.

 

  • Author

Oh okay. Because you guys are talking about these specific errors and I know unRAID only has one error column

 

 


  • Author

So what's the basic rule of thumb? Is it enough to just look at the error column on the main page?

 

The reason I'm being so picky now is that I want to set up a storage server at my parents' place too. So I need them to be able to do simple stuff like run parity checks, detect and correct errors, and replace drives as needed (it will have hot-swappable cages).

 

But the truth is, if I can't even remember all these things, I can't expect them to be able to maintain it. How is this made so simple in Synology systems?

 

Or is it enough to just keep an eye on the error column?

Archived

This topic is now archived and is closed to further replies.
