Long Term Unraid Drive Issues

Hi, I've been having off and on drive issues for the last 6 months!

 

About a month ago one of my drives failed, and it turned out that either the drive wasn't covered by parity or my parity was completely invalid.  After replacing the drive and restoring the lost data (with help from a friend), I rebuilt my parity disk.  It built fine.  Then I ran a parity check because I wasn't confident in my parity at all.  It came back with 0 errors.  So I ran it again! (Yes, I'm a little paranoid at this point) and everything came back clean once again.  A few weeks later, after using a possibly bad power supply for a few days (during the RMA process), my parity disk was disabled.  I've RMA'd that disk since and I'm preclearing the replacement.

 

While I was preclearing, I got a LOT of errors in my syslog.  See attached.  Basically a whole lot of:

May 30 05:51:53 Tower kernel: md: diskX read error (Errors)

May 30 05:51:53 Tower kernel: handle_stripe read error: 2434475536/2, count: 1 (Errors)

X = disk2/9

 

I also noticed that the disks where that was occurring were in the "not protected" part of the unmenu Main screen, and it looked like they had spun down.  I then stopped the array.  With my array off, I ran reiserfsck --check and it said there was no superblock.  After checking a few other drives using reiserfsck, it's a bigger problem than I thought:

 

(Out of 12 disks)

11 disks:

superblock cannot be found

 

1 disk:

Bad root block 0. (which is not disk2 or 9 in the above syslog)

 

It recommends I run --rebuild-sb, which is fine, but I think it's VERY weird that ALL my disks have a superblock issue.  What's going on?  My disk is nearly finished preclearing and I'd rather not do a parity check if all my other data disks are bad.  Using unraid 5.0-rc11.

 

Any help would be appreciated, thanks!

syslog-2013-05-30.zip

You are probably running reiserfsck on the wrong device names.  If you do not run it on the FIRST PARTITION, it will not find a superblock.

 

You must EITHER run it on /dev/sdX1  (note the trailing "1" designating the first partition)

OR on the /dev/mdX device  (the "md" devices already are connected to the first partition)

 

If you run it on the base device name (/dev/sdX without a trailing "1"), it will tell you a superblock can not be found, and it is correct, as you are asking it to look in the wrong place.

 

Joe L.
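
The device-name rule above can be sketched as a small shell helper. This is only an illustration: the function name fsck_target is hypothetical, and it assumes plain /dev/sdX and /dev/mdX style names as used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: turn a device name into the name reiserfsck should be given.
#   /dev/sdX   -> /dev/sdX1  (append the first-partition suffix)
#   /dev/sdX1  -> unchanged
#   /dev/mdX   -> unchanged  (the md devices already point at the first partition)
fsck_target() {
    case "$1" in
        /dev/md[0-9]*)      printf '%s\n' "$1" ;;
        /dev/sd[a-z]*[0-9]) printf '%s\n' "$1" ;;
        /dev/sd[a-z]*)      printf '%s\n' "${1}1" ;;
        *) echo "unrecognised device: $1" >&2; return 1 ;;
    esac
}

# Example of a read-only check using the helper (not run here):
#   reiserfsck --check "$(fsck_target /dev/sdd)"
```

With this, passing the base device by mistake still results in reiserfsck being pointed at the first partition, which is where the superblock lives.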

  • Author

Thanks for the speedy reply.  You're right!  D'oh, I knew it had to be something since they all were like that.

  • Author

Okay, I checked them (properly) and they all seem okay.  I added my replacement parity disk, started the array, and now it looks like disk2 isn't mounting and says it's 'Unformatted'.  It has a bunch of the errors below:

 

May 30 23:14:40 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)

May 30 23:14:40 Tower kernel: ata7.00: BMDMA stat 0x24 (Drive related)

May 30 23:14:40 Tower kernel: ata7.00: failed command: READ DMA (Minor Issues)

May 30 23:14:40 Tower kernel: ata7.00: cmd c8/00:08:d7:00:01/00:00:00:00:00/e0 tag 0 dma 4096 in (Drive related)

May 30 23:14:40 Tower kernel:          res 51/40:00:d8:00:01/40:00:00:00:00/00 Emask 0x9 (media error) (Errors)

May 30 23:14:40 Tower kernel: ata7.00: status: { DRDY ERR } (Drive related)

May 30 23:14:40 Tower kernel: ata7.00: error: { UNC } (Errors)

May 30 23:14:40 Tower kernel: ata7.00: configured for UDMA/33 (Drive related)

May 30 23:14:40 Tower kernel: ata7: EH complete (Drive related)

 

Is this disk toast?

To unRAID, any disk that fails to mount is "unformatted", even if it is formatted.

(A really poor way of saying "could not mount")

 

The disk that could not mount might have some file-system damage that needs correcting.

 

The UNC error(s) are uncorrectable checksum errors on sectors of a disk.  The contents of a sector do not match the checksum at the end of the sector.
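
To see which sector a UNC error refers to, the "res" line from the kernel log can be decoded into an LBA. The sketch below assumes the usual libata field order (status/error:nsect:lbal:lbam:lbah/...) and only handles the low 24 bits of a 28-bit address; the function name res_to_lba is made up for illustration.

```shell
#!/bin/sh
# Sketch: recover the LBA from a libata "res" taskfile dump.
# Assumed field order: status/error:nsect:lbal:lbam:lbah/...
# Bits 24-27 (held in the device register) are ignored here.
res_to_lba() {
    fields=${1#*/}        # drop "status/"        -> 40:00:d8:00:01/40:...
    fields=${fields%%/*}  # keep the first group  -> 40:00:d8:00:01
    old_ifs=$IFS; IFS=:
    set -- $fields        # $1=error $2=nsect $3=lbal $4=lbam $5=lbah
    IFS=$old_ifs
    echo $(( (0x$5 << 16) | (0x$4 << 8) | 0x$3 ))
}

# The "res" value from the syslog excerpt in this thread:
res_to_lba "51/40:00:d8:00:01/40:00:00:00:00/00"   # prints 65752
```

A result of 65752 (0x000100d8) puts the unreadable sector very close to the start of the disk.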

 

We don't know if the two are related, or even if they are on the same disk, until you attach a syslog to your next post (preferably zipped) and also smartctl reports for all your disks.

smartctl -a /dev/sda

smartctl -a /dev/sdb

etc...
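
Rather than typing the command once per disk, the reports can be gathered in a loop. This is a minimal sketch: the function name collect_smart, the output directory, and the file naming are illustrative choices, not anything unRAID provides.

```shell
#!/bin/sh
# Sketch: save one SMART report per disk, ready to zip and attach.
# Usage: collect_smart OUTPUT_DIR DEVICE...
collect_smart() {
    outdir=$1; shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        [ -b "$dev" ] || continue   # skip anything that is not a block device
        smartctl -a "$dev" > "$outdir/smart_$(basename "$dev").txt" 2>&1
    done
}

# Example (not run here):
#   collect_smart /tmp/smart_reports /dev/sd[a-z]
```

Each report lands in its own file named after the device, so it is easy to tell which disk a given report belongs to.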

 

  • Author

Here are two disks of the same model and similar age.  Syslog attached.

 

Disk2 = drive which failed to mount.  Obviously some failures, looks like they're related to the checksum

Statistics for /dev/sdd ST31500341AS_9VS26213

 

smartctl -a -d ata /dev/sdd

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Model Family:    Seagate Barracuda 7200.11 family

Device Model:    ST31500341AS

Serial Number:    9VS26213

Firmware Version: CC1H

User Capacity:    1,500,301,910,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  8

ATA Standard is:  ATA-8-ACS revision 4

Local Time is:    Fri May 31 00:06:42 2013 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

See vendor-specific Attribute list for marginal Attributes.

 

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: ( 609) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  1) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  2) minutes.

SCT capabilities:       (0x103f) SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate    0x000f  113  095  006    Pre-fail  Always      -      56809419

  3 Spin_Up_Time            0x0003  100  092  000    Pre-fail  Always      -      0

  4 Start_Stop_Count        0x0032  098  098  020    Old_age  Always      -      2684

  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      1

  7 Seek_Error_Rate        0x000f  078  060  030    Pre-fail  Always      -      65305776

  9 Power_On_Hours          0x0032  064  064  000    Old_age  Always      -      31873

10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      26

12 Power_Cycle_Count      0x0032  099  099  020    Old_age  Always      -      1414

184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0

187 Reported_Uncorrect      0x0032  001  001  000    Old_age  Always      -      272

188 Command_Timeout        0x0032  100  099  000    Old_age  Always      -      65537

189 High_Fly_Writes        0x003a  033  033  000    Old_age  Always      -      67

190 Airflow_Temperature_Cel 0x0022  049  032  045    Old_age  Always  In_the_past 51 (0 17 51 48)

194 Temperature_Celsius    0x0022  051  068  000    Old_age  Always      -      51 (0 14 0 0)

195 Hardware_ECC_Recovered  0x001a  052  016  000    Old_age  Always      -      56809419

197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      20

198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      20

199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0

240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      110638357566070

241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      3903224057

242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      3492161877

 

SMART Error Log Version: 1

ATA Error Count: 344 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

 

Error 344 occurred at disk power-on lifetime: 31873 hours (1328 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 db 00 01 00  Error: UNC at LBA = 0x000100db = 65755

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 d7 00 01 e0 00      00:08:24.793  READ DMA

  27 00 00 00 00 00 e0 00      00:08:24.764  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 a0 02      00:08:24.745  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 02      00:08:24.730  SET FEATURES [set transfer mode]

  27 00 00 00 00 00 e0 00      00:08:24.565  READ NATIVE MAX ADDRESS EXT

 

Error 343 occurred at disk power-on lifetime: 31873 hours (1328 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 dc 00 01 00  Error: UNC at LBA = 0x000100dc = 65756

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 d7 00 01 e0 00      00:07:53.839  READ DMA

  27 00 00 00 00 00 e0 00      00:07:53.632  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 a0 02      00:07:53.513  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 02      00:07:53.481  SET FEATURES [set transfer mode]

  27 00 00 00 00 00 e0 00      00:07:53.453  READ NATIVE MAX ADDRESS EXT

 

Error 342 occurred at disk power-on lifetime: 31873 hours (1328 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 d9 00 01 00  Error: UNC at LBA = 0x000100d9 = 65753

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 d7 00 01 e0 00      00:07:40.185  READ DMA

  27 00 00 00 00 00 e0 00      00:07:40.081  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 a0 02      00:07:39.982  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 02      00:07:39.950  SET FEATURES [set transfer mode]

  27 00 00 00 00 00 e0 00      00:07:39.922  READ NATIVE MAX ADDRESS EXT

 

Error 341 occurred at disk power-on lifetime: 31873 hours (1328 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 d8 00 01 00  Error: UNC at LBA = 0x000100d8 = 65752

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 d7 00 01 e0 00      00:07:37.158  READ DMA

  27 00 00 00 00 00 e0 00      00:07:37.131  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 a0 02      00:07:37.052  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 02      00:07:37.020  SET FEATURES [set transfer mode]

  27 00 00 00 00 00 e0 00      00:07:36.992  READ NATIVE MAX ADDRESS EXT

 

Error 340 occurred at disk power-on lifetime: 31873 hours (1328 days + 1 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 db 00 01 00  Error: UNC at LBA = 0x000100db = 65755

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 d7 00 01 e0 00      00:07:15.559  READ DMA

  27 00 00 00 00 00 e0 00      00:07:15.530  READ NATIVE MAX ADDRESS EXT

  ec 00 00 00 00 00 a0 02      00:07:15.511  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 02      00:07:15.490  SET FEATURES [set transfer mode]

  27 00 00 00 00 00 e0 00      00:07:12.491  READ NATIVE MAX ADDRESS EXT

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Disk1 - Same type of disk, everything seems okay with it

Statistics for /dev/sdc ST31500341AS_9VS271GN

 

smartctl -a -d ata /dev/sdc

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Model Family:    Seagate Barracuda 7200.11 family

Device Model:    ST31500341AS

Serial Number:    9VS271GN

Firmware Version: CC1H

User Capacity:    1,500,301,910,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  8

ATA Standard is:  ATA-8-ACS revision 4

Local Time is:    Fri May 31 08:24:04 2013 EDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

See vendor-specific Attribute list for marginal Attributes.

 

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: ( 617) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  1) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  2) minutes.

SCT capabilities:       (0x103f) SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate    0x000f  119  099  006    Pre-fail  Always      -      215613250

  3 Spin_Up_Time            0x0003  100  092  000    Pre-fail  Always      -      0

  4 Start_Stop_Count        0x0032  099  099  020    Old_age  Always      -      1652

  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      1

  7 Seek_Error_Rate        0x000f  077  060  030    Pre-fail  Always      -      54235670

  9 Power_On_Hours          0x0032  064  064  000    Old_age  Always      -      31887

10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      26

12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      249

184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0

187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0

188 Command_Timeout        0x0032  100  100  000    Old_age  Always      -      0

189 High_Fly_Writes        0x003a  087  087  000    Old_age  Always      -      13

190 Airflow_Temperature_Cel 0x0022  049  033  045    Old_age  Always  In_the_past 51 (0 20 54 49)

194 Temperature_Celsius    0x0022  051  067  000    Old_age  Always      -      51 (0 15 0 0)

195 Hardware_ECC_Recovered  0x001a  051  026  000    Old_age  Always      -      215613250

197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0

199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0

240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      63406602211363

241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      2545477200

242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      1772630412

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline      Completed without error      00%        19        -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

syslog-2013-05-31.zip

The first disk has a lot of sectors pending reallocation (Current_Pending_Sector = 20), while the second has none.  UnRAID will not work properly with disks that have sectors pending reallocation.

 

Sometimes pending sectors indicate that a disk is having problems.  Sometimes it is a temporary glitch, and the sectors will either be reallocated when next written or (better) have their pending status cleared when the next write to them succeeds.
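
To watch whether the pending count changes after the sectors are rewritten, the raw value can be pulled out of the attribute table. The sketch below assumes the ten-column layout shown in the reports in this thread (attribute name in column 2, raw value in column 10); the function name smart_raw is hypothetical.

```shell
#!/bin/sh
# Sketch: print the raw value of one attribute from `smartctl -A` style output on stdin.
# Column 2 is the attribute name, column 10 the raw value.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Two attribute lines copied from the Disk2 report in this thread:
sample='197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      20
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      20'

printf '%s\n' "$sample" | smart_raw Current_Pending_Sector   # prints 20

# Live use would look like (not run here):
#   smartctl -A /dev/sdd | smart_raw Current_Pending_Sector
```

Comparing this number before and after the affected sectors are written to shows whether they were cleared or reallocated.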

Archived

This topic is now archived and is closed to further replies.
