[Solved] two redballed disks to start New Year...

Would need some help. Woke up this morning, not with a hangover, but with issues on my unRaid server. At 00:30, the parity sync started and apparently there were some issues. I saw the Parity disk with a red ball this morning (Parity had 700 or so Write errors). All other disks had a green ball. I stopped the array and to my horror I'm seeing now not only the Parity disk with a red ball but also one of the array disks. Syslog and screenshot attached.

 

Is there a way I can still recover from this?

 

syslog : https://dl.dropbox.com/u/3121169/syslog.zip

 

screenshot: https://dl.dropbox.com/u/3121169/unraid.png

 


Since they both say "no device", odds are it is power related, or disk controller related, or one disk blocking the other on the same controller.

 

Before doing anything, make a copy of your "config" folder.  You can regroup more easily with it in case you need to revert to a known state.

 

Since a "single" write error will make a drive go red, I'd power down completely, then try un-plugging and re-seating anything common with those two drives first and rebooting.

It might be a power splitter, or a back-plane...

 

The odds of two disks both going bad at the exact same time are pretty slim.

Right now, both are marked in the syslog as either  "missing" or "removed"

 

You've got a lot of drives... is your power supply up to the task?  (what exact make/model are you using?)

 

Joe L.

  • Author

Thanks Joe! I did as you said. After powering down and reseating everything, Disk1 came up green and the Parity drive had a blue ball. I rebuilt parity and all is green now. However, I do see some 'weird' things:

 

1. a really huge amount of writes...

2. many errors on Disk 1

3. "Parity has not been checked yet." while I did a rebuild of the parity...

 

Are these things to be concerned about?

 

As for the power supply, it's a Trust 520Watt : http://www.trust.com/products/product.aspx?artnr=14996

There are 6 drives in the server, shouldn't 520Watt be enough? (I was thinking of adding some 3TB drives soon.)

 

unRaidStatus.png

I would suspect that is a multi-rail power supply with a capacity of between 16 and 18 amps on the 12 volt rail powering the disks.  I see no indication of it being a single-rail supply, and that feature would be prominently marketed if it existed.

 

With 6 non-green drives, each drawing about 3 amps at spin-up, you are probably at or over the capacity of the power supply, especially since that does not account for the power used by the motherboard or the fans.
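The arithmetic behind that estimate can be sketched in a few lines of Python. The 3 A per-drive spin-up figure and the 16 to 18 A rail range are the estimates quoted in this post, not measured values for this particular Trust supply, and the function name is made up for illustration.

```python
def spinup_headroom(drive_count, amps_per_drive, rail_amps):
    """Return the 12 V current (in amps) left on a rail when every drive
    spins up at once. A negative result means the rail is overloaded.
    Motherboard and fan load is NOT included."""
    return rail_amps - drive_count * amps_per_drive


# Figures quoted in this thread (estimates, not measurements).
DRIVES = 6
SPINUP_AMPS = 3.0            # assumed draw per non-green drive at spin-up

for rail in (16.0, 18.0):    # suspected capacity range of the 12 V rail
    headroom = spinup_headroom(DRIVES, SPINUP_AMPS, rail)
    print(f"{rail:.0f} A rail: {headroom:+.0f} A headroom")
```

With these numbers a 16 A rail comes up 2 A short and an 18 A rail has nothing left over, before the motherboard and fans are counted.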

 

Typically, the second rail is exclusively limited to the PCIe connectors and not available to the disk power connectors.

 

I would strongly suggest a single-rail power supply of sufficient capacity for your eventual expansion needs.

Post a SMART report for disk1.

+1. 

 

Need to see a smart report for disk1.

 

A rebuild of parity is not a "check" of parity. 

(Your rebuild wrote parity to the parity disk, but you have not yet attempted to read it to verify/check it is correct.)
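The difference can be illustrated with a toy model. This is only a sketch, assuming a single parity disk computed as a byte-wise XOR across the data disks; the short byte strings stand in for whole disks and the function names are hypothetical.

```python
from functools import reduce


def build_parity(data_disks):
    """Rebuild: compute parity from the data disks (byte-wise XOR)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))


def check_parity(data_disks, parity_disk):
    """Check: read parity back and count positions that disagree with the data."""
    expected = build_parity(data_disks)
    return sum(1 for e, p in zip(expected, parity_disk) if e != p)


# Toy "disks": three data disks of four bytes each.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]

parity = build_parity(data)       # what the rebuild intends to write
stored = bytearray(parity)
stored[2] ^= 0x08                 # simulate one byte that did not land correctly

print(check_parity(data, parity))          # parity as intended
print(check_parity(data, bytes(stored)))   # parity as actually stored
```

The rebuild step only writes; a bad write goes unnoticed until a separate check reads the parity back and compares it against the data.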

  • Author

Thanks! I don't seem to be able to run a smart report on Disk1 or the Parity disk though. I first spun up all disks. After doing so, I'm getting :

 

For Disk 1 (similar with Parity disk) :

 

root@Tower:~# smartctl -t short /dev/sda

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

Smartctl open device: /dev/sda failed: No such device

root@Tower:~#

 

For Disk2 (or all other disks) :

 

root@Tower:~# smartctl -t short /dev/sdb

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Short self-test routine immediately in off-line mode".

Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 2 minutes for test to complete.

Test will complete after Thu Jan  3 18:55:47 2013

 

Use smartctl -X to abort test.

 

 

Syslog from just before I spun up the disks :

 

...

Jan  3 18:48:29 Tower kernel: mdcmd (84618): spindown 0

Jan  3 18:48:29 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:29 Tower kernel: mdcmd (84619): spindown 1

Jan  3 18:48:32 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:32 Tower kernel: mdcmd (84620): spindown 0

Jan  3 18:48:32 Tower kernel: mdcmd (84621): spindown 1

Jan  3 18:48:32 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:33 Tower last message repeated 2 times

Jan  3 18:48:33 Tower kernel: mdcmd (84622): spindown 0

Jan  3 18:48:33 Tower kernel: mdcmd (84623): spindown 1

Jan  3 18:48:34 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:34 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:34 Tower kernel: mdcmd (84624): spindown 0

Jan  3 18:48:34 Tower kernel: mdcmd (84625): spindown 1

Jan  3 18:48:37 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:37 Tower kernel: mdcmd (84626): spindown 0

Jan  3 18:48:37 Tower kernel: mdcmd (84627): spindown 1

Jan  3 18:48:37 Tower emhttp: mdcmd: write: No such device or address

Jan  3 18:48:39 Tower last message repeated 2 times

Jan  3 18:48:39 Tower kernel: mdcmd (84628): spindown 0

Jan  3 18:48:39 Tower kernel: mdcmd (84629): spindown 1

Jan  3 18:48:39 Tower emhttp: Spinning up all drives...

Jan  3 18:48:39 Tower emhttp: shcmd (332): /usr/sbin/hdparm -S0 /dev/sde &> /dev/null

Jan  3 18:48:39 Tower kernel: mdcmd (84630): spinup 0

Jan  3 18:48:39 Tower kernel: mdcmd (84631): spinup 1

Jan  3 18:48:39 Tower kernel: mdcmd (84632): spinup 2

Jan  3 18:48:39 Tower kernel: mdcmd (84633): spinup 3

Jan  3 18:48:39 Tower kernel: mdcmd (84634): spinup 5

Jan  3 18:49:29 Tower kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed

Jan  3 18:49:29 Tower kernel: hdparm: sending ioctl 2285 to a partition!

Jan  3 18:49:33 Tower last message repeated 5 times

Jan  3 18:49:33 Tower kernel: smartctl: sending ioctl 2285 to a partition!

Jan  3 18:49:33 Tower last message repeated 3 times

Jan  3 18:49:34 Tower kernel: scsi_verify_blk_ioctl: 14 callbacks suppressed

Jan  3 18:49:34 Tower kernel: smartctl: sending ioctl 2285 to a partition!

Jan  3 18:49:34 Tower last message repeated 9 times

Jan  3 18:50:34 Tower kernel: scsi_verify_blk_ioctl: 12 callbacks suppressed

Jan  3 18:50:34 Tower kernel: hdparm: sending ioctl 2285 to a partition!

Jan  3 18:50:38 Tower last message repeated 5 times

Jan  3 18:50:38 Tower kernel: smartctl: sending ioctl 2285 to a partition!

Jan  3 18:50:38 Tower last message repeated 3 times

Jan  3 18:51:39 Tower kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed

Jan  3 18:51:39 Tower kernel: hdparm: sending ioctl 2285 to a partition!

Jan  3 18:51:42 Tower last message repeated 5 times

Jan  3 18:51:42 Tower kernel: smartctl: sending ioctl 2285 to a partition!

Jan  3 18:51:42 Tower last message repeated 3 times

Jan  3 18:52:43 Tower kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed

Jan  3 18:52:43 Tower kernel: hdparm: sending ioctl 2285 to a partition!

Jan  3 18:52:46 Tower last message repeated 5 times

Jan  3 18:52:46 Tower kernel: smartctl: sending ioctl 2285 to a partition!

Jan  3 18:52:46 Tower last message repeated 3 times

Jan  3 18:53:47 Tower kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed

Jan  3 18:53:47 Tower kernel: hdparm: sending ioctl 2285 to a partition!

Jan  3 18:53:51 Tower last message repeated 5 times

...

 

Disk2 is on the same controller as the Parity disk and Disk1. No issues with Disk2 though.

 

Any suggestions or should I first check the parity and then shut everything down again and try once more to reseat modules/swap cables/... ?

It really sounds like your hard disks are underpowered...

 

I would follow the suggestion of Joe and look for a PSU replacement!

 

 

 


 

 

What are the specs for your build?

CPU/MB/RAM? Use http://extreme.outervision.com/psucalculatorlite.jsp to check how much power you need.

520W should be enough, but I can't say without the other info. Is it a Celeron, or one of the more power-hungry i3/i5 chips? I think a good PSU is the most important part of your build, and more drives need more power. I've never heard of Trust, so I can't trust it. Go with a name you know:

1. XFX

2. Corsair

3. PC Power & Cooling/SeaSonic

4. Antec

5. OCZ

 

What happens when you disconnect all drives except the two in question? If it's not a power issue, check the cables.

 

Is a SMART test essentially the same as the SMART option in the BIOS? Sorry if this is a thread hijack, but it seems like a good time to ask.

 

 

As Joe said, amperage on the 12 volt rail is more important than wattage. The link given for the Trust 520Watt does not include any information on the number of rails or their amperage. Most inexpensive PSUs contain 2 or more 12 volt rails with only 17 or 18 amps per rail. These types of PSUs are not suitable for systems with many HDDs. See here: http://lime-technology.com/forum/index.php?topic=12219.0
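To see why the label wattage says little, the per-rail figures can be converted to watts with P = V × I. This is a rough sketch: the 17 and 18 A values are the typical figures mentioned in this post, not the published spec of the Trust unit, and the helper name is invented for the example.

```python
def rail_watts(amps, volts=12.0):
    """Power in watts that a single rail can deliver: P = V * I."""
    return volts * amps


LABEL_WATTS = 520  # total rating printed on the PSU discussed in this thread

for amps in (17, 18):
    print(f"{amps} A at 12 V = {rail_watts(amps):.0f} W of the {LABEL_WATTS} W label")
```

So a rail limited to 17 or 18 A can supply only about 204 to 216 W to the drives connected to it, whatever the total on the label says.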

  • Author

Finally got some time to do some tests. I have a kit to connect external drives via USB. Used the power supply of it to power one of those 2 disks (Parity and Disk1). Now, with one of them on the external power, things seem better. I'm able to do smart tests now.

 

For Disk1, I think (please correct me if I'm wrong) all is fine :

 

 

root@Tower:~# smartctl -a -d ata /dev/sda

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Device Model:    WDC WD2002FAEX-007BA0

Serial Number:    WD-WMAY03237929

Firmware Version: 05.01D05

User Capacity:    2,000,398,934,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Tue Jan  8 14:04:36 2013 CET

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x85) Offline data collection activity

was aborted by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (30180) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  5) minutes.

SCT capabilities:       (0x3037) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       3500
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1194
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4010
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       947
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       935
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       258
194 Temperature_Celsius     0x0022   123   105   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline      Completed without error      00%      4010        -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

root@Tower:~#

 

 

 

For the Parity disk however, I think the RAW_VALUE numbers are good but there are read errors (did the short smart test 3 times, 3 times read error) :

 

root@Tower:~# smartctl -a -d ata /dev/sdd

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Model Family:    Western Digital Caviar Green family

Device Model:    WDC WD20EADS-32S2B0

Serial Number:    WD-WCAVY2809029

Firmware Version: 01.00A01

User Capacity:    2,000,398,934,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Tue Jan  8 14:26:49 2013 CET

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x85) Offline data collection activity

was aborted by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 121) The previous self-test completed having

the read element of the test failed.

Total time to complete Offline

data collection: (40380) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  5) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   204   142   021    Pre-fail  Always       -       6758
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2424
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13808
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1441
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       1354
193 Load_Cycle_Count        0x0032   172   172   000    Old_age   Always       -       86282
194 Temperature_Celsius     0x0022   121   105   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%       13808          2138713936
# 2  Short offline       Completed: read failure       90%       13807          2138713936
# 3  Short offline       Completed: read failure       80%       13807          2138713936
# 4  Short offline       Completed without error       00%        4524          -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

 

root@Tower:~#

 

Is this a bad thing or should I just write & check parity again?

 

I will get myself a decent Power supply. Thanks for the pointers, will first read up on it.

 

Another question: I enabled an automatic parity check on the 1st of every month, and I get an email with the results. Is it possible to run a SMART test on all drives automatically once in a while and also get an email with the results? Preferably in a user-friendly way, such as enabling something in unMENU or SimpleFeatures, rather than setting up cron scripts (though if that's the only way to do it, I can try to set it up).

 

As for the specs of my system :

unRAID Version: unRAID Server Pro, Version 5.0-rc8a

Motherboard: ASUSTeK - P8B75-M

Processor: Intel® Core™ i5-3470 CPU @ 3.20GHz - 3.2 GHz

Cache: L1 = 32 kB  L2 = 256 kB  L3 = 6144 kB 

Memory: 8 GB - DIMM0 = 1600 MHz  DIMM1 = 1600 MHz  DIMM2 = 1600 MHz  DIMM3 = 1600 MHz 

Network: 1000Mb/s - Full Duplex

 

How would this MB be classified on http://extreme.outervision.com/psucalculatorlite.jsp ? Desktop/Server/Regular/High End ?

 

Thanks everybody for all the help I got already on this! Really appreciate it!

The parity disk needs to be totally rewritten. Use the New Config button on the Utils tab. After the rebuild, the Current_Pending_Sector RAW_VALUE should be zero, and it must be zero before the disk can be relied on.
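One way to check that value without reading the whole report by eye is to pull the raw values out of the `smartctl -a` text. The following is a minimal sketch, assuming the 10-column attribute table layout shown in the reports in this thread; `raw_values` is a hypothetical helper, not part of smartmontools.

```python
def raw_values(smart_output, names):
    """Return {attribute_name: raw_value} for the requested attributes,
    parsed from the attribute table printed by `smartctl -a`."""
    found = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            found[fields[1]] = int(fields[9])
    return found


# Two rows copied from the parity disk report earlier in this thread.
SAMPLE = """\
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      1
198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      1
"""

WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")
values = raw_values(SAMPLE, WATCHED)
for name in WATCHED:
    status = "OK" if values.get(name) == 0 else "NEEDS ATTENTION"
    print(f"{name}: {values.get(name)} ({status})")
```

On a live server the text would come from saving the output of the `smartctl -a` command used above for the disk in question, rather than from a pasted sample.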

  • Author

I did a preclear of the parity disk. After that, the smart report is without errors. Result of preclear :

 

1 sector was pending re-allocation before the start of the preclear.

1 sector was pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

0 sectors are pending re-allocation at the end of the preclear,

    a change of -1 in the number of sectors pending re-allocation.

0 sectors had been re-allocated before the start of the preclear.

0 sectors are re-allocated at the end of the preclear,

    the number of sectors re-allocated did not change.

 

Then I used the New Config utility and synced parity again, and ran a parity check afterwards. All is working fine now. I ordered a Corsair AX760 PSU, which will arrive in a few days.

 

Thanks everybody for the help on this.

Archived

This topic is now archived and is closed to further replies.
