DISK_DSBL -- Doesn't sound good

Red always equals bad, so I am trying to figure out what's happened.

 

If I go to the Main screen in unMENU, beside one of my drives it says:

DISK_DSBL

 

I can still browse to the drive, but it looks like it has fallen out of the unRAID array (it's not protected now).

I'm also thinking the "errors" at the end may also be a big concern.

 

WD_error.JPG

 

The drive has 1.5TB of data on it and it's only a month old.

 

Suggestions on what to do from here?

1.  Replace the drive OR copy the data from the virtual drive to somewhere else.

 

2. Remove drive from array, and do diagnostics on it.... LOTS of diagnostics.... and either RMA it or junk it.

  • Author

Is there a specific series of steps I should follow to complete this?

 

1.  Stop the array

2.  Shut machine off

3.  I've just removed another 1.5TB drive from my Popcorn Hour, so I can add it to the system and run pre_clear on it.

 

After I do this, do I just use Windows Explorer to drag and drop the contents from the failing drive over to the new drive?

 

Where to go after this?

http://tower:8080/array_management and then select "Check and Correct Parity"?

 

I don't want to mess this up  :)

 

Thanks

 

 

The very first thing you should do is post a copy of your syslog.  Since you have unMENU loaded, it is as simple as clicking on the link on the syslog plug-in page and then attaching the file to your next post to this thread.

 

Only after looking at how the drive was disabled can we know if the drive itself is at fault, or something else.

 

Also, the disk is disabled because a write to it failed.  It could be the drive itself, or a loose cable, or a loose interface card.  

 

Since you have unMENU installed, you can request "SMART" status reports easily through it.  Post the "status report" output.

 

Also, since you have one disabled drive, it is being "simulated" by reading parity and all the other data drives.  Until you get this resolved, do not add or remove ANY other drives or you will lose the data being simulated by parity and the other disks.

 

You should correct the problem as soon as possible, since if a second drive were to fail, you would lose the data on both failed drives.  It is not just the data on the failed drive that is at risk, all of your data is at the same risk of a second concurrent drive failure.

 

See this link in the wiki:

http://lime-technology.com/wiki/index.php?title=Troubleshooting#What_do_I_do_if_I_get_a_red_ball_next_to_a_hard_disk.3F

 

Do not be misled by the fact that you can still read and write to the drive with a red ball indicator. You are, in fact, writing to the parity drive as if the failed drive was working. When reading, you are reading all of the remaining drives and re-constructing the data on the failed drive. If a drive has a red ball on the unRAID management page, it has been taken out of service. You will need to take corrective action, as a second concurrent disk failure will almost certainly result in lost data.
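To make the "simulated" disk idea concrete, here is a minimal toy sketch of single-parity XOR reconstruction. The three tiny "disks" and their contents are made up for illustration, and this models only the arithmetic, not how unRAID's md driver is actually implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array (hypothetical contents): three data disks of one block each.
disk1 = b"movie-01"
disk2 = b"photo-02"
disk3 = b"music-03"
parity = xor_blocks([disk1, disk2, disk3])

# disk1 is disabled (red ball). A read of disk1 is answered by combining
# parity with every remaining data disk.
simulated_disk1 = xor_blocks([parity, disk2, disk3])

# A write to the disabled disk never reaches the physical drive. Only parity
# changes: new_parity = old_parity XOR old_data XOR new_data.
new_disk1 = b"movie-99"
parity = xor_blocks([parity, simulated_disk1, new_disk1])
simulated_after_write = xor_blocks([parity, disk2, disk3])

print(simulated_disk1)        # same bytes as the original disk1
print(simulated_after_write)  # same bytes as new_disk1
```

Note that every read of the disabled disk needs parity and all the other data disks. If one more disk drops out, the XOR no longer has enough inputs, which is why a second concurrent failure loses data.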

 

DO NOT press the button labeled "Restore" on the unRAID interface.  It does not restore a disk, but instead sets a new initial configuration based on the current assigned and WORKING disks.  It immediately throws away any old parity data.  If you were to press it now you would erase all knowledge of the failed disk and anything that was on it.   If you replace the disk, you only need to press the "Start" button to get your data rebuilt onto it.

 

So, first post a syslog, before you reboot, before you power down to check the cabling.

 

Joe L.

No, that is NOT what to do.  You do not need to copy the files and in fact you cannot add any drives to the protected array while it is in a degraded state.

 

You will not be allowed to check parity... You have only one possible solution.

 

1. Post the syslog.

2. Get SMART and hdparm reports for the failed drive.

If it has actually failed, as opposed to a bad or loose connection, then replace the drive, press "Start" after checking the checkbox under it, and have unRAID build your old contents onto it. 

 

If we suspect a bad connection, you will stop the array, un-assign the drive, re-start the array (It will show it as missing, but still simulate it with the contents) then stop the array once more, re-assign the disk, then press "Start" once more.  It will re-construct the drive.

 

If you are certain it was a loose cable, you can use the "Trust My Parity" procedure as described in the wiki. (It is a special procedure where you do press the button I said not to, but invoke a special command after pressing it but before starting the array so it does not invalidate your parity and throw away all your data)

 

Ask questions BEFORE you do anything, they are a lot easier to answer BEFORE you do something to endanger your data. 

 

Joe L.

  • Author

OK... I will stop doing anything with it right now.

 

1.  Syslog Attached

 

2.  Smart Status Report

 

SMART status Info for /dev/sdg

 

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:    WDC WD15EADS-00P8B0

Serial Number:    WD-WMAVU0072618

Firmware Version: 01.00A01

User Capacity:    1,500,301,910,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Sun Oct 25 17:34:15 2009 GMT+5

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (32760) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  5) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 200   179   021    Pre-fail Always  -           4975
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           119
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 100   253   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           669
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           57
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           44
193 Load_Cycle_Count        0x0032 199   199   000    Old_age  Always  -           3403
194 Temperature_Celsius     0x0022 122   115   000    Old_age  Always  -           28
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

3.  HDParm Info

 

HDParm Info for /dev/sdg

 

/dev/sdg:

 

ATA device, with non-removable media

Model Number:      WDC WD15EADS-00P8B0                   

Serial Number:      WD-WMAVU0072618

Firmware Revision:  01.00A01

Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5

Standards:

Supported: 8 7 6 5

Likely used: 8

Configuration:

Logical max current

cylinders 16383 16383

heads 16 16

sectors/track 63 63

--

CHS current addressable sectors:  16514064

LBA    user addressable sectors:  268435455

LBA48  user addressable sectors: 2930277168

device size with M = 1024*1024:    1430799 MBytes

device size with M = 1000*1000:    1500301 MBytes (1500 GB)

Capabilities:

LBA, IORDY(can be disabled)

Queue depth: 32

Standby timer values: spec'd by Standard, with device specific minimum

R/W multiple sector transfer: Max = 16 Current = 0

Recommended acoustic management value: 128, current value: 254

DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6

    Cycle time: min=120ns recommended=120ns

PIO: pio0 pio1 pio2 pio3 pio4

    Cycle time: no flow control=120ns  IORDY flow control=120ns

Commands/features:

Enabled Supported:

  * SMART feature set

    Security Mode feature set

  * Power Management feature set

  * Write cache

  * Look-ahead

  * Host Protected Area feature set

  * WRITE_BUFFER command

  * READ_BUFFER command

  * NOP cmd

  * DOWNLOAD_MICROCODE

    Power-Up In Standby feature set

  * SET_FEATURES required to spinup after power up

    SET_MAX security extension

    Automatic Acoustic Management feature set

  * 48-bit Address feature set

  * Device Configuration Overlay feature set

  * Mandatory FLUSH_CACHE

  * FLUSH_CACHE_EXT

  * SMART error logging

  * SMART self-test

  * General Purpose Logging feature set

  * 64-bit World wide name

  * WRITE_UNCORRECTABLE_EXT command

  * {READ,WRITE}_DMA_EXT_GPL commands

  * Segmented DOWNLOAD_MICROCODE

  * SATA-I signaling speed (1.5Gb/s)

  * SATA-II signaling speed (3.0Gb/s)

  * Native Command Queueing (NCQ)

  * Host-initiated interface power management

  * Phy event counters

  * unknown 76[12]

    DMA Setup Auto-Activate optimization

  * Software settings preservation

  * SMART Command Transport (SCT) feature set

  * SCT Long Sector Access (AC1)

  * SCT LBA Segment Access (AC2)

  * SCT Error Recovery Control (AC3)

  * SCT Features Control (AC4)

  * SCT Data Tables (AC5)

    unknown 206[12] (vendor specific)

    unknown 206[13] (vendor specific)

Security:

Master password revision code = 65534

supported

not enabled

not locked

not frozen

not expired: security count

supported: enhanced erase

334min for SECURITY ERASE UNIT. 334min for ENHANCED SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 50014ee017dfab1

NAA : 5

IEEE OUI : 14ee

Unique ID : 017dfab1

Checksum: correct

 

  • Author

My syslog was too big for one message, so here is everything before today ( the first one had today's logs )

 

 

For reference, all I have done over the past few days is link my Popcorn Hour to the server and then tweak it (i.e., get all the file names right, add some .nfo files, images, etc.).

The syslog you posted is filled with repeating messages.  The syslog rotation has already copied your original syslog to an alternate file.

 

Type

ls -l /var/log/syslog*

to see them all.

 

As an example, on my server it looks like this:

ls -l /var/log/syslog*

-rw-r--r-- 1 root root   17347 Oct 25 01:54 /var/log/syslog

-rw-r--r-- 1 root root 1088573 Oct 16 01:36 /var/log/syslog.1

 

The syslog.1 file is the first part of my syslog.  When it filled, it was "rotated" out so a single file would not use up all my RAM.

 

You may have a few syslog files in your /var/log folder.  We need to see the earlier one, where the error first occurred.

 

You may need to copy it to your flash drive first and then upload it.
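If you would rather not page through the rotated files by hand, something like the following sketch can walk them oldest-first and report the first line that looks like a disk error. The search pattern, the function names, and the assumption that rotated files are plain text named syslog.N (higher N being older) are mine, so adjust them to what `ls -l /var/log/syslog*` actually shows on your server.

```python
import glob
import os
import re

def rotation_age(path):
    """Sort key: a higher numeric suffix means an older file; bare 'syslog' is newest."""
    match = re.search(r"\.(\d+)$", path)
    return int(match.group(1)) if match else 0

def first_error(log_dir="/var/log",
                pattern=r"exception Emask|I/O error|read error"):
    """Return (path, line) for the earliest matching line, or None if nothing matches."""
    regex = re.compile(pattern)
    # Oldest file first, so the first hit is where the trouble started.
    paths = sorted(glob.glob(os.path.join(log_dir, "syslog*")),
                   key=rotation_age, reverse=True)
    for path in paths:
        with open(path, errors="replace") as handle:
            for line in handle:
                if regex.search(line):
                    return path, line.rstrip("\n")
    return None

if __name__ == "__main__":
    print(first_error())
```

Whichever file it names is the one worth copying to the flash drive and attaching.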

 

Joe L.

 

  • Author

If the issue isn't in the log I just posted, it has to be in this one (as this is the balance of my syslog) from a week ago.

 

This is a new drive from a few weeks ago that I pre-cleared multiple times.

 

Here are some of the pre-clear results from when I first got it ( if it helps any )

 

 

Disk Temperature: 27C, Elapsed Time:  22:33:36

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdb

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 99% complete.

(  1,500,291,072,000  of  1,500,301,910,016  bytes read ) 43.7 MB/s

Disk Temperature: 27C, Elapsed Time:  22:34:47

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdb

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 26C, Elapsed Time:  22:35:56

============================================================================

==

== Disk /dev/sdb has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

63c63

< 193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      49

---

> 193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      51

============================================================================

root@Tower:/boot#

 

 

 

 

 

 

 

 

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdb

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 99% complete.

(  1,498,646,016,000  of  1,500,301,910,016  bytes read ) 46.1 MB/s

Disk Temperature: 25C, Elapsed Time:  22:34:39

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdb

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Post-Read in progress: 99% complete.

(  1,500,291,072,000  of  1,500,301,910,016  bytes read ) 46.2 MB/s

Disk Temperature: 25C, Elapsed Time:  22:35:48

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdb

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Disk Temperature: 25C, Elapsed Time:  22:36:57

============================================================================

==

== Disk /dev/sdb has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

58c58

<  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0

---

>  7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0

63c63

< 193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      76

---

> 193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      77

============================================================================

root@Tower:/boot#

 

 

It's in the latter log... look at Oct 24 20:57:52

 

I'm wondering if the power blip you had had some lingering effects.  What make and size of UPS do you have?

 

I suggest changing your disks from AHCI back to IDE.

You read my mind...   nice...

 

Yes, it looks like a poor connection more than a failed drive.  The SMART report and hdparm look good.
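For anyone reading along, here is a rough sketch of scanning a smartctl attribute table for the counters people tend to look at first. The choice of attributes and the "nonzero raw value is suspicious" rule are my assumptions, not something stated in this thread; the sample lines are copied from the report posted earlier.

```python
# Illustrative selection of attributes to watch (an assumption, not an official list).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

# Four rows taken from the SMART report in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
"""

def parse_attributes(report):
    """Map attribute name to integer raw value for rows shaped like the table above."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x") and fields[9].isdigit()):
            values[fields[1]] = int(fields[9])
    return values

def suspicious(report):
    """Return the watched attributes whose raw value is above zero."""
    raw = parse_attributes(report)
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) > 0}

print(suspicious(SAMPLE))  # prints {} for the rows above
```

An empty result only says these particular counters are clean. It does not rule out a cabling or controller problem, which is why the syslog still matters.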

 

The first errors I see are here:

Oct 24 20:57:52 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:57:52 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:57:52 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:57:52 Tower kernel:          res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:57:52 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:57:52 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:57:52 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:57:52 Tower kernel: ata2: EH complete

Oct 24 20:58:02 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:02 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:02 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:02 Tower kernel:          res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:02 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:02 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:02 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:02 Tower kernel: ata2: EH complete

Oct 24 20:58:12 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:12 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:12 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:12 Tower kernel:          res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:12 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:12 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:12 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:12 Tower kernel: ata2: EH complete

Oct 24 20:58:22 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:22 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:22 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:22 Tower kernel:          res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:22 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:22 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:22 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:22 Tower kernel: ata2: EH complete

Oct 24 20:58:31 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:31 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:31 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:31 Tower kernel:          res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:31 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:31 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:31 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:31 Tower kernel: ata2: EH complete

Oct 24 20:58:41 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:41 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:41 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:41 Tower kernel:          res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:41 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:41 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:41 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:41 Tower kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08

Oct 24 20:58:41 Tower kernel: sd 1:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor]

Oct 24 20:58:41 Tower kernel: Descriptor sense data with sense descriptors (in hex):

Oct 24 20:58:41 Tower kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Oct 24 20:58:41 Tower kernel:         31 d4 40 87

Oct 24 20:58:41 Tower kernel: sd 1:0:0:0: [sdb] ASC=0x0 ASCQ=0x0

Oct 24 20:58:41 Tower kernel: end_request: I/O error, dev sdb, sector 835993735

Oct 24 20:58:41 Tower kernel: ata2: EH complete

Oct 24 20:58:41 Tower kernel: md: disk1 read error
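As a cross-check on the log above, the hex string on each "cmd" line can be decoded to show that every retry targets the same sector that end_request finally reports. This is a small sketch assuming the usual libata field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function name is made up for illustration.

```python
def decode_taskfile(taskfile):
    """Decode command, sector count and 48-bit LBA from a libata 'cmd' dump.

    Assumed layout:
    CMD/FEATURE:NSECT:LBAL:LBAM:LBAH/HOB_FEATURE:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
    """
    command, current, hob, _device = taskfile.split("/")
    cur = [int(byte, 16) for byte in current.split(":")]
    high = [int(byte, 16) for byte in hob.split(":")]
    sector_count = (high[1] << 8) | cur[1]
    # LBA bytes from least to most significant: LBAL, LBAM, LBAH, then the HOB copies.
    lba_bytes = cur[2:5] + high[2:5]
    lba = sum(byte << (8 * index) for index, byte in enumerate(lba_bytes))
    return int(command, 16), sector_count, lba

command, count, lba = decode_taskfile("25/00:08:87:40:d4/00:00:31:00:00/e0")
print(hex(command), count, lba)  # 0x25 8 835993735
```

Command 0x25 is READ DMA EXT, 8 sectors of 512 bytes is the "dma 4096 in" on the same line, and the decoded LBA equals the "sector 835993735" in the end_request message. So the drive aborted the same read six times before md gave up and logged the disk1 read error.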

 

I'd stop the array, power down, re-seat the connectors (both power AND data) especially if you are using any kind of drive trays, and then power back up.

 

If the disk still looks good via the hdparm and SMART reports and ALL the other disks are still "green", you are a candidate for the "Trust My Parity" procedure. (If the hdparm and smartctl reports print, that is good; if they are unavailable, communication with the disk has failed again.)

This SHOULD NOT be used if the disk has been written to since the failure since it will assume the physical disk is correct and the virtual disk is not. 

 

The procedure is described here:

http://lime-technology.com/wiki/index.php?title=Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

You must follow it exactly.   Read it through and ask questions before starting it.  You must issue a

mdcmd set invalidslot 99

command and get the expected response BEFORE pressing the "Start" button, but AFTER pressing the button labeled "Restore".
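To see why writes made after the failure matter here, consider this toy XOR model. It is only a sketch under the assumption of simple single parity with made-up block contents; it does not model the wiki procedure itself. Once a write has gone to the simulated disk, parity describes the new data while the physical disk still holds the old data, so declaring both "good" leaves the array inconsistent.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents at the moment disk1 was disabled.
physical_disk1 = b"old-data"
disk2 = b"photo-02"
disk3 = b"music-03"
parity = xor_blocks([physical_disk1, disk2, disk3])

# While disk1 is disabled, a write to it only updates parity.
written = b"new-data"
parity = xor_blocks([parity, physical_disk1, written])

# Rebuild path: disk1 is reconstructed from parity, so it gets the new data.
rebuilt_disk1 = xor_blocks([parity, disk2, disk3])

# Trust path: the stale physical disk1 is accepted next to this parity.
# If disk2 failed later, its reconstruction would mix in the stale block.
recovered_disk2 = xor_blocks([parity, physical_disk1, disk3])

print(rebuilt_disk1 == written)  # True
print(recovered_disk2 == disk2)  # False
```

In this model the rebuild keeps the post-failure write, while the trust path silently loses it and would also corrupt a later reconstruction of another disk.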

 

 

 

What mobo do you have... and does it have an Intel chipset?

 

My money is on AHCI issues... changing the drives to IDE will eliminate that.

  • Author

I have an APC 550V UPS, which seems to handle things fine.

 

If I recall correctly, I changed IDE to AHCI because my server was picking up one of my drives as IDE and not SATA (the port may have been for PATA).

 

Should I worry about this change now or later?


I don't think it has anything to do with this (but I've been proven wrong in the past). It looks more like a cabling issue than anything else so far.
  • Author


I bought one of the recommended ones in the forums:

Gigabyte MA74GM-S2 with AMD

  • Author

This SHOULD NOT be used if the disk has been written to since the failure since it will assume the physical disk is correct and the virtual disk is not. 

 

I didn't realize it was down until I posted here, so I suspect I have done some minor writing to it.

 

I have been adding .nfo files to some of the folders. However, as they are added through shares/virtual drives, I can't say exactly which drives I have written to (but it can be assumed that I hit the drive in question at least once).

That's got a good chipset (AMD SB700) and the Smartctl output is fine so the disk itself does not seem to be the problem, so see what happens with reseated cables.

  • Author

I've shut down and re-seated the cables (I even replaced the SATA cable to the drive in question).

 

I've started it back up and it still shows as DISK_DSBL

 

I've re-run those two reports below.

 

What next?

 

 

HDParm Info for /dev/sdb WDC_WD15EADS-00P8B0_WD-WMAVU0072618

 

/dev/sdb:

 

ATA device, with non-removable media

Model Number:      WDC WD15EADS-00P8B0                   

Serial Number:      WD-WMAVU0072618

Firmware Revision:  01.00A01

Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5

Standards:

Supported: 8 7 6 5

Likely used: 8

Configuration:

Logical max current

cylinders 16383 16383

heads 16 16

sectors/track 63 63

--

CHS current addressable sectors:  16514064

LBA    user addressable sectors:  268435455

LBA48  user addressable sectors: 2930277168

device size with M = 1024*1024:    1430799 MBytes

device size with M = 1000*1000:    1500301 MBytes (1500 GB)

Capabilities:

LBA, IORDY(can be disabled)

Queue depth: 32

Standby timer values: spec'd by Standard, with device specific minimum

R/W multiple sector transfer: Max = 16 Current = 0

Recommended acoustic management value: 128, current value: 254

DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6

    Cycle time: min=120ns recommended=120ns

PIO: pio0 pio1 pio2 pio3 pio4

    Cycle time: no flow control=120ns  IORDY flow control=120ns

Commands/features:

Enabled Supported:

  * SMART feature set

    Security Mode feature set

  * Power Management feature set

  * Write cache

  * Look-ahead

  * Host Protected Area feature set

  * WRITE_BUFFER command

  * READ_BUFFER command

  * NOP cmd

  * DOWNLOAD_MICROCODE

    Power-Up In Standby feature set

  * SET_FEATURES required to spinup after power up

    SET_MAX security extension

    Automatic Acoustic Management feature set

  * 48-bit Address feature set

  * Device Configuration Overlay feature set

  * Mandatory FLUSH_CACHE

  * FLUSH_CACHE_EXT

  * SMART error logging

  * SMART self-test

  * General Purpose Logging feature set

  * 64-bit World wide name

  * WRITE_UNCORRECTABLE_EXT command

  * {READ,WRITE}_DMA_EXT_GPL commands

  * Segmented DOWNLOAD_MICROCODE

  * SATA-I signaling speed (1.5Gb/s)

  * SATA-II signaling speed (3.0Gb/s)

  * Native Command Queueing (NCQ)

  * Host-initiated interface power management

  * Phy event counters

  * unknown 76[12]

    DMA Setup Auto-Activate optimization

  * Software settings preservation

  * SMART Command Transport (SCT) feature set

  * SCT Long Sector Access (AC1)

  * SCT LBA Segment Access (AC2)

  * SCT Error Recovery Control (AC3)

  * SCT Features Control (AC4)

  * SCT Data Tables (AC5)

    unknown 206[12] (vendor specific)

    unknown 206[13] (vendor specific)

Security:

Master password revision code = 65534

supported

not enabled

not locked

not frozen

not expired: security count

supported: enhanced erase

334min for SECURITY ERASE UNIT. 334min for ENHANCED SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 50014ee017dfab1

NAA : 5

IEEE OUI : 14ee

Unique ID : 017dfab1

Checksum: correct
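
As a quick sanity check on the hdparm report above, the sector count and the reported sizes can be cross-checked with a few lines of arithmetic. This is only a sketch: the 512-byte logical sector size is an assumption (hdparm does not print it in this report), and the variable names are made up for illustration.

```python
# Cross-check the sizes hdparm reports for /dev/sdb.
# Assumption: 512-byte logical sectors (not stated in the report itself).
SECTOR_BYTES = 512

# "LBA48  user addressable sectors" from the hdparm output
lba48_sectors = 2930277168

size_bytes = lba48_sectors * SECTOR_BYTES
size_mib = size_bytes // (1024 * 1024)   # "device size with M = 1024*1024"
size_mb = size_bytes // (1000 * 1000)    # "device size with M = 1000*1000"

print(size_bytes)  # 1500301910016
print(size_mib)    # 1430799
print(size_mb)     # 1500301
```

Under that assumption, the byte count matches the "User Capacity" that smartctl prints for the same drive, so the two reports agree on the drive's size.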

 

 

 

Statistics for /dev/sdb WDC_WD15EADS-00P8B0_WD-WMAVU0072618

 

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Device Model:    WDC WD15EADS-00P8B0

Serial Number:    WD-WMAVU0072618

Firmware Version: 01.00A01

User Capacity:    1,500,301,910,016 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:  8

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Sun Oct 25 18:34:57 2009 GMT+5

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (32760) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: (  2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: (  5) minutes.

SCT capabilities:       (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

 

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0

  3 Spin_Up_Time            0x0027  200  179  021    Pre-fail  Always      -      4975

  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      120

  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0

  7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0

  9 Power_On_Hours          0x0032  100  100  000    Old_age  Always      -      670

10 Spin_Retry_Count        0x0032  100  100  000    Old_age  Always      -      0

11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0

12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      58

192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      44

193 Load_Cycle_Count        0x0032  199  199  000    Old_age  Always      -      3414

194 Temperature_Celsius    0x0022  123  115  000    Old_age  Always      -      27

196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0

197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0

198 Offline_Uncorrectable  0x0030  200  200  000    Old_age  Offline      -      0

199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0

200 Multi_Zone_Error_Rate  0x0008  200  200  000    Old_age  Offline      -      0

 

SMART Error Log Version: 1

No Errors Logged

 

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

 

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
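
For anyone reading along, here is a rough sketch of how the attribute table in a smartctl report like the one above could be checked by script. It is not part of unRAID or smartmontools. The function names and the list of "watched" attributes are my own choices for illustration, and it assumes every row has the ten whitespace-separated columns shown above with a plain integer in RAW_VALUE.

```python
# Attributes whose raw value should normally stay at 0 on a healthy disk.
# This list is an illustrative choice, not an official one.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)

def parse_attributes(report):
    """Return {name: {"value", "thresh", "raw"}} from smartctl attribute rows."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry a 0x.... flag column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = {
                "value": int(fields[3]),
                "thresh": int(fields[5]),
                "raw": int(fields[9]),
            }
    return attrs

def suspicious(attrs):
    """List attributes at or below threshold, or watched ones with a non-zero raw value."""
    flagged = []
    for name, a in attrs.items():
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            flagged.append(name)
        elif name in WATCHED and a["raw"] != 0:
            flagged.append(name)
    return flagged

# A few rows copied from the report above.
SAMPLE = """
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  9 Power_On_Hours          0x0032  100  100  000    Old_age  Always      -      670
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
"""

print(suspicious(parse_attributes(SAMPLE)))  # []
```

An empty list here only means none of these particular checks tripped. It does not prove the drive is healthy.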

 

 

 

  • Author

Syslog attached here, in case it's also needed.


Once a disk has been marked as failed, the server will not reuse it unless it "thinks" you have replaced it. To do that, you must make it "forget" the serial number of the previously assigned drive.

 

You now only need to:

1.  Stop the array.

2.  Un-assign the disk that has failed.

3.  Start the array with the disk un-assigned.

4.  Stop the array again.

5.  Re-assign the disk that had failed.

6.  Start the array (using the "Start" button) to allow the failed disk to be rebuilt onto the "replacement". Since the slot was previously un-assigned, it will now use the same drive as its own "replacement".

When the rebuild is complete, you will have parity protection once more. The rebuild will take about the same time as a full parity calculation. Until it is done, you are at risk of losing data if another drive were to "really" fail at the same time.

 

Joe L.

 

Edit: fixed my cut-and-replace...
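
To illustrate why a rebuild onto the "replacement" is possible at all, here is a toy sketch of single-parity reconstruction. It assumes plain XOR parity across the data disks, which is how a single parity drive is usually described. The three-byte "disks" and the function name are invented for the example, and a real rebuild works over the whole disk, not three bytes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three tiny, made-up "data disks".
disk1 = bytes([0x12, 0x34, 0x56])
disk2 = bytes([0xAB, 0xCD, 0xEF])
disk3 = bytes([0x00, 0xFF, 0x10])

# The parity disk holds the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# If disk2 is disabled, its contents are the XOR of everything that is left.
rebuilt_disk2 = xor_blocks([disk1, disk3, parity])

print(rebuilt_disk2 == disk2)  # True
```

This is also why a second failure during the rebuild is dangerous: with two disks missing, the XOR no longer has enough information to recover either one.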

  • Author

 

You now only need to

Stop the array

Un-assign the disk that has failed

Start the array with the disk un-assigned

Stop the array again

Re-assign the disk that had failed

 

--- Done to Here ----

 

Start the array (using the "Start" button), to allow the failed disk to be rebuilt onto the "replacement"  since the slot was previously "un-assigned" it will now use the same drive as its own "replacement"

when the rebuild is complete, you will have parity protection once more.  The "rebuild" will take about the same time as a full parity calc.  Until it is done, you are at risk of losing data if concurrently another drive were to 'really" fail.

 

The Start button is greyed out right now unless I tick this checkbox:

Start will bring the array on-line, start Data-Rebuild, and then expand the file system (if possible).

I'm sure I want to do this

 

I'm sure... right? (You are playing the role of me here.)  :)

 

Joe L.

 

Edit: fixed my cut-and-replace...

 

  • Author

I didn't transfer any new DVDs over to the array today. I just spent all day doing maintenance to get my Popcorn Hour reading the folders properly so it can generate its index.html file.

 

If it's only new stuff done today that is lost, then I can recover from it.

 


Yes, I think you are "sure" you want to do the "Data-Rebuild"  (You might need to check the checkbox under "Start" to enable it.)  You must use "Start" to rebuild the disk.

 

Joe L.

  • Author

Done.

 

While you are being me for the night, do you think I should break up with my girlfriend or let things go a little longer to see if a future does exist for us?

 

:)


If she looks like the girl in your avatar pic then I think you should break up with her and send her my way ;)

Archived

This topic is now archived and is closed to further replies.
