DISK_DSBL -- Doesn't sound good

October 25, 2009

Red always equals bad, so I am trying to figure out what has happened.

If I go to the Main screen in unMENU, next to one of my drives it says:

DISK_DSBL

I can still browse to the drive, but it looks like it has fallen out of the unRAID array (it's not protected now).

I'm also thinking the "errors" at the end may be a big concern.

The drive has 1.5TB of data on it and it's only a month old.

Suggestions on what to do from here?

October 25, 2009

1. Replace the drive OR copy the data from the virtual drive to somewhere else.

2. Remove the drive from the array and run diagnostics on it... LOTS of diagnostics... then either RMA it or junk it.

October 25, 2009

Author

Is there a specific series of steps I should follow to complete this?

1. Stop the array

2. Shut machine off

3. I've just removed another 1.5TB drive from my Popcorn Hour, so I can add it to the system and run pre_clear on it.

After I do this, do I just use Windows Explorer to drag and drop the contents from the failing drive over to the new drive?

Where to go after this?

Then go to http://tower:8080/array_management and select "Check and Correct Parity"?

I don't want to mess this up.

Thanks

October 25, 2009

The very first thing you should do is post a copy of your syslog. Since you have unMENU loaded, it is as simple as clicking on the link on the syslog plug-in page and then attaching the file to your next post to this thread.

Only after looking at how the drive was disabled can we know if the drive itself is at fault, or something else.

Also, the disk is disabled because a write to it failed. It could be the drive itself, or a loose cable, or a loose interface card.

Since you have unMENU installed, you can request "SMART" status reports easily through it. Post the "status report" output.
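The same two reports can also be produced from the server console. A minimal sketch, assuming root access, `smartctl` and `hdparm` on the PATH, and the unRAID flash drive mounted at /boot; `save_disk_reports` and the /boot/reports folder are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Save SMART and hdparm reports for one disk as text files that can be
# attached to a forum post.
# Assumptions: run as root, smartctl and hdparm are installed, and the
# unRAID flash drive is mounted at /boot.
save_disk_reports() {
    dev=$1                          # e.g. /dev/sdg
    dest=${2:-/boot/reports}        # hypothetical output folder
    name=$(basename "$dev")
    mkdir -p "$dest" || return 1
    # smartctl's exit status is a bit mask that can be non-zero even when
    # the report is still useful, so it is deliberately not checked here.
    smartctl -a "$dev" > "$dest/${name}_smart.txt" 2>&1
    hdparm -I "$dev" > "$dest/${name}_hdparm.txt" 2>&1
    # List the files that were written.
    ls "$dest/${name}"_*.txt
}
```

For example, `save_disk_reports /dev/sdg` would leave sdg_smart.txt and sdg_hdparm.txt in /boot/reports, ready to copy off the flash drive and attach.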

Also, since you have one disabled drive, it is being "simulated" by reading parity and all the other data drives. Until you get this resolved, do not add or remove ANY other drives or you will lose the data being simulated by parity and the other disks.

You should correct the problem as soon as possible, since if a second drive were to fail, you would lose the data on both failed drives. It is not just the data on the failed drive that is at risk, all of your data is at the same risk of a second concurrent drive failure.

See this link in the wiki:

http://lime-technology.com/wiki/index.php?title=Troubleshooting#What_do_I_do_if_I_get_a_red_ball_next_to_a_hard_disk.3F

Do not be misled by the fact that you can still read and write to the drive with a red ball indicator. You are, in fact, writing to the parity drive as if the failed drive was working. When reading, you are reading all of the remaining drives and re-constructing the data on the failed drive. If a drive has a red ball on the unRAID management page, it has been taken out of service. You will need to take corrective action, as a second concurrent disk failure will almost certainly result in lost data.
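The "simulation" described above can be illustrated with a toy model. This is a minimal sketch, assuming single parity computed as the bitwise XOR of all data disks and shrinking each disk to one byte; the function names are hypothetical and this is not unRAID's own code.

```shell
#!/bin/sh
# Toy model of single-parity protection. Simplifying assumptions:
# parity is the bitwise XOR of all data disks, and each "disk" is one byte.

# parity_of VALUE... : XOR of all arguments, i.e. what the parity disk holds.
parity_of() {
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
}

# rebuild_missing PARITY SURVIVOR... : reading a disabled disk.
# XOR-ing parity with every surviving disk cancels them out and leaves
# the missing disk's contents.
rebuild_missing() {
    parity_of "$@"
}

# update_parity OLD_PARITY OLD_VALUE NEW_VALUE : writing to a disabled disk.
# Only parity changes; the physical (disabled) disk is not touched.
update_parity() {
    echo $(( $1 ^ $2 ^ $3 ))
}
```

With three data disks holding 90, 60 and 240, `parity_of 90 60 240` gives 150. If the first disk drops out, `rebuild_missing 150 60 240` gives back 90. Writing 7 to the simulated disk, `update_parity 150 90 7` gives 203, the same as `parity_of 7 60 240`, while the physical disk still holds 90. With two disks missing, the single XOR equation has two unknowns, so neither can be recovered.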

DO NOT press the button labeled "Restore" on the unRAID interface. It does not restore a disk, but instead sets a new initial configuration based on the current assigned and WORKING disks. It immediately throws away any old parity data. If you were to press it now you would erase all knowledge of the failed disk and anything that was on it. If you replace the disk, you only need to press the "Start" button to get your data rebuilt onto it.

So, first post a syslog, before you reboot and before you power down to check the cabling.

Joe L.

October 25, 2009

No, that is NOT what to do. You do not need to copy the files and in fact you cannot add any drives to the protected array while it is in a degraded state.

You will not be allowed to check parity... You have only one possible solution.

1. Post the syslog.

2. Get SMART and hdparm reports for the failed drive.

If it has actually failed, as opposed to a bad or loose connection, then replace the drive, press "Start" after checking the checkbox under it, and have unRAID build your old contents onto it.

If we suspect a bad connection, you will stop the array, un-assign the drive, and re-start the array (it will show the drive as missing, but will still simulate its contents). Then stop the array once more, re-assign the disk, and press "Start" once more. It will re-construct the drive.

If you are certain it was a loose cable, you can use the "Trust My Parity" procedure as described in the wiki. (It is a special procedure where you do press the button I said not to, but invoke a special command after pressing it and before starting the array, so it does not invalidate your parity and throw away all your data.)

Ask questions BEFORE you do anything, they are a lot easier to answer BEFORE you do something to endanger your data.

Joe L.

October 25, 2009

Author

OK... I will stop doing anything with it right now.

1. Syslog Attached

2. Smart Status Report

SMART status Info for /dev/sdg

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model: WDC WD15EADS-00P8B0

Serial Number: WD-WMAVU0072618

Firmware Version: 01.00A01

User Capacity: 1,500,301,910,016 bytes

Device is: Not in smartctl database [for details use: -P showall]

ATA Version is: 8

ATA Standard is: Exact ATA specification draft version not indicated

Local Time is: Sun Oct 25 17:34:15 2009 GMT+5

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status: (0x84) Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (32760) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 2) minutes.

Extended self-test routine

recommended polling time: ( 255) minutes.

Conveyance self-test routine

recommended polling time: ( 5) minutes.

SCT capabilities: (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0027   200   179   021    Pre-fail  Always   -           4975
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           119
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -           669
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           57
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           44
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always   -           3403
194 Temperature_Celsius     0x0022   122   115   000    Old_age   Always   -           28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline  -           0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

3. HDParm Info

HDParm Info for /dev/sdg

/dev/sdg:

ATA device, with non-removable media

Model Number: WDC WD15EADS-00P8B0

Serial Number: WD-WMAVU0072618

Firmware Revision: 01.00A01

Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5

Standards:

Supported: 8 7 6 5

Likely used: 8

Configuration:

Logical max current

cylinders 16383 16383

heads 16 16

sectors/track 63 63

--

CHS current addressable sectors: 16514064

LBA user addressable sectors: 268435455

LBA48 user addressable sectors: 2930277168

device size with M = 1024*1024: 1430799 MBytes

device size with M = 1000*1000: 1500301 MBytes (1500 GB)

Capabilities:

LBA, IORDY(can be disabled)

Queue depth: 32

Standby timer values: spec'd by Standard, with device specific minimum

R/W multiple sector transfer: Max = 16 Current = 0

Recommended acoustic management value: 128, current value: 254

DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6

Cycle time: min=120ns recommended=120ns

PIO: pio0 pio1 pio2 pio3 pio4

Cycle time: no flow control=120ns IORDY flow control=120ns

Commands/features:

Enabled Supported:

* SMART feature set

Security Mode feature set

* Power Management feature set

* Write cache

* Look-ahead

* Host Protected Area feature set

* WRITE_BUFFER command

* READ_BUFFER command

* NOP cmd

* DOWNLOAD_MICROCODE

Power-Up In Standby feature set

* SET_FEATURES required to spinup after power up

SET_MAX security extension

Automatic Acoustic Management feature set

* 48-bit Address feature set

* Device Configuration Overlay feature set

* Mandatory FLUSH_CACHE

* FLUSH_CACHE_EXT

* SMART error logging

* SMART self-test

* General Purpose Logging feature set

* 64-bit World wide name

* WRITE_UNCORRECTABLE_EXT command

* {READ,WRITE}_DMA_EXT_GPL commands

* Segmented DOWNLOAD_MICROCODE

* SATA-I signaling speed (1.5Gb/s)

* SATA-II signaling speed (3.0Gb/s)

* Native Command Queueing (NCQ)

* Host-initiated interface power management

* Phy event counters

* unknown 76[12]

DMA Setup Auto-Activate optimization

* Software settings preservation

* SMART Command Transport (SCT) feature set

* SCT Long Sector Access (AC1)

* SCT LBA Segment Access (AC2)

* SCT Error Recovery Control (AC3)

* SCT Features Control (AC4)

* SCT Data Tables (AC5)

unknown 206[12] (vendor specific)

unknown 206[13] (vendor specific)

Security:

Master password revision code = 65534

supported

not enabled

not locked

not frozen

not expired: security count

supported: enhanced erase

334min for SECURITY ERASE UNIT. 334min for ENHANCED SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 50014ee017dfab1

NAA : 5

IEEE OUI : 14ee

Unique ID : 017dfab1

Checksum: correct

October 25, 2009

Author

My syslog was too big for one message, so here is everything before today (the first one had today's logs).

For reference, all I have done over the past few days is link my Popcorn Hour to the server and then tweak it (i.e. get all the file names right, add some .nfo files, images, etc.).

October 25, 2009

The syslog you posted is filled with repeating messages. The syslog rotation has already copied your original syslog to an alternate file.

Type

ls -l /var/log/syslog*

to see them all.

As an example, on my server it looks like this:

ls -l /var/log/syslog*

-rw-r--r-- 1 root root 17347 Oct 25 01:54 /var/log/syslog

-rw-r--r-- 1 root root 1088573 Oct 16 01:36 /var/log/syslog.1

The syslog.1 file is the first part of my syslog. When it filled, it was "rotated" out so a single file would not use up all my RAM.

You may have a few syslog files in your /var/log folder. We need to see the earlier one, where the error first occurred.

You may need to copy it to your flash drive first and then upload it.

Joe L.
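Copying the rotated logs to the flash drive, as suggested above, can be scripted. A minimal sketch, assuming the unRAID flash drive is mounted at /boot and that a .txt suffix makes the files easier to attach; `save_syslogs`, `first_ata_error` and the /boot/logs folder are hypothetical names.

```shell
#!/bin/sh
# Copy the current and all rotated syslogs to the flash drive so they can
# be attached from another computer.
# Assumptions: flash drive mounted at /boot; /boot/logs is a hypothetical folder.
save_syslogs() {
    src_dir=${1:-/var/log}
    dest_dir=${2:-/boot/logs}
    mkdir -p "$dest_dir" || return 1
    copied=0
    for f in "$src_dir"/syslog*; do
        [ -f "$f" ] || continue          # no match: the glob stays literal
        cp "$f" "$dest_dir/$(basename "$f").txt" || return 1
        copied=$(( copied + 1 ))
    done
    echo "$copied"                       # number of files copied
}

# first_ata_error FILE : print the first libata exception in FILE, with its
# line number, to show where the trouble started.
first_ata_error() {
    grep -n -m 1 'exception Emask' "$1"
}
```

Running `save_syslogs` and then `first_ata_error /var/log/syslog.1` would show whether the older, rotated file is the one that contains the first error.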

October 25, 2009

Author

If the issue isn't in the log I just posted, it must be in this one from a week ago (this is the rest of my syslog).

This is a new drive from a few weeks ago that I pre-cleared multiple times.

Here are some of the pre-clear results from when I first got it (if it helps any):

Disk Temperature: 27C, Elapsed Time: 22:33:36

===========================================================================

= unRAID server Pre-Clear disk /dev/sdb

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Post-Read in progress: 99% complete.

( 1,500,291,072,000 of 1,500,301,910,016 bytes read ) 43.7 MB/s

Disk Temperature: 27C, Elapsed Time: 22:34:47

===========================================================================

= unRAID server Pre-Clear disk /dev/sdb

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Disk Post-Clear-Read completed DONE

Disk Temperature: 26C, Elapsed Time: 22:35:56

============================================================================

==

== Disk /dev/sdb has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

63c63

< 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 49

---

> 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 51

============================================================================

root@Tower:/boot#

===========================================================================

= unRAID server Pre-Clear disk /dev/sdb

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Post-Read in progress: 99% complete.

( 1,498,646,016,000 of 1,500,301,910,016 bytes read ) 46.1 MB/s

Disk Temperature: 25C, Elapsed Time: 22:34:39

===========================================================================

= unRAID server Pre-Clear disk /dev/sdb

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Post-Read in progress: 99% complete.

( 1,500,291,072,000 of 1,500,301,910,016 bytes read ) 46.2 MB/s

Disk Temperature: 25C, Elapsed Time: 22:35:48

===========================================================================

= unRAID server Pre-Clear disk /dev/sdb

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Disk Post-Clear-Read completed DONE

Disk Temperature: 25C, Elapsed Time: 22:36:57

============================================================================

==

== Disk /dev/sdb has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

58c58

< 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0

---

> 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0

63c63

< 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 76

---

> 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 77

============================================================================

root@Tower:/boot#
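The "S.M.A.R.T. error count differences" sections above come from comparing SMART reports taken before and after the clear. A minimal sketch of a similar comparison, assuming two saved `smartctl -A` style reports whose attribute rows have RAW_VALUE as the tenth field; `smart_changes` is a hypothetical name and this is not the pre-clear script's own code.

```shell
#!/bin/sh
# smart_changes BEFORE AFTER : list SMART attributes whose VALUE, WORST or
# RAW_VALUE differ between two saved reports.
# Assumption: attribute rows start with a numeric ID and have at least ten
# whitespace-separated fields, the tenth being RAW_VALUE.
smart_changes() {
    awk '
        $1 !~ /^[0-9]+$/ || NF < 10 { next }     # skip non-attribute lines
        FILENAME == ARGV[1] { before[$1] = $4 " " $5 " " $10; next }
        ($1 in before) && before[$1] != ($4 " " $5 " " $10) {
            print $1 " " $2 ": " before[$1] " -> " $4 " " $5 " " $10
        }
    ' "$1" "$2"
}
```

Fed the two Load_Cycle_Count rows above, it would print `193 Load_Cycle_Count: 200 200 76 -> 200 200 77`. As the pre-clear output itself notes, some counters grow in normal use, so a change is not by itself a sign of trouble.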

October 25, 2009

It's in the latter log... look at Oct 24 20:57:52

I'm wondering if the power blip you had had some lingering effects. What make and size of UPS do you have?

I suggest changing your disks from AHCI back to IDE.

October 25, 2009

You read my mind... nice...

Yes, it looks like a poor connection more than a failed drive. The SMART report and hdparm look good.
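Part of judging that the SMART report "looks good" can be automated. A minimal sketch, assuming a saved `smartctl -A` report and a rule-of-thumb list of attributes (not necessarily the ones checked in this thread): reallocated, pending and offline-uncorrectable sectors point at the media, while UDMA_CRC_Error_Count usually points at the cable or connection. `smart_red_flags` is a hypothetical name.

```shell
#!/bin/sh
# Report non-zero raw values for SMART attributes commonly treated as
# early warning signs (rule-of-thumb list, an assumption):
#   5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count,
#   197 Current_Pending_Sector, 198 Offline_Uncorrectable,
#   199 UDMA_CRC_Error_Count (CRC errors usually suggest cabling).
# Reads a saved report; returns 1 if any of these counters is non-zero.
smart_red_flags() {
    awk '
        $1 ~ /^(5|196|197|198|199)$/ && NF >= 10 && $10 != "0" {
            printf "%s %s raw=%s\n", $1, $2, $10
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Run against the report posted above, it prints nothing and returns 0, which agrees with the reading that the disk itself looks healthy.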

The first errors I see are here:

Oct 24 20:57:52 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:57:52 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:57:52 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:57:52 Tower kernel: res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:57:52 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:57:52 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:57:52 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:57:52 Tower kernel: ata2: EH complete

Oct 24 20:58:02 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:02 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:02 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:02 Tower kernel: res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:02 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:02 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:02 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:02 Tower kernel: ata2: EH complete

Oct 24 20:58:12 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:12 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:12 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:12 Tower kernel: res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:12 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:12 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:12 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:12 Tower kernel: ata2: EH complete

Oct 24 20:58:22 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:22 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:22 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:22 Tower kernel: res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:22 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:22 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:22 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:22 Tower kernel: ata2: EH complete

Oct 24 20:58:31 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:31 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:31 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:31 Tower kernel: res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:31 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:31 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:31 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:31 Tower kernel: ata2: EH complete

Oct 24 20:58:41 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Oct 24 20:58:41 Tower kernel: ata2.00: irq_stat 0x40000001

Oct 24 20:58:41 Tower kernel: ata2.00: cmd 25/00:08:87:40:d4/00:00:31:00:00/e0 tag 0 dma 4096 in

Oct 24 20:58:41 Tower kernel: res 41/04:00:87:40:d4/00:00:31:00:00/e0 Emask 0x1 (device error)

Oct 24 20:58:41 Tower kernel: ata2.00: status: { DRDY ERR }

Oct 24 20:58:41 Tower kernel: ata2.00: error: { ABRT }

Oct 24 20:58:41 Tower kernel: ata2.00: configured for UDMA/133

Oct 24 20:58:41 Tower kernel: sd 1:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08

Oct 24 20:58:41 Tower kernel: sd 1:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor]

Oct 24 20:58:41 Tower kernel: Descriptor sense data with sense descriptors (in hex):

Oct 24 20:58:41 Tower kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00

Oct 24 20:58:41 Tower kernel: 31 d4 40 87

Oct 24 20:58:41 Tower kernel: sd 1:0:0:0: [sdb] ASC=0x0 ASCQ=0x0

Oct 24 20:58:41 Tower kernel: end_request: I/O error, dev sdb, sector 835993735

Oct 24 20:58:41 Tower kernel: ata2: EH complete

Oct 24 20:58:41 Tower kernel: md: disk1 read error
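Every retry in the log above carries the same taskfile. A minimal sketch of decoding it, assuming the libata error-handler layout `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`; `ata_lba` is a hypothetical helper name.

```shell
#!/bin/sh
# ata_lba TASKFILE : decode the 48-bit LBA from a libata "cmd" or "res"
# string such as 25/00:08:87:40:d4/00:00:31:00:00/e0
# For LBA48 the address is hob_lbah..hob_lbal followed by lbah..lbal.
ata_lba() {
    old_ifs=$IFS
    IFS='/:'
    set -- $1                  # split the taskfile into its 12 registers
    IFS=$old_ifs
    # $1=command $2=feature $3=nsect $4=lbal $5=lbam $6=lbah
    # $7=hob_feature $8=hob_nsect $9=hob_lbal ${10}=hob_lbam ${11}=hob_lbah
    # ${12}=device
    echo $(( (0x${11} << 40) | (0x${10} << 32) | (0x$9 << 24) \
             | (0x$6 << 16) | (0x$5 << 8) | 0x$4 ))
}
```

`ata_lba '25/00:08:87:40:d4/00:00:31:00:00/e0'` prints 835993735, the same sector as the `end_request: I/O error` line and the trailing `31 d4 40 87` of the sense data, so every retry hit one sector. Command 0x25 is READ DMA EXT, and nsect 0x08 is 8 x 512 = 4096 bytes, matching `dma 4096 in`. In the `res` string, 41 is the status (DRDY plus ERR) and 04 is the error register (ABRT), as the next two log lines spell out.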

I'd stop the array, power down, re-seat the connectors (both power AND data) especially if you are using any kind of drive trays, and then power back up.

If the disk still looks good in the hdparm and SMART reports, and ALL the other disks are still "green", you are a candidate for the "Trust My Parity" procedure. (If the hdparm and smartctl reports print, that is good; if they are unavailable, communication with the disk has failed again.)

This SHOULD NOT be used if the disk has been written to since the failure, because it assumes the physical disk is correct and the virtual (simulated) disk is not.

The procedure is described here:

http://lime-technology.com/wiki/index.php?title=Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

You must follow it exactly. Read it through and ask questions before starting it. You must issue a

mdcmd set invalidslot 99

command and get the expected response AFTER pressing the button labeled "Restore" but BEFORE pressing the "Start" button.
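The precondition above, that the hdparm and smartctl reports still print, can be checked from the console. A minimal sketch, assuming root access and both tools installed; `disk_responds` is a hypothetical name. smartctl's exit status is a bit mask, so treating any non-zero value as "no response" may be stricter than necessary.

```shell
#!/bin/sh
# disk_responds DEV : succeed only if both smartctl and hdparm can read
# identify data from DEV (e.g. /dev/sdb). A failure suggests the link to
# the disk is still broken.
disk_responds() {
    dev=$1
    if ! smartctl -i "$dev" >/dev/null 2>&1; then
        echo "smartctl: no usable response from $dev"
        return 1
    fi
    if ! hdparm -I "$dev" >/dev/null 2>&1; then
        echo "hdparm: no usable response from $dev"
        return 1
    fi
    echo "$dev answered both smartctl and hdparm"
}
```

A success only shows that the disk is talking again. It says nothing about whether the simulated disk was written to while disabled, which is the case where the procedure must not be used.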

October 25, 2009

What mobo do you have... and does it have an Intel chipset?

My money is on AHCI issues... changing the drives to IDE will eliminate that.

October 25, 2009

Author

I have an APC 550V UPS, which seems to handle things fine.

If I recall correctly, I changed IDE to AHCI because my server was detecting one of my drives as IDE rather than SATA (the port may have been for PATA).

Should I worry about this change now or later?

October 25, 2009

I don't think it has anything to do with this... (but I've been proven wrong in the past... It looks more like a cabling issue than anything else so far)

October 25, 2009

Author

I bought one of the recommended ones in the forums:

Gigabyte MA74GM-S2 with AMD

October 25, 2009

Author

This SHOULD NOT be used if the disk has been written to since the failure since it will assume the physical disk is correct and the virtual disk is not.

I didn't realize it was down until I posted here, so I suspect I have done some minor writing to it.

I have been adding .nfo files to some of the folders, but since they are being added through shares/virtual drives, I can't say exactly which drives I have written to (it can be assumed I hit the drive in question at least once).

October 25, 2009

That board has a good chipset (AMD SB700), and the smartctl output is fine, so the disk itself does not seem to be the problem. See what happens with reseated cables.

October 25, 2009

Author

I've shut down and re-seated the cables (I even replaced the SATA cable to the drive in question).

I've started it back up and it still shows as DISK_DSBL.

I've re-run those two reports; they are below.

What next?

HDParm Info for /dev/sdb WDC_WD15EADS-00P8B0_WD-WMAVU0072618

/dev/sdb:

ATA device, with non-removable media

Model Number: WDC WD15EADS-00P8B0

Serial Number: WD-WMAVU0072618

Firmware Revision: 01.00A01

Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5

Standards:

Supported: 8 7 6 5

Likely used: 8

Configuration:

Logical max current

cylinders 16383 16383

heads 16 16

sectors/track 63 63

--

CHS current addressable sectors: 16514064

LBA user addressable sectors: 268435455

LBA48 user addressable sectors: 2930277168

device size with M = 1024*1024: 1430799 MBytes

device size with M = 1000*1000: 1500301 MBytes (1500 GB)

Capabilities:

LBA, IORDY(can be disabled)

Queue depth: 32

Standby timer values: spec'd by Standard, with device specific minimum

R/W multiple sector transfer: Max = 16 Current = 0

Recommended acoustic management value: 128, current value: 254

DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6

Cycle time: min=120ns recommended=120ns

PIO: pio0 pio1 pio2 pio3 pio4

Cycle time: no flow control=120ns IORDY flow control=120ns

Commands/features:

Enabled Supported:

* SMART feature set

Security Mode feature set

* Power Management feature set

* Write cache

* Look-ahead

* Host Protected Area feature set

* WRITE_BUFFER command

* READ_BUFFER command

* NOP cmd

* DOWNLOAD_MICROCODE

Power-Up In Standby feature set

* SET_FEATURES required to spinup after power up

SET_MAX security extension

Automatic Acoustic Management feature set

* 48-bit Address feature set

* Device Configuration Overlay feature set

* Mandatory FLUSH_CACHE

* FLUSH_CACHE_EXT

* SMART error logging

* SMART self-test

* General Purpose Logging feature set

* 64-bit World wide name

* WRITE_UNCORRECTABLE_EXT command

* {READ,WRITE}_DMA_EXT_GPL commands

* Segmented DOWNLOAD_MICROCODE

* SATA-I signaling speed (1.5Gb/s)

* SATA-II signaling speed (3.0Gb/s)

* Native Command Queueing (NCQ)

* Host-initiated interface power management

* Phy event counters

* unknown 76[12]

DMA Setup Auto-Activate optimization

* Software settings preservation

* SMART Command Transport (SCT) feature set

* SCT Long Sector Access (AC1)

* SCT LBA Segment Access (AC2)

* SCT Error Recovery Control (AC3)

* SCT Features Control (AC4)

* SCT Data Tables (AC5)

unknown 206[12] (vendor specific)

unknown 206[13] (vendor specific)

Security:

Master password revision code = 65534

supported

not enabled

not locked

not frozen

not expired: security count

supported: enhanced erase

334min for SECURITY ERASE UNIT. 334min for ENHANCED SECURITY ERASE UNIT.

Logical Unit WWN Device Identifier: 50014ee017dfab1

NAA : 5

IEEE OUI : 14ee

Unique ID : 017dfab1

Checksum: correct

Statistics for /dev/sdb WDC_WD15EADS-00P8B0_WD-WMAVU0072618

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model: WDC WD15EADS-00P8B0

Serial Number: WD-WMAVU0072618

Firmware Version: 01.00A01

User Capacity: 1,500,301,910,016 bytes

Device is: Not in smartctl database [for details use: -P showall]

ATA Version is: 8

ATA Standard is: Exact ATA specification draft version not indicated

Local Time is: Sun Oct 25 18:34:57 2009 GMT+5

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 0)   The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (32760) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        ( 2)   minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        ( 5)   minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   200   179   021    Pre-fail  Always       -       4975
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       120
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       670
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       44
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3414
194 Temperature_Celsius     0x0022   123   115   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
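When reading a report like this to decide whether the drive itself is failing, the usual first check is the handful of counters that track bad sectors and cable errors. A minimal Python sketch, assuming the attribute rows are saved as plain text in the row format shown above; the function name `flag_smart_problems` and the choice of watched IDs (5, 196, 197, 198, 199) are my own assumptions, not something smartctl or unRAID defines:

```python
# Attributes that most often point to failing media (5, 196, 197, 198) or a
# cabling/controller problem (199). This selection is an assumption made for
# the sketch, not a rule defined by smartctl.
WATCHED_IDS = {5, 196, 197, 198, 199}


def flag_smart_problems(table_text):
    """Return {id: (name, raw_value)} for watched attributes with a non-zero
    raw value, parsed from smartctl's vendor-specific attribute rows."""
    problems = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have 10+ whitespace-separated columns and a numeric ID:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            problems[attr_id] = (name, int(raw))
    return problems


# Rows copied from the report above (the header row is skipped by the parser).
sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

print(flag_smart_problems(sample))  # {}
```

For this drive every watched raw value is 0 and the overall self-assessment is PASSED, so SMART alone gives no sign of a failing disk; the syslog is where the reason for the DISK_DSBL state would show up.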


October 25, 2009

Author

Syslog attached here, in case it's also needed.


October 25, 2009

I've started it back up and it still shows as DISK_DSBL.

I've re-run those two reports below.

What next?

Once a disk has failed, unRAID will not reuse it unless it "thinks" you have replaced it. To do that, you must make it "forget" the serial number of the drive previously assigned to that slot.

You now only need to:

1. Stop the array.

2. Un-assign the disk that has failed.

3. Start the array with the disk un-assigned.

4. Stop the array again.

5. Re-assign the disk that had failed.

6. Start the array (using the "Start" button) to allow the failed disk to be rebuilt onto the "replacement". Since the slot was previously "un-assigned", it will now use the same drive as its own "replacement".

When the rebuild is complete, you will have parity protection once more. The rebuild will take about the same time as a full parity calculation. Until it is done, you are at risk of losing data if another drive were to "really" fail at the same time.

Joe L.

Edit: fixed my cut-and-replace...
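The explanation above boils down to one rule: a disabled slot is only rebuilt onto a disk whose serial number it does not remember. As a purely illustrative sketch, the hypothetical `ArraySlot` class below (its name, methods and state are invented for this example and are not unRAID's actual code) encodes that rule, which shows why starting the array once with the slot un-assigned lets the same drive come back as its own "replacement". Stopping the array between steps is not modelled.

```python
from typing import Optional


class ArraySlot:
    """Hypothetical toy model of one array slot, built only from the behaviour
    described in the post above, not from unRAID's own code."""

    def __init__(self, serial: str) -> None:
        self.remembered_serial: Optional[str] = serial  # serial the slot "knows"
        self.assigned_serial: Optional[str] = serial    # disk currently assigned
        self.disabled = False                           # set when a disk is kicked out

    def unassign(self) -> None:
        self.assigned_serial = None

    def assign(self, serial: str) -> None:
        self.assigned_serial = serial

    def start_array(self) -> str:
        if self.assigned_serial is None:
            # Starting with the slot empty makes it "forget" the old serial.
            self.remembered_serial = None
            return "unassigned"
        if self.disabled and self.assigned_serial == self.remembered_serial:
            # Same disk as before: a failed disk is not reused.
            return "still disabled"
        if self.disabled:
            # Serial no longer matches, so the disk counts as a "replacement"
            # and is rebuilt from parity.
            self.remembered_serial = self.assigned_serial
            self.disabled = False
            return "rebuild"
        return "normal"


slot = ArraySlot("WD-WMAVU0072618")
slot.disabled = True
print(slot.start_array())  # still disabled
slot.unassign()
print(slot.start_array())  # unassigned
slot.assign("WD-WMAVU0072618")
print(slot.start_array())  # rebuild
```

Simply restarting with the same disk assigned leaves the slot disabled; only the start with the slot empty clears the remembered serial, after which re-assigning the same drive triggers the rebuild.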


October 25, 2009

Author

You now only need to:

1. Stop the array.

2. Un-assign the disk that has failed.

3. Start the array with the disk un-assigned.

4. Stop the array again.

5. Re-assign the disk that had failed.

--- Done to Here ----

6. Start the array (using the "Start" button) to allow the failed disk to be rebuilt onto the "replacement". Since the slot was previously "un-assigned", it will now use the same drive as its own "replacement".


Start is "grey" right now unless I select:

Start will bring the array on-line, start Data-Rebuild, and then expand the file system (if possible).

I'm sure I want to do this

I'm sure... right? (You are playing the role of me here.)


October 25, 2009

Author

I didn't transfer any new DVDs over to the array today; I just spent all day doing maintenance to get my Popcorn Hour reading the folders properly to generate its index.html file.

If it's only new stuff from today that is lost, then I can recover from it.


October 26, 2009


Yes, I think you are "sure" you want to do the "Data-Rebuild". (You might need to check the checkbox under "Start" to enable it.) You must use "Start" to rebuild the disk.

Joe L.


October 26, 2009

Author

Done.

While you are being me for the night, do you think I should break up with my girlfriend or let things go a little longer to see if a future does exist for us?


October 26, 2009


If she looks like the girl in your avatar pic, then I think you should break up with her and send her my way.
