Drive shows disabled, Help?

July 2, 2010

I woke up this morning and, checking my email, found a report that one of the disks is disabled (it happens to be my parity drive). Below is the automated report, and I have attached the syslog. I don't want to reboot the system yet, so what are my options?

Subject:unRaid Failure Notification - One or more disks are disabled or invalid.

Status update for unRAID NAS

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Status:

ERROR: The unRaid array needs attention. One or more disks are disabled or invalid.

Disk 0=DISK_DSBL

Server Name: NAS

Server IP: 192.168.200.10

Date: Fri Jul 2 09:47:01 GMT+7 2010

Output of /proc/mdcmd:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

cmdOper=status

cmdResult=ok

sbName=/boot/config/super.dat

sbVersion=0.95.4

sbCreated=1271227392

sbUpdated=1278054675

sbEvents=68

sbState=0

sbNumDisks=11

sbSynced=1274810422

sbSyncErrs=0

mdVersion=0.95.4

mdState=STARTED

mdNumProtected=11

mdNumDisabled=1

mdDisabledDisk=0

mdNumInvalid=1

mdInvalidDisk=0

mdNumMissing=0

mdMissingDisk=0

mdNumNew=0

mdResync=0

diskNumber.0=0

diskName.0=

diskSize.0=0

diskState.0=4

diskModel.0=WDC WD20EADS-00S

diskSerial.0=WD-WCAVY1446339

diskId.0=WDC_WD20EADS-00S_WD-WCAVY1446339

rdevNumber.0=0

rdevStatus.0=DISK_DSBL

rdevName.0=sdl

rdevSize.0=1953514552

rdevModel.0=WDC WD20EADS-00S

rdevSerial.0=WD-WCAVY1446339

rdevId.0=WDC_WD20EADS-00S_WD-WCAVY1446339

rdevNumErrors.0=1670

rdevLastIO.0=0

rdevSpinupGroup.0=0

diskNumber.1=1

diskName.1=md1

diskSize.1=1953514552

diskState.1=7

diskModel.1=WDC WD20EARS-00S

diskSerial.1=WD-WCAVY1867806

diskId.1=WDC_WD20EARS-00S_WD-WCAVY1867806

rdevNumber.1=1

rdevStatus.1=DISK_OK

rdevName.1=sdc

rdevSize.1=1953514552

rdevModel.1=WDC WD20EARS-00S

rdevSerial.1=WD-WCAVY1867806

rdevId.1=WDC_WD20EARS-00S_WD-WCAVY1867806

rdevNumErrors.1=0

rdevLastIO.1=0

rdevSpinupGroup.1=360

diskNumber.2=2

diskName.2=md2

diskSize.2=1953514552

diskState.2=7

diskModel.2=WDC WD20EARS-00S

diskSerial.2=WD-WCAVY2078305

diskId.2=WDC_WD20EARS-00S_WD-WCAVY2078305

rdevNumber.2=2

rdevStatus.2=DISK_OK

rdevName.2=sdg

rdevSize.2=1953514552

rdevModel.2=WDC WD20EARS-00S

rdevSerial.2=WD-WCAVY2078305

rdevId.2=WDC_WD20EARS-00S_WD-WCAVY2078305

rdevNumErrors.2=0

rdevLastIO.2=0

rdevSpinupGroup.2=1680

diskNumber.3=3

diskName.3=md3

diskSize.3=1953514552

diskState.3=7

diskModel.3=WDC WD20EADS-00S

diskSerial.3=WD-WCAVY1328762

diskId.3=WDC_WD20EADS-00S_WD-WCAVY1328762

rdevNumber.3=3

rdevStatus.3=DISK_OK

rdevName.3=sdf

rdevSize.3=1953514552

rdevModel.3=WDC WD20EADS-00S

rdevSerial.3=WD-WCAVY1328762

rdevId.3=WDC_WD20EADS-00S_WD-WCAVY1328762

rdevNumErrors.3=0

rdevLastIO.3=0

rdevSpinupGroup.3=354

diskNumber.4=4

diskName.4=md4

diskSize.4=1953514552

diskState.4=7

diskModel.4=WDC WD20EARS-00S

diskSerial.4=WD-WCAVY2755687

diskId.4=WDC_WD20EARS-00S_WD-WCAVY2755687

rdevNumber.4=4

rdevStatus.4=DISK_OK

rdevName.4=sdj

rdevSize.4=1953514552

rdevModel.4=WDC WD20EARS-00S

rdevSerial.4=WD-WCAVY2755687

rdevId.4=WDC_WD20EARS-00S_WD-WCAVY2755687

rdevNumErrors.4=0

rdevLastIO.4=0

rdevSpinupGroup.4=1668

diskNumber.5=5

diskName.5=md5

diskSize.5=1953514552

diskState.5=7

diskModel.5=WDC WD20EARS-00S

diskSerial.5=WD-WCAVY2900727

diskId.5=WDC_WD20EARS-00S_WD-WCAVY2900727

rdevNumber.5=5

rdevStatus.5=DISK_OK

rdevName.5=sdd

rdevSize.5=1953514552

rdevModel.5=WDC WD20EARS-00S

rdevSerial.5=WD-WCAVY2900727

rdevId.5=WDC_WD20EARS-00S_WD-WCAVY2900727

rdevNumErrors.5=0

rdevLastIO.5=0

rdevSpinupGroup.5=330

diskNumber.6=6

diskName.6=md6

diskSize.6=1953514552

diskState.6=7

diskModel.6=WDC WD20EARS-00S

diskSerial.6=WD-WCAVY2592832

diskId.6=WDC_WD20EARS-00S_WD-WCAVY2592832

rdevNumber.6=6

rdevStatus.6=DISK_OK

rdevName.6=sdb

rdevSize.6=1953514552

rdevModel.6=WDC WD20EARS-00S

rdevSerial.6=WD-WCAVY2592832

rdevId.6=WDC_WD20EARS-00S_WD-WCAVY2592832

rdevNumErrors.6=0

rdevLastIO.6=0

rdevSpinupGroup.6=298

diskNumber.7=7

diskName.7=md7

diskSize.7=1953514552

diskState.7=7

diskModel.7=WDC WD20EARS-00S

diskSerial.7=WD-WCAVY2461835

diskId.7=WDC_WD20EARS-00S_WD-WCAVY2461835

rdevNumber.7=7

rdevStatus.7=DISK_OK

rdevName.7=sdi

rdevSize.7=1953514552

rdevModel.7=WDC WD20EARS-00S

rdevSerial.7=WD-WCAVY2461835

rdevId.7=WDC_WD20EARS-00S_WD-WCAVY2461835

rdevNumErrors.7=0

rdevLastIO.7=0

rdevSpinupGroup.7=1556

diskNumber.8=8

diskName.8=md8

diskSize.8=1953514552

diskState.8=7

diskModel.8=WDC WD20EARS-00S

diskSerial.8=WD-WCAVY2662502

diskId.8=WDC_WD20EARS-00S_WD-WCAVY2662502

rdevNumber.8=8

rdevStatus.8=DISK_OK

rdevName.8=sde

rdevSize.8=1953514552

rdevModel.8=WDC WD20EARS-00S

rdevSerial.8=WD-WCAVY2662502

rdevId.8=WDC_WD20EARS-00S_WD-WCAVY2662502

rdevNumErrors.8=0

rdevLastIO.8=0

rdevSpinupGroup.8=106

diskNumber.9=9

diskName.9=md9

diskSize.9=1953514552

diskState.9=7

diskModel.9=WDC WD20EARS-00S

diskSerial.9=WD-WCAVY2461819

diskId.9=WDC_WD20EARS-00S_WD-WCAVY2461819

rdevNumber.9=9

rdevStatus.9=DISK_OK

rdevName.9=sdh

rdevSize.9=1953514552

rdevModel.9=WDC WD20EARS-00S

rdevSerial.9=WD-WCAVY2461819

rdevId.9=WDC_WD20EARS-00S_WD-WCAVY2461819

rdevNumErrors.9=0

rdevLastIO.9=0

rdevSpinupGroup.9=1172

diskNumber.10=10

diskName.10=md10

diskSize.10=1953514552

diskState.10=7

diskModel.10=WDC WD20EARS-00S

diskSerial.10=WD-WCAVY2461905

diskId.10=WDC_WD20EARS-00S_WD-WCAVY2461905

rdevNumber.10=10

rdevStatus.10=DISK_OK

rdevName.10=sdk

rdevSize.10=1953514552

rdevModel.10=WDC WD20EARS-00S

rdevSerial.10=WD-WCAVY2461905

rdevId.10=WDC_WD20EARS-00S_WD-WCAVY2461905

rdevNumErrors.10=0

rdevLastIO.10=0

rdevSpinupGroup.10=660
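
The report above is plain key=value text, so the affected slot can be picked out mechanically rather than by eye. Below is a minimal Python sketch, assuming the report text is available as a string (for example, read from /proc/mdcmd). The function names and the shortened SAMPLE excerpt are illustrative only and are not part of unRAID.

```python
def parse_mdcmd(text):
    """Parse key=value lines into a dict; lines without '=' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def problem_disks(fields):
    """Return (slot, device, status, error count) for every slot not DISK_OK."""
    bad = []
    for key, status in fields.items():
        if key.startswith("rdevStatus.") and status != "DISK_OK":
            slot = key.split(".", 1)[1]
            device = fields.get(f"rdevName.{slot}", "")
            errors = int(fields.get(f"rdevNumErrors.{slot}", "0"))
            bad.append((int(slot), device, status, errors))
    return sorted(bad)


# Shortened excerpt of the report above (slots 0 and 1 only).
SAMPLE = """\
mdState=STARTED
rdevStatus.0=DISK_DSBL
rdevName.0=sdl
rdevNumErrors.0=1670
rdevStatus.1=DISK_OK
rdevName.1=sdc
rdevNumErrors.1=0
"""

for slot, device, status, errors in problem_disks(parse_mdcmd(SAMPLE)):
    print(f"disk{slot} ({device}): {status}, {errors} errors")
```

In this report, slot 0 is the parity slot, which is why the notification reads "Disk 0=DISK_DSBL".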

Thanks

James

syslog-2010-07-02.zip

July 2, 2010

The log where the drive was taken out of service was rotated out already.

Instead of /var/log/syslog, it would be helpful to see

/var/log/syslog.1

and potentially

/var/log/syslog.2

You can get to either from the normal unRAID web interface by typing in your browser:

//tower/log/syslog.1

or

//tower/log/syslog.2

The current log is filled with attempts to spin down the failed drive. (And that seems like it might be a bug in unRAID, since if a drive has failed, it is not likely to spin down, and filling the log with attempts to spin it down will eventually use up all the RAM) The actual failure was in a prior log file.

Basically, once you have captured the syslog, you can:

stop the array

power down

verify the connectors (power and data) to the drive are secure

power back up

attempt to get a "smartctl" status report for the drive.

If you can get it to respond, then it might have just been a loose cable.

If you cannot, it is time to purchase a replacement drive.
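
The "smartctl" step above can also be scripted. This is a rough Python sketch, assuming smartctl is installed and on the PATH; the function names are made up for illustration. The bit meanings follow the exit-status section of the smartctl man page. The device name passed in is up to the caller, since a name such as /dev/sdl may change after a reboot.

```python
import subprocess

# Meaning of each bit in smartctl's exit status (bit 0 first).
EXIT_BITS = [
    "command line did not parse",
    "device open failed",
    "a SMART or ATA command to the disk failed",
    "SMART status check returned DISK FAILING",
    "prefail attributes at or below threshold",
    "attributes were at or below threshold in the past",
    "device error log contains records of errors",
    "self-test log contains records of errors",
]


def decode_exit_status(code):
    """Translate smartctl's exit status bitmask into readable messages."""
    return [msg for bit, msg in enumerate(EXIT_BITS) if code & (1 << bit)]


def smart_report(device):
    """Run 'smartctl -a <device>' and return (decoded flags, report text)."""
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    )
    return decode_exit_status(result.returncode), result.stdout
```

An empty list from decode_exit_status means smartctl reported no problems. "device open failed" would be consistent with a drive that is not responding at all.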

unRAID will not put the drive that failed back into service on its own automatically, since you will need to perform a few steps to let it know how you want the failure handled.

First thing to remember: the drive was taken out of service because a "write" to it failed. That means the data that was supposed to be written to it was not.

To put it back into service you'll need to first un-assign the drive, then start the array with it un-assigned, (that will make it forget the model/serial number of the drive so it will accept it as its own replacement)

Then, stop the array once more, re-assign the parity drive, and let it rebuild itself.

You could also use the procedure described in the wiki here:

http://lime-technology.com/wiki/index.php?title=Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

to have unRAID initially trust that the parity drive is OK (even though we know it must be somewhat out of date, since writes to it failed) and let a full parity check get things back into sync. (Expect that parity errors will be detected and corrected.)

Joe L.

July 2, 2010

Author

Joe L., thanks for the info. I will follow your instructions and see what happens.

I could not attach the logs because they are too big, so I put them on my work's server; you can download them from here: http://www.dpispecialtyfoods.com/syslogs.zip

thanks

James

July 2, 2010

Looks like the drive just stopped responding to commands.

Jul 2 00:10:46 NAS kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Jul 2 00:10:46 NAS kernel: ata11.00: BMDMA stat 0x25

Jul 2 00:10:46 NAS kernel: ata11.00: failed command: READ DMA EXT

Jul 2 00:10:46 NAS kernel: ata11.00: cmd 25/00:00:6f:71:95/00:04:2d:00:00/e0 tag 0 dma 524288 in

Jul 2 00:10:46 NAS kernel: res 41/04:ef:6f:71:95/04:03:2d:00:00/e0 Emask 0x1 (device error)

Jul 2 00:10:46 NAS kernel: ata11.00: status: { DRDY ERR }

Jul 2 00:10:46 NAS kernel: ata11.00: error: { ABRT }

Jul 2 00:10:46 NAS kernel: ata11.00: configured for UDMA/133

Jul 2 00:10:46 NAS kernel: ata11: EH complete

Jul 2 00:10:46 NAS kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Jul 2 00:10:46 NAS kernel: ata11.00: BMDMA stat 0x25

Jul 2 00:10:46 NAS kernel: ata11.00: failed command: READ DMA EXT

Jul 2 00:10:46 NAS kernel: ata11.00: cmd 25/00:00:6f:71:95/00:04:2d:00:00/e0 tag 0 dma 524288 in

Jul 2 00:10:46 NAS kernel: res 51/04:00:6f:71:95/04:04:2d:00:00/e0 Emask 0x1 (device error)

Jul 2 00:10:46 NAS kernel: ata11.00: status: { DRDY ERR }

Jul 2 00:10:46 NAS kernel: ata11.00: error: { ABRT }

Jul 2 00:10:46 NAS kernel: ata11.00: configured for UDMA/133

Jul 2 00:10:46 NAS kernel: ata11: EH complete

Jul 2 00:10:46 NAS kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Jul 2 00:10:46 NAS kernel: ata11.00: BMDMA stat 0x25

Jul 2 00:10:46 NAS kernel: ata11.00: failed command: READ DMA EXT

Jul 2 00:10:46 NAS kernel: ata11.00: cmd 25/00:00:6f:71:95/00:04:2d:00:00/e0 tag 0 dma 524288 in

Jul 2 00:10:46 NAS kernel: res 51/04:00:6f:71:95/04:04:2d:00:00/e0 Emask 0x1 (device error)

Jul 2 00:10:46 NAS kernel: ata11.00: status: { DRDY ERR }

Jul 2 00:10:46 NAS kernel: ata11.00: error: { ABRT }

The last message in this series is one I'd never seen before. From 2TB to 0TB in one step.

Jul 2 00:11:15 NAS kernel: md: disk0 read error

Jul 2 00:11:15 NAS kernel: handle_stripe read error: 764769616/0, count: 1

Jul 2 00:11:15 NAS kernel: md: disk0 read error

Jul 2 00:11:15 NAS kernel: handle_stripe read error: 764769624/0, count: 1

Jul 2 00:11:15 NAS kernel: sd 3:0:0:0: [sdl] Asking for cache data failed

Jul 2 00:11:15 NAS kernel: sd 3:0:0:0: [sdl] Assuming drive cache: write through

Jul 2 00:11:15 NAS kernel: md: disk0 read error

Jul 2 00:11:15 NAS kernel: handle_stripe read error: 764769632/0, count: 1

Jul 2 00:11:15 NAS kernel: sdl: detected capacity change from 2000398934016 to 0
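
A rotated syslog can be scanned for these same two signatures: md read errors per disk, and a drive whose reported capacity changes. This is only a sketch in Python; the regular expressions are written against the exact lines quoted above, and other kernel versions may format these messages differently.

```python
import re

READ_ERR_RE = re.compile(r"md: disk(\d+) read error")
CAPACITY_RE = re.compile(
    r"kernel: (\w+): detected capacity change from (\d+) to (\d+)"
)


def summarize(lines):
    """Count md read errors per disk and collect capacity-change events."""
    read_errors = {}
    capacity_changes = []
    for line in lines:
        match = READ_ERR_RE.search(line)
        if match:
            disk = int(match.group(1))
            read_errors[disk] = read_errors.get(disk, 0) + 1
            continue
        match = CAPACITY_RE.search(line)
        if match:
            device, old, new = match.groups()
            capacity_changes.append((device, int(old), int(new)))
    return read_errors, capacity_changes


# Lines copied from the excerpt above.
SAMPLE = [
    "Jul 2 00:11:15 NAS kernel: md: disk0 read error",
    "Jul 2 00:11:15 NAS kernel: handle_stripe read error: 764769616/0, count: 1",
    "Jul 2 00:11:15 NAS kernel: md: disk0 read error",
    "Jul 2 00:11:15 NAS kernel: handle_stripe read error: 764769624/0, count: 1",
    "Jul 2 00:11:15 NAS kernel: md: disk0 read error",
    "Jul 2 00:11:15 NAS kernel: sdl: detected capacity change from 2000398934016 to 0",
]

read_errors, capacity_changes = summarize(SAMPLE)
print(read_errors)
print(capacity_changes)
```

To run it against a real file, pass an open file object for /var/log/syslog.1 in place of SAMPLE.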

As I said, you can try re-seating the connectors on its cables, but other than that, purchase a new drive, install it, and then just press "Start" to rebuild the parity drive.

Then, RMA the old one and use the one you get back as a replacement for more movies.

Oh yes, be careful not to dislodge the cables on the working drives as you work in the server... don't want multiple failed drives. Make sure you power down before moving any cabling.

Joe L.

July 2, 2010

Author

Joe, thanks. I did reboot the system and it thinks that there is a new parity drive in the slot. I will pull it and not take any chances.

I do have 2 drives in the array that have not had any data written to them yet (they show up as empty). Can I just remove one of them from the array and then add it to the parity slot?

thanks

james

July 4, 2010

While it may be empty, pulling it will make the server think a second drive has failed and it will want to rebuild parity. I guess since it seems to be the parity drive that has failed, you could do this and it would build the array from scratch. Hrm, not sure that's a great path to follow...

July 4, 2010

The current log is filled with attempts to spin down the failed drive. (And that seems like it might be a bug in unRAID, since if a drive has failed, it is not likely to spin down, and filling the log with attempts to spin it down will eventually use up all the RAM)

They are two separate bugs actually:

1. BUG: Issuing spindown commands to a disk 10 seconds apart.

Such commands should be at worst spindownDelay minutes apart.

And it doesn't have to be a bad disk. I've seen this effect on an otherwise perfectly good disk which silently ignored spindown commands.

2. BUG: Allowing the possibility of a runaway syslog to completely use up all RAM and crash the server.

Such train wrecks can be so easily prevented by mounting /var/log/ into its own limited-size ramdisk.

I've actually done this on my custom bzroot build by adding this single line to /etc/fstab

none  /var/log  tmpfs  size=128m  0  0

That ramdisk doesn't set apart any RAM. It will only start using up RAM when stuff starts filling it up, up to the set limit.
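
One way to confirm such an entry is in place is to look for /var/log in a mount table. Here is a small Python sketch, assuming the text comes from /etc/fstab or /proc/mounts (both use the same whitespace-separated field order). The helper names are hypothetical.

```python
def find_mount(table_text, mount_point):
    """Return the last entry for mount_point in fstab-style text, or None."""
    found = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = {
                "device": fields[0],
                "fstype": fields[2],
                "options": fields[3].split(","),
            }
    return found


def has_size_capped_tmpfs(table_text, mount_point="/var/log"):
    """True if mount_point is a tmpfs mounted with an explicit size= limit."""
    entry = find_mount(table_text, mount_point)
    return bool(
        entry
        and entry["fstype"] == "tmpfs"
        and any(opt.startswith("size=") for opt in entry["options"])
    )


# The fstab line quoted above.
FSTAB_LINE = "none  /var/log  tmpfs  size=128m  0  0"
print(has_size_capped_tmpfs(FSTAB_LINE))
```

On a running system the same check could be applied to the contents of /proc/mounts, where the size option may be shown in a different unit.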

I've been trying to bring the above two issues to Limetech's attention for at least six months already.

If more people email Limetech about this, then maybe we can get these bugs a little higher on the to-do list.

Email: [email protected]

July 4, 2010

I do have 2 drives in the array that have not had any data written to them yet (they show up as empty). Can I just remove one of them from the array and then add it to the parity slot?

In this case, provided one of those empty disks is as large as or larger than any of your other data disks, you can move it to the Parity slot. Then power up the server and confirm the array is not Started (it will show the disk you moved as being 'missing'). Now from the console or telnet, type:

initconfig

(answer Yes to the confirmation).

Next, back on the webGui, click Refresh and all drive status will show as 'New' (blue dots). You may now click Start, and unRAID will mount your remaining data disks and start parity-sync on the disk in the Parity slot.

IMPORTANT: this case is for handling a failed/disabled Parity disk only, and you want to commission an existing Data disk to replace the failed Parity disk.
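
The size condition above (the new Parity disk must be at least as large as every data disk) can be checked against the rdevSize values from the status report. A minimal Python sketch follows; the function name is illustrative, and the sizes are the rdevSize figures from the report earlier in this thread, which appear to be in 1 KiB blocks.

```python
def can_serve_as_parity(candidate_size, data_disk_sizes):
    """True if the candidate is at least as large as every data disk."""
    return all(candidate_size >= size for size in data_disk_sizes)


# Every disk in the report shows rdevSize=1953514552. Moving one of the
# ten data disks into the Parity slot leaves nine data disks.
CANDIDATE = 1953514552
REMAINING_DATA_DISKS = [1953514552] * 9

print(can_serve_as_parity(CANDIDATE, REMAINING_DATA_DISKS))
```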

July 4, 2010

They are two separate bugs actually:

1. BUG: Issuing spindown commands to a disk 10 seconds apart.

Such commands should be at worst spindownDelay minutes apart.

And it doesn't have to be a bad disk. I've seen this effect on an otherwise perfectly good disk which silently ignored spindown commands.

Not a BUG. If your disk does not support spindown, then set spindown delay for that disk to 'never'.

2. BUG: Allowing the possibility of a runaway syslog to completely use up all RAM and crash the server.

Such train wrecks can be so easily prevented by mounting /var/log/ into its own limited-size ramdisk.

I've actually done this on my custom bzroot build by adding this single line to /etc/fstab
none  /var/log  tmpfs  size=128m  0  0
That ramdisk doesn't set apart any RAM. It will only start using up RAM when stuff starts filling it up, up to the set limit.

That's a really good idea & I will incorporate in the next release.

I've been trying to bring the above two issues to Limetech's attention for at least six months already.

If more people email Limetech about this, then maybe we can get these bugs a little higher on the to-do list.

To my knowledge, I have never received an email from you. If this is incorrect, please enlighten me.

July 4, 2010

I've been trying to bring the above two issues to Limetech's attention for at least six months already.

If more people email Limetech about this, then maybe we can get these bugs a little higher on the to-do list.

To my knowledge, I have never received an email from you. If this is incorrect, please enlighten me.

I can't update the email address in my profile, because I've long ago lost my password for this forum.

I am only able to still be active here thanks to the Firefox cookie.

Didn't want to bother you about resetting my password or email address.

But that's not really important. Since I got your attention here, I'll continue here...

They are two separate bugs actually:

1. BUG: Issuing spindown commands to a disk 10 seconds apart.

Such commands should be at worst spindownDelay minutes apart.

And it doesn't have to be a bad disk. I've seen this effect on an otherwise perfectly good disk which silently ignored spindown commands.

Not a BUG. If your disk does not support spindown, then set spindown delay for that disk to 'never'.

I only learned that the new disk does not support spindown after it crashed the whole server through the syslog.

Give me a moment... I'll gather links to various previous posts, and then I'll try to make a better argument about why it is a bug.

July 4, 2010

I can't update the email address in my profile, because I've long ago lost my password for this forum.

I am only able to still be active here thanks to the Firefox cookie.

Didn't want to bother you about resetting my password or email address.

But that's not really important. Since I got your attention here, I'll continue here...

Even though your email address is obviously invalid in your profile, my email address is well-known. You complain about problems and suggest people email me, but as far as I can tell you have never emailed me with any issue.

Most of the time, email from a customer gets my attention right away (I'm not always perfect with that). Others have complained about what I call "maintenance" releases without a prior beta first. Well, almost all the time these releases are in response to issues from customers brought to my attention via email, most of which involved multiple email exchanges over several days, along with customer testing via private releases.

In the early days I could read and respond to almost every forum post, but that is impossible for me now. I do not see some really critical issues, though there are some here that forward me links to threads saying, "you might want to look at this". To those people, Thank You!!

So if you have a dire problem, it does no good to complain about a non-response on the forum; you need to email me directly: [email protected] I hope this is clear.

July 5, 2010

So if you have a dire problem, it does no good to complain about a non-response on the forum; you need to email me directly

No, this BUG is not a "dire problem" for me -- I have my own workaround.

I disabled all unRAID's spin up/down functionality from the web management interface.

But even with this thing "disabled", I am still getting logs like these...

...
Jul  4 05:17:57 v454 kernel: md: disk0: ATA_OP_SETIDLE1 ioctl error: -5
Jul  4 05:17:57 v454 kernel: md: disk2: ATA_OP_SETIDLE1 ioctl error: -5
Jul  4 05:17:57 v454 kernel: md: disk1: ATA_OP_SETIDLE1 ioctl error: -5
...

Shouldn't "disabled" mean "disabled"?
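
To see how often these messages recur, the errors can be tallied per disk from the syslog. This is a rough Python sketch whose pattern is written against the three lines quoted above; the function name is illustrative. The value -5 is the negated errno EIO (I/O error).

```python
import re
from collections import Counter

SETIDLE_RE = re.compile(r"md: disk(\d+): ATA_OP_SETIDLE1 ioctl error: (-?\d+)")


def count_spindown_errors(lines):
    """Count ATA_OP_SETIDLE1 ioctl errors per disk number."""
    counts = Counter()
    for line in lines:
        match = SETIDLE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Lines copied from the excerpt above.
SAMPLE = [
    "Jul  4 05:17:57 v454 kernel: md: disk0: ATA_OP_SETIDLE1 ioctl error: -5",
    "Jul  4 05:17:57 v454 kernel: md: disk2: ATA_OP_SETIDLE1 ioctl error: -5",
    "Jul  4 05:17:57 v454 kernel: md: disk1: ATA_OP_SETIDLE1 ioctl error: -5",
]

counts = count_spindown_errors(SAMPLE)
for disk, total in sorted(counts.items()):
    print(f"disk{disk}: {total} failed spindown request(s)")
```

Run over a full syslog, a per-disk count that keeps climbing while spin-down is set to disabled would support the point being made here.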

It is strange to me how you seem more interested in lecturing me about email, when in my eyes I am doing you a favor.

We aren't here as friends, freely exchanging free software. We are here as seller and customer.

Fact: Spinning down hard disks every 10 seconds is a bug, no matter what you like to call it.

----

P.S.: I was in the process of gathering links to all past posts related to the spindown BUG, to make things easier for you,

when it occurred to me that you've already seen them all, as can be inferred from your latest post.

...You complain about problems and suggest people email me...

July 5, 2010

So if you have a dire problem, it does no good to complain about a non-response on the forum; you need to email me directly: [email protected] I hope this is clear.

There have been numerous complaints on the forum about non response to email.

It might help if, once someone becomes a customer, they get a customer ID and can then post to a help desk ticket system. That way the query and response are tied together, status is viewable, and resolutions can be posted into a knowledge base.

My friend did this with his web hosting company and it worked really well.

I believe he used Cerberus Helpdesk http://www.cerberusweb.com

July 5, 2010

2. BUG: Allowing the possibility of a runaway syslog to completely use up all RAM and crash the server.

Such train wrecks can be so easily prevented by mounting /var/log/ into its own limited-size ramdisk.

I've actually done this on my custom bzroot build by adding this single line to /etc/fstab
none  /var/log  tmpfs  size=128m  0  0
That ramdisk doesn't set apart any RAM. It will only start using up RAM when stuff starts filling it up, up to the set limit.
That's a really good idea & I will incorporate in the next release.

I started a new thread here related to this topic.

http://lime-technology.com/forum/index.php?topic=6910.0
