Device disabled


steve1977


I'm no expert at reading these things, but it seems there is more to it than just one disk!?

Any loose cabling?

 

There seems to be a WD 500GB drive (sdb) that is frequently throwing I/O errors.

They start right after the boot sequence.

 

Next is disk 2 with I/O errors.

disk2: [8,160] (sdk) WDC_WD60EZRX-00MVLB1_WD-WX11D741AYXK

Jun 14 03:34:40 Tower kernel: md: disk2 write error, sector=6099439824
Jun 14 03:34:40 Tower kernel: md: disk2 read error, sector=6099444688

 

A few days later, disk 14 is joining the party.

Jun 17 16:40:59 Tower kernel: md: disk14 read error, sector=11312

 

Then there is super.dat in your log with a size of 0 ???
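If you have console or SSH access, a quick sanity check is to look at the file directly, e.g.:

ls -l /boot/config/super.dat   # on a healthy system the size should be non-zero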

 


SMART for both those drives looks OK. Are they on the same controller?

 

Not sure what to make of super.dat, but there is this late in the log:

Jun 14 03:34:40 Tower kernel: md: could not write superblock from /boot/config/super.dat

Earlier in the log it is read fine. I would make a screenshot of your drive assignments just in case.
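You can pull every super.dat mention out of the live log with something like:

grep super.dat /var/log/syslog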


Thanks for your quick replies.

 

The 500GB disk is no longer attached. I never got it to work, but I believe that is due to a separate issue (a USB port problem).

 

In the GUI, only one disk shows errors (disk 2).

 

Can you elaborate on the super.dat file and what it relates to?

 

Are you suggesting to rebuild the array or to replace disk 2 and restore to it?

 

In general, this is quite annoying if it is yet again related to cabling. I have done just about everything I can since I started using Unraid 18 months ago. I bought a new SATA controller, and a few months ago I had all the cabling replaced by an IT professional (which cost a fortune). After the latter, it did indeed work for a few months, which was great! But now the same issue has come up again, with one drive being disabled (without the drive actually being broken). Any other ideas on what I can do?


super.dat is the file that stores your array configuration. I suggested the screenshot in case super.dat is bad and so unRAID would not remember your array configuration after a reboot. Actually we can get your disk assignments from the logs you already posted but I thought a screenshot would be easier for you to use.

 

Due to the write error unRAID has disabled disk2. unRAID always disables a disk when it has a write failure because the disk contents are no longer valid. The missing writes are reflected in parity though, so the array actually contains those writes, and that is where the valid contents of the disk are now. unRAID is emulating the disabled disk so reads and writes can still happen, but that actually involves calculating the disk's contents by reading all the other drives. unRAID will not even try to access the disabled disk until it is rebuilt.
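As a simplified sketch of how the emulation works (single parity is XOR-based), every bit of the emulated disk is recomputed from the parity disk plus all the other data disks:

disk2 = parity XOR disk1 XOR disk3 XOR ... XOR disk14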

 

Since the drive is probably OK you can try to rebuild it to itself. After getting the recommended screenshot, you should shut down, put your flash drive in your PC, and run checkdisk on it, just in case there is some corruption there. While shut down, check all drive connections and cables, both ends of power and SATA.
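On a Windows PC that check would look something like this (assuming the flash shows up as drive E:):

chkdsk E: /f   # /f fixes any errors it finds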

 

After booting back up report back and let us know if unRAID remembers your array configuration and we can proceed from there.

 

4 weeks later...

Thanks again. I hadn't done anything so far, but am making a new attempt this weekend.

 

Just to make sure that I am doing the right thing:

 

1) I am now planning to dissolve the array rather than rebuilding the disk. I will just let it go, create a new array with the same drives, and let it rebuild the parity drive. I have not done any reads/writes to the "faulty disk", so I assume this will be fine. I think this may be faster than rebuilding from parity? And even "safer".

 

2) Before I do this, I will run chkdsk, which should fix the super.dat.

 

 

Makes sense?



Not entirely. You say you have not done any writes to the faulty disk, but a write failure was the reason it was disabled in the first place, and the data that should have been written is in the parity array but not on the disk. The failed write may not simply be a file that didn't get written, or a file that is incomplete. Instead, it may have been an update to the filesystem metadata that keeps track of where file data is stored.
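If you want to check the filesystem on the emulated disk before doing anything else, you could start the array in maintenance mode and run a read-only check. A sketch, assuming disk2 is formatted XFS (use the tool matching your actual filesystem):

xfs_repair -n /dev/md2   # -n = check only, make no changes; /dev/md2 is the array device for disk2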

 

This thread was dead for a month, so I have to wonder how well we really know the current state of your system. And we never got any further information on the state of super.dat.

 

Assuming you only have one disabled disk, the safest course of action would be to rebuild that disk to a different trusted disk (precleared). Then you would still have the original disk if for some reason there was a problem with the rebuild.


Thanks for your reply. I will be back at my Unraid server on Friday and can send an update on the status and a revised log.

 

What would you suggest regarding super.dat? Run "chkdsk /f" now, or something else?

I think there was a known issue recently that may have caused a problem with saving super.dat. Stopping the array should recreate it, then you can get us new diagnostics.
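On Unraid 6 you can generate a fresh diagnostics zip from the console with:

diagnostics   # the zip should land in /boot/logs on the flash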

Thanks for your help. I finally got back to my Unraid server and things appear a lot worse: all drives are now unassigned. Please have a look at my diagnostic log. Guidance appreciated!

Looks like you rebooted with an empty super.dat, so that is exactly what would happen.

 

Here are your drive assignments from your earlier diagnostics syslog (disk0 is parity, Plextor is cache):

May 31 22:30:20 Tower kernel: md: import disk0: [8,240] (sdp) WDC_WD60EZRX-00MVLB1_WD-WX11DB4H85V6 size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk1: [8,224] (sdo) WDC_WD60EZRX-00MVLB1_WD-WX11DA40HUVD size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk2: [8,160] (sdk) WDC_WD60EZRX-00MVLB1_WD-WX11D741AYXK size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk3: [8,80] (sdf) WDC_WD60EZRX-00MVLB1_WD-WX41D3402835 size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk4: [8,112] (sdh) WDC_WD60EZRX-00MVLB1_WD-WXK1H641XM4J size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk5: [8,192] (sdm) WDC_WD40EZRX-00SPEB0_WD-WCC4EHJF6HU0 size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk6: [8,208] (sdn) WDC_WD40EZRX-00SPEB0_WD-WCC4EJ4EN10X size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk7: [8,96] (sdg) WDC_WD60EZRX-00MVLB1_WD-WX11DA49HHVY size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk8: [65,0] (sdq) WDC_WD40EZRX-00SPEB0_WD-WCC4E1961434 size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk9: [65,16] (sdr) WDC_WD40EZRX-00SPEB0_WD-WCC4E4Z7FKHA size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk10: [8,32] (sdc) WDC_WD60EZRX-00MVLB1_WD-WX71DA4A03D2 size: 5860522532
May 31 22:30:20 Tower kernel: md: import disk11: [8,48] (sdd) WDC_WD40EZRX-00SPEB0_WD-WCC4E0076546 size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk12: [8,64] (sde) ST4000DM000-1F2168_W300JY79 size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk13: [8,144] (sdj) ST4000DM000-1F2168_W3008JLM size: 3907018532
May 31 22:30:20 Tower kernel: md: import disk14: [8,176] (sdl) WDC_WD60EZRX-00MVLB1_WD-WX31D55A4944 size: 5860522532
May 31 22:21:20 Tower emhttp: PLEXTOR_PX-AG128M6e_P02509107411 (sdi) 125034840
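For reference, you can pull those assignment lines out of any saved syslog yourself with something like:

grep "md: import disk" syslog.txt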

 

Assign all your disks as before but don't start. Then post a screenshot that shows your disk assignments, and a screenshot that shows Array Operations.


Thanks for your help.

 

I think I made a mistake and will probably lose a disk over it...

 

Before I saw your email, I played around and rebuilt the array without the parity (with the old disk order from a screenshot). The good news is that the disks can be read, but the bad news is that the "faulty" disk is indeed unstable, and Unraid marks errors after it has been in use for a while.

 

Given that I restarted the array without the parity, I assume I no longer have the chance to rebuild using the parity disk? Any other ideas, or is the disk gone?


Are we still talking about this disk?

Serial Number:    WD-WX11D741AYXK

SMART looks OK, so I have to think the issue is with something other than the drive, like the SATA and power cables and connections, the SATA port, or the controller.
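You can re-check SMART from the console at any time with, for example:

smartctl -a /dev/sdk   # confirm the serial in the output matches; device letters can change between boots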

 

Do you have your SATA cables bundled? If so, don't.

 

What is the exact model of your power supply?

 

You should be able to New Config and assign all drives as before, including parity, then before starting post a screenshot showing Array Operations.


Thanks. Yes, it is still the same disk.

 

I have started the array (without parity) and also started the VM, which is working. Given that I have started the VM, I assume there have been some changes to at least one disk of the array. If I understand Unraid correctly, this makes the parity disk useless, so its protection is lost? Or do I understand this wrong?

 

Let me explain more about what is wrong with the drive. I can mount and read it. Also, I have started the array (without the parity) and have accessed it from Win10 on my VM. However, I can only access the disk very slowly, and it eventually gets kicked out of the array (i.e., the drive becomes unavailable). I can stop and start the array and repeat this game.

 

I will dig out the model of my PSU.

 

Where do I find "array operations"?


Array Operations is on the Main page.

 

If anything has written to any array disk then your parity is invalid. If your VM is on an array disk then you can probably assume that it invalidated parity.

 

You could try a New Config and parity sync without the problem disk then see if you can mount it using Unassigned Devices and copy from it.
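A sketch of that copy step, assuming Unassigned Devices mounts the old disk under /mnt/disks (both paths here are placeholders):

rsync -av /mnt/disks/old_disk2/ /mnt/user/restore/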

 


Did you New Config and reassign all drives including parity, or did you just add parity to the existing config instead?

 

The goal is to get it to let you trust parity instead of forcing you to sync it.

 

If your VM changed any disk other than cache or disk2, it's a moot point.



Yes, the VM is on the array (but not on the faulty disk). I have not written anything to the faulty disk, but I have written via the VM, so this should invalidate the parity?

 

What do you think is better? Create a new config with parity but without the "faulty disk" (what you suggest above)? Or create the new config including the "faulty disk" and see whether I can create a new parity disk?



 

What I did in sequence:

 

1) After restarting Unraid, no disks were assigned, but the 14+1 slots still showed in the GUI with no disks assigned to them

 

2) I then added all array disks (in the right sequence), but NOT the parity disk

 

3) I started the array (which I should not have done)

 

4) I started the VM, which sits on disk 1

 

5) Disk 2 showed errors in Unraid GUI after some while

 

6) I stopped the array and rebooted Unraid

 

7) I repeated steps 3 to 5

 

8) I then added the parity disk as the parity disk. I did not start the array, but took the screenshots and posted them

 

 

To answer your question: the VM sits on disk 1, and I am quite sure that it made changes to disk 1.

 

Is it still worth trying to recover anything from parity? Or rebuild, as you suggest, without the faulty disk? Or try to build a new parity disk with the array including the faulty disk?

 

Btw, the errors on disk 2 only happen if/once I access disk 2 from the VM. If I just start the array but not the VM, no errors show. But that's probably obvious, as no reads/writes occur?


Archived

This topic is now archived and is closed to further replies.
