Device disabled - should I replace the disk or is something else wrong?


nia

Hi.

 

A disk suddenly appears to be disabled. There has been no physical interaction with the server for a long time. I recently upgraded unRAID to the latest version, and only afterwards discovered that Zeron's OpenVMTools is no longer updated or compatible. I run on ESXi 6.0, but I don't think that has any influence on my issue.

 

Can anyone tell from the attached diagnostics (the current logfile, covering the moment the disk was disabled) whether I need to replace the disk? Or should/can I do something else to bring it back online?

 

I think the interesting sequence in the log is the part that looks like this:

(---snip---)

Feb 19 11:02:02 Tower kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current] 
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0 
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x28 28 00 43 07 28 90 00 00 08 00
Feb 19 11:02:02 Tower kernel: print_req_error: I/O error, dev sde, sector 1124542608
Feb 19 11:02:02 Tower kernel: md: disk9 read error, sector=1124542544
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current] 
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0 
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x28 28 00 09 97 54 d8 00 00 08 00
Feb 19 11:02:02 Tower kernel: print_req_error: I/O error, dev sde, sector 160912600
Feb 19 11:02:02 Tower kernel: md: disk9 read error, sector=160912536
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 Sense Key : 0x2 [current] 
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 ASC=0x4 ASCQ=0x0 
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 CDB: opcode=0x2a 2a 00 43 07 28 90 00 00 08 00
Feb 19 11:02:10 Tower kernel: print_req_error: I/O error, dev sde, sector 1124542608
Feb 19 11:02:10 Tower kernel: md: disk9 write error, sector=1124542544
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 Sense Key : 0x2 [current] 
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 ASC=0x4 ASCQ=0x0 
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 CDB: opcode=0x28 28 00 2d 65 ba d0 00 00 08 00
Feb 19 11:02:10 Tower kernel: print_req_error: I/O error, dev sde, sector 761641680
Feb 19 11:02:10 Tower kernel: md: disk9 read error, sector=761641616
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current] 
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0 
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x2a 2a 00 09 97 54 d8 00 00 08 00
Feb 19 11:02:10 Tower kernel: print_req_error: I/O error, dev sde, sector 160912600
Feb 19 11:02:10 Tower kernel: md: disk9 write error, sector=160912536
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current] 
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0 
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x2a 2a 00 2d 65 ba d0 00 00 08 00
Feb 19 11:02:11 Tower kernel: print_req_error: I/O error, dev sde, sector 761641680
Feb 19 11:02:11 Tower kernel: md: disk9 write error, sector=761641616
Feb 19 11:02:39 Tower kernel: sd 4:0:0:0: device_block, handle(0x000e)
Feb 19 11:02:41 Tower kernel: sd 4:0:0:0: device_unblock and setting to running, handle(0x000e)
Feb 19 11:02:41 Tower kernel: sd 4:0:0:0: [sde] Synchronizing SCSI cache
Feb 19 11:02:41 Tower kernel: sd 4:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00

(---snip---)
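
For anyone reading along, the CDB lines in the excerpt can be decoded by hand. Below is a minimal Python sketch, assuming the standard SCSI 10-byte READ(10)/WRITE(10) layout (opcode 0x28 or 0x2a, big-endian 32-bit LBA in bytes 2-5, 16-bit transfer length in bytes 7-8). The function name `decode_cdb10` is made up for illustration and is not part of any unRAID tool.

```python
# Decode a 10-byte SCSI CDB as printed by the kernel, e.g.
# "28 00 43 07 28 90 00 00 08 00". Assumes the READ(10)/WRITE(10) layout.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

def decode_cdb10(cdb_hex):
    """Return (command name, starting LBA, block count) for a 10-byte CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    name = OPCODES.get(cdb[0], "opcode 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return name, lba, count

print(decode_cdb10("28 00 43 07 28 90 00 00 08 00"))
# ('READ(10)', 1124542608, 8)
print(decode_cdb10("2a 00 43 07 28 90 00 00 08 00"))
# ('WRITE(10)', 1124542608, 8)
```

The decoded LBA matches the sector in the `print_req_error` line, and the md sector is 64 lower in every pair (1124542608 vs 1124542544), which would be consistent with the data partition starting at sector 64. Sense Key 0x2 with ASC 0x4 / ASCQ 0x0 is the SCSI "not ready" condition, which suggests the drive stopped responding rather than reporting a bad sector.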

 

Any input appreciated :-)

tower-diagnostics-20180219-1414.zip

Link to comment

I stopped the server, massaged the cables, and the disk is back again :) - but still with a red X in the UI, so the data is still Emulated. :/

 

I can see that only 9.03 MB is free on the disk (ReiserFS). :o Could that be contributing to the issue (if only a few sectors are failing, then there's nowhere to put the data, I guess)?

 

Anyway, the SMART log is attached. One thing makes me think it should be replaced soon anyway - it's been powered on for 68595 hours (7y, 9m, 26d, 3h) !

 

Back to the current issue though. How can I get it back into operation again? Would it help, or do I need, to delete some files from the emulated disk?

 

Thanks in advance for your input... :)

tower-smart-20180219-1657.zip

Link to comment

SMART looks perfect, except for some previous overheating.

 

If you want to rebuild to the same disk:

 

http://lime-technology.com/wiki/Troubleshooting#Re-enable_the_drive

 

Make sure the contents of the emulated disk look correct before rebuilding, or, to play it safer, rebuild to a new disk.

 

22 minutes ago, nia said:

I can see that only 9.03 MB is free on the disk (ReiserFS). :o Could that be contributing to the issue (if only a few sectors are failing, then there's nowhere to put the data, I guess)?

Nope, though you should leave a few GB free and convert all ReiserFS disks to XFS. ReiserFS is dead and not recommended for v6.

Edited by johnnie.black
Link to comment

SMART looks OK.

 

unRAID disables a disk when a write to it fails. But the write is still used to update parity, so the emulated disk has the correct data, but the original doesn't. The disk must be rebuilt before unRAID will use it again since its data is now invalid and out-of-sync with parity.

 

It doesn't matter how full a disk is for rebuilding since parity is at the bit level and every bit will be rebuilt.
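
To illustrate the two points above, here is a toy Python model. It assumes single parity computed as a byte-wise XOR across the data disks; the three 4-byte "disks" and their contents are made up, and the real array works on whole devices rather than small byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks" of 4 bytes each (made-up contents).
disks = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(disks)

# A write to disk 1 fails on the device, but parity is still updated
# as if the write had succeeded: XOR out the old data, XOR in the new.
intended = b"\xff\xee\xdd\xcc"
parity = xor_blocks([parity, disks[1], intended])

# Emulating disk 1 means XOR-ing parity with all the *other* disks.
emulated = xor_blocks([parity, disks[0], disks[2]])
print(emulated == intended)  # True: the emulated disk has the new data
print(disks[1] == intended)  # False: the physical disk is stale
```

Every byte of the emulated disk is computed this way, used or not, which is why the amount of free space makes no difference to a rebuild.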

 

The safest approach is to rebuild to a new disk, so the original can be kept intact in case something goes wrong. Do you have a spare?

 

It is also possible to rebuild to the same disk.

Link to comment

Thanks for the quick answers, guys! :)

 

@trurl I appreciate the thinking about the safest approach. Unfortunately I don't have a suitable spare at hand, only 1TB disks. I think I will put it back into service, since it was apparently a cable issue and SMART looks OK (it would take 2-3 days before a spare can arrive, and I think I would rather replace other drives first).

 

@johnnie.black Regarding updating the filesystem. Can I do that on the current 2TB disk before re-creating the data on it - i.e. get XFS on it?

 

 

 

Link to comment

@trurl I found a place that can ship, so I will have a 4TB tomorrow. :D It can then run Preclear for the 24 hours or so that it takes(?), after which I can swap it into the old 2TB slot and rebuild on XFS instead. I guess the risk of running with an emulated disk for a couple of extra days, with the data still safe on the physical one, is lower than trying to rebuild, right?

 

Afterwards, I can start updating other disks to XFS on a 'rotational pattern' having an extra 2TB disk for the purpose I guess O.o (except for the three of four 4TB on ReiserFS :() ... 

Link to comment
25 minutes ago, nia said:

rebuild on XFS instead

Not sure I understand what you have in mind.

 

As already stated, you can't rebuild to a different filesystem. And changing filesystems will format a disk. Changing from Reiser is a good idea, but you must copy the data elsewhere before formatting to another filesystem.

Link to comment

@trurl You are right - I was not thinking it all the way through, sorry... 9_9 

 

Now - thinking a bit more carefully, I still think I can find a way.

Adding a 4TB to replace the 2TB will add 2 TB free space.

  1. The data from the next 2TB can be copied to the free space on the 4TB,
  2. while the old 2TB disk (the one that was replaced by the 4TB) is formatted to XFS.
  3. Then I can remove the disk that has been copied to the 4TB from the array.
  4. Restart the array
  5. Then I can add the XFS-formatted disk, so now I have another 2TB of free space.
  6. I can then copy data from the NEXT ReiserFS-formatted 2TB (or 1TB) disk to the 'blank' XFS-disk just added to the array,
  7. while formatting the disk that was removed to XFS.
  8. Then [Repeat from 3]. B|

It should work. BUT - the question is: Is it worth all of this effort (including constantly adapting the config of the Shares) just to go to XFS? 

I mean - ReiserFS has been doing the job well for a long time for many?

Link to comment

Rebuild 2TB to 4TB. This rebuilt disk will still be Reiser but it will have an extra 2TB free. That gets us to 1.

52 minutes ago, nia said:

The data from the next 2TB can be copied to the free space on the 4TB,

 

Then we go on to 2., which is lacking in detail.

53 minutes ago, nia said:

while the old 2TB disk (the one that was replaced by the 4TB) is formatted to XFS.

I hope you mean you will format the old 2TB disk that you replaced instead of formatting the rebuilt 4TB disk, which is still Reiser. But even if that is what you mean, the old 2TB disk is no longer part of the array. How did you plan to make it part of the array so you can format it?

 

And it continues to go off track after that. 

 

Changing to XFS is a good thing to do, but it is time consuming. The basic ideas aren't that complicated. To change the filesystem of a disk, you must format it to another filesystem while it is still in the array. Since the disk will be formatted, its contents must be copied elsewhere.

 

Any time you remove a disk you must rebuild parity, so you want to avoid removing a disk unless your ultimate goal is to end up with fewer disks, and even then you should do that as the last step so you only rebuild parity once.

 

And any time you add a disk, unRAID will clear it so parity will remain valid. It is best to avoid adding disks unless your ultimate goal is to end up with more disks. And if you do add a disk, you can't format it until it has been added to the array.

 

If all you want to do is change a disk to XFS, just copy all its contents somewhere else, and format the disk while it is still in the array. Then you can repeat with other disks. Don't remove or add any disks. If you need more space, replace a disk with a larger one and rebuild it, then continue.
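
The copy-then-format step above can be sketched as a small planning check in Python. Everything here is hypothetical: the disk names, the sizes in GB, and the `empty_disk` helper are invented for illustration, and the code only does the arithmetic; it does not touch any real disk.

```python
def empty_disk(source, disks):
    """Plan moving everything off `source` into free space on the other disks.

    `disks` maps a disk name to {"size_gb", "used_gb", "fs"}. Returns a list of
    (target, gb) moves and updates `disks`; raises if the data does not fit.
    """
    remaining = disks[source]["used_gb"]
    targets = [n for n in disks if n != source]
    if remaining > sum(disks[n]["size_gb"] - disks[n]["used_gb"] for n in targets):
        raise ValueError("not enough free space; replace a disk with a larger one first")
    moves = []
    # Prefer targets already on XFS so data is not copied twice.
    for name in sorted(targets, key=lambda n: disks[n]["fs"] != "xfs"):
        free = disks[name]["size_gb"] - disks[name]["used_gb"]
        gb = min(free, remaining)
        if gb > 0:
            disks[name]["used_gb"] += gb
            moves.append((name, gb))
            remaining -= gb
    disks[source]["used_gb"] = 0
    disks[source]["fs"] = "xfs"  # formatted while still assigned to the array
    return moves

# Made-up example: disk9 was rebuilt from 2TB onto a 4TB disk (still ReiserFS).
disks = {
    "disk8": {"size_gb": 2000, "used_gb": 1900, "fs": "reiserfs"},
    "disk9": {"size_gb": 4000, "used_gb": 2000, "fs": "reiserfs"},
}
print(empty_disk("disk8", disks))  # [('disk9', 1900)]
```

No disk is added or removed at any point in this sketch; the emptied disk stays in its slot and is simply formatted, so parity never has to be rebuilt.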

 

There is a long thread about this already pinned at the top of this subforum, and it has links to procedures in the wiki if you want to go look there for more background, but it might be just as well to stay here and try to clear up your understanding of all this.

 

Link to comment
