nia Posted February 19, 2018

Hi. A disk suddenly appears to be disabled. There has been no physical interaction with the server for a long time. I just recently upgraded unRAID to the latest version, and only afterwards discovered that Zerons OpenVMTools is no longer updated and compatible. I run on ESXi 6.0, but I don't think that has any influence on my issue.

Can anyone tell from the attached diagnostics (current logfile where the disk was disabled), if I need to replace the disk? Or should/can I do something else to bring it back online?

I think the interesting sequence in the log is the part that looks like this:

(---snip---)
Feb 19 11:02:02 Tower kernel: mpt2sas_cm0: log_info(0x31110d00): originator(PL), code(0x11), sub_code(0x0d00)
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current]
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x28 28 00 43 07 28 90 00 00 08 00
Feb 19 11:02:02 Tower kernel: print_req_error: I/O error, dev sde, sector 1124542608
Feb 19 11:02:02 Tower kernel: md: disk9 read error, sector=1124542544
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current]
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0
Feb 19 11:02:02 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x28 28 00 09 97 54 d8 00 00 08 00
Feb 19 11:02:02 Tower kernel: print_req_error: I/O error, dev sde, sector 160912600
Feb 19 11:02:02 Tower kernel: md: disk9 read error, sector=160912536
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 Sense Key : 0x2 [current]
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 ASC=0x4 ASCQ=0x0
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 CDB: opcode=0x2a 2a 00 43 07 28 90 00 00 08 00
Feb 19 11:02:10 Tower kernel: print_req_error: I/O error, dev sde, sector 1124542608
Feb 19 11:02:10 Tower kernel: md: disk9 write error, sector=1124542544
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 Sense Key : 0x2 [current]
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 ASC=0x4 ASCQ=0x0
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#2 CDB: opcode=0x28 28 00 2d 65 ba d0 00 00 08 00
Feb 19 11:02:10 Tower kernel: print_req_error: I/O error, dev sde, sector 761641680
Feb 19 11:02:10 Tower kernel: md: disk9 read error, sector=761641616
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current]
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0
Feb 19 11:02:10 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x2a 2a 00 09 97 54 d8 00 00 08 00
Feb 19 11:02:10 Tower kernel: print_req_error: I/O error, dev sde, sector 160912600
Feb 19 11:02:10 Tower kernel: md: disk9 write error, sector=160912536
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 Sense Key : 0x2 [current]
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 ASC=0x4 ASCQ=0x0
Feb 19 11:02:11 Tower kernel: sd 4:0:0:0: [sde] tag#0 CDB: opcode=0x2a 2a 00 2d 65 ba d0 00 00 08 00
Feb 19 11:02:11 Tower kernel: print_req_error: I/O error, dev sde, sector 761641680
Feb 19 11:02:11 Tower kernel: md: disk9 write error, sector=761641616
Feb 19 11:02:39 Tower kernel: sd 4:0:0:0: device_block, handle(0x000e)
Feb 19 11:02:41 Tower kernel: sd 4:0:0:0: device_unblock and setting to running, handle(0x000e)
Feb 19 11:02:41 Tower kernel: sd 4:0:0:0: [sde] Synchronizing SCSI cache
Feb 19 11:02:41 Tower kernel: sd 4:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
(---snip---)

Any input appreciated :-)

tower-diagnostics-20180219-1414.zip
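
For anyone reading the excerpt above, the raw fields can be decoded by hand. In the SCSI command set, opcode 0x28 is READ(10) and 0x2a is WRITE(10), and sense key 0x2 with ASC 0x4 / ASCQ 0x0 means "not ready: logical unit not ready, cause not reportable". That would be consistent with the drive dropping off the controller rather than a bad sector, though the log alone does not prove it. The sketch below decodes one of the CDBs from the log; the helper name `decode_cdb10` is made up for illustration. It also shows that the `md` sector is 64 lower than the `sde` sector, presumably because the data partition starts at sector 64 on this disk (an assumption, not something the log states).

```python
# Sketch: decode a 10-byte SCSI CDB as printed in the kernel log.
# Only READ(10)/WRITE(10) are mapped; decode_cdb10 is a hypothetical helper.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb_hex: str) -> dict:
    raw = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "op": OPCODES.get(raw[0], hex(raw[0])),
        "lba": int.from_bytes(raw[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(raw[7:9], "big"),  # bytes 7-8: transfer length
    }


first = decode_cdb10("28 00 43 07 28 90 00 00 08 00")
print(first)  # {'op': 'READ(10)', 'lba': 1124542608, 'blocks': 8}

# Sector that the md driver reported for the same request in the log.
md_sector = 1124542544
print(first["lba"] - md_sector)  # 64
```

The same 64-sector difference holds for the other two sector pairs in the excerpt.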
trurl Posted February 19, 2018

SMART for sde doesn't appear in the diagnostics. Check connections and post another.
nia (Author) Posted February 19, 2018

I stopped the server, massaged the cables, and the disk is back again - but still with a red X in the UI, so the data is still Emulated.

I can see, that only 9,03MB is free on the disk (ReiserFS). Could that be contributing to the issue (if only a few sectors are failing, then there's nowhere to put the data i guess?)

Anyway, the SMART log is attached. One thing makes me think it should be replaced soon anyway - it's been powered on for 68595 hours (7y, 9m, 26d, 3h)!

Back to the current issue though. How can I get it back into operation again? Would it help, or do I need, to delete some files from the emulated disk?

Thanks in advance for your input...

tower-smart-20180219-1657.zip
JorgeB Posted February 19, 2018

SMART looks perfect, except for some previous overheating. If you want to rebuild to the same disk:

http://lime-technology.com/wiki/Troubleshooting#Re-enable_the_drive

Make sure the contents of the emulated disk look correct before rebuilding, or, to play it safer, rebuild to a new disk.

22 minutes ago, nia said:
I can see, that only 9,03MB is free on the disk (ReiserFS). Could that be contributing to the issue (if only a few sectors are failing, then there's nowhere to put the data i guess?)

Nope, though you should leave a few GB free, and convert all reiserfs disks to xfs. reiserfs is dead and not recommended for v6.
trurl Posted February 19, 2018

SMART looks OK.

unRAID disables a disk when a write to it fails. But the write is still used to update parity, so the emulated disk has the correct data, but the original doesn't. The disk must be rebuilt before unRAID will use it again since its data is now invalid and out-of-sync with parity.

It doesn't matter how full a disk is for rebuilding since parity is at the bit level and every bit will be rebuilt.

The safest approach is to rebuild to a new disk, so the original can be kept intact in case something goes wrong. Do you have a spare?

It is also possible to rebuild to the same disk.
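
To illustrate the point about a failed write still updating parity, here is a toy sketch. It assumes plain single-disk XOR parity and uses made-up 4-byte "sectors"; the names `xor_parity`, `disks` and `intended` are hypothetical and are not taken from unRAID itself.

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR across equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data disks holding one 4-byte "sector" each (made-up values).
disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(disks)

# A write to disk index 1 fails on the physical disk, so disks[1] keeps its
# old contents, but parity is updated as if the write had succeeded.
intended = b"\xff\xee\xdd\xcc"
parity = xor_parity([disks[0], intended, disks[2]])

# The emulated disk is parity XOR every other data disk.
emulated = xor_parity([parity, disks[0], disks[2]])

print(emulated == intended)  # True: the emulated disk has the new data
print(disks[1] == intended)  # False: the physical disk is out of sync
```

Because every byte is reconstructed this way, how full the filesystem is makes no difference to the rebuild.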
nia (Author) Posted February 19, 2018

Thanks for the quick answers, guys!

@trurl I appreciate the thinking about the safest approach. I don't have a spare at hand unfortunately - only 1TB's. I think I will put it back into service again, as it was apparently a cable issue, and SMART looks OK (it would take 2-3 days before a spare can arrive, and I think I would rather replace other drives first).

@johnnie.black Regarding updating the filesystem. Can I do that on the current 2TB disk before re-creating the data on it - i.e. get XFS on it?
JorgeB Posted February 19, 2018

2 minutes ago, nia said:
Can I do that on the current 2TB disk before re-creating the data on it - i.e. get XFS on it?

No, a rebuilt disk can only have the original filesystem.
trurl Posted February 19, 2018

14 minutes ago, nia said:
Can I do that on the current 2TB disk before re-creating the data on it - i.e. get XFS on it?

Also be aware that if you do decide to change the filesystem after the rebuild, changing the filesystem on any disk will format it so you must have its contents copied elsewhere.
nia (Author) Posted February 19, 2018

@trurl I found a place that can ship so I have a 4TB tomorrow. Then it can run Preclear for the 24 hours or so that it takes(?), I can then swap it in to the old 2TB slot and rebuild on XFS in stead.

I guess the risk profile of running with an emulated disk, but with the data safe on the physical one, for a couple of days extra is lower than trying to rebuild, right?

Afterwards, I can start updating other disks to XFS on a 'rotational pattern' having an extra 2TB disk for the purpose I guess (except for the three of four 4TB on ReiserFS) ...
trurl Posted February 19, 2018

25 minutes ago, nia said:
rebuild on XFS in stead

Not sure I understand what you have in mind. As already stated, you can't rebuild to a different filesystem. And changing filesystems will format a disk.

Changing from Reiser is a good idea, but you must copy the data elsewhere before formatting to another filesystem.
nia (Author) Posted February 19, 2018

@trurl You are right - I was not thinking it all the way through, sorry...

Now - thinking a bit more carefully, I still think I can find a way.

1. Adding a 4TB to replace the 2TB will add 2 TB free space.
2. The data from the next 2TB can be copied to the free space on the 4TB, while the old 2TB disk (the one that was replaced by the 4TB) is formatted to XFS.
3. Then I can remove the disk that has been copied to the 4TB from the array.
4. Restart the array
5. Then I can add the XFS-formatted disk, so now I have another 2TB of free space.
6. I can then copy data from the NEXT ReiserFS-formatted 2TB (or 1TB) disk to the 'blank' XFS-disk just added to the array, while formatting the disk that was removed to XFS.
7. Then [Repeat from 3].

It should work.

BUT - the question is: Is it worth all of this effort (including constantly adapting the config of the Shares) just to go to XFS? I mean - ReiserFS has been doing the job well for a long time for many?
trurl Posted February 19, 2018

Rebuild 2TB to 4TB. This rebuilt disk will still be Reiser but it will have an extra 2TB free. That gets us to 1.

52 minutes ago, nia said:
The data from the next 2TB can be copied to the free space on the 4TB,

then we go on to 2., which is lacking in details

53 minutes ago, nia said:
while the old 2TB disk (the one that was replaced by the 4TB) is formatted to XFS.

I hope you mean you will format the old 2TB disk that you replaced instead of formatting the rebuilt 4TB disk, which is still Reiser. But even if that is what you mean, the old 2TB disk is no longer part of the array. How did you plan to make it part of the array so you can format it?

And it continues to go off track after that.

Changing to XFS is a good thing to do, but it is time consuming.

The basic ideas aren't that complicated. To change the filesystem of a disk, you must format it to another filesystem while it is still in the array. Since the disk will be formatted, its contents must be copied elsewhere.

Any time you remove a disk you must rebuild parity, so you want to avoid removing a disk unless your ultimate goal is to end up with fewer disks, and even then you should do that as the last step so you only rebuild parity once.

And any time you add a disk, unRAID will clear it so parity will remain valid. It is best to avoid adding disks unless your ultimate goal is to end up with more disks. And if you do add a disk, you can't format it until it has been added to the array.

If all you want to do is change a disk to XFS, just copy all its contents somewhere else, and format the disk while it is still in the array. Then you can repeat with other disks. Don't remove or add any disks. If you need more space, replace a disk with a larger one and rebuild it, then continue.

There is a long thread about this already pinned at the top of this subforum, and it has links to procedures in the wiki if you want to go look there for more background, but it might be just as well to stay here and try to clear up your understanding of all this.
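
The "copy elsewhere, format in place, repeat" loop described above can be sketched as a small planning simulation. This is only an illustration under stated assumptions: the disk names, sizes and usage figures (in GB) are invented, `plan_conversion` is a hypothetical helper, and picking the least-full ReiserFS disk first is one possible ordering rather than an official procedure. No disk is added to or removed from the array at any point.

```python
def plan_conversion(disks):
    """Return the order in which ReiserFS disks can be emptied and reformatted.

    disks maps a name to {"size": GB, "used": GB, "fs": "reiserfs" | "xfs"}.
    Each step moves one disk's data to free space on the other array disks,
    then formats the emptied disk to XFS while it stays in the array.
    """
    state = {name: dict(info) for name, info in disks.items()}
    steps = []
    while any(d["fs"] == "reiserfs" for d in state.values()):
        pending = sorted(
            (n for n, d in state.items() if d["fs"] == "reiserfs"),
            key=lambda n: state[n]["used"],
        )
        for src in pending:
            # Prefer XFS targets so data does not have to be moved twice.
            targets = sorted(
                (n for n in state if n != src),
                key=lambda n: state[n]["fs"] != "xfs",
            )
            free = sum(state[n]["size"] - state[n]["used"] for n in targets)
            if free < state[src]["used"]:
                continue  # this disk's data does not fit anywhere yet
            remaining = state[src]["used"]
            for dst in targets:
                moved = min(remaining, state[dst]["size"] - state[dst]["used"])
                state[dst]["used"] += moved
                remaining -= moved
            state[src].update(used=0, fs="xfs")
            steps.append(src)
            break
        else:
            raise RuntimeError(
                "not enough free space; replace a disk with a larger one first"
            )
    return steps


# Invented example: disk9 was rebuilt onto a 4TB disk and is still ReiserFS.
array = {
    "disk9": {"size": 4000, "used": 1500, "fs": "reiserfs"},
    "disk1": {"size": 2000, "used": 1000, "fs": "reiserfs"},
    "disk2": {"size": 1000, "used": 300, "fs": "reiserfs"},
}
order = plan_conversion(array)
print(order)  # ['disk2', 'disk1', 'disk9']
```

If no disk's contents fit in the remaining free space, the sketch raises an error, which corresponds to the advice to replace a disk with a larger one, rebuild it, and then continue.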