steve1977 Posted July 30, 2016

This is a new and different issue. I am using RC2 and one disk in the array can no longer be accessed. It still shows "green", "error 0", and indicates free space in the GUI. However, I cannot access it over the network, and when I try to view the folder structure in the GUI, it shows no folders. I imagine that rebooting may fix things, at least temporarily, but I wanted to get advice on this forum first, to make sure I don't break anything by restarting. The diagnostic file is attached. Your help is much appreciated!

tower-diagnostics-20160730-1324.zip
JorgeB Posted July 30, 2016

You need to use xfs_repair on disk1. Before that, note there are a few errors on disk11; you may want to check/replace the cables and keep monitoring the log.

Jul 25 21:14:33 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 25 21:14:33 Tower kernel: ata2.00: failed command: READ DMA EXT
Jul 25 21:14:33 Tower kernel: ata2.00: cmd 25/00:08:60:5b:f9/00:00:71:01:00/e0 tag 6 dma 4096 in
Jul 25 21:14:33 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 25 21:14:33 Tower kernel: ata2.00: status: { DRDY }
Jul 25 21:14:33 Tower kernel: ata2: hard resetting link
Jul 25 21:14:33 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 25 23:41:42 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 25 23:41:42 Tower kernel: ata2.00: failed command: STANDBY IMMEDIATE
Jul 25 23:41:42 Tower kernel: ata2.00: cmd e0/00:00:00:00:00/00:00:00:00:00/40 tag 5
Jul 25 23:41:42 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 25 23:41:42 Tower kernel: ata2.00: status: { DRDY }
Jul 25 23:41:42 Tower kernel: ata2: hard resetting link
Jul 25 23:41:42 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 26 11:12:44 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 26 11:12:44 Tower kernel: ata2.00: failed command: STANDBY IMMEDIATE
Jul 26 11:12:44 Tower kernel: ata2.00: cmd e0/00:00:00:00:00/00:00:00:00:00/40 tag 29
Jul 26 11:12:44 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 26 11:12:44 Tower kernel: ata2.00: status: { DRDY }
Jul 26 11:12:44 Tower kernel: ata2: hard resetting link
Jul 26 11:12:44 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
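[Editor's note] For readers landing on this thread later: on Unraid, xfs_repair is run against the md device with the array started in maintenance mode, so parity stays in sync. A minimal sketch, assuming disk1 corresponds to /dev/md1 on this system:

```shell
# Start the array in maintenance mode from the GUI first,
# so /dev/md1 exists but no filesystem is mounted.

# Dry run: -n reports problems without writing anything.
xfs_repair -n /dev/md1

# If the dry run looks sane, run the actual repair (verbose).
xfs_repair -v /dev/md1
```

Running against /dev/mdX rather than the raw /dev/sdX device is what keeps the parity disk consistent with the repair.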
steve1977 Posted July 30, 2016

Thanks for your reply. Let me run xfs_repair and report back. As for the cable-related errors, I have been having those for a while and changed the cables just recently (again). I stumbled across a thread about "autoparking" on Green drives (https://lime-technology.com/forum/index.php?topic=21007.0). I am using almost exclusively Green drives. Could this have anything to do with the errors on disk11 and others?
steve1977 Posted July 30, 2016

Thanks. I ran xfs_repair but am getting the error message below. Any thoughts?

Phase 1 - find and verify superblock...
        - block cache size set to 1421280 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1615378 tail block 1615345
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
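[Editor's note] The message itself spells out the safe order of operations: mounting the filesystem replays the journal, and only if the mount fails should -L be used. A sketch of that sequence (the device and mount point are illustrative; on Unraid, starting the array in normal mode performs the mount for you):

```shell
# Mounting an XFS filesystem replays its log automatically.
mount /dev/md1 /mnt/disk1
umount /mnt/disk1

# With the log replayed and clean, re-run the repair.
xfs_repair -v /dev/md1

# ONLY if the mount itself fails: zero the log and repair.
# This discards the most recent metadata changes and can
# cause corruption, hence the warning in the output above.
# xfs_repair -L /dev/md1
```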
JorgeB Posted July 30, 2016

Try starting and stopping the array in normal mode, then start in maintenance mode and run xfs_repair again. If you still get the same error, using -L is the only option.
steve1977 Posted July 30, 2016

The drive now shows as "unassigned", so it can no longer be mounted. Shall I try -L or shall I recover onto a new disk?
steve1977 Posted July 30, 2016

This means running:

xfs_repair -v -L /dev/md1
steve1977 Posted July 30, 2016

I was not able to restart in maintenance mode. See the new diagnostics below. Any advice?

tower-diagnostics-20160730-1631.zip
JorgeB Posted July 30, 2016

"The drive now shows as "unassigned", so it can no longer be mounted. Shall I try -L or shall I recover onto a new disk?"

If it shows as unassigned it may have dropped offline; reboot and try again.
steve1977 Posted July 30, 2016

As you may see from the log, disk11 has now also dropped, so I have two unassigned disks. Any thoughts?
steve1977 Posted July 30, 2016

Thanks. Disk11 came back after the reboot and I was able to go into maintenance mode again. I ran xfs_repair with -L and got some errors. I then stopped the array (in maintenance mode). Disk1 still shows as "unassigned". Reboot again?
steve1977 Posted July 30, 2016

I went ahead and rebooted. The disk shows up again. Now I am facing a critical question: shall I start the array (and probably lose something because of -L) or shall I rebuild from parity (if that is even possible)?

tower-diagnostics-20160730-2111.zip
trurl Posted July 30, 2016

Rebuilds typically will not fix filesystem issues. You need to figure out why the disks are dropping, or you are going to keep having problems even if you get your files back. Rebooting is not really a fix for anything. Have you checked all your connections? Power and SATA, at both ends? Controller seated well in its slot?
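[Editor's note] One low-effort way to rule the drives themselves in or out (assuming the smartmontools package is available on the server; the Unraid GUI also exposes SMART data per disk) is to check the SMART attributes of the disks that keep dropping:

```shell
# Overall drive health self-assessment (replace sdX with the
# device letter of the disk that keeps dropping).
smartctl -H /dev/sdX

# Full attribute dump; watch Reallocated_Sector_Ct,
# Current_Pending_Sector and UDMA_CRC_Error_Count.
smartctl -a /dev/sdX
```

A rising UDMA_CRC_Error_Count usually points at the cable or backplane rather than the disk, which would fit the suspicion voiced later in this thread.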
JorgeB Posted July 30, 2016

If the disk goes offline while running xfs_repair, there is a hardware issue, be it disk, cable, or controller. After checking the cabling, and if you have a spare, you can do a rebuild; you will still need to run xfs_repair after the rebuild completes.
steve1977 Posted July 30, 2016

Thanks for your messages. Actually, I got it wrong: the disk did not go offline during xfs_repair. Things are working again now. Thanks for your help!

Having said this, there remains some form of issue, which (probably) has nothing to do with the xfs issue on disk1. The disk11 issue has been a constant problem since I started running Unraid: Unraid disables my disks from time to time. I have really tried everything you can imagine. I bought a new M1015 controller card, exchanged the PSU for one of the most expensive ones, and had a professional IT person redo the cabling (twice). The issue remains, though. From all I can tell, it only affects disks connected to the M1015, not those connected to the Mobo directly, but I really doubt the M1015 itself is the issue. My suspicion is that it may still be cable-related, as the IT guys could have picked cheap cables. I don't know how to do this better, or whether anyone has recommendations for good cables. A new idea from today is that it may be related to "autoparking", given I am mostly using Green drives and I don't think one of my Seagate disks has ever caused an issue. But I saw your (johnnie.black) reply that this is rather unlikely.

Links to the threads with more details and context about my issues:
https://lime-technology.com/forum/index.php?topic=49815.0
https://lime-technology.com/forum/index.php?topic=21007.0