Unmountable drives after shutdown

Nodiaque · December 10, 2023

Hello everyone,

I'm on Unraid 6.12.5 and had to shut down my server. After the restart, I'm met with this:

[Screenshot: three array drives shown as Unmountable]

Did I just lose 3 drives full of data (and with them all the data on this array)? I had read errors on another drive (not even one of these) before the shutdown.

Edit: I tried this and got a bad superblock error:

mount -o rescue=all,ro /dev/sdf /temp
mount: /temp: wrong fs type, bad option, bad superblock on /dev/sdf, missing codepage or helper program, or other error.
dmesg(1) may have more information after failed mount system call.

servraid-diagnostics-20231210-0933.zip

dmesg shows:

[ 679.417315] squashfs: Unknown parameter 'rescue'
[ 679.417367] fuseblk: Unknown parameter 'rescue'
[ 679.417376] UDF-fs: bad mount option "rescue=all" or missing value
[ 679.417483] xfs: Unknown parameter 'rescue'

I tried to follow these steps

Edited December 10, 2023 by Nodiaque
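The dmesg lines above show why that mount failed: rescue= is a btrfs mount option, and XFS rejects it ("xfs: Unknown parameter 'rescue'"). The command also points at the whole disk, /dev/sdf, while on an Unraid array disk the XFS filesystem normally sits on partition 1. A minimal sketch of a read-only XFS mount that skips log replay, assuming the disk and mount point from the post (/dev/sdf, /temp); the helper names first_partition and ro_mount_xfs are hypothetical:

```shell
#!/bin/bash
# Sketch: read-only XFS mount that does not replay the journal.
# Device and mount-point names are examples, not taken from the diagnostics.

# first_partition DISK: print the first-partition node for a whole-disk path.
# Names ending in a digit (nvme0n1, md1) take a "p" separator; sdX names do not.
first_partition() {
  case "$1" in
    *[0-9]) printf '%sp1\n' "$1" ;;
    *)      printf '%s1\n' "$1" ;;
  esac
}

# ro_mount_xfs DISK MOUNTPOINT: mount partition 1 of DISK read-only.
# XFS's own option is "norecovery" (only valid together with "ro");
# "rescue=" is a btrfs option, which is why xfs rejected it.
ro_mount_xfs() {
  local part
  part=$(first_partition "$1")
  mkdir -p "$2"
  mount -t xfs -o ro,norecovery "$part" "$2"
}
```

Usage: ro_mount_xfs /dev/sdf /temp mounts /dev/sdf1. Going through sdX directly bypasses the parity-protected md device, so keep such a mount read-only; and because norecovery skips the journal, the most recent metadata changes will not be visible.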

itimpi · December 10, 2023

There is no reason to assume that anything has been lost just because a drive shows as unmountable. Handling of unmountable drives is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page. The Unraid OS->Manual section in particular covers most features of the current Unraid release.

The question, however, is why the drives went unmountable. It's probably a good idea to attach your system’s diagnostics zip file to your next post in this thread so we can see if we spot anything.

Nodiaque · December 10, 2023

Hello, I did attach the diagnostics file. Maybe our posts crossed during my multiple edits while I was trying to fix it using another thread. I'll check the manual you linked.

I just ran the XFS repair option from the GUI in maintenance mode with -n, and it gave me this:


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 7
        - agno = 3
        - agno = 8
        - agno = 6
        - agno = 5
        - agno = 9
        - agno = 11
        - agno = 13
        - agno = 12
        - agno = 10
        - agno = 14
        - agno = 1
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (3:2197135) is ahead of log (3:2193676).
Would format log to cycle 6.
No modify flag set, skipping filesystem flush and exiting.

itimpi · December 10, 2023

There's no indication that the repair process is having problems. If you rerun it without -n and add -L, it should repair things so that the drive mounts OK when you restart the array in normal mode.
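A shell sketch of that advice: a dry run with -n, then a real run, adding -L only when the dry run reports an unreplayed log (the ALERT lines in the output above). The device is an assumption: on Unraid 6.12, array disk N is usually /dev/mdNp1 with the array started in maintenance mode, and repairing through the md device rather than /dev/sdX1 keeps parity in sync. The function names are hypothetical. -L zeroes the log and discards the metadata changes it holds, so if the disk can still be mounted normally, letting the mount replay the log first is the gentler route.

```shell
#!/bin/bash
# Sketch of the check-then-repair sequence (array in maintenance mode).
# Assumption: DEV is the md partition of the array disk, e.g. /dev/md1p1.

# log_is_dirty: read `xfs_repair -n` output on stdin; succeed if it reports
# metadata changes still sitting in the journal.
log_is_dirty() {
  grep -q 'valuable metadata changes in a log'
}

# repair_xfs DEV: dry run first, then a real repair. -L (zero the log) is only
# added when the dry run reports a dirty log, because zeroing discards those changes.
repair_xfs() {
  local dev=$1 report
  report=$(xfs_repair -n "$dev" 2>&1)
  if printf '%s\n' "$report" | log_is_dirty; then
    xfs_repair -L "$dev"
  else
    xfs_repair "$dev"
  fi
}
```

Usage: repair_xfs /dev/md1p1 for disk1 (substitute the unmountable disk's number).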

Nodiaque · December 10, 2023

OK, that's what I'm doing right now. One disk had more errors. I initiated the shutdown because I woke up this morning to read errors on two drives, and I wasn't able to run SMART on one of them. I then saw that the syslog daemon hadn't logged anything for about a week (this happens randomly), so I restarted the daemon and started getting weird errors. When I triggered a shutdown from the GUI, nothing happened. So I stopped all Dockers and VMs and pressed the power button (a short press, not a hold). This initiated a shutdown and it shut down "properly". The unmountable disks aren't (well, one is) the ones with the read errors.

I'm gonna scan the other drives while I'm in maintenance mode, just for good measure.

The error I had in the log this morning was this:

Dec 10 09:03:40 ServRaid kernel: sd 7:0:0:0: [sdh] tag#17 CDB: opcode=0x88 88 00 00 00 00 01 80 43 18 20 00 00 01 00 00 00
Dec 10 09:03:40 ServRaid kernel: I/O error, dev sdh, sector 6446848032 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
Dec 10 09:03:40 ServRaid kernel: ata6: EH complete
Dec 10 09:03:40 ServRaid kernel: ata3.00: exception Emask 0x50 SAct 0x2062000 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:40 ServRaid kernel: ata3.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:40 ServRaid kernel: ata3: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:68:e8:c5:ad/01:00:27:05:00/40 tag 13 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:88:60:be:13/01:00:af:05:00/40 tag 17 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:90:60:bf:13/01:00:af:05:00/40 tag 18 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:c8:e8:c6:ad/01:00:27:05:00/40 tag 25 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3: hard resetting link
Dec 10 09:03:41 ServRaid kernel: ata5: COMRESET failed (errno=-16)
Dec 10 09:03:41 ServRaid kernel: ata5: hard resetting link
Dec 10 09:03:46 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:46 ServRaid kernel: ata5: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:48 ServRaid kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 10 09:03:49 ServRaid kernel: ata5.00: configured for UDMA/33
Dec 10 09:03:49 ServRaid kernel: ata5: EH complete
Dec 10 09:03:49 ServRaid kernel: ata6.00: exception Emask 0x50 SAct 0x20000 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:49 ServRaid kernel: ata6.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:49 ServRaid kernel: ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:49 ServRaid kernel: ata6.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:49 ServRaid kernel: ata6.00: cmd 60/00:88:78:1d:91/01:00:2a:00:00/40 tag 17 ncq dma 131072 in
Dec 10 09:03:49 ServRaid kernel:         res 40/00:88:78:1d:91/00:00:2a:00:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:49 ServRaid kernel: ata6.00: status: { DRDY }
Dec 10 09:03:49 ServRaid kernel: ata6: hard resetting link
Dec 10 09:03:50 ServRaid kernel: ata3: COMRESET failed (errno=-16)
Dec 10 09:03:50 ServRaid kernel: ata3: hard resetting link
Dec 10 09:03:55 ServRaid kernel: ata6: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:56 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:58 ServRaid kernel: ata5.00: exception Emask 0x50 SAct 0xc00 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:58 ServRaid kernel: ata5.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:58 ServRaid kernel: ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:58 ServRaid kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:58 ServRaid kernel: ata5.00: cmd 60/00:50:c0:94:af/01:00:a3:06:00/40 tag 10 ncq dma 131072 in
Dec 10 09:03:58 ServRaid kernel:         res 40/00:58:c0:95:af/00:00:a3:06:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:58 ServRaid kernel: ata5.00: status: { DRDY }
Dec 10 09:03:58 ServRaid kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:58 ServRaid kernel: ata5.00: cmd 60/00:58:c0:95:af/01:00:a3:06:00/40 tag 11 ncq dma 131072 in
Dec 10 09:03:58 ServRaid kernel:         res 40/00:58:c0:95:af/00:00:a3:06:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:58 ServRaid kernel: ata5.00: status: { DRDY }
Dec 10 09:03:58 ServRaid kernel: ata5: hard resetting link
Dec 10 09:03:59 ServRaid kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 10 09:03:59 ServRaid kernel: ata3.00: configured for UDMA/33
Dec 10 09:03:59 ServRaid kernel: ata3: EH complete
Dec 10 09:03:59 ServRaid kernel: ata6: COMRESET failed (errno=-16)
Dec 10 09:03:59 ServRaid kernel: ata6: hard resetting link

itimpi · December 10, 2023

That syslog snippet looks like what we normally see when there is a cabling or power issue with the drive.
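To see whether the resets cluster on particular ports (as you'd expect if one cage, cable or power lead is at fault), here is a sketch that counts libata "exception" events per ataN port in a log. The function name is hypothetical, and the syslog path in the usage line is an assumption.

```shell
#!/bin/bash
# count_ata_exceptions: read kernel log lines on stdin and print how many
# libata "exception" events each ataN port logged, busiest port first.
count_ata_exceptions() {
  grep -oE 'ata[0-9]+\.[0-9]+: exception' \
    | sed -E 's/\.[0-9]+: exception$//' \
    | sort | uniq -c | sort -rn
}
```

Usage: count_ata_exceptions < /var/log/syslog. Ports that dominate the list can then be matched to physical drives and checked against which ones sit in the cage.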

Nodiaque · December 10, 2023

Yeah, after a restart it seemed fixed, but it just started happening again. I'm wondering if my backplane is dead. All the drives with this behavior are in one of these:

https://www.amazon.ca/dp/B00OUSU8MI?ref=ppx_yo2ov_dt_b_product_details&th=1

itimpi · December 10, 2023

Do you have 1 or 2 power connectors attached? You may well need 2 (ideally from different PSU leads) to get reliable operation.

Nodiaque · December 10, 2023

2 are connected, but through a Y splitter from another connector. Stupid Dell motherboard that only gives 4 power plugs for a 12-port SATA motherboard... Right now I don't see any more errors, but I do see that 2 of the ATA links are at 3.0 Gbps while 1 is at 6 Gbps. Sometime next week I'll swap all of the SATA cables; I have new ones in the box that I never used. It's true, though, that the problems started when I got this box. Before that, I had all of the hard drives (well, only 3) connected to separate power cables in an old HDD cage that I salvaged from another PC. I thought this box was "better", but I might be wrong.

Nodiaque · December 11, 2023

I'm pretty sure it's that external box I bought. All the drives that are having issues are located in this thing. Now, if I want to take them out and connect them directly to SATA again, do I just reassign them while the array is down? They're hot-swappable, so I can swap them one by one. Since their device IDs change when I move them, I'll move them one at a time so I don't mix them up.

Nodiaque · December 11, 2023

I don't know what happened; I now have a red X on a drive:

[Screenshot: array view showing a red X on one drive]

Edit: I ran xfs_repair and I can mount the drive manually from the shell with no problem.

servraid-diagnostics-20231211-0940.zip

New edit: OK, so I followed what I found here:

Stop the array, remove the device, start the array, stop the array, add the device, start the array. I see a rebuild has started. My parity was checked 2 weeks ago, so I'm not too concerned about it. I just hope the external enclosure won't cause any problems while it's rebuilding, so that I can then move all those drives out of the damned enclosure.

Next edit: Damn, I'm getting the same link-down error again... Can I stop the rebuild, take the hard drive out of the cage and assign it back for a new rebuild, or, since a rebuild has already started, am I doomed?

Edited December 11, 2023 by Nodiaque

JorgeB · December 11, 2023

1 hour ago, Nodiaque said:

Next edit: Damn, I'm getting the same link-down error again... Can I stop the rebuild, take the hard drive out of the cage and assign it back for a new rebuild, or, since a rebuild has already started, am I doomed?

It's OK to cancel and start over, as long as the cage doesn't interfere with the drive, like some USB cages do.

Nodiaque · December 11, 2023

OK, well in fact, the cage is the problem. I've taken the drive out of it, but the rebuild needs 2 of the other drives in the cage. It's nonstop spamming the syslog with:

Dec 11 10:54:12 ServRaid kernel: ata3.00: exception Emask 0x50 SAct 0xff0000 SErr 0x4890800 action 0xe frozen
Dec 11 10:54:12 ServRaid kernel: ata3.00: irq_stat 0x0d400040, interface fatal error, connection status changed
Dec 11 10:54:12 ServRaid kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:80:e0:5e:86/05:00:13:00:00/40 tag 16 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:88:20:64:86/05:00:13:00:00/40 tag 17 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:90:60:69:86/05:00:13:00:00/40 tag 18 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:98:a0:6e:86/05:00:13:00:00/40 tag 19 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:a0:e0:73:86/05:00:13:00:00/40 tag 20 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/78:a8:20:79:86/00:00:13:00:00/40 tag 21 ncq dma 61440 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:b0:98:79:86/05:00:13:00:00/40 tag 22 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/08:b8:d8:7e:86/00:00:13:00:00/40 tag 23 ncq dma 4096 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3: hard resetting link
Dec 11 10:54:18 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 11 10:54:22 ServRaid kernel: ata3: COMRESET failed (errno=-16)
Dec 11 10:54:22 ServRaid kernel: ata3: hard resetting link
Dec 11 10:54:28 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 11 10:54:31 ServRaid kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 11 10:54:31 ServRaid kernel: ata3.00: configured for UDMA/33
Dec 11 10:54:31 ServRaid kernel: ata3: EH complete

Is there something I can do to identify which drive is ata3?

If only I could remove all the HDDs from the cage and then begin the rebuild on that one drive... Right now it's estimating 6 days for the rebuild at 16 MB/s.
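The rebuild estimate is roughly the disk's capacity divided by the current rate, since a rebuild has to write the whole disk. The disk size isn't given in the thread, so the 8 TB below is purely an illustration; rebuild_eta_hours is a hypothetical helper using decimal units (1 TB = 1,000,000 MB), the way drive capacities are quoted.

```shell
#!/bin/bash
# rebuild_eta_hours SIZE_TB RATE_MB_S: whole hours needed to process SIZE_TB
# terabytes (decimal, 1 TB = 1,000,000 MB) at a constant RATE_MB_S MB/s.
rebuild_eta_hours() {
  local size_tb=$1 rate=$2
  echo $(( size_tb * 1000000 / rate / 3600 ))
}

rebuild_eta_hours 8 16    # prints 138 (about 5.8 days)
```

At a constant 16 MB/s, each terabyte costs a bit over 17 hours, so any improvement in the rate shortens the rebuild proportionally.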

JorgeB · December 11, 2023

49 minutes ago, Nodiaque said:

Is there something I can do to identify which drive is ata3?

Is that inside the cage? If yes, remove it; if not, replace the cables.

Nodiaque · December 11, 2023

Yes, I'm pretty sure it's one of the drives inside the cage, but I don't know which one is ATA3. Before that I also had ata5 errors, so the drive I removed from the cage is probably ata5 (the one I'm rebuilding).

Edited December 11, 2023 by Nodiaque

JorgeB · December 11, 2023

19 minutes ago, Nodiaque said:

but I don't know which one is ATA3.

You'd need to post the diags for us to be able to tell.

Nodiaque · December 11, 2023

Oh, I thought there was a command or something I could easily check. Here are the diags:

servraid-diagnostics-20231211-1228.zip

JorgeB · December 11, 2023

ATA3 is currently disk6.

Nodiaque · December 11, 2023

Ah, yeah, it's in the cage. I guess I have to wait for the rebuild to finish before moving another drive.

Nodiaque · December 11, 2023

Quick question: where can I see that in the logs?

JorgeB · December 11, 2023

lsscsi.txt maps the ATA# to the sdX identifier.
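On a running system the same ATA-to-sdX mapping can also be read from sysfs, because the resolved /sys/block/sdX path of a disk on a libata (SATA/AHCI) controller includes its ataN port, the same number the kernel messages use (ata3.00 is port ata3). A sketch with hypothetical helper names; disks behind a SAS HBA have no ataN component and show "-". The ataN numbers are assigned at boot, so check them on the current boot.

```shell
#!/bin/bash
# Sketch: map sdX names to libata port numbers via sysfs.
# A SATA disk's resolved path looks like (example, not from these diags):
# /sys/devices/pci0000:00/0000:00:1f.2/ata3/host2/target2:0:0/2:0:0:0/block/sdc

# ata_port_of PATH: print the first "ataN" component of a sysfs path.
ata_port_of() {
  printf '%s\n' "$1" | grep -oE 'ata[0-9]+' | head -n 1
}

# list_ata_ports: print "sdX ataN" for each sd* disk ("-" if not on libata).
list_ata_ports() {
  local dev port
  for dev in /sys/block/sd*; do
    [ -e "$dev" ] || continue          # glob matched nothing
    port=$(ata_port_of "$(readlink -f "$dev")")
    printf '%s %s\n' "${dev##*/}" "${port:--}"
  done
}
```

Usage: list_ata_ports, then look for the port named in the error messages.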

Nodiaque · December 11, 2023

Oh nice, thanks, I'll keep an eye on that!

Nodiaque · December 11, 2023

To remove the HDDs from the cage and connect them directly, I need to reassign them. Since I have a rebuild in progress, I can't stop the rebuild and reassign all the drives before the rebuild is finished, right?

itimpi · December 11, 2023

Just now, Nodiaque said:

To remove the HDDs from the cage and connect them directly, I need to reassign them. Since I have a rebuild in progress, I can't stop the rebuild and reassign all the drives before the rebuild is finished, right?

You should not need to reassign them as Unraid recognises drives by their serial number rather than by where they are connected.
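Since array members are tracked by serial number, one way to confirm where a moved drive ended up is to list the udev /dev/disk/by-id links, whose names end in the drive's serial. This sketch assumes SATA drives (ata-* links) and serials without underscores; the helper names are hypothetical, and the listing shows the kernel's sdX names, not Unraid's disk slots.

```shell
#!/bin/bash
# Sketch: list whole ATA disks by serial using udev's by-id links,
# named ata-<model>_<serial> (model spaces become underscores).
# Assumption: the serial itself contains no underscore.

# serial_of_id NAME: print the serial part of an ata-<model>_<serial> link name.
serial_of_id() {
  local name=${1##*/}
  printf '%s\n' "${name##*_}"
}

# list_serials: print "sdX SERIAL" for each whole-disk ata-* link (partitions skipped).
list_serials() {
  local link
  for link in /dev/disk/by-id/ata-*; do
    [ -e "$link" ] || continue
    case "$link" in *-part[0-9]*) continue ;; esac
    printf '%s %s\n' "$(basename "$(readlink -f "$link")")" "$(serial_of_id "$link")"
  done
}
```

Usage: run list_serials before and after moving a drive; the serial stays the same while the sdX name may change.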

Nodiaque · December 11, 2023

Hmmm, that's not what happened earlier when I tried it. I'm afraid of getting another "disconnected drive" again. I'll wait for the rebuild to finish, even if it takes multiple days because of the disconnections on the drives being read... safer this way.
