
Unmountable drives after shutdown


Nodiaque


Hello everyone,

 

I'm on Unraid 6.12.5 and had to shut down my server. After the restart, I'm met with this:

image.thumb.png.a96f19374c2f84619678c2a2f9f17025.png

 

image.png.cc1d83b27157922dc32243a0f683a002.png

 

Did I just lose 3 drives full of data (and with them all the data on this array)? I had read errors on other drives (not even these) before the shutdown.

 

Edit: I tried this and got a bad superblock error:

 

mount -o rescue=all,ro /dev/sdf /temp
mount: /temp: wrong fs type, bad option, bad superblock on /dev/sdf, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

servraid-diagnostics-20231210-0933.zip

 

dmesg shows:

[  679.417315] squashfs: Unknown parameter 'rescue'
[  679.417367] fuseblk: Unknown parameter 'rescue'
[  679.417376] UDF-fs: bad mount option "rescue=all" or missing value
[  679.417483] xfs: Unknown parameter 'rescue'
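
For context on the dmesg lines above: rescue=all is a Btrfs mount option, which is why the XFS driver (and the other filesystems the kernel probed) rejected it. For XFS, the closest read-only counterpart is norecovery, which mounts without replaying the log. The sketch below only maps a filesystem type to a cautious read-only option string. The function name rescue_mount_opts is made up for illustration, and /dev/sdf1 in the usage note assumes the filesystem lives on the first partition rather than on the whole disk.

```shell
# Map a filesystem type to conservative read-only mount options.
# rescue=all is Btrfs-only; XFS uses norecovery to skip log replay.
# (rescue_mount_opts is a hypothetical helper, not an Unraid or util-linux tool.)
rescue_mount_opts() {
    case "$1" in
        btrfs) echo "ro,rescue=all" ;;
        xfs)   echo "ro,norecovery" ;;
        *)     echo "ro" ;;
    esac
}
```

Under those assumptions, the XFS counterpart of the command above would be something like: mount -t xfs -o "$(rescue_mount_opts xfs)" /dev/sdf1 /temp. On an array disk this bypasses the array's own device, so it is best treated as a look-only mount.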

 

I tried to follow these steps:

 

 

Edited by Nodiaque
Link to comment


There is no reason to assume that anything has been lost just because a drive shows as unmountable. Handling of unmountable drives is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page. The Unraid OS->Manual section in particular covers most features of the current Unraid release.

 

The question, however, is why the drives went unmountable. It is probably a good idea to attach your system’s diagnostics zip file to your next post in this thread so we can see if we can spot anything.

Link to comment

Hello, I did attach the diagnostics file. Maybe our posts crossed while I was making multiple edits as I tried to fix it using another thread. I'll check the manual you linked.

 

I just ran the XFS repair check from the GUI in maintenance mode with -n, and it gave me this:

 

 


Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used.  Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 7
        - agno = 3
        - agno = 8
        - agno = 6
        - agno = 5
        - agno = 9
        - agno = 11
        - agno = 13
        - agno = 12
        - agno = 10
        - agno = 14
        - agno = 1
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (3:2197135) is ahead of log (3:2193676).
Would format log to cycle 6.
No modify flag set, skipping filesystem flush and exiting.
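
The two lines that matter most in a report like the one above are the ALERT about unreplayed log changes and the final "Maximum metadata LSN ... is ahead of log" message. As a small sketch, assuming the -n output has been saved to a file or is piped in, the helper below flags both conditions. summarize_xfs_check is a hypothetical name, not part of xfsprogs.

```shell
# Read xfs_repair -n output on stdin and report two conditions:
#   dirty_log         the log holds metadata changes that -n ignored
#   lsn_ahead_of_log  metadata LSN is ahead of the log
# (summarize_xfs_check is an illustrative helper, not an xfsprogs command.)
summarize_xfs_check() {
    awk '
        /valuable metadata changes in a log/       { dirty = 1 }
        /^Maximum metadata LSN .* is ahead of log/ { ahead = 1 }
        END {
            print "dirty_log=" (dirty ? "yes" : "no")
            print "lsn_ahead_of_log=" (ahead ? "yes" : "no")
        }
    '
}
```

For the report above, both values come out as yes. When dirty_log=yes, the report itself suggests mounting the filesystem first so the log gets replayed.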

 

Link to comment

OK, that's what I'm doing right now. I have one disk that had more errors. I initiated the shutdown because I woke up this morning to read errors on two drives, and I wasn't able to run SMART on one of them. I then saw that the syslog daemon hadn't logged anything for about a week (this happens randomly), so I restarted the daemon and started getting weird errors. When I triggered a shutdown from the GUI, nothing happened. So I shut down all Docker containers and VMs and pressed the power button (just a press). This initiated a shutdown, and it shut down "properly". The unmountable disks aren't the ones with the read errors (well, one is).

 

I'm gonna scan the other drives while I'm in maintenance mode, just for good measure.

 

The errors I had in the log this morning were these:

Dec 10 09:03:40 ServRaid kernel: sd 7:0:0:0: [sdh] tag#17 CDB: opcode=0x88 88 00 00 00 00 01 80 43 18 20 00 00 01 00 00 00
Dec 10 09:03:40 ServRaid kernel: I/O error, dev sdh, sector 6446848032 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
Dec 10 09:03:40 ServRaid kernel: ata6: EH complete
Dec 10 09:03:40 ServRaid kernel: ata3.00: exception Emask 0x50 SAct 0x2062000 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:40 ServRaid kernel: ata3.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:40 ServRaid kernel: ata3: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:68:e8:c5:ad/01:00:27:05:00/40 tag 13 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:88:60:be:13/01:00:af:05:00/40 tag 17 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:90:60:bf:13/01:00:af:05:00/40 tag 18 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:c8:e8:c6:ad/01:00:27:05:00/40 tag 25 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel:         res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3: hard resetting link
Dec 10 09:03:41 ServRaid kernel: ata5: COMRESET failed (errno=-16)
Dec 10 09:03:41 ServRaid kernel: ata5: hard resetting link
Dec 10 09:03:46 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:46 ServRaid kernel: ata5: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:48 ServRaid kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 10 09:03:49 ServRaid kernel: ata5.00: configured for UDMA/33
Dec 10 09:03:49 ServRaid kernel: ata5: EH complete
Dec 10 09:03:49 ServRaid kernel: ata6.00: exception Emask 0x50 SAct 0x20000 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:49 ServRaid kernel: ata6.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:49 ServRaid kernel: ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:49 ServRaid kernel: ata6.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:49 ServRaid kernel: ata6.00: cmd 60/00:88:78:1d:91/01:00:2a:00:00/40 tag 17 ncq dma 131072 in
Dec 10 09:03:49 ServRaid kernel:         res 40/00:88:78:1d:91/00:00:2a:00:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:49 ServRaid kernel: ata6.00: status: { DRDY }
Dec 10 09:03:49 ServRaid kernel: ata6: hard resetting link
Dec 10 09:03:50 ServRaid kernel: ata3: COMRESET failed (errno=-16)
Dec 10 09:03:50 ServRaid kernel: ata3: hard resetting link
Dec 10 09:03:55 ServRaid kernel: ata6: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:56 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:58 ServRaid kernel: ata5.00: exception Emask 0x50 SAct 0xc00 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:58 ServRaid kernel: ata5.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:58 ServRaid kernel: ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:58 ServRaid kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:58 ServRaid kernel: ata5.00: cmd 60/00:50:c0:94:af/01:00:a3:06:00/40 tag 10 ncq dma 131072 in
Dec 10 09:03:58 ServRaid kernel:         res 40/00:58:c0:95:af/00:00:a3:06:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:58 ServRaid kernel: ata5.00: status: { DRDY }
Dec 10 09:03:58 ServRaid kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:58 ServRaid kernel: ata5.00: cmd 60/00:58:c0:95:af/01:00:a3:06:00/40 tag 11 ncq dma 131072 in
Dec 10 09:03:58 ServRaid kernel:         res 40/00:58:c0:95:af/00:00:a3:06:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:58 ServRaid kernel: ata5.00: status: { DRDY }
Dec 10 09:03:58 ServRaid kernel: ata5: hard resetting link
Dec 10 09:03:59 ServRaid kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 10 09:03:59 ServRaid kernel: ata3.00: configured for UDMA/33
Dec 10 09:03:59 ServRaid kernel: ata3: EH complete
Dec 10 09:03:59 ServRaid kernel: ata6: COMRESET failed (errno=-16)
Dec 10 09:03:59 ServRaid kernel: ata6: hard resetting link
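
When errors are spread over several ports like this (ata3, ata5 and ata6 within about 20 seconds), a per-port tally makes the pattern easier to see. Several ports failing together tends to point at something they share, such as power or an enclosure, rather than at a single disk. A minimal sketch that counts "failed command" lines per ATA port from log text on stdin; count_failed_commands is an illustrative name.

```shell
# Count "failed command" kernel log lines per ATA port.
# Input: syslog/dmesg text on stdin. Output: one "ataN count" line per port.
# (count_failed_commands is a hypothetical helper name.)
count_failed_commands() {
    grep -oE 'ata[0-9]+\.[0-9]+: failed command' \
        | sed -E 's/\.[0-9]+: failed command//' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Fed the excerpt above, it reports ata3 4, ata5 2 and ata6 1. On a live system it could be run as count_failed_commands < /var/log/syslog, assuming the log is kept at that path.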

 

Link to comment

Two are connected, but through a Y splitter from another connector. Stupid Dell motherboard that only gives 4 plugs for a 12-SATA motherboard... Right now I don't see any more errors, but I do see that 2 of the ATA links are at 3.0 Gbps while 1 is at 6 Gbps. Sometime next week, I'll swap all of the SATA cables. I had new ones in the box that I didn't use. It's true, though, that the problems started when I got this box. Before that, I had all of the hard drives (well, only 3) connected to separate power cables in an old HDD cage that I salvaged from another PC. I thought this box was "better", but I might be wrong.

Link to comment

I'm pretty sure it's that external box I bought. All the drives that are having issues are located in this thing. Now, if I want to take them out and connect them directly to SATA, do I just reassign them while the array is down? They are hot-swappable, so I can swap them one by one. Since their device IDs change when I move them, I'll change them one at a time so I don't mix them up.

Link to comment

I don't know what happened, but I now have a red X on a drive:

image.png.c5d0264685bdce2b368129d10d58cd4b.png

 

Edit: I ran xfs_repair and I can manually mount the drive in the shell with no problem.

servraid-diagnostics-20231211-0940.zip

 

New edit: OK, so I followed what I found here:

Stop the array, remove the device, start the array, stop the array, add the device, start the array. I see a rebuild has started. My parity was checked 2 weeks ago, so I'm not too concerned about it. I just hope the external enclosure won't cause any problems while it's rebuilding, so I can then move all those drives out of the damned enclosure.

 

Next edit: Damn, I'm getting the same link-down errors again... Can I stop the rebuild process, take the hard drive out of the cage and assign it back for a new rebuild, or am I doomed since a rebuild has already started?

 

Edited by Nodiaque
Link to comment
1 hour ago, Nodiaque said:

Next edit: Damn, I'm getting the same link-down errors again... Can I stop the rebuild process, take the hard drive out of the cage and assign it back for a new rebuild, or am I doomed since a rebuild has already started?

It's OK to cancel and start over, as long as the cage doesn't interfere with the drive, like some USB cages do.

Link to comment

OK, well, in fact, the cage is the problem. I've taken the drive out of it, but the rebuild needs 2 of the other drives that are still in the cage. It spams the syslog non-stop with:

 

Dec 11 10:54:12 ServRaid kernel: ata3.00: exception Emask 0x50 SAct 0xff0000 SErr 0x4890800 action 0xe frozen
Dec 11 10:54:12 ServRaid kernel: ata3.00: irq_stat 0x0d400040, interface fatal error, connection status changed
Dec 11 10:54:12 ServRaid kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:80:e0:5e:86/05:00:13:00:00/40 tag 16 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:88:20:64:86/05:00:13:00:00/40 tag 17 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:90:60:69:86/05:00:13:00:00/40 tag 18 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:98:a0:6e:86/05:00:13:00:00/40 tag 19 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:a0:e0:73:86/05:00:13:00:00/40 tag 20 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/78:a8:20:79:86/00:00:13:00:00/40 tag 21 ncq dma 61440 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:b0:98:79:86/05:00:13:00:00/40 tag 22 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/08:b8:d8:7e:86/00:00:13:00:00/40 tag 23 ncq dma 4096 in
Dec 11 10:54:12 ServRaid kernel:         res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3: hard resetting link
Dec 11 10:54:18 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 11 10:54:22 ServRaid kernel: ata3: COMRESET failed (errno=-16)
Dec 11 10:54:22 ServRaid kernel: ata3: hard resetting link
Dec 11 10:54:28 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 11 10:54:31 ServRaid kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 11 10:54:31 ServRaid kernel: ata3.00: configured for UDMA/33
Dec 11 10:54:31 ServRaid kernel: ata3: EH complete

 

Is there something I can do to identify which drive is ata3?
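
One way to answer this on Linux: for drives handled by libata, the sysfs path behind /sys/block/sdX contains the port name (ataN), so the symlink target can be searched for it. A minimal sketch, assuming that layout (drives behind a SAS HBA or a USB bridge will not show an ataN component). devices_on_ata_port is an illustrative name, and the second argument exists only so the function can be pointed at a test directory.

```shell
# Print the block devices (sdX) whose sysfs path runs through a given ATA port.
# $1: port name as it appears in the kernel log, e.g. ata3
# $2: block directory to scan (defaults to /sys/block; overridable for testing)
# (devices_on_ata_port is a hypothetical helper name.)
devices_on_ata_port() (
    port=$1
    blockdir=${2:-/sys/block}
    for link in "$blockdir"/sd*; do
        [ -e "$link" ] || continue          # skip the unexpanded glob
        target=$(readlink -f "$link")       # resolve to the full device path
        case "$target" in
            */"$port"/*) basename "$link" ;;
        esac
    done
)
```

Running devices_on_ata_port ata3 prints the matching sdX name(s), if any. Listing /dev/disk/by-id with ls -l then shows which model and serial number link to that sdX.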

 

If only I could remove all the HDDs from the cage and then begin the rebuild on that one drive... Right now it's estimating 6 days for the rebuild at 16 MB/s.
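
As a rough sanity check on the 6-day estimate: rebuild time is about drive size divided by sustained speed. The drive size is not stated in the thread, so the 8 TB figure used below is purely an assumption, as is the function name rebuild_hours. Decimal units (1 TB = 1,000,000 MB) and integer arithmetic are used, so results are rounded down.

```shell
# Estimate rebuild duration in whole hours (rounded down).
# $1: drive size in TB (decimal units), $2: sustained speed in MB/s
# (rebuild_hours is a hypothetical helper; the sizes passed to it are assumptions.)
rebuild_hours() {
    echo $(( $1 * 1000000 / $2 / 3600 ))
}
```

rebuild_hours 8 16 prints 138, a little under 6 days. rebuild_hours 8 150 prints 14, where 150 MB/s is only an illustrative figure for a healthy link, so the gap gives a feel for how much the link resets may be costing.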

Link to comment
Just now, Nodiaque said:

To remove the HDDs from the cage and connect them directly, I need to reassign them. Since I have a rebuild in progress, I can't stop the rebuild and reassign all the drives before the rebuild has finished, right?

You should not need to reassign them as Unraid recognises drives by their serial number rather than by where they are connected.

Link to comment
