Nodiaque, posted December 10, 2023 (edited):
Hello everyone,

I'm on Unraid 6.12.5 and had to shut down my server. After restarting, I'm met with this: [screenshot]

Did I just lose three drives full of data (and with them, all of their data in the array)? Before the shutdown I had read errors on another drive (not even one of these).

Edit: I tried this and got a bad superblock error:

mount -o rescue=all,ro /dev/sdf /temp
mount: /temp: wrong fs type, bad option, bad superblock on /dev/sdf, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

[Attachment: servraid-diagnostics-20231210-0933.zip]

dmesg shows:

[  679.417315] squashfs: Unknown parameter 'rescue'
[  679.417367] fuseblk: Unknown parameter 'rescue'
[  679.417376] UDF-fs: bad mount option "rescue=all" or missing value
[  679.417483] xfs: Unknown parameter 'rescue'

I tried to follow these steps

(Edited December 10, 2023 by Nodiaque)
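The `rescue=all` option in the command above is a btrfs mount option, which is why every filesystem driver in the dmesg output rejects it. For XFS, a roughly comparable read-only attempt is `ro,norecovery`, which mounts without replaying the log. The sketch below assumes two things. First, the data lives on the disk's first partition (`/dev/sdf1`), not on the whole disk (`/dev/sdf`). Second, it reuses the `/temp` mount point from the post. The `run` and `xfs_ro_mount` helpers and the `DRY_RUN` switch are hypothetical conveniences, not Unraid tools.

```shell
#!/bin/sh
# Hypothetical helpers: mount an XFS partition read-only without replaying its log.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

xfs_ro_mount() {
    dev=$1   # assumption: the partition (e.g. /dev/sdf1), not the whole disk
    mnt=$2   # mount point, e.g. /temp
    run mkdir -p "$mnt"
    # norecovery skips log replay; XFS only accepts it together with ro
    run mount -t xfs -o ro,norecovery "$dev" "$mnt"
}

# Usage (as root): xfs_ro_mount /dev/sdf1 /temp
```

Because the log is not replayed, the most recent changes may be missing or look inconsistent. This kind of mount is for copying data off, not a repair.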
itimpi, posted December 10, 2023:
There's no reason to assume that anything has been lost just because a drive shows as unmountable. Handling of unmountable drives is covered here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page. The Unraid OS->Manual section in particular covers most features of the current Unraid release.

The question, however, is why the drives went unmountable. It's probably a good idea to attach your system's diagnostics zip file to your next post in this thread so we can see if we can spot anything.
Nodiaque (author), posted December 10, 2023:
Hello, I did attach the diagnostics file. Maybe our posts crossed while I was making multiple edits trying to fix it using another thread. I'll check the manual you linked.

I just ran the XFS repair option from the GUI in maintenance mode with -n, and it gave me this:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 7
        - agno = 3
        - agno = 8
        - agno = 6
        - agno = 5
        - agno = 9
        - agno = 11
        - agno = 13
        - agno = 12
        - agno = 10
        - agno = 14
        - agno = 1
        - agno = 4
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
Maximum metadata LSN (3:2197135) is ahead of log (3:2193676).
Would format log to cycle 6.
No modify flag set, skipping filesystem flush and exiting.
itimpi, posted December 10, 2023:
There's no indication that the repair process is having problems. If you rerun it without -n and add -L, it should repair things so that the disk mounts OK when you restart the array in normal mode.
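The same check-then-repair sequence can also be run from the console. This minimal sketch makes two assumptions. The array is started in maintenance mode, and Unraid 6.12 exposes array disk N as /dev/mdNp1. Going through the md device is what keeps parity in sync; repairing /dev/sdX directly would not. The `unraid_xfs_repair` function and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around xfs_repair for an Unraid array disk.
# Assumptions: array started in maintenance mode; Unraid 6.12 names array
# disk N /dev/mdNp1, so repairs through it keep parity in sync.
unraid_xfs_repair() {
    disk=$1           # array slot number, e.g. 6 for disk6
    mode=${2:-check}  # "check" (xfs_repair -n) or "repair" (xfs_repair -L)
    dev="/dev/md${disk}p1"
    case $mode in
        check)  set -- xfs_repair -n "$dev" ;;  # report only, modify nothing
        repair) set -- xfs_repair -L "$dev" ;;  # -L zeroes the log, discarding unreplayed changes
        *) echo "usage: unraid_xfs_repair DISK [check|repair]" >&2; return 2 ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"     # print the command instead of running it
    else
        "$@"
    fi
}
```

For example, `unraid_xfs_repair 6` runs the read-only check, and `unraid_xfs_repair 6 repair` performs the repair. As the ALERT in the -n output warns, -L throws away the unreplayed log. Use it as the fallback when the filesystem cannot be mounted to replay the log.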
Nodiaque (author), posted December 10, 2023:
OK, that's what I'm doing right now. One disk had more errors.

I initiated the shutdown because I woke up this morning to read errors on two drives, and I wasn't able to run SMART on one of them. I then saw that the syslog daemon hadn't logged anything for about a week (this happens randomly), so I restarted the daemon and started getting weird errors. When I triggered a shutdown from the GUI, nothing happened. So I stopped all Docker containers and VMs and pressed the power button (just a press, not a hold). That initiated a shutdown, and the server shut down "properly".

The unmountable disks aren't the ones with the read errors (well, one is). I'm going to check the other drives while I'm in maintenance mode, just for good measure. These are the errors I had in the log this morning:

Dec 10 09:03:40 ServRaid kernel: sd 7:0:0:0: [sdh] tag#17 CDB: opcode=0x88 88 00 00 00 00 01 80 43 18 20 00 00 01 00 00 00
Dec 10 09:03:40 ServRaid kernel: I/O error, dev sdh, sector 6446848032 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 2
Dec 10 09:03:40 ServRaid kernel: ata6: EH complete
Dec 10 09:03:40 ServRaid kernel: ata3.00: exception Emask 0x50 SAct 0x2062000 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:40 ServRaid kernel: ata3.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:40 ServRaid kernel: ata3: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:68:e8:c5:ad/01:00:27:05:00/40 tag 13 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel: res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:88:60:be:13/01:00:af:05:00/40 tag 17 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel: res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:90:60:bf:13/01:00:af:05:00/40 tag 18 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel: res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:40 ServRaid kernel: ata3.00: cmd 60/00:c8:e8:c6:ad/01:00:27:05:00/40 tag 25 ncq dma 131072 in
Dec 10 09:03:40 ServRaid kernel: res 40/00:90:60:bf:13/00:00:af:05:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:40 ServRaid kernel: ata3.00: status: { DRDY }
Dec 10 09:03:40 ServRaid kernel: ata3: hard resetting link
Dec 10 09:03:41 ServRaid kernel: ata5: COMRESET failed (errno=-16)
Dec 10 09:03:41 ServRaid kernel: ata5: hard resetting link
Dec 10 09:03:46 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:46 ServRaid kernel: ata5: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:48 ServRaid kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 10 09:03:49 ServRaid kernel: ata5.00: configured for UDMA/33
Dec 10 09:03:49 ServRaid kernel: ata5: EH complete
Dec 10 09:03:49 ServRaid kernel: ata6.00: exception Emask 0x50 SAct 0x20000 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:49 ServRaid kernel: ata6.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:49 ServRaid kernel: ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:49 ServRaid kernel: ata6.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:49 ServRaid kernel: ata6.00: cmd 60/00:88:78:1d:91/01:00:2a:00:00/40 tag 17 ncq dma 131072 in
Dec 10 09:03:49 ServRaid kernel: res 40/00:88:78:1d:91/00:00:2a:00:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:49 ServRaid kernel: ata6.00: status: { DRDY }
Dec 10 09:03:49 ServRaid kernel: ata6: hard resetting link
Dec 10 09:03:50 ServRaid kernel: ata3: COMRESET failed (errno=-16)
Dec 10 09:03:50 ServRaid kernel: ata3: hard resetting link
Dec 10 09:03:55 ServRaid kernel: ata6: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:56 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 10 09:03:58 ServRaid kernel: ata5.00: exception Emask 0x50 SAct 0xc00 SErr 0x4090800 action 0xe frozen
Dec 10 09:03:58 ServRaid kernel: ata5.00: irq_stat 0x00400040, connection status changed
Dec 10 09:03:58 ServRaid kernel: ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
Dec 10 09:03:58 ServRaid kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:58 ServRaid kernel: ata5.00: cmd 60/00:50:c0:94:af/01:00:a3:06:00/40 tag 10 ncq dma 131072 in
Dec 10 09:03:58 ServRaid kernel: res 40/00:58:c0:95:af/00:00:a3:06:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:58 ServRaid kernel: ata5.00: status: { DRDY }
Dec 10 09:03:58 ServRaid kernel: ata5.00: failed command: READ FPDMA QUEUED
Dec 10 09:03:58 ServRaid kernel: ata5.00: cmd 60/00:58:c0:95:af/01:00:a3:06:00/40 tag 11 ncq dma 131072 in
Dec 10 09:03:58 ServRaid kernel: res 40/00:58:c0:95:af/00:00:a3:06:00/40 Emask 0x50 (ATA bus error)
Dec 10 09:03:58 ServRaid kernel: ata5.00: status: { DRDY }
Dec 10 09:03:58 ServRaid kernel: ata5: hard resetting link
Dec 10 09:03:59 ServRaid kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 10 09:03:59 ServRaid kernel: ata3.00: configured for UDMA/33
Dec 10 09:03:59 ServRaid kernel: ata3: EH complete
Dec 10 09:03:59 ServRaid kernel: ata6: COMRESET failed (errno=-16)
Dec 10 09:03:59 ServRaid kernel: ata6: hard resetting link
itimpi, posted December 10, 2023:
That syslog snippet looks like what we normally see when there is a cabling or power issue with the drive.
Nodiaque (author), posted December 10, 2023:
Yeah, after a restart it seemed fixed, but it just started happening again. I'm wondering if my backplane is dead. All the drives showing this behavior are in one of these: https://www.amazon.ca/dp/B00OUSU8MI?ref=ppx_yo2ov_dt_b_product_details&th=1
itimpi, posted December 10, 2023:
Do you have 1 or 2 power connectors attached? You may well need 2 (ideally from different PSU leads) to get reliable operation.
Nodiaque (author), posted December 10, 2023:
Two are connected, but they come off a Y splitter from another connector. Stupid Dell motherboard that only gives 4 power plugs for 12 SATA ports...

Right now I don't see any more errors, but I do see that two of the ATA links are at 3.0 Gbps while one is at 6 Gbps. Sometime next week I'll swap all of the SATA cables; I have new ones in the box that I never used.

It's true, though, that the problems started when I got this box. Before that, I had all of the hard drives (well, only 3) connected to separate power cables in an old HDD cage that I salvaged from another PC. I thought this box was "better", but I might be wrong.
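The negotiated speed of each port can be read back from the kernel log. That is also where the "SATA link up 1.5 Gbps" lines in the earlier snippet come from. This small sketch assumes the standard libata message format shown in those logs; `link_speeds` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently negotiated SATA link speed
# for each ataN port, parsed from kernel log lines such as:
#   ... kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
link_speeds() {
    sed -n 's/.*\(ata[0-9][0-9]*\): SATA link up \([0-9.]*\) Gbps.*/\1 \2/p' |
        awk '{ speed[$1] = $2 } END { for (p in speed) print p, speed[p] }' |
        sort
}

# Usage: dmesg | link_speeds      (or: link_speeds < /var/log/syslog)
```

For a single drive, `smartctl -i /dev/sdX` also reports the current speed on its "SATA Version is:" line. A port that keeps coming back at 1.5 Gbps after resets, as ata3 and ata5 do above, is consistent with a marginal cable or backplane connection.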
Nodiaque (author), posted December 11, 2023:
I'm pretty sure it's the external box I bought. All the drives having issues are in it. Now, if I want to take them out and connect them directly to SATA, do I just reassign them while the array is stopped? They're hot-swappable, so I can swap them one by one. Since their device IDs change when I move them, I'll move them one at a time so I don't mix them up.
Nodiaque (author), posted December 11, 2023 (edited):
I don't know what happened; I now have a red X on a drive.

Edit: I ran xfs_repair, and I can manually mount the drive from the shell with no problem.

[Attachment: servraid-diagnostics-20231211-0940.zip]

New edit: OK, so I followed what I found here: stop the array, remove the device, start the array, stop the array, add the device, start the array. I see a rebuild has started. My parity was checked two weeks ago, so I'm not too concerned about it. I just hope the external enclosure won't cause any problems during the rebuild, so that I can then move all those drives out of the damned enclosure.

Next edit: Damn, I'm getting the same link-down errors again... Can I stop the rebuild, take the hard drive out of the cage, and assign it back for a new rebuild? Or am I doomed since a rebuild has already started?

(Edited December 11, 2023 by Nodiaque)
JorgeB, posted December 11, 2023:
Nodiaque said: "Next edit: Damn, I'm getting the same link-down errors again... Can I stop the rebuild, take the hard drive out of the cage, and assign it back for a new rebuild? Or am I doomed since a rebuild has already started?"

It's OK to cancel and start over, as long as the cage doesn't interfere with the drive, as some USB cages do.
Nodiaque (author), posted December 11, 2023:
OK, well, in fact the cage is the problem. I've removed the drive from it, but the rebuild needs two of the other drives that are still in the cage. Syslog is spammed nonstop with:

Dec 11 10:54:12 ServRaid kernel: ata3.00: exception Emask 0x50 SAct 0xff0000 SErr 0x4890800 action 0xe frozen
Dec 11 10:54:12 ServRaid kernel: ata3.00: irq_stat 0x0d400040, interface fatal error, connection status changed
Dec 11 10:54:12 ServRaid kernel: ata3: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:80:e0:5e:86/05:00:13:00:00/40 tag 16 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:88:20:64:86/05:00:13:00:00/40 tag 17 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:90:60:69:86/05:00:13:00:00/40 tag 18 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:98:a0:6e:86/05:00:13:00:00/40 tag 19 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:a0:e0:73:86/05:00:13:00:00/40 tag 20 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/78:a8:20:79:86/00:00:13:00:00/40 tag 21 ncq dma 61440 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/40:b0:98:79:86/05:00:13:00:00/40 tag 22 ncq dma 688128 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3.00: failed command: READ FPDMA QUEUED
Dec 11 10:54:12 ServRaid kernel: ata3.00: cmd 60/08:b8:d8:7e:86/00:00:13:00:00/40 tag 23 ncq dma 4096 in
Dec 11 10:54:12 ServRaid kernel: res 40/00:a8:20:79:86/00:00:13:00:00/40 Emask 0x50 (ATA bus error)
Dec 11 10:54:12 ServRaid kernel: ata3.00: status: { DRDY }
Dec 11 10:54:12 ServRaid kernel: ata3: hard resetting link
Dec 11 10:54:18 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 11 10:54:22 ServRaid kernel: ata3: COMRESET failed (errno=-16)
Dec 11 10:54:22 ServRaid kernel: ata3: hard resetting link
Dec 11 10:54:28 ServRaid kernel: ata3: link is slow to respond, please be patient (ready=0)
Dec 11 10:54:31 ServRaid kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 11 10:54:31 ServRaid kernel: ata3.00: configured for UDMA/33
Dec 11 10:54:31 ServRaid kernel: ata3: EH complete

Is there something I can do to identify which drive is ata3? If only I could remove all the HDDs from the cage and then begin the rebuild on that one drive... Right now it's estimating 6 days for the rebuild at 16 MB/s.
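One way to see which port produces most of this spam is to count link resets per port. This sketch assumes two things: the syslog lives at /var/log/syslog, the usual location on Unraid, and it uses the message format shown above. `count_link_resets` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count "hard resetting link" events per ATA port in a
# kernel log and print "ataN COUNT", busiest port first.
count_link_resets() {
    grep -o 'ata[0-9][0-9]*: hard resetting link' |
        cut -d: -f1 |                 # keep just the port name, e.g. ata3
        sort | uniq -c | sort -rn |   # count per port, highest count first
        awk '{ print $2, $1 }'        # reorder to "port count"
}

# Usage: count_link_resets < /var/log/syslog
```

Repeated resets concentrated on one port usually point at that port's cable, backplane slot, or power rather than at the data on the disk. The port still has to be mapped to a drive.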
JorgeB, posted December 11, 2023:
Nodiaque said: "Is there something I can do to ID who is ata3?"

Is that drive inside the cage? If yes, remove it; if not, replace the cables.
Nodiaque (author), posted December 11, 2023 (edited):
Yes, I'm pretty sure it's one of the drives inside the cage, but I don't know which one is ata3. Before that I also had ata5 errors, so the drive I removed from the cage (the one I'm rebuilding) is probably ata5.

(Edited December 11, 2023 by Nodiaque)
JorgeB, posted December 11, 2023:
Nodiaque said: "but I don't know who is ATA3?"

You'd need to post the diagnostics for us to be able to tell.
Nodiaque (author), posted December 11, 2023:
Oh, I thought there was a command or something I could easily check. Here are the diagnostics:

[Attachment: servraid-diagnostics-20231211-1228.zip]
JorgeB, posted December 11, 2023:
ATA3 is currently disk6.
Nodiaque (author), posted December 11, 2023:
Ah, yes, it's in the cage. I guess I have to wait for the rebuild to finish before moving another drive.
Nodiaque (author), posted December 11, 2023:
Quick question: where can I see that in the logs?
JorgeB, posted December 11, 2023:
lsscsi.txt maps the ATA# to the sdX identifier.
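On a running system, the same ataN to sdX mapping can usually be read from sysfs, because each libata disk's device path contains its ataN port. This minimal sketch relies on that assumption. Disks on SAS HBAs or behind USB bridges have no ataN component and are reported as "none". `ata_of_path` and `map_ata_ports` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers: print "sdX ataN" for each sd* block device.
# Assumption: libata disks resolve to .../ataN/hostM/.../block/sdX in sysfs.
ata_of_path() {
    # Extract the "ataN" path component, or print nothing if there is none
    echo "$1" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}

map_ata_ports() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue   # glob matched nothing
        port=$(ata_of_path "$(readlink -f "$dev")")
        echo "$(basename "$dev") ${port:-none}"
    done
}

# Usage: map_ata_ports, then match sdX to the disk slot shown in the Unraid GUI
```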
Nodiaque (author), posted December 11, 2023:
Oh, nice, thanks. I'll keep an eye on that!
Nodiaque (author), posted December 11, 2023:
To move the HDDs out of the cage and connect them directly, I need to reassign them. Since a rebuild is in progress, I can't stop the rebuild and reassign all the drives before it finishes, right?
itimpi, posted December 11, 2023:
Nodiaque said: "To remove the HDD from the cage and connect them directly, I need to reassign them. Since I have a rebuild in progress, I cannot stop the rebuild, reassign all drive before the rebuld is finish right?"

You should not need to reassign them, as Unraid recognises drives by their serial number rather than by where they are connected.
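To check that a drive keeps the same identity after being moved out of the cage, you can compare the serial numbers that udev exposes before and after the move. This sketch makes two assumptions. The links follow the standard /dev/disk/by-id/ata-<model>_<serial> naming. The serial is the text after the last underscore, which holds for typical SATA serials but is not guaranteed. Drives behind a USB bridge may appear under usb-* names instead, sometimes with a different serial, which is one way a cage can interfere. `serial_from_id` and `list_disk_serials` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers: list "sdX SERIAL" for SATA disks via udev by-id links.
# Assumption: link names look like ata-<model>_<serial>.
serial_from_id() {
    name=${1##*/}        # drop any leading directory path
    echo "${name##*_}"   # keep the text after the last underscore
}

list_disk_serials() {
    for link in /dev/disk/by-id/ata-*; do
        [ -e "$link" ] || continue                    # glob matched nothing
        case $link in *-part[0-9]*) continue ;; esac  # skip partition links
        echo "$(basename "$(readlink -f "$link")") $(serial_from_id "$link")"
    done
}

# Usage: list_disk_serials   (run before and after moving a drive, then compare)
```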
Nodiaque (author), posted December 11, 2023:
Hmm, that's not what happened earlier when I tried it. I'm afraid of getting another "disconnected drive". I'll wait for the rebuild to finish, even if it takes several days because of the disconnections on the drives being read... Safer this way.