caplam (Author), posted March 7, 2021:
OK, thank you for following up on my steps. The rescue is currently running: 5% rescued and counting.
Quote: "No, you'd need to do a new config and re-sync parity"
After the re-sync, can I add the rescued files from the UD device?
JorgeB, posted March 7, 2021:
It's not clear to me whether you plan on using the new cloned disk as disk3. If yes, you'd do a new config with it and re-sync parity; all data on there will be part of the array. If you plan to use another disk for disk3, then you'd need to replace and rebuild, then format (since it would still be unmountable), and then copy the data from the cloned disk to the array using UD.
caplam (Author), posted March 7, 2021:
I had planned on the second option, but I was not aware of the first, which seems simpler. But I think I have to check file integrity on the cloned disk before syncing parity.
caplam (Author), posted March 8, 2021:
ddrescue went well and I recovered all files. I did a new config with the new disk3. Parity sync started, but disk4 was disabled. So I guess I have to stop the array to replace disk4. I have no spare 6 TB disk.
caplam (Author), posted March 8, 2021:
I'm starting to think the trouble is with the hot-plug enclosure that holds disks 2, 3 and 4.
caplam (Author), posted March 8, 2021:
When I run xfs_repair -n on disk4 I get an input/output error.
caplam (Author), posted March 8, 2021:
I'm confused. I replaced the SATA cable and changed the SATA port, and I still get an I/O error. So I changed the enclosure for disk4, and I still get an I/O error when doing a check with -n from the GUI. So in a terminal I ran xfs_repair -n on disk4 in the external enclosure and it's running, but with the GUI I get an I/O error. I have good hope for disk4, as the internal enclosure it was originally in had a problem: when I pulled disk4, the enclosure's latch was broken and the disk was not held in place as it should be.
caplam (Author), posted March 8, 2021:
The UDMA CRC error count is at 15 on disk4.
Attachment: godzilla-diagnostics-20210308-1202.zip
JorgeB, posted March 8, 2021:
Disk looks fine, but since both parity drives are invalid it can't be emulated. If you already replaced the cables, do another new config to enable all the disks and start a new parity sync.
caplam (Author), posted March 8, 2021:
Thank you 😃 Parity sync has started, but it's very slow: 20 MB/s.
Edit: my bad, the Docker service had started and that was slowing down the process. Now it runs at 150 MB/s. I now also have UDMA CRC errors on the cache pools (both the VM and Docker pools) 🤥 The controller on the mainboard could also have a problem.
Edited March 8, 2021 by caplam
JorgeB, posted March 8, 2021:
CRC errors don't reset, but if they keep increasing there's still a problem.
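Editor's note: the point about CRC counters can be checked mechanically. The sketch below is only an illustration, not something posted in the thread. It assumes the usual `smartctl -A` table layout, where attribute 199 (UDMA_CRC_Error_Count) carries its raw value in the last column. The sample row is hypothetical apart from the raw value 15 reported above, and the function names are made up for this example.

```python
def parse_crc_count(smart_attributes_text):
    """Return the raw UDMA CRC error count from `smartctl -A`-style text, or None."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 199 is UDMA_CRC_Error_Count.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[-1])  # raw value is the last column
    return None


def crc_still_increasing(readings):
    """True if the counter rose between any two successive readings."""
    return any(later > earlier for earlier, later in zip(readings, readings[1:]))


# Hypothetical attribute row; only the raw value 15 comes from the thread.
SAMPLE = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       15\n"
)

if __name__ == "__main__":
    print(parse_crc_count(SAMPLE))
    print(crc_still_increasing([15, 15, 15]))  # counter is stable
    print(crc_still_increasing([15, 15, 18]))  # counter is still climbing
```

A stable non-zero count is just history from an earlier bad connection. A count that climbs between readings points to a cable, port or enclosure problem that is still present.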
caplam (Author), posted March 8, 2021:
I have this multiple times in the syslog. It keeps coming up. How can I know which controller/port is involved?
Mar 8 12:27:32 godzilla kernel: ata2: SError: { HostInt 10B8B LinkSeq }
Mar 8 12:27:32 godzilla kernel: ata2.00: failed command: WRITE DMA EXT
Mar 8 12:27:32 godzilla kernel: ata2.00: cmd 35/00:60:f8:c3:4d/00:00:1a:00:00/e0 tag 30 dma 49152 out
Mar 8 12:27:32 godzilla kernel: res 50/00:00:17:85:4d/00:00:1a:00:00/e0 Emask 0x50 (ATA bus error)
Mar 8 12:27:32 godzilla kernel: ata2.00: status: { DRDY }
Mar 8 12:27:32 godzilla kernel: ata2: hard resetting link
Mar 8 12:27:32 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 8 12:27:32 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 12:27:32 godzilla kernel: ata2: EH complete
Mar 8 12:29:00 godzilla kernel: ata2.00: exception Emask 0x40 SAct 0x0 SErr 0x880800 action 0x6
Mar 8 12:29:00 godzilla kernel: ata2.00: irq_stat 0x40000001
Mar 8 12:29:00 godzilla kernel: ata2: SError: { HostInt 10B8B LinkSeq }
Mar 8 12:29:00 godzilla kernel: ata2.00: failed command: FLUSH CACHE EXT
Mar 8 12:29:00 godzilla kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 10
Mar 8 12:29:00 godzilla kernel: res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x41 (internal error)
Mar 8 12:29:00 godzilla kernel: ata2.00: status: { DRDY ERR }
Mar 8 12:29:00 godzilla kernel: ata2.00: error: { ABRT }
Mar 8 12:29:00 godzilla kernel: ata2: hard resetting link
Mar 8 12:29:00 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 8 12:29:00 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 12:29:00 godzilla kernel: ata2.00: device reported invalid CHS sector 0
Mar 8 12:29:00 godzilla kernel: ata2: EH complete
Mar 8 12:29:16 godzilla kernel: ata2.00: exception Emask 0x40 SAct 0x0 SErr 0x880800 action 0x6
Mar 8 12:29:16 godzilla kernel: ata2.00: irq_stat 0x40000001
Mar 8 12:29:16 godzilla kernel: ata2: SError: { HostInt 10B8B LinkSeq }
Mar 8 12:29:16 godzilla kernel: ata2.00: failed command: FLUSH CACHE EXT
Mar 8 12:29:16 godzilla kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 22
Mar 8 12:29:16 godzilla kernel: res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x41 (internal error)
Mar 8 12:29:16 godzilla kernel: ata2.00: status: { DRDY ERR }
Mar 8 12:29:16 godzilla kernel: ata2.00: error: { ABRT }
Mar 8 12:29:16 godzilla kernel: ata2: hard resetting link
Mar 8 12:29:16 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 8 12:29:16 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 12:29:16 godzilla kernel: ata2.00: device reported invalid CHS sector 0
Mar 8 12:29:16 godzilla kernel: ata2: EH complete
Mar 8 12:29:46 godzilla kernel: ata2.00: exception Emask 0x40 SAct 0x0 SErr 0x880800 action 0x6
Mar 8 12:29:46 godzilla kernel: ata2.00: irq_stat 0x40000001
Mar 8 12:29:46 godzilla kernel: ata2: SError: { HostInt 10B8B LinkSeq }
Mar 8 12:29:46 godzilla kernel: ata2.00: failed command: WRITE DMA
Mar 8 12:29:46 godzilla kernel: ata2.00: cmd ca/00:08:98:05:7b/00:00:00:00:00/eb tag 18 dma 4096 out
Mar 8 12:29:46 godzilla kernel: res 51/04:08:98:05:7b/00:00:0b:00:00/eb Emask 0x41 (internal error)
Mar 8 12:29:46 godzilla kernel: ata2.00: status: { DRDY ERR }
Mar 8 12:29:46 godzilla kernel: ata2.00: error: { ABRT }
Mar 8 12:29:46 godzilla kernel: ata2: hard resetting link
Mar 8 12:29:46 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 8 12:29:46 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 12:29:46 godzilla kernel: ata2: EH complete
Mar 8 12:30:02 godzilla kernel: ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x880800 action 0x6 frozen
Mar 8 12:30:02 godzilla kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Mar 8 12:30:02 godzilla kernel: ata2: SError: { HostInt 10B8B LinkSeq }
Mar 8 12:30:02 godzilla kernel: ata2.00: failed command: WRITE DMA EXT
Mar 8 12:30:02 godzilla kernel: ata2.00: cmd 35/00:08:f8:37:a9/00:00:36:00:00/e0 tag 20 dma 4096 out
Mar 8 12:30:02 godzilla kernel: res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x50 (ATA bus error)
Mar 8 12:30:02 godzilla kernel: ata2.00: status: { DRDY }
Mar 8 12:30:02 godzilla kernel: ata2: hard resetting link
Mar 8 12:30:02 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 8 12:30:02 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 12:30:02 godzilla kernel: ata2: EH complete
Edit: found it. It's the port used for the M.2 SATA SSD of the VM pool. I probably "moved" it when reinstalling the CPU2 daughter card (the server is a Z620 with two CPUs).
Edited March 8, 2021 by caplam
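Editor's note: one way to approach the "which controller/port" question is sketched below. It is a hedged illustration, not a method given in the thread. It counts `failed command` lines per `ataN` port in syslog text, then maps a block device to its port. That mapping assumes the resolved `/sys/block/<dev>` path contains an `ataN` component, which is typical for libata-attached disks on Linux but not guaranteed for every controller. All function names and the PCI address in the sample path are hypothetical.

```python
import os
import re
from collections import Counter

# Matches e.g. "kernel: ata2.00: failed command: WRITE DMA EXT"
FAILED_CMD_RE = re.compile(r"\bkernel: (ata\d+)(?:\.\d+)?: failed command")


def count_failed_commands(syslog_text):
    """Count 'failed command' lines per ATA port (ata1, ata2, ...)."""
    return Counter(m.group(1) for m in FAILED_CMD_RE.finditer(syslog_text))


def ata_port_from_sysfs_path(path):
    """Extract the ataN component from a resolved sysfs device path, if present."""
    match = re.search(r"/(ata\d+)/", path)
    return match.group(1) if match else None


def ata_port_for_device(dev):
    """Resolve /sys/block/<dev> and return its ATA port name, or None (Linux only)."""
    return ata_port_from_sysfs_path(os.path.realpath(f"/sys/block/{dev}"))


# Two log lines copied from the post above.
SAMPLE_LOG = (
    "Mar 8 12:27:32 godzilla kernel: ata2.00: failed command: WRITE DMA EXT\n"
    "Mar 8 12:29:00 godzilla kernel: ata2.00: failed command: FLUSH CACHE EXT\n"
)
# Hypothetical sysfs path; the PCI address is made up for illustration.
SAMPLE_PATH = "/sys/devices/pci0000:00/0000:00:1f.2/ata2/host3/target3:0:0/3:0:0:0/block/sdd"

if __name__ == "__main__":
    print(count_failed_commands(SAMPLE_LOG))
    print(ata_port_from_sysfs_path(SAMPLE_PATH))
```

On a live system, calling `ata_port_for_device` for each disk and comparing the result with the port named in the errors should point to the affected drive, provided the sysfs assumption holds.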
caplam (Author), posted March 8, 2021:
That's it. The SATA connector was almost unplugged!!!
Mar 8 13:40:10 godzilla kernel: ata2.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen
Mar 8 13:40:10 godzilla kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Mar 8 13:40:10 godzilla kernel: ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Mar 8 13:40:10 godzilla kernel: ata2.00: failed command: READ DMA EXT
Mar 8 13:40:10 godzilla kernel: ata2.00: cmd 25/00:00:08:1b:b8/00:01:36:00:00/e0 tag 28 dma 131072 in
Mar 8 13:40:10 godzilla kernel: res 50/00:00:8f:20:b8/00:00:36:00:00/e0 Emask 0x50 (ATA bus error)
Mar 8 13:40:10 godzilla kernel: ata2.00: status: { DRDY }
Mar 8 13:40:10 godzilla kernel: ata2: hard resetting link
Mar 8 13:40:10 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 8 13:40:10 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 13:40:10 godzilla kernel: sd 3:0:0:0: [sdd] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Mar 8 13:40:10 godzilla kernel: sd 3:0:0:0: [sdd] tag#28 Sense Key : 0x5 [current]
Mar 8 13:40:10 godzilla kernel: sd 3:0:0:0: [sdd] tag#28 ASC=0x21 ASCQ=0x4
Mar 8 13:40:10 godzilla kernel: sd 3:0:0:0: [sdd] tag#28 CDB: opcode=0x28 28 00 36 b8 1b 08 00 01 00 00
Mar 8 13:40:10 godzilla kernel: blk_update_request: I/O error, dev sdd, sector 918035208 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
Mar 8 13:40:10 godzilla kernel: ata2: EH complete
Mar 8 13:40:14 godzilla kernel: ata2: SATA link down (SStatus 0 SControl 300)
Mar 8 13:40:20 godzilla kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar 8 13:40:20 godzilla kernel: ata2.00: configured for UDMA/33
Mar 8 13:40:21 godzilla root: Total Spundown: 0
It's running OK now. I feel like a black cat. 🙃
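Editor's note: in the log above, the link on ata2 came up at 1.5 Gbps while the connector was loose and at 6.0 Gbps once it was reseated. As a rough illustration (not a tool used in the thread), the sketch below collects the negotiated speed from each `SATA link up` line per port, so a link that keeps coming up at a reduced speed stands out. The function name is hypothetical.

```python
import re

# Matches e.g. "kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)"
LINK_UP_RE = re.compile(r"kernel: (ata\d+): SATA link up ([\d.]+) Gbps")


def link_speeds(syslog_text):
    """Return {port: [speeds in Gbps, in log order]} for every 'SATA link up' line."""
    speeds = {}
    for port, gbps in LINK_UP_RE.findall(syslog_text):
        speeds.setdefault(port, []).append(float(gbps))
    return speeds


# Two lines copied from the post above: before and after reseating the connector.
SAMPLE_LOG = (
    "Mar 8 13:40:10 godzilla kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)\n"
    "Mar 8 13:40:20 godzilla kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)\n"
)

if __name__ == "__main__":
    for port, history in link_speeds(SAMPLE_LOG).items():
        print(port, history)
```

A port that repeatedly comes up below the speed the drive and controller support, together with hard resets, is consistent with a marginal physical connection. It is not proof on its own.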
caplam (Author), posted March 9, 2021:
Good news: the server is now running fine and the parity sync has finished. I'll run preclear on the disks that were in use to see whether they are really failing. Thank you @JorgeB for your help 😃
Edited March 9, 2021 by caplam