knarf0007 Posted February 9, 2022

Hi all,

I have a single cache SSD in my UnRaid setup where all my Docker applications run. Over the last couple of months, sectors on this drive have been reallocated at random intervals. Last night I tried to transfer all data to a "classical" data disk to save it. The Mover transferred only part of the data and then stopped with these error messages:

Feb 9 08:40:09 UnRaid-Server kernel: sd 7:0:2:0: [sdj] tag#195 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Feb 9 08:40:09 UnRaid-Server kernel: sd 7:0:2:0: [sdj] tag#195 CDB: opcode=0x28 28 00 03 25 e7 a0 00 00 20 00
Feb 9 08:40:09 UnRaid-Server kernel: blk_update_request: I/O error, dev sdj, sector 52815776 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Feb 9 08:40:09 UnRaid-Server kernel: BTRFS error (device dm-11): bdev /dev/mapper/sdj1 errs: wr 37, rd 260090, flush 0, corrupt 0, gen 0

I'm having trouble interpreting these errors. Can you help?

Thank you very much!
Best regards
Frank

unraid-server-diagnostics-20220209-0854.zip
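As a reading aid for the kernel messages quoted above, here is a minimal Python sketch that decodes two of them: the SCSI command descriptor block (CDB) and the BTRFS per-device error counters. The function names `decode_read10` and `parse_btrfs_counters` are made up for this illustration. The sketch assumes the standard 10-byte READ(10) layout (opcode 0x28, big-endian logical block address in bytes 2 to 5, transfer length in bytes 7 to 8) and the counter format exactly as it appears in the log line.

```python
import re


def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks to transfer, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def parse_btrfs_counters(line: str) -> dict:
    """Extract the cumulative BTRFS device error counters from a log line."""
    match = re.search(
        r"errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)", line
    )
    if match is None:
        raise ValueError("no BTRFS error counters found")
    keys = ("write", "read", "flush", "corrupt", "generation")
    return dict(zip(keys, map(int, match.groups())))


if __name__ == "__main__":
    print(decode_read10("28 00 03 25 e7 a0 00 00 20 00"))
    print(parse_btrfs_counters(
        "BTRFS error (device dm-11): bdev /dev/mapper/sdj1 "
        "errs: wr 37, rd 260090, flush 0, corrupt 0, gen 0"
    ))
```

The decoded block address is 52815776, the same sector the `blk_update_request` line reports, so both messages describe one failed 32-block read on `sdj`. The counters show 260090 read errors and 37 write errors recorded against the device, and no checksum (`corrupt`) or generation errors.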
JorgeB Posted February 9, 2022

It could be a device problem, but the SASLP driver crashed, which is one of the reasons those controllers are not recommended. Connect that SSD to an Intel SATA port (swap it with another device if needed) and try again. Post new diags if it fails again.
knarf0007 Posted February 10, 2022

Hi all,

it's getting worse. My attempt to copy some data from my cache_VM drive to a data disk using the Mover ended in a lot of write errors on the data disk, which is now disabled. Is it possible that my 8-port SATA extension card is somehow damaged? As a precaution, I stopped the array. I'm afraid I could really lose data now.

Best regards
Frank

unraid-server-diagnostics-20220210-1947.zip

Edited February 10, 2022 by knarf0007
JorgeB Posted February 10, 2022

The diags were taken after rebooting, so we can't see what happened. Those controllers have not been recommended for a long time; one of the reasons is that they tend to drop devices for no reason. Also, contrary to what I suggested, the SSD is still connected there.
JorgeB Posted February 10, 2022

Forgot to mention: the disabled disk looks healthy, so this is likely not a disk problem. Most likely it is the controller or a power/cable issue.
knarf0007 Posted February 10, 2022

I have 4 SSDs in my setup. I switched all of them to the onboard Intel SATA controller. Everything looks fine for the moment. Nevertheless, I find messages like these in the log, repeated several times with the same timestamp:

Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Metadata corruption detected at xfs_buf_ioend+0x51/0x284 [xfs], xfs_inode block 0xe91a8fb0 xfs_inode_buf_verify
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Unmount and run xfs_repair
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): First 128 bytes of corrupted metadata buffer:
Feb 10 21:28:31 UnRaid-Server kernel: 00000000: dd ec fd 51 64 2e 70 e1 bd 28 2c 85 d6 fb eb 4c ...Qd.p..(,....L
Feb 10 21:28:31 UnRaid-Server kernel: 00000010: 2e 67 21 fe 25 1e 69 f8 ed d0 e2 7c fa 6f 55 ce .g!.%.i....|.oU.
Feb 10 21:28:31 UnRaid-Server kernel: 00000020: af ea 17 e8 de eb 9b f1 a1 e6 36 91 25 58 2f 7b ..........6.%X/{
Feb 10 21:28:31 UnRaid-Server kernel: 00000030: 02 5e 02 e4 f8 82 17 3d 2d 3c 6d d6 c5 0e 0c 31 .^.....=-<m....1
Feb 10 21:28:31 UnRaid-Server kernel: 00000040: db 77 59 bb 85 75 f3 81 fe 75 bd 9c fb 2f b8 55 .wY..u...u.../.U
Feb 10 21:28:31 UnRaid-Server kernel: 00000050: b9 07 a0 e4 32 7c 77 aa b4 a8 25 24 68 19 9c 6d ....2|w...%$h..m
Feb 10 21:28:31 UnRaid-Server kernel: 00000060: 55 79 86 07 a2 49 ff fd 6c d0 87 57 d1 6b 79 61 Uy...I..l..W.kya
Feb 10 21:28:31 UnRaid-Server kernel: 00000070: 1b f3 23 a3 b0 0d 1f 4b e7 d6 8f 9a be b2 a8 bd ..#....K........

I can't see which disk has the problem. Should I stop the array and run xfs_repair? And how do I revive disk 9?

Thanks a lot!
Frank

unraid-server-diagnostics-20220210-2136.zip
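The XFS messages quoted above do name the affected device: it is the identifier in parentheses after `XFS`, here `dm-6`. A small Python sketch that tallies corruption reports per device from a set of kernel log lines follows; the function name `corrupted_devices` and the two-line sample are illustrative only. Translating a `dm-N` name into an Unraid disk slot depends on the system; on Linux the entries under `/dev/mapper` are typically symlinks to the `dm-N` nodes, which is one place to look.

```python
import re
from collections import Counter

# Matches e.g. "XFS (dm-6): Metadata corruption detected at ..."
XFS_CORRUPTION = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Metadata corruption detected"
)


def corrupted_devices(log_lines):
    """Count XFS metadata-corruption reports per block device name."""
    counts = Counter()
    for line in log_lines:
        match = XFS_CORRUPTION.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Metadata corruption "
        "detected at xfs_buf_ioend+0x51/0x284 [xfs], xfs_inode block "
        "0xe91a8fb0 xfs_inode_buf_verify",
        "Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Unmount and run xfs_repair",
    ]
    for device, count in corrupted_devices(sample).items():
        print(device, count)
```

Only the "Metadata corruption detected" lines are counted, so the follow-up "Unmount and run xfs_repair" and hex dump lines do not inflate the tally.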
knarf0007 Posted February 10, 2022

Sorry, ignore the question about how to revive disk 9; I found the procedure. But is it wise to start a rebuild as long as I'm not sure that everything is OK with the controller?

Edited February 10, 2022 by knarf0007
JorgeB Posted February 11, 2022

dm-6 is disk9; check the filesystem.

10 hours ago, knarf0007 said: But is it wise to start a rebuild as long as I'm not sure that everything is OK with the controller?

After checking the filesystem you can attempt the rebuild, but with that controller I would recommend rebuilding onto a spare disk, so you still have the old one if something goes wrong.
knarf0007 Posted February 11, 2022

Hi,

the rebuild onto a spare disk is now running. So far everything works fine. During the drive swap I checked all cable connections; maybe there was a small fault there. Additionally, to be on the safe side, I ordered one of the recommended LSI SAS controllers. However, I doubt that the old controller type has a fundamental problem, because it has been working absolutely flawlessly for years, with hard drives and SSDs. But anyway, if I can further reduce the risk of failure of my "beloved" UnRaid server with a controller change, then I'm happy to do so. 😉

Best regards
Frank
knarf0007 Posted February 13, 2022

Hi all,

sorry, I'm back. Something is still wrong. Every time I try to copy all the data from my Docker SSD, I get error messages and the copy process stops. I changed the controller for the SSDs (they are now all on the internal Intel controller), but I still get these strange error messages (11:30 am).

Why do I want to copy all data from the Docker SSD? Because I have two single SSDs, one for all Docker containers and one for all VMs. I want to create a cache pool instead (for redundancy) and run the Docker containers and VMs from that pool.

Any ideas? Thanks!

Best regards
Frank

unraid-server-diagnostics-20220213-1145.zip
JorgeB Posted February 13, 2022

The cache device appears to be failing; sorry, I forgot to mention that before. The reason I asked you to swap controllers is that a failing device shouldn't drop and crash the controller, as happens with the mvsas driver. You can run an extended SMART test to confirm. If it is failing, try to copy everything you can, either manually or with a tool such as ddrescue.
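For reference, the two steps suggested above (an extended SMART self-test, then a rescue copy) could look roughly like the commands this Python sketch assembles. It only builds and prints the command lines; nothing is executed. The device path `/dev/sdX` and the image and mapfile paths are placeholders chosen for this example, not values from the thread, and must be replaced with the real ones. The invocations used are `smartctl -t long` (start an extended self-test), `smartctl -a` (print the full SMART report, including self-test results) and the basic `ddrescue INFILE OUTFILE MAPFILE` form.

```python
import shlex


def extended_smart_test_cmd(device: str) -> list:
    """Command that starts an extended (long) SMART self-test on the device."""
    return ["smartctl", "-t", "long", device]


def smart_report_cmd(device: str) -> list:
    """Command that prints the full SMART report once the test has finished."""
    return ["smartctl", "-a", device]


def ddrescue_cmd(source: str, image: str, mapfile: str) -> list:
    """Command that copies the source device to an image file.

    The mapfile lets ddrescue resume and retry unreadable areas later.
    """
    return ["ddrescue", source, image, mapfile]


if __name__ == "__main__":
    device = "/dev/sdX"                      # placeholder: the failing SSD
    image = "/mnt/disk1/cache_rescue.img"    # placeholder: needs enough free space
    mapfile = "/mnt/disk1/cache_rescue.map"  # placeholder

    for cmd in (
        extended_smart_test_cmd(device),
        smart_report_cmd(device),
        ddrescue_cmd(device, image, mapfile),
    ):
        print(shlex.join(cmd))
```

The commands are printed rather than run on purpose: pointing a rescue tool at the wrong device or output path can destroy data, so each line is worth checking by hand first.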