SSD Error!?


Hi All,

 

I have a single cache SSD in my Unraid setup on which all my Docker applications run. Over the last couple of months, sector reallocations have occurred randomly on this drive. Last night I tried to transfer all data to a "classical" data disk to save it. The Mover transferred the data partially and then ended with some error messages:

 

Feb 9 08:40:09 UnRaid-Server kernel: sd 7:0:2:0: [sdj] tag#195 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Feb 9 08:40:09 UnRaid-Server kernel: sd 7:0:2:0: [sdj] tag#195 CDB: opcode=0x28 28 00 03 25 e7 a0 00 00 20 00
Feb 9 08:40:09 UnRaid-Server kernel: blk_update_request: I/O error, dev sdj, sector 52815776 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Feb 9 08:40:09 UnRaid-Server kernel: BTRFS error (device dm-11): bdev /dev/mapper/sdj1 errs: wr 37, rd 260090, flush 0, corrupt 0, gen 0
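
As a reading aid for the log lines above, here is a minimal Python sketch that decodes the two machine-readable parts: the SCSI CDB and the BTRFS per-device error counters. It assumes the CDB is a standard 10-byte READ(10) command (opcode 0x28) and that the counter names stand for write, read, flush, corruption and generation errors. The function names are made up for illustration. As far as I know, hostbyte=0x04 corresponds to DID_BAD_TARGET in the Linux SCSI layer, but treat that as an assumption to verify.

```python
import re


def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        # Bytes 2-5 hold the big-endian logical block address.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7-8 hold the big-endian transfer length in blocks.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def parse_btrfs_errs(line: str) -> dict:
    """Extract the cumulative per-device error counters from a BTRFS log line."""
    match = re.search(
        r"errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)", line
    )
    if match is None:
        raise ValueError("no BTRFS error counters found")
    names = ("wr", "rd", "flush", "corrupt", "gen")
    return {name: int(value) for name, value in zip(names, match.groups())}


if __name__ == "__main__":
    print(decode_read10("28 00 03 25 e7 a0 00 00 20 00"))
    print(parse_btrfs_errs(
        "BTRFS error (device dm-11): bdev /dev/mapper/sdj1 "
        "errs: wr 37, rd 260090, flush 0, corrupt 0, gen 0"
    ))
```

The decoded address is 52815776, the same sector that the blk_update_request line reports. So the failed command was a 32-block read at exactly that location on sdj, and the BTRFS line counts 260090 read errors against the same device.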

 

I'm having trouble interpreting these errors. Can you help? Thank you very much!

 

Best regards

Frank

unraid-server-diagnostics-20220209-0854.zip

Link to comment

Hi All,

 

It's getting worse. My attempt to copy some data from my cache_VM drive to a data disk using the Mover ended in a lot of write errors on the data disk, which is now disabled. Is it possible that my 8-port SATA expansion card is somehow damaged? As a precaution, I stopped the array. I'm afraid that I could really lose data now.

 

Best regards

Frank 

unraid-server-diagnostics-20220210-1947.zip

Edited by knarf0007
Link to comment

I have 4 SSDs in my setup. I switched all of them to the onboard Intel SATA controller. Everything looks fine for the moment. Nevertheless, I find these kinds of messages in the log:

 

xfs_inode block 0xe91a8fb0 xfs_inode_buf_verify
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Unmount and run xfs_repair
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): First 128 bytes of corrupted metadata buffer:
Feb 10 21:28:31 UnRaid-Server kernel: 00000000: dd ec fd 51 64 2e 70 e1 bd 28 2c 85 d6 fb eb 4c ...Qd.p..(,....L
Feb 10 21:28:31 UnRaid-Server kernel: 00000010: 2e 67 21 fe 25 1e 69 f8 ed d0 e2 7c fa 6f 55 ce .g!.%.i....|.oU.
Feb 10 21:28:31 UnRaid-Server kernel: 00000020: af ea 17 e8 de eb 9b f1 a1 e6 36 91 25 58 2f 7b ..........6.%X/{
Feb 10 21:28:31 UnRaid-Server kernel: 00000030: 02 5e 02 e4 f8 82 17 3d 2d 3c 6d d6 c5 0e 0c 31 .^.....=-<m....1
Feb 10 21:28:31 UnRaid-Server kernel: 00000040: db 77 59 bb 85 75 f3 81 fe 75 bd 9c fb 2f b8 55 .wY..u...u.../.U
Feb 10 21:28:31 UnRaid-Server kernel: 00000050: b9 07 a0 e4 32 7c 77 aa b4 a8 25 24 68 19 9c 6d ....2|w...%$h..m
Feb 10 21:28:31 UnRaid-Server kernel: 00000060: 55 79 86 07 a2 49 ff fd 6c d0 87 57 d1 6b 79 61 Uy...I..l..W.kya
Feb 10 21:28:31 UnRaid-Server kernel: 00000070: 1b f3 23 a3 b0 0d 1f 4b e7 d6 8f 9a be b2 a8 bd ..#....K........
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Metadata corruption detected at xfs_buf_ioend+0x51/0x284 [xfs], xfs_inode block 0xe91a8fb0 xfs_inode_buf_verify
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Unmount and run xfs_repair
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): First 128 bytes of corrupted metadata buffer:
Feb 10 21:28:31 UnRaid-Server kernel: 00000000: dd ec fd 51 64 2e 70 e1 bd 28 2c 85 d6 fb eb 4c ...Qd.p..(,....L
Feb 10 21:28:31 UnRaid-Server kernel: 00000010: 2e 67 21 fe 25 1e 69 f8 ed d0 e2 7c fa 6f 55 ce .g!.%.i....|.oU.
Feb 10 21:28:31 UnRaid-Server kernel: 00000020: af ea 17 e8 de eb 9b f1 a1 e6 36 91 25 58 2f 7b ..........6.%X/{
Feb 10 21:28:31 UnRaid-Server kernel: 00000030: 02 5e 02 e4 f8 82 17 3d 2d 3c 6d d6 c5 0e 0c 31 .^.....=-<m....1
Feb 10 21:28:31 UnRaid-Server kernel: 00000040: db 77 59 bb 85 75 f3 81 fe 75 bd 9c fb 2f b8 55 .wY..u...u.../.U
Feb 10 21:28:31 UnRaid-Server kernel: 00000050: b9 07 a0 e4 32 7c 77 aa b4 a8 25 24 68 19 9c 6d ....2|w...%$h..m
Feb 10 21:28:31 UnRaid-Server kernel: 00000060: 55 79 86 07 a2 49 ff fd 6c d0 87 57 d1 6b 79 61 Uy...I..l..W.kya
Feb 10 21:28:31 UnRaid-Server kernel: 00000070: 1b f3 23 a3 b0 0d 1f 4b e7 d6 8f 9a be b2 a8 bd ..#....K........
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Metadata corruption detected at xfs_buf_ioend+0x51/0x284 [xfs], xfs_inode block 0xe91a8fb0 xfs_inode_buf_verify
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Unmount and run xfs_repair
Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): First 128 bytes of corrupted metadata buffer:
Feb 10 21:28:31 UnRaid-Server kernel: 00000000: dd ec fd 51 64 2e 70 e1 bd 28 2c 85 d6 fb eb 4c ...Qd.p..(,....L
Feb 10 21:28:31 UnRaid-Server kernel: 00000010: 2e 67 21 fe 25 1e 69 f8 ed d0 e2 7c fa 6f 55 ce .g!.%.i....|.oU.
Feb 10 21:28:31 UnRaid-Server kernel: 00000020: af ea 17 e8 de eb 9b f1 a1 e6 36 91 25 58 2f 7b ..........6.%X/{
Feb 10 21:28:31 UnRaid-Server kernel: 00000030: 02 5e 02 e4 f8 82 17 3d 2d 3c 6d d6 c5 0e 0c 31 .^.....=-<m....1
Feb 10 21:28:31 UnRaid-Server kernel: 00000040: db 77 59 bb 85 75 f3 81 fe 75 bd 9c fb 2f b8 55 .wY..u...u.../.U
Feb 10 21:28:31 UnRaid-Server kernel: 00000050: b9 07 a0 e4 32 7c 77 aa b4 a8 25 24 68 19 9c 6d ....2|w...%$h..m
Feb 10 21:28:31 UnRaid-Server kernel: 00000060: 55 79 86 07 a2 49 ff fd 6c d0 87 57 d1 6b 79 61 Uy...I..l..W.kya
Feb 10 21:28:31 UnRaid-Server kernel: 00000070: 1b f3 23 a3 b0 0d 1f 4b e7 d6 8f 9a be b2 a8 bd ..#....K........

 

I can't see which disk has the problem. Should I stop the array and run xfs_repair?
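
One way to narrow down which disk is meant is sketched below. The log only names a device-mapper node (dm-6). On Linux, the kernel normally exposes the mapping name of such a node under /sys/block/dm-N/dm/name. How that name maps to a disk slot in Unraid is an assumption you would have to check on your own system. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the device in lines like "XFS (dm-6): Metadata corruption detected ..."
XFS_CORRUPTION = re.compile(r"XFS \(([^)]+)\): Metadata corruption detected")


def corrupted_xfs_devices(log_lines):
    """Count XFS metadata corruption reports per device name."""
    counts = Counter()
    for line in log_lines:
        match = XFS_CORRUPTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def dm_name(dm_node, sysfs_root="/sys/block"):
    """Return the device-mapper name of a dm-N node, or None if unavailable."""
    name_file = Path(sysfs_root) / dm_node / "dm" / "name"
    try:
        return name_file.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    sample = [
        "Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Metadata corruption "
        "detected at xfs_buf_ioend+0x51/0x284 [xfs], xfs_inode block 0xe91a8fb0 "
        "xfs_inode_buf_verify",
        "Feb 10 21:28:31 UnRaid-Server kernel: XFS (dm-6): Unmount and run xfs_repair",
    ]
    for device, count in corrupted_xfs_devices(sample).items():
        print(device, count, dm_name(device))
```

Run on the server itself, the last column would show the mapping name behind dm-6, if the sysfs entry exists. That should point to the filesystem the kernel wants repaired.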

 

And how can I revive disk 9? Thanks a lot!!!

 

Frank.

unraid-server-diagnostics-20220210-2136.zip

Link to comment

Hi, the rebuild onto another spare disk is now running. So far everything works fine. During the drive swap I checked all cable connections; maybe there was a small fault there. Additionally, to be on the safe side, I ordered one of the recommended LSI SAS controllers. However, I doubt that the old controller type has a fundamental problem, because it has been working absolutely flawlessly for years, with hard drives and SSDs. But anyway, if I can further reduce the risk of failure of my "beloved" Unraid server with a controller change, then I'm happy to do so. 😉

 

Best Regards

Frank   

Link to comment

Hi All,

 

Sorry, I'm back. Something is still wrong. Every time I try to copy all the data from my Docker SSD, I get error messages and the copy process stops. I changed the controller for the SSDs (they are now all on the internal Intel controller), but I still get these strange error messages (11:30 am). Why do I want to copy all data from the Docker SSD? Because I have two single SSDs, one for all Dockers and one for all VMs. I want to create a cache pool instead (for redundancy reasons) and run Dockers and VMs from this pool. Any idea? Thanks!

 

Best regards

Frank

 

 

unraid-server-diagnostics-20220213-1145.zip

Link to comment

The cache device appears to be failing. Sorry, I forgot to mention that before. The reason I asked you to swap controllers is that a failing device shouldn't drop and crash the controller, as happens with the mvsas driver. You can run an extended SMART test to confirm. If it is failing, try to copy everything you can, either manually or using, for example, ddrescue.
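
The two suggestions above could be sketched as command lines like this. The Python below only builds and prints the commands; it does not run anything. The device /dev/sdj is taken from the earlier log and may have changed after the controller swap. The destination paths are purely hypothetical placeholders. `smartctl -t long` starts an extended self-test, `smartctl -a` shows the report, and `ddrescue -n` does a first pass that skips the slow scraping phase.

```python
import shlex


def smart_extended_test_cmd(device):
    """Command that starts an extended (long) SMART self-test."""
    return ["smartctl", "-t", "long", device]


def smart_report_cmd(device):
    """Command that prints SMART attributes and self-test results."""
    return ["smartctl", "-a", device]


def ddrescue_first_pass_cmd(source, image, mapfile):
    """Command for a first ddrescue pass that skips the scraping phase (-n).

    The mapfile records progress, so later passes can resume and retry
    only the areas that failed.
    """
    return ["ddrescue", "-n", source, image, mapfile]


if __name__ == "__main__":
    device = "/dev/sdj"  # assumption: device name from the earlier log
    image = "/mnt/disk1/rescue/cache.img"  # hypothetical destination
    mapfile = "/mnt/disk1/rescue/cache.map"  # hypothetical mapfile
    for cmd in (
        smart_extended_test_cmd(device),
        smart_report_cmd(device),
        ddrescue_first_pass_cmd(device, image, mapfile),
    ):
        print(shlex.join(cmd))
```

Copying to an image file on a healthy disk, rather than straight onto another device, keeps the rescue non-destructive. The image can be mounted or copied from afterwards.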

Link to comment
