I think UnRAID hates my Cache Drive



When I built my setup I set the cache to BTRFS. After stopping and starting the array, the drive mounted but then lost the filesystem after a few hours. Checking the system, I discovered that the drive had become unavailable, and I ended up formatting it as XFS. Looking at the SMART data, I see a Reallocated Sector Count of 1, and that sector was mapped out.

 

This seemed to work, so I set up a BitTorrent client and tested it by downloading ~500GB of torrent files to the cache drive via a docker container. Everything seemed to be working fine, but when I tried to install another container, docker became unavailable. I looked into the recommendations, stopped the docker service, deleted /mnt/cache/system/docker/docker.img, then started the service again and re-downloaded the image. However, none of the containers would start. Looking into the logs, I keep seeing:

 

Jul 23 15:58:41 Tower kernel: BTRFS warning (device loop2): csum failed root 294 ino 43871 off 16384 csum 0x473ac0bb expected csum 0xfbab9ca3 mirror 1
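
For anyone reading a line like this, a minimal Python sketch of pulling out its fields may help. The regular expression is only an assumption based on the single line quoted above, and `parse_csum_failure` is a hypothetical helper name. The device here is `loop2`, a loop device, which is at least consistent with the checksum failure being inside a loop-mounted image file rather than on the cache filesystem itself.

```python
import re

# Pattern for the "csum failed" warning, modelled on the one line quoted above.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<device>\S+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<found>0x[0-9a-f]+) expected csum (?P<expected>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)

def parse_csum_failure(line):
    """Return the fields of a BTRFS csum-failed line as a dict, or None."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Numeric fields are converted to int; checksums stay as hex strings.
    for key in ("root", "ino", "off", "mirror"):
        fields[key] = int(fields[key])
    return fields

line = ("Jul 23 15:58:41 Tower kernel: BTRFS warning (device loop2): csum failed "
        "root 294 ino 43871 off 16384 csum 0x473ac0bb expected csum 0xfbab9ca3 mirror 1")
info = parse_csum_failure(line)
print(info["device"], info["ino"], info["found"], info["expected"])
```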
 

The only thing I can think of is that docker.img uses BTRFS internally and there is something about the drive it doesn't like. To be safe, I am removing the drive and will perform a full check on it.

 

If BTRFS has issues whenever the Uncorrectable Error Count or Reallocated Sector Count goes up, is it really ready for prime time? At least with XFS, UnRAID seems to handle bad sectors more gracefully, in that the data is still accessible. In the meantime, is there anything else I should be doing?

Link to comment

OK, got my setup back up and running. I tried moving the data off the drive using the mover, but it didn't do anything, so I resorted to rsync to move the one docker config file I cared about. Now I am running preclear to see if it will throw any more errors. I also swapped SATA cables, but that didn't seem to do anything after the reboot.

 

Jul 23 20:50:02 Tower kernel: ata1: SATA max UDMA/133 abar m2048@0xdfe39000 port 0xdfe39100 irq 39
Jul 23 20:50:02 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 23 20:50:02 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:02 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:02 Tower kernel: ata1.00: ATA-9: Samsung SSD 850 EVO 1TB, S21CNWAFC03914M, EMT02B6Q, max UDMA/133
Jul 23 20:50:02 Tower kernel: ata1.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
Jul 23 20:50:02 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:02 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:02 Tower kernel: ata1.00: configured for UDMA/133
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 23 20:50:02 Tower kernel: sdb: sdb1
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jul 23 20:50:04 Tower kernel: ata1.00: exception Emask 0x50 SAct 0x1000 SErr 0x4090800 action 0xe frozen
Jul 23 20:50:04 Tower kernel: ata1.00: irq_stat 0x00400040, connection status changed
Jul 23 20:50:04 Tower kernel: ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }
Jul 23 20:50:04 Tower kernel: ata1.00: failed command: READ FPDMA QUEUED
Jul 23 20:50:04 Tower kernel: ata1.00: cmd 60/08:60:00:6d:70/00:00:74:00:00/40 tag 12 ncq dma 4096 in
Jul 23 20:50:04 Tower kernel: ata1.00: status: { DRDY }
Jul 23 20:50:04 Tower kernel: ata1: hard resetting link
Jul 23 20:50:08 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 23 20:50:08 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:08 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:08 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:08 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:08 Tower kernel: ata1.00: configured for UDMA/133
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 Sense Key : 0x5 [current]
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 ASC=0x21 ASCQ=0x4
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 CDB: opcode=0x28 28 00 74 70 6d 00 00 00 08 00
Jul 23 20:50:08 Tower kernel: print_req_error: I/O error, dev sdb, sector 1953524992
Jul 23 20:50:08 Tower kernel: ata1: EH complete
Jul 23 20:50:17 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 23 20:50:17 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:17 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:17 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:17 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:17 Tower kernel: ata1.00: configured for UDMA/133
Jul 23 20:50:26 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 20:50:26 Tower emhttpd: import 30 cache device: (sdb) Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M
Jul 23 20:53:58 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 20:56:05 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 20:56:05 Tower emhttpd: import 30 cache device: (sdb) Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M
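
As a side note on the failed command in the log above: the CDB line can be decoded by hand to see which sector the read was aimed at. The sketch below assumes the standard 10-byte READ(10) layout (opcode 0x28, a big-endian LBA in bytes 2-5, a transfer length in bytes 7-8); `decode_read10` is a hypothetical helper name, and the total sector count is copied from the `[sdb] 1953525168 512-byte logical blocks` line.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks

TOTAL_SECTORS = 1953525168  # from the "[sdb] ... 512-byte logical blocks" log line

lba, blocks = decode_read10("28 00 74 70 6d 00 00 00 08 00")
print(lba, blocks, TOTAL_SECTORS - lba)  # 1953524992 8 176
```

The decoded LBA matches the `I/O error, dev sdb, sector 1953524992` line, and 8 blocks of 512 bytes matches the `ncq dma 4096 in` transfer size. The read was 176 sectors from the end of the disk.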

And down and down it goes till it bottoms out.

Jul 23 21:27:17 Tower kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 21:27:20 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 23 21:27:20 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:27:20 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:27:20 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:27:20 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:27:20 Tower kernel: ata1.00: configured for UDMA/33
Jul 23 21:27:28 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 21:28:25 Tower preclear_disk_S21CNWAFC03914M[2937]: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --cycles 1 --no-prompt /dev/sdb
Jul 23 21:28:25 Tower preclear_disk_S21CNWAFC03914M[2937]: Disk /dev/sdb is a SSD, disabling head stress test.
Jul 23 21:28:27 Tower preclear_disk_S21CNWAFC03914M[2937]: Pre-Read: dd if=/dev/sdb of=/dev/null bs=2097152 skip=0 count=1000204886016 conv=notrunc,noerror iflag=nocache,count_bytes,skip_bytes
Jul 23 21:38:58 Tower preclear.disk: Pausing preclear of disk 'sdb'
Jul 23 21:38:58 Tower preclear.disk: Resuming preclear of disk 'sdb'
Jul 23 21:39:15 Tower preclear.disk: Pausing preclear of disk 'sdb'
Jul 23 21:39:22 Tower preclear.disk: Resuming preclear of disk 'sdb'
Jul 23 21:39:25 Tower kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 21:39:28 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 23 21:39:28 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:39:28 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:39:28 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:39:28 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:39:28 Tower kernel: ata1.00: configured for UDMA/33
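
The "bottoming out" is visible in the `SATA link up` lines: the link starts at 6.0 Gbps and is later renegotiated at 1.5 Gbps (with UDMA/133 dropping to UDMA/33). A rough Python sketch for spotting this in a saved syslog follows; the regular expression is an assumption based on the lines quoted in this thread, and the function names are hypothetical.

```python
import re

# Matches lines such as "ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)".
LINK_RE = re.compile(r"(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")

def link_speeds(lines):
    """Return {port: [speeds in Gbps, in log order]} for 'SATA link up' lines."""
    speeds = {}
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            speeds.setdefault(m.group(1), []).append(float(m.group(2)))
    return speeds

def downgraded_ports(lines):
    """Ports whose most recent negotiated speed is below the highest seen."""
    return {port: (max(s), s[-1])
            for port, s in link_speeds(lines).items() if s[-1] < max(s)}

sample = [
    "Jul 23 20:50:02 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Jul 23 20:50:08 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Jul 23 21:27:20 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
]
print(downgraded_ports(sample))  # {'ata1': (6.0, 1.5)}
```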


 

Link to comment

I moved the drive to a different system and all the issues resolved, save for the one-sector issue (which the drive successfully remapped). I tried swapping out the cable first and it didn't do anything, which makes me wonder if it's the SATA port on the board. I will try some more swapping as soon as I am able to. I am in the middle of playing musical drives, moving all my data to the server.

 

Thanks for the reply.

Link to comment
  • 2 weeks later...

New RAID card, new cables, and the drive mounts and then after a few minutes drops from the server with this error:

 

Aug 8 18:49:33 HomeMedia kernel: mdcmd (60): check resume
Aug 8 18:49:54 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 18:50:24 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Metadata corruption detected at xfs_buf_ioend+0x4c/0x95 [xfs], xfs_inode block 0x80 xfs_inode_buf_verify
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Unmount and run xfs_repair
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): First 128 bytes of corrupted metadata buffer:
Aug 8 18:50:24 HomeMedia kernel: 000000004dc2f79f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 0000000066d17bd8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 00000000420fd23e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 000000007f612d6c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 00000000ba2cb676: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 000000006ff199fa: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 00000000c2ac7ef7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 0000000063a41862: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x80 len 32 error 117
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): xfs_do_force_shutdown(0x8) called from line 3399 of file fs/xfs/xfs_inode.c. Return address = 00000000ecc14961
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Corruption of in-memory data detected. Shutting down filesystem
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Please umount the filesystem and rectify the problem(s)

 

Link to comment

Here is the log. To make it fail faster, I scp'ed into the box, created a folder, and then tried to create a file.

 

I am getting a consistent stream of:
Aug 8 21:50:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:50:27 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:50:38 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:51:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:51:39 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
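
To put a number on how regular these resets are, the gaps between them can be computed from the timestamps. This is only a sketch: `reset_gaps` is a hypothetical helper, and because syslog timestamps carry no year, the year is an assumption (2020, taken from the diagnostics file name in this thread).

```python
from datetime import datetime

RESET_MARKER = "Power-on or device reset occurred"

def reset_gaps(lines, year=2020):
    """Return the seconds between consecutive reset messages.

    The year is assumed, since syslog lines only carry month, day and time.
    """
    times = []
    for line in lines:
        if RESET_MARKER in line:
            month, day, clock = line.split()[:3]
            stamp = f"{year} {month} {day} {clock}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

resets = [
    "Aug 8 21:50:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
    "Aug 8 21:50:27 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
    "Aug 8 21:50:38 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
    "Aug 8 21:51:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
    "Aug 8 21:51:39 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
]
print(reset_gaps(resets))  # [19, 11, 30, 31]
```

For the five lines quoted above, the device resets every 11 to 31 seconds.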

 

And when I tried to write to the drive:

Aug 8 21:49:45 HomeMedia kernel: print_req_error: I/O error, dev sdd, sector 488381424
Aug 8 21:49:45 HomeMedia kernel: XFS (sdd1): writeback error on sector 488381432

 

Now, I do know that the drive had one remapped sector, but the rest of the drive is healthy. I have run extensive drive testing using Victoria HDD and nothing else is popping up, on Windows at least. The drive is 5 years old but still has 70-80% of its write endurance left.

 

After formatting it again, I have not been able to get it to throw the following again:

Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Metadata corruption detected at xfs_buf_ioend+0x4c/0x95 [xfs], xfs_inode block 0x80 xfs_inode_buf_verify
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Unmount and run xfs_repair

 

Still, this is highly sketchy and I should just replace the drive and be done with it.

 

Thanks for looking this over.

homemedia-diagnostics-20200808-2201.zip

Link to comment

I checked the Broadcom site and it looks like they have removed any references to the LSI 9207-8i other than the last manual, published in 2014. The only firmware I can find is for the Avago 9207-8i: https://www.thomas-krenn.com/en/download/frame.only_content/hide_filter.1/hide_filter_serial.1/product.9983.html

 

mpt2sas_cm0: LSISAS2308: FWVersion(14.00.00.00), ChipRevision(0x05), BiosVersion(07.27.00.00)
 

Edited by Matthew_K
Link to comment

Updated the firmware and replaced the cables; same issue as soon as I try to transfer data to the drive. It keeps throwing sector reallocation counts, but when I go to the sector it says it's fine. IIRC, when I first got this drive I had an issue where it failed to boot in Windows. I sent it in, and Samsung sent it back saying there were no issues. It was at that time that I replaced it with an NVMe drive. So yeah, I am going to call this DEAD, even if the SMART status says it's OK.

homemedia-diagnostics-20200811-0951.zip

Link to comment
