Matthew_K Posted July 23, 2020

When I built my setup I set the cache drive to BTRFS. After stopping and starting the array, the drive mounted but then lost the file system after a few hours. Checking the system, I discovered that the drive had become inaccessible, and I ended up formatting it as XFS. Looking at the SMART data, I see that I had 1 Reallocated Sector Count, which was mapped out.

This seemed to work, so I set up a BitTorrent client and tested downloading ~500GB of torrent files to the cache drive via a Docker container. Everything seemed to be working fine, but when I tried to install another container, Docker became unavailable. I followed the recommendations: I stopped the Docker service, deleted /mnt/cache/system/docker/docker.img, then started the service again and re-downloaded the image. However, none of the containers would start. Looking into the logs, I keep seeing:

Jul 23 15:58:41 Tower kernel: BTRFS warning (device loop2): csum failed root 294 ino 43871 off 16384 csum 0x473ac0bb expected csum 0xfbab9ca3 mirror 1

The only thing I can think of is that docker.img uses BTRFS internally and there is something about the drive it doesn't like. To be safe, I am removing the drive and will perform a full check on it. If BTRFS has trouble when Uncorrectable Error Count and Reallocated Sector Count errors occur, is it really ready for prime time? At least with XFS, Unraid seems to handle bad sectors more gracefully, in that the data is still accessible. In the meantime, is there anything else I should be doing?
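For context on that warning: btrfs keeps a checksum for each data block (crc32c by default) and recomputes it when the block is read back; "csum failed" means the recomputed value did not match the stored one, so the data read back is not what was written. The Python sketch below shows that comparison, assuming crc32c and a 4 KiB block. `crc32c` and `verify_block` are hypothetical helper names. The sketch does not try to reproduce the exact values in the log line.

```python
"""Sketch of a btrfs-style data checksum check (assumes crc32c, the btrfs default)."""

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def verify_block(block: bytes, expected_csum: int) -> bool:
    """Recompute the checksum 'on read' and compare it with the stored one."""
    return crc32c(block) == expected_csum


if __name__ == "__main__":
    block = bytes(range(256)) * 16                  # a 4 KiB block as it was written
    stored = crc32c(block)                          # checksum recorded at write time
    corrupted = bytearray(block)
    corrupted[100] ^= 0x01                          # a single bit flipped on the way back
    print(verify_block(block, stored))              # True
    print(verify_block(bytes(corrupted), stored))   # False, i.e. "csum failed"
```

A CRC catches every single-bit flip, so even one bad bit between the disk and the loop-mounted image is enough to trigger this warning.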
trurl Posted July 23, 2020

The docker image is a vdisk formatted as btrfs. Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
Matthew_K (Author) Posted July 23, 2020

Attached. Something else I noticed is that the SSD's link keeps falling back to SATA 1.0 speed (1.5 Gbps). And yes, I am aware that I have a bunch of failing WD 6.0 Red drives; that's why I am in the process of moving stuff off them.

tower-diagnostics-20200723-1743.zip

Edited July 23, 2020 by Matthew_K
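One way to track the fallback is to pull every "SATA link up" line for a port out of the syslog and check whether the negotiated speed has dropped below the fastest speed seen on that port. Below is a minimal Python sketch that assumes syslog lines in the format posted in this thread. `link_speeds` and `downgrades` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches e.g. "kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
LINK_UP = re.compile(r"\b(ata\d+): SATA link up ([\d.]+) Gbps")


def link_speeds(lines):
    """Map each ATA port to its negotiated link speeds (Gbps), in log order."""
    speeds = defaultdict(list)
    for line in lines:
        m = LINK_UP.search(line)
        if m:
            speeds[m.group(1)].append(float(m.group(2)))
    return dict(speeds)


def downgrades(lines):
    """Ports whose latest link speed is below the fastest one seen: {port: (max, latest)}."""
    return {
        port: (max(history), history[-1])
        for port, history in link_speeds(lines).items()
        if history[-1] < max(history)
    }


if __name__ == "__main__":
    log = [
        "Jul 23 20:50:02 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
        "Jul 23 20:50:08 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
        "Jul 23 21:27:20 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
    ]
    print(downgrades(log))  # {'ata1': (6.0, 1.5)}
```

The later link-up lines also change from SControl 300 to SControl 310. As far as I know, a 1 in the SControl SPD field means the host has capped the link at Gen1 (1.5 Gbps). libata does this after repeated link errors, so the slower speed points to a flaky connection rather than a configuration setting.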
Matthew_K (Author) Posted July 24, 2020

OK, got my setup back up and running. I tried moving the data off the drive using the mover, but it didn't do anything, so I resorted to rsync to move the one Docker config file I cared about. Now I am running preclear to see if it throws any more errors. I also swapped SATA cables, but that didn't seem to change anything after the reboot.

Jul 23 20:50:02 Tower kernel: ata1: SATA max UDMA/133 abar m2048@0xdfe39000 port 0xdfe39100 irq 39
Jul 23 20:50:02 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 23 20:50:02 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:02 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:02 Tower kernel: ata1.00: ATA-9: Samsung SSD 850 EVO 1TB, S21CNWAFC03914M, EMT02B6Q, max UDMA/133
Jul 23 20:50:02 Tower kernel: ata1.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
Jul 23 20:50:02 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:02 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:02 Tower kernel: ata1.00: configured for UDMA/133
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 23 20:50:02 Tower kernel: sdb: sdb1
Jul 23 20:50:02 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jul 23 20:50:04 Tower kernel: ata1.00: exception Emask 0x50 SAct 0x1000 SErr 0x4090800 action 0xe frozen
Jul 23 20:50:04 Tower kernel: ata1.00: irq_stat 0x00400040, connection status changed
Jul 23 20:50:04 Tower kernel: ata1: SError: { HostInt PHYRdyChg 10B8B DevExch }
Jul 23 20:50:04 Tower kernel: ata1.00: failed command: READ FPDMA QUEUED
Jul 23 20:50:04 Tower kernel: ata1.00: cmd 60/08:60:00:6d:70/00:00:74:00:00/40 tag 12 ncq dma 4096 in
Jul 23 20:50:04 Tower kernel: ata1.00: status: { DRDY }
Jul 23 20:50:04 Tower kernel: ata1: hard resetting link
Jul 23 20:50:08 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 23 20:50:08 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:08 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:08 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:08 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:08 Tower kernel: ata1.00: configured for UDMA/133
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 Sense Key : 0x5 [current]
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 ASC=0x21 ASCQ=0x4
Jul 23 20:50:08 Tower kernel: sd 1:0:0:0: [sdb] tag#12 CDB: opcode=0x28 28 00 74 70 6d 00 00 00 08 00
Jul 23 20:50:08 Tower kernel: print_req_error: I/O error, dev sdb, sector 1953524992
Jul 23 20:50:08 Tower kernel: ata1: EH complete
Jul 23 20:50:17 Tower kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 23 20:50:17 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:17 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:17 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 20:50:17 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 20:50:17 Tower kernel: ata1.00: configured for UDMA/133
Jul 23 20:50:26 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 20:50:26 Tower emhttpd: import 30 cache device: (sdb) Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M
Jul 23 20:53:58 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 20:56:05 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 20:56:05 Tower emhttpd: import 30 cache device: (sdb) Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M

The link speed keeps dropping until it bottoms out:

Jul 23 21:27:17 Tower kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 21:27:20 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 23 21:27:20 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:27:20 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:27:20 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:27:20 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:27:20 Tower kernel: ata1.00: configured for UDMA/33
Jul 23 21:27:28 Tower emhttpd: Samsung_SSD_850_EVO_1TB_S21CNWAFC03914M (sdb) 512 1953525168
Jul 23 21:28:25 Tower preclear_disk_S21CNWAFC03914M[2937]: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --cycles 1 --no-prompt /dev/sdb
Jul 23 21:28:25 Tower preclear_disk_S21CNWAFC03914M[2937]: Disk /dev/sdb is a SSD, disabling head stress test.
Jul 23 21:28:27 Tower preclear_disk_S21CNWAFC03914M[2937]: Pre-Read: dd if=/dev/sdb of=/dev/null bs=2097152 skip=0 count=1000204886016 conv=notrunc,noerror iflag=nocache,count_bytes,skip_bytes
Jul 23 21:38:58 Tower preclear.disk: Pausing preclear of disk 'sdb'
Jul 23 21:38:58 Tower preclear.disk: Resuming preclear of disk 'sdb'
Jul 23 21:39:15 Tower preclear.disk: Pausing preclear of disk 'sdb'
Jul 23 21:39:22 Tower preclear.disk: Resuming preclear of disk 'sdb'
Jul 23 21:39:25 Tower kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 21:39:28 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 23 21:39:28 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:39:28 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:39:28 Tower kernel: ata1.00: supports DRM functions and may not be fully accessible
Jul 23 21:39:28 Tower kernel: ata1.00: disabling queued TRIM support
Jul 23 21:39:28 Tower kernel: ata1.00: configured for UDMA/33
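The failing request in the first log block can be decoded by hand. The CDB `28 00 74 70 6d 00 00 00 08 00` is a SCSI READ(10). In that command, bytes 2-5 hold the starting LBA (big-endian) and bytes 7-8 hold the number of blocks. The Python sketch below decodes it. `decode_read10` is a hypothetical helper name. The decoded LBA is exactly the sector in the `I/O error` line, 176 sectors before the end of the 1953525168-sector SSD.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length in blocks
    }


if __name__ == "__main__":
    req = decode_read10("28 00 74 70 6d 00 00 00 08 00")
    total_sectors = 1953525168         # "[sdb] 1953525168 512-byte logical blocks"
    print(req)                         # {'lba': 1953524992, 'blocks': 8}
    print(req["blocks"] * 512)         # 4096, matching "ncq dma 4096 in"
    print(total_sectors - req["lba"])  # 176 sectors before the end of the disk
```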
JorgeB Posted July 24, 2020

Replace the cables on the SSD; Samsung SSDs can be very picky about cable quality.
Matthew_K (Author) Posted July 26, 2020

I moved the drive to a different system and all the issues resolved, except for the one bad sector (which the drive successfully remapped). I tried swapping out the cable first and it didn't help, which makes me wonder if it's the SATA port on the board. I will try some more swapping as soon as I can; I'm in the middle of playing musical drives, moving all my data to the server. Thanks for the reply.
Matthew_K (Author) Posted August 8, 2020

New RAID card, new cables, and the drive mounts, then drops from the server after a few minutes with this error:

Aug 8 18:49:33 HomeMedia kernel: mdcmd (60): check resume
Aug 8 18:49:54 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 18:50:24 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Metadata corruption detected at xfs_buf_ioend+0x4c/0x95 [xfs], xfs_inode block 0x80 xfs_inode_buf_verify
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Unmount and run xfs_repair
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): First 128 bytes of corrupted metadata buffer:
Aug 8 18:50:24 HomeMedia kernel: 000000004dc2f79f: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 0000000066d17bd8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 00000000420fd23e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 000000007f612d6c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 00000000ba2cb676: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 000000006ff199fa: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 00000000c2ac7ef7: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: 0000000063a41862: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x80 len 32 error 117
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): xfs_do_force_shutdown(0x8) called from line 3399 of file fs/xfs/xfs_inode.c. Return address = 00000000ecc14961
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Corruption of in-memory data detected. Shutting down filesystem
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Please umount the filesystem and rectify the problem(s)
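Two details in that log can be checked with a little arithmetic. First, XFS reports `daddr` and `len` in 512-byte basic blocks, so `daddr 0x80 len 32` is a 16 KiB read starting 64 KiB into sdd1. Second, the inode buffer verifier named in the log (`xfs_inode_buf_verify`) expects each inode to begin with the magic number `0x494E` ("IN"), and an all-zero buffer like the one dumped above cannot pass that check. Error 117 is EUCLEAN, which XFS uses to report corruption. The Python sketch below runs both checks. The helper names are hypothetical, and the magic check is only a rough stand-in for what the kernel verifier does.

```python
from typing import Tuple

BBSIZE = 512               # XFS "basic block": unit of daddr/len in these messages
XFS_DINODE_MAGIC = 0x494E  # "IN", big-endian, first two bytes of an on-disk inode


def daddr_span(daddr: int, length_bb: int) -> Tuple[int, int]:
    """(byte offset, byte length) within the filesystem device for a daddr/len pair."""
    return daddr * BBSIZE, length_bb * BBSIZE


def starts_with_inode_magic(buf: bytes) -> bool:
    """True if the buffer begins like an XFS on-disk inode."""
    return len(buf) >= 2 and int.from_bytes(buf[:2], "big") == XFS_DINODE_MAGIC


if __name__ == "__main__":
    print(daddr_span(0x80, 32))                         # (65536, 16384)
    dumped = bytes(128)                                 # the all-zero dump in the log
    print(starts_with_inode_magic(dumped))              # False, so the verifier fails
    print(starts_with_inode_magic(b"IN" + bytes(126)))  # True
```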
Matthew_K (Author) Posted August 9, 2020

Here is the log. To make it fail faster, I scp'ed into the box, created a folder, and then tried to create a file. I'm getting a consistent:

Aug 8 21:50:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:50:27 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:50:38 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:51:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred
Aug 8 21:51:39 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred

and when I tried to write to the drive:

Aug 8 21:49:45 HomeMedia kernel: print_req_error: I/O error, dev sdd, sector 488381424
Aug 8 21:49:45 HomeMedia kernel: XFS (sdd1): writeback error on sector 488381432

Now, I do know that the drive had one remapped sector, but the rest of the drive is healthy. I have run extensive drive testing using Victoria HDD and nothing else is popping up, on Windows at least. The drive is 5 years old but still has 70-80% of its write endurance left. After formatting it again, I have not been able to get it to throw the following:

Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Metadata corruption detected at xfs_buf_ioend+0x4c/0x95 [xfs], xfs_inode block 0x80 xfs_inode_buf_verify
Aug 8 18:50:24 HomeMedia kernel: XFS (sdd1): Unmount and run xfs_repair

Still, this is highly sketchy, and I should just replace the drive and be done with it. Thanks for looking this over.

homemedia-diagnostics-20200808-2201.zip
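To see whether the resets come at a steady rate or cluster around the writes, one option is to extract the timestamps of the "Power-on or device reset occurred" lines and look at the gaps between them. The Python sketch below assumes the syslog format shown above. `reset_intervals` is a hypothetical helper name. Syslog timestamps carry no year, so the `year` parameter is an assumed value that exists only to make the timestamps parseable.

```python
import re
from datetime import datetime

# e.g. "Aug 8 21:50:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred"
RESET_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*?sd (\d+:\d+:\d+:\d+): "
    r"Power-on or device reset occurred"
)


def reset_intervals(lines, scsi_addr, year=2020):
    """Seconds between consecutive reset messages for one SCSI address (H:C:T:L).

    `year` is an assumption used only for parsing; it does not change same-day deltas.
    """
    stamps = []
    for line in lines:
        m = RESET_RE.match(line)
        if m and m.group(2) == scsi_addr:
            stamps.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds()) for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    log = [
        "Aug 8 21:50:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
        "Aug 8 21:50:27 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
        "Aug 8 21:50:38 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
        "Aug 8 21:51:08 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
        "Aug 8 21:51:39 HomeMedia kernel: sd 3:0:2:0: Power-on or device reset occurred",
    ]
    print(reset_intervals(log, "3:0:2:0"))  # [19, 11, 30, 31]
```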
Matthew_K (Author) Posted August 9, 2020

Here is an update: the XFS file system on the cache drive failed and unmounted again, but the Unraid UI still tells me that the drive is healthy.

homemedia-diagnostics-20200809-1307.zip
JorgeB Posted August 10, 2020

It still looks like a cable problem, but try another device if one is available. You should also update the LSI firmware.
Matthew_K (Author) Posted August 10, 2020

I checked the Broadcom site, and it looks like they have removed any references to the LSI 9207-8i other than the last manual, published in 2014. The only firmware I can find is for the Avago 9207-8i: https://www.thomas-krenn.com/en/download/frame.only_content/hide_filter.1/hide_filter_serial.1/product.9983.html

Here is what the card reports:

mpt2sas_cm0: LSISAS2308: FWVersion(14.00.00.00), ChipRevision(0x05), BiosVersion(07.27.00.00)

Edited August 10, 2020 by Matthew_K
JorgeB Posted August 11, 2020

6 hours ago, Matthew_K said: I checked the Broadcom site and it looks like they have removed any references to the LSI 9207-8i

It's under Legacy HBAs on the download site; the current firmware is 20.00.07.00.
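If several servers need checking, you can script the comparison between the firmware version the driver reports and the current release. The Python sketch below assumes the `mpt2sas_cm0: ... FWVersion(...)` banner format quoted in this thread. `version_tuple`, `firmware_from_log`, and `needs_update` are hypothetical helper names. The default target is the 20.00.07.00 release mentioned above.

```python
import re

FW_RE = re.compile(r"FWVersion\((\d+(?:\.\d+)*)\)")


def version_tuple(text: str) -> tuple:
    """'20.00.07.00' -> (20, 0, 7, 0), so comparisons are numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def firmware_from_log(line: str) -> tuple:
    """Extract the FWVersion(...) field from an mpt2sas banner line."""
    m = FW_RE.search(line)
    if m is None:
        raise ValueError("no FWVersion(...) field in line")
    return version_tuple(m.group(1))


def needs_update(line: str, recommended: str = "20.00.07.00") -> bool:
    """True if the reported firmware is older than `recommended`."""
    return firmware_from_log(line) < version_tuple(recommended)


if __name__ == "__main__":
    banner = ("mpt2sas_cm0: LSISAS2308: FWVersion(14.00.00.00), "
              "ChipRevision(0x05), BiosVersion(07.27.00.00)")
    print(firmware_from_log(banner))  # (14, 0, 0, 0)
    print(needs_update(banner))       # True
```

The sketch compares integer tuples instead of raw strings so that, for example, "9.x" correctly sorts below "14.x".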
Matthew_K (Author) Posted August 11, 2020

Updated the firmware and replaced the cables: same issue as soon as I try to transfer data to the drive. It keeps throwing reallocated sector counts, but when I go to the sector it says it's fine. IIRC, when I first got this drive I had an issue where it failed to boot in Windows; I sent it in and Samsung sent it back saying there were no issues. It was at that time I replaced it with an NVMe drive. So yeah, I am going to call this DEAD, even if the SMART status says it's OK.

homemedia-diagnostics-20200811-0951.zip
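The overall SMART status can stay at "OK" while individual counters climb, so it may be more useful to track the raw Reallocated_Sector_Ct value between test runs. The Python sketch below reads one attribute's raw value from `smartctl -A` text output. It assumes the usual ten-column attribute table, with RAW_VALUE as the last column. The sample rows are illustrative only; they are not taken from the attached diagnostics. `raw_value` is a hypothetical helper name.

```python
import re
from typing import Optional

# Illustrative `smartctl -A` rows (NOT taken from this thread's diagnostics).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       1
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       38000
"""


def raw_value(smart_text: str, attribute: str) -> Optional[int]:
    """Leading integer of RAW_VALUE for `attribute`, or None if it is not listed."""
    for line in smart_text.splitlines():
        parts = line.split(None, 9)  # 10 columns; RAW_VALUE may itself contain spaces
        if len(parts) == 10 and parts[1] == attribute:
            m = re.match(r"\d+", parts[9])
            return int(m.group()) if m else None
    return None


if __name__ == "__main__":
    print(raw_value(SAMPLE, "Reallocated_Sector_Ct"))  # 1
```

Saving the output before and after a large transfer and comparing the two numbers would show whether the count is actually growing.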