December 13, 2017
I'm running Unraid 6.3.5 on a Supermicro 2U server (X8DTN+, 2x Xeon X5675 3.06 GHz hex-core) with 48 GB of RAM. I have a Supermicro 933T that I am using as an expansion chassis; its backplane is connected to a SATA extender card. I am currently running two Marvell SATA cards in the main Supermicro, but have two LSI cards on the way. I get daily random system crashes. The system becomes unusable, and when I view the console over IPMI I can see the login screen, but it is also unresponsive. I get multiple sector errors on two 8TB drives that are installed in the expansion chassis. I have removed these drives and run both SMART tests and low-level block checks, with no errors. I have also changed the SATA cables. Any help would be appreciated. Again, I'm hoping the LSI cards will rectify any issues. Thanks.
sam-the-eagle-diagnostics-20171213-0930.zip FCPsyslog_tail.txt
December 13, 2017 Community Expert
There are checksum errors on your cache device:
Dec 13 09:14:28 Sam-The-Eagle kernel: BTRFS warning (device sdj1): csum failed ino 22600422 off 0 csum 2566472073 expected csum 3881293769
Dec 13 09:14:28 Sam-The-Eagle kernel: BTRFS warning (device sdj1): csum failed ino 22600422 off 0 csum 2566472073 expected csum 3881293769
Possibly a corrupt docker image; running a scrub will tell you which files are affected.
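For anyone following along, here is a minimal sketch of starting that scrub from a script rather than the GUI. It assumes the cache pool is mounted at /mnt/cache (the usual Unraid location, but check your own system); the helper names `build_scrub_command` and `run_scrub` are made up for illustration. `-B` keeps `btrfs scrub start` in the foreground so it only returns once the scrub has finished, and `-r` makes it read-only.

```python
import subprocess


def build_scrub_command(mount_point: str, read_only: bool = False) -> list[str]:
    """Build the argument list for a foreground btrfs scrub."""
    cmd = ["btrfs", "scrub", "start", "-B"]  # -B: do not background
    if read_only:
        cmd.append("-r")  # -r: report errors only, do not attempt repairs
    cmd.append(mount_point)
    return cmd


def run_scrub(mount_point: str) -> int:
    """Run the scrub and return the exit code (needs root and btrfs-progs)."""
    return subprocess.run(build_scrub_command(mount_point), check=False).returncode


if __name__ == "__main__":
    # /mnt/cache is an assumption; adjust to wherever the pool is mounted.
    print(" ".join(build_scrub_command("/mnt/cache")))
```

The affected file paths end up in the syslog as "checksum error at logical ..." lines while the scrub runs.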
December 14, 2017 Author
I did a scrub and fixed the affected files. The system ran great for about 8 hours, until it crashed again.
sam-the-eagle-diagnostics-20171213-2138.zip sam-the-eagle-diagnostics-20171213-2047.zip FCPsyslog_tail.txt
Edited December 14, 2017 by xcsuperfly
December 14, 2017 Community Expert
These files are corrupt. Since you have a single-device cache, scrub can't fix them; you'll need to replace or delete those files:
Dec 13 11:53:37 Sam-The-Eagle emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS warning (device sdj1): checksum error at logical 58057940992 on dev /dev/sdj1, sector 65176304, root 5, inode 22600422, offset 0, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Cache/autotag_wordlist.json)
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS warning (device sdj1): checksum error at logical 58057945088 on dev /dev/sdj1, sector 65176312, root 5, inode 22600422, offset 4096, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Cache/autotag_wordlist.json)
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS warning (device sdj1): checksum error at logical 58057949184 on dev /dev/sdj1, sector 65176320, root 5, inode 22600422, offset 8192, length 950, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Cache/autotag_wordlist.json)
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Dec 13 12:10:07 Sam-The-Eagle kernel: BTRFS warning (device sdj1): checksum error at logical 80486334464 on dev /dev/sdj1, sector 115273216, root 5, inode 22436957, offset 552960, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Logs/Plex Media Server.2.log)
Dec 13 12:10:07 Sam-The-Eagle kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Dec 13 12:10:07 Sam-The-Eagle kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
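With a long syslog it is easier to pull the affected paths out with a script than to read them by eye. The following is a rough sketch, not an official tool: the regex is written against the scrub messages quoted above and may need adjusting for other kernel versions, and the function name `corrupt_files` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "checksum error at logical ..." warnings that a scrub writes to
# the syslog, capturing the device, inode and file path.
CHECKSUM_ERROR = re.compile(
    r"checksum error at logical \d+ on dev (?P<dev>\S+), "
    r".*?inode (?P<inode>\d+), .*?\(path: (?P<path>.*)\)\s*$"
)


def corrupt_files(log_text: str) -> Counter:
    """Count checksum-error warnings per file path in a syslog excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CHECKSUM_ERROR.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Two of the lines quoted in this thread, used as sample input.
SAMPLE = """\
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS warning (device sdj1): checksum error at logical 58057940992 on dev /dev/sdj1, sector 65176304, root 5, inode 22600422, offset 0, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Cache/autotag_wordlist.json)
Dec 13 12:02:50 Sam-The-Eagle kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Dec 13 12:10:07 Sam-The-Eagle kernel: BTRFS warning (device sdj1): checksum error at logical 80486334464 on dev /dev/sdj1, sector 115273216, root 5, inode 22436957, offset 552960, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Logs/Plex Media Server.2.log)
"""

if __name__ == "__main__":
    for path, hits in corrupt_files(SAMPLE).items():
        print(f"{hits:3d}  {path}")
```

Each path it lists is a file to delete or restore from backup, since a single-device pool has no second copy to repair from.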
December 14, 2017 Author
Yesterday appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Cache/autotag_wordlist.json was corrupt and I deleted it. Could there be a reason these files continue to become corrupt?
December 14, 2017 Community Expert
Not really. The disk seems fine, and you should be using ECC RAM, so maybe try a different disk for cache if you have one.
December 24, 2017 Author
The server crashed again. It hadn't for the past few days, and all seemed well. Logs attached.
sam-the-eagle-diagnostics-20171223-2049.zip FCPsyslog_tail.txt
December 24, 2017 Community Expert
You likely have a very bad SATA cable/connection on disk12:
199 UDMA_CRC_Error_Count 0x000a 198 198 000 Old_age Always - 13417933
Most likely the reason for these:
Dec 19 14:11:17 Sam-The-Eagle kernel: sd 8:0:1:0: [sdk] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec 19 14:11:17 Sam-The-Eagle kernel: sd 8:0:1:0: [sdk] tag#1 Sense Key : 0xb [current]
Dec 19 14:11:17 Sam-The-Eagle kernel: sd 8:0:1:0: [sdk] tag#1 ASC=0x47 ASCQ=0x3
Dec 19 14:11:17 Sam-The-Eagle kernel: sd 8:0:1:0: [sdk] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 05 c7 7f c0 00 00 04 00 00 00
Dec 19 14:11:17 Sam-The-Eagle kernel: blk_update_request: I/O error, dev sdk, sector 96960448
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960384
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960392
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960400
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960408
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960416
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960424
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960432
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960440
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960448
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960456
Dec 19 14:11:17 Sam-The-Eagle kernel: md: disk12 read error, sector=96960464
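As a cross-check on the log above, the CDB line can be decoded by hand. Opcode 0x88 is a SCSI READ(16), where bytes 2 to 9 hold the starting LBA and bytes 10 to 13 the transfer length, both big-endian; sense key 0xb is ABORTED COMMAND. The sketch below is only an illustration (the function name `decode_read16` is made up), but it shows that the failed command started at the same sector that blk_update_request reports.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode the LBA and block count from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is accepted
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9: starting LBA
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13: transfer length
    }


if __name__ == "__main__":
    # CDB bytes copied from the kernel log in this thread.
    print(decode_read16("88 00 00 00 00 00 05 c7 7f c0 00 00 04 00 00 00"))
    # prints {'lba': 96960448, 'blocks': 1024}
```

So this was a 1024-block read starting at sector 96960448 on sdk that was aborted, which fits a link-level problem such as a bad cable or connection rather than a bad sector on the platter.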