Wgreen92 Posted April 10, 2021

Hello all,

I have been pulling my hair out a bit trying to figure out what's going on here. Total noob, about 2 months in; any help is appreciated.

Main problem: The server is randomly shutting down. The hardware is still spun up and powered, the fans start going nuts, there is no response from the web GUI on the local network, and Plex shuts off. When I had a graphics card in there, the screen would go black and unresponsive as well. I have to hard shut down and restart via a long press of the power button.

Log included. I woke up to find it "shut down and powered up", turned it off, and pulled the log, so the last entries are the ones leading up to the crash.

Sub problem: The cache drive (Samsung 870 EVO 500GB) is reporting UDMA CRC errors (up to 1240). I have swapped SATA ports and cables. Unfortunately I did not preclear this disk, as I put it in straight out of the box before learning this was best practice. I don't think it's causing the main problem.

Unraid 6.9.2 (also happened on 6.9.1), OS Plus

Specs:

Mobo: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
CPU: AMD FX™-9590 Eight-Core @ 4700 MHz
RAM: 4*8GB = 32GB DDR3
Parity: IronWolf Pro 7200 4TB
Data:
WD Blue 5200 4TB
WD Blue 5200 4TB
WD Blue 5200 1TB
IronWolf 5900 4TB
Cache: Samsung 870 EVO 500GB
Flash: SanDisk 3.2 Gen1 32GB Ultra
Flash attached via USB 3.0 20-pin motherboard header extension connector

Most glaring errors:

Apr 10 06:39:50 Mainframe kernel: mce: [Hardware Error]: Machine check events logged
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: Corrected error, no action required.
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: CPU:0 (15:2:0) MC2_STATUS[Over|CE|MiscV|AddrV|-|CECC|-|-]: 0xdc25404000040136
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: Error Addr: 0x0000000401878cb8
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: MC2 Error: Fill ECC error on data fills.
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
Apr 10 06:39:50 Mainframe kernel: mce: [Hardware Error]: Machine check events logged
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: Corrected error, no action required.
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: CPU:1 (15:2:0) MC2_STATUS[Over|CE|MiscV|AddrV|-|CECC|-|-]: 0xdc25409000040136
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: Error Addr: 0x000000059aebf238
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: MC2 Error: Fill ECC error on data fills.
Apr 10 06:39:50 Mainframe kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD

From searching, it seems people attribute this to a CPU problem: overheating, undervoltage, or just straight failure.

Other glaring error:

Apr 10 04:04:21 Mainframe dhcpcd[1678]: br0: failed to renew DHCP, rebinding
Apr 10 04:35:23 Mainframe kernel: ata1.00: exception Emask 0x10 SAct 0xcc000 SErr 0x0 action 0x6 frozen
Apr 10 04:35:23 Mainframe kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Apr 10 04:35:23 Mainframe kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Apr 10 04:35:23 Mainframe kernel: ata1.00: cmd 61/90:70:20:99:1e/07:00:04:00:00/40 tag 14 ncq dma 991232 out
Apr 10 04:35:23 Mainframe kernel: res 40/00:70:20:99:1e/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Apr 10 04:35:23 Mainframe kernel: ata1.00: status: { DRDY }
Apr 10 04:35:23 Mainframe kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Apr 10 04:35:23 Mainframe kernel: ata1.00: cmd 61/c8:78:b0:a0:1e/03:00:04:00:00/40 tag 15 ncq dma 495616 out
Apr 10 04:35:23 Mainframe kernel: res 40/00:70:20:99:1e/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Apr 10 04:35:23 Mainframe kernel: ata1.00: status: { DRDY }
Apr 10 04:35:23 Mainframe kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Apr 10 04:35:23 Mainframe kernel: ata1.00: cmd 61/68:90:78:a4:1e/00:00:04:00:00/40 tag 18 ncq dma 53248 out
Apr 10 04:35:23 Mainframe kernel: res 40/00:70:20:99:1e/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Apr 10 04:35:23 Mainframe kernel: ata1.00: status: { DRDY }
Apr 10 04:35:23 Mainframe kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Apr 10 04:35:23 Mainframe kernel: ata1.00: cmd 61/f0:98:e8:bb:f8/00:00:02:00:00/40 tag 19 ncq dma 122880 out
Apr 10 04:35:23 Mainframe kernel: res 40/00:70:20:99:1e/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Apr 10 04:35:23 Mainframe kernel: ata1.00: status: { DRDY }
Apr 10 04:35:23 Mainframe kernel: ata1: hard resetting link
Apr 10 04:35:23 Mainframe kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 10 04:35:23 Mainframe kernel: ata1.00: supports DRM functions and may not be fully accessible
Apr 10 04:35:23 Mainframe kernel: ata1.00: supports DRM functions and may not be fully accessible
Apr 10 04:35:23 Mainframe kernel: ata1.00: configured for UDMA/133
Apr 10 04:35:23 Mainframe kernel: ata1: EH complete
Apr 10 04:35:23 Mainframe kernel: ata1.00: Enabling discard_zeroes_data

Sub problem?

syslog
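For anyone reading the MC2_STATUS lines above, here is a rough Python sketch of how the bracketed flags relate to the raw hex value. It assumes the standard x86 machine-check status layout (Val, Over, UC, En, MiscV, AddrV, PCC in bits 63 down to 57) and, as a further assumption for this AMD CPU family, CECC and UECC in bits 46 and 45. The function name `decode_mc_status` is made up for illustration; it is not a kernel or Unraid tool.

```python
from __future__ import annotations

# Architectural MCi_STATUS flag bits (standard x86 machine-check layout).
FLAG_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten (overflow)
    "UC": 61,     # uncorrected error
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MISC register holds valid data
    "AddrV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context corrupt
}

# Assumption: ECC flag positions as used for this AMD family.
AMD_ECC_BITS = {"CECC": 46, "UECC": 45}


def decode_mc_status(status: int) -> list[str]:
    """Return the names of the flags set in a raw MCi_STATUS value."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    if "UC" not in flags:
        flags.append("CE")  # not uncorrected, so reported as a corrected error
    flags += [name for name, bit in AMD_ECC_BITS.items() if (status >> bit) & 1]
    return flags


if __name__ == "__main__":
    # The two raw values from the log excerpt above.
    for raw in ("0xdc25404000040136", "0xdc25409000040136"):
        print(raw, decode_mc_status(int(raw, 16)))
```

Both values decode to a corrected (CE) ECC error with the overflow bit set, which matches the `Over|CE|MiscV|AddrV|-|CECC` text the kernel printed. The overflow bit suggests more errors occurred than the register could record.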
JorgeB Posted April 11, 2021

CRC errors are a known issue with some Samsung SSDs and those AMD chipsets. The other issue looks like a hardware problem, such as a bad PSU, CPU, board, etc.
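To tell whether the CRC count is still climbing after a cable or port swap, one option is to record SMART attribute 199 before and after. Below is a minimal Python sketch that pulls the raw value out of text in the usual `smartctl -A` table layout. The function name `crc_error_count` and the sample line are illustrative only; the sample is not taken from the attached syslog, and only the 1240 figure comes from the first post.

```python
from __future__ import annotations


def crc_error_count(smartctl_output: str) -> int | None:
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Assumes the 10-column attribute table printed by `smartctl -A`:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Illustrative sample row (hypothetical except for the raw count of 1240).
SAMPLE = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE\n"
    "199 CRC_Error_Count         0x003e   099   099   000    Old_age  Always       -       1240\n"
)

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

The raw count never resets, so what matters is whether it keeps increasing between two readings taken some days apart.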