dtempleton Posted November 27, 2022
My server has been working great for the past 7-8 months with no problems at all, mainly as a media server and Time Machine target. One morning the server was not accessible and I had to do a hard reboot with the power button. The array reported a bad Parity2 drive, which I removed from the array, and then I did a parity rebuild that completed. I reformatted the failed parity drive (XFS) and the attributes seemed OK. It's my newest drive, an 8 TB WD drive about a year old, maybe a bit more.

I rebooted at least once and now have a failed array drive whose contents are shown as emulated (the data seems to be there). Thinking the reformatted 8 TB drive was good, I substituted it for the failed disk 3 and rebooted, which initiated a rebuild that never finished. Now the syslog is full of disk0 read errors; the tail is below.

I have some new drives coming, but it's unclear whether the issue is actually drive failure. I tried the extended SMART test on the former Parity2 drive, but it stopped before completion. The short SMART test reports "no such device". The drive attributes "could not be read".

Today I tried to get a diagnostics file, but the script starts and never completes. I attach a diagnostics file from the time of the parity drive failure, though, and can try to get a current one if anyone has suggestions on how to get it. I'll format the new drives as soon as I can, but it will be days before they're ready. At this point it's not clear how the array drive can be emulated if it can't read from the parity drive (!?)
Thanks for any help
Dennis

tail /var/log/syslog.1
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325808
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325816
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325824
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325832
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325840
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325848
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325856
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325864
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325872

Here's the hardware profile: HW profile.xml.zip tower-smart-20221125-2237.zip tower-diagnostics-20221125-1150.zip
Edited November 27, 2022 by dtempleton: Replaced lengthy HW profile with zip file
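A syslog that has filled up with lines like the ones above is easier to judge once it is summarized. The following is a minimal Python sketch, not an Unraid tool: the function name `summarize_read_errors` and the summary layout are made up for illustration, and the sample is the tail quoted in the post. It groups `md` read errors by disk and reports the sector range and the spacing between consecutive failing sectors. A long run with one constant spacing is at least consistent with every request failing (for example, a device that has dropped off the bus) rather than a few isolated bad sectors, though the log alone cannot prove that.

```python
import re
from collections import defaultdict

# Matches lines such as: "... kernel: md: disk0 read error, sector=56325808"
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def summarize_read_errors(lines):
    """Group md read errors by disk (hypothetical helper for illustration).

    Returns {disk: {"count", "first", "last", "steps"}} where "steps" is the
    sorted list of distinct gaps between consecutive failing sectors.
    """
    sectors = defaultdict(list)
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            sectors[match.group(1)].append(int(match.group(2)))

    summary = {}
    for disk, found in sectors.items():
        found.sort()
        steps = sorted({b - a for a, b in zip(found, found[1:])})
        summary[disk] = {
            "count": len(found),
            "first": found[0],
            "last": found[-1],
            "steps": steps,
        }
    return summary


# Sample input: the syslog tail quoted in the post.
sample = """\
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325808
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325816
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325824
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325832
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325840
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325848
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325856
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325864
Nov 26 13:06:51 Tower kernel: md: disk0 read error, sector=56325872
"""

summary = summarize_read_errors(sample.splitlines())
print(summary)
```

To run it against a real log, pass the open file instead of the sample, for example `summarize_read_errors(open("/var/log/syslog.1"))`.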
JorgeB Posted November 27, 2022 (Solution)
In the earlier diags there are issues with multiple devices before parity drops offline:

Nov 25 07:36:55 Tower kernel: ata14.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 25 07:36:55 Tower kernel: ata14.00: irq_stat 0x48000001, interface fatal error
Nov 25 07:36:55 Tower kernel: ata14.00: failed command: READ DMA EXT
Nov 25 07:36:55 Tower kernel: ata14.00: cmd 25/00:40:48:0d:4a/00:05:f5:02:00/e0 tag 3 dma 688128 in
Nov 25 07:36:55 Tower kernel: res 53/84:30:57:10:4a/00:02:f5:02:00/40 Emask 0x10 (ATA bus error)
Nov 25 07:36:55 Tower kernel: ata14.00: status: { DRDY SENSE ERR }
Nov 25 07:36:55 Tower kernel: ata14.00: error: { ICRC ABRT }
Nov 25 07:36:55 Tower kernel: ata14: hard resetting link
Nov 25 07:36:55 Tower kernel: ata11.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 25 07:36:55 Tower kernel: ata11.00: irq_stat 0x48000001, interface fatal error
Nov 25 07:36:55 Tower kernel: ata11.00: failed command: WRITE DMA EXT
Nov 25 07:36:55 Tower kernel: ata11.00: cmd 35/00:40:48:f8:49/00:05:f5:02:00/e0 tag 7 dma 688128 out
Nov 25 07:36:55 Tower kernel: res 51/84:40:48:f8:49/00:05:f5:02:00/e0 Emask 0x10 (ATA bus error)
Nov 25 07:36:55 Tower kernel: ata11.00: status: { DRDY ERR }
Nov 25 07:36:55 Tower kernel: ata11.00: error: { ICRC ABRT }
Nov 25 07:36:55 Tower kernel: ata11: hard resetting link
Nov 25 07:36:55 Tower kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 25 07:36:55 Tower kernel: ata14.00: configured for UDMA/133
Nov 25 07:36:55 Tower kernel: ata14: EH complete
Nov 25 07:36:55 Tower kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 25 07:36:55 Tower kernel: ata11.00: configured for UDMA/133
Nov 25 07:36:55 Tower kernel: ata11: EH complete
Nov 25 10:11:49 Tower kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 25 10:11:49 Tower kernel: ata14.00: irq_stat 0x40000001
Nov 25 10:11:49 Tower kernel: ata14.00: failed command: READ DMA EXT
Nov 25 10:11:49 Tower kernel: ata14.00: cmd 25/00:40:a8:16:ed/00:05:6d:03:00/e0 tag 24 dma 688128 in
Nov 25 10:11:49 Tower kernel: res 53/84:c0:27:17:ed/00:04:6d:03:00/40 Emask 0x10 (ATA bus error)
Nov 25 10:11:49 Tower kernel: ata14.00: status: { DRDY SENSE ERR }
Nov 25 10:11:49 Tower kernel: ata14.00: error: { ICRC ABRT }
Nov 25 10:11:49 Tower kernel: ata14: hard resetting link
Nov 25 10:11:54 Tower kernel: ata14: link is slow to respond, please be patient (ready=0)
Nov 25 10:11:59 Tower kernel: ata14: COMRESET failed (errno=-16)
Nov 25 10:11:59 Tower kernel: ata14: hard resetting link
Nov 25 10:12:02 Tower kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 25 10:12:02 Tower kernel: ata14.00: configured for UDMA/133
Nov 25 10:12:02 Tower kernel: ata14: EH complete
Nov 25 10:12:03 Tower kernel: ata11.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Nov 25 10:12:03 Tower kernel: ata11.00: irq_stat 0x80400000, PHY RDY changed
Nov 25 10:12:03 Tower kernel: ata11: SError: { PHYRdyChg }
Nov 25 10:12:03 Tower kernel: ata11.00: failed command: WRITE DMA EXT
Nov 25 10:12:03 Tower kernel: ata11.00: cmd 35/00:40:28:71:ee/00:05:6d:03:00/e0 tag 3 dma 688128 out
Nov 25 10:12:03 Tower kernel: res 50/00:00:28:71:ee/00:00:6d:03:00/e0 Emask 0x10 (ATA bus error)
Nov 25 10:12:03 Tower kernel: ata11.00: status: { DRDY }
Nov 25 10:12:03 Tower kernel: ata11: hard resetting link
Nov 25 10:12:03 Tower kernel: ata11: SATA link down (SStatus 0 SControl 300)
Nov 25 10:12:09 Tower kernel: ata11: hard resetting link
Nov 25 10:12:09 Tower kernel: ata11: SATA link down (SStatus 0 SControl 300)
Nov 25 10:12:14 Tower kernel: ata11: hard resetting link
Nov 25 10:12:14 Tower kernel: ata19: SATA link down (SStatus 0 SControl 300)
Nov 25 10:12:14 Tower kernel: ata20: SATA link down (SStatus 0 SControl 300)
Nov 25 10:12:15 Tower kernel: ata11: SATA link down (SStatus 0 SControl 300)
Nov 25 10:12:15 Tower kernel: ata11.00: disable device

This is usually a power/connection problem, but it could also be a controller issue. Save the current syslog:

cp /var/log/syslog /boot/syslog.txt

then reboot and post new diags after array start.
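The observation that more than one device is affected can be checked mechanically by counting trouble events per ATA port. Below is a rough Python sketch under stated assumptions: the marker strings are simply phrases that appear in the log excerpt in this thread, not an exhaustive list of libata messages, and `count_port_events` is a hypothetical helper name. It only counts by port; mapping ports to a physical controller still has to be done from the diagnostics.

```python
import re
from collections import Counter

# Captures the port ("ata14") from prefixes such as "ata14.00:" or "ata14:".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Phrases taken from the log excerpt in this thread (an assumption, not a
# complete list). "SATA link down" can also appear for a port with no drive.
MARKERS = (
    "exception Emask",
    "hard resetting link",
    "COMRESET failed",
    "SATA link down",
    "disable device",
)


def count_port_events(lines):
    """Count lines per ATA port that contain one of the trouble markers."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match and any(marker in match.group(2) for marker in MARKERS):
            counts[match.group(1)] += 1
    return counts


# Sample input: a few lines copied from the excerpt above.
sample = """\
Nov 25 10:11:49 Tower kernel: ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 25 10:11:49 Tower kernel: ata14: hard resetting link
Nov 25 10:11:59 Tower kernel: ata14: COMRESET failed (errno=-16)
Nov 25 10:12:02 Tower kernel: ata14: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 25 10:12:03 Tower kernel: ata11.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Nov 25 10:12:03 Tower kernel: ata11: hard resetting link
Nov 25 10:12:03 Tower kernel: ata11: SATA link down (SStatus 0 SControl 300)
Nov 25 10:12:15 Tower kernel: ata11.00: disable device
"""

counts = count_port_events(sample.splitlines())
for port, hits in counts.most_common():
    print(port, hits)
```

If errors show up on several ports at once, as they do here for ata11 and ata14, something those ports share (power, cabling, or the controller) is a more likely suspect than a single drive.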
dtempleton Posted November 27, 2022
Thanks; the reboot looks the same. Here are the new syslog (syslog.zip) and the diagnostics file (tower-diagnostics-20221127-1441.zip).

I realize that all of the drive errors I'm seeing are on drives attached to one controller, this one:
https://www.amazon.com/gp/product/B07SZDK6CZ/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
Ziyituod PCIe SATA Card, 4 Port with 4 SATA Cable, SATA Controller Expansion Card with Low Profile Bracket, Marvell 9215 Non-Raid, Boot as System Disk

It's a Marvell 9215 device, about 7 months old; I thought that was a usable one. I'll go look at the list of usable controllers.
dirkinthedark Posted November 27, 2022
Yikes, let me know. I just ordered the same card.
ChatNoir Posted November 27, 2022
Marvell controllers are not recommended in Unraid.
dtempleton Posted November 27, 2022
19 minutes ago, dirkinthedark said: Yikes let me know, I just ordered the same card.

It looks like that Marvell 9215 controller is not on the approved list now, but the forum is full of questions. I just ordered a different one:
https://www.amazon.com/gp/product/B08BHZQVP7/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1
PCIe SATA Card, Electop SATA III 6 Gbps Expansion Controller, JMB585/SATA 3.0 Non-Raid, Support 5 Ports with 5 SATA Cables, Standard & Low Profile Bracket for Desktop PC

It has the JMicron controller that is recommended here: https://forums.unraid.net/topic/102010-recommended-controllers-for-unraid/
I won't know for a few days if it fixes things.

ChatNoir: Thanks, our messages crossed.
Edited November 27, 2022 by dtempleton: acknowledge ChatNoir contribution
dirkinthedark Posted November 28, 2022
I don't understand; the controller you listed is the ASMedia 1062, which should be good.
ChatNoir Posted November 28, 2022
1 hour ago, dirkinthedark said: I dont understand, the controller you listed is the asmedia1062 which should be good

You are right, I said that based on the description from dtempleton that explicitly mentioned Marvell. Surprise, surprise, the Amazon seller kept the same link but changed the description!
https://web.archive.org/web/20200831194618/https://www.amazon.com/Ziyituod-Controller-Expansion-Profile-Non-Raid/dp/B07SZDK6CZ

Anyhow, the ASM1062 is a two-port controller. If the card offers more, there is something sketchy: either a port multiplier or another chip. And since the seller has shown he is clearly trustworthy ...
JorgeB Posted November 28, 2022
The syslog you've posted is the one from the diags.
dirkinthedark Posted November 28, 2022
8 hours ago, ChatNoir said: You are right, I said that based on the description from dtempleton that explicitly mentioned Marvell. Surprise, surprise, the Amazon seller kept the same link but changed the description! https://web.archive.org/web/20200831194618/https://www.amazon.com/Ziyituod-Controller-Expansion-Profile-Non-Raid/dp/B07SZDK6CZ Anyhow, the ASM1062 is a two-port controller. If the card offers more, there is something sketchy: either a port multiplier or another chip. And since the seller has shown he is clearly trustworthy ...

OK, perfect, so I will look for another one myself. Tricky, tricky, hehe.
dtempleton Posted November 28, 2022
14 hours ago, JorgeB said: Syslog you've posted is the one from the diags.

Sorry, I posted the one from after the reboot. The earlier one had blown up with read error warnings, to 128 MB! I zipped it and attached it below. I'll report after Wednesday, when the new JMicron controller comes in.
syslog.1.txt.zip
JorgeB Posted November 29, 2022
Similar issue as before, on the same controller. Note that unless the IDs changed, it's the ASMedia controller dropping the drives.
dtempleton Posted December 11, 2022
My server is working now; thanks for your input, JorgeB and others. I bought this 5-port card: https://www.amazon.com/dp/B08BHZQVP7?psc=1&ref=ppx_yo2ov_dt_b_product_details and all seems to be OK, except that several of my drives report UDMA CRC errors (which seem to be permanent but unimportant).

Regarding the previously purchased card from Ziyituod, it's a mess. It was originally listed on Amazon as an ASMedia controller; then, with the same part number, a second version was listed that was clearly a Marvell controller. When I pulled the controller, the board showed neither of the Ziyituod model numbers and no identifiers at all. In future, if I get a card that doesn't look like the one advertised, I'll send it right back.

Thanks for helping put this back together.
Dennis
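On the "permanent but unimportant" CRC errors: the SMART UDMA CRC error count is a lifetime counter, so errors logged while the drives hung off the old controller stay in the total after the underlying problem is fixed. What matters is whether the number keeps rising. Here is a minimal Python sketch of that comparison; the drive names and counts are hypothetical placeholders rather than values from this server, and `crc_growth` is an illustrative helper, not part of any SMART tool.

```python
def crc_growth(before, after):
    """Return {drive: increase} for drives whose CRC error count rose.

    Both arguments map a drive label to the raw UDMA CRC error count read
    at two different times. Drives missing from either reading are skipped.
    """
    return {
        drive: after[drive] - before[drive]
        for drive in after
        if drive in before and after[drive] > before[drive]
    }


# Hypothetical readings, e.g. noted down a week apart (not real data).
before = {"disk1": 12, "disk2": 40, "disk3": 0}
after = {"disk1": 12, "disk2": 43, "disk3": 0}

growing = crc_growth(before, after)
print(growing)
```

In this made-up example only disk2 is still accumulating errors, so its cable and connection would be the first things to recheck; a count that stays flat is just history.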