steve1977 Posted August 30, 2017
No idea what happened or why. I had rebuilt my array. After rebuilding, disk 1 suddenly showed millions of reads/writes and the parity disk was disabled, both within a second of each other. Disk 1 was still functional, but all data was wiped except one ISO file. I don't dare rebuild / restart the array again yet. Attaching the log file in the hope someone can give guidance. Hope disk 1 is not lost for good?
tower-diagnostics-20170830-1654.zip
JorgeB Posted August 30, 2017
Pre-reboot diagnostics would be much more useful. Start the array and all data on disk1 should be there; you'll need to re-sync parity. It may be a good idea to check/replace cables before starting.
steve1977 Posted August 30, 2017 (Author)
Thanks. It indeed appears that the data is fine. Fingers crossed that this is indeed the case. I keep having the "cable issue" that you refer to. It's only an issue for the hard disks connected to my RAID card; the ones connected directly to the onboard controller are fine. I've had this issue ever since starting with Unraid, and it leads to errors and disks getting disabled. I have changed cables several times and also replaced the RAID card, all without lasting success. Any other idea what I can do to solve this cabling / controller issue?
JorgeB Posted August 30, 2017
It can be a variety of things, including cables, controller, power supply, etc.
steve1977 Posted August 30, 2017 (Author)
I already changed all of the above (even got a new PSU). I am using the controller recommended on this forum (and even replaced it with a new one twice, to rule out a faulty unit or firmware issue). Given that the issue never happens with the disks connected to the mobo, I'd assume cable and PSU were unlikely culprits anyway (otherwise they would also impact the mobo-connected disks, which draw from the same PSU and have "old" cables). So it all points to a controller issue, but I have changed the controller twice already, so that's also not too likely. Any additional ideas? This issue is preventing me from using Unraid in a "safe" way, as I am basically rebuilding parity every other week.
itimpi Posted August 30, 2017
It is worth pointing out that a cable issue can also relate to the power connectors - not just SATA ones. Also a power supply that is slightly underpowered can lead to drives dropping offline unexpectedly. Another scenario for drives dropping offline that can be hard to diagnose is if you have a controller card that is not quite seated perfectly in the motherboard (although this issue tends to take out multiple drives simultaneously).
JorgeB Posted August 30, 2017 Share Posted August 30, 2017 (edited) 8 minutes ago, steve1977 said: Given the issue never happens with the disks connected to the HD The diagnostics you posted also shows a timeout error on disk11, and that one is using the onboard controller, it's right after boot and the disk isn't even mounted yet, maybe the onboard controller can recover better from timeouts, but your issues are not limited to the RAID controller. Aug 30 16:54:15 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Aug 30 16:54:15 Tower kernel: ata2.00: failed command: IDENTIFY DEVICE Aug 30 16:54:15 Tower kernel: ata2.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Aug 30 16:54:15 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Aug 30 16:54:15 Tower kernel: ata2.00: status: { DRDY } Aug 30 16:54:15 Tower kernel: ata2: hard resetting link Aug 30 16:54:15 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Edited August 30, 2017 by johnnie.black Quote Link to comment
steve1977 Posted August 30, 2017 (Author)
My "new" PSU is a Corsair AX650, which I'd hoped would be sufficient for 16 disks (plus USB). Also, the PSU should equally impact the drives connected directly to the mobo? I haven't changed the power cables, so this could indeed be an untapped issue, but it feels unlikely, doesn't it? One additional indication: "tapping" the cables leads to immediate errors. But as said, this only happens with the disks connected to the RAID card, so it shouldn't be a cable issue per se? Plus I changed the cables twice... The card not being well seated could be an issue as well, but that also feels less likely, since a seating problem shouldn't be triggerable by tapping the cables.
steve1977 Posted August 30, 2017 (Author)
14 minutes ago, johnnie.black said: maybe the onboard controller can recover better from timeouts, but your issues are not limited to the RAID controller.
That's what I suspected, as otherwise changing the controller should have solved it (if the controller were faulty). It must indeed be something like what you are guessing (better recovery from timeouts?). If that's the case, any idea what a solution could look like? I have the issue on 4 of the 8 disks connected to the RAID card. The mobo disks have not been disabled once in several years of usage. The card-connected ones face this issue once every other week (sometimes less frequently if lucky).
JorgeB Posted August 30, 2017
Another possibility: there is a suspicion, and only a suspicion for now as this was discovered very recently and people are testing to confirm, that one model of WD 6TB Red (WD60EFRX-68L0BN1) has firmware issues that cause timeouts. The errors can even appear on disks of a different model when at least one of those drives is in use alongside them. You have one of them, so if you have a spare disk it may be worth replacing that one to test.
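A sketch of checking for the suspect model: on a live Unraid box you would run `smartctl -i /dev/sdX` per drive and look at the "Device Model" line; the script below matches sample identity lines against the model string from the post (the sample model numbers other than the suspect one are made up for illustration).

```shell
# Sketch: flag the WD Red 6TB model suspected of firmware-related timeouts.
# On a real system, gather these lines with:  smartctl -i /dev/sdX
# The identity lines below are embedded samples, not real inventory.
suspect='WD60EFRX-68L0BN1'
models='Device Model:     WDC WD60EFRX-68L0BN1
Device Model:     WDC WD60EFRX-68MYMN1
Device Model:     WDC WD40EFRX-68WT0N0'

hits=$(printf '%s\n' "$models" | grep -c "$suspect")
echo "suspect drives found: $hits"
```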
steve1977 Posted August 30, 2017 (Author)
I am not 100% certain, but most likely this is not the case. I used to have the issues before using Red drives; historically I only used WD Green and some "old" other disks, and only recently started using Red disks. To be honest, I even thought switching to Red disks might reduce the issue...
steve1977 Posted August 31, 2017 (Author)
If the reason is indeed that my controller cannot recover well from timeouts, how could I solve this? Switch to a better controller? Other options? I have the M1015, which has been suggested on this forum. The device is not faulty, as this is the third (!) one I've used. Worth switching to a different controller? Or any other suggestion?
JorgeB Posted August 31, 2017
The best option would be getting rid of the timeouts; one once in a while is OK, but not constantly.
miniwalks Posted August 31, 2017
Are the M1015s China specials or genuine pulls?
steve1977 Posted August 31, 2017 (Author)
Well, the M1015s were bought via eBay (Germany). No idea how genuine they are, but I read in this forum that the eBay route is what others have been doing. The timeouts don't happen constantly, but even once a month is enough to take my whole array down and require a rebuild every month. Even getting it down to once per quarter would still be an issue. Any additional ideas what I can do to fix it? A better controller? Change the power cabling (I doubt it)? A different PSU? Or is there just no way to get it done with Unraid?
JorgeB Posted August 31, 2017
7 minutes ago, steve1977 said: Or is there just no way to get it done with Unraid?
It's not an unRAID issue; timeouts are a hardware problem. If you've been having issues with that system for years after swapping a lot of parts, I would start over with all different hardware.
steve1977 Posted August 31, 2017 (Author)
Not Unraid per se, but the issues do not show up with a plain vanilla Windows installation. Unraid only surfaces the issue, or you could say that Unraid is not tolerant of timeouts (while plain Windows is). I have actually already redone everything else (new mobo, new PSU, etc.). The issue remains the same (though it does feel better now). I was thinking of buying a mobo supporting 16 disks, but was scared away by the price. Maybe a different controller card (I have only used the M1015 in all of my systems so far)? Or any other tweak?
JorgeB Posted August 31, 2017
The M1015, as long as it's genuine, is one of the recommended controllers, though you should update to the latest firmware.
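One way to see what firmware the card is currently running is the mpt2sas driver's boot message in dmesg (or `sas2flash -listall` if you have the LSI tool installed). A sketch of pulling the version out of that message, using an embedded sample line in the format the driver prints:

```shell
# Sketch: extract the LSI/M1015 firmware version from a dmesg boot line.
# On a live system:  dmesg | grep -i fwversion
# The line below is an embedded sample mirroring the mpt2sas message format;
# the actual version and chip revision on your card will differ.
line='mpt2sas0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(07.39.02.00)'

fw=$(printf '%s\n' "$line" | sed -n 's/.*FWVersion(\([0-9.]*\)).*/\1/p')
echo "controller firmware: $fw"
```

If the version printed is older than the current P20 release, flashing the IT-mode firmware is the usual recommendation on this forum.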
steve1977 Posted September 1, 2017 (Author)
I remember from past experience that it was more stable as long as I only connected 6 instead of 8 disks to the controller card. Does this give any indication what the issue may be? I may do this again now. Would you advise me to connect 3x2 or 4x1+1x1 to the controller (referring to the number of disks per cable; one cable can connect up to 4 disks to the controller card)? Thanks!
JorgeB Posted September 1, 2017
24 minutes ago, steve1977 said: Does this give any indication what the issue may be?
Not really; the controller should have no issues with 8 disks, or 30 disks. Try the various options.
steve1977 Posted September 1, 2017 (Author)
Or maybe an 850W PSU is not sufficient? 16 disks, most of them 6TB.
JorgeB Posted September 1, 2017
If it's a quality PSU it's enough; I have 22 disks on 650W.
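A rough sanity check on the power budget: the worst case is all drives spinning up at once, which loads the 12V rail far more than idle. The per-disk figures below are ballpark assumptions for a 3.5" drive (roughly 25W peak at spin-up, 6W idle), not datasheet values, so treat the result as an order-of-magnitude estimate only.

```shell
# Sketch: rough 12V power budget for a 16-disk array.
# Per-disk figures are ballpark assumptions, not datasheet values:
#   ~25W peak during simultaneous spin-up, ~6W idle per 3.5" drive.
disks=16
spinup_w=$((disks * 25))   # worst case: all drives spin up at once
idle_w=$((disks * 6))      # steady state once spun up
echo "peak spin-up draw: ${spinup_w}W"
echo "idle draw: ${idle_w}W"
```

On this estimate a quality 650W unit has headroom even at spin-up, and staggered spin-up (which most HBAs support) lowers the peak further; a weak or degrading 12V rail would be the thing to rule out, not total wattage.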
steve1977 Posted September 1, 2017 (Author)
Typo: 650W, not 850W. Besides the 16 hard disks, it's also powering the mobo/i5, the controller card, and a PCIe SSD card. How do you judge the quality of the Corsair AX650? Not good enough?
JorgeB Posted September 1, 2017
As long as it's working correctly it's enough, I'm using the TX650.
miniwalks Posted September 1, 2017
If the PSU supports Corsair Link, you could pass the USB through to a VM to monitor the PSU stats. Have you checked that your M1015 is running firmware 20.07?