steve1977 Posted August 30, 2017
No idea what happened or why. I had rebuilt my array. After rebuilding, disk 1 suddenly showed millions of reads/writes and the parity disk was disabled, both within a second of each other. Disk 1 was still functional, but all data was wiped except one ISO file. I don't dare rebuild / restart the array again yet. Attaching the log file in the hope someone can give guidance. Hope disk 1 is not lost for good?
tower-diagnostics-20170830-1654.zip
JorgeB Posted August 30, 2017
Pre-reboot diagnostics would be much more useful. Start the array and all data on disk1 should be there; you'll need to re-sync parity. It may be a good idea to check/replace cables before starting.
steve1977 Posted August 30, 2017 (Author)
Thanks. It indeed appears that the data is fine. Fingers crossed that this is indeed the case. I keep having the "cable issue" that you refer to. It's only an issue for the hard disks connected to my RAID card; the ones connected directly to the onboard controller are fine. I've had this issue ever since starting with Unraid, and it leads to errors and disks getting disabled. I have changed cables several times and also replaced the RAID card, all without lasting success. Any other idea what I can do to solve this cabling / controller issue?
JorgeB Posted August 30, 2017
It can be a variety of things, including cables, controller, power supply, etc.
steve1977 Posted August 30, 2017 (Author)
I already changed all of the above (even got a new PSU). I am using the controller recommended on this forum (and even replaced it with a new one twice, to rule out a faulty unit or firmware issue). Given that the issue never happens with the disks connected to the mobo, I'd assume cable and PSU were unlikely culprits anyway (otherwise they would also impact the mobo-connected disks, which draw from the same PSU and have "old" cables). So it all points to a controller issue, but I have changed the controller twice already, so that's also not too likely. Any additional ideas? This issue is preventing me from using Unraid in a "safe" way, as I am basically rebuilding parity every other week.
itimpi Posted August 30, 2017
It is worth pointing out that a cable issue can also relate to the power connectors - not just SATA ones. Also a power supply that is slightly underpowered can lead to drives dropping offline unexpectedly. Another scenario for drives dropping offline that can be hard to diagnose is if you have a controller card that is not quite seated perfectly in the motherboard (although this issue tends to take out multiple drives simultaneously).
JorgeB Posted August 30, 2017 Share Posted August 30, 2017 (edited) 8 minutes ago, steve1977 said: Given the issue never happens with the disks connected to the HD The diagnostics you posted also shows a timeout error on disk11, and that one is using the onboard controller, it's right after boot and the disk isn't even mounted yet, maybe the onboard controller can recover better from timeouts, but your issues are not limited to the RAID controller. Aug 30 16:54:15 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Aug 30 16:54:15 Tower kernel: ata2.00: failed command: IDENTIFY DEVICE Aug 30 16:54:15 Tower kernel: ata2.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Aug 30 16:54:15 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Aug 30 16:54:15 Tower kernel: ata2.00: status: { DRDY } Aug 30 16:54:15 Tower kernel: ata2: hard resetting link Aug 30 16:54:15 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Edited August 30, 2017 by johnnie.black Quote Link to comment
steve1977 Posted August 30, 2017 (Author)
My "new" PSU is a Corsair AX650, which I'd hoped would be sufficient for 16 disks (plus USB). Also, the PSU should equally impact the drives connected directly to the mobo? I haven't changed the power cables, so this could indeed be an untapped issue, but it feels unlikely, doesn't it? One additional indication: "tapping" the cables leads to immediate errors. But as said, this only happens with the disks connected to the RAID card, so it shouldn't be a cable issue per se? Plus I changed the cables twice... The card not being well seated could be an issue as well, but that also feels less likely, since a seating problem shouldn't be triggerable by tapping the cables.
steve1977 Posted August 30, 2017 (Author)
14 minutes ago, johnnie.black said: maybe the onboard controller can recover better from timeouts, but your issues are not limited to the RAID controller.
That's what I suspected, as otherwise changing the controller should have solved it (if the controller were faulty). It must indeed be something like what you are guessing (better recovery from timeouts?). If that's the case, any idea what a solution could look like? I have the issue on 4 of the 8 disks connected to the RAID card. The mobo disks have not been disabled once in several years of usage. The card-connected ones face this issue once every other week (sometimes less frequently if lucky).
JorgeB Posted August 30, 2017
Another possibility: there is a suspicion, and only a suspicion for now as this was discovered very recently and people are testing to confirm, that one model of WD 6TB Red (WD60EFRX-68L0BN1) has firmware issues that cause timeouts. The errors can even appear on disks of a different model when at least one of those drives is in use alongside them. You have one of them, so if you have a spare disk it may be worth replacing that one to test.
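A sketch of checking for the suspect model: on a live Unraid box you would run `smartctl -i /dev/sdX` per drive and look at the "Device Model" line; the script below matches sample identity lines against the model string from the post (the sample model numbers other than the suspect one are made up for illustration).

```shell
# Sketch: flag the WD Red 6TB model suspected of firmware-related timeouts.
# On a real system, gather these lines with:  smartctl -i /dev/sdX
# The identity lines below are embedded samples, not real inventory.
suspect='WD60EFRX-68L0BN1'
models='Device Model:     WDC WD60EFRX-68L0BN1
Device Model:     WDC WD60EFRX-68MYMN1
Device Model:     WDC WD40EFRX-68WT0N0'

hits=$(printf '%s\n' "$models" | grep -c "$suspect")
echo "suspect drives found: $hits"
```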
steve1977 Posted August 30, 2017 (Author)
I am not 100% certain, but most likely this is not the case. I used to have the issues before using Red drives; historically I only used WD Green and some "old" other disks, and only recently started using Red disks. To be honest, I even thought switching to Red disks might reduce the issue...
steve1977 Posted August 31, 2017 (Author)
If the reason is indeed that my controller cannot recover well from timeouts, how could I solve this? Switch to a better controller? Other options? I have the M1015, which has been suggested on this forum. The device is not faulty, as this is the third (!) one I've used. Worth switching to a different controller? Or any other suggestion?
JorgeB Posted August 31, 2017
The best option would be getting rid of the timeouts; one once in a while is OK, but not constantly.
miniwalks Posted August 31, 2017
Are the M1015s China specials or genuine pulls?
steve1977 Posted August 31, 2017 (Author)
Well, the M1015s were bought via eBay (Germany). No idea how genuine they are, but I read in this forum that the eBay route is what others have been doing. The timeouts don't happen constantly, but even once a month is enough to take my whole array down and require a rebuild every month. Even getting it down to once per quarter would still be an issue. Any additional ideas what I can do to fix it? A better controller? Change the power cabling (I doubt it)? A different PSU? Or is there just no way to get it done with Unraid?
JorgeB Posted August 31, 2017
7 minutes ago, steve1977 said: Or is there just no way to get it done with Unraid?
It's not an unRAID issue; timeouts are a hardware problem. If you've been having issues with that system for years after swapping a lot of parts, I would start over with all different hardware.
steve1977 Posted August 31, 2017 (Author)
Not Unraid per se, but the issues do not show up with a plain vanilla Windows installation. Unraid only surfaces the issue, or you could say that Unraid is not tolerant of timeouts (while plain Windows is). I have actually already redone everything else (new mobo, new PSU, etc.). The issue remains the same (though it does feel better now). I was thinking of buying a mobo supporting 16 disks, but was scared away by the price. Maybe a different controller card (I have only used the M1015 in all of my systems so far)? Or any other tweak?
JorgeB Posted August 31, 2017
The M1015, as long as it's genuine, is one of the recommended controllers, though you should update to the latest firmware.
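One way to see what firmware the card is currently running is the mpt2sas driver's boot message in dmesg (or `sas2flash -listall` if you have the LSI tool installed). A sketch of pulling the version out of that message, using an embedded sample line in the format the driver prints:

```shell
# Sketch: extract the LSI/M1015 firmware version from a dmesg boot line.
# On a live system:  dmesg | grep -i fwversion
# The line below is an embedded sample mirroring the mpt2sas message format;
# the actual version and chip revision on your card will differ.
line='mpt2sas0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(07.39.02.00)'

fw=$(printf '%s\n' "$line" | sed -n 's/.*FWVersion(\([0-9.]*\)).*/\1/p')
echo "controller firmware: $fw"
```

If the version printed is older than the current P20 release, flashing the IT-mode firmware is the usual recommendation on this forum.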
steve1977 Posted September 1, 2017 (Author)
I remember from past experience that it was more stable as long as I only connected 6 instead of 8 disks to the controller card. Does this give any indication what the issue may be? I may do this again now. Would you advise me to connect 3x2 or 4x1+1x1 to the controller (referring to the number of disks per cable; one cable can connect up to 4 disks to the controller card)? Thanks!
JorgeB Posted September 1, 2017
24 minutes ago, steve1977 said: Does this give any indication what the issue may be?
Not really; the controller should have no issues with 8 disks, or 30 disks. Try the various options.
steve1977 Posted September 1, 2017 (Author)
Or maybe an 850W PSU is not sufficient? 16 disks, most of them 6TB.
JorgeB Posted September 1, 2017
If it's a quality PSU it's enough; I have 22 disks on 650W.
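A rough sanity check on the power budget: the worst case is all drives spinning up at once, which loads the 12V rail far more than idle. The per-disk figures below are ballpark assumptions for a 3.5" drive (roughly 25W peak at spin-up, 6W idle), not datasheet values, so treat the result as an order-of-magnitude estimate only.

```shell
# Sketch: rough 12V power budget for a 16-disk array.
# Per-disk figures are ballpark assumptions, not datasheet values:
#   ~25W peak during simultaneous spin-up, ~6W idle per 3.5" drive.
disks=16
spinup_w=$((disks * 25))   # worst case: all drives spin up at once
idle_w=$((disks * 6))      # steady state once spun up
echo "peak spin-up draw: ${spinup_w}W"
echo "idle draw: ${idle_w}W"
```

On this estimate a quality 650W unit has headroom even at spin-up, and staggered spin-up (which most HBAs support) lowers the peak further; a weak or degrading 12V rail would be the thing to rule out, not total wattage.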
steve1977 Posted September 1, 2017 (Author)
Typo: 650W, not 850W. Besides the 16 hard disks, it's also powering the mobo/i5, the controller card, and a PCIe SSD card. How do you judge the quality of the Corsair AX650? Not good enough?
JorgeB Posted September 1, 2017
As long as it's working correctly it's enough, I'm using the TX650.
miniwalks Posted September 1, 2017
If the PSU supports Corsair Link, you could pass the USB through to a VM to monitor the PSU stats. Have you checked that your M1015 is running firmware 20.07?