Disk 1 wiped out


No idea what happened or why. I had rebuilt my array; after the rebuild, disk 1 suddenly showed millions of reads/writes and the parity disk became disabled, both within a second of each other. Disk 1 was still functional, but all data was wiped out except one ISO file. I don't yet dare to rebuild or restart the array again. I'm attaching the log file and hope someone can give guidance. Hopefully disk 1 is not lost for good?

tower-diagnostics-20170830-1654.zip


Thanks. It does appear that the data is fine. Fingers crossed that this is indeed the case.

 

I keep having the "cable issue" that you refer to. It is only a problem for the hard disks connected to my RAID card; the ones connected directly to the onboard controller are fine. I have had this issue ever since I started using Unraid, and it leads to errors and disks getting disabled. I have changed cables several times and also replaced the RAID card, all without any lasting success.


Any other idea what I can do to solve this cabling / controller issue?


I already changed all of the above (even got a new PSU). I am using the controller recommended on this forum (and have even replaced it with a new unit twice, to rule out a faulty card or a firmware issue). Given that the issue never happens with the disks connected to the onboard controller, I'd assume cable and PSU are unlikely to be the cause anyway, as that would also affect the drives connected to the mobo, which draw from the same PSU and use "old" cables. So everything points to a controller issue, but I have already changed the controller twice, so that is also not too likely. Any additional ideas? This issue is preventing me from using Unraid in a "safe" way, as I am basically rebuilding parity every other week.


It is worth pointing out that a cable issue can also relate to the power connectors, not just the SATA ones. A power supply that is slightly underpowered can also lead to drives dropping offline unexpectedly. Another hard-to-diagnose scenario for drives dropping offline is a controller card that is not quite seated perfectly in the motherboard (although that issue tends to take out multiple drives simultaneously).

8 minutes ago, steve1977 said:

Given that the issue never happens with the disks connected to the onboard controller

 

The diagnostics you posted also show a timeout error on disk11, and that one is on the onboard controller; it happens right after boot, before the disk is even mounted. Maybe the onboard controller recovers better from timeouts, but your issues are not limited to the RAID controller.

 

Aug 30 16:54:15 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 30 16:54:15 Tower kernel: ata2.00: failed command: IDENTIFY DEVICE
Aug 30 16:54:15 Tower kernel: ata2.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Aug 30 16:54:15 Tower kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 30 16:54:15 Tower kernel: ata2.00: status: { DRDY }
Aug 30 16:54:15 Tower kernel: ata2: hard resetting link
Aug 30 16:54:15 Tower kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
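To see whether resets like the one above cluster on the ports behind the RAID card or on the onboard ones, the syslog can be summarized per ATA port. A minimal sketch: on a live Unraid server the log is `/var/log/syslog`; inside a diagnostics zip it is typically `logs/syslog.txt` (adjust the path to whichever you have).

```shell
# Count "hard resetting link" events per ATA port in the syslog, so the
# flaky ports stand out. Field 6 of a kernel log line like
#   Aug 30 16:54:15 Tower kernel: ata2: hard resetting link
# is the port name ("ata2:").
grep 'hard resetting link' /var/log/syslog \
  | awk '{print $6}' \
  | sort | uniq -c | sort -rn
```

Mapping the `ataN` numbers back to controllers/disks can be done by grepping the boot portion of the same log for the lines where each port's attached disk is identified.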

 


My "new" PSU is a Corsair AX650, which I'd hoped would be sufficient for 16 disks (plus USB). Also, wouldn't the PSU equally impact the drives connected directly to the mobo?
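As a sanity check on the 650 W figure, a back-of-the-envelope spin-up budget can be sketched. The per-disk currents below are assumed typical values for 3.5" drives, not figures from any specific datasheet; the point is that the 12 V rail is what matters if all disks spin up simultaneously (staggered spin-up largely avoids this peak).

```shell
# Rough worst-case power draw if all 16 disks spin up at once.
# Assumed per-disk figures (hypothetical - check your drives' datasheets):
#   ~2.0 A on the 12 V rail during spin-up, ~0.6 A on the 5 V rail.
disks=16
w12=$(awk -v n="$disks" 'BEGIN {print n * 2.0 * 12}')  # 12 V watts at spin-up
w5=$(awk -v n="$disks" 'BEGIN {print n * 0.6 * 5}')    # 5 V watts
echo "12V spin-up: ${w12} W, 5V: ${w5} W"
```

Under these assumptions the 12 V peak alone is around 384 W for the disks, before the board, CPU, and controller are counted, so a 650 W unit is plausible but not generously margined at spin-up.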

 

I haven't changed the power cables, so this could indeed be an unexplored issue. But it feels unlikely, doesn't it? One additional observation: "tapping" the cables leads to immediate errors. But as said, that only happens with the disks connected to the RAID card, so it shouldn't really be a cable issue per se? Plus, I have changed the cables twice...

 

The card not being well seated could be an issue as well, but that also feels less likely, since I shouldn't then be able to trigger the issue by tapping the cables.

14 minutes ago, johnnie.black said:

 

The diagnostics you posted also show a timeout error on disk11, and that one is on the onboard controller; it happens right after boot, before the disk is even mounted. Maybe the onboard controller recovers better from timeouts, but your issues are not limited to the RAID controller.

 

 

That's what I anticipated, as otherwise changing the controller should have solved it (if the controller were faulty). It must indeed be something like what you are guessing (better recovery from timeouts?). If that's the case, any idea what a solution could look like? I have the issue on 4 of the 8 disks connected to the RAID card. The mobo-connected disks have not been disabled once in several years of use; the card-connected ones face this issue about once every other week (sometimes less frequently if I'm lucky).


Another possibility: there is a suspicion (and only a suspicion for now, as this was discovered very recently and people are still testing to confirm it) that one model of WD 6TB Red (WD60EFRX-68L0BN1) has a firmware issue that causes timeouts. The errors can even appear on disks of other models when at least one of the affected drives is in use alongside them, and you have one of them. So if you have a spare disk, it may be worth replacing that one to test.
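To check whether any of your drives is that exact model (and what firmware it runs), `smartctl` from smartmontools, which Unraid ships with, can list the identity of each disk. A minimal sketch, run as root:

```shell
# Print model and firmware for every /dev/sdX disk so a
# WD60EFRX-68L0BN1 (the suspect model) is easy to spot.
for d in /dev/sd?; do
  echo "== $d =="
  smartctl -i "$d" | grep -E 'Device Model|Firmware Version'
done
```

The same information also appears in the `smart` folder of an Unraid diagnostics zip, if you'd rather not run commands on the live server.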


I am not 100% certain, but most likely this is not the case: I was having these issues before using Red drives. Historically I only used WD Greens and some "old" other disks, and only recently started using Reds. To be honest, I even thought switching to Red disks might reduce the issue...


If the reason is indeed that my controller cannot recover well from timeouts, how could I solve this? Switch to a better controller? Other options? I have the M1015, which has been recommended on this forum. The device itself is not faulty, as this is the third (!) one I am using. Would it be worth switching to a different controller? Or is there another suggestion?


Well, the M1015 cards were bought via eBay (Germany). No idea how genuine they are, but I read on this forum that the eBay route is what others have been doing.

 

They don't happen constantly, but even once a month is enough to take my whole array down and require a rebuild every month. Even getting it down to once per quarter would still be a problem.


Any additional ideas what I can do to fix it? A better controller? Changing the power cabling (though I doubt it)? A different PSU? Or is there just no way to get this working with Unraid?


Not Unraid per se, but the issues do not show up when using a plain-vanilla Windows installation. Unraid only surfaces the issue, or you could say that Unraid is not tolerant of timeouts (while plain Windows is).

 

I have actually already replaced everything else (new mobo, new PSU, etc.). The issue remains the same (though it feels like it is better now).

 

I was thinking of buying a mobo that supports 16 disks, but was scared away by the price. Maybe a different controller card (I have only used the M1015 in all of my systems so far)? Or any other tweak?


I remember from past experience that things were stable, or at least more stable, as long as I connected only 6 instead of 8 disks to the controller card. Does this give any indication of what the issue may be? I may do this again now. Would you advise me to connect 3x2 or 4x1+1x1 to the controller (referring to the number of disks per cable; one cable can connect up to 4 disks to the controller card)? Thanks!

