
Parity and disk 15 with errors


steve1977


Didn't look at your log, but the resolution is fairly straightforward ...

 

=>  If this is a dual parity system (v6.2) you can simply replace both disks and the system will rebuild them.

 

=>  If it's not dual parity, you can't rebuild either disk. You'll need to replace both drives, do a New Config, and let it run a new parity sync. Then copy the data you lost from #15 back to the array from your backups. If you don't have the data backed up, you can attach the failed #15 as an "unassigned device" and see whether you can read any of the data from it => if so, copy it to the array (probably to the new #15).
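
The rule of thumb in the two cases above can be sketched in a few lines of Python. This is only an illustration, assuming each parity disk lets the array tolerate one failed device (parity disks themselves count as devices that can fail). The function name `can_rebuild` is hypothetical and not part of Unraid.

```python
def can_rebuild(failed_devices: int, parity_disks: int) -> bool:
    """Return True if every failed device can be rebuilt by the array.

    Assumption (not an Unraid API): each configured parity disk covers
    one failed device, and a failed parity disk counts as a failure too.
    """
    if failed_devices < 0 or parity_disks < 0:
        raise ValueError("counts must be non-negative")
    return failed_devices <= parity_disks


# The situation in this thread: parity and disk 15 both have problems.
dual_parity_ok = can_rebuild(failed_devices=2, parity_disks=2)
single_parity_ok = can_rebuild(failed_devices=2, parity_disks=1)
print("dual parity:", dual_parity_ok)
print("single parity:", single_parity_ok)
```

With two problem devices, only the dual parity case passes the check, which is why the single parity case falls back to New Config plus restoring from backup.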

 

 

Link to comment

Thanks for your message. I agree with you on the need to get to the bottom of it. Broadly, the issue is that there is some form of instability with the cables, and every time it happens, Unraid disables the disk. I have no clue how to fix it, though. I have already had the whole cable tree replaced (twice!) by a professional IT person. It may just be my setup, with 16 disks in a tower case rather than a professional storage enclosure meant for so many disks.

 

Can you have another look at my log? I am quite sure that disk 15 is just the "normal issue" with cables, but the parity disk keeps throwing error after error, so I suspect this disk may indeed have a hardware fault and not just a cabling problem. I am attaching an updated log.

tower-diagnostics-20161030-2028.zip

Link to comment

You need to start the array after a New Config to record the new assignments. This is the order (disk0 is parity):

 

 

Oct 30 20:26:00 Tower kernel: md: import disk0: (sdp) WDC_WD60EFRX-68L0BN1_WD-WX11D363T6L2 size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk1: (sdo) WDC_WD60EZRX-00MVLB1_WD-WX11DA40HUVD size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk2: (sdj) WDC_WD60EZRX-00MVLB1_WD-WX11D741AYXK size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk3: (sde) WDC_WD60EZRX-00MVLB1_WD-WX41D3402835 size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk4: (sdg) WDC_WD60EZRX-00MVLB1_WD-WXK1H641XM4J size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk5: (sdl) WDC_WD60EZRX-00MVLB1_WD-WX31D55A4944 size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk6: (sdn) WDC_WD40EZRX-00SPEB0_WD-WCC4EHJF6HU0 size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk7: (sdf) WDC_WD60EZRX-00MVLB1_WD-WX11DA49HHVY size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk8: (sdq) WDC_WD40EZRX-00SPEB0_WD-WCC4E1961434 size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk9: (sdr) WDC_WD40EZRX-00SPEB0_WD-WCC4E4Z7FKHA size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk10: (sdb) WDC_WD60EZRX-00MVLB1_WD-WX71DA4A03D2 size: 5860522532 
Oct 30 20:26:00 Tower kernel: md: import disk11: (sdc) WDC_WD40EZRX-00SPEB0_WD-WCC4E0076546 size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk12: (sdd) ST4000DM000-1F2168_W300JY79 size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk13: (sdi) ST4000DM000-1F2168_W3008JLM size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk14: (sdm) WDC_WD40EZRX-00SPEB0_WD-WCC4EJ4EN10X size: 3907018532 
Oct 30 20:26:00 Tower kernel: md: import disk15: (sdk) WDC_WD60EFRX-68L0BN1_WD-WXB1HB4PHE51 size: 5860522532 
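
If you want to pull that slot order out of a syslog yourself, a minimal Python sketch could look like the one below. It assumes the `md: import diskN: (sdX) <model_serial> size: <n>` line format shown above. The function name `parse_md_imports` and the dictionary keys are made up for illustration, and only two of the lines above are reproduced as sample input.

```python
import re

# Matches lines like:
#   ... kernel: md: import disk0: (sdp) WDC_..._WD-WX11D363T6L2 size: 5860522532
IMPORT_RE = re.compile(
    r"md: import disk(?P<slot>\d+): \((?P<dev>\w+)\) (?P<ident>\S+) size: (?P<size>\d+)"
)


def parse_md_imports(log_text):
    """Map array slot number -> device name, model/serial id and size."""
    slots = {}
    for line in log_text.splitlines():
        match = IMPORT_RE.search(line)
        if match:
            slots[int(match.group("slot"))] = {
                "device": match.group("dev"),
                "id": match.group("ident"),
                "size": int(match.group("size")),
            }
    return slots


# Two lines copied from the log excerpt above (parity and disk 15).
SAMPLE = """\
Oct 30 20:26:00 Tower kernel: md: import disk0: (sdp) WDC_WD60EFRX-68L0BN1_WD-WX11D363T6L2 size: 5860522532
Oct 30 20:26:00 Tower kernel: md: import disk15: (sdk) WDC_WD60EFRX-68L0BN1_WD-WXB1HB4PHE51 size: 5860522532
"""

assignments = parse_md_imports(SAMPLE)
for slot in sorted(assignments):
    role = "parity" if slot == 0 else f"disk {slot}"
    print(role, assignments[slot]["device"], assignments[slot]["id"])
```

Writing down the model/serial per slot before a New Config makes it easier to put every disk back in the same position.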

 

For the extended SMART test, click on parity on the Main page and then run "SMART extended self-test".

Link to comment

Disk 15 and the parity disk both show errors in the GUI. Parity was disabled, but disk 15 was not.

 

I tried to rebuild a few times, but the parity disk keeps showing errors and getting disabled.

 

I have been running the extended SMART test for a while now, but it is still at 10%. I assume it will take overnight to complete?
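
As a rough way to answer the "overnight?" question, the remaining time can be extrapolated from the progress so far. This is only a sketch: it assumes the test progresses at a roughly constant rate, which a drive does not guarantee, and the 60 minutes of elapsed time is a made-up figure, since the post only says "a while". The function name is hypothetical.

```python
def estimate_remaining_minutes(percent_complete: float, elapsed_minutes: float) -> float:
    """Linearly extrapolate how long a self-test still has to run.

    Assumes constant progress; real SMART tests may speed up or slow down.
    """
    if not 0 < percent_complete <= 100:
        raise ValueError("percent_complete must be in (0, 100]")
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    total_minutes = elapsed_minutes * 100.0 / percent_complete
    return total_minutes - elapsed_minutes


# Hypothetical numbers: 10% done after 60 minutes.
remaining = estimate_remaining_minutes(percent_complete=10, elapsed_minutes=60)
print(f"about {remaining / 60:.1f} hours remaining")
```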

 

Any more thoughts appreciated!

Link to comment

Thanks for your help.

 

This time, I just cannot get it to work. It starts, but then the parity disk gets disabled fairly quickly because of errors.

 

The disk appears OK according to the SMART test. I have changed the cabling and even the RAID card several times to make sure that there is no hardware defect.

 

My read is that the issue is having 16 disks in a tower case, which may make the cabling too tight (causing cross-cable noise), which in turn makes Unraid disable the disks. While this is not an issue without Unraid, it does surface in this setting.

 

I assume there is no way to make Unraid less sensitive about when it disables disks? If not, my only solution may be to find a better case to suit my 16 disks. Any suggestions?

Link to comment

Many users have tower cases with many disks; I have two with 22 disks each.

 

The type of problem you have is often difficult to diagnose. What I usually do in these situations is swap hardware, starting with the most probable culprits; in your case, I'd start with the controller, PSU and motherboard.

Link to comment

Thanks, I have already swapped the controller (twice!). All three are the same model, which is recommended on this forum. So, it should not be the controller.

 

I doubt it is the mobo, but wouldn't rule it out. The 8 disks attached to the mobo never have such issues; it is always (!) the ones connected to the controller.

 

The PSU is rather new and it has more than enough power, so I doubt this is the reason.

 

All points to the cable. The issue is rather easy to replicate. Once everything is working and I touch the cables (just a very light touch), the disk immediately gets disabled. This does not happen to the 8 disks connected to the mobo directly, but holds true for the ones connected to the controller.

 

Besides cables, I could consider buying a different model of controller card. Maybe mine is not up to the job? All three I have tried are the same model. What model are you using?

Link to comment

All points to the cable. The issue is rather easy to replicate. Once everything is working and I touch the cables (just a very light touch), the disk immediately gets disabled. This does not happen to the 8 disks connected to the mobo directly, but holds true for the ones connected to the controller.

 

That does look like a cable issue, but because you said they were replaced twice I was thinking of other possibilities.

 

A new PSU can have issues; I just had a month-old Seasonic Gold PSU die, and before it did there were some strange symptoms. But it would not make much sense for that to affect only the HDDs on the HBA.

 

Besides cables, I could consider buying a different model of controller card. Maybe mine is not up to the job? All three I have tried are the same model. What model are you using?

 

I'm using a lot of different controllers, but for 8 ports I would recommend something LSI based, like the one you're using.

Link to comment

Thanks.

 

You are right, I have actually changed the whole cabling twice and had it done by a professional, so I felt that it had been well done. The issue remained though.

 

What confuses me is that the cable issue never happens with the disks attached to the mobo directly. The cables are equally tight within the Tower for those 8 disks, but not once did I have an issue with the 8 disks connected to the mobo.

 

For the 8 connected to the controller, it is typically the same disk having problems, but not always. Sometimes, it is a different disk. Technically, there are two cable trees (4+4 disks connected to the controller card), but both are having issues.

 

Besides cables, I still think it may be related to the controller card. I was thinking of buying a mobo with 16 ports once one comes out and is somewhat affordable.

 

You mention you have 22 disks in one Tower? What controller cards are you using for this setup?

Link to comment

Given that you've had this issue repeatedly (I was not initially aware of this), and that it always seems related to the cables, that's almost certainly the issue.   

 

I'd have also suspected the controller (a contact issue where the SFF cable connects, or a trace issue related to that channel), EXCEPT you indicated you've replaced it and tried 3 different controller cards, and still have the same issue.

 

I might also suspect the drive slot, except you noted that if you connect it directly to the motherboard you never have problems.

 

Finally, the most telling point is your comment that if you  "... touch the cables (just a very light touch), the disk immediately gets disabled."

 

I'd absolutely replace that cable!! It's either defective, or it's very poorly insulated and a mere touch is generating enough static to cause a signaling spike. These breakout cables are available in several lengths -- get the shortest length you can that's "long enough" => more length = greater signal deterioration and lower reliability.

 

 

Link to comment

You mention you have 22 disks in one Tower? What controller cards are you using for this setup?

 

6 x onboard

8 x SAS2LP

4 x SASLP

4 x Adaptec 1430SA on one server and  4 x Adaptec 1405 on the other

 

On these servers I'm using the SAS2LP without any issues, but a small number of users have issues with this controller and unRAID v6. I'm also using the Dell H310 on 3 other servers, and AFAIK there are no users with issues with these and other similar LSI controllers; that's why I generally recommend the LSI instead of the SAS2LP for 8 ports.

Link to comment

Thanks for your messages.

 

To clarify - the three controller cards are the same model. So, it is safe to assume that there is no hardware defect with the controller card, but it may just be that this model is not fully compatible with my setup?

 

With regard to the cable: would you have a recommendation for where to buy high-quality cables that can connect to my controller card?

 

Thanks in advance!

Link to comment

To clarify - the three controller cards are the same model. So, it is safe to assume that there is no hardware defect with the controller card, but it may just be that this model is not fully compatible with my setup?

 

It's possible; you can also try a different slot if one is available.

 

 

With regard to the cable: would you have a recommendation for where to buy high-quality cables that can connect to my controller card?

 

I use mostly these:

 

https://www.amazon.com/3WARE-Cable-Multi-lane-Internal-SFF-8087/dp/B000FBYS2U

 

CBL-SFF8087OCF-05M - 50 cm long

CBL-SFF8087OCF-10M - 1 m long

Link to comment

Archived

This topic is now archived and is closed to further replies.
