HD disabled

April 18, 2018

Diagnostics attached. Something went wrong with disk 14. SMART is OK, no errors shown.

tower-diagnostics-20180418-2146.zip

April 18, 2018

Diags are from after rebooting, so there's not much to see. SMART does look fine, so this is more likely a connection issue.

April 18, 2018

Author

Thanks. I actually didn't reboot. My array doesn't auto-start, so I am sure it didn't reboot on its own.

April 18, 2018

Just now, steve1977 said:

so I am sure it didn't reboot on its own.

It did:

Apr 16 08:07:42 Tower emhttpd: unclean shutdown detected
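
For anyone wanting to check their own syslog for the same message, here is a minimal sketch. It only assumes the message text shown in the line above; the function name `find_unclean_shutdowns` and the second sample line are made up for illustration.

```python
from typing import Iterable, List

# Message text copied from the syslog line quoted above.
UNCLEAN_MARKER = "unclean shutdown detected"


def find_unclean_shutdowns(lines: Iterable[str]) -> List[str]:
    """Return every log line that reports an unclean shutdown."""
    return [line.rstrip("\n") for line in lines if UNCLEAN_MARKER in line]


if __name__ == "__main__":
    sample = [
        "Apr 16 08:07:42 Tower emhttpd: unclean shutdown detected\n",
        # Hypothetical filler line, only here to show that it gets skipped.
        "Apr 16 08:07:43 Tower emhttpd: some other message\n",
    ]
    for hit in find_unclean_shutdowns(sample):
        print(hit)
```

To use it on a real log, pass an open file object instead of the sample list; where the syslog lives inside the diagnostics zip is not covered here.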

April 20, 2018

Author

Mmh... That's weird. I tried to rebuild onto the same disk, and the disk got disabled again with errors. Please see the attached diagnostics. I can also run an extended SMART test if helpful?

tower-diagnostics-20180420-1932.zip

April 20, 2018

Did you check/replace the cables?

April 20, 2018

Author

The cables are quite new. I had issues earlier with disks getting disabled every few weeks or so (and rebuilds of larger disks not working). Changing the M1015 card seemed to have fixed it, and the conclusion appeared to be that the M1015 had overheated / started malfunctioning.

It seems the issue is back now (or a genuinely faulty disk)?

April 20, 2018

Disk 14 has dropped offline and Disk 11's link is being reset.

I don't think it matters how new the cables are. The SATA connector is a rather poor design.

April 20, 2018

Looks more like a connection issue, but it could be the disk. Swap cables/backplane with a different disk and try again. If it fails again, see if the problem follows the disk or stays with the slot/cables.
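
The swap test above boils down to a simple comparison, which can be sketched like this. The `Failure` type, the function name, and the disk/slot labels are all hypothetical and only illustrate the reasoning; this is not something Unraid provides.

```python
from typing import NamedTuple, Optional


class Failure(NamedTuple):
    disk: str  # identifier of the disk that got disabled (e.g. its serial)
    slot: str  # slot/cable/backplane position the disk was attached to


def classify_fault(before: Failure, after: Optional[Failure]) -> str:
    """Compare the failure seen before the swap with the one seen after it."""
    if after is None:
        # Nothing failed after the swap, so the test tells us nothing yet.
        return "inconclusive"
    if after.disk == before.disk and after.slot != before.slot:
        # The problem moved together with the disk.
        return "disk"
    if after.slot == before.slot and after.disk != before.disk:
        # The problem stayed with the slot/cables.
        return "slot/cables"
    return "inconclusive"


if __name__ == "__main__":
    before = Failure(disk="disk-A", slot="slot-1")
    print(classify_fault(before, Failure(disk="disk-A", slot="slot-2")))
    print(classify_fault(before, Failure(disk="disk-B", slot="slot-1")))
```

The "inconclusive" result covers the cases where nothing fails after the swap or where a completely different disk and slot are involved, which would point to something shared such as the controller or power.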

April 20, 2018

Author

Reseated all cables and rebooted. Rebuild in progress. Also running a short SMART test, but it is stalling at 90%.

Didn't swap yet as I wasn't sure whether I can do this and still rebuild the disk. If I were to switch disks 13 and 14, I wonder whether this could prevent me from rebuilding?

April 20, 2018

52 minutes ago, steve1977 said:

Didn't swap yet as I wasn't sure whether I can do this and still rebuild the disk.

You can.

April 20, 2018

Author

Ok, I swapped two disks. Now, the short SMART test doesn't complete ("interrupted host reset"). Full diagnostics attached.

tower-diagnostics-20180420-2357.zip

April 20, 2018

19 minutes ago, steve1977 said:

Ok, I swapped two disks.

The server was powered off for the swap?

April 20, 2018

Author

Yes

April 20, 2018

You're supposed to power down before swapping cables, then reboot.

April 20, 2018

Author

That's what I have done. Power down. Swap cables. Start.

April 20, 2018

Sorry, I misunderstood. Interrupted by host usually means another process is trying to access the disk, e.g. a spin down can cause that. Try rebuilding again.

April 21, 2018

Author

I rebooted and it is now rebuilding. It is quite slow though (compared to prior rebuilds), but no errors (yet).

Short SMART test completed. I am now running an extended SMART test. Let's see whether this completes and shows anything.

I am quite sure that it is not the cable. Hopefully, it is the HD. If not the HD, I am quite sure that it is yet again the M1015. I have been having these same cabling issues for years and have changed pretty much everything (incl. cables). When I changed the M1015 two weeks ago, it appeared to have finally solved it once and for all. There was some advice in another thread that it may be that the M1015 over-heated / died, so the new one has fixed it. Maybe the new one is now also fried? No idea how I can better cool it or whether this is really the issue?

April 21, 2018

You should run the extended test when there's minimal disk activity, not during a rebuild.
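
If you prefer the command line over the GUI, a rough sketch of starting the test with `smartctl` once the array is idle could look like this. The `-t short`, `-t long` and `-l selftest` options are standard smartmontools options; the helper names and the `/dev/sdX` placeholder are only for illustration, and the commands are printed rather than executed.

```python
import subprocess
from typing import List


def selftest_command(device: str, extended: bool = True) -> List[str]:
    """Build the smartctl command that starts a short or extended self-test."""
    return ["smartctl", "-t", "long" if extended else "short", device]


def selftest_log_command(device: str) -> List[str]:
    """Build the smartctl command that prints the self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output (needs smartctl installed)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    device = "/dev/sdX"  # placeholder: replace with the real device node
    # Only print the commands here; call run(...) yourself once the rebuild is done.
    print(" ".join(selftest_command(device)))
    print(" ".join(selftest_log_command(device)))
```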

6 hours ago, steve1977 said:

There was some advice in another thread that it may be that the M1015 over-heated / died, so the new one has fixed it. Maybe the new one is now also fried? No idea how I can better cool it or whether this is really the issue?

If the issue continues to happen, it's more likely that the problem wasn't the HBA.

April 21, 2018

Author

One theory is that the card just "cooled down" when I changed to the new card and that's why it worked. Now, the new M1015 has been running 24/7 for a few weeks and may also have a heat issue.

Anyhow, I am going ahead to switch the last component that I haven't changed yet to fix this. Will get a new PSU over the weekend and see where this leaves me. Maybe going from 850W to 1000W will do the trick...

April 21, 2018

Do you have any fans that move air over the HBA? Some HBAs can get quite hot if there isn't enough air circulation.

April 21, 2018

Author

The new PSU didn't fix the issue, so I am back here.

I can well imagine that the heat is the issue.

I am thinking of two things now: 1) installing a fan close to the heat sink (recommendations which ones?), 2) making another attempt to pass through my primary GPU to a VM and then installing two M1015s (reduced load and heat).

April 21, 2018

Author

Ok, getting closer to understanding the issue. It must be something around heat.

I removed my secondary GPU, which was adjacent to the M1015. It now seems to work (better). So, most likely, the heat from the GPU has a negative impact on the HBA. Or the adjacency to it prevents it from getting air.

Brings me back to the two options above. I'll give it a shot again to pass-through the primary GPU to my VM (but assume I'll fail again). Then, will look for a fan for my M1015. Any advice appreciated!

April 21, 2018

2 hours ago, steve1977 said:

installing a fan close to the heat sink (recommendations which ones?)

It isn't critical what fans you use because it isn't huge amounts of heat you need to move away. The important thing is to not have heat pockets - if you can get reasonably cold air to reach the boards then you will be fine even if the air speed is rather low. If you have good air movement close by, then it might also be enough with a shroud or similar to aim some of the air in the right direction.

April 22, 2018

Author

Ok. It seems confirmed that overheating of the HBA causes my long-standing issue.

I can now replicate the issue. When I install my GPU adjacent to the HBA and play a graphics-intensive game, a disk gets disabled. This is likely a result of the heat of the GPU impacting the HBA.

It’s now working OK after removing one of my two GPUs and keeping one slot empty between the HBA and GPU.

I anticipate the issue will get worse again once the warm summer months start.

I’ll look into buying an additional fan for the HBA. One way could be a dedicated one to put on top of the heat sink. No idea how to screw it to the heat sink though, or what model to pick. Another way could be to put a general slot fan into the PCI slot between the HBA and GPU.

It seems you’re somewhat suggesting the latter? Any recommendation on a specific model?
