
Potential disk failing?


jlw_4049
Solved by JorgeB


I figured it would be best to start a new thread, since I had been discussing this in another thread and said I would post back.

My server did not freeze this time, but while I was rebuilding parity it reached about 88% and then stopped because the parity drive failed and now shows as Disabled.

 

When someone has the time, would you please look at my logs and let me know what you think? The drive passes a short SMART test, but skimming the logs I see this line quite frequently:

Dec 20 00:23:08 jlw-unRaid kernel: md: disk0 write error, sector=24319852752


I checked the physical connections, although nothing has changed. I did put the server under heavy load toward the end of the parity check, though, so while I doubt it's a power problem, I guess that could be a possibility. The PSU is a fairly high-quality unit with a 10-year warranty, so I wouldn't think so.

Thank you!

 

jlw-unraid-diagnostics-20231220-0036.zip


It looks like it could be a controller issue, since just before the parity disk (disk0) started showing write errors you got:

Dec 20 00:23:03 jlw-unRaid kernel: mpt2sas_cm0: SAS host is non-operational !!!!
### [PREVIOUS LINE REPEATED 5 TIMES] ###
Dec 20 00:23:08 jlw-unRaid kernel: mpt2sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!

and there is no SMART information for the parity drive in the diagnostics.
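For anyone wanting to check the same ordering in their own syslog, here is a minimal Python sketch. It assumes the log lines look like the ones quoted in this thread; the regex patterns, the function names, and the hard-coded year are illustrative assumptions (syslog timestamps carry no year), not part of any Unraid tool.

```python
import re
from datetime import datetime

# Patterns modeled on the lines quoted in this thread (assumed format).
CONTROLLER_FAULT = re.compile(r"kernel: (mpt\dsas_cm\d+): .*(non-operational|dead_ioc)")
WRITE_ERROR = re.compile(r"kernel: md: (disk\d+) write error, sector=(\d+)")


def parse_time(line, year=2023):
    # The first 15 characters hold "Mon DD HH:MM:SS"; the year is an assumption.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def summarize(lines, year=2023):
    """Return (first controller fault time, first md write error time)."""
    first_fault = None
    first_write_error = None
    for line in lines:
        if first_fault is None and CONTROLLER_FAULT.search(line):
            first_fault = parse_time(line, year)
        elif first_write_error is None and WRITE_ERROR.search(line):
            first_write_error = parse_time(line, year)
    return first_fault, first_write_error


def fault_precedes_write_error(lines, year=2023):
    """True if a controller fault was logged at or before the first write error."""
    fault, write_error = summarize(lines, year)
    return fault is not None and write_error is not None and fault <= write_error


# The three distinct lines quoted in this thread, in log order.
SAMPLE = [
    "Dec 20 00:23:03 jlw-unRaid kernel: mpt2sas_cm0: SAS host is non-operational !!!!",
    "Dec 20 00:23:08 jlw-unRaid kernel: mpt2sas_cm0: _base_fault_reset_work: Running mpt3sas_dead_ioc thread success !!!!",
    "Dec 20 00:23:08 jlw-unRaid kernel: md: disk0 write error, sector=24319852752",
]

if __name__ == "__main__":
    print(fault_precedes_write_error(SAMPLE))
```

If the controller fault is logged first, the write errors are more likely a symptom of the HBA (or its cabling or power) dropping out than of the disk itself, though this ordering alone does not prove it.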


This is the HBA card in my server. It's been in use since 2020 and I've had no issues. I pulled the side panels off and pushed on the HBA card, but it was seated snugly and didn't move. I reseated the power and SATA cables going to the parity drive; however, I am still using the same cables. After my post last night I started an extended SMART test, which is currently at 60%.

When I collected the diagnostics the drive was "disabled" (or not readable, I guess), so maybe that's why they didn't include the short SMART results. I've attached the short SMART results, as I had run that test right before.

If the HBA were dropping out, wouldn't it drop the other drives connected to it as well? Thanks for the replies so far. I'll update when the extended SMART run is completed.

Additionally, during this extended SMART run the CPU is at ~100% load (I'm encoding a huge queue of movies), so I can check whether power is an issue.

WDC_WD140EDGZ-11B2DA2_2CHEZJ8N-20231218-1804.txt


Thanks for the response. Parity is complete with 0 errors. 

I'm using these cables on my LSI card 
https://www.amazon.com/gp/product/B012BPLYJC/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&th=1

They do connect to the HDDs a little loosely. Since these cables are relatively cheap and fit loosely, I'm wondering if I should just replace them to prevent these errors from happening again, in case they are the cause.

I'm wondering if, over time, the SATA connection worked loose to the point where it wasn't making good contact and caused all these issues. Are there any recommended brands for this type of cable?
