
I think one of my mirrored cache drives may be dying. Could use some advice.


Go to solution Solved by JorgeB,


Hi guys. 

 

Earlier tonight I noticed a bunch of my Docker containers were down. Unsurprisingly, it was because my cache had filled up again.

So I did the usual: started the mover, then almost immediately got impatient and told the system to reboot so I could get my Docker containers up and running again.

 

Only this time the system never came back up (at least not the web UI), so I checked dmesg via SSH and found it was continuously spamming this.

Spoiler

c6a7a

From what I understand, this is usually caused by bad SATA connections, but at this point I have tried re-seating the cables, replacing the cables, and switching to different SATA ports on the motherboard. It changed nothing.

 

Usually after a while the errors stop and the web GUI loads, but as soon as I try to access files on the cache they start up again and the files are inaccessible.

 

My guess is one of the drives is failing, but I have seen both drives mentioned in the errors, so I have no idea which one.

Spoiler

a93c5

 

It's a mirrored cache pool, so if I can figure out which drive is failing, it should be a simple matter of disconnecting the bad drive to get the system back up and running, right?
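For anyone else trying to work out which pool member is the bad one: a rough sketch of one way to do it, assuming the pool is btrfs and that you have saved the text output of `btrfs device stats` for the pool's mount point. The function name and the sample numbers below are invented for illustration and are not taken from this server's logs; the script only adds up the per-device error counters so the worse device stands out.

```python
import re
from collections import defaultdict

# Matches counter lines such as "[/dev/sdb1].read_io_errs    12"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def summarize_device_errors(stats_text):
    """Return {device: total error count} from `btrfs device stats` style text."""
    totals = defaultdict(int)
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            totals[match.group("dev")] += int(match.group("value"))
    return dict(totals)


# Hypothetical sample input; the device names and counts are made up.
SAMPLE = """\
[/dev/sdb1].write_io_errs    40
[/dev/sdb1].read_io_errs     2
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
"""

if __name__ == "__main__":
    # Print devices with the most recorded errors first.
    totals = summarize_device_errors(SAMPLE)
    for dev, total in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{dev}: {total} errors")
```

A nonzero total on only one device is a hint rather than proof, since these counters persist until they are reset and can also reflect past cable problems.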

 

Any advice would be most appreciated.

 

P.S. If you're wondering why my NAS is named what it is: it's because it's slow and it tends to get stuck and bog down the network. So I guess it's just living up to its name...

 

Edit: Looks like it's sdb, but I'm not sure if I should just remove it or attempt a scrub...

fdbc3

evergreen-diagnostics-20230208-2054.zip

Edited by brandon3055

I already checked and replaced the cables to both SSDs and it had no effect. Power connections also look good, but I don't have a free SATA power cable to rule that out completely.

The first report attached to this post was generated via SSH while the server was attempting to start. The second was generated when the GUI finally loaded.

evergreen-diagnostics-20230208-2225.zip evergreen-diagnostics-20230208-2235.zip

1 hour ago, JorgeB said:

Still showing the same issues. If cables didn't help, remove that SSD; the pool is mirrored so it should keep working. Then add another device if you want to keep it raid1.

Going to have to continue this in the morning, but I removed the bad drive and the cache is now readable. It looks like it has gone read-only, though, as a result of having no space left?

So the mover is unable to do its job.

b6bc9

At the very least I can access the files now and can manually copy everything off if I have to.

evergreen-diagnostics-20230209-0027.zip

14 hours ago, JorgeB said:

If needed you could cancel the balance, delete some data, then re-balance, but probably easier to just backup and re-format.

Yeah, in the end I just disabled cache on all shares, rsync'd everything to my backup share, removed the cache pool, and then restored everything to the appropriate shares.
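For reference, the copy-off step can be sketched roughly like this. It is only an illustration: the helper name and both paths are hypothetical (the thread does not give the actual ones), and it defaults to rsync's `--dry-run` so nothing is written until you have reviewed what would be copied.

```python
import shlex


def build_rsync_command(source, destination, dry_run=True):
    """Build an rsync argument list that copies the contents of source into destination."""
    cmd = ["rsync", "-a", "-v"]  # -a: archive mode, -v: list files as they are copied
    if dry_run:
        cmd.append("--dry-run")  # only report what would be transferred
    # A trailing slash on the source makes rsync copy the directory's contents
    # rather than the directory itself.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(destination.rstrip("/") + "/")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute the real pool mount point and backup location.
    command = build_rsync_command("/mnt/cache", "/mnt/user/backup/cache-copy")
    print(shlex.join(command))
```

Once the dry run looks right, calling it with `dry_run=False` and passing the resulting list to `subprocess.run` would perform the actual copy.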

