brandon3055 Posted February 8, 2023 (edited)

Hi guys. Earlier tonight I noticed a bunch of my docker containers were down. Unsurprisingly, it was because my cache filled up again. So I did the usual: started the mover, then almost immediately got impatient and told the system to reboot so I could get my dockers up and running again. Only this time the system never came back up (at least not the webUI). So I checked dmesg via SSH and found it was continuously spamming this:

[dmesg output in spoiler]

From what I understand this is usually caused by bad SATA connections, but at this point I have tried re-seating the cables, replacing the cables, and switching to different SATA ports on the motherboard. It changed nothing. Usually after a while the errors will stop and the webGUI will load, but as soon as I try to access files on cache it starts up again and the files are inaccessible. My guess is that one of the drives is failing, but I have seen both drives mentioned in the errors, so I have no idea which one.

[error log in spoiler]

It's a mirrored cache pool, so if I can figure out which drive is failing it should be a simple matter of disconnecting the bad drive to get the system back up and running, right? Any advice would be most appreciated.

P.S. If you're wondering why my NAS is named what it is: it's because it's slow and it tends to get stuck and bog down the network. So I guess it's just living up to its name...

Edit: Looks like it's sdb. But not sure if I should just remove it or attempt a scrub...

evergreen-diagnostics-20230208-2054.zip

Edited February 8, 2023 by brandon3055
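When both pool members show up in the dmesg spam, btrfs's per-device error counters are one way to tell them apart: on the live system you would run `btrfs device stats /mnt/cache` (and `smartctl -a /dev/sdX` for each member). The stats below are an illustrative sample in that command's output format, not values from the attached diagnostics:

```shell
# Sample `btrfs device stats` output (made-up numbers for illustration):
stats='[/dev/sdb1].write_io_errs   132
[/dev/sdb1].read_io_errs    97
[/dev/sdb1].flush_io_errs   4
[/dev/sdc1].write_io_errs   0
[/dev/sdc1].read_io_errs    0
[/dev/sdc1].flush_io_errs   0'
# Print each device with any nonzero error counter:
echo "$stats" | awk '$2 > 0 { split($1, a, "]"); print substr(a[1], 2) }' | sort -u
```

A device whose counters keep climbing across reboots is the likely culprit; the counters persist until reset with `btrfs device stats -z`.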
JorgeB Posted February 8, 2023

Looks more like a power/connection problem; check/replace cables for cache2 and post new diags after array start.
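One way to distinguish a cable/connection problem from a failing drive is SMART attribute 199 (UDMA_CRC_Error_Count), which counts errors on the SATA link itself. On the real box you would run `smartctl -A /dev/sdb | grep -i crc`; the line below is a made-up sample in smartctl's attribute-table format:

```shell
# Illustrative smartctl attribute line (not from the diagnostics):
line='199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 37'
# The raw value (last field) is the cumulative CRC error count; if it
# keeps rising after a cable swap, the cable was not the problem:
echo "$line" | awk '{ print "CRC errors:", $NF }'
```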
brandon3055 Posted February 8, 2023 Author

I already checked and replaced the cables to both SSDs and it had no effect. Power connections also look good, but I don't have a free SATA power cable to rule it out completely. The first report attached to this post was generated while the server was attempting to start (via SSH). The second was generated when the GUI finally loaded.

evergreen-diagnostics-20230208-2225.zip
evergreen-diagnostics-20230208-2235.zip
JorgeB Posted February 8, 2023

Still showing the same issues. If cables didn't help, remove that SSD; the pool is mirrored, so it should keep working. Then add another device if you want to keep it raid1.
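For context, the btrfs steps underneath JorgeB's suggestion would look roughly like the sketch below. Unraid normally drives this through the GUI when you change pool assignments, so this is only to clarify what happens; device names are hypothetical, and the `run` wrapper just prints each command instead of executing it, so nothing here touches real disks:

```shell
run() { echo "+ $*"; }   # dry-run helper: print instead of execute; remove for real use

run mount -o degraded /dev/sdc1 /mnt/cache    # mount with one mirror member missing
run btrfs device add /dev/sdd1 /mnt/cache     # hypothetical replacement SSD
run btrfs device remove missing /mnt/cache    # drop the failed member from the pool
run btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache   # restore full mirroring
```

Adding the replacement before removing the missing member matters on a two-device raid1, since the pool cannot drop below the profile's minimum device count.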
brandon3055 Posted February 8, 2023 Author

1 hour ago, JorgeB said:
Still showing the same issues. If cables didn't help, remove that SSD; the pool is mirrored, so it should keep working. Then add another device if you want to keep it raid1.

Going to have to continue this in the morning, but I removed the bad drive and the cache is now readable. However, it looks like it has gone read-only as a result of having no space left, so the mover is unable to do its job. At the very least I can access the files now and can manually copy everything off if I have to.

evergreen-diagnostics-20230209-0027.zip
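A full btrfs pool flipping read-only is typically chunk exhaustion: the filesystem can no longer allocate a new chunk even though some free space remains inside existing ones. On the live pool, `btrfs filesystem usage /mnt/cache` shows this; the summary lines below are an illustrative sample, not figures from the diagnostics:

```shell
# Sample `btrfs filesystem usage` summary (made-up numbers):
usage='Device size:          465.76GiB
Device allocated:     465.76GiB
Device unallocated:     0.00GiB
Free (estimated):       1.21GiB'
# When "Device unallocated" reaches zero, btrfs cannot create new chunks
# and may go read-only even though "Free (estimated)" looks nonzero:
echo "$usage" | awk -F: '/unallocated/ { gsub(/ /, "", $2); print $2 }'
```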
Solution — JorgeB Posted February 8, 2023

If needed you could cancel the balance, delete some data, then re-balance, but it's probably easier to just back up and re-format.
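The cancel-and-re-balance route would look roughly like this. As above, the `run` wrapper only prints the commands so the sketch is safe to run anywhere; the mount point is the usual Unraid cache path:

```shell
run() { echo "+ $*"; }   # dry-run helper: print instead of execute; remove for real use

run btrfs balance cancel /mnt/cache            # stop the running balance
# ...delete or move some data off the pool to free space, then:
run btrfs balance start -dusage=50 /mnt/cache  # compact data chunks that are <50% full
```

The `-dusage=50` filter limits the balance to partially filled data chunks, which frees unallocated space quickly without rewriting the whole pool.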
brandon3055 Posted February 9, 2023 Author

14 hours ago, JorgeB said:
If needed you could cancel the balance, delete some data, then re-balance, but it's probably easier to just back up and re-format.

Yeah, in the end I just disabled cache on all shares, rsync'd everything to my backup share, removed the cache pool, and then restored everything to the appropriate shares.