snowmirage Posted October 14
Had this error pop up today; the system's been working fine for many months. I noticed I may have an SSD failing, maybe that's the cause? Nothing else stood out to me yet. phoenix-diagnostics-20241014-1538.zip
JorgeB Posted October 15
The logs are filled with HBA-related errors, making it difficult to see anything else, but it looks like a pool device dropped offline:
Oct 14 15:38:28 phoenix kernel: BTRFS error (device dm-22: state EA): bdev /dev/mapper/sdad1 errs: wr 8, rd 1, flush 0, corrupt 0, gen 0
Reboot and post new diags after array start.
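The per-device error counters in that log line can also be read directly; btrfs keeps a running tally for each pool member. A minimal sketch, assuming the pool is mounted at /mnt/cache_ssd (substitute your pool's actual mount point):

# Print cumulative write/read/flush/corruption/generation errors per device
btrfs device stats /mnt/cache_ssd

# Once cabling/power is sorted out, reset the counters so any new errors stand out
btrfs device stats -z /mnt/cache_ssd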
snowmirage Posted Tuesday at 03:34 PM
Finally got back to this today. Did a clean reboot, started the array, then grabbed this diag. Thank you for the help, it is greatly appreciated. phoenix-diagnostics-20241105-1033.zip
JorgeB Posted Tuesday at 03:46 PM
The pool looks OK for now; I suggest running a scrub. If it happens again, check/replace the cables for that device. I also see some issues with a Hitachi HDD, but it's unassigned.
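For reference, the scrub can also be run from the command line rather than the GUI; a sketch, again assuming the pool is mounted at /mnt/cache_ssd:

# Start a background scrub that re-reads all data and verifies checksums
btrfs scrub start /mnt/cache_ssd

# Check progress and any errors found so far
btrfs scrub status /mnt/cache_ssd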
snowmirage Posted Tuesday at 03:55 PM
I was able to start the scrub but it quickly aborted. Is there one specific drive that is causing the issue? Other than the unassigned Hitachi HDD, that is; I need to remove that after the next reboot, I was just testing that drive a while ago. phoenix-diagnostics-20241105-1051.zip
JorgeB Posted Tuesday at 04:13 PM
Another device dropped, this time /dev/mapper/sdag1.
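For anyone following along, a dropped pool member shows up in the syslog around the time the scrub aborts. A rough example of how to spot it yourself (the log path is Unraid's default; the search pattern is only illustrative):

# Scan the system log for btrfs errors naming the dropped device
grep -iE 'btrfs (error|warning)' /var/log/syslog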
snowmirage Posted Tuesday at 06:38 PM
I pulled that drive suspecting it had failed; they are old SSDs at this point. I was able to replace it with a new, larger SSD, and I can insert that into the cache pool, but when I try to start the array I get an error about the pool. I could have sworn that when I did this a few years back it prompted me to format the disk and then rebuild the array.
JorgeB Posted Tuesday at 06:55 PM
The pool is using the single profile, not redundant, so you cannot use the GUI to replace a device. You can use the command line, but only if the old device is also still connected.
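A command-line replace on a mounted pool would look roughly like the sketch below. The device paths are placeholders, not taken from the diagnostics, and on an encrypted pool like this one the new drive would also need to be opened as a /dev/mapper device first:

# Copy the old device's contents onto the new one in the background
# (list the pool's actual members first with: btrfs filesystem show)
btrfs replace start /dev/mapper/sdag1 /dev/mapper/sdnew1 /mnt/cache_ssd

# Monitor progress
btrfs replace status /mnt/cache_ssd

# If the new device is larger, grow the filesystem onto the extra space
# (devid 2 is an example; read the real devid from 'btrfs filesystem show')
btrfs filesystem resize 2:max /mnt/cache_ssd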
snowmirage Posted Tuesday at 07:17 PM
Oh my! I assume by "single profile" you mean I was not using the btrfs version of RAID1 or higher, so I had no redundancy... How did I do that? I could have sworn this was set up with at least one spare drive; guess I'm wrong. My only option then, I guess, is to create a new pool. Trying to do that via the GUI, I don't see a way to remove the old cache_ssd pool, though. How can I do that after I remove all the disks and create the new pool?
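To see which profile a pool is actually using, the allocation profiles are visible from the command line (mount point assumed, as before):

# The Data/Metadata/System lines name the profile in use, e.g. 'single' or 'RAID1'
btrfs filesystem df /mnt/cache_ssd

# A fuller per-device breakdown
btrfs filesystem usage /mnt/cache_ssd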
JorgeB Posted Tuesday at 07:22 PM
It still looks to me more like a power/connection issue; the SSD that dropped may be OK, it's a different one than the one that dropped before.
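One way to check whether the dropped SSD itself is healthy is its SMART data. A sketch, with the device name as a placeholder:

# Print SMART health status, attributes, and the device error log
# (replace /dev/sdX with the actual device)
smartctl -a /dev/sdX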
snowmirage Posted Tuesday at 07:55 PM
Gotcha... Well, since those drives were all very old anyway, I removed the old cache_ssd pool and created a new one with two new 1TB SSDs. Now that it's back up with the new pool, I guess I just need to edit my share "logic" to include the new ssd_cache pool where I want it. Then I'll let it run a bit to see if any more issues come up. It's using the same controller + cables + power cables as before, so if it was a drive it should be fixed; if it is a power cable or such, I would expect to see the same issue. *fingers crossed*
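Since the new pool has two members, it's worth confirming it actually came up redundant this time. Unraid normally defaults a two-device btrfs pool to RAID1, but that can be verified, and converted if needed (mount point assumed as before):

# Data and Metadata should both show RAID1
btrfs filesystem df /mnt/cache_ssd

# If either still shows 'single', convert both to RAID1
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache_ssd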