August 15, 2025

Hi all :)

I am writing today because I am facing a problem I have not been able to solve, even after checking previous solutions here.

After an 8 TB disk failed, I replaced it with a 10 TB one, but since then I have had a "Not installed" error on the slot it occupied.

I tried to stop the array, assign a disk to the "pool 2" slot, and start the array to begin rebuilding, but I get a "Wrong pool state" error when I try to force-start the array.

Until a few days ago, it wasn't a real problem: everything worked fine except for errors saying a disk was missing.

That changed recently, with a lot of errors from one disk, and scrubs that never finished and were automatically canceled at 95.14%.

Since I have already changed all the cables and so on, it is possible that one of the disks has actually failed. The problem is that I can't tell which one, since sdh is the disk reported as absent in the notifications, even though it is properly connected and recognized.

"MC5 kernel: BTRFS error (device sdh1): bdev /dev/sdi1 errs"

On the rare occasions when P*ex stops working (until I reboot the machine), the sdj drive is the only one listed under Historical Unassigned Devices. The catch: the serial number of sdj corresponds to the drive currently connected as sdi.

I suppose the first step is to successfully rebuild the pool to determine which drive is faulty, if it's truly a hardware issue?

Thanks for the help <3

mc5-diagnostics-20250815-1357.zip
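A note on the sdh/sdi/sdj confusion: kernel names such as sdi are not guaranteed to stay attached to the same physical drive across reboots or re-cabling, whereas on a typical Linux system with udev the links under /dev/disk/by-id embed the model and serial number. The following is a minimal Python sketch for listing which persistent names point at which kernel device. The function name `names_by_kernel_device` is hypothetical and not part of Unraid; it only assumes that the by-id directory contains symlinks to the device nodes.

```python
import os


def names_by_kernel_device(by_id_dir="/dev/disk/by-id"):
    """Map kernel device names (e.g. 'sdi') to the by-id link names pointing at them.

    Returns an empty dict if the directory does not exist.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for link_name in sorted(os.listdir(by_id_dir)):
        link_path = os.path.join(by_id_dir, link_name)
        if not os.path.islink(link_path):
            continue
        # Resolve the symlink and keep only the final component, e.g. "sdi".
        target = os.path.basename(os.path.realpath(link_path))
        mapping.setdefault(target, []).append(link_name)
    return mapping
```

Calling `names_by_kernel_device()` on the server and comparing the result before and after a reboot would show whether a given serial number has moved to a different sdX name.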
August 15, 2025 (Community Expert)

Reimport the pool degraded by doing the following:

- On Main, click on the first device for that pool and then "remove pool".
- Back on Main, create a new pool with the same name and 3 slots.
- Assign the current 3 pool devices, and leave the filesystem set to auto.
- Start the array to import the pool.

Post new diags after array start.
August 15, 2025 (Author)

Hi Jorge :)

Thanks for the answer, but unfortunately, even after following your steps and trying a reboot, the "P*ex_pool 2" came back to life.

I also tried adding another _ to the name and starting in maintenance mode, to "bypass" what I thought was a cache problem somewhere, but it still recreated a 4th "ghost" slot 🤯

Latest diag attached :)

EDIT: I also tried your solution here https://forums.unraid.net/topic/154697-removing-a-cache-pool/#findComment-1374639, with a reboot before creating the pool again, but nothing changed.

EDIT 2: I tried with all shares (disks & shares) turned off and Docker turned off to avoid any SMB disruption, but still the same.

mc5-diagnostics-20250815-2259.zip

Edited August 15, 2025 by resolute-clearance8449
August 16, 2025 (Community Expert)

9 hours ago, resolute-clearance8449 said:
the "P*ex_pool 2" came back to life.

That is expected; it was just to be able to start the array.

12 hours ago, JorgeB said:
Post new diags after array start.
August 16, 2025 (Author)

Hi Jorge!

Oh OK, well noted. The diag was attached, but here is the latest one :)

mc5-diagnostics-20250816-0937.zip
August 16, 2025 (Community Expert)

With the array still running, type:

btrfs dev remove missing /mnt/plex_pool

When that's done, reimport the pool the same way as before, and it should now be fixed.
August 16, 2025 (Author)

No change unfortunately :/

Here are a few lines that I was able to see in the logs, in case they help:

Aug 16 13:15:25 MC5 emhttpd: plex_pool: btrfs recover profile: raid1
Aug 16 13:15:25 MC5 emhttpd: plex_pool: btrfs assign devices
Aug 16 13:15:25 MC5 emhttpd: /sbin/btrfs filesystem show a6cd4b77-e532-4d05-b935-778ea77d0d76 2>&1
Aug 16 13:23:27 MC5 kernel: BTRFS info (device sdh1 state M): turning on async discard
Aug 16 13:23:27 MC5 emhttpd: /sbin/btrfs filesystem show a6cd4b77-e532-4d05-b935-778ea77d0d76 2>&1
Aug 16 13:23:27 MC5 emhttpd: Label: none uuid: a6cd4b77-e532-4d05-b935-778ea77d0d76
Aug 16 13:23:27 MC5 emhttpd: Total devices 4 FS bytes used 9.11TiB
Aug 16 13:23:27 MC5 emhttpd: devid 1 size 9.09TiB used 5.01TiB path /dev/sdh1
Aug 16 13:23:27 MC5 emhttpd: devid 2 size 0 used 0 path MISSING
Aug 16 13:23:27 MC5 emhttpd: devid 3 size 9.09TiB used 6.15TiB path /dev/sdi1
Aug 16 13:23:27 MC5 emhttpd: devid 4 size 9.09TiB used 6.15TiB path /dev/sdf1
Aug 16 13:23:27 MC5 emhttpd: plex_pool: num_missing: 1
Aug 16 13:23:27 MC5 emhttpd: update_pool_cfg: 32 plex_pool 0
Aug 16 13:23:29 MC5 emhttpd: shcmd (982): /usr/sbin/zfs mount -a
Aug 16 13:23:29 MC5 emhttpd: shcmd (983): sync

Maybe the disk that is marked as failing is preventing the "btrfs dev remove missing" command from completing?

Latest diags attached as well!

mc5-diagnostics-20250816-1432.zip
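For anyone checking a similar log, the "btrfs filesystem show" lines quoted in this post can be read mechanically to see which devid is still reported as missing. This is a minimal Python sketch, not part of any Unraid tooling: the function name `find_missing_devids` is hypothetical, and it assumes the "devid N size ... used ... path ..." layout seen in the excerpt above.

```python
import re

# Matches the "devid N size ... used ... path ..." lines printed by
# `btrfs filesystem show`, as they appear in the syslog excerpt in this thread.
DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)"
)


def find_missing_devids(log_text):
    """Return the devids whose path is reported as MISSING."""
    missing = []
    for match in DEVID_RE.finditer(log_text):
        devid, _size, _used, path = match.groups()
        if path == "MISSING":
            missing.append(int(devid))
    return missing


# Sample lines copied from the post above.
SAMPLE = """\
Aug 16 13:23:27 MC5 emhttpd: Total devices 4 FS bytes used 9.11TiB
Aug 16 13:23:27 MC5 emhttpd: devid 1 size 9.09TiB used 5.01TiB path /dev/sdh1
Aug 16 13:23:27 MC5 emhttpd: devid 2 size 0 used 0 path MISSING
Aug 16 13:23:27 MC5 emhttpd: devid 3 size 9.09TiB used 6.15TiB path /dev/sdi1
Aug 16 13:23:27 MC5 emhttpd: devid 4 size 9.09TiB used 6.15TiB path /dev/sdf1
"""

if __name__ == "__main__":
    print(find_missing_devids(SAMPLE))
```

On the sample, devid 2 is the only entry flagged, which matches the "num_missing: 1" line in the same log and shows that the missing device had not been removed at that point.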
August 17, 2025 (Community Expert)

20 hours ago, resolute-clearance8449 said:
Maybe the disk that is marked as failing is preventing the "btrfs dev remove missing" command to complete?

That's not the issue; there's a problem with the filesystem preventing the disk removal.

Aug 16 13:29:30 MC5 kernel: BTRFS: error (device sdh1 state A) in btrfs_force_cow_block:603: errno=-117 Filesystem corrupted
Aug 16 13:29:30 MC5 kernel: BTRFS info (device sdh1 state EA): forced readonly

You will need to back up what you can from the pool and recreate it from scratch.
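The two kernel lines quoted in this post are the kind of thing worth searching a syslog for before attempting a device removal. Below is a minimal Python sketch that picks out BTRFS kernel messages and flags the "Filesystem corrupted" and "forced readonly" cases. The function name `summarize_btrfs_events` and the dictionary keys are hypothetical, and the pattern is only based on the two message shapes that appear in this thread, so other kernel versions may format these lines differently.

```python
import re

# Covers both shapes seen in this thread:
#   "kernel: BTRFS: error (device sdh1 state A) in ..."
#   "kernel: BTRFS info (device sdh1 state EA): forced readonly"
BTRFS_RE = re.compile(
    r"kernel: BTRFS:? (?P<level>\w+) \(device (?P<device>\w+)[^)]*\)(?P<rest>.*)"
)


def summarize_btrfs_events(lines):
    """Return one dict per BTRFS kernel line, with two simple flags."""
    events = []
    for line in lines:
        match = BTRFS_RE.search(line)
        if match is None:
            continue
        rest = match.group("rest")
        events.append({
            "level": match.group("level"),
            "device": match.group("device"),
            "corrupted": "Filesystem corrupted" in rest,
            "forced_readonly": "forced readonly" in rest,
        })
    return events


# Sample lines copied from the post above, plus one unrelated line.
SAMPLE_LINES = [
    "Aug 16 13:29:30 MC5 kernel: BTRFS: error (device sdh1 state A) in "
    "btrfs_force_cow_block:603: errno=-117 Filesystem corrupted",
    "Aug 16 13:29:30 MC5 kernel: BTRFS info (device sdh1 state EA): forced readonly",
    "Aug 16 13:23:29 MC5 emhttpd: shcmd (983): sync",
]

if __name__ == "__main__":
    for event in summarize_btrfs_events(SAMPLE_LINES):
        print(event)
```

Note that the device named in these messages identifies the filesystem, not necessarily the drive at fault, which is consistent with the advice here that the problem lies in the filesystem rather than in one disk.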
August 19, 2025 (Author)

Ouch, this is the 3rd time it has happened... Would you know why, even in RAID1, each time a disk fails I have to lose (or save, if possible) everything?

Is it possible that, if I RMA the failing disk and replace it with a new one, the system can operate smoothly again? Or is it definitely corrupted?

Thanks!
August 20, 2025 (Community Expert)

If it keeps happening, I would start by running memtest.

It's not likely a device problem, just filesystem corruption, which, if it happens often, can be the result of bad RAM.
August 29, 2025 (Author, Solution)

Hey! A few days later, a lot has happened, but I think we are on the right track:

- unplugged the disk that "was faulty"
- removed the pool and recreated it with the 2 other disks in RAID1
- ran 4 passes of memtest with 0 errors

But I saw a few errors and corruptions coming back from another disk as soon as I copied more than 1 TB of data from the "save" disk to the pool.

So I:

- changed 3 SATA cables (all the disks were plugged into the LSI), and plugged the disks directly into the motherboard
- updated the LSI 9300-16i from 07.00.01.00 to 16.00.12.00
- plugged the "faulty" disk back in
- ran a long SMART test on each of the 3 disks, and all ended with no errors

But there were still errors in the syslog, so I thought back to the first thread I created here, where I was talking about the infamous Cooler Master G650M that was powering the server.

Sooooo I:

- bought a Lian Li Edge Gold 750W, which comes natively with 12 SATA connectors
- added the 3rd disk to the pool again
- ran a balance on the pool (still in RAID1)
- am running a scrub to fix the few errors that appeared
- will try to copy 2 TB to see what happens

I will keep you updated, even if it is purely to help others who might have the same problem in the future (👋)
September 6, 2025 (Author)

Update 1 week later: it seems that the Lian Li resolved everything.

Even the disks that seemed to be dead passed the extended SMART test without any problem, and they handled a 15 TB copy without flinching, sooooo...

I might have found the solution with this PSU!

Thanks a lot for your help Jorge <3