Disk "not installed" on pool + errors on a disk

August 15, 2025

Hi all :)

I'm posting because I'm facing a problem I haven't been able to solve, even after checking previous solutions here.

After an 8 TB disk failed, I replaced it with a 10 TB one, but since then I get a "Not installed" error on the slot it was in.

I tried stopping the array, assigning a disk to the "pool 2" slot, and starting the array to begin rebuilding, but I get a "Wrong pool state" error when I try to force-start the array.

Until a few days ago, it wasn't a real problem: everything worked fine except for errors saying a disk was missing.

That changed recently: one disk started reporting a lot of errors, and scrubs never finish; they are automatically canceled at 95.14%.

Since I have already changed all the cables, it is possible that one of the disks has actually failed.

The problem is that I can't tell which one: the notifications report sdh as missing, even though it is properly connected and recognized.

"MC5 kernel: BTRFS error (device sdh1): bdev /dev/sdi1 errs"

On the rare occasions when P*ex stops working — until I reboot the machine — the sdj drive is the only one listed by Historical Unassigned Devices.

The problem is that the serial number of sdj corresponds to the drive currently connected as sdi.
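A side note on the sdh/sdi/sdj confusion: kernel names like sdX are assigned at detection time and can change between boots or after a disconnect, so the serial number is the more reliable handle. On Linux, `/dev/disk/by-id` holds symlinks whose names embed the model and serial and point at the current sdX node. A minimal Python sketch, assuming that directory exists on the server; the function names `read_links` and `disks_by_serial` are made up for illustration:

```python
import os


def read_links(by_id_dir="/dev/disk/by-id"):
    """Return {link name: resolved device path} for every by-id symlink."""
    if not os.path.isdir(by_id_dir):
        return {}  # directory absent (for example, not a Linux host)
    return {
        name: os.path.realpath(os.path.join(by_id_dir, name))
        for name in os.listdir(by_id_dir)
    }


def disks_by_serial(links):
    """Keep one entry per whole disk: {by-id name: kernel name such as 'sdi'}."""
    disks = {}
    for name, target in sorted(links.items()):
        # Skip partition links (names containing -partN) and WWN aliases.
        if "-part" in name or name.startswith("wwn-"):
            continue
        disks[name] = os.path.basename(target)
    return disks
```

Calling `disks_by_serial(read_links())` before and after a reboot would show whether a given serial moved from one sdX letter to another.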

I suppose the first step is to successfully rebuild the pool to determine which drive is faulty, if it's truly a hardware issue?

Thanks for the help <3

mc5-diagnostics-20250815-1357.zip

August 15, 2025

Community Expert

Reimport the pool degraded by doing the below:

on main click on the first device for that pool and then "remove pool"

back on main, create a new pool with the same name and 3 slots

assign the current 3 pool devices, leave the filesystem set to auto

start the array to import the pool

Post new diags after array start.

August 15, 2025

Author

Hi Jorge :)

Thanks for the answer, but unfortunately, even after following your steps and rebooting, the "P*ex_pool 2" came back to life.

I also tried adding another _ and starting in maintenance mode, to "bypass" what I thought was a caching problem somewhere, but it still recreated a 4th "ghost" slot 🤯

Latest Diag attached :)

EDIT: Also tried your solution here https://forums.unraid.net/topic/154697-removing-a-cache-pool/#findComment-1374639, with a reboot before creating the pool again, but nothing changed

EDIT 2: Tried with all shares (disks & shares) turned off and docker turned off in order to avoid some SMB disruption, but still the same

mc5-diagnostics-20250815-2259.zip

Edited August 15, 2025 by resolute-clearance8449

August 16, 2025

Community Expert

9 hours ago, resolute-clearance8449 said:
the "P*ex_pool 2" came back to life.

That is expected; it was just to be able to start the array

12 hours ago, JorgeB said:
Post new diags after array start.

August 16, 2025

Author

Hi Jorge!

Oh ok well noted, diag was attached but here is the latest one :)

mc5-diagnostics-20250816-0937.zip

August 16, 2025

Community Expert

With the array still running, type:

btrfs dev remove missing /mnt/plex_pool

When that's done, reimport the pool the same way as before, and it should now be fixed.

August 16, 2025

Author

No change unfortunately :/

Here are a few lines that I was able to see in the logs, if it can help:

Aug 16 13:15:25 MC5 emhttpd: plex_pool: btrfs recover profile: raid1
Aug 16 13:15:25 MC5 emhttpd: plex_pool: btrfs assign devices
Aug 16 13:15:25 MC5 emhttpd: /sbin/btrfs filesystem show a6cd4b77-e532-4d05-b935-778ea77d0d76 2>&1
Aug 16 13:23:27 MC5 kernel: BTRFS info (device sdh1 state M): turning on async discard
Aug 16 13:23:27 MC5 emhttpd: /sbin/btrfs filesystem show a6cd4b77-e532-4d05-b935-778ea77d0d76 2>&1
Aug 16 13:23:27 MC5 emhttpd: Label: none uuid: a6cd4b77-e532-4d05-b935-778ea77d0d76
Aug 16 13:23:27 MC5 emhttpd: Total devices 4 FS bytes used 9.11TiB
Aug 16 13:23:27 MC5 emhttpd: devid 1 size 9.09TiB used 5.01TiB path /dev/sdh1
Aug 16 13:23:27 MC5 emhttpd: devid 2 size 0 used 0 path MISSING
Aug 16 13:23:27 MC5 emhttpd: devid 3 size 9.09TiB used 6.15TiB path /dev/sdi1
Aug 16 13:23:27 MC5 emhttpd: devid 4 size 9.09TiB used 6.15TiB path /dev/sdf1
Aug 16 13:23:27 MC5 emhttpd: plex_pool: num_missing: 1
Aug 16 13:23:27 MC5 emhttpd: update_pool_cfg: 32 plex_pool 0
Aug 16 13:23:29 MC5 emhttpd: shcmd (982): /usr/sbin/zfs mount -a
Aug 16 13:23:29 MC5 emhttpd: shcmd (983): sync
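The `btrfs filesystem show` output captured in these log lines can be checked mechanically to confirm whether the missing device is still registered. A minimal Python sketch, assuming the `devid ... size ... used ... path ...` layout seen in the log; `parse_devices` and `missing_devids` are hypothetical helper names:

```python
import re

# Matches the per-device lines, with or without a syslog prefix in front.
DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)"
)

SAMPLE = """\
Aug 16 13:23:27 MC5 emhttpd: Total devices 4 FS bytes used 9.11TiB
Aug 16 13:23:27 MC5 emhttpd: devid 1 size 9.09TiB used 5.01TiB path /dev/sdh1
Aug 16 13:23:27 MC5 emhttpd: devid 2 size 0 used 0 path MISSING
Aug 16 13:23:27 MC5 emhttpd: devid 3 size 9.09TiB used 6.15TiB path /dev/sdi1
Aug 16 13:23:27 MC5 emhttpd: devid 4 size 9.09TiB used 6.15TiB path /dev/sdf1
"""


def parse_devices(text):
    """Return one dict per 'devid' line found in the text."""
    devices = []
    for line in text.splitlines():
        match = DEVID_RE.search(line)
        if match:
            devid, size, used, path = match.groups()
            devices.append(
                {"devid": int(devid), "size": size, "used": used, "path": path}
            )
    return devices


def missing_devids(text):
    """Return the devid numbers whose path is reported as MISSING."""
    return [d["devid"] for d in parse_devices(text) if d["path"] == "MISSING"]


if __name__ == "__main__":
    print("missing devids:", missing_devids(SAMPLE))
```

On the excerpt above this reports devid 2, which agrees with the `num_missing: 1` line and means the earlier removal did not take effect.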

Maybe the disk that is marked as failing is preventing the "btrfs dev remove missing" command from completing?

Attached latest diags also!

mc5-diagnostics-20250816-1432.zip

August 17, 2025

Community Expert

20 hours ago, resolute-clearance8449 said:
Maybe the disk that is marked as failing is preventing the "btrfs dev remove missing" command from completing?

That's not the issue; a problem with the filesystem is preventing the device removal.

Aug 16 13:29:30 MC5 kernel: BTRFS: error (device sdh1 state A) in btrfs_force_cow_block:603: errno=-117 Filesystem corrupted
Aug 16 13:29:30 MC5 kernel: BTRFS info (device sdh1 state EA): forced readonly

You will need to back up what you can from the pool and recreate it from scratch.
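For anyone wanting to spot this condition in their own syslog, the two kernel lines quoted above follow a pattern that is easy to search for. A minimal Python sketch, assuming the message layout shown in those lines (errno 117 is `EUCLEAN` on Linux, which btrfs reports as "Filesystem corrupted"); `scan_syslog` is a hypothetical helper name:

```python
import re

# "BTRFS: error (device X ...) in FUNCTION:LINE: errno=N MESSAGE"
ERROR_RE = re.compile(
    r"BTRFS: error \(device (\S+)[^)]*\) in (\w+):(\d+): errno=(-?\d+) (.+)"
)
# "BTRFS info (device X ...): forced readonly"
READONLY_RE = re.compile(r"BTRFS info \(device (\S+)[^)]*\): forced readonly")

SAMPLE = """\
Aug 16 13:29:30 MC5 kernel: BTRFS: error (device sdh1 state A) in btrfs_force_cow_block:603: errno=-117 Filesystem corrupted
Aug 16 13:29:30 MC5 kernel: BTRFS info (device sdh1 state EA): forced readonly
"""


def scan_syslog(lines):
    """Collect btrfs abort messages and filesystems forced read-only."""
    report = {"errors": [], "readonly": []}
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            dev, func, _src_line, err, msg = match.groups()
            report["errors"].append((dev, func, int(err), msg.strip()))
            continue
        match = READONLY_RE.search(line)
        if match and match.group(1) not in report["readonly"]:
            report["readonly"].append(match.group(1))
    return report


if __name__ == "__main__":
    print(scan_syslog(SAMPLE.splitlines()))
```

Note that the device named in these messages identifies the filesystem, not necessarily the drive at fault, so a hit here points at pool-level corruption rather than at sdh specifically.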

August 19, 2025

Author

Ouch, that's the 3rd time this has happened... Would you know why, even in RAID1, each time a disk fails I have to lose (or save, if possible) everything?

If I RMA the failing disk and replace it with a new one, could the system run smoothly again? Or is it definitely corrupted?

Thanks!

August 20, 2025

Community Expert

If it keeps happening, I would start by running memtest.

It's not likely a device problem, just filesystem corruption, which, if it happens often, can be the result of bad RAM.

August 29, 2025

Author
Solution

Hey! A few days later and a lot has happened, but I think we are on the right track:

unplugged the disk that "was faulty"
removed the pool, recreated it with the 2 other disks in RAID1
ran 4 passes of memtest with 0 errors

But errors and corruption came back on another disk as soon as I copied more than 1 TB of data from the "save" (backup) disk to the pool.

So I:

changed 3 SATA cables (all the disks were plugged into the LSI), and plugged the disks directly into the motherboard
updated the LSI 9300-16i from 07.00.01.00 to 16.00.12.00
plugged the "faulty" disk back in
ran a long SMART test on each of the 3 disks, and all finished with no errors

But there were still errors in the syslog, so I thought back to the first thread I created here, where I talked about the infamous Cooler Master G650M that was powering the server.

Sooooo I:

bought a 750 W Lian Li Edge Gold, which natively comes with 12 SATA connectors
added the 3rd disk back to the pool
ran a balance on the pool (still in RAID1)
am running a scrub to fix the few errors that appeared
will try to copy 2 TB to see what happens
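To see which pool member errors are attributed to after a scrub or a large copy, btrfs keeps per-device counters that `btrfs device stats /mnt/plex_pool` prints. A minimal Python sketch for reading that output; the sample values are invented for illustration (they are not from this server), and the helper names are hypothetical:

```python
import re

# Each output line looks like: [/dev/sdX1].counter_name   VALUE
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

# Illustrative values only, NOT taken from the diagnostics in this thread.
SAMPLE_STATS = """\
[/dev/sdh1].write_io_errs    0
[/dev/sdh1].read_io_errs     0
[/dev/sdh1].flush_io_errs    0
[/dev/sdh1].corruption_errs  0
[/dev/sdh1].generation_errs  0
[/dev/sdi1].write_io_errs    0
[/dev/sdi1].read_io_errs     3
[/dev/sdi1].flush_io_errs    0
[/dev/sdi1].corruption_errs  12
[/dev/sdi1].generation_errs  0
"""


def parse_device_stats(text):
    """Return {device: {counter: value}} from 'btrfs device stats' output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            device = stats.setdefault(match.group("dev"), {})
            device[match.group("counter")] = int(match.group("value"))
    return stats


def devices_with_errors(stats):
    """Return the devices that have at least one non-zero counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


if __name__ == "__main__":
    print(devices_with_errors(parse_device_stats(SAMPLE_STATS)))
```

The counters are cumulative, so after a hardware change it can help to reset them with `btrfs device stats -z` and then watch whether they start climbing again during the next big copy.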

I'll keep you updated, if only to help others who might have the same problem in the future (👋)

September 6, 2025

Author

Update 1 week later: it seems that the Lian Li resolved everything.

Even the disks that seemed to be dead passed the extended SMART test without any problem, and they handled a 15 TB copy without flinching, sooooo...

Might have found the solution with this PSU!

Thanks a lot for your help Jorge <3

Solved by resolute-clearance8449