• Unraid only shows 125 of 240 disks in GUI


    enderTown
    • Minor

    Hello, I'm coming over from this post:

     

    https://forums.unraid.net/topic/124249-unraid-only-shows-125-of-240-disks-in-gui/#comment-1133189

     

    Hello, I'm trying Unraid on a server that has 240 disks attached. The UI sees the first 125 or so as unassigned devices, but the rest are not shown. However, if I go to the terminal, I can see all 240 disks and mount them and work with them as normal.
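
    For reference, this is roughly what I mean by "working with them from the terminal" (the device name below is just an example of one of the disks the GUI doesn't show):

    lsblk -dn -o NAME,SIZE,MODEL | wc -l   # count detected block devices; I see all ~240 disks here (plus the boot flash)
    mkdir -p /mnt/test
    mount /dev/sdab1 /mnt/test             # mounting one of the "missing" disks works fine (sdab1 is just an example)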

     

    I thought maybe this was a limitation of the trial, because I know the Pro version is supposed to support "unlimited" devices, and certainly more than 125 (using several pools). However, after upgrading to Pro and rebooting, I have the same problem.

     

    I wanted to put all of these into pools, but the problem is I can't even select almost half of them from the UI, and I don't see a way to manage this from the CLI.

     

    Oh, another oddity: the Unassigned Devices plugin *DOES* show all of them, but the ones that were missing before show up as "unmountable". Also, it takes maybe 90 seconds to 2 minutes for the Unassigned Devices plugin to load the list, so it is unusable for this many drives.

     

    Some more info on my server setup:

    Dell PowerEdge R720, dual 14-core Xeons, 224GB RAM

    2x LSI 9200e SAS cards

    20x Dell MD1200 12-disk JBODs

    Various drive sizes from 4TB up to 18TB, total storage about 1.6PB

    Each SAS HBA is connected to 10 of the JBODs via a single port, daisy-chained

    Not using multi-path

    All verified to work both in Windows and in Ubuntu before trying Unraid

     

    Thank you!

     

     




    User Feedback

    Recommended Comments

    Please post your diagnostics so we can have a deeper look.

     

    There is no explicit limit in the GUI and as long as all devices are properly detected, they should be available, but this requires further investigation (you are the first with so many devices).

     

    Link to comment

    Sure, see diagnostics attached.

     

    I forgot to mention in my original post that the R720 also has 16 drives attached to its internal SAS controller, so there are actually 3 SAS HBAs and 256 drives in the mix, all flashed to IT mode.

     

    BUT I've also already narrowed down the problem. I unplugged all drives except a few in the drive arrays and noticed something odd right away: it depends on how "far away" the disk is from the HBA in the daisy chain. In other words, if I put an 18TB disk in the 10th JBOD (the last one in the daisy chain), it shows up as unmountable (even though I can mount it manually from the terminal just fine). But if I just move that same disk to the first JBOD in the chain, it shows up as normal and I can mount it, etc. So maybe it is a problem with SAS addresses/ports?
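
    In case it's useful, this is roughly how I'm checking which expander/enclosure each disk sits behind (the addresses here are just the ones from my log below, nothing special):

    lsscsi -t                             # lists every disk with the SAS address/transport it is attached through
    ls -l /dev/disk/by-path | grep sas    # shows the physical HBA + phy path each /dev/sdX node resolves to
    # e.g. the test drive shows up as sas_addr 0x500c04f2178bac1b behind enclosure 0x500c04f2178bac00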

     

    Here is the log I captured while unplugging/plugging in that test drive. The first two entries are me trying it in the tenth JBOD of each HBA's chain (last in the chain), and it showed up as unmountable in the Unassigned Devices plugin both times. The last one is me plugging it into the first JBOD, where it shows up as normal in the UI, I can mount it, etc.

     

     

    May 31 08:30:08 farmer01 kernel: mpt2sas_cm2: handle(0x1e) sas_address(0x500c04f2178bac1b) port_type(0x1)
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: Direct-Access     ATA      ST18000NE000-2YY EN01 PQ: 0 ANSI: 6
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: SATA: handle(0x001e), sas_addr(0x500c04f2178bac1b), phy(27), device_name(0x0000000000000000)
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: enclosure logical id (0x500c04f2178bac00), slot(1) 
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: Attached scsi generic sg39 type 0
    May 31 08:30:09 farmer01 kernel: end_device-9:9:2: add: handle(0x001e), sas_addr(0x500c04f2178bac1b)
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: Power-on or device reset occurred
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] 4096-byte physical blocks
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Write Protect is off
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Mode Sense: 7f 00 10 08
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
    May 31 08:30:09 farmer01 kernel: sds: sds1
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Attached SCSI disk
    May 31 08:30:10 farmer01 kernel: BTRFS: device fsid 24635fb7-bb07-4292-ae69-b4a02c5faad2 devid 2 transid 57 /dev/sds1 scanned by udevd (22820)
    May 31 08:30:10 farmer01 unassigned.devices: Disk with ID 'ST18000NE000-2YY101_ZR52M4CN (sds)' is not set to auto mount.
    May 31 08:30:45 farmer01 kernel: sd 9:0:11:0: device_block, handle(0x001e)
    May 31 08:30:48 farmer01 kernel: sd 9:0:11:0: device_unblock and setting to running, handle(0x001e)
    May 31 08:30:48 farmer01 kernel: sd 9:0:11:0: [sds] Synchronizing SCSI cache
    May 31 08:30:48 farmer01 kernel: sd 9:0:11:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
    May 31 08:30:48 farmer01 kernel: mpt2sas_cm2: mpt3sas_transport_port_remove: removed: sas_addr(0x500c04f2178bac1b)
    May 31 08:30:48 farmer01 kernel: mpt2sas_cm2: removing handle(0x001e), sas_addr(0x500c04f2178bac1b)
    May 31 08:30:48 farmer01 kernel: mpt2sas_cm2: enclosure logical id(0x500c04f2178bac00), slot(1)
    May 31 08:31:38 farmer01 kernel: mpt2sas_cm1: handle(0x1d) sas_address(0x500c04f29a7c6121) port_type(0x1)
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: Direct-Access     ATA      ST18000NE000-2YY EN01 PQ: 0 ANSI: 6
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: SATA: handle(0x001d), sas_addr(0x500c04f29a7c6121), phy(33), device_name(0x0000000000000000)
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: enclosure logical id (0x500c04f29a7c6100), slot(0) 
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: Attached scsi generic sg39 type 0
    May 31 08:31:39 farmer01 kernel: end_device-8:9:1: add: handle(0x001d), sas_addr(0x500c04f29a7c6121)
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: Power-on or device reset occurred
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] 4096-byte physical blocks
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Write Protect is off
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Mode Sense: 7f 00 10 08
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
    May 31 08:31:39 farmer01 kernel: sds: sds1
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Attached SCSI disk
    May 31 08:31:41 farmer01 unassigned.devices: Disk with ID 'ST18000NE000-2YY101_ZR52M4CN (sds)' is not set to auto mount.
    May 31 08:32:36 farmer01 kernel: sd 8:0:10:0: device_block, handle(0x001d)
    May 31 08:32:38 farmer01 kernel: sd 8:0:10:0: device_unblock and setting to running, handle(0x001d)
    May 31 08:32:38 farmer01 kernel: sd 8:0:10:0: [sds] Synchronizing SCSI cache
    May 31 08:32:38 farmer01 kernel: sd 8:0:10:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
    May 31 08:32:38 farmer01 kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x500c04f29a7c6121)
    May 31 08:32:38 farmer01 kernel: mpt2sas_cm1: removing handle(0x001d), sas_addr(0x500c04f29a7c6121)
    May 31 08:32:38 farmer01 kernel: mpt2sas_cm1: enclosure logical id(0x500c04f29a7c6100), slot(0)
    May 31 08:33:22 farmer01 kernel: mpt2sas_cm1: handle(0x1d) sas_address(0x500c04f2dec15121) port_type(0x1)
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: Direct-Access     ATA      ST18000NE000-2YY EN01 PQ: 0 ANSI: 6
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: SATA: handle(0x001d), sas_addr(0x500c04f2dec15121), phy(33), device_name(0x0000000000000000)
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: enclosure logical id (0x500c04f2dec15100), slot(0) 
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: Attached scsi generic sg39 type 0
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: Power-on or device reset occurred
    May 31 08:33:23 farmer01 kernel: end_device-8:0:2: add: handle(0x001d), sas_addr(0x500c04f2dec15121)
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] 4096-byte physical blocks
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Write Protect is off
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Mode Sense: 7f 00 10 08
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
    May 31 08:33:23 farmer01 kernel: sds: sds1
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Attached SCSI disk

    diagnostics-20220531-1037.zip

    Link to comment

    It looks like you have daisy-chained a number of expanders. I am not an expert in expanders (I don't use them myself), but it seems to go wrong there; see the file "lsscsi.txt" in the diagnostics. Perhaps @JorgeB can give his expert view?

     

    Link to comment
    27 minutes ago, enderTown said:

    But if I just move that same disk to the first JBOD in the chain, it shows up as normal and I can mount it, etc. So maybe it is a problem with SAS addresses/ports?

     

    Possibly. I have some experience with SAS daisy chaining but have never done it with so many expanders, only 2 or 3, so I don't know if there's a limit. Did you test to find out at which point in the chain it stops working?

     

    You could also connect two enclosures per HBA; then you would only need to daisy chain 5 units per connection, which would also be better for available bandwidth.

    Link to comment
    19 minutes ago, JorgeB said:

    Possibly. I have some experience with SAS daisy chaining but have never done it with so many expanders, only 2 or 3, so I don't know if there's a limit. Did you test to find out at which point in the chain it stops working?

     

    You could also connect two enclosures per HBA; then you would only need to daisy chain 5 units per connection, which would also be better for available bandwidth.

     

    I don't know exactly where it stops, but I will give that a try next. Also, unfortunately I don't have two more cables of the proper length, but that is a good idea regardless, so I'll order some.

     

    But remember, these drives show up just fine in Windows Server/Ubuntu and even in the terminal in Unraid, which is why I think something in the Unraid driver is getting confused.

    Link to comment
    1 minute ago, -Daedalus said:

    It's been a while since I've looked at the MD boxes, but per this guide (https://downloads.dell.com/manuals/common/MD_1200_MD1220_Reference Guide_EN.pdf) the last page indicates a maximum chain of 4 MDs is supported. 

     

    Now, I'm sure more than that will work (4 is only what Dell will officially support), but it would explain what's happening here.

     

    Oh yes, they definitely will work with more than 4. I have been running all 20 daisy-chained to a single SAS port, first under Windows for several months, then Ubuntu for the last month or so, and now I'm giving Unraid a shot. I only added the second 9200 SAS adapter as trial and error for this problem, but now that it's in there I will probably leave it and get a few more cables so I can run 5 MD1200s per port for more bandwidth.

     

    Anyway, this config is definitely not supported by Dell, but it works in IT mode with normal SAS standards on Windows/Linux, and the HBAs themselves can support over 1000 drives each IIRC.

    Link to comment

    Here is how they show up, btw. The first one (Dev 1) is connected to the first JBOD in the chain, and sdr is in the last one:

     

    [screenshot of the device list]

     

    The only thing I can do in the UI for sdr is pre-clear it (using the preclear plugin), so I'm giving that a shot, but I doubt it will make any difference. If I move the sdr disk to the first JBOD in the chain, it shows up as Dev 2 and I can mount it/select it for an array slot as normal.

    Link to comment

    Something else that just happened that may be related (and is really bad):

    I added another couple of 18TB drives to the first JBOD and put them into a pool in BTRFS single mode. Started up the array and it seemed to work fine. Then I added another 18TB NTFS drive; Unassigned Devices wanted to "Format" it, but I mounted it from the terminal and was able to see the files just fine. My goal was to copy these files via the terminal to the new BTRFS pool and then format that drive and add it to the same pool.
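
    (For reference, the manual copy I had in mind was just something like this, with the device name and pool mount point as examples only:)

    mkdir -p /mnt/ntfs_tmp
    mount -o ro /dev/sdy1 /mnt/ntfs_tmp              # read-only mount of the old NTFS disk (example device)
    rsync -a --progress /mnt/ntfs_tmp/ /mnt/bigpool/  # copy the files into the new BTRFS pool's mount point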

     

    Once I restarted the array, both my unrelated cache pool with 8 SSDs AND my new pool with these 18TB drives now show as "Invalid pool config" (see screenshot). I tried rebooting and had the same problem. It seems similar to this thread: https://forums.unraid.net/bug-reports/prereleases/690-rc4-pool-unmountable-if-array-is-re-started-after-device-repacement-r1807/

     

    I know this is probably low priority because I'm an edge case, but if you don't think these things can be addressed reasonably quickly then please let me know, as unfortunately I'm not feeling as great about Unraid as I did at first...

     

    [screenshot of the "Invalid pool config" error]

    Link to comment

    Do you have all 240 devices hooked up?  If so, please capture output of this command:

     

    ls -l /dev/disk/by-id

     

    There is no limitation inside Unraid OS for max number of devices, except for:

    • max 30 devices in unRAID array (future feature will permit more than 1 unRAID array)
    • max 30 devices per pool
    • max 35 pools

    That theoretically permits up to 1080 devices (30 in the array + 35 × 30 in pools) managed within pools and the array. Additional devices would be managed as Unassigned Devices.

    Link to comment
    10 hours ago, enderTown said:

    Once I restarted the array, both my unrelated cache pool with 8 SSDs AND my new pool with these 18TB drives now show as "Invalid pool config" (see screenshot). I tried rebooting and had the same problem.

     

    The problem is that one member of the second pool, sdx (the last device), doesn't belong to that pool, and it's generating an error during the device scan:

     

    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 2 transid 62 /dev/sdu1 scanned by udevd (1562)
    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 4 transid 62 /dev/sdv1 scanned by udevd (1601)
    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 3 transid 62 /dev/sdw1 scanned by udevd (1507)
    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 1 transid 62 /dev/sds1 scanned by udevd (1519)

     

    The first 4 members are devices 1 to 4 of this pool; the 5th member is device 3 from a different one (note the different fsid):

     

    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 06b8a25f-d7e5-4ab2-9e5b-cc1d3e77bfed devid 3 transid 57 /dev/sdx1 scanned by udevd (1507)

     

    This device is causing an error during the device scan, and that is what's preventing both pools from mounting:

     

    May 31 13:20:47 farmer01 emhttpd: /mnt/cache ERROR: cannot scan /dev/sdx1: Input/output error

     

    Assuming there's no data there, you just need to wipe that device:

    wipefs -a /dev/sdx1

    then unassign it from the second pool, and both pools should mount; you can then add it back.
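
    If it's not obvious which device is the stray member, something like this should confirm it before wiping (using sdx1 from the log above):

    btrfs filesystem show    # lists every btrfs fsid together with its member devices
    blkid /dev/sdx1          # shows which filesystem UUID is on the suspect partition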

     

     

     

    Link to comment
    9 hours ago, JorgeB said:

    This device is causing an error during the device scan, and that is what's preventing both pools from mounting:

     

    Nice sleuthing! I do remember one of the drives was part of another pool in a previous trial-and-error attempt at pooling. I remember formatting it when I added it to this pool, though; I would expect that to "fix it"? Were there additional steps I was supposed to take, and how would I have known to do them? This seems like something the software should do itself or warn me about. Unfortunately I wasn't able to try your fix, because I had already deleted the whole pool and given up on that idea.

     

    19 hours ago, limetech said:

    Do you have all 240 devices hooked up?  If so, please capture output of this command:

     

    This was my next attempt: just get as many drives hooked up as possible, adding them one by one. I disconnected ALL the drives and started the tedious task of re-slotting them one by one, verifying that they showed up, renaming them for easy identification, and mounting them (mostly NTFS partitions that I planned to convert to BTRFS later). The first 20 or so went pretty smoothly. Around 30, things were noticeably slower in the UI, specifically loading the Unassigned Devices plugin's device list. I watched the log and saw all the udev commands flying by, some eventually timing out, etc. Shouldn't this list be cached instead of loaded every time? By the time I got to 50 disks, the UI was unusable: I'd wait 15-20 seconds for the list of disks to load, clicking "Mount" took forever and flickered back and forth, etc. I was powering through it anyway, until around disk 70 I got a weird error while renaming a drive - some PHP error about " = not valid" or something. After refreshing (30 seconds), ALL my custom drive names were gone and had been replaced by Dev 1 - Dev 70. I rebooted, but my custom names didn't come back.

     

    At this point, as you can hopefully understand, I have thrown in the towel for now and am going back to my normal Ubuntu setup. I can still boot into Unraid pretty easily so I'll be happy to test any improvements you make around large disk deployments, but although Unraid may technically support 1000ish drives, it appears to be functionally limited to about 70 on my setup.

     

    Thanks again for the fast and excellent support here. I will be happy to test updates/workarounds/etc., but it will have to be somewhat limited, as I'll need to work around my system's availability. I REALLY like Unraid for everything else, and even if it doesn't work for this use case, I'll still be using my license for a home storage server at the very least.

    Link to comment

