• Unraid only shows 125 of 240 disks in GUI


    enderTown
    • Minor

    Hello, I'm coming over from this post:

     

    https://forums.unraid.net/topic/124249-unraid-only-shows-125-of-240-disks-in-gui/#comment-1133189

     

    Hello, I'm trying Unraid on a server that has 240 disks attached. The UI sees the first 125 or so as unassigned devices, but the rest are not shown. However, if I go to the terminal, I can see all 240 disks and mount them and work with them as normal.
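
    For reference, this is roughly what I mean by "working with them from the terminal" (the device name below is just an example of one of the disks the GUI doesn't show):

    lsblk -dn -o NAME,SIZE,MODEL | wc -l   # count detected block devices; I see all ~240 disks here (plus the boot flash)
    mkdir -p /mnt/test
    mount /dev/sdab1 /mnt/test             # mounting one of the "missing" disks works fine (sdab1 is just an example)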

     

    I thought maybe this was a limitation of the trial, because I know the Pro version is supposed to support "unlimited" devices, and certainly more than 125 (using several pools). However, after upgrading to Pro and rebooting, I have the same problem.

     

    I wanted to put all of these into pools, but the problem is I can't even select almost half of them from the UI, and I don't see a way to manage this from the CLI.

     

    Oh, another oddity: the Unassigned Devices plugin *DOES* show all of them, but the ones that were missing before show up as "unmountable". Also, it takes maybe 90 seconds to 2 minutes for the Unassigned Devices plugin to load the list, so it is unusable for this many drives.

     

    Some more info on my server setup:

    Dell PowerEdge R720, dual 14-core Xeons, 224GB RAM

    2x LSI 9200e SAS cards

    20x Dell MD1200 12-disk JBODs

    Various drive sizes from 4TB up to 18TB, total storage about 1.6PB

    Each SAS HBA is connected to 10 of the JBODs via a single port, daisy-chained

    Not using multi-path

    All verified to work both in Windows and in Ubuntu before trying Unraid

     

    Thank you!

     

     




    User Feedback

    Recommended Comments

    Please post your diagnostics so we can have a deeper look.

     

    There is no explicit limit in the GUI and as long as all devices are properly detected, they should be available, but this requires further investigation (you are the first with so many devices).

     

    Link to comment

    Sure, see diagnostics attached.

     

    I forgot to mention in my original post that the R720 also has 16 drives attached to its internal SAS controller, so there are actually 3 SAS HBAs and 256 drives in the mix, all flashed to IT mode.

     

    BUT I've also already narrowed down the problem. I unplugged all drives except a few in the drive arrays and noticed something odd right away: it depends on how "far away" the disk is from the HBA in the daisy chain. In other words, if I put an 18TB disk in the 10th JBOD (the last one in the daisy chain), it shows up as unmountable (even though I can mount it manually from the terminal just fine). But if I just move that same disk to the first JBOD in the chain, it shows up as normal and I can mount it, etc. So maybe it is a problem with SAS addresses/ports?
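
    In case it's useful, this is roughly how I'm checking which expander/enclosure each disk sits behind (the addresses here are just the ones from my log below, nothing special):

    lsscsi -t                             # lists every disk with the SAS address/transport it is attached through
    ls -l /dev/disk/by-path | grep sas    # shows the physical HBA + phy path each /dev/sdX node resolves to
    # e.g. the test drive shows up as sas_addr 0x500c04f2178bac1b behind enclosure 0x500c04f2178bac00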

     

    Here is the log I captured while unplugging/plugging in that test drive. The first two entries are me trying it in the tenth JBOD of each HBA's chain (last in the chain), and it showed up as unmountable in the Unassigned Devices plugin both times. The last one is me plugging it into the first JBOD, where it shows up as normal in the UI, I can mount it, etc.

     

     

    May 31 08:30:08 farmer01 kernel: mpt2sas_cm2: handle(0x1e) sas_address(0x500c04f2178bac1b) port_type(0x1)
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: Direct-Access     ATA      ST18000NE000-2YY EN01 PQ: 0 ANSI: 6
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: SATA: handle(0x001e), sas_addr(0x500c04f2178bac1b), phy(27), device_name(0x0000000000000000)
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: enclosure logical id (0x500c04f2178bac00), slot(1) 
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
    May 31 08:30:09 farmer01 kernel: scsi 9:0:11:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: Attached scsi generic sg39 type 0
    May 31 08:30:09 farmer01 kernel: end_device-9:9:2: add: handle(0x001e), sas_addr(0x500c04f2178bac1b)
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: Power-on or device reset occurred
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] 4096-byte physical blocks
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Write Protect is off
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Mode Sense: 7f 00 10 08
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
    May 31 08:30:09 farmer01 kernel: sds: sds1
    May 31 08:30:09 farmer01 kernel: sd 9:0:11:0: [sds] Attached SCSI disk
    May 31 08:30:10 farmer01 kernel: BTRFS: device fsid 24635fb7-bb07-4292-ae69-b4a02c5faad2 devid 2 transid 57 /dev/sds1 scanned by udevd (22820)
    May 31 08:30:10 farmer01 unassigned.devices: Disk with ID 'ST18000NE000-2YY101_ZR52M4CN (sds)' is not set to auto mount.
    May 31 08:30:45 farmer01 kernel: sd 9:0:11:0: device_block, handle(0x001e)
    May 31 08:30:48 farmer01 kernel: sd 9:0:11:0: device_unblock and setting to running, handle(0x001e)
    May 31 08:30:48 farmer01 kernel: sd 9:0:11:0: [sds] Synchronizing SCSI cache
    May 31 08:30:48 farmer01 kernel: sd 9:0:11:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
    May 31 08:30:48 farmer01 kernel: mpt2sas_cm2: mpt3sas_transport_port_remove: removed: sas_addr(0x500c04f2178bac1b)
    May 31 08:30:48 farmer01 kernel: mpt2sas_cm2: removing handle(0x001e), sas_addr(0x500c04f2178bac1b)
    May 31 08:30:48 farmer01 kernel: mpt2sas_cm2: enclosure logical id(0x500c04f2178bac00), slot(1)
    May 31 08:31:38 farmer01 kernel: mpt2sas_cm1: handle(0x1d) sas_address(0x500c04f29a7c6121) port_type(0x1)
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: Direct-Access     ATA      ST18000NE000-2YY EN01 PQ: 0 ANSI: 6
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: SATA: handle(0x001d), sas_addr(0x500c04f29a7c6121), phy(33), device_name(0x0000000000000000)
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: enclosure logical id (0x500c04f29a7c6100), slot(0) 
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
    May 31 08:31:39 farmer01 kernel: scsi 8:0:10:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: Attached scsi generic sg39 type 0
    May 31 08:31:39 farmer01 kernel: end_device-8:9:1: add: handle(0x001d), sas_addr(0x500c04f29a7c6121)
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: Power-on or device reset occurred
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] 4096-byte physical blocks
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Write Protect is off
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Mode Sense: 7f 00 10 08
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
    May 31 08:31:39 farmer01 kernel: sds: sds1
    May 31 08:31:39 farmer01 kernel: sd 8:0:10:0: [sds] Attached SCSI disk
    May 31 08:31:41 farmer01 unassigned.devices: Disk with ID 'ST18000NE000-2YY101_ZR52M4CN (sds)' is not set to auto mount.
    May 31 08:32:36 farmer01 kernel: sd 8:0:10:0: device_block, handle(0x001d)
    May 31 08:32:38 farmer01 kernel: sd 8:0:10:0: device_unblock and setting to running, handle(0x001d)
    May 31 08:32:38 farmer01 kernel: sd 8:0:10:0: [sds] Synchronizing SCSI cache
    May 31 08:32:38 farmer01 kernel: sd 8:0:10:0: [sds] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
    May 31 08:32:38 farmer01 kernel: mpt2sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x500c04f29a7c6121)
    May 31 08:32:38 farmer01 kernel: mpt2sas_cm1: removing handle(0x001d), sas_addr(0x500c04f29a7c6121)
    May 31 08:32:38 farmer01 kernel: mpt2sas_cm1: enclosure logical id(0x500c04f29a7c6100), slot(0)
    May 31 08:33:22 farmer01 kernel: mpt2sas_cm1: handle(0x1d) sas_address(0x500c04f2dec15121) port_type(0x1)
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: Direct-Access     ATA      ST18000NE000-2YY EN01 PQ: 0 ANSI: 6
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: SATA: handle(0x001d), sas_addr(0x500c04f2dec15121), phy(33), device_name(0x0000000000000000)
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: enclosure logical id (0x500c04f2dec15100), slot(0) 
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
    May 31 08:33:23 farmer01 kernel: scsi 8:0:11:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: Attached scsi generic sg39 type 0
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: Power-on or device reset occurred
    May 31 08:33:23 farmer01 kernel: end_device-8:0:2: add: handle(0x001d), sas_addr(0x500c04f2dec15121)
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] 35156656128 512-byte logical blocks: (18.0 TB/16.4 TiB)
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] 4096-byte physical blocks
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Write Protect is off
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Mode Sense: 7f 00 10 08
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Write cache: enabled, read cache: enabled, supports DPO and FUA
    May 31 08:33:23 farmer01 kernel: sds: sds1
    May 31 08:33:23 farmer01 kernel: sd 8:0:11:0: [sds] Attached SCSI disk

    diagnostics-20220531-1037.zip

    Link to comment

    It looks like you have daisy-chained a number of expanders. I am not an expert in expanders (I don't use them myself), but it seems to go wrong there; see the file "lsscsi.txt" in the diagnostics. Perhaps @JorgeB can give his expert view?

     

    Link to comment
    27 minutes ago, enderTown said:

    But if I just move that same disk to the first JBOD in the chain, it shows up as normal and I can mount it, etc. So maybe it is a problem with SAS addresses/ports?

     

    Possibly. I have some experience with SAS daisy chaining but have never done it with so many expanders, only 2 or 3, so I don't know if there's a limit. Did you test to find out at which point in the chain it stops working?

     

    You could also connect two enclosures per HBA; then you would only need to daisy chain 5 units per connection, which would also be better for available bandwidth.

    Link to comment
    19 minutes ago, JorgeB said:

    Possibly. I have some experience with SAS daisy chaining but have never done it with so many expanders, only 2 or 3, so I don't know if there's a limit. Did you test to find out at which point in the chain it stops working?

     

    You could also connect two enclosures per HBA; then you would only need to daisy chain 5 units per connection, which would also be better for available bandwidth.

     

    I don't know exactly where it stops, but I will give that a try next. Also, unfortunately I don't have two more cables of the proper length, but that is a good idea regardless, so I'll order some.

     

    But remember, these drives show up just fine in Windows Server/Ubuntu and even in the terminal in Unraid, which is why I think something in the Unraid driver is getting confused.

    Link to comment
    1 minute ago, -Daedalus said:

    It's been a while since I've looked at the MD boxes, but per this guide (https://downloads.dell.com/manuals/common/MD_1200_MD1220_Reference Guide_EN.pdf) the last page indicates a maximum chain of 4 MDs is supported. 

     

    Now, I'm sure more than that will work (4 is only what Dell will officially support), but it would explain what's happening here.

     

    Oh yes, they definitely will work with more than 4. I have been running all 20 daisy-chained to a single SAS port, first under Windows for several months, then Ubuntu for the last month or so, and now I'm giving Unraid a shot. I only added the second 9200 SAS adapter as trial and error for this problem, but now that it's in there I will probably leave it and get a few more cables so I can run 5 MD1200s per port for more bandwidth.

     

    Anyway, this config is definitely not supported by Dell, but it works in IT mode with normal SAS standards on Windows/Linux, and the HBAs themselves can support over 1000 drives each IIRC.

    Link to comment

    Here is how they show up, btw. The first one (Dev 1) is connected to the first JBOD in the chain, and sdr is in the last one:

     

    [screenshot of the device list]

     

    The only thing I can do in the UI for sdr is pre-clear it (using the preclear plugin), so I'm giving that a shot, but I doubt it will make any difference. If I move the sdr disk to the first JBOD in the chain, it shows up as Dev 2 and I can mount it/select it for an array slot as normal.

    Link to comment

    Something else that just happened that may be related (and is really bad):

    I added another couple of 18TB drives to the first JBOD and put them into a pool in BTRFS single mode. Started up the array and it seemed to work fine. Then I added another 18TB NTFS drive; Unassigned Devices wanted to "Format" it, but I mounted it from the terminal and was able to see the files just fine. My goal was to copy these files via the terminal to the new BTRFS pool and then format that drive and add it to the same pool.
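
    (For reference, the manual copy I had in mind was just something like this, with the device name and pool mount point as examples only:)

    mkdir -p /mnt/ntfs_tmp
    mount -o ro /dev/sdy1 /mnt/ntfs_tmp              # read-only mount of the old NTFS disk (example device)
    rsync -a --progress /mnt/ntfs_tmp/ /mnt/bigpool/  # copy the files into the new BTRFS pool's mount point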

     

    Once I restarted the array, both my unrelated cache pool with 8 SSDs AND my new pool with these 18TB drives now show as "Invalid pool config" (see screenshot). I tried rebooting and had the same problem. It seems similar to this thread: https://forums.unraid.net/bug-reports/prereleases/690-rc4-pool-unmountable-if-array-is-re-started-after-device-repacement-r1807/

     

    I know this is probably low priority because I'm an edge case, but if you don't think these things can be addressed reasonably quickly then please let me know, as unfortunately I'm not feeling as great about Unraid as I did at first...

     

    [screenshot of the "Invalid pool config" error]

    Link to comment

    Do you have all 240 devices hooked up?  If so, please capture output of this command:

     

    ls -l /dev/disk/by-id

     

    There is no limitation inside Unraid OS for max number of devices, except for:

    • max 30 devices in unRAID array (future feature will permit more than 1 unRAID array)
    • max 30 devices per pool
    • max 35 pools

    That theoretically permits up to 1080 devices (30 in the array + 35 × 30 in pools) managed within pools and the array. Additional devices would be managed as Unassigned Devices.

    Link to comment
    10 hours ago, enderTown said:

    Once I restarted the array, both my unrelated cache pool with 8 SSDs AND my new pool with these 18TB drives now show as "Invalid pool config" (see screenshot). I tried rebooting and had the same problem.

     

    The problem is that one member of the second pool, sdx (the last device), doesn't belong to that pool, and it's generating an error during the device scan:

     

    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 2 transid 62 /dev/sdu1 scanned by udevd (1562)
    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 4 transid 62 /dev/sdv1 scanned by udevd (1601)
    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 3 transid 62 /dev/sdw1 scanned by udevd (1507)
    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 31db1a82-5506-42f0-bffb-44f72cf077c3 devid 1 transid 62 /dev/sds1 scanned by udevd (1519)

     

    The first 4 members are devices 1 to 4 of this pool; the 5th member is device 3 from a different one (note the different fsid):

     

    May 31 12:54:31 farmer01 kernel: BTRFS: device fsid 06b8a25f-d7e5-4ab2-9e5b-cc1d3e77bfed devid 3 transid 57 /dev/sdx1 scanned by udevd (1507)

     

    This device is causing an error during the device scan, and that is what's preventing both pools from mounting:

     

    May 31 13:20:47 farmer01 emhttpd: /mnt/cache ERROR: cannot scan /dev/sdx1: Input/output error

     

    Assuming there's no data there, you just need to wipe that device:

    wipefs -a /dev/sdx1

    then unassign it from the second pool, and both pools should mount; you can then add it back.
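
    If it's not obvious which device is the stray member, something like this should confirm it before wiping (using sdx1 from the log above):

    btrfs filesystem show    # lists every btrfs fsid together with its member devices
    blkid /dev/sdx1          # shows which filesystem UUID is on the suspect partition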

     

     

     

    Link to comment
    9 hours ago, JorgeB said:

    This device is causing an error during the device scan, and that is what's preventing both pools from mounting:

     

    Nice sleuthing! I do remember one of the drives was part of another pool in a previous trial-and-error attempt at pooling. I remember formatting it when I added it to this pool, though; I would expect that to "fix it"? Were there additional steps I was supposed to take, and how would I have known to do them? This seems like something the software should do itself or warn me about. Unfortunately I wasn't able to try your fix, because I had already deleted the whole pool and given up on that idea.

     

    19 hours ago, limetech said:

    Do you have all 240 devices hooked up?  If so, please capture output of this command:

     

    This was my next attempt: just get as many drives hooked up as possible, adding them one by one. I disconnected ALL the drives and started the tedious task of re-slotting them one by one, verifying that they showed up, renaming them for easy identification, and mounting them (mostly NTFS partitions that I planned to convert to BTRFS later). The first 20 or so went pretty smoothly. Around 30, things were noticeably slower in the UI, specifically loading the Unassigned Devices plugin's device list. I watched the log and saw all the udev commands flying by, some eventually timing out, etc. Shouldn't this list be cached instead of loaded every time? By the time I got to 50 disks, the UI was unusable: I'd wait 15-20 seconds for the list of disks to load, clicking "Mount" took forever and flickered back and forth, etc. I was powering through it anyway, until around disk 70 I got a weird error while renaming a drive - some PHP error about " = not valid" or something. After refreshing (30 seconds), ALL my custom drive names were gone and had been replaced by Dev 1 - Dev 70. I rebooted, but my custom names didn't come back.

     

    At this point, as you can hopefully understand, I have thrown in the towel for now and am going back to my normal Ubuntu setup. I can still boot into Unraid pretty easily so I'll be happy to test any improvements you make around large disk deployments, but although Unraid may technically support 1000ish drives, it appears to be functionally limited to about 70 on my setup.

     

    Thanks again for the fast and excellent support here. I will be happy to test updates/workarounds/etc., but it will have to be somewhat limited, as I'll need to work around my system's availability. I REALLY like Unraid for everything else, and even if it doesn't work for this use case, I'll still be using my license for a home storage server at the very least.

    Link to comment

