Unassigned Devices not Showing up - all slots are working for already assigned devices.


abhi.ko


I have a very weird issue with my server now: I cannot see either of my two unassigned HDDs, while an unassigned SSD shows up fine. I recently updated my hardware, and pretty much everything except the disks was changed, so naturally I thought it was a hardware-related issue. Since then, however, I have been able to pretty much rule out my SATA port multiplier cards, my NORCO 4224 SAS connectors, and other hardware.

 

I think the problem started after I updated the Unassigned Devices plugin at some point yesterday; I suspect something changed with that update.

[Screenshot: Tower System Devices page, 2020-11-26]

 

What I have tried so far in an effort to narrow down on the problem:

  1. SATA/SAS connectors and hardware: I switched drive slots, cards, and cables, swapping those two drives with devices that Unraid does find. The swapped-in devices show up without an issue (with a different device letter, sdX), but the unassigned devices, now connected through previously working hardware (SATA ports and cables), are still not found. In effect, the assigned devices work on the same drive slots and other hardware without an issue, while the unassigned ones don't work on any of them. So it doesn't look like a hardware issue to me, though I might be wrong.
  2. I even tried switching the SATA port multiplier card to a different one, and that also did not work.
  3. HDDs: One of the unassigned devices was present and detected before yesterday, and it had been pre-cleared, so I am pretty sure that drive works; it's not a drive issue either. The other one is a brand-new disk I just added; I will be checking it on a different computer soon to see whether the issue is with the drive.
  4. Tried switching to 6.9-beta35, but no luck there either.

[Screenshot from 10-22 with the drive showing as precleared]

Has anyone encountered this before? Diagnostics are attached. All help appreciated.

 

tower-diagnostics-20201126-1012.zip


This looks like the only disk not assigned to the array:

devs
(
    [d..0] => Array
        (
            [id] => Samsung_SSD_860_PRO_2TB_S5G7NS0N600130K
            [tag] => 0
            [device] => sdl
            [sectors] => 4000797360
            [sector_size] => 512
        )

)
This is from the vars array that shows what Unraid says is unassigned.

 

You have 15 disks assigned to the array, plus cache and parity, and one disk unassigned. That's 18 disks total, and you have 18 disks showing in the smart.txt of the diagnostics.

 

If you have more disks than that, could there be an issue with your port multiplier?
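The count above can be reproduced on the server itself. A minimal Python sketch (hypothetical helper names, not an Unraid tool) lists the sd* disks the kernel exposes under /sys/block and compares them with the number of disks physically installed. The defaults (15 array, 1 cache, 1 parity, 1 unassigned) come from the count above; with the SSD and both HDDs installed, you would pass unassigned=3. If the kernel sees fewer disks than are installed, the missing drives never reached Linux.

```python
import os

def sd_disks(names):
    """Keep whole SCSI/SATA disk names (sda, sdb, ..., sdaa), sorted."""
    return sorted(n for n in names if n.startswith("sd"))

def linux_sd_disks(sys_block="/sys/block"):
    """sd* disks the kernel currently exposes; partitions are not listed at this level."""
    try:
        return sd_disks(os.listdir(sys_block))
    except OSError:
        # sysfs not available (e.g. not running on Linux)
        return []

def expected_disk_count(array=15, cache=1, parity=1, unassigned=1):
    """Disks that should be present: array + cache + parity + unassigned."""
    return array + cache + parity + unassigned

if __name__ == "__main__":
    seen = linux_sd_disks()
    print(f"kernel sees {len(seen)} sd* disks: {' '.join(seen)}")
    print(f"expected {expected_disk_count(unassigned=3)} with all three unassigned drives installed")
```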

16 minutes ago, dlandon said:

If you have more disks, there is an issue with your port multiplier?

Thanks, but that is what I was trying to explain.

Let's say the other HDD, the one that is not found, was on sds, and this unassigned drive (the SSD) is on sdl and is found.

If I swap the drive slots of the HDD and the SSD (or any other drive in the array), the SSD is found again and shows up as sds (for example), while the HDD, now in the slot that was sdl, is still invisible. So I am not sure how one drive can fail in a drive slot where another one works. How could that be a hardware problem?

13 minutes ago, abhi.ko said:

I tried switching the drive slots of the HDD and SSD (or any other drive on the array) and it gets found again and shows up as sds (for e.g.), and the other one which is now on sdl is still invisible. So I am not sure how one drive is not working on the same drive slot but another one is. How could that be a hardware problem?

UD can only find those disks discovered by Linux and not assigned to the array.  Nothing in UD has changed for many months regarding finding unassigned disks.
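To make that discovery rule concrete, here is a deliberately simplified Python model (the function name and the array-disk IDs are hypothetical; this is not the Unassigned Devices source). The unassigned list is the set of disks Linux discovered minus those assigned to the array, so a drive the kernel never detected cannot appear in it, whichever slot it sits in.

```python
def unassigned_candidates(discovered_ids, assigned_ids):
    """Simplified model: disks Linux discovered minus disks assigned to the array."""
    assigned = set(assigned_ids)
    return sorted(d for d in discovered_ids if d not in assigned)

if __name__ == "__main__":
    # Hypothetical array IDs; the SSD ID is the one from the vars array above.
    discovered = ["ARRAY_DISK_1", "ARRAY_DISK_2", "Samsung_SSD_860_PRO_2TB_S5G7NS0N600130K"]
    assigned = ["ARRAY_DISK_1", "ARRAY_DISK_2"]
    # An HDD the kernel never detected is simply absent from `discovered`,
    # so it cannot appear in the result.
    print(unassigned_candidates(discovered, assigned))
```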

 

I think it would be best for @JorgeB to jump in here.  He is the expert on disks and controllers.


Marvell controllers AND port multipliers are two things you should avoid, especially together. One of the port multipliers appears to be having extra trouble initializing the disks. You can try disconnecting the port multiplier from the Marvell 9235 and connecting it to the other 9215 instead, since those usually have fewer issues, but I would really recommend getting rid of those and using LSI HBAs instead.

10 minutes ago, JorgeB said:

Marvell controllers AND port multipliers, two things you should avoid, especially together, one of the port multipliers appear to be having extra trouble initializing the disks, you can try disconnecting the port multiplier from the Marvell 9235 and connect it to the other 9215 instead, those usually have less issues, but I would really recommend getting rid of those and using LSI HBAs instead.

Thanks again @JorgeB and @dlandon

I have 24 drive slots in the case and 8 SATA ports on the motherboard. The case has six internal SFF-8087 Mini-SAS connectors that support up to twenty-four 3.5″ or 2.5″ SATA (II or III) or SAS hard drives. Currently I am using SAS-to-4xSATA reverse breakout cables to connect these drives.

 

That said, any guidance on a better and more stable way to achieve this with an LSI HBA controller is welcome and appreciated; I'm not sure which one I should get.

 

3 minutes ago, JorgeB said:

One LSI with 16 ports is enough (plus the other 8 onboard ports), e.g., 9201-16i or 9300-16i, other option and probably cheaper would be two 9211-8i/9300-8i or similar.

Thanks a ton. Dang these are not cheap!  Hopefully there is something on sale this week or next.

 

I would need 4 SAS-to-SAS cables as well, I am assuming.

 

Also, is there a preference for connecting parity and cache to the motherboard ports versus the card?

28 minutes ago, abhi.ko said:

Thanks a ton. Dang these are not cheap!  Hopefully there is something on sale this week or next.

Look for used server pulls on eBay; those are much cheaper. Avoid cheap new ones from China, which could be fakes.

 

29 minutes ago, abhi.ko said:

I would need 4 SAS to SAS connectors as well I am assuming. 

Yep.
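The cable count follows from the port arithmetic: six SFF-8087 backplane connectors at four drives each give the 24 bays, a 16-port HBA covers four of those connectors with four SAS-to-SAS cables, and the remaining two connectors use the existing reverse breakout cables to the 8 onboard SATA ports. A small Python sketch of that arithmetic (hypothetical helper; it assumes every connector on both ends carries four lanes, which holds for SFF-8087 and for the SFF-8643 Mini-SAS HD ports on the 9300 series):

```python
DRIVES_PER_CONNECTOR = 4  # SFF-8087 and SFF-8643 each carry 4 lanes (one drive per lane)

def cabling_plan(bays=24, hba_ports=16, onboard_sata=8):
    """Return (HBA-to-backplane cables, reverse-breakout cables) needed to cover every bay."""
    backplane_connectors = bays // DRIVES_PER_CONNECTOR            # 24 bays -> 6 connectors
    hba_cables = min(hba_ports // DRIVES_PER_CONNECTOR, backplane_connectors)
    breakout_cables = backplane_connectors - hba_cables           # connectors left for onboard SATA
    if breakout_cables * DRIVES_PER_CONNECTOR > onboard_sata:
        raise ValueError("not enough onboard SATA ports for the remaining bays")
    return hba_cables, breakout_cables

if __name__ == "__main__":
    print(cabling_plan())  # (4, 2): 4 SAS-to-SAS cables from the HBA, 2 reverse breakouts
```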

 

29 minutes ago, abhi.ko said:

Also is there a preference where parity and cache are connected to between the MB ports and the card?

For cache, it's best to use the onboard ports for TRIM, assuming SSDs; for parity it won't matter.
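Whether the kernel will send TRIM to an SSD on a given port can be checked from Linux. A minimal Python sketch (my assumption, not something from this thread; the helper names are hypothetical) reads /sys/block/&lt;dev&gt;/queue/discard_max_bytes, which is 0 when the block layer will not send discards to that device. A non-zero value only means the kernel can issue discards; it does not prove the SSD handles them correctly behind a particular controller. sdl is the SSD's device letter from the diagnostics earlier and may differ on your system.

```python
from pathlib import Path

def discard_enabled(discard_max_bytes_text):
    """The kernel reports discard_max_bytes as 0 when it will not send discards (TRIM)."""
    return int(discard_max_bytes_text.strip()) > 0

def supports_discard(dev, sys_block="/sys/block"):
    """Read /sys/block/<dev>/queue/discard_max_bytes; False if missing or unreadable."""
    path = Path(sys_block) / dev / "queue" / "discard_max_bytes"
    try:
        return discard_enabled(path.read_text())
    except (OSError, ValueError):
        return False

if __name__ == "__main__":
    print("sdl TRIM/discard:", supports_discard("sdl"))
```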

On 11/26/2020 at 12:34 PM, JorgeB said:

One LSI with 16 ports is enough (plus the other 8 onboard ports), e.g., 9201-16i or 9300-16i, other option and probably cheaper would be two 9211-8i/9300-8i or similar.

Hi @JorgeB - I ended up getting the 9300-16i. Had some questions before connecting and turning the system on.

  • Does this card need a power connection from the PSU? 
  • Do I need to flash the FW or BIOS to put it in IT mode, or would it work out of the box?
  • Anything else I need to keep in mind before installing and booting up Unraid?

I plugged it into a free PCIe slot on the motherboard and connected all the SAS cables to the disks, but have not switched anything on yet. I'm a little worried I might make a stupid mistake and fry something, because I know nothing about HBA cards.

 

Thanks in advance.

 

PS: Is this discontinued? I ask because I cannot find the 9300-16i on the Broadcom site; I do see a 9300-8i and a 9305-16i. I was trying to find any drivers or firmware upgrades needed, just in case.

6 hours ago, abhi.ko said:

Does this card need a power connection from the PSU? 

You should use it for stability; I remember at least one user had issues without that cable connected.

 

6 hours ago, abhi.ko said:

Do I need to flash the FW or BIOS to put it in IT mode, or would it work out of the box?

 

6 hours ago, abhi.ko said:

Anything else I need to keep in mind before installing and booting up Unraid?

That one is IT mode only, so should be just plug and play.

2 hours ago, JorgeB said:

You should use for stability, I remember at least one user had issues without that cable connected.

That one is IT mode only, so should be just plug and play.

Thank you. So I plugged everything in and started the system.

  • Unfortunately, I still do not see the unassigned disks, while all the assigned disks still get recognized. So the HBA card did not solve that issue; those 2 disks are connected to the HBA card.
  • A bigger problem is that I now have a ton of missing disks. I guess the way the card identifies the disks differs from how they were identified earlier through the SATA multiplier cards; please see the screenshots below. Unraid says they are wrong disks, but they are the same disks.

[Screenshot: missing disks]

[Screenshot: unassigned devices]

How do I fix this without losing any data, please?

 

Thanks.

14 hours ago, abhi.ko said:

PS: Is this discontinued? Asking because I cannot find the 9300-16i on the Broadcom site, I do see a 9300-8i and a 9305-16i, was trying to find any drivers or FW upgrades needed, just in case.

I will read through and try updating the firmware. The biggest problem I see is that there is no 9300-16i product on the Broadcom site, so should I use the 9305-16i as the product? Do you know? Update: found it; not sure why I couldn't yesterday while filtering for the HBA.

 

Also, I tried to boot into the LSI BIOS yesterday, but it did not come up when I started the system without the Unraid USB plugged in. I tried changing the boot order and looking through the BIOS management page, but had no luck getting into it.

Edit: To get into the LSI BIOS, should I be hitting Ctrl+C as soon as the system POSTs and starts booting? The HBA is installed and found by the system.


Appreciate the guidance, @JorgeB. Following your instructions, I ran -listall and was surprised to see 2 devices come up, even though I only have one HBA card. I think it is because of the 2 controller chips on the card (see the quote from the product documentation below), so I will have to flash the firmware twice, with -c 0 and -c 1, right?

Quote

The LSI SAS 9300-16i HBA uses two LSI SAS 3008 controller chips and one PCI Express switch to provide high-bandwidth interconnection ability.

root@Tower:/lsi# sas3flash -listall
Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS3008(C0)  05.00.00.00    05.00.00.01    08.11.00.00     00:19:00:00
1  SAS3008(C0)  05.00.00.00    05.00.00.01    08.11.00.00     00:1b:00:00

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.
root@Tower:/lsi# sas3flash -c 0 -list
Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 0
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:19:00:00
        SAS Address                    : 500062b-2-0168-0580
        NVDATA Version (Default)       : 05.00.00.01
        NVDATA Version (Persistent)    : 05.00.00.01
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 05.00.00.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9300-16i
        BIOS Version                   : 08.11.00.00
        UEFI BSD Version               : 06.00.00.00
        FCODE Version                  : N/A
        Board Name                     : SAS9300-16i
        Board Assembly                 : 03-25600-01B
        Board Tracer Number            : SP63315150

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.
root@Tower:/lsi# sas3flash -c 1 -list
Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

        Controller Number              : 1
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:1b:00:00
        SAS Address                    : 500062b-2-0168-8c80
        NVDATA Version (Default)       : 05.00.00.01
        NVDATA Version (Persistent)    : 05.00.00.01
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 05.00.00.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9300-16i
        BIOS Version                   : 08.11.00.00
        UEFI BSD Version               : 06.00.00.00
        FCODE Version                  : N/A
        Board Name                     : SAS9300-16i
        Board Assembly                 : 03-25600-01B
        Board Tracer Number            : SP63315150

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.

Also, the current firmware version is v5. That doesn't sound right, because the current version seems to be v16 (readme text below). Is this just a really old product that I got from eBay? It was in a sealed box.

 

I will wait for your response before flashing this, just being gun-shy. Hopefully once this is done the other 2 unassigned disks will show up; otherwise I will have to troubleshoot that afterwards.

***********************************************************************************************************************
Package for SAS3 Phase 16 Firmware BIOS Upgrade on MSDOS & Windows
************************************************************************************************************************
LSI Host Bus Adapter(HBA) - LSI SAS9300_16i

Package Contents- 

Readme first note      :  README_9300_16i_Package_P16_IT_FW_BIOS_for_MSDOS_Windows.txt 

Component                   : Path                                                         Version              Release Date        
=============================================================================================================================
Firmware                    : \firmware\SAS9300_16i_IT\SAS9300_16i_IT.bin                  16.00.10.00          01-Aug-19          
BIOS                        : \sasbios_rel\mptsas3.rom                                     8.37.00.00           05-Apr-18   

 

44 minutes ago, abhi.ko said:

so I will have flash the FW twice with -c 0 and -c 1 , right?

Not sure if they share the same BIOS; flash the first one and check.
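To follow that one-controller-at-a-time approach, here is a small Python sketch that only builds and prints the sas3flash command lines for flashing controller 0 and then listing both controllers; it runs nothing. The -o/-f/-b/-c options are the ones commonly used for flashing SAS3 HBAs, and the file names come from the P16 readme quoted earlier (assumed copied into the working directory next to sas3flash). The helper names are hypothetical; verify the options and files against the readme in your own package before running anything, since a bad flash can leave a controller unusable.

```python
import shlex

# File names from the P16 package readme; assumed copied into the current directory.
FIRMWARE = "SAS9300_16i_IT.bin"
BIOS = "mptsas3.rom"

def flash_command(controller, firmware=FIRMWARE, bios=BIOS):
    """Build (do not run) a sas3flash call that flashes firmware and BIOS on one controller."""
    return ["sas3flash", "-o", "-f", firmware, "-b", bios, "-c", str(controller)]

def list_command(controller):
    """Build the per-controller check already used earlier in the thread."""
    return ["sas3flash", "-c", str(controller), "-list"]

if __name__ == "__main__":
    # Flash controller 0 first, then check what both controllers report.
    for cmd in (flash_command(0), list_command(0), list_command(1)):
        print(shlex.join(cmd))
```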

 

45 minutes ago, abhi.ko said:

Also the current FW version is v5, that doesn't sound right because the current version seems to be v16

Yes, v5 is very old, hence the device ID issues.
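Comparing the versions on the card with the package is easiest as numeric tuples rather than strings, since the readme writes the BIOS as 8.37.00.00 while the card reports 08.11.00.00. A minimal sketch with hypothetical helper names; it assumes the FW and x86-BIOS fields are plain decimal, which is not true for the NVDATA column (e.g. 0e.01.00.03 contains hex digits), so don't feed that one in.

```python
def parse_version(text):
    """'05.00.00.00' -> (5, 0, 0, 0). Decimal fields, as sas3flash prints FW and BIOS versions."""
    return tuple(int(part) for part in text.strip().split("."))

def is_older(installed, available):
    """True if the installed version sorts before the available one."""
    return parse_version(installed) < parse_version(available)

if __name__ == "__main__":
    print(is_older("05.00.00.00", "16.00.10.00"))  # True: firmware on the card vs. P16 package
    print(is_older("08.11.00.00", "8.37.00.00"))   # True: x86 BIOS on the card vs. package
```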


Thanks again, @JorgeB. The HBA card is flashed. I had to do it twice, once for each controller chip (-c 0 and -c 1), for both firmware and BIOS, and now everything looks good.

root@Tower:/lsi# sas3flash -listall
Avago Technologies SAS3 Flash Utility
Version 17.00.00.00 (2018.04.02) 
Copyright 2008-2018 Avago Technologies. All rights reserved.

        Adapter Selected is a Avago SAS: SAS3008(C0)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS3008(C0)  16.00.10.00    0e.01.00.03    08.37.00.00     00:19:00:00
1  SAS3008(C0)  16.00.10.00    0e.01.00.03    08.37.00.00     00:1b:00:00

        Finished Processing Commands Successfully.
        Exiting SAS3Flash.
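To confirm both controllers really took the update, the -listall table can be checked programmatically. A minimal Python sketch that parses the table layout shown above; it assumes controller rows are whitespace-separated with exactly six columns, as in this sas3flash 17.00 output, and the helper names are hypothetical.

```python
def parse_listall(output):
    """Extract controller rows: <num> <ctlr> <fw> <nvdata> <bios> <pci addr>."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[0].isdigit():
            num, ctlr, fw, nvdata, bios, pci = parts
            rows.append({"num": int(num), "ctlr": ctlr, "fw": fw,
                         "nvdata": nvdata, "bios": bios, "pci": pci})
    return rows

def all_at(rows, fw, bios):
    """True only if at least one controller was found and every one matches fw and bios."""
    return bool(rows) and all(r["fw"] == fw and r["bios"] == bios for r in rows)

if __name__ == "__main__":
    import shutil
    import subprocess
    if shutil.which("sas3flash"):  # only on a machine that has the utility
        out = subprocess.run(["sas3flash", "-listall"], capture_output=True, text=True).stdout
        rows = parse_listall(out)
        print(rows, all_at(rows, "16.00.10.00", "08.37.00.00"))
```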

Update: The array is now back to normal; I did not even have to reboot. All the disks got assigned properly immediately after the HBA was flashed. The only problem is that the 2TB SSD that was previously showing up as unassigned no longer shows up, and neither do the other 2 hard drives.

[Screenshot: Tower Main page, 2020-12-01]

Should I just reboot the system and hit Ctrl+C as soon as it posts to bring up the LSI BIOS?

7 hours ago, abhi.ko said:

Should I just reboot the system and hit Ctrl+C as soon as it posts to bring up the LSI BIOS?

That did not work, and I am not able to get into the LSI BIOS. Soon after POST, the motherboard logo splash screen appears with "Press F2/Del to Enter BIOS". I didn't see a way to disable the boot logo splash screen in the BIOS at first and had to dig through it. I found the setting and disabled it, then tried hitting Ctrl+C while booting, but still had no luck getting to the LSI BIOS. Any ideas?


Check the motherboard BIOS for "option ROM" settings or similar and make sure they are enabled. CSM/legacy boot also needs to be enabled for the LSI BIOS to work. If you can't get it to work, connect the missing devices to the onboard SATA ports instead and confirm they are being detected by the BIOS.

On 12/2/2020 at 1:55 AM, JorgeB said:

Check the motherboard BIOS for "option ROM" settings or similar and make sure that they are enable, also CSM/legacy boot also needs to be enable for the LSI BIOS to work, if you can't get it to work connect the missing devices to the onboard SATA ports instead and confirm they are being detected by the BIOS.

@JorgeB I missed this response until now and will try it. But I have a different problem I am trying to solve. Something is definitely up with my setup; I'm not sure whether the hard drive power connectors are to blame or something more serious is at play.

 

Priority #1: I now have an XFS disk in error. It happened when I was trying to insert another unassigned HDD into a hot-swap bay to test it while the array was running. I have done that many times before (it is the reason I bought a case with hot-swappable bays to begin with), so I'm not sure what went wrong this time. Anyhow, the array is running with this disk showing as unmountable. How do I recover and clear this?

 

Priority #2: I will come back to the other issue of drives not being detected soon. I took those drives out (a WD Gold 8TB and a Seagate IronWolf 8TB) and tested them on a Windows system through an external SATA hot-swap bay from SABRENT. The Seagate was recognized but the WD was not, so I am RMA-ing the WD (that is the one that was already precleared in Unraid before all this started). I then plugged the external SATA bay with the Seagate drive into Unraid over USB, and the drive was detected, which leads me to believe the issue is with the NORCO drive bays, SAS backplanes, power connectors, etc. But the unassigned SSD is on the same SAS backplane (HDD row) and works fine, so I'm not sure what the issue is.

 

The drive I was trying to plug in today was a different 5TB drive, which Windows also detected but Unraid now does not. Anyhow, that was just to test this hypothesis.

 

The priority is to recover the disk in error. All help appreciated, please.

tower-diagnostics-20201204-1216.zip


Besides disk15, there are also these:

Dec  4 10:12:54 Tower kernel: md: disk9 read error, sector=9000
Dec  4 10:12:54 Tower kernel: md: disk13 read error, sector=9000
Dec  4 10:12:54 Tower kernel: md: disk9 read error, sector=9008
Dec  4 10:12:54 Tower kernel: md: disk13 read error, sector=9008
Dec  4 10:12:54 Tower kernel: md: disk9 read error, sector=9016
Dec  4 10:12:54 Tower kernel: md: disk13 read error, sector=9016

 

So there's some hardware issue there. As for disk15, reboot and then check the filesystem. If the emulated disk mounts, you can rebuild on top of it, but ideally only after fixing whatever is causing those issues; it could be power, cables, etc.
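Read errors on several disks at the same sectors and in the same second point to something shared (power, cabling, controller) rather than one failing drive. A minimal Python sketch to pull those lines out of the log and find the sectors that failed on every affected disk; it assumes the exact "md: diskN read error, sector=N" wording shown above and that the live log is at /var/log/syslog, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches kernel lines like: "Dec  4 10:12:54 Tower kernel: md: disk9 read error, sector=9000"
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")

def read_errors_by_disk(lines):
    """Map each md disk (e.g. 'disk9') to the sorted sectors it reported read errors on."""
    errors = defaultdict(list)
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            errors[match.group(1)].append(int(match.group(2)))
    return {disk: sorted(sectors) for disk, sectors in errors.items()}

def shared_sectors(errors):
    """Sectors that failed on every listed disk, a hint of a shared cause."""
    sets = [set(sectors) for sectors in errors.values()]
    return sorted(set.intersection(*sets)) if sets else []

if __name__ == "__main__":
    import os
    syslog = "/var/log/syslog"  # assumed location of the live syslog
    if os.access(syslog, os.R_OK):
        with open(syslog, errors="replace") as log:
            errors = read_errors_by_disk(log)
        print(errors, shared_sectors(errors))
```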


I ran xfs_repair on disk15 and had to use the -L option because the drive was unmountable. The repair completed and the disk is now mounted, but it still shows as disabled, and there is a bunch of data in lost+found. I'm unclear on the next steps. There are no SMART errors on the disk.

 

What are the steps to get the drive functioning normally again and to move the files from lost+found back to the right folders/shares?
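One way to start sorting lost+found is to get an overview of what ended up there. A minimal Python sketch, assuming the disk is mounted at Unraid's usual /mnt/disk15 (function names hypothetical): it walks lost+found, including any directories xfs_repair reconnected there, and tallies file extensions. Files that xfs_repair could not reconnect are named by inode number, so they show up under &lt;none&gt; and usually have to be identified by content.

```python
import os
from collections import Counter

def count_extensions(filenames):
    """Tally lowercase file extensions; names without one are counted as '<none>'."""
    return Counter(os.path.splitext(name)[1].lower() or "<none>" for name in filenames)

def summarize_lost_found(root="/mnt/disk15/lost+found"):
    """Walk lost+found recursively and tally extensions; empty if the path does not exist."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        counts.update(count_extensions(filenames))
    return counts

if __name__ == "__main__":
    for ext, n in summarize_lost_found().most_common():
        print(f"{n:6d}  {ext}")
```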

tower-diagnostics-20201205-0902.zip

 

Edit: Can I skip worrying about the lost+found directory, as mentioned here, and just rebuild the disk from parity? Or is the parity for this disk somehow messed up as well because of the filesystem corruption? Disk15 was emulated when the array was last started; I have switched the server off for now.
