external jbod: so, everything was working fine until...



quickly searched the forums, didn't find anything resembling my situation, so here goes:

 

some time ago, i bought a direct-attached jbod array (quantum dxi6500, aka: supermicro cse-826 with a sas826el1 backplane) off ebay and connected it to my "backup" server running unraid v6.x via a dell h200e. i had run out of available drive bays in the backup server, hence the need to expand externally to keep up with the need for space.

 

everything has been running fine as new drives were added to shares as required. (*) anyway, after basically filling the 8th drive in the das chassis, i added a 9th drive... which cannot be seen by unraid (currently v6.7.0).

checked the drive (it's ok), looked around in unraid for an overlooked setting that could limit the number of data drives to 21 for some reason (didn't see any), and checked the documentation for jumpers on the backplane (nothing stood out). right now i want to eliminate any "simple, obvious & dumb reason" that prevents the 9th external drive from being recognized by unraid before i look into swapping the backplane in case it is defective. i prefer to make sure nothing is staring me in the face (current gut feeling), because not all problems are "sexy".

and frankly, i would like to avoid unracking the jbod if at all possible.
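for reference, one cheap check from the unraid console (or over ssh) is to confirm whether the kernel sees the new disk at all before blaming the backplane -- a minimal sketch, assuming lsblk is available (it is part of util-linux, so it should be present):

# list every physical disk the kernel currently sees, with model, size and transport
lsblk -d -o NAME,MODEL,SIZE,TRAN

# same idea via the persistent names; the count (minus partition entries)
# should match the number of physical drives across both chassis
ls -l /dev/disk/by-id/ | grep -v part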

so, has anyone else seen anything like this? does anyone have an idea what could be going on?


thanks in advance,


(*) btw, this storage array totally convinced me that esata is not a good idea for servers and to stick with multilane (sff-8088, etc.) cables for reliability and performance.


@ken-ji:

when the array is stopped, i see both parity drives and the existing 20 data drives.

the 21st slot is still unassigned and the new disk (a wd20ezrz) does not appear in the drop-down menu.

 

one thing i did not mention previously: at first i thought the bay containing the new drive might be defective, so i shifted the disk to the next bay to see if that was the problem.  surprise: quite a few of the existing drives disappeared, as far as unraid was concerned.  putting the new drive back into its original bay restored the server to a working configuration (no missing drives... except the new one, of course).

 

i will try moving one of the existing drives into slot 9 of the drive array to see what happens.

will keep you posted.

 

p.s.: editing this message to add screenshot of drive list.

 

test_1 (dsq existant bougé dans baie no9).(ed2).jpg


@ken-ji:

a quick update before going to bed (can't call in sick tomorrow am!): 
(1) removed the new drive from the jbod chassis and, as you recommended, moved one of the existing disks into the 9th bay. it does show up in unraid, so the bay itself is not defective.
(2) put the existing drive back into its normal bay, put the new drive back into the 9th bay and checked the HBA's bios. all 9 drives are listed in the 'sas topology'.
so at the lower levels, all seems to be working.
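side note: if the lsi sas2ircu utility is available (i don't think it ships with stock unraid, so treat this as a sketch), the same sas topology can be checked from the running system without rebooting into the hba's bios; controller index 0 is an assumption:

# list the lsi controllers the tool can see
sas2ircu LIST

# dump the topology (attached drives, enclosure, phy mapping) for controller 0
sas2ircu 0 DISPLAY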
i'll continue testing tomorrow after work.
cheers.

 

p.s.: edit to add the following screenshot:

 

IMG_20190529_000306.(ed).jpg


 

quick update:

 

was not able to perform any significant testing this evening, but i managed to check the unraid syslog for any error message, anything that could yield a clue as to why unraid is not seeing the new drive that was added to the external jbod array.
whilst the h200e does see the new drive (see the photo in the previous post; it's drive 0 (zero) in the 'sas topology'), there is absolutely no trace of the new 2tb disk in the syslog.

not sure where and how the disk is getting lost.
the only certainty i have at the moment is that this install of unraid v6.7 appears to have a problem dealing with more than 22 drives (14 internal + 8 external) total.
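for anyone wanting to repeat that check, a search along these lines is all it should take -- a sketch, assuming the syslog lives at /var/log/syslog and that the model string is enough to pick the drive out:

# look for the new 2tb drive by model string
grep -i 'WD20EZRZ' /var/log/syslog

# and for any disk attach messages at all
grep -iE 'attached scsi disk|direct-access' /var/log/syslog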

for reference, the main server is running 24 drives with absolutely no issue (norco 4224 chassis).

 

to be continued.
 

 


@ken-ji: everything's connected, as i wanted to see if i could find anything useful in the syslog.

never thought of [ /tools/diagnostics ], will look into it tonight.

 

also considering booting that box with a live distro (something like 'gparted live') to see what it tells me (dmesg, syslog, etc.).

 

since i already have a copy of the entire syslog (not that it is that big), i'll have another look to see if it contains messages concerning:
(1) the h200e, in case the card's driver said something useful;
(2) the jbod's backplane -- assuming 'lsilogic sasx28 a.1', as seen in the 'sas topology', is what i should look for (that, or simply 'enclosure'). see the sketch below.
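something like this is what i have in mind -- a sketch; the h200e is an lsi sas2008-based card, so its messages should come from the mpt3sas driver on this kernel (mpt2sas on older ones), and the syslog path is an assumption:

# messages from the hba driver itself
grep -iE 'mpt2sas|mpt3sas' /var/log/syslog

# anything mentioning the expander/enclosure side of the backplane
grep -iE 'expander|enclosure|sasx28' /var/log/syslog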

 

 

 


another evening where external obligations didn't leave me time to do extensive testing.

 

i did manage to get that 2nd server to boot under gparted live, which didn't see the new drive either (22 drives only).

and to add insult to injury, could not find a way to get any log file off the box. things are going well -- not.
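in hindsight, a spare usb stick and a root shell in the live environment would probably have done it. a sketch, with /dev/sdY1 standing in for whatever the stick shows up as:

# dump the kernel log, mount the stick and copy the file over
dmesg > /root/dmesg.txt
mkdir -p /mnt/usb && mount /dev/sdY1 /mnt/usb
cp /root/dmesg.txt /mnt/usb/ && umount /mnt/usb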

 

rebooted with the usb thumbdrive containing unraid and ... boot media not found.

huh, okay, turn off server, put thumbdrive in other front usb port, turn server back on... boot media not found.

huh, not okay. reset the box and go into the motherboard's bios... notice it wants to boot the thumbdrive with uefi, so let's try a boot override (non-uefi)...

and the unraid boot menu comes up.  that's much better.

 

once i got into the unraid web gui, i clicked on the drop-down menu beside the new drive slot ... and the new drive shows up?!?!?

for the life of me, i haven't the foggiest how it could be possible and/or what had changed.

 

after assigning the new drive to its slot in the main menu, i did download the file created by /tools/diagnostics but am unsure how good / useful it could be.

am running out of time tonight to elaborate (and edit this) any further, so i will attach the file to this message in the hope that someone notices a clue as to what happened.

 

i could also upload the previous syslog (from yesterday) if anyone asks.

drive clearing is still in progress as i'm typing this, so things appear to be stable.

 

cheers.

 

 

nasunraid-bis-diagnostics-20190531-0343.zip


after yesterday's unexpected improvement of sorts, i booted that server again tonight to continue the process of adding the new drive (i had shut it down after it finished clearing the disk and was now going to format it and do the rest), only to be greeted by the "boot media not found" error anew.  after some cajoling, i got unraid to boot... only to discover the new drive is again invisible to unraid.

i am now dealing with two problems:

(1) running gparted apparently did something to the motherboard's bios, since i cannot boot unraid like before. 

(2) that missing/invisible drive, as far as unraid is concerned.

 

not sure why & how, but there appears to be an issue with the motherboard's bios, so i might want to look into Tom's warning about bioses ('update your bios!'), even though i wasn't affected in any way until i added that 9th drive to the external jbod chassis.

 

i did run /tools/diagnostics again and am attaching the new file to this message, hoping someone else notices something in there.

will ponder my next step(s) afterwards. to be continued...

 

 

 

nasunraid-bis-diagnostics-20190601-0150.zip


curiouser and curiouser...

after taking care of the homefront (kitchen, etc.), i sat down to check the motherboard & bios information via the dashboard, only to find the new drive visible again. so within an hour or so, it's as if the drive somehow decided to wake up and be recognized by unraid.

 

i don't believe in problems that sort of fix themselves without any human intervention.

and last i checked, i don't have a "more magic" switch on the chassis.

 

i flipped between screens/tabs just in case it was the browser acting up and displaying random incorrect stuff.  even closed it and restarted it.

nope, chrome is not going non-linear on me and the drive is still visible.

started the array, and the drive formatted ok.

was even able to create a new share and add the drive to it.

 

all this is rather bewildering.

because of the change in state (i.e., things apparently now working), i ran /tools/diagnostics again and attached the file to this message.

 

i will take a step back, try to go over everything i've done to see if i can remember something useful, and go from there.

btw, updating the bios is not an option, i already have the latest & greatest.

 

cheers.

 

nasunraid-bis-diagnostics-20190601-0320.zip


Due to an illness in the family, i was not able to beat on this situation as much as i would have liked.

I did observe something, though.  It must mean something, but i'm unsure what exactly.

it does indeed look as if, when you wait long enough (roughly 1.5 hours), the missing drive becomes visible & usable in unraid.

 

not sure if this makes any sense, but one might think there is a standoff of sorts, with two or more devices waiting for each other before initializing.  or that there is a timer (?) that is triggered and prevents the new drive from completing its initialization.
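if it is a timer of sorts, the syslog timestamps should show the gap between boot and the moment the drive finally attaches -- a rough sketch, assuming the syslog is at /var/log/syslog and that the model string is enough to single the drive out:

# when did logging start (roughly boot time)?
head -n 1 /var/log/syslog

# when did the drive finally show up? compare the timestamps
grep -iE 'WD20EZRZ|power-on or device reset' /var/log/syslog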

 

I did notice what i assume is the H200's bios taking more time to do its post (the spinner that advances as dots appear -- sorry, this is the best description i can formulate right now) than before the new drive was added to the external storage array.  or is this a red herring?


Hi, just checking back on your logs and I see this:

May 31 21:17:39 NasUnraid-bis ntpd[1693]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
May 31 22:22:51 NasUnraid-bis kernel: scsi 10:0:9:0: Direct-Access     ATA      WDC WD20EZRZ-22Z 0A80 PQ: 0 ANSI: 6
May 31 22:22:51 NasUnraid-bis kernel: scsi 10:0:9:0: SATA: handle(0x0013), sas_addr(0x500304800156ba44), phy(4), device_name(0x00000000332e3020)
May 31 22:22:51 NasUnraid-bis kernel: scsi 10:0:9:0: enclosure logical id (0x50030442523a2033), slot(0) 
May 31 22:22:51 NasUnraid-bis kernel: scsi 10:0:9:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: Power-on or device reset occurred
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: Attached scsi generic sg24 type 0
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: [sdx] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: [sdx] 4096-byte physical blocks
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: [sdx] Write Protect is off
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: [sdx] Mode Sense: 7f 00 10 08
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: [sdx] Write cache: enabled, read cache: enabled, supports DPO and FUA
May 31 22:22:51 NasUnraid-bis kernel: sdx:
May 31 22:22:51 NasUnraid-bis kernel: sd 10:0:9:0: [sdx] Attached SCSI disk

Seems like, for some reason, the newly added disk does not power up until 1h 10m after the enclosure is started up.

 

Didn't see anything else in the logs that explains it. Have you tried this?

While the array is stopped, unplug the drive, wait 5 minutes and stick it back in. Is it detected? I'm guessing something is up with the expander you are using, but I've never had such an issue; then again, my expander/enclosure is only an 8-port unit and I don't have any other bays chained off it.
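If you keep a console open while doing it, you can watch for the hotplug event in real time -- nothing unraid-specific here, just a sketch:

# follow kernel messages live; a (re)inserted drive should show up within seconds
dmesg -w

# alternative: watch udev block-device events as they fire
udevadm monitor --kernel --subsystem-match=block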

 

I see that you have a lot of WD 2TB Blue drives; it might be a good idea to at least move up to 4TB drives (preferably 8TB), though you'll need to get 3 to start with since the parity drives need to be upgraded too. They would give you better storage density, and you'd need fewer drives overall.


quick update, hastily typed:
having to deal with a family member who is sadly in his sunset days has kept my attention away in the last week or two, so i haven't been able to continue this thread as diligently as i should have.

 

@ken-ji: ah-ha!  thanks for noticing this, it does confirm what i observed! i am not sure what could be causing this delayed event; i will have to re-read the backplane's documentation to see if anything could explain it. but from my previous reading, i don't remember anything being configurable except too many jumpers of the "don't touch this, leave as-is" kind. (that's when you wonder, "why put in jumpers i can't use?")

 

i might also want to look into the hba's bios documentation to see if i accidentally toggled something on/off that i should not have.  maybe there's a "set hba bios to default values" option that i could use to undo any mistake i might have made.

 

this being said, i am not sure if i could perform your test.  are standard consumer sata drives really able to support hotplug? i would not want to fry the drive. (i prefer to err on the side of paranoia here.)
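if hotplug does turn out to be ok (from what i've read, a sas expander backplane like this one handles the electrical side), the kernel can at least be told to let go of the disk before it is pulled, which should remove most of the risk. a sketch -- sdx stands for whatever letter the drive got, and host10 is taken from the scsi 10:... address in the syslog excerpt above, so double-check both:

# cleanly detach the disk from the kernel before pulling it from the bay
echo 1 > /sys/block/sdx/device/delete

# after re-inserting it, force a rescan on the hba
echo "- - -" > /sys/class/scsi_host/host10/scan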

 

as for moving to higher density drives, i use supermicro CSE-M35T-1B drive cages in my backup server, which, if i'm not mistaken, have issues with 3+tb drives.  that's why i use 2tb drives in that server (my main one has 4 & 3 tb drives).

 

That's it for now, to be continued.

  • 4 months later...

very, very delayed update to my situation.

so i ordered a spare backplane off e-bay some time ago and put it aside due to lack of time, family-related circumstances, and also because things inexplicably got somewhat better for a brief period of time. then the situation suddenly took a turn for the worse: all drives in the jbod chassis decided to go awol, a good motivator to do some surgery. i finally was able to do the swap yesterday.

 

this chassis is not easy to work with, i'd say. lots of swearing was involved.

anyway, if the faster server boot-up, the fact that all drives are again visible, *and* the fact that i was able to add a new drive mean anything, it's that the old backplane was indeed defective. the new one appears to run much better than the old one.


i hope this keeps running without any issues for the foreseeable future.

 

