3 M1015s in ESXi not showing all drives in unRAID



Thanks for loaning me your X9SCM-F, Bob!  I only had a few moments to install it and play with it, but so far it acts exactly the same as mine.  With one M1015 and the Intel SAS expander, I get dog slow parity rebuild (~4MB/sec).  Tonight I'll try throwing in my three M1015s without the SAS expander and see what happens.  If it still acts the same as my board then I don't know what to do.

If it is acting the same as yours, it sounds like one of two possible problems to me.

1. You have some bad M1015s and/or a bad SAS expander - not very likely, I expect, but a possibility - even though they do work.  I'm sure you have tried all of your M1015s connected to the SAS expander?  Or you will at least before you go to three M1015s at the same time anyway.

2. Configuration problem.  I have unmenu installed on my unRAID 5.0rc4 but very few other plugins installed.  I have Powerdown, Monthly parity check, and maybe a couple of others that I don't remember.  I do NOT have email set up since I don't check my email more than weekly, so email notifications are of limited value to me.

Let me know if you have to disable the OPROMs when you test with three M1015s.  If you don't need to disable them then there is at least one difference between the boards - but I'm not expecting a difference.  Have you got anything else plugged in besides the M1015(s)?  Can you set up a test unRAID server that is completely new with a couple of spare drives and see if you still get slow speeds?
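If it helps to rule out a detection problem first, here is a rough sketch of what I would check from the unRAID console on bare metal (it assumes lspci and the stock mpt2sas driver are available there, which I believe they are):

    # All three M1015s use the LSI SAS2008 chip, so three LSI entries should show up here
    lspci | grep -i lsi
    # Driver init messages - expect one block per card
    dmesg | grep -i mpt2sas | head -40

If fewer than three controllers show up there, the problem is upstream of unRAID and ESXi entirely.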


I had limited time to spend on it again tonight, but I did try all three M1015s with the SAS expander, and I even tried the cable from the M1015 in different ports on the SAS expander.  Each and every time all the drives were recognized, but I had parity speeds around 3-4MB/sec.  Tomorrow and Sunday I'll spend a lot more time on it trying all three M1015s and working with a fresh unRAID flash.  I have a few spare disks so I'll also try working with a test array on the SAS expander.

 

The really confusing part is that the three M1015s work just fine in bare metal unRAID.  Conversely, the parity sync speed problems occur in both bare metal unRAID and when running unRAID as a virtual machine.  I was able to let memtest run for 20 hours and it didn't report any errors, so that should be good.  The only other things I can think of that are different from other posted systems that run fine in ESXi are my PSU and my CPU.  I wouldn't think power supply problems would manifest in this manner, but what about the CPU?  I think everybody else, including you, Bob, was/is running Sandy Bridge Xeons and not my Ivy Bridge V2 processor.  We'll see I guess.

 

Here is the system I'm trying to get running:

 

CPU: Intel Xeon E3-1230 V2

MB: Supermicro X9SCM-F-O

RAM: 2x Super Talent DDR3-1333 8GB ECC Micron Chip (W1333EB8GM)

PSU: COOLER MASTER Silent Pro Gold Series RS800-80GAD3-US 800W

HBA: 3x IBM ServeRAID M1015 crossflashed to LSI 9211-8i IT mode

SAS EXPANDER: Intel RES2SV240NC RAID Expander

CASE: Norco 4224

 


...these symptoms are very confusing indeed.

Just a side note / some thoughts:

 

- what is the setup of your VM (32- vs 64-bit comes to mind) and of your ESXi?

- what version of ESXi are you using? ESXi 5.1 should be "aware" of Ivy Bridge hardware, while ESXi 5.0x will only see it as Sandy Bridge

- did you try other OSes besides unRAID?

 

The first time I created the unRAID VM, I followed the Atlas build instructions and chose 32-bit FreeBSD as the base.  When I redid it I think I set it for 64-bit Linux Other (or something like that).  What do you mean by my setup of ESXi?  There really didn't seem to be much to configure that I remember.

 

I am running ESXi 5.1 and it did recognize my PCI-E slots as Ivy Bridge.  5.0x saw them as "Generic".

 

I haven't tried any other OSes because I didn't think they would be much help to me.  I assume Windows wouldn't recognize the unRAID drives, and I'm not really familiar with Linux.  Will Ubuntu see the reiserfs drives?  I'll check it out after I try Bob's suggestions.


I've been trying some things and here's what I get:

 

When running all three M1015s on the board Bob loaned me, I get the exact same behavior as I do with my board: I get the MPT BIOS error if I leave the OPROM enabled, and I get the handshake/doorbell fault if I run unRAID in a VM.  Bare metal unRAID works fine.

 

I made a fresh unRAID 5b12a thumb drive and set up a test system with three spare drives running off the SAS expander.  Once I assigned drives it started a parity sync and I was seeing speeds ~75MB/sec, which is normal.  Why does it slow down so much with all the drives attached?  I would like to add one backplane at a time to see when and how the speed decreases, but is it safe to be playing around like that with disks that have data on them?


I've been trying some things and here's what I get:

 

When running all three M1015s on the board Bob loaned me, I get the exact same behavior as I do with my board: I get the MPT BIOS error if I leave the OPROM enabled, and I get the handshake/doorbell fault if I run unRAID in a VM.  Bare metal unRAID works fine.

This sounds like an M1015 problem.  One thing you could try here is rotating the cards around, i.e. take the one closest to the CPU and move it to be the farthest, and move the others closer to the CPU.  Then you will switch which card's BIOS is used when booting.  That is, if it works like the SASLP cards anyway.  I had a mix of 15 and 21 ROMs and the display showed either 15 or 21 based on the order of the cards in the slots.  I don't know if the M1015s will display the same thing or not, but it is an option to try to see if you can eliminate the need to turn off the OPROMs.

 

I made a fresh unRAID 5b12a thumb drive and set up a test system with three spare drives running off the SAS expander.  Once I assigned drives it started a parity sync and I was seeing speeds ~75MB/sec, which is normal.  Why does it slow down so much with all the drives attached?  I would like to add one backplane at a time to see when and how the speed decreases, but is it safe to be playing around like that with disks that have data on them?

Safe is relative.  Anything could happen and without backups you could lose data.  However, what you are proposing "shouldn't" have any bad effects.

 

It will slow down with a SAS expander and a single lane to the M1015 (like I have mine attached), but not the drastic change you reported.  Some things come to mind.  Bad drive: a drive might be going bad and manifesting itself as poor performance.  You may have a bad backplane or SAS cable as well.  My original unRAID server's performance has been halved to ~30-40MB/s parity checks, but I know exactly what caused that.  I fried 12 of my 20 drives when a fan died in my Lian Li V2000 case and all 12 drives were at 60C when I caught it at 97% complete.  I have no idea how long they were that way but I suspect it was for many hours.  The drives still work after replacing the fan, but the performance is bad and I am slowly (as finances permit) replacing the drives.  When I get them replaced (it may only be one that is bad, I just don't know) I expect my performance to go back up.  My other unRAID server has 15 drives (only 3 less than currently in the other one), also on a SAS expander, and is getting ~60-70MB/s parity checks.
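If you want to rule out a dragging drive before swapping hardware, a rough per-disk read test like this usually makes a bad one stand out (a sketch, assuming hdparm is on the unRAID console and the array is otherwise idle; the device names are just placeholders):

    # Quick buffered-read timing on each array disk - adjust the range to match your system
    for d in /dev/sd[b-z]; do
      echo "== $d =="
      hdparm -t "$d"
    done

A drive that reads far slower than its siblings is the one to look at.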


This sounds like an M1015 problem.  One thing you could try here is rotating the cards around, i.e. take the one closest to the CPU and move it to be the farthest, and move the others closer to the CPU.  Then you will switch which card's BIOS is used when booting.  That is, if it works like the SASLP cards anyway.  I had a mix of 15 and 21 ROMs and the display showed either 15 or 21 based on the order of the cards in the slots.  I don't know if the M1015s will display the same thing or not, but it is an option to try to see if you can eliminate the need to turn off the OPROMs.

 

Previously on my board I moved the three cards all around.  If you look at earlier posts I mention a bunch of various configurations.  I'm not really worried about the BIOS error (maybe I should be?).  I agree that it would seem to indicate something is amiss, but bare metal unRAID runs fine with that error so I wouldn't think it is keeping the unRAID VM from working. 

 

Safe is relative.  Anything could happen and without backups you could lose data.  However, what you are proposing "shouldn't" have any bad effects.

 

It will slow down with a SAS expander and a single lane to the M1015 (like I have mine attached), but not the drastic change you reported.  Some things come to mind.  Bad drive: a drive might be going bad and manifesting itself as poor performance.  You may have a bad backplane or SAS cable as well.  My original unRAID server's performance has been halved to ~30-40MB/s parity checks, but I know exactly what caused that.  I fried 12 of my 20 drives when a fan died in my Lian Li V2000 case and all 12 drives were at 60C when I caught it at 97% complete.  I have no idea how long they were that way but I suspect it was for many hours.  The drives still work after replacing the fan, but the performance is bad and I am slowly (as finances permit) replacing the drives.  When I get them replaced (it may only be one that is bad, I just don't know) I expect my performance to go back up.  My other unRAID server has 15 drives (only 3 less than currently in the other one), also on a SAS expander, and is getting ~60-70MB/s parity checks.

 

Unless I'm missing something, I don't think it could be a bad drive, cable or backplane because it all works fine in bare metal unRAID.  Please don't take me as arguing with you, because I don't mean to at all.  It's just very perplexing that the problems only exist in the unRAID VM.  If I wasn't trying to do virtualization this thread wouldn't even exist because bare metal unRAID would have just worked. 

 

The only thing that is a problem in unRAID outside the VM is the performance with the SAS expander.  That problem is the only one that is consistent in both scenarios.


This sounds like an M1015 problem.  One thing you could try here is rotating the cards around, i.e. take the one closest to the CPU and move it to be the farthest, and move the others closer to the CPU.  Then you will switch which card's BIOS is used when booting.  That is, if it works like the SASLP cards anyway.  I had a mix of 15 and 21 ROMs and the display showed either 15 or 21 based on the order of the cards in the slots.  I don't know if the M1015s will display the same thing or not, but it is an option to try to see if you can eliminate the need to turn off the OPROMs.

 

Previously on my board I moved the three cards all around.  If you look at earlier posts I mention a bunch of various configurations.  I'm not really worried about the BIOS error (maybe I should be?).  I agree that it would seem to indicate something is amiss, but bare metal unRAID runs fine with that error so I wouldn't think it is keeping the unRAID VM from working.

I figured you had; I was just throwing it out there in case you hadn't.  I saw you tried different configurations but didn't know if you had tried rotating the cards as opposed to just different numbers installed.

 

Safe is relative.  Anything could happen and without backups you could lose data.  However, what you are proposing "shouldn't" have any bad effects.

 

It will slow down with a SAS expander and a single lane to the M1015 (like I have mine attached), but not the drastic change you reported.  Some things come to mind.  Bad drive: a drive might be going bad and manifesting itself as poor performance.  You may have a bad backplane or SAS cable as well.  My original unRAID server's performance has been halved to ~30-40MB/s parity checks, but I know exactly what caused that.  I fried 12 of my 20 drives when a fan died in my Lian Li V2000 case and all 12 drives were at 60C when I caught it at 97% complete.  I have no idea how long they were that way but I suspect it was for many hours.  The drives still work after replacing the fan, but the performance is bad and I am slowly (as finances permit) replacing the drives.  When I get them replaced (it may only be one that is bad, I just don't know) I expect my performance to go back up.  My other unRAID server has 15 drives (only 3 less than currently in the other one), also on a SAS expander, and is getting ~60-70MB/s parity checks.

 

Unless I'm missing something, I don't think it could be a bad drive, cable or backplane because it all works fine in bare metal unRAID.  Please don't take me as arguing with you, because I don't mean to at all.  It's just very perplexing that the problems only exist in the unRAID VM.  If I wasn't trying to do virtualization this thread wouldn't even exist because it would have just worked. 

 

The only thing that is a problem in unRAID outside the VM is the performance with the SAS expander.  That problem is the only one that is consistent in both scenarios.

I thought so too, but I noticed a problem when using a 5-in-3 cage versus a straight connection, so I tossed that out there as well.  Your problems with unRAID in a VM now really sound like a configuration problem and/or possibly an ESXi 5.0 problem with Ivy Bridge CPUs.  I have my unRAID VM set up as Other Linux 2.6.x 32-bit, and if you haven't tried that I suggest it.  I'll get my settings from my unRAID VM shortly (still building my other one back up).

...these symptoms are very confusing indeed.

Just a side note / some thoughts:

 

- what is the setup of your VM (32- vs 64-bit comes to mind) and of your ESXi?

- what version of ESXi are you using? ESXi 5.1 should be "aware" of Ivy Bridge hardware, while ESXi 5.0x will only see it as Sandy Bridge

- did you try other OSes besides unRAID?

 

The first time I created the unRAID VM, I followed the Atlas build instructions and chose 32-bit FreeBSD as the base.  When I redid it I think I set it for 64-bit Linux Other (or something like that).  What do you mean by my setup of ESXi?  There really didn't seem to be much to configure that I remember.

 

I am running ESXi 5.1 and it did recognize my PCI-E slots as Ivy Bridge.  5.0x saw them as "Generic".

 

I haven't tried any other OSes because I didn't think they would be much help to me.  I assume Windows wouldn't recognize the unRAID drives, and I'm not really familiar with Linux.  Will Ubuntu see the reiserfs drives?  I'll check it out after I try Bob's suggestions.

 

...setting aside your problems with the expander, all the trouble seems to be related to the virtualized setup.

So my thoughts were around parameters that could be changed within ESXi and for the VMs.

There is indeed a lot that can be changed, but if you haven't looked into it so far, the defaults should be OK (...I did tweak power savings by enabling advanced C-(idle) states for my CPU, which are not enabled by default in ESXi).

 


Thanks a lot, I'll check those out and see if anything helps.

 

I also just tried installing Ubuntu as a VM with the three M1015s passed through to it and I can't get it to work.  With any of the M1015s installed (even without any drives attached) I get a bunch of errors at bootup and get thrown to a prompt rather than the GUI.  If I remove all the cards it will boot into the GUI just fine.  I don't know much about Linux so maybe that was to be expected.  Should it have worked?

 

I have to go to dinner with some family so I don't have time to catch the log from it.  I just wanted to do a quick test.  Will try more later.  ::)
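For reference, something along these lines from the Ubuntu console should capture the relevant messages for posting later (a sketch; it assumes the errors are coming from the mpt2sas driver, which is what the 9211-8i IT firmware uses):

    # Save the kernel messages related to the LSI driver and the faults seen at boot
    dmesg | grep -iE 'mpt2sas|doorbell|handshake' > ~/m1015-vm-boot-errors.txt
    # Also worth grabbing: which driver (if any) bound to the passed-through cards
    lspci -nnk | grep -iA3 lsi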


...Ubuntu is not very advanced in this respect.

Try Fedora or CentOS....also OpenIndiana.....all these distros have Live-CDs that should pick up the controllers fine.

The first two should also see the ReiserFS and be able to mount unRAID data disks.

You could test with bare metal and from inside a VM.
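If the live environment ships the reiserfs module (Fedora should; CentOS might not), mounting one data disk read-only is only a couple of commands - a rough sketch, with /dev/sdb1 standing in for whatever device the disk shows up as:

    # Load the filesystem module and mount a single unRAID data disk read-only
    # so nothing on it can be modified
    modprobe reiserfs
    mkdir -p /mnt/disk1
    mount -t reiserfs -o ro /dev/sdb1 /mnt/disk1
    ls /mnt/disk1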

 

I never had problems with Linux, Solaris or Windoze with my M1015s on passthrough (never went up to more than 2 at a time though).

With other cards I had strange effects like they worked just fine in Win but never booted a Linux when attached although bare metal worked fine.


There is indeed a lot that can be changed, but if you haven't looked into it so far, the defaults should be OK (...I did tweak power savings by enabling advanced C-(idle) states for my CPU, which are not enabled by default in ESXi).

 

I thought you were asking what my ESXi installation configuration was.  Indeed there are many settings when creating a VM, but other than allocating RAM and enabling passthrough for the HBAs, I left everything else at default.


...Ubuntu is not very advanced in this respect.

Try Fedora or CentOS....also OpenIndiana.....all these distros have Live-CDs that should pick up the controllers fine.

The first two should also see the ReiserFS and be able to mount unRAID data disks.

You could test with bare metal and from inside a VM.

 

I never had problems with Linux, Solaris or Windoze with my M1015s on passthrough (never went up to more than 2 at a time though).

With other cards I had strange effects like they worked just fine in Win but never booted a Linux when attached although bare metal worked fine.

 

I tried CentOS Live bare metal and it recognized all my drives, but it said it couldn't mount them because they are reiserfs.  I don't think that matters much since they were at least recognized.  I ran the CentOS Live CD as a VM and it wouldn't load; it just sat forever at the splash screen with the rotating circle.  If I remove the HBAs from the VM it boots up fine.

 

So it really seems like something in ESXi is keeping things from working.


I tried CentOS Live bare metal and it recognized all my drives, but it said it couldn't mount them because they are reiserfs.  I don't think that matters much since they were at least recognized.  I ran the CentOS Live CD as a VM and it wouldn't load; it just sat forever at the splash screen with the rotating circle.  If I remove the HBAs from the VM it boots up fine.

 

So it really seems like something in ESXi is keeping things from working.

 

Yes, that's what it looks like.

The behavior with CentOS and others in a VM, I only had with the M1015 still on MegaRAID firmware... I know it sounds silly, but you are dead sure that all cards are flashed to IT mode, aren't you?  ::)
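If you want to double-check rather than go from memory, the LSI flash utility will tell you - a sketch, assuming you can boot the same sas2flash stick you used for the crossflash (or run the Linux build of sas2flash):

    # List every LSI SAS2 controller the utility can see, with firmware versions
    sas2flash -listall
    # Details for controller 0 - the firmware/product line should indicate IT rather than IR firmware
    sas2flash -list -c 0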


I'm pretty darned sure.  I never booted one up before I crossflashed them, so I don't know what info it would show if it weren't in IT mode.  I know I followed madburg's guide to crossflashing them and everything went as he said it should.

 

I just tried another test that furthers the weirdness of all this.  I have 16 data drives plus parity, so right now I am driving a single drive with my third M1015.  As a test, I removed the third card and left the other two in the 8x slots.  The unRAID VM boots up without the handshake/doorbell error and the GUI shows all drives present except for the one that is no longer connected.  That is exactly as it should be.  I then shut everything down, moved the one card from the second 8x PCI-E slot, and installed it in the first 4x PCI-E slot.  I tried the unRAID VM again and now I DO get the handshake/doorbell error and the GUI shows a bunch of drives missing.

 

I tried this same thing a week or so ago on my board and got the same results.  I tried even more configurations and, if I remember correctly, I would get that error every time I had one of the cards in a 4x slot unless it was the only card installed.  Does that make any sense?

 

EDIT:  I just tried putting both cards in the 4x slots and I get the handshake/doorbell error, and all the drives connected to one of the cards show as missing in unRAID.

 

EDIT 2: I just removed one of the cards, leaving only a single card in the machine, and I left it inserted in a 4x slot.  The unRAID VM boots up without the error and all the drives connected to the card show up in unRAID.  So it would appear that ESXi does indeed not like having one of the M1015 cards in a 4x slot unless it is the only card in the system.  ???
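One more data point that might be worth grabbing while the cards are being moved around (a sketch; it assumes SSH or local shell access to the ESXi host): how the host itself enumerates the cards in each slot arrangement, before any VM is involved.

    # From the ESXi shell: pick out the LSI controllers and note their PCI addresses
    lspci | grep -i lsi
    # Fuller detail (address, owner) for every PCI device
    esxcli hardware pci list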



 

I went through and set all my settings exactly like yours.  Most were already the same except for the RAM amount, and it looks like you are not using PLOP to boot unRAID in the VM while I am.  I didn't change that yet as I'll have to go through a tutorial first to figure out how.  I know John has one in the Atlas thread, but I haven't had time to thoroughly look through it.  How are you booting unRAID?  I'm assuming that PLOP is not my problem, but who knows.  At this point, I wouldn't be surprised by anything.  Thanks for taking the time to post all the screenshots of your configuration.


I also tried one final attempt with the Intel SAS expander.  I tried it in my old ASRock Intel LGA 440 board that only has one PCI-E slot (16x).  It was running one of my two servers before this new build/disaster.  I connected all my drives and powered up bare metal unRAID, and it acted the same way as it did in the Supermicro mobo.  All drives showed in unRAID, but parity checks were ~2MB/sec.  Would it be safe to assume that it must be defective?  Can it be defective in such a way that everything seems to work fine but speeds are slow?


I also tried one final attempt with the Intel SAS expander.  I tried it in my old ASRock Intel LGA 440 board that only has one PCI-E slot (16x).  It was running one of my two servers before this new build/disaster.  I connected all my drives and powered up bare metal unRAID, and it acted the same way as it did in the Supermicro mobo.  All drives showed in unRAID, but parity checks were ~2MB/sec.  Would it be safe to assume that it must be defective?  Can it be defective in such a way that everything seems to work fine but speeds are slow?

That sounds reasonable to me.  I bought my RES2SV240s from Newegg.  Newegg lists that as the model #.  No NC in the model # like yours.  I doubt that makes a difference, but it is possible I guess if Intel really has two models of it.  I see Beta answered your other question.  But here is what I did.  I set up a 2GB virtual drive and connected it to a Windows VM.  I formatted it to FAT32.  Then I created a boot drive just like it was a USB flash drive.  I did have to add "f" to the command that the "make_bootable.bat" file uses to format the disk.  Then I disconnected it, renamed it to BOOT instead of UNRAID, and had it boot that.
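For anyone wanting to do the same prep from a Linux VM instead of Windows, the rough equivalent of the format-plus-make_bootable steps would be something like this (a sketch; /dev/sdb stands in for the 2GB virtual disk, and the volume label should be whatever your setup expects):

    # Format the partition FAT32 with a volume label, flag it bootable,
    # and write the syslinux boot loader (roughly what make_bootable.bat does)
    mkfs.vfat -F 32 -n UNRAID /dev/sdb1
    parted /dev/sdb set 1 boot on
    syslinux /dev/sdb1
    # depending on the distro you may also need to write syslinux's mbr.bin to /dev/sdb

After that, copy the unRAID release files onto it just as you would for a normal flash drive.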