[SOLVED] System crash/unstable - SASLP-MV8



After months of procrastinating setting up my first Unraid home server, I finally got around to it with help from a friend of mine.

 

My current hardware setup is as follows:

CPU: Athlon II X3 440

RAM: 4 GB DDR2 (2x2GB)

MB: Asus M4A78L-M LE

Video: Onboard

PSU: Corsair VX550W

HDDs:

2 x 3 TB WD Red (1 to be used for parity)

1 x 2 TB WD Green

1 x 1 TB WD Black

2 x 1 TB Hitachi

2 x 1 TB Seagate

1 x 750 GB Seagate

1 x 300 GB WD Raptor (to be used as cache)

 

The controller is a Supermicro AOC-SASLP-MV8, flashed with the appropriate firmware.

 

I am currently in the process of pre-clearing some of the HDDs, and I keep running into system crashes/hang-ups: the web interface is dead, but the system is still accessible via telnet.

I have tried two separate, identical controllers (SASLP-MV8) and different cables with no success; the system crashes with errors that I honestly cannot decipher (limited Linux experience).

 

Please see the log file; any help on this issue would be greatly appreciated (the error appears to be identical on every crash). Full system log available at http://pastebin.com/nVeQs4KG

 

I considered the possibility that it might be a hardware-related issue (MB/CPU/RAM); however, I managed to get through 24 hours of memtest without an issue (3 CPU cores used). Furthermore, using the on-board SATA controller the system appears to be rock solid (32-hour pre-clear on a 3 TB drive with no issues).

Link to comment

Not that it will help, but I was having similar issues trying to preclear disks on an AM1 board with an Athlon 3850, which I used just because I had it sitting around. Constant lockups. I moved that setup over to an FX4350 with 8 GB DDR3 and the problems are gone; I am preclearing 5 disks as I type this. Not sure if it was a CPU issue, running out of RAM, or something else, but going to a much more powerful machine seemed to fix the issue.

Link to comment

What version of unRAID are you using?

 

I'll assume v6.x given the forum you have posted in: http://lime-technology.com/forum/index.php?topic=39257

 

The key part of that post (if you haven't got enough time to read it) is:

 

"Almost always, the most important thing you need to do is capture your complete syslog, BEFORE YOU REBOOT!  We usually need to see what went wrong BEFORE the reboot, because once you reboot, it's lost!

 

[For unRAID v6.0-rc4 or later] If networking is working, browse to the unRAID webGui, go to the Tools tab, click on the Diagnostics icon, then click on the Download button (Collect button if v6.0).  After the diagnostic data collection is complete, it will save a diagnostics zip file to your computer, to the download location you specify or is configured in your browser.  This zipped file is ready to attach to a forum post.  It contains a copy of your syslog with DOS friendly line endings, copies of SMART reports for all drives present, copies of your system and share config files, and a technical file describing your array, including all of the content on the Main screen."

 

If you can, post those diagnostics and I am sure the experienced users in the forum will be able to help you diagnose your issue!

Link to comment

Thanks for the advice. I am indeed running 6.1.6, and the controller firmware is .21.

 

My main issue is that once the system "crashes", the web GUI is inaccessible and my only option is to telnet into the machine. I did manage to get a copy of the syslog before rebooting (see the pastebin).
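For anyone in the same situation (web GUI dead, telnet still working), here is a minimal sketch of saving the syslog somewhere that survives a reboot. It assumes a Python 3 interpreter is available on the server, which a stock unRAID install may not have, and it assumes the live log is at `/var/log/syslog` and the flash drive is mounted at `/boot`. The function name `save_syslog` is made up for this example; a plain file copy from the telnet prompt achieves the same thing.

```python
import shutil
import time
from pathlib import Path


def save_syslog(src="/var/log/syslog", dest_dir="/boot"):
    """Copy the live syslog to dest_dir under a timestamped name.

    The default paths are assumptions about a typical unRAID layout;
    pass different ones if your system differs.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"syslog-{stamp}.txt"
    # copy2 also preserves the modification time of the original file
    shutil.copy2(src, dest)
    return dest


# Example (run on the server, before rebooting):
#   print(save_syslog())
```

The timestamp in the file name keeps copies from successive crashes from overwriting each other, which is handy when checking whether the error really is identical every time.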

 

The array was not even started; there was just a preclear running on one disk. As best I can tell, the SMART data on the disk was OK (viewed before the preclear started), and a preclear of the same disk on the onboard SATA controller completed successfully.

 

Again, the issue seems to be isolated to the controller (SASLP-MV8). I managed to preclear the disk running off the onboard controller with no issues, and I am currently 10 hours into preclearing 3 disks simultaneously (onboard again).

 

 

Link to comment

Any idea if the problems were isolated to preclears alone? I have read about speed-related issues with the controller but nothing similar to my problems, and I am somewhat reluctant to use the controller until I have a better understanding of the underlying issues.

 

I am inclined to think (mainly due to feedback from Linux-familiar friends) that the issue is either driver related or some hardware incompatibility (MB and controller).

Link to comment

From your syslog it all starts to go wrong at 20:54:13, when your AOC-SASLP-MV8 loses contact with the disk on its port ata9, which was earlier identified at 19:38:51 as being disk WDC_WD1002FAEX-00Z3A0_WD-WCATRC210436 (/dev/sdd).
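If you want to trace a single port through a long syslog yourself, a rough sketch like the one below can pull out every line that mentions it, so the first identification and the first error are easy to spot. It assumes classic syslog lines that start with "Mon DD HH:MM:SS"; the function name `port_lines` is invented for this example, and the sample lines in the usage comment are made up, not taken from the pastebin.

```python
import re


def port_lines(log_text, port="ata9"):
    """Return (time, message) pairs for syslog lines that mention `port`.

    Matches "ata9:" and "ata9.00:" but not "ata90:" or "ata10:".
    """
    pattern = re.compile(rf"\b{re.escape(port)}(?:\.\d+)?:")
    hits = []
    for line in log_text.splitlines():
        if pattern.search(line):
            # Assumed layout: month, day, time, then the rest of the message
            fields = line.split(None, 3)
            if len(fields) == 4:
                hits.append((fields[2], fields[3]))
    return hits


# Usage with invented sample lines:
#   text = open("syslog.txt").read()
#   for when, message in port_lines(text, "ata9"):
#       print(when, message)
```

Reading the matches in order shows when the disk on that port was detected and when the controller first complained about it.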

 

I've had a similar experience with the same controller and (a different) AMD chipset that I mention briefly here. The things that I tried replacing as I was troubleshooting are listed in order: SAS breakout cable (easy to change), power supply (a bad PSU or power cabling can cause all kinds of difficult to diagnose problems), hard disks (obvious, really!). Despite all being fine from a SMART point of view, I ended up retiring the old disks and replacing them with new, bigger ones. This fixed the problem. I'm able to pre-clear four disks at the same time on the AOC-SASLP-MV8 (I haven't tried any more) but I agree that it's generally safer to use the motherboard ports.

Link to comment

I had problems with those controllers on my system originally (I have 2 of them).  It turned out to be an issue with the way they were seated into the PCIe socket and the way they met the backplane.  They were such that they were not quite square, so periodically (particularly when the system was under heavy load) the controller would suddenly throw an error and the disks drop offline (I assume due to vibration causing momentary contact loss in the PCIe socket).  Eventually I identified this issue and got everything aligned better and since then have had no problems.

Link to comment

Interesting. I notice that the edge connectors of PCI and PCIe cards seldom sit squarely in their sockets. I usually end up adjusting the screws and bending the metal bracket to try to counteract the tendency for it to lift the card out of its socket. It's probably down to the cheap cases I sometimes use!

Link to comment

Well, initial impressions can indeed be wrong: while I did not run into my previous problem, I did run into a new one... IRQ 16 disabled. After reading up on the issue, I am now testing with no monitor attached to the video card (Quadro FX 580); hopefully this resolves the issue. Just to be on the safe side, no keyboard is attached either. Memory has been bumped up to 6 GB.

Link to comment

I guess it's safe to label my post as "solved". In summary:

 

1) First system (AMD - Asus M4A78L-M LE/Athlon II X3): The SASLP-MV8 controller did not appear to play nice with my MB; I never managed to figure out what exactly was causing the issues.

2) Second system (Intel - Asus P5KC/E7200): The SASLP-MV8 controller had "IRQ16 nobody cared" issues whenever a monitor was plugged into the video card or, if already plugged in, turned on. With no monitor attached to the system I have had over 5 days of up-time with no stability issues, during which time numerous disks were pre-cleared, data was transferred between disks, and parity was created.
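For anyone hitting the same "nobody cared" message: the kernel prints it when interrupts keep arriving on a line and no driver claims them, and it then disables that line, so every device sharing it stalls. A quick way to see what shares IRQ 16 is to look at `/proc/interrupts`. The sketch below is only an illustration, assuming Python 3 is available; the function name `irq_tail` is invented, and the device names in the usage comment are hypothetical, not copied from my system.

```python
def irq_tail(interrupts_text, irq=16):
    """Return the text after the per-CPU counters on the line for `irq`.

    That tail normally names the interrupt controller type and the
    drivers attached to the line. Returns None if the IRQ is not listed.
    """
    for line in interrupts_text.splitlines():
        label, _, rest = line.strip().partition(":")
        if label != str(irq):
            continue
        tokens = rest.split()
        # Skip the leading per-CPU interrupt counters (pure digits)
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        return " ".join(tokens)
    return None


# Usage on the server:
#   with open("/proc/interrupts") as fh:
#       print(irq_tail(fh.read(), 16))
# A hypothetical result such as "IO-APIC-fasteoi mvsas, somegpu" would
# mean the controller driver and the video card share the line.
```

If the controller and the video card turn up on the same line, moving one of the cards to a different slot may be worth trying as well as running headless.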

 

Current system is as follows:

Unraid 6.1.7 Pro

CPU: Intel E7200

RAM: 6 GB DDR2 (2x2GB + 2x1GB)

MB: Asus P5KC

Video: Quadro FX 580

Controller: Supermicro AOC-SASLP-MV8

HDDs:

2 x 3 TB WD Red (1 of which parity)

1 x 2 TB WD Green

1 x 1 TB WD Black

2 x 1 TB Hitachi

2 x 1 TB Seagate

1 x 300 GB WD VelociRaptor (cache)

PSU: Corsair VX550W

Link to comment
