[SOLVED] Crash, at my wits end


Andrew307

Hi all,

 

Me again.  I have no idea what to do here.  I restarted my system and everything seems to be shut down.  The only share I can access is the USB (flash).  I cannot connect to or see the GUI; it just loads and loads.

 

With a monitor/keyboard plugged in it seems to boot normally.  I can get to root@tower:.

 

I don't know enough about the system to attempt anything crazy.  I did go back to my last backup of my USB, but that didn't help at all.  Log from a telnet attached.

 

Please help!

log.txt

Link to comment

Agree on the smart report for disk 4, but it also seems, from the frequency of the errors the drive was throwing out, that it's not powering up / initializing correctly.  I would reseat the cables to the drives.

 

Also you've got a stack dump in there.  You should maybe restart the server in safe mode to stop the plugins.

Link to comment

Kk.  Dumb questions, ready go! 

 

1.  How do I run a smart report on just one disk?  I ran "smartctl -a -d ata /dev/sca >/boot/smart.txt".  I attached the result, but I have a feeling this wasn't what we're looking for.

 

2.  How do I start the server in safe mode with no access to the GUI?  Searched and searched, can't find a command...

 

Thanks again!!!

smart.txt

Link to comment

Since you can telnet in, type either
powerdown -r

or

shutdown -r now

 

Attach a monitor & keyboard to the system, and when it boots up you'll see a menu.  Select safe mode.

Link to comment

And that smart report is for the wrong drive.  If you've reset the system then it's possible that the drive letter has changed.  The easiest way to get a smart report is from the GUI.  Click on the drive, then select drive attributes for disk 4.

 

BTW, that smart report you posted looks good.

Link to comment

Alrighty.  Started in safe mode, attached is the log.  Doing a parity check right now.  What is my next step to ensure things are stable to boot in normal mode?

Let parity check complete and make sure the UI shows no parity errors or I/O errors on any disk. Then post another syslog.
Link to comment

That new syslog still has a bunch of errors on disk4 (sdc).  Do as trurl suggests and let the parity check finish, although I suspect disk4 will show lots of errors.  Once the parity check is complete, also post a SMART report for disk4.  The one you posted previously was the wrong disk.

 

Also, have you checked the cables / reseated connections like Squid suggested?

Link to comment

On topic but away from helping the OP: why is the user forced to drop to the command line when there is a disk issue here?  I see more and more instances of the GUI locking up or being inaccessible to users (and have experienced it myself).  I consistently see posts suggesting only advanced users are expected to use the command line.  Therefore, shouldn't a bug report be filed?

Link to comment

They're not forced to when there is a disk issue (you can get all the SMART info via the GUI in v6).  But if the GUI locks up / freezes for whatever reason (this is lessened in the next beta, which is due to be released very shortly), there isn't really a choice.
Link to comment

Considering disk4 is obviously experiencing issues, I would suggest cancelling the parity check and pulling a SMART report on the drive.  The reason I say this is that if disk4 is really in the process of dying (which it certainly appears to be), you'd be better off replacing it now and rebuilding before you happen to have another drive failure and lose data.  The chances of another drive failing are small, but it can happen.

 

It could also be a loose connection to the drive.  You still have not said whether or not you checked / reseated cables.

Link to comment

Alrighty.  Parity is complete and had 2 million errors on disk 4.  Replaced the cable.  How do I get a SMART report on the drive?  I see on the wiki the command, but how do I get the report for just disk 4?

Click on the drive from Main, then Health, Disk Attributes
Link to comment

1.  How do I run a smart report on just one disk?  I ran "smartctl -a -d ata /dev/sca >/boot/smart.txt".  I attached the result, but I have a feeling this wasn't what we're looking for.

 

The correct command would be =>  smartctl -a -d ata /dev/sdc >/boot/smart.txt.
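
Since the drive letter can change between boots, one way to avoid guessing is to look the device up by its serial number before calling smartctl. What follows is only a rough sketch: `find_dev_by_serial` is a made-up helper name, not an unRAID tool, and it assumes the system populates `/dev/disk/by-id` with links whose names end in the drive serial, as udev-based Linux systems usually do.

```shell
# Hypothetical helper (not part of unRAID): print the device node whose
# /dev/disk/by-id link name ends with the given serial suffix.
# Partition links end in "-partN", so they do not match the pattern.
find_dev_by_serial() {
    suffix="$1"
    dir="${2:-/dev/disk/by-id}"   # second argument exists only to ease testing
    for link in "$dir"/*"$suffix"; do
        [ -e "$link" ] || continue   # the glob matched nothing
        readlink -f "$link"          # resolve the link to e.g. /dev/sdc
        return 0
    done
    return 1
}

# Example use (commented out; substitute your own serial suffix):
# dev=$(find_dev_by_serial 2186) && smartctl -a "$dev" > /boot/smart.txt
```

If the helper prints nothing, listing the directory with `ls -l /dev/disk/by-id` and reading the link targets by eye gives the same information.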

 

The correct way to obtain the syslog now is to click the Download button at the bottom of the syslog page on the Tools tab.  That will download syslog.zip containing a complete syslog.  Attach the zip file here.  I believe there are improvements coming in the next release, and the button will move near the top.  Plus, I'm hoping there will be a button to download the full SMART report for each drive too.

 

The 3 syslogs were just pieces of the syslog, about 1400 lines each but missing the initial setup of the drives and missing the very first error messages, which are usually the most important.
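
Because the earliest error lines matter most, it can help to pull just those out of the full syslog before reading the rest. A minimal sketch, assuming the full log is a plain text file (on unRAID that is normally /var/log/syslog) and that the lines of interest contain words like "error", "exception" or "fail"; `first_disk_errors` is a hypothetical name chosen for this example.

```shell
# Hypothetical helper: print the first N lines of a log file that look like
# errors and mention the given device string, prefixed with line numbers.
first_disk_errors() {
    log="$1"
    dev="$2"
    count="${3:-10}"   # default to the first 10 matches
    grep -n -i -E 'error|exception|fail' "$log" | grep -F "$dev" | head -n "$count"
}

# Example use (commented out):
# first_disk_errors /var/log/syslog sdc 20
```

ATA-level messages are usually tagged with the port (something like ata3) rather than the drive letter, so it may be worth running it a second time with that string as the device argument.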

 

It does appear that Disk 4 (sdc, 2GB WD with serial ending in 2186) is in serious trouble, with both media errors (bad magnetic surface issues) and AMNF error flags.  The AMNF error is rather unusual.  It means Address Mark Not Found and isn't used any more (or so I thought), as it doesn't even appear in the ATA errors wiki page any more.  It's serious, similar to the IDNF error flag, and means a seek cannot succeed so a sector cannot be located.  It may indicate head and/or surface damage.  I suspect the drive is making strange noises?  It would not surprise me to find that the SMART report indicates the drive is already considered FAILED.
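
For a quick look at whether the drive already considers itself failed, smartctl's `-H` option prints just the overall health verdict. The snippet below is a sketch rather than a definitive check: `smart_verdict` is a hypothetical helper, and it assumes the ATA-style output line "SMART overall-health self-assessment test result: ..."; other drive types word that line differently.

```shell
# Hypothetical helper: read smartctl output on stdin and print only the
# overall health verdict (for example PASSED), if the ATA-style line is present.
smart_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Example use (commented out; sdc is the device discussed above):
# smartctl -H /dev/sdc | smart_verdict
```

A passing verdict does not clear the drive on its own, so the full attribute report from `smartctl -a` is still worth posting.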

Link to comment

Archived

This topic is now archived and is closed to further replies.
