In Need of an unRAID ninja [SOLVED]


Hi Everyone,

 

I lost power yesterday. Upon booting my unRAID server back up, I'm having some peculiar troubles, and I'm not quite sure how to proceed. I am able to ssh into the box and to connect over unMENU. unMENU shows none of my drives in the array; it lists all but one in the unprotected area and does not see the remaining drive at all. During boot, I noticed the following error messages:

 

Aug 28 09:00:00 fridge kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Aug 28 09:00:00 fridge kernel: ata6.00: link online but device misclassifed (Drive related)

Aug 28 09:00:00 fridge kernel: ata6: link online but 1 devices misclassified, retrying (Drive related)

Aug 28 09:00:00 fridge kernel: ata6: reset failed (errno=-11), retrying in 5 secs (Minor Issues)

Aug 28 09:00:00 fridge kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Aug 28 09:00:00 fridge kernel: ata6.00: link online but device misclassifed (Drive related)

Aug 28 09:00:00 fridge kernel: ata6: link online but 1 devices misclassified, retrying (Drive related)

Aug 28 09:00:00 fridge kernel: ata6: reset failed (errno=-11), retrying in 5 secs (Minor Issues)

Aug 28 09:00:00 fridge kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Aug 28 09:00:00 fridge kernel: ata6.00: link online but device misclassifed (Drive related)

Aug 28 09:00:00 fridge kernel: ata6: link online but 1 devices misclassified, retrying (Drive related)

Aug 28 09:00:00 fridge kernel: ata6: reset failed (errno=-11), retrying in 30 secs (Minor Issues)

Aug 28 09:00:00 fridge kernel: ata6: limiting SATA link speed to 1.5 Gbps (Drive related)

 

Which correlates to the drive that is not showing up in unMENU at all.
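
A minimal sketch for tallying those reset failures per port, assuming the syslog has been saved to a file. The function name count_ata_reset_failures is made up for illustration and is not an unRAID or unMENU tool:

```shell
# Count "reset failed" kernel messages per ATA port in a saved syslog.
# count_ata_reset_failures is a hypothetical helper, not part of unRAID.
count_ata_reset_failures() {
    # $1: path to a saved syslog file
    awk '/kernel: ata[0-9]+: reset failed/ {
        # Find the "ataN:" field and strip its trailing colon
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+:$/) {
                port = $i
                sub(/:$/, "", port)
                n[port]++
                break
            }
    }
    END { for (p in n) print p, n[p] }' "$1" | sort
}
```

Run against the attached syslog, it prints one line per port with its failure count, which makes it easier to see whether a single port (here ata6) accounts for all of the retries.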

Now, the interesting portion is that I cannot access the normal unRAID web interface. If I manually type the IP address in the browser, it simply times out. If I select, within unMENU, 'unRAID main', I get a connection refused. Attempting to load 'mymain' from within unMENU seems to crash it, disabling my web access until I relaunch it from the command line.

 

How should I best proceed here to minimize data loss and triage the issues accessing unRAID from the browser?

 

I've attached a syslog. Thank you in advance.

syslog-2011-08-28.txt

Link to comment

Power down. Power up cleanly. Let the array come back online. It might take 20 minutes or more until all the disks replay their journals and are mounted and accessible.

 

Just don't do anything rash or stupid, such as reformatting the disks, re-cabling the server, or removing or adding disks.

 

Joe L.

Link to comment

Thanks for the reply Joe L.

 

This is perhaps a dumb question, but do I need to do anything 'special' to get the unit to cleanly power down? I was not able to use the power down script from within unMENU. I sshed into the box, and initiated the powerdown command, receiving the following output:

 

 

root@fridge:~# powerdown

Capturing information to syslog. Please wait...

version[12274]: Linux version 2.6.32.9-unRAID (root@Develop) (gcc version 4.2.3) #5 SMP Wed Jun 16 20:45:26 MDT 2010

ls: cannot access /dev/hd[a-z]: No such file or directory

ls: cannot access /dev/hd[a-z]: No such file or directory

/etc/rc.d/rc.unRAID: line 84: ${FILE}: ambiguous redirect

/etc/rc.d/rc.unRAID: line 84: ${FILE}: ambiguous redirect

mdcmd; /proc/mdcmd does not exist

status[12375]: State:

status[12375]: D#          Model / Serial          Status        Device   

status[12375]: 0                  /                                         

status[12375]: SMART overall health assessment

ls: cannot access /dev/hd[a-z]: No such file or directory

status[12375]: /dev/sda: SMART overall-health self-assessment test result: PASSED

status[12375]: /dev/sdb: SMART overall-health self-assessment test result: PASSED

status[12375]: /dev/sdc: SMART overall-health self-assessment test result: PASSED

status[12375]: /dev/sdd: SMART overall-health self-assessment test result: PASSED

status[12375]: /dev/sde: SMART Health Status: OK

status[12375]: No active PIDS on the array

Saving current syslog: /boot/logs/syslog-20110828-113510.txt

-rwxrwxrwx 1 root root 151749 Aug 28 11:35 /boot/logs/syslog-20110828-113510.txt

zip not installed. Consider installing to automatically zip current syslog

 

Broadcast message from root (pts/0) (Sun Aug 28 11:35:10 2011):

 

The system is going down for system halt NOW!

root@fridge:~# Connection to fridge closed by remote host.

Connection to fridge closed.

 

I can no longer ssh into the box, however, and it still appears to be on. I hooked it up to an external display, and in addition to the output displayed during the ssh session, I see the following:

 

Stopping system message bus . . .

Unmounting remote filesystems.

Saving random seed from /dev/urandom in /etc/random-seed

turning off swap.

Unmounting local file systems.

umount: /boot: device is busy

usbfs umounted

fusectl umounted

Remounted root filesystem read-only

mount: can't find / in /etc/fstab or /etc/mtab

 

I hooked up a keyboard to the box (USB) and it doesn't seem to be accepting any commands and is simply stuck like that.

Link to comment

I've powered the server back on, and the results look much the same as in the previous instance. I'll leave it alone without touching anything for a bit, but my disk activity lights are not on (though the power lights are). Is there a way to see if it is doing anything? One of my disks is still not seen on the main page at all (the one attached to ATA6 that the syslog was having trouble with).
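
One way to check for activity, sketched under the assumption that /proc/diskstats is available on this kernel: take two snapshots a few seconds apart and compare the completed read and write counts per device. The helper name busy_disks and the /tmp paths are hypothetical:

```shell
# Print devices whose completed read/write counts differ between two
# snapshots of /proc/diskstats. busy_disks is a hypothetical helper name.
busy_disks() {
    # $1: earlier snapshot file, $2: later snapshot file
    # In /proc/diskstats, field 3 is the device name, field 4 is reads
    # completed, and field 8 is writes completed.
    awk 'NR == FNR { before[$3] = $4 + $8; next }
         ($3 in before) && ($4 + $8 != before[$3]) { print $3 }' "$1" "$2"
}

# Example usage (paths are placeholders):
#   cat /proc/diskstats > /tmp/before; sleep 10
#   cat /proc/diskstats > /tmp/after
#   busy_disks /tmp/before /tmp/after
```

No output means none of the listed devices completed any reads or writes during the interval.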

 

I've attached a screen shot of my unMENU main page, and the new syslog.

 

Thank you for all of your assistance.

fridge.pdf

syslog-2011-08-28-1.txt

Link to comment

Update: After letting it sit for about an hour and a half I'm still stuck in the same boat.

 

I can't access the server through the normal interface (IP address, or through unMenu's 'unRAID main'). Is there a way to forcibly start/ensure the web service is started?

 

For what it's worth, I can see the server in my sidebar (running OS 10.7), but can't connect to it there, either (it just says connection failed, never gives me the opportunity to mount even the flash disk).

Link to comment

I'm stumped.  It looks like emhttp started, but nothing else happened.

 

Type

killall emhttp

nohup emhttp &

 

to kill it if it is running and re-start it.

Link to comment

Hi there,

 

I did the commands as you recommended.

 

root@fridge:~# killall emhttp

root@fridge:~# nohup emhttp &

[3] 15692

root@fridge:~# nohup: ignoring input and appending output to `nohup.out'

 

Now I have a nohup.out file in the directory as well.

 

root@fridge:~# ls -la

total 16

drwx--x---  2 root root    0 Aug 28 16:30 ./

drwxr-xr-x 16 root root    0 Aug 28 12:00 ../

-rw-------  1 root root  41 Aug 28 13:42 .bash_history

-rw-r--r--  1 root root 1968 Aug 28 15:47 dead.letter

lrwxrwxrwx  1 root root  26 Jul  3  2010 initconfig -> /usr/local/sbin/initconfig*

-rwxr-xr-x  1 root root  80 Jul  3  2010 mdcmd*

-rw-------  1 root root    0 Aug 28 16:30 nohup.out

lrwxrwxrwx  1 root root  25 Jul  3  2010 powerdown -> /usr/local/sbin/powerdown

lrwxrwxrwx  1 root root  18 Jul  3  2010 samba -> /etc/rc.d/rc.samba*

 

But I still cannot connect. Looking at the relevant system log:

 

Aug 28 16:32:51 fridge emhttp: unRAID System Management Utility version 4.5.6 (Lime Tech)

Aug 28 16:32:51 fridge emhttp: Copyright © 2005-2010, Lime Technology, LLC (Lime Tech)

Aug 28 16:32:51 fridge emhttp: Pro key detected, GUID: [mykey] (Other emhttp)

Aug 28 16:32:51 fridge emhttp: shcmd (1): udevadm settle (Other emhttp)

Aug 28 16:32:51 fridge emhttp: Device inventory: (Drive related)

Aug 28 16:32:51 fridge emhttp: pci-0000:00:11.0-scsi-0:0:0:0 host1 (sda) WDC_WD20EARS-00S8B1_WD-WCAVY2685618 (Drive related)

Aug 28 16:32:51 fridge emhttp: pci-0000:00:11.0-scsi-2:0:0:0 host3 (sdb) WDC_WD20EARS-00S8B1_WD-WCAVY2733980 (Drive related)

Aug 28 16:32:51 fridge emhttp: pci-0000:00:11.0-scsi-3:0:0:0 host4 (sdc) WDC_WD5000AAKS-00A7B2_WD-WCASY8936660 (Drive related)

Aug 28 16:32:51 fridge emhttp: pci-0000:00:11.0-scsi-4:0:0:0 host5 (sdd) ST31000528AS_9VP2PK7X (Drive related)

Aug 28 16:32:51 fridge emhttp: restart_md_driver: stat pci-0000:00:11.0-scsi-5:0:0:0: No such file or directory (System)

Aug 28 16:32:51 fridge emhttp: shcmd (2): modprobe -rw md-mod 2>&1 | logger (Other emhttp)

 

Is the restart_md_driver "No such file or directory" line causing the problem? I believe that's my drive that isn't being seen. Will it hurt anything if I shut down the server, unsled the drive, and reboot?
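
A rough sketch for pulling out the device path that emhttp could not stat, so it can be compared against the "Device inventory" lines. It assumes the log format shown above and a syslog saved to a file; missing_device_paths is a made-up name:

```shell
# List device paths from emhttp "restart_md_driver: stat ... No such file"
# messages in a saved syslog. missing_device_paths is a hypothetical helper.
missing_device_paths() {
    # $1: path to a saved syslog file
    awk '/restart_md_driver: stat / && /No such file or directory/ {
        for (i = 1; i < NF; i++)
            if ($i == "stat") {
                path = $(i + 1)
                sub(/:$/, "", path)   # drop the colon that follows the path
                print path
                break
            }
    }' "$1"
}
```

For the excerpt above it would print pci-0000:00:11.0-scsi-5:0:0:0, a path that does not appear among the four inventoried drives.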

Link to comment

Removing the not properly found drive from the bay allows the server to boot properly. In this instance I can go into the server through the browser interface, and, as expected it shows the pulled drive as missing.

 

Running the "powerdown" command now properly powers down and shuts off the server.

 

I had an extra disk of the same size lying around, so I reformatted that, and replaced the wonky disk. The server booted right up, and recognized that I was upgrading the disk. Hopefully it is able to successfully rebuild.

 

As an aside, I solved another issue that was plaguing me months ago: My server had been very slow to boot (taking 15-20 minutes). After a few days of messing around in the BIOS I was unable to find the issue. At some point I believed it had simply fixed itself (any restarts of my server were very fast). It turns out if I have a USB keyboard plugged in, it takes 15-20 minutes to boot. If I remove it, it boots fine!

 

I'll post again when I have *hopefully* good news. Joe - thanks again for all of your help.

Link to comment

I was able to rebuild to my new disk. Everything seems to be working OK. I noticed that the rebuilt disk had 0 space available, and I kept getting alerts about duplicate files. I cleared up some space on it and reset the minimum space-required. My iTunes share was on that disk and seems to be corrupt, but that I can live with as the media is all intact.

 

And yes, a UPS will be purchased in the next week ;)

 

Thank you again for all the help and support.

Link to comment

Archived

This topic is now archived and is closed to further replies.
