October 2, 2010
For some time now (possibly from the time I installed the Plus key file) my system has frequently failed to complete the boot. With frequent power failures (three in the last 24 hours) this is rather inconvenient. I normally run the system headless, but connected a monitor in order to investigate. What I discover is that my USB flash drive is being recognised as the kernel/drivers load, and the boot completes to the login prompt. However, the USB drive doesn't get mounted properly. I don't see any mount errors in the log, but 'ls /boot' and 'ls /mnt' both return empty results. This is different to the problems I had when I first built the system (fixed by a BIOS update) where the flash drive wasn't being recognised at all. I have tried the USB flash drive in several USB sockets and the problems persist. Can anyone suggest what might be going wrong here, and how to fix it? Just now it took four resets before I achieved a good boot.
October 4, 2010 Author
Has no one any suggestions as to how I should go about investigating this issue? Is there any way I can capture the log when unRAID hasn't mounted any storage devices? Copying it out by hand from the display could take a long time! Is there any particular area of the log which might yield pertinent information?
October 4, 2010
Try removing the USB stick from the server and running a check disk on the USB drive to see if something may have happened because of the power outages. Look into getting an APC battery backup unit to help protect against power outages. I have one hooked up to my server and it has saved my butt a couple of times.
October 4, 2010 Author
> Try removing the USB stick from the server and running a check disk on the USB drive to see if something may have happened because of the power outages.

Thank you very much for the reply, I will do that, but the fact that the problem is intermittent leads me to suspect that there's nothing physically wrong with the drive. It seems more like a marginal timing issue, and examination of the log might be more likely to identify the cause.

> Look into getting an APC battery backup unit to help protect against power outages. I have one hooked up to my server and it has saved my butt a couple of times.

Yes, I already have an APC, and it eliminates the need for repeated parity checks. We've had two power outages today, both of which required one press of the reset button to get unRAID started.
October 4, 2010
I had the same issue a while back with my USB stick. I formatted the drive and rebuilt it from scratch and it worked fine. I'm guessing something was corrupted or misplaced. Not sure how, but since the rebuild it's been working flawlessly for months. Luckily, when I first had the issue it was right after I first built my machine, so I expected some kind of problem right out of the gate. All happy now. If that doesn't fix it, you could have a faulty USB stick too. Of course I'm just guessing here.
October 6, 2010 Author
I've performed a check disk (on a WinXP system), and it reports no errors on the USB drive. Closer examination of syslog doesn't really tell me a lot:

....
Oct 6 11:40:00 Tower kernel: sda: sda1
Oct 6 11:40:00 Tower kernel: sd 0:0:0:0: [sda] Attached SCSI disk
Oct 6 11:40:00 Tower kernel: sdd1
Oct 6 11:40:00 Tower kernel: sd 3:0:0:0: [sdd] Attached SCSI disk
Oct 6 11:40:00 Tower kernel: sdc1
Oct 6 11:40:00 Tower kernel: sd 2:0:0:0: [sdc] Attached SCSI disk
Oct 6 11:40:00 Tower kernel: sdb1
Oct 6 11:40:00 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Oct 6 11:40:00 Tower kernel: i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
Oct 6 11:40:00 Tower kernel: generic-usb 0003:051D:0002.0001: hiddev96,hidraw0: USB HID v1.10 Device [American Power Conversion Back-UPS CS 650 FW:817.v4.I USB FW:v4] on usb-0000:00:1a.0-1.1/input0
Oct 6 11:40:00 Tower kernel: usb 1-1.5: new high speed USB device using ehci_hcd and address 4
Oct 6 11:40:00 Tower kernel: usb 1-1.5: configuration #1 chosen from 1 choice
Oct 6 11:40:00 Tower kernel: scsi6 : SCSI emulation for USB Mass Storage devices
Oct 6 11:40:00 Tower kernel: usb-storage: device found at 4
Oct 6 11:40:00 Tower kernel: usb-storage: waiting for device to settle before scanning
....

At this point, additional information appears on screen but is not recorded in the syslog:

usbfs on /proc/bus/usb type usbfs (rw)
mounting non-root local filesystems:

....
Oct 6 11:40:00 Tower kernel: scsi 6:0:0:0: Direct-Access Kingston DataTravelerMini PMAP PQ: 0 ANSI: 0 CCS
Oct 6 11:40:00 Tower kernel: usb-storage: device scan complete
....

That last line sometimes appears after either of the next two lines - I guess that we have asynchronous processes here.

....
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] 2015232 512-byte logical blocks: (1.03 GB/984 MiB)
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Write Protect is off
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Mode Sense: 23 00 00 00
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Assuming drive cache: write through
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Assuming drive cache: write through
Oct 6 11:40:00 Tower kernel: sde: sde1
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Assuming drive cache: write through
Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Attached SCSI removable disk
....

This is where the syslog finishes on a failed boot, but on screen I see additional information:

mount: special device /dev/disk/by-label/UNRAID does not exist
INIT Entering run level: 3
Going multi user ...
Updating shared library links: /sbin/ldconfig
.... etc.

On a good boot, the syslog continues as follows:

Oct 6 11:40:00 Tower logger: /etc/rc.d/rc.inet1: /sbin/ifconfig lo 127.0.0.1
Oct 6 11:40:00 Tower logger: /etc/rc.d/rc.inet1: /sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
Oct 6 11:40:00 Tower ifplugd(eth0)[1363]: ifplugd 0.28 initializing.
Oct 6 11:40:00 Tower kernel: e1000e 0000:00:19.0: irq 28 for MSI/MSI-X
Oct 6 11:40:00 Tower ifplugd(eth0)[1363]: Using interface eth0/70:71:BC:28:F5:6D with driver (version: 1.0.2-k2)
Oct 6 11:40:00 Tower ifplugd(eth0)[1363]: Using detection mode: SIOCETHTOOL
Oct 6 11:40:00 Tower ifplugd(eth0)[1363]: Initialization complete, link beat not detected.
Oct 6 11:40:00 Tower kernel: e1000e 0000:00:19.0: irq 28 for MSI/MSI-X
Oct 6 11:40:00 Tower rpc.statd[1375]: Version 1.1.4 Starting
Oct 6 11:40:00 Tower emhttp: unRAID System Management Utility version 4.5.6
Oct 6 11:40:00 Tower emhttp: Copyright (C) 2005-2010, Lime Technology, LLC
Oct 6 11:40:00 Tower emhttp: Plus key detected, GUID: 0951-1605-0000-5B731200014D
Oct 6 11:40:00 Tower emhttp: shcmd (1): udevadm settle
Oct 6 11:40:00 Tower emhttp: Device inventory:
....

Clearly, the problem is the 'mount: special device /dev/disk/by-label/UNRAID does not exist' error, but I cannot see why this should occur when the next attempt to boot can find the UNRAID device. It seems obvious that the problem is not related to any of the file content on the USB stick, because we don't even get as far as mounting. I guess that my next step will be to back up the content of the USB stick and reformat it.
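For anyone comparing saved logs from several boots, the difference described above (the flash drive attaches, but the log never reaches the emhttp lines) can be checked mechanically. The following is only a rough sketch in Python, assuming the syslog has been copied off as plain text; the function names `removable_disks` and `reached_emhttp` are made up for illustration and are not part of unRAID.

```python
import re

# Matches kernel lines such as:
#   "... kernel: sd 6:0:0:0: [sde] Attached SCSI removable disk"
ATTACHED = re.compile(r"\[(sd[a-z]+)\] Attached SCSI removable disk")


def removable_disks(lines):
    """Return device names (e.g. 'sde') that the kernel attached as removable disks."""
    found = []
    for line in lines:
        match = ATTACHED.search(line)
        if match:
            found.append(match.group(1))
    return found


def reached_emhttp(lines):
    """Return True if any syslog line was written by emhttp (seen only on a good boot)."""
    return any(" emhttp: " in line for line in lines)


if __name__ == "__main__":
    failed_boot = [
        "Oct 6 11:40:00 Tower kernel: sde: sde1",
        "Oct 6 11:40:00 Tower kernel: sd 6:0:0:0: [sde] Attached SCSI removable disk",
    ]
    print(removable_disks(failed_boot), reached_emhttp(failed_boot))
```

A log where `removable_disks` finds the flash drive but `reached_emhttp` is false matches the failed-boot pattern in this post: the device was seen by the kernel, yet startup never got as far as emhttp.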
October 6, 2010
It is most likely a timing issue, with the UNRAID volume label not yet detected when it is first looked for. A "slow" flash drive might be the cause. I really do not know.
Joe L.
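To make the timing theory concrete: if the label's device node appears a moment after the mount is attempted, a wait-and-retry check before mounting would cover the gap. Below is a minimal sketch of that idea in Python. It is not what unRAID's startup scripts actually do; `wait_for_path`, its parameters, and the timeout values are hypothetical, and only the path /dev/disk/by-label/UNRAID comes from the error message in this thread.

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.5,
                  exists=os.path.exists, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll until `path` exists or `timeout` seconds pass.

    Returns True if the path appeared, False if the wait timed out.
    `exists`, `sleep` and `clock` are injectable so the loop can be
    exercised without real devices or real delays.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: wait up to 30 s for the labelled flash drive.
    label_path = "/dev/disk/by-label/UNRAID"
    if wait_for_path(label_path):
        print("device node present, safe to attempt the mount")
    else:
        print("gave up waiting for", label_path)
```

The point of the sketch is only that a fixed, one-shot lookup fails whenever the device settles late, while a bounded polling loop tolerates the variation from boot to boot.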
October 6, 2010 Author
Joe, thanks for the very speedy reply (as always ... do you do anything other than unRAID support???). Yes, I had already suspected that this is a timing (or temperature?) issue. However, I have to say that this is one of my faster USB sticks - I have another (SanDisk Cruzer Blade) which takes about twice as long to load the images. The reason I decided to register the DataTraveler was because of its higher performance. This boot problem only manifested itself after adding the key file (and a cache disk), when the USB stick had already been in use on the current h/w configuration for six weeks, and for a few weeks prior to that on my previous unRAID configuration!
October 6, 2010 Author
Well, I've reformatted the drive, run syslinux (which didn't work from Ubuntu? - had to use Windows), and then added back all the files. The system booted cleanly from the first power-on ... I'll see what happens over the next few days. We haven't had a power cut since midnight (16 hours ago), so I'm sure we must be due for another one soon!
October 8, 2010 Author
The next power cut occurred last night and, when I turned the generator on, the system booted up properly.
October 12, 2010 Author
> The next power cut occurred last night and, when I turned the generator on, the system booted up properly.

Well, the system started after the next power cut, too. We've had other power cuts, but I've been around to start the generator before the 5 minute timeout I have set for the UPS-supported shutdown to occur. However, we had a power cut today while I was out and the system had failed to restart once again. Furthermore, when I did reboot, a parity check was initiated. This is the first unintentional parity check since I installed the APC UPS three months ago. Puzzled as to why this should be, I've tried to examine the syslog, only to find that there is no syslog stored for the period October 9 to October 12. What on earth is happening to my system???

======================================

Well, as I suspected, it appears that the power monitoring/shutdown had failed. Even now, some two to three hours after mains power was restored, the UPS is reporting only 72% battery charge, suggesting that the UPS, and the unRAID server, just kept running until the battery was exhausted. Obviously, the UPS is reporting 'No transfers since turnon', so I think that I have no way of knowing when the power failed, or why the system didn't do a controlled shutdown. It would appear that the power was off for some considerable time, because my router rebooted 2:52 ago. The router runs through a 680 Watt dumb UPS - it will normally stay up for at least 2 hours after the mains power fails.
October 15, 2010 Author
After this morning's power cut the system started up perfectly normally, but after this afternoon's power cut it had failed to restart. At least there was no parity check on either occasion.
October 15, 2010
So I gotta ask - why does your power keep going out? With the frequency you're experiencing outages, I'd be inclined to just leave the system off until it's resolved.
October 16, 2010 Author
> So I gotta ask - why does your power keep going out? With the frequency you're experiencing outages, I'd be inclined to just leave the system off until it's resolved.

Hmmm ... I wish! Unfortunately this is the norm for (this part of?) the Philippines. We rarely have a day when there is no power outage - the worst I have counted was seven in one day. Then, of course, there was the time when the whole of the city was without power for three days - simply due to a transmission system failure. I would estimate that power cuts in the UK averaged around one every three years (excluding the 'winter of discontent' - 1978-9). This is the reason I stand absolutely no chance of getting past Level 1 testing on my mobo.