May 5, 2010

Woke up this morning to find that my machine was still on (I use the hibernate script to put it to sleep after no activity). When I went to check the syslog, it was full (literally, the entire syslog) of this:

    May 5 08:34:29 Tower kernel: md: disk1: ATA_OP_STANDBYNOW1 ioctl error: -5
    May 5 08:34:29 Tower kernel: mdcmd (28657): spindown 2
    May 5 08:34:29 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
    May 5 08:34:39 Tower kernel: mdcmd (28659): spindown 1

Looks like unmenu died, but I was still able to access the normal web interface and log in through telnet. I tried stopping the array, but the unmounts were being uncooperative. I think I finally managed to get them all unmounted, but now stopping is giving me trouble. The web interface is stuck on 'Stopping...' (and it looks like all disks have spun down). If I try to run:

    mdcmd stop

I get:

    cmdOper=stop cmdResult=failed

How should I proceed?

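When the syslog is flooded like this, a per-disk tally can show which drives the errors cluster on. A minimal sketch, assuming syslog lines shaped like the ones quoted in this post; `ioctl_errors` is a made-up helper name, not an unRAID command. For reference, the -5 in these messages is the kernel returning EIO (I/O error).

```shell
# Count "ioctl error" lines per disk from syslog text on stdin.
# Assumes the disk appears as its own field, e.g. "disk1:".
ioctl_errors() {
    awk '
        /ioctl error/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^disk[0-9]+:$/) {
                    name = $i
                    sub(/:$/, "", name)   # "disk1:" -> "disk1"
                    count[name]++
                }
            }
        }
        END { for (name in count) print name, count[name] }
    ' | sort
}
```

Usage would be something like `ioctl_errors < /var/log/syslog`, where the log path is an assumption about where your system keeps its syslog.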
May 5, 2010

Type

    mount

Odds are one of your disks is still mounted; otherwise the "mdcmd stop" would have succeeded.

Joe L.

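To pick only the array disks out of a long mount listing, something like the following should work. It is a sketch that assumes the array devices show up as /dev/mdN in the first column of the `mount` output; `md_mounts` is a hypothetical helper name.

```shell
# Print the mount points of any /dev/mdN devices found in
# `mount`-style output on stdin ("DEVICE on MOUNTPOINT type ...").
md_mounts() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ && $2 == "on" { print $3 }'
}
```

Run it as `mount | md_mounts`. No output means no array disk is mounted, so a failing "mdcmd stop" would then have some other cause.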
May 5, 2010 (Author)

As usual, Joe, you are correct:

    root@Tower:~# mount
    fusectl on /sys/fs/fuse/connections type fusectl (rw)
    usbfs on /proc/bus/usb type usbfs (rw)
    /dev/sdg1 on /boot type vfat (rw,noatime,nodiratime,umask=0,shortname=mixed)
    /dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
    /dev/md5 on /mnt/disk5 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
    /dev/md4 on /mnt/disk4 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
    nfsd on /proc/fs/nfs type nfsd (rw)

I went through and tried to unmount the drives (which I'd tried before; each call to umount hung) and it appeared to work. I now get this from mount:

    root@Tower:~# mount
    fusectl on /sys/fs/fuse/connections type fusectl (rw)
    usbfs on /proc/bus/usb type usbfs (rw)
    /dev/sdg1 on /boot type vfat (rw,noatime,nodiratime,umask=0,shortname=mixed)
    nfsd on /proc/fs/nfs type nfsd (rw)

But still no luck on stop:

    root@Tower:~# mdcmd stop
    cmdOper=stop cmdResult=failed

Syslog is still continuously printing the complaints about disk 1 and 2 I mentioned in the OP.

May 5, 2010

> As usual, Joe, you are correct:
>     [...]
>     /dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
>     /dev/md5 on /mnt/disk5 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
>     /dev/md4 on /mnt/disk4 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
>     nfsd on /proc/fs/nfs type nfsd (rw)

You probably need to stop the nfs program before you can unmount those drives.

> I went through and tried to unmount the drives [...] But still no luck on stop:
>     root@Tower:~# mdcmd stop
>     cmdOper=stop cmdResult=failed
> Syslog is still continuously printing the complaints about disk 1 and 2 I mentioned in the OP.

You can try to tell the array to spin up disks 1 and 2. Perhaps the messages to the error log will stop.

    mdcmd spinup 1
    mdcmd spinup 2

See if the messages stop. Then, shut down the nfsd daemon process.

    /etc/rc.d/rc.nfsd stop

Then try to stop the array once more.

    mdcmd stop

Joe L.

May 5, 2010 (Author)

Hey Joe, thanks for the help. I spun up both of the disks, and that seemed to work alright; the messages seemed to stop. I then stopped the nfs daemon and that seemed alright too, but I still got a failure when trying to stop the array. It complains that there are still devices in use. Here's the end of the syslog, starting when I spun up the drives:

    May 5 11:57:10 Tower kernel: mdcmd (32308): spinup 1
    May 5 11:57:10 Tower kernel:
    May 5 11:57:10 Tower kernel: md: disk1: ATA_OP_SETIDLE1 ioctl error: -5
    May 5 11:57:16 Tower kernel: mdcmd (32310): spindown 2
    May 5 11:57:16 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
    May 5 11:57:17 Tower kernel: mdcmd (32311): spinup 2
    May 5 11:57:17 Tower kernel:
    May 5 11:57:17 Tower kernel: md: disk2: ATA_OP_SETIDLE1 ioctl error: -5
    May 5 11:58:09 Tower kernel: mdcmd (32317): stop
    May 5 11:58:09 Tower kernel: md: 10 devices still in use.

May 5, 2010

> I then stopped the nfs daemon and that seemed alright too, but still got a failure when trying to stop the array. It complains that there are still devices in use. [...]
>     May 5 11:58:09 Tower kernel: mdcmd (32317): stop
>     May 5 11:58:09 Tower kernel: md: 10 devices still in use.

You've got me stumped... unless it is the "spinup" processes it creates that are keeping it busy. You can try a

    fuser -k /dev/md1
    fuser -k /dev/md2
    etc.

That might kill the process IDs keeping the disks busy. Or, try this:

    killall emhttp
    nohup /usr/local/sbin/emhttp &

That will kill emhttp if it was still running and then start it again. After that you might be able to use the management web page at http://tower to stop the array.

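The "etc." in the fuser suggestion can be spelled out with a small loop. This sketch only prints the commands so they can be reviewed before anything is killed; `fuser_cmds` is a hypothetical helper name, and the disk count passed to it is whatever your own array has.

```shell
# Print (do not run) one "fuser -k" command per array device,
# /dev/md1 through /dev/mdN, where N is the first argument.
fuser_cmds() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "fuser -k /dev/md$i"
        i=$((i + 1))
    done
}
```

For example, `fuser_cmds 5` lists the five commands, and `fuser_cmds 5 | sh` would actually run them. Keep in mind that `fuser -k` sends SIGKILL by default, so anything still writing to those devices is stopped abruptly.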
May 6, 2010 (Author)

Hmmm, the fuser calls didn't report anything. But after stopping and restarting emhttp, I see this in syslog, which makes me a bit nervous:

    May 5 16:22:31 Tower emhttp: Device inventory:
    May 5 16:22:31 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host3 (sdc) ST31500341AS_9VS10W5D
    May 5 16:22:31 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host4 (sdd) ST31500341AS_9VS22PKG
    May 5 16:22:31 Tower emhttp: pci-0000:00:1f.5-scsi-0:0:0:0 host5 (sde) ST32000542AS_6XW067PY
    May 5 16:22:31 Tower emhttp: pci-0000:00:1f.5-scsi-1:0:0:0 host6 (sdf) ST31500341AS_9VS236TQ
    May 5 16:22:31 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) no id
    May 5 16:22:31 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) no id

Guess that's not a big surprise given what I'm seeing... hopefully the disks will show back up. What am I looking at here in terms of the best way to go? Reboot and a prayer? Thanks again for the help, Joe.

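On a box with many drives, the "no id" entries are easy to miss in the inventory dump. A minimal sketch for pulling them out, assuming the emhttp lines end in "(sdX) no id" exactly as in the log above; `missing_ids` is a made-up helper name.

```shell
# Print the device names (e.g. "sda") from emhttp "Device inventory"
# lines on stdin that end in "no id".
missing_ids() {
    awk '/emhttp:/ && /no id *$/ {
        dev = $(NF - 2)           # third field from the end, e.g. "(sda)"
        gsub(/[()]/, "", dev)     # strip the parentheses
        print dev
    }'
}
```

Something like `missing_ids < /var/log/syslog` would run it against the live log (the path is an assumption). An empty result means every listed device reported an ID.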
May 6, 2010

> [...] after stopping and re-start emhttp, I see this in syslog which makes me a bit nervous:
>     [...]
>     May 5 16:22:31 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) no id
>     May 5 16:22:31 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) no id
> Guess not a big surprise given what I'm seeing...hopefully the disks will show back up. What am I looking at here...

What you're looking at here may be a bug related to the version of udev that's in 4.5.3. I've seen other people reporting this version of unRAID failing to see disk IDs. (Downgrading to 4.5.1 shows the disk IDs.) Email Limetech and provide them with the details. [email protected]

> [...] these changes were made:
> - update linux kernel to 2.6.32.9
> - update linux udev to 1.41

Limetech probably had their good reasons for choosing version 141; we don't know. Version 141 of udev was released on April 8, 2009, and there have been twelve releases since then. On March 3, 2010 (the unRAID 4.5.3 release), the stable version of udev was 151 (released January 27, 2010). Currently, the stable version of udev is 153 (released April 21, 2010).

May 6, 2010

Slackware 13 used udev 141. The reason udev was upgraded at all was to resolve the USB GUID issue(s) people were having. Slackware-Current didn't upgrade to 151 until slightly after unRAID 4.5.3 was released, and they just upgraded to 153 within the past week or two.

May 6, 2010

I see. Well, in any case, the udev version is the only major change I can think of that may be causing the missing disk ID issue. After going back to 4.5.1, all disk IDs show normally.

May 6, 2010

Anyone feel 'brave'/'stupid' enough to try updating to udev153 on their in-memory/RAM filesystem unRAID system using something like the following?

    [stop array]
    [stop emhttp]
    wget http://slackware.osuosl.org/slackware-current/slackware/a/udev-153-i486-1.txz
    upgradepkg udev-153-i486-1.txz
    /etc/rc.d/rc.udev force-restart
    [start emhttp]

Then you can examine the syslog device inventory dump to see if the drives are identified.

May 6, 2010

I should add, I do have unRAID working on my full Slackware-Current system that uses udev153. However, I did have to copy a recently obsoleted lib or bin file from the 4.5.3 layout to completely pacify something, but I can't recall what it was right now or if it was even related to udev at all.

May 6, 2010 (Author)

> What you're looking here may be a bug related to the version of udev that's in 4.5.3. I've seen other people reporting this version of unRAID failing to see disk IDs. (downgrade to 4.5.1 shows the disk IDs) [...]

Purko, would the symptoms of this be that the disks are sometimes not recognized, or never? I've had 4.5.3 running for a while now without seeing this problem.

May 7, 2010

> Anyone feel 'brave'/'stupid' enough trying to update to udev153 on their in-memory/ram filesystem unRAID system using something like the following? [...]

Since it was me who brought up udev, it was only right that I be 'brave'/'stupid' and try it.

1. Got the needed packages:

    mkdir -p /boot/packages
    cd /boot/packages
    wget 'ftp://slackware.osuosl.org/pub/slackware/slackware-current/slackware/a/udev-*.t?z'
    wget 'ftp://slackware.osuosl.org/pub/slackware/slackware-current/slackware/a/glibc-solibs-*.t?z'

2. Stopped the array.

3. Upgraded the stuff and then started emhttp again:

    killall emhttp
    installpkg /boot/packages/glibc-solibs-*.t?z
    installpkg /boot/packages/udev-*.t?z
    /etc/rc.d/rc.udev force-restart
    /usr/local/sbin/emhttp & disown
    udevd --version

4. Pointed my web browser to the unRAID management interface and verified that everything is working normally. Stopped/started the array a few times, copied some stuff to/from the array, etc. Examined my syslog and found nothing bad there.

Of course I can't exactly replicate fortytwo's situation, since all my disks in the 'Device inventory' have their IDs. It will be interesting to see what fortytwo's 'Device inventory' will be if he follows steps one through four above.

May 7, 2010 (Author)

Alright, well... I was confident enough that the disks weren't bad that I went ahead and rebooted, and it looks good now. It's doing a parity check as we speak.

Purko, this is the first I've seen this issue... is the problem with udev that some disks would never be recognized? Or randomly not recognized? Like I said, I hadn't seen this before. If it happens again, I'd probably be willing to give the udev update a try...

May 7, 2010

> This is the first I've seen this issue...

Can you please look in your syslog now and see what the "Device inventory" looks like? Do they all have IDs?

May 7, 2010 (Author)

> Can you please look in your syslog now and see what the "Device inventory" looks like? Do they all have IDs?

Yep:

    May 6 17:40:15 Tower emhttp: Device inventory:
    May 6 17:40:15 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host3 (sdc) ST31500341AS_9VS10W5D
    May 6 17:40:15 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host4 (sdd) ST31500341AS_9VS22PKG
    May 6 17:40:15 Tower emhttp: pci-0000:00:1f.5-scsi-0:0:0:0 host5 (sde) ST32000542AS_6XW067PY
    May 6 17:40:15 Tower emhttp: pci-0000:00:1f.5-scsi-1:0:0:0 host6 (sdf) ST31500341AS_9VS236TQ
    May 6 17:40:15 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) ST31500341AS_9VS048Q7
    May 6 17:40:15 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) ST3750640AS_5QD334MF

May 7, 2010

This is very, very strange! Have you run memtest lately? Select memtest from the boot menu and let it run overnight.

May 7, 2010 (Author)

> This is very, very strange! Have you run memtest lately? Select memtest from the boot menu and let it run overnight.

I haven't... I had two sticks of memory which I've had for a while. One went bad a little over 6 months ago, so I wouldn't be surprised if this one was starting to fail. Good idea!