Can't stop array

May 5, 2010

Woke up this morning to find that my machine was still on (I use the hibernate script to put it to sleep after no activity). When I went to check the syslog, it was full (literally, the entire syslog) of this:

May  5 08:34:29 Tower kernel: md: disk1: ATA_OP_STANDBYNOW1 ioctl error: -5
May  5 08:34:29 Tower kernel: mdcmd (28657): spindown 2
May  5 08:34:29 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
May  5 08:34:39 Tower kernel: mdcmd (28659): spindown 1

Looks like unmenu died, but I was still able to access the normal web interface and log in through telnet. I tried stopping the array, but the unmounts were being uncooperative. I think I finally managed to get them all unmounted, but now stopping is giving me trouble. The web interface is stuck on 'Stopping...' (and it looks like all disks have spun down). If I try to run:

mdcmd stop

I get:

cmdOper=stop
cmdResult=failed

How should I proceed?

May 5, 2010

Type

mount

Odds are one of your disks is still mounted, otherwise the "mdcmd stop" would have succeeded.

Joe L.
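
A minimal sketch of that check, assuming the array devices show up as /dev/mdN as they do in this thread. It filters a mount table down to the md devices that are still mounted. The function name list_md_mounts is made up for illustration, and reading /proc/mounts by default is an assumption (its first two columns are the device and the mount point).

```shell
# List array devices (/dev/mdN) that are still mounted, one "device mountpoint" per line.
# $1: mount table to read; defaults to /proc/mounts (assumed to be available).
list_md_mounts() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }' "${1:-/proc/mounts}"
}
```

No output would mean nothing from the array is mounted any more; any line printed names a device that could be keeping "mdcmd stop" from succeeding.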

May 5, 2010

Author

As usual, Joe, you are correct:

root@Tower:~# mount
fusectl on /sys/fs/fuse/connections type fusectl (rw)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sdg1 on /boot type vfat (rw,noatime,nodiratime,umask=0,shortname=mixed)
/dev/md3 on /mnt/disk3 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
/dev/md5 on /mnt/disk5 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
/dev/md4 on /mnt/disk4 type reiserfs (rw,noatime,nodiratime,noacl,nouser_xattr)
nfsd on /proc/fs/nfs type nfsd (rw)

I went through and tried to unmount the drives (which I'd tried before--each call to umount hung) and it appeared to work. I now get this from mount:

root@Tower:~# mount
fusectl on /sys/fs/fuse/connections type fusectl (rw)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sdg1 on /boot type vfat (rw,noatime,nodiratime,umask=0,shortname=mixed)
nfsd on /proc/fs/nfs type nfsd (rw)

But still no luck on stop:

root@Tower:~# mdcmd stop
cmdOper=stop
cmdResult=failed

Syslog is still continuously printing the complaints about disk 1 and 2 I mentioned in the OP.

May 5, 2010

You probably need to stop the nfs program before you can un-mount those drives.

You can try telling the array to spin up disks 1 and 2. Perhaps the messages to the error log will stop.

mdcmd spinup 1

mdcmd spinup 2

See if the messages stop.

Then, shut down the nfsd daemon process.

/etc/rc.d/rc.nfsd stop

Then try to stop the array once more.

mdcmd stop

Joe L.
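
The order of those steps can be sketched as a small function. This is only an illustration: the name stop_array_sequence and the RUN variable are hypothetical, and it assumes mdcmd is on the PATH as it is in the posts above. By default it only prints the commands; nothing is executed unless RUN is set to an empty string.

```shell
# Spin up the given disks, stop nfsd, then ask the md driver to stop the array.
# Arguments: disk numbers to spin up first (e.g. 1 2).
# RUN (hypothetical knob): command prefix; defaults to "echo" so this is a dry run.
stop_array_sequence() {
    run=${RUN-echo}
    for n in "$@"; do
        $run mdcmd spinup "$n"
    done
    $run /etc/rc.d/rc.nfsd stop
    $run mdcmd stop
}
```

Calling "stop_array_sequence 1 2" prints the four commands in order; "RUN= stop_array_sequence 1 2" would actually run them.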

May 5, 2010

Author

Hey Joe, thanks for the help.

I spun up both of the disks, and that seemed to work alright; the messages seemed to stop. I then stopped the nfs daemon and that seemed alright too, but still got a failure when trying to stop the array. It complains that there are still devices in use. Here's the end of the syslog, starting when I spun up the drives:

May  5 11:57:10 Tower kernel: mdcmd (32308): spinup 1
May  5 11:57:10 Tower kernel:
May  5 11:57:10 Tower kernel: md: disk1: ATA_OP_SETIDLE1 ioctl error: -5
May  5 11:57:16 Tower kernel: mdcmd (32310): spindown 2
May  5 11:57:16 Tower kernel: md: disk2: ATA_OP_STANDBYNOW1 ioctl error: -5
May  5 11:57:17 Tower kernel: mdcmd (32311): spinup 2
May  5 11:57:17 Tower kernel:
May  5 11:57:17 Tower kernel: md: disk2: ATA_OP_SETIDLE1 ioctl error: -5
May  5 11:58:09 Tower kernel: mdcmd (32317): stop
May  5 11:58:09 Tower kernel: md: 10 devices still in use.
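
To see at a glance which disks are throwing these ioctl errors, and for which operation, the syslog can be summarized with a short awk filter. This is a sketch that assumes the line layout shown above; the function name count_md_ioctl_errors is made up, and /var/log/syslog as the default path is an assumption.

```shell
# Summarize md ioctl errors in a syslog as "disk operation count", sorted.
# $1: log file to read; defaults to /var/log/syslog (assumed location).
count_md_ioctl_errors() {
    awk '/kernel: md: disk[0-9]+: .* ioctl error/ {
             disk = $7; sub(/:$/, "", disk)   # "disk1:" -> "disk1"
             seen[disk " " $8]++              # $8 is the ATA operation name
         }
         END { for (k in seen) print k, seen[k] }' "${1:-/var/log/syslog}" | LC_ALL=C sort
}
```

A disk that keeps failing both the spin-down and the spin-up operation is probably not answering commands at all, which would fit the array refusing to stop.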

May 5, 2010

You've got me stumped... unless it is the "spinup" processes it creates that are keeping it busy.

You can try:

fuser -k /dev/md1

fuser -k /dev/md2

etc

That might kill the process IDs keeping the disks busy.

Or, try this

killall emhttp

nohup /usr/local/sbin/emhttp &

That will kill emhttp if it was still running, then it will start it again, and then you might be able to use the management web-page at http://tower to stop the array.
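
The "etc" in the fuser suggestion above amounts to a loop over every md device. A rough sketch, with a hypothetical helper name for_each_md; passing the directory as the first argument is only there so the loop can be tried against a scratch directory before pointing it at /dev.

```shell
# Run a command once per md device node found in a directory.
# $1: directory holding the md nodes (normally /dev); remaining args: command to run.
for_each_md() {
    dir=$1; shift
    for dev in "$dir"/md[0-9]*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        "$@" "$dev"
    done
}
# for_each_md /dev fuser -v    # list any processes holding each device
# for_each_md /dev fuser -k    # kill them, as suggested above
```

Running the -v form first shows what would be killed before anything is actually killed.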

May 6, 2010

Author

Hmmm, the fuser calls didn't report anything. But after stopping and restarting emhttp, I see this in syslog, which makes me a bit nervous:

May  5 16:22:31 Tower emhttp: Device inventory:
May  5 16:22:31 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host3 (sdc) ST31500341AS_9VS10W5D
May  5 16:22:31 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host4 (sdd) ST31500341AS_9VS22PKG
May  5 16:22:31 Tower emhttp: pci-0000:00:1f.5-scsi-0:0:0:0 host5 (sde) ST32000542AS_6XW067PY
May  5 16:22:31 Tower emhttp: pci-0000:00:1f.5-scsi-1:0:0:0 host6 (sdf) ST31500341AS_9VS236TQ
May  5 16:22:31 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) no id
May  5 16:22:31 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) no id

Guess not a big surprise given what I'm seeing...hopefully the disks will show back up.

What am I looking at here in terms of the best way to go? Reboot and a prayer?

Thanks again for the help Joe.
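
The "no id" entries in a device inventory like the one above can be picked out with a short filter. This is a sketch that assumes the emhttp log format shown in this thread; the function name list_devices_without_id and the default path /var/log/syslog are assumptions.

```shell
# Print the device names (e.g. sda) that emhttp logged with "no id".
# $1: log file to read; defaults to /var/log/syslog (assumed location).
list_devices_without_id() {
    awk '/emhttp: pci-.*\) no id/ {
             dev = $8; gsub(/[()]/, "", dev)   # "(sda)" -> "sda"
             print dev
         }' "${1:-/var/log/syslog}"
}
```

Empty output would mean every device in the logged inventory reported an ID.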

May 6, 2010

What you're looking at here may be a bug related to the version of udev that's in 4.5.3.

I've seen other people reporting this version of unRAID failing to see disk IDs. (Downgrading to 4.5.1 shows the disk IDs.)

Email Limetech and provide them with the details. [email protected]

[...]

these changes were made:

- update linux kernel to 2.6.32.9

- update linux udev to 1.41

Limetech probably had their good reasons for choosing version 141; we don't know.

Version 141 of udev was released on April 8, 2009, and there have been twelve releases since then.

On March 3, 2010 (the unRAID 4.5.3 release), the stable version of udev was 151 (released January 27, 2010).

Currently, the stable version of udev is 153 (released April 21, 2010).

May 6, 2010

Slackware 13 used udev 141. The reason udev was upgraded at all was to resolve the USB GUID issue(s) people were having.

Slackware-Current didn't upgrade to 151 until slightly after unRAID 4.5.3 was released, and they just upgraded to 153 within the past week or two.

May 6, 2010

I see. Well, in any case, the udev version is the only major change I can think of that may be causing the disk without ID issue.

By going back to 4.5.1 all disk IDs show normally.

May 6, 2010

Anyone feel 'brave'/'stupid' enough to try updating to udev 153 on their in-memory/RAM filesystem unRAID system using something like the following?

[stop array]

[stop emhttp]

wget http://slackware.osuosl.org/slackware-current/slackware/a/udev-153-i486-1.txz

upgradepkg udev-153-i486-1.txz

/etc/rc.d/rc.udev force-restart

[start emhttp]

Then you can examine the syslog device inventory dump to see if the drives are identified.

May 6, 2010

I should add, I do have unRAID working on my full Slackware-Current system that uses udev153.

However, I did have to copy a recently obsoleted lib or bin file from the 4.5.3 layout to completely pacify something, but I can't recall what it was right now or if it was even related to udev at all.

May 6, 2010

Author

Purko, would the symptoms of this be that the disks are sometimes not recognized, or never? I've had 4.5.3 running for a while now without seeing this problem.

May 7, 2010

Since it was me who brought up udev, it was only right that I be 'brave'/'stupid' and try it.

1. Got the needed packages:

mkdir -p /boot/packages
cd /boot/packages
wget 'ftp://slackware.osuosl.org/pub/slackware/slackware-current/slackware/a/udev-*.t?z'
wget 'ftp://slackware.osuosl.org/pub/slackware/slackware-current/slackware/a/glibc-solibs-*.t?z'

2. Stopped the array.

3. Upgraded the stuff and then started emhttp again:

killall emhttp
installpkg /boot/packages/glibc-solibs-*.t?z
installpkg /boot/packages/udev-*.t?z
/etc/rc.d/rc.udev force-restart
/usr/local/sbin/emhttp &
disown
udevd --version

4. Pointed my web browser to the unRAID management interface, and verified that everything is working normally.

Stopped/Started the array a few times, copied some stuff to/from the array, etc. Examined my syslog and found nothing bad there.

Of course I can't exactly replicate fortytwo's situation since all my disks in the 'Device inventory' have their IDs.

It will be interesting to see what fortytwo's 'Device inventory' will be if he follows steps one through four above.
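
After an upgrade like the one in step 3, the "udevd --version" output can be checked against the version that was expected. A small sketch; the function name udev_at_least is hypothetical, and it assumes "udevd --version" prints a plain integer such as 153.

```shell
# Succeed when the installed udev is at least the wanted version.
# $1: minimum version (e.g. 153)
# $2: installed version; defaults to the output of "udevd --version"
udev_at_least() {
    have=${2:-$(udevd --version)}
    [ "$have" -ge "$1" ]
}
```

For example, "udev_at_least 153 && echo upgraded" would print "upgraded" only once the newer package is actually the one running.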

May 7, 2010

Author

Alright, well... I was confident enough that the disks weren't bad that I went ahead and rebooted, and it looks good now. It's doing a parity check as we speak.

Purko,

This is the first I've seen this issue...is the problem with udev that some disks would never be recognized? Or, randomly not recognized? Like I said, I hadn't seen this before. If it happens again, I'd probably be willing to give the udev update a try...

May 7, 2010

Can you please look in your syslog now and see what the "Device inventory" looks like? Do they all have IDs?

May 7, 2010

Author

Yep:

May 6 17:40:15 Tower emhttp: Device inventory:
May 6 17:40:15 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host3 (sdc) ST31500341AS_9VS10W5D
May 6 17:40:15 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host4 (sdd) ST31500341AS_9VS22PKG
May 6 17:40:15 Tower emhttp: pci-0000:00:1f.5-scsi-0:0:0:0 host5 (sde) ST32000542AS_6XW067PY
May 6 17:40:15 Tower emhttp: pci-0000:00:1f.5-scsi-1:0:0:0 host6 (sdf) ST31500341AS_9VS236TQ
May 6 17:40:15 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) ST31500341AS_9VS048Q7
May 6 17:40:15 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) ST3750640AS_5QD334MF

May 7, 2010

This is very, very strange!

Have you run memtest lately? Select memtest from the boot menu and let it run overnight.

May 7, 2010

Author

I haven't... I had two sticks of memory that I'd had for a while; one went bad a little over 6 months ago, so I wouldn't be surprised if this one was starting to fail. Good idea!
