JohnO

Everything posted by JohnO

  1. I too would love to see some more information on switching from plop to plopkexec. Upgrading from 6.1.9 as a VM to 6.2 would be a great time to try this. Thanks, John.
  2. Sad that this seems to be the current point of view -- I specifically chose unRAID a couple of years ago because it was well-behaved as a VMware ESXi guest. I do hope that the apparent kernel issue (based on the conjecture in the other linked thread) can be resolved before 6.2 is released. On the other hand, 6.1.9 is working well for me on VMware and will likely cover my simple needs for at least another year or so, which gives me time to look for an alternative if I need to.
  3. I haven't heard of a resolution. It only seems to impact certain controllers. My old LSI SAS3041E completes the parity check in about the same time as before, if not a little faster.
  4. Greetings, Just downloaded the update. It looks good except for this error in your test display area (see the parsing sketch at the end of this listing):

     Here's what share free space looks like:
     snmpwalk -v 2c localhost -c public NET-SNMP-EXTEND-MIB::nsExtendOutLine."sharefree"
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."sharefree".1 = STRING: /usr/local/emhttp/plugins/snmp/share_free_space.sh: line 16: allocator=\"mostfree\" * 1024: syntax error: operand expected (error token is \"\"mostfree\" * 1024\")

     Thanks!
  5. Some additional data points below. TL;DR: All 6.x releases are performing about the same for me. My 6.0 parity checks are faster than my 5.x parity checks, but I've replaced some older drives since then, so that is not a valid comparison.

     My environment: unRAID is a VM on VMware ESXi 6.0. The VM has 1.8 GB of memory and two processor cores allocated to it. The underlying CPU is an 8-core AMD FX-8350. I am running 2 TB WD Red drives (3 data, 1 parity). My disk controller, an LSI SAS3041E, is dedicated ("passed through") to the unRAID VM. The array is about 35% utilized. Disk format is XFS.

     Release   Duration   Speed
     6.1.3     5:15:36    105.6 MB/sec
     6.1.1     4:59:28    113.3 MB/sec
     6.0.1     5:05:53    109.0 MB/sec

     (A quick arithmetic check of these numbers appears at the end of this listing.)

     John
  6. The parity check completed successfully. After running for a couple of hours, I upgraded to 6.1.2, and used the unRAID GUI to reboot. I immediately started seeing the same sort of disk controller errors that started this journey! I shut down the unRAID VM, and then did the same VMware host power down I did yesterday. After shutting everything down and physically removing power, I waited for about a minute, then powered everything back up. That was about 8 hours ago. All has been running clean since. I don't know what it is - but something about the leap from 6.0.1 to 6.1.x really needed to see power removed from the server. Just rebooting from the VM was not enough to clear out something. I've added my current diagnostic logs for completeness, but will mark this as RESOLVED! Thanks RobJ! John oshtank-diagnostics-20150912-1754.zip
  7. Rebuild completed successfully with no errors. Parity Check underway. John
  8. The re-build is underway! Thanks again for your help. John
  9. Syslog still looks clean (attached). I had had a similar issue a few weeks ago with a video card passed through to Windows. I had done a Windows Update and restarted the VM, and the video card started to freeze up after about 30 seconds of use. I had assumed it was related to the Windows Update, in the same way that I initially thought my unRAID problem was related to the 6.1 upgrade. After seeing unRAID working, I went ahead and re-added the video card to my Windows VM (via passthrough) and now it is working as well! What would you recommend to bring the disabled drive back online? The message says the device is disabled, and contents are emulated. Should I follow the instructions to re-enable the drive that are here: http://lime-technology.com/wiki/index.php?title=Troubleshooting#Re-enable_the_drive or should I follow the steps to check and fix an XFS disk here: http://www.lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui or should I do something else entirely? Thanks for any guidance! John oshtank-diagnostics-20150911-1306.zip
  10. OK - this is looking better. For my next step in troubleshooting, I shut down all my VMs, powered down the VM host, unplugged it from the UPS (it has not been unplugged in over a year), re-seated the disk controller card, checked all SATA connections (they all seemed tight), powered up the host, and re-started the unRAID VM. After a few minutes without error, I brought up my other VMs, including a Linux host that NFS-mounts the unRAID drives and uses them for CrashPlan backups. It's been 45 minutes with no console errors! There are 7 backup jobs running. I also used my TiVo to pull a movie from the unRAID NAS. No obvious problems there either. Maybe this was some weird VM host and power issue? I've attached the diagnostic logs for the time since boot. John oshtank-diagnostics-20150911-1008.zip
  11. OK - thanks for the info. Time to start troubleshooting in earnest. Very good to know. Thanks, Rob. John
  12. Ok -- I rolled back to 6.0.1, and the errors are still there, but seem somewhat reduced. Sigh. Maybe the controller card is bad. Not sure. Suggestions? Here is what I see in the syslog now:

     Sep 10 21:44:41 OshTank rpc.mountd[7924]: authenticated mount request from 192.168.62.113:804 for /mnt/user/Vault (/mnt/user/Vault)
     Sep 10 21:45:58 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c3b4c00)
     Sep 10 21:45:58 OshTank kernel: sd 3:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 75 1c 63 a8 00 00 08 00
     Sep 10 21:46:28 OshTank kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
     Sep 10 21:46:28 OshTank kernel: mptbase: ioc0: Initiating recovery
     Sep 10 21:46:37 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c3b4c00)
     Sep 10 21:46:37 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c3b4a80)
     Sep 10 21:46:37 OshTank kernel: sd 3:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 ae a8 67 80 00 00 08 00
     Sep 10 21:46:37 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c3b4a80)
     Sep 10 21:46:37 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c784000)
     Sep 10 21:46:37 OshTank kernel: sd 3:0:3:0: [sde] tag#0 CDB: opcode=0x28 28 00 b3 f1 90 18 00 00 08 00
     Sep 10 21:46:37 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c784000)
     Sep 10 21:47:11 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c784300)
     Sep 10 21:47:11 OshTank kernel: sd 3:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ae a9 08 88 00 04 00 00
     Sep 10 21:47:41 OshTank kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
     Sep 10 21:47:41 OshTank kernel: mptbase: ioc0: Initiating recovery
     Sep 10 21:47:50 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c784300)
     Sep 10 21:47:50 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c784000)
     Sep 10 21:47:50 OshTank kernel: sd 3:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 ae a9 08 88 00 04 00 00
     Sep 10 21:47:50 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c784000)
     Sep 10 21:47:50 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c784c00)
     Sep 10 21:47:50 OshTank kernel: sd 3:0:3:0: [sde] tag#0 CDB: opcode=0x28 28 00 ae a9 08 88 00 04 00 00
     Sep 10 21:47:50 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c784c00)
     Sep 10 21:48:26 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c784f00)
     Sep 10 21:48:26 OshTank kernel: sd 3:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 ae d8 da 68 00 03 00 00
     Sep 10 21:48:56 OshTank kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
     Sep 10 21:48:56 OshTank kernel: mptbase: ioc0: Initiating recovery
     Sep 10 21:49:05 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c784f00)
     Sep 10 21:49:05 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c3b5500)
     Sep 10 21:49:05 OshTank kernel: sd 3:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 ae d8 d9 68 00 04 00 00
     Sep 10 21:49:05 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c3b5500)
     Sep 10 21:49:05 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c3b4a80)
     Sep 10 21:49:05 OshTank kernel: sd 3:0:3:0: [sde] tag#0 CDB: opcode=0x2a 2a 00 ae a9 a1 c8 00 04 00 00
     Sep 10 21:49:05 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c3b4a80)

     Suggestions? If I were to get a replacement controller, could I just move the drives and expect them to be recognized correctly, or would I have to reformat everything and start from scratch? Thanks for any advice. John oshtank-diagnostics-20150910-2150.zip
  13. Ok -- I rolled back to 6.0.1, and the errors seem to have stopped accumulating. Of course, the one drive is still disabled. What is the next recommended course of action? Should I try to re-build the drive? I have attached a fresh diagnostic captured after my rollback and reboot. Thanks for any advice! John oshtank-diagnostics-20150910-2128.zip
  14. I'm just heading off to sleep, so I'll try this tomorrow after work. Thanks for the confirmation on that part of the process. I'm using VMware ESXi, which is their "Enterprise-class" hypervisor. In a smart move, VMware lets you use that hypervisor for free in small environments. The configuration options are very granular. I can allocate memory in very small chunks - I manually selected 1856 MB, as you can see in the screen shot attached below from the VMware vSphere configuration client. Thanks again, John
  15. Thanks very much for the detailed feedback. Since it seems like such a simple thing to try, I'd like to roll back to 6.0.1, which I had been running for some time without issue. I see the /boot/previous folder, but I'm not sure of the correct way to roll back. Is it as simple as copying the files to /boot and re-starting? (See the rollback sketch at the end of this listing.) I'm guessing that even if it runs without errors, I'll still have to rebuild the one drive that is disabled. Is that correct? If I continue to see problems after rolling back, then I'll dig further. Does that plan of attack make sense to you? I can certainly allocate more RAM. My understanding was that it was not required, as I'm not using Docker containers or other VMs within unRAID. If you recommend I increase RAM, I can certainly increase it. Thanks again for your assistance! John
  16. Anyone able to look at my diagnostic information and offer suggestions? Thanks, John
  17. Also -- the syslog is filling with the following:

     mptscsih: ioc0: attempting task abort! (sc=ffff88006aa2e000)
     Sep 6 13:10:44 OshTank kernel: sd 3:0:3:0: [sde] tag#0 CDB: opcode=0x2a 2a 00 ae a8 66 a0 00 00 18 00
     Sep 6 13:11:14 OshTank kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
     Sep 6 13:11:14 OshTank kernel: mptbase: ioc0: Initiating recovery
     Sep 6 13:11:23 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006aa2e000)
     Sep 6 13:16:02 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c453c80)
     Sep 6 13:16:02 OshTank kernel: sd 3:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 3f 0f f1 48 00 00 08 00
     Sep 6 13:16:32 OshTank kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
     Sep 6 13:16:32 OshTank kernel: mptbase: ioc0: Initiating recovery
     Sep 6 13:16:32 OshTank kernel: blk_update_request: I/O error, dev sde, sector 0
     Sep 6 13:16:41 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c453c80)
     Sep 6 13:16:41 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006c453680)
     Sep 6 13:16:41 OshTank kernel: sd 3:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 3f 0f f1 48 00 00 08 00
     Sep 6 13:16:41 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006c453680)
     Sep 6 13:16:48 OshTank emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
     Sep 6 13:17:12 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006aa2e000)
     Sep 6 13:17:12 OshTank kernel: sd 3:0:2:0: [sdd] tag#0 CDB: opcode=0x28 28 00 3f 10 0e 58 00 00 08 00
     Sep 6 13:17:42 OshTank kernel: mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x24000000
     Sep 6 13:17:42 OshTank kernel: mptbase: ioc0: Initiating recovery
     Sep 6 13:17:51 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006aa2e000)
     Sep 6 13:17:51 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006aa2e180)
     Sep 6 13:17:51 OshTank kernel: sd 3:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 3f 10 0e 58 00 00 08 00
     Sep 6 13:17:51 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006aa2e180)
     Sep 6 13:17:51 OshTank kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88006aa2e480)
     Sep 6 13:17:51 OshTank kernel: sd 3:0:3:0: [sde] tag#0 CDB: opcode=0x28 28 00 3f 10 0e 58 00 00 08 00
     Sep 6 13:17:51 OshTank kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88006aa2e480)
  18. I've restarted the unRAID VM and started the array. The one drive is still "x'd" out, and is disabled. The read error column is now zeroed out. The SMART test results are all good. Not sure what to do next. Should I try to re-enable the disabled drive? Thanks, John
  19. I would not expect disk errors to show up as part of an upgrade either. I guess my question is around the way that the disk error information is collected. Could it have changed somehow? Thanks to the daily email notifications, I am sure that unRAID was not reporting any disk errors before the upgrade. Thanks, John
  20. Greetings, I've been running unRAID for about 18 months. I've been running 6.0 since it was released, and I upgraded to 6.1 yesterday. Since the upgrade, unRAID has been reporting read errors, and now one of 4 disks is offline. I have not physically touched the hardware in weeks. The physical machine diagnostics (temperature, etc.) all seem fine. The SMART reports all seem fine. This unRAID environment is a guest on ESXi 6.0. The disk controller and all attached disks are "passed through" to unRAID so that unRAID has complete control of the disks. I restarted the VM after the upgrade, but have not restarted since. Not sure if that would help or hurt at this point. I'm tempted to roll back to 6.0 as this seems to be a mighty coincidence. I've attached the diagnostics report. Thanks for any ideas. John oshtank-diagnostics-20150906-0833.zip
  21. I tried just deleting and re-installing the plug-in, and behavior didn't change. I did check permissions and they were set as you suggested. I then deleted the SNMP plug-in, upgraded from 6.0.1 to 6.1, re-installed the plug-in, and now it works!

     +==============================================================================
     | Testing SNMP by listing mounts
     +==============================================================================
     Looks like snmpd is working... Output:
     HOST-RESOURCES-MIB::hrFSMountPoint.22 = STRING: "/var/log"
     HOST-RESOURCES-MIB::hrFSMountPoint.25 = STRING: "/boot"

     Here's what drive temperatures look like:
     snmpwalk -v 2c localhost -c public NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp"
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp".1 = STRING: WDC_WD20EFRX-68EUZN0_WD-WCC4M0927675: 28
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp".2 = STRING: WDC_WD20EFRX-68EUZN0_WD-WCC4M4PJF00N: 29
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp".3 = STRING: WDC_WD20EFRX-68EUZN0_WD-WCC4M3PHD4DE: 29
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp".4 = STRING: WDC_WD20EFRX-68EUZN0_WD-WCC4M0072449: 27

     Here's what share free space looks like:
     snmpwalk -v 2c localhost -c public NET-SNMP-EXTEND-MIB::nsExtendOutLine."sharefree"
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."sharefree".1 = STRING:

     Thanks, John
  22. Working backwards: That seems to work fine:

     root@OshTank:~# bash /etc/rc.d/rc.snmpd stop
     Shutting down snmpd: . DONE
     root@OshTank:~# bash /etc/rc.d/rc.snmpd start
     Starting snmpd: /usr/sbin/snmpd -LF w /var/log/snmpd.log -LF w /var/log/snmpd.log -A -p /var/run/snmpd -a -c /usr/local/emhttp/plugins/snmp/snmpd.conf
     root@OshTank:~#

     It doesn't find the disk temp stuff if I run it manually:

     root@OshTank:~# !snmpwalk
     snmpwalk -v 2c localhost -c public 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp"'
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp" = No Such Instance currently exists at this OID
     root@OshTank:~#

     It looks like you are trying to execute the whole statement below as a single command, but that didn't work for me:

     root@OshTank:~#
     root@OshTank:~# tail -n 1 /usr/local/emhttp/plugins/snmp/snmpd.conf extend disktemp /usr/local/emhttp/plugins/snmp/drive_temps.sh
     ==> /usr/local/emhttp/plugins/snmp/snmpd.conf <==
     disk /mnt/cache
     tail: cannot open ‘extend’ for reading: No such file or directory
     tail: cannot open ‘disktemp’ for reading: No such file or directory
     ==> /usr/local/emhttp/plugins/snmp/drive_temps.sh <==
     exit 0
     root@OshTank:~#

     Here is some other stuff that might be useful:

     root@OshTank:/usr/local/emhttp/plugins/snmp# more snmpd.conf
     rocommunity public
     syslocation Here
     syscontact root@tower
     disk /mnt/disk1
     disk /mnt/disk2
     disk /mnt/disk3
     disk /mnt/disk4
     disk /mnt/disk5
     disk /mnt/disk6
     disk /mnt/disk7
     disk /mnt/disk8
     disk /mnt/disk9
     disk /mnt/disk10
     disk /mnt/disk11
     disk /mnt/disk12
     disk /mnt/disk13
     disk /mnt/disk14
     disk /mnt/disk15
     disk /mnt/disk16
     disk /mnt/disk17
     disk /mnt/disk18
     disk /mnt/disk19
     disk /mnt/disk20
     disk /mnt/cache
     root@OshTank:/usr/local/emhttp/plugins/snmp#
     root@OshTank:/usr/local/emhttp/plugins/snmp#
     root@OshTank:/usr/local/emhttp/plugins/snmp# cat drive_temps.sh
     #!/usr/bin/bash

     MDCMD=/root/mdcmd
     AWK=/usr/bin/awk
     CAT=/usr/bin/cat
     FIND=/usr/bin/find
     GREP=/usr/bin/grep
     RM=/usr/bin/rm
     SED=/usr/bin/sed
     HDPARM=/usr/sbin/hdparm
     SMARTCTL=/usr/sbin/smartctl

     CACHE=/tmp/plugins/snmp/drive_temps.txt

     mkdir -p $(dirname $CACHE)

     # Cache the results for 5 minutes at a time, to speed up queries
     if $FIND $(dirname $CACHE) -mmin -5 -name drive_temps.txt | $GREP -q drive_temps.txt
     then
       $CAT $CACHE
       exit 0
     fi

     $RM -f $CACHE

     $MDCMD status | $GREP '\(rdevId\|rdevName\).*=.' | while read -r device
     do
       read -r name

       # Double-check the data to make sure it's in sync
       device_num=$(echo $device | $SED 's#.*\.\(.*\)=.*#\1#')
       name_num=$(echo $name | $SED 's#.*\.\(.*\)=.*#\1#')
       if [[ "$device_num" != "$name_num" ]]
       then
         echo 'ERROR! Couldn'"'"'t parse mdcmd output. Command was:'
         echo 'mdcmd status | $GREP '"'"'\(rdevId\|rdevName\).*=.'"'"' | while read -r device'
       fi

       device=$(echo $device | $SED 's#.*=#/dev/#')
       name=$(echo $name | $SED 's/.*=//')

       if ! $HDPARM -C $device 2>&1 | $GREP -cq standby
       then
         temp=$($SMARTCTL -A $device | $GREP -m 1 -i Temperature_Celsius | $AWK '{print $10}')
       fi

       # For debugging
       # echo "$name = $device, $temp"
       echo "$name: $temp" >> $CACHE
     done

     $CAT $CACHE

     exit 0
     root@OshTank:/usr/local/emhttp/plugins/snmp#

     Thanks for your help! John
  23. That worked:

     root@OshTank:~# /usr/local/emhttp/plugins/snmp/drive_temps.sh
     WDC_WD20EFRX-68EUZN0_WD-WCC4M0927675: 28
     WDC_WD20EFRX-68EUZN0_WD-WCC4M4PJF00N: 28
     WDC_WD20EFRX-68EUZN0_WD-WCC4M3PHD4DE: 29
     WDC_WD20EFRX-68EUZN0_WD-WCC4M0072449: 27
     root@OshTank:~#
  24. Hmm... On the plus side, this plug-in architecture works great! Here's the fresh output - no real change:

     +==============================================================================
     | Testing SNMP by listing mounts
     +==============================================================================
     Looks like snmpd is working... Output:
     HOST-RESOURCES-MIB::hrFSMountPoint.1 = STRING: "/mnt/disk1"
     HOST-RESOURCES-MIB::hrFSMountPoint.2 = STRING: "/mnt/disk2"
     HOST-RESOURCES-MIB::hrFSMountPoint.3 = STRING: "/mnt/disk3"
     HOST-RESOURCES-MIB::hrFSMountPoint.22 = STRING: "/var/log"
     HOST-RESOURCES-MIB::hrFSMountPoint.25 = STRING: "/boot"

     Here's what drive temperatures look like:
     NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp" = No Such Instance currently exists at this OID

     -----------------------------------------------------------
     snmp has been installed.
     Copyright 2015, David Coppit
     Version: 2015.08.25
     -----------------------------------------------------------
     plugin: updated

     unRAID itself can see the temps, as shown in the screen shot below. To complicate things, my unRAID system is a virtual guest on VMware ESXi. The disk controller is "passed through" to the VM, and is dedicated to the VM, as are all the attached disks. (See the extend-registration sketch after this list.) Thanks, John
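
On the "No Such Instance" result in posts 21-24: below is a minimal check sequence for the snmpd "extend" registration, assuming the paths shown in those posts (drive_temps.sh and snmpd.conf under /usr/local/emhttp/plugins/snmp, and the rc.snmpd script). It only re-uses commands that already appear above; it is a sketch, not the plugin author's procedure.

    # Sketch: re-check the "disktemp" extend registration (paths assumed from the posts above).

    # 1. The temperature script itself should print values when run directly:
    /usr/local/emhttp/plugins/snmp/drive_temps.sh

    # 2. The snmpd.conf that snmpd is started with should contain the extend line:
    grep 'extend disktemp' /usr/local/emhttp/plugins/snmp/snmpd.conf

    # 3. Restart snmpd so a newly added or changed extend line is re-read:
    bash /etc/rc.d/rc.snmpd stop
    bash /etc/rc.d/rc.snmpd start

    # 4. Query again; "No Such Instance" at this point usually means the extend line
    #    was not present in the config file snmpd actually loaded:
    snmpwalk -v 2c localhost -c public 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."disktemp"'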
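
On the share_free_space.sh error in post 4: the message suggests that a share-config assignment (allocator="mostfree") reached a bash arithmetic expression that expected a number. The plugin's actual script is not shown in these posts, so the snippet below is only a hypothetical illustration of that failure mode and one defensive pattern; the variable name `value` and the numeric check are illustrative, not the plugin's code.

    # Hypothetical illustration of the post 4 failure, not the plugin's code.
    value='allocator="mostfree"'
    # echo $(( value * 1024 ))        # fails: "syntax error: operand expected"

    # Defensive pattern: only multiply when the value is purely numeric.
    if [[ "$value" =~ ^[0-9]+$ ]]; then
        echo $(( value * 1024 ))
    else
        echo "skipping non-numeric value: $value" >&2
    fi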
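
A back-of-the-envelope check of the parity-check table in post 5: a full check reads the entire 2 TB parity drive, so the duration should be roughly capacity divided by average speed, and the reported numbers line up. This is just arithmetic, not output from unRAID.

    # 2 TB = 2,000,000 MB; at 105.6 MB/sec:
    #   2,000,000 / 105.6 ≈ 18,939 sec ≈ 5.26 h ≈ 5:15, matching the 6.1.3 row.
    echo "scale=2; 2000000 / 105.6 / 3600" | bc   # prints 5.26 (hours)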
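
On the rollback question in post 15: here is a minimal sketch of the "copy the previous files back and reboot" idea, under the assumption that /boot/previous holds the prior release's bzimage and bzroot; that assumption should be verified against the official upgrade notes before trying it, and it does not address the disabled disk, which would still need to be rebuilt afterwards.

    # Sketch only: roll back by restoring the previous release's boot files
    # (assumes they are bzimage and bzroot under /boot/previous).
    cp /boot/bzimage /boot/bzimage.new       # keep copies of the current release
    cp /boot/bzroot  /boot/bzroot.new
    cp /boot/previous/bzimage /boot/bzimage  # restore the previous release
    cp /boot/previous/bzroot  /boot/bzroot
    # ...then cleanly reboot the server (here, the unRAID VM).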