montery Posted January 7, 2016

Hi all! For the last little while, Unraid 6.1.x has been crashing on me unexpectedly. Since upgrading to 6.1.6, at least only the GUI crashes (partially) and I can still access the terminal prompt, but the server goes offline. I'm not sure what's causing this, but I've attached the syslog, a screenshot of the GUI when it's dead, and a screenshot of the terminal after I issued the powerdown script.

Any idea which "device" powerdown complains about? Also, the syslog shows the mover complaining that there is no space... I'll have to look at that drive to see whether it really is full.

Thanks in advance!

Attachment: syslog.zip
BRiT Posted January 7, 2016

In the meantime, what do "df -h" and "df -h /" produce?

Also, going forward, always try the "diagnostics" command to produce a zip file that contains more information, such as which processes are running. It also includes disk share information and what you may be running from the go script. It creates a zip file under /boot/logs/machinename-diagnostics-yyyymmdd-HHMM.zip.

I don't know whether powerdown is having issues with a full log (/var/log/) or a full USB flash drive (/boot).
BRiT Posted January 7, 2016

You're running out of RAM on the /var/ filesystem. Also, on the last day, mover complained about some filesystem running out of space as well. Line 44 writes to the /var/run/mover.pid file. If your /var/run is running out of space, then essentially you're running out of RAM, since /var/ is located in RAM.

You need to do some of the following:

- Add more RAM to your system.
- Figure out what docker or other process is writing to the /var/ filesystem instead of to your cache drive or array drives.
- Stop using as many plugins as you're using.

Jan 6 03:40:01 Tower logger: /usr/local/sbin/mover: line 44: echo: write error: No space left on device

Also, logrotate aborted abnormally, not sure why:

Jan 6 04:40:01 Tower logrotate: ALERT - exited abnormally.

Your syslog does indicate you had (have?) some errors with your BTRFS on /dev/sdn1.

Jan 1 11:06:28 Tower kernel: BTRFS (device sdn1): parent transid verify failed on 1246248960 wanted 309699 found 303642
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246248960 (dev /dev/sdo1 sector 2434080)
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246253056 (dev /dev/sdo1 sector 2434088)
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246257152 (dev /dev/sdo1 sector 2434096)
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246261248 (dev /dev/sdo1 sector 2434104)
Jan 1 11:06:28 Tower kernel: BTRFS (device sdn1): parent transid verify failed on 1246232576 wanted 309699 found 303642
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246232576 (dev /dev/sdo1 sector 2434048)
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246236672 (dev /dev/sdo1 sector 2434056)
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246240768 (dev /dev/sdo1 sector 2434064)
Jan 1 11:06:28 Tower kernel: BTRFS: read error corrected: ino 1 off 1246244864 (dev /dev/sdo1 sector 2434072)
Jan 1 11:08:33 Tower kernel: BTRFS (device sdn1): parent transid verify failed on 1289977856 wanted 303764 found 300178
Jan 1 11:08:33 Tower kernel: BTRFS: read error corrected: ino 1 off 1289977856 (dev /dev/sdo1 sector 2519488)
Jan 1 11:08:33 Tower kernel: BTRFS: read error corrected: ino 1 off 1289981952 (dev /dev/sdo1 sector 2519496)
Jan 1 11:08:33 Tower kernel: BTRFS: read error corrected: ino 1 off 1289986048 (dev /dev/sdo1 sector 2519504)
Jan 1 11:08:33 Tower kernel: BTRFS: read error corrected: ino 1 off 1289990144 (dev /dev/sdo1 sector 2519512)
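One way to follow up on the suggestion to find out what is writing to /var is to list the largest entries under it. The following is only a minimal sketch, assuming GNU du, sort, and head are available on the server. The function name largest_entries is made up for illustration and is not part of Unraid.

```shell
#!/bin/bash
# Hypothetical helper (not an Unraid command): list the largest entries
# directly under a directory, biggest first.
# Usage: largest_entries DIR [COUNT]
largest_entries() {
    local dir="${1:-/var/log}"
    local count="${2:-10}"
    # -x stays on one filesystem, -a includes plain files,
    # -k reports sizes in KiB so the sort can be purely numeric.
    # The first output line is the total for DIR itself.
    du -xak --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example call (commented out so the block has no side effects):
# largest_entries /var/log 10
```

Running it against /var/log and then /var/run would show whether a single log file or some other entry accounts for most of the space.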
montery Posted January 8, 2016 (Author)

Quoting BRiT: "You're running out of ram on /var/ filesystem."

Ah, well, maybe that's because my go file has this line in it:

# resize tmpfs
mount -o remount,size=1024m /var/log

That was added following an earlier suggestion to increase it, as I have been having this problem for a while. Here's the output from df -h and df -h /:

root@Tower:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.0G   94M  931M  10% /var/log
/dev/sda1       7.5G  179M  7.3G   3% /boot
/dev/md1        2.8T  149G  2.6T   6% /mnt/disk1
/dev/md2        1.9T   99G  1.8T   6% /mnt/disk2
/dev/md3        1.9T  593G  1.3T  32% /mnt/disk3
/dev/md4        1.9T  225G  1.6T  13% /mnt/disk4
/dev/md5        1.9T  243G  1.6T  13% /mnt/disk5
/dev/md6        1.9T  1.5T  345G  82% /mnt/disk6
/dev/md7        1.9T  1.3T  593G  69% /mnt/disk7
/dev/md8        1.9T  1.6T  314G  84% /mnt/disk8
/dev/md9        1.9T  1.6T  315G  84% /mnt/disk9
/dev/md10       1.9T  1.5T  341G  82% /mnt/disk10
/dev/md11       1.9T  1.5T  328G  83% /mnt/disk11
/dev/md12       1.9T  1.8T   64G  97% /mnt/disk12
/dev/sdo1       224G   70G  154G  32% /mnt/cache
shfs             23T   12T   11T  52% /mnt/user0
shfs             23T   12T   12T  52% /mnt/user
/dev/loop0       10G  2.1G  6.3G  25% /var/lib/docker

root@Tower:~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
-               7.9G  1.9G  6.0G  24% /

The latest diagnostics dump is located here: https://drive.google.com/file/d/0B_6rmB4u7fHESGY4T2Y4N1RYMVk/view?usp=sharing

I notice that my syslog file is growing hugely, and I'm not sure why. That would certainly contribute to a lack of space! But /var is certainly something I need to correct. How do I get Docker to use /var on the cache drive?

Thanks!
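Reading a long df listing by eye is error-prone, so here is a small sketch that picks out the mounts at or above a usage threshold. It assumes the column layout shown in the df output above (usage percentage in the fifth column, mount point in the sixth) and mount points without spaces. The function name full_mounts and the threshold of 90 are illustrative choices, not anything from Unraid.

```shell
#!/bin/bash
# Hypothetical helper: read df-style output on stdin and print every
# mount point whose Use% is at or above the given limit (default 90).
# Assumes column 5 is the percentage and column 6 the mount point.
full_mounts() {
    local limit="${1:-90}"
    awk -v limit="$limit" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "97%" -> "97"
            if (use + 0 >= limit) print $6, $5
        }'
}

# Example call (commented out so the block has no side effects):
# df -P | full_mounts 90
```

Fed the df -h listing from the post above with a limit of 90, this would flag only /mnt/disk12 at 97%. Note that /var/log showed 10% at the time that listing was captured.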
montery Posted January 8, 2016 (Author)

Hm, looks like my log filled up due to Supermicro's IPMI fighting with the kernel's sensors module:

Jan 7 05:23:09 Tower kernel: w83795 0-002f: Failed to read from register 0x011, err -6
Jan 7 10:00:41 Tower kernel: i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
Jan 7 10:00:41 Tower kernel: i801_smbus 0000:00:1f.3: Transaction timeout
Jan 7 10:00:41 Tower kernel: i801_smbus 0000:00:1f.3: Failed terminating the transaction
Jan 7 10:00:41 Tower kernel: i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
Jan 7 10:00:41 Tower kernel: w83795 0-002f: Failed to read from register 0x010, err -16
Jan 7 10:00:41 Tower kernel: i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!

I googled around, and the W83795 is the chip my Supermicro board uses, and it feeds into libsensors. According to a German site, the log fills up with the "Failed to read from register xxxxx" error:

"... because of the built-in attempts Supermicro IPMI simultaneously on the I2C (SMBus) access. Either disable the sensor module or IPMI."

I haven't seen this in the logs before, so I'll remove these lines from my go script, since I'm not using the sensors plugin anymore:

# Added for Sensors:
modprobe coretemp
modprobe jc42
modprobe w83795
/usr/bin/sensors -s

We'll see what happens after this!
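To confirm which messages are actually flooding the syslog, the lines can be tallied with the timestamps stripped. This is a rough sketch that assumes the classic "Mon DD HH:MM:SS host tag: message" layout seen in the excerpts above. The function name top_messages and the default path /var/log/syslog are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: show the most frequent message texts in a syslog
# file, ignoring the month, day, time, and host fields.
# Usage: top_messages FILE [COUNT]
top_messages() {
    local file="${1:-/var/log/syslog}"
    local count="${2:-10}"
    # Blank the first four fields, trim the leading spaces that leaves,
    # then count identical lines and sort by count, highest first.
    awk '{ $1 = $2 = $3 = $4 = ""; sub(/^ +/, ""); print }' "$file" \
        | sort | uniq -c | sort -rn | head -n "$count"
}

# Example call (commented out so the block has no side effects):
# top_messages /var/log/syslog 10
```

Messages that differ only in a register number, such as 0x010 versus 0x011, are counted separately here, so the w83795 errors may show up as several entries rather than one.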