John_M

Members
  • Posts

    4,727
  • Days Won

    12

Everything posted by John_M

  1. Do they show up in your diagnostics, or in the output of lspci?
  2. If you post your configuration or your docker run command someone might spot something that isn't quite right.
  3. That depends on the current setting for the Docker vdisk location. If it's currently set to /mnt/user/system/docker/docker.img (the default) then you don't need to change it, but if it specifically references disk1 then you'll need to change it to reference disk3 instead. So, for example, /mnt/disk1/system/docker/docker.img would need to change to /mnt/disk3/system/docker/docker.img.

     I don't have any suggestions for a x4 controller but for a x8 one I'd suggest an LSI SAS controller. Alternatively, you can get a two port SATA controller based on the ASMedia ASM1061 or 1062 that fits in a x1 slot. You could then move a disk or two to the new controller and free up a motherboard port or two for SSDs.
  4. First thing is to test the RAM for 24 hours or so. If the RAM is bad you can't do anything else.
  5. Well, it isn't your syslog that's growing out of control so it's probably worth checking those container mappings again. You could have misspelt one or got the case wrong. Suppose you have a user share called "Downloads" but your container mapping points to /mnt/user/downloads instead of /mnt/user/Downloads - that will write to the root FS instead of to disk.
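A quick way to catch that kind of case mismatch is to walk the mapped path one component at a time and compare each name against what is actually on disk. This is only a rough sketch: first_mismatch is a name I've made up for illustration, it expects an absolute path, and /mnt/user/downloads is just the example from above.

```python
import os


def first_mismatch(path):
    """Walk an absolute path one component at a time.

    Returns None if every component exists with exactly this spelling and
    case. Otherwise returns (component, near_matches), where near_matches
    lists existing names that differ from the component only by case.
    """
    current = os.sep
    for part in (p for p in path.split(os.sep) if p):
        try:
            entries = os.listdir(current)
        except OSError:
            # Parent is unreadable or is not a directory.
            return part, []
        if part not in entries:
            near = sorted(e for e in entries if e.lower() == part.lower())
            return part, near
        current = os.path.join(current, part)
    return None


if __name__ == "__main__":
    # Hypothetical host path taken from a container mapping.
    result = first_mismatch("/mnt/user/downloads")
    if result is None:
        print("Path exists exactly as written")
    else:
        part, near = result
        print(f"'{part}' not found; names differing only by case: {near}")
```

If the share is really called "Downloads", the near-match list will show it, which makes the wrong mapping easy to spot.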
  6. It's a hardware fault that takes place as the CPU is starting up:

     Nov 19 06:38:07 Finalizer kernel: smpboot: CPU0: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (family: 0x6, model: 0x3c, stepping: 0x3)
     Nov 19 06:38:07 Finalizer kernel: mce: [Hardware Error]: Machine check events logged
     Nov 19 06:38:07 Finalizer kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 3: fe00000000800400
     Nov 19 06:38:07 Finalizer kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffffa00ac3d3 MISC ffffffffa00ac3d3
     Nov 19 06:38:07 Finalizer kernel: mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1574163469 SOCKET 0 APIC 0 microcode 27

     It might indicate a faulty CPU but I'd do a MemTest first (select it from the boot menu if legacy booting) to try to eliminate faulty RAM. If the RAM passes the test you could use the Nerd Tools plugin to install mcelog, which might reveal more information. It looks like you also have cache pool corruption:

     Nov 19 06:39:49 Finalizer kernel: BTRFS warning (device sdn1): csum failed root 5 ino 144339143 off 1912832 csum 0x3d38702b expected csum 0xf33e576a mirror 1

     with one of the devices showing read errors:

     Nov 19 06:39:49 Finalizer kernel: BTRFS info (device sdn1): read error corrected: ino 144339143 off 1916928 (dev /dev/sdp1 sector 3106512)
  7. It does, but there's currently no nice GUI support for it so you would have to control it via the command line.
  8. Yes, you can move the docker.img if you want to - stop the docker service first - but it probably won't make much difference to performance. For a real performance gain and reduced hard disk activity consider getting a cache SSD.
  9. To use any NIC you need its driver to be installed, but Unraid doesn't support Fibre Channel anyway.
  10. Yes, you need the licence key from the backup of your config folder or from the original email when you bought it. It will be called Basic.key or Plus.key or Pro.key, depending on which level you purchased.
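If you're not sure whether a backup actually contains the key, something like this will list any of those three file names found in a folder. It's a minimal sketch: find_licence_keys is a made-up helper, and the path in the example is only a placeholder for wherever your config backup lives.

```python
from pathlib import Path

# The three key file names mentioned above.
KEY_NAMES = ("Basic.key", "Plus.key", "Pro.key")


def find_licence_keys(config_dir):
    """Return the names of any licence key files present in config_dir."""
    folder = Path(config_dir)
    return sorted(p.name for p in folder.iterdir() if p.name in KEY_NAMES)


if __name__ == "__main__":
    # Placeholder path; point it at your own backed-up config folder.
    print(find_licence_keys("/path/to/backup/config"))
```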
  11. Early in the startup a lease is requested but none is offered so a link local address is chosen:

     Nov 18 20:17:40 Tower kernel: igb 0000:01:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
     Nov 18 20:17:40 Tower dhcpcd[1816]: br0: carrier acquired
     Nov 18 20:17:40 Tower kernel: bond0: (slave eth0): link status definitely up, 1000 Mbps full duplex
     Nov 18 20:17:40 Tower kernel: bond0: (slave eth0): making interface the new active one
     Nov 18 20:17:40 Tower kernel: device eth0 entered promiscuous mode
     Nov 18 20:17:40 Tower kernel: bond0: active interface up!
     Nov 18 20:17:40 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
     Nov 18 20:17:40 Tower kernel: br0: port 1(bond0) entered blocking state
     Nov 18 20:17:40 Tower kernel: br0: port 1(bond0) entered forwarding state
     Nov 18 20:17:40 Tower dhcpcd[1816]: br0: soliciting a DHCP lease
     Nov 18 20:17:45 Tower dhcpcd[1816]: br0: probing for an IPv4LL address
     Nov 18 20:17:50 Tower dhcpcd[1816]: br0: using IPv4LL address 169.254.144.133
     Nov 18 20:17:50 Tower dhcpcd[1816]: br0: adding route to 169.254.0.0/16
     Nov 18 20:17:50 Tower dhcpcd[1816]: br0: adding default route
     Nov 18 20:17:50 Tower dhcpcd[1816]: forked to background, child pid 1872
     Nov 18 20:17:50 Tower rc.inet1: ip link set br0 up

     A little later, dhcpcd is woken up by the offer of a lease, so the link local address is replaced by the leased one:

     Nov 18 20:18:40 Tower dhcpcd[1872]: br0: offered 192.168.0.132 from 192.168.0.1
     Nov 18 20:18:40 Tower dhcpcd[1872]: arp_new: Socket operation on non-socket
     Nov 18 20:18:40 Tower dhcpcd[1872]: br0: probing address 192.168.0.132/24
     Nov 18 20:18:40 Tower dhcpcd[1872]: arp_probe: Socket operation on non-socket
     Nov 18 20:18:46 Tower dhcpcd[1872]: br0: leased 192.168.0.132 for 7200 seconds
     Nov 18 20:18:46 Tower avahi-daemon[2787]: Registering new address record for 192.168.0.132 on br0.IPv4.
     Nov 18 20:18:46 Tower dhcpcd[1872]: br0: adding route to 192.168.0.0/24
     Nov 18 20:18:46 Tower dhcpcd[1872]: br0: changing default route via 192.168.0.1
     Nov 18 20:18:46 Tower dhcpcd[1872]: arp_tryfree: Socket operation on non-socket
     Nov 18 20:18:46 Tower dhcpcd[1872]: br0: deleting route to 169.254.0.0/16

     Then someone logs in from a host on the same network and appears to take the interface down:

     Nov 18 20:19:24 Tower webGUI: Successful login user root from 192.168.0.32
     Nov 18 20:19:43 Tower ool www[3267]: /usr/local/emhttp/plugins/dynamix/scripts/netconfig 'eth0'
     Nov 18 20:19:43 Tower rc.inet1: ip -4 addr flush dev eth0
     Nov 18 20:19:43 Tower rc.inet1: ip link set eth0 down
     Nov 18 20:19:43 Tower kernel: bond0: (slave eth0): link status definitely down, disabling slave
     Nov 18 20:19:43 Tower kernel: device eth0 left promiscuous mode
     Nov 18 20:19:43 Tower kernel: bond0: now running without any active interface!
     Nov 18 20:19:43 Tower kernel: br0: port 1(bond0) entered disabled state
  12. Check your DHCP server and cabling. Your NIC will assume a link local address if it doesn't succeed in soliciting an IP address from a DHCP server, which doesn't have much to do with whether or not bonding is used. Post your diagnostics (type diagnostics at the console).
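If you want to tell at a glance whether an address is a link local one (169.254.0.0/16) or a normal leased one, Python's standard ipaddress module can do it. A small sketch; classify_ipv4 is just a name I've picked, and the addresses are the ones from the syslog extract in the previous post.

```python
import ipaddress


def classify_ipv4(address):
    """Label an address as 'link-local', 'private' or 'other'."""
    ip = ipaddress.ip_address(address)
    # Test link-local first: 169.254.0.0/16 also counts as private.
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "other"


if __name__ == "__main__":
    for addr in ("169.254.144.133", "192.168.0.132"):
        print(addr, classify_ipv4(addr))
```

An address in the link-local range on the server is a strong hint that no DHCP offer had arrived at the time it was assigned.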
  13. You can configure macOS not to store those files on network attached storage with defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool true
  14. Ok. You might find it's better supported when you upgrade to Unraid version 6.8.
  15. Can you be more specific? Do you mean the Dynamix System Temperature plugin? Or something else?
  16. I see a completely normal startup. I think there's something corrupt in your config backup and if you copy it onto your USB flash, as you've done previously, it will cause crashing again. So don't do that. I see you have three 4TB disks and an NVMe but none are allocated, as you would expect from a vanilla installation. The data on your array will still be there and that can be sorted out once the licence is installed. Have you previously registered this USB flash device? Were you previously running on a trial licence or had you paid for one? If you paid for one, you could copy the licence key file into the config folder on the flash, either from the original email if you can find it or from your backed up config, but don't copy anything else from that backup.
  17. Your settings look good and I've checked that your four selected time servers respond at least to a ping. I'm seeing a lot of this in your syslog:

     Nov 15 16:49:36 WeltoRaid emhttpd: shcmd (2897): /etc/rc.d/rc.ntpd stop
     Nov 15 16:49:36 WeltoRaid root: Stopping NTP daemon...
     Nov 15 16:49:36 WeltoRaid emhttpd: shcmd (2898): date -s '2019-11-15 19:37:16'
     Nov 15 19:37:16 WeltoRaid root: Fri Nov 15 19:37:16 MST 2019
     Nov 15 19:37:16 WeltoRaid emhttpd: shcmd (2899): hwclock --utc --systohc --noadjfile
     Nov 15 19:37:19 WeltoRaid emhttpd: req (6): timeZone=America%2FPhoenix&USE_NTP=no&newDateTime=2019-11-15+19%3A37%3A16&setDateTime=Apply&csrf_token=****************
     Nov 15 19:37:19 WeltoRaid emhttpd: shcmd (2900): ln -sf /usr/share/zoneinfo/America/Phoenix /etc/localtime-copied-from
     Nov 15 19:37:19 WeltoRaid emhttpd: shcmd (2901): cp /etc/localtime-copied-from /etc/localtime
     Nov 15 19:37:19 WeltoRaid emhttpd: shcmd (2902): /usr/local/emhttp/webGui/scripts/update_access
     Nov 15 19:37:19 WeltoRaid sshd[1834]: Received signal 15; terminating.

     Is that you manually setting the time? It appears to be being done via an ssh connection, rather than through the GUI. Each time it happens the ntp daemon is stopped. Normally the daemon runs continuously, gently adjusting the local time to keep it in step with your chosen time servers. It can't do that if it keeps being stopped. You can check your server's time relationship with the chosen time servers (peers) by issuing the ntpq -pn command at the console:

     root@Mandaue:~# ntpq -pn
          remote           refid      st t when poll reach   delay   offset  jitter
     ==============================================================================
      127.127.1.0     .LOCL.          10 l  12d   64    0    0.000    0.000   0.000
     +193.150.34.2    74.156.100.179   2 u  892 1024  377   11.125    3.108   2.431
     *178.79.160.57   85.199.214.98    2 u  637 1024  377   11.877    4.189   3.639
     +129.250.35.251  249.224.99.213   2 u  280 1024  377    9.139    4.037   3.405
     +212.71.248.69   206.189.118.143  3 u  374 1024  377   10.701    3.439   2.243
     root@Mandaue:~#

     If you use watch ntpq -pn instead you can monitor the situation in real time, updated every two seconds. Press CTRL-C to exit. The -p option means show the state of the connections with the peers and the -n option means use IP addresses instead of domain names.

     The five rows are one for the local host and one each for the chosen peers or ntp servers. The one with the * in the first column is the peer to which we are most closely synchronised and the ones with a + are under consideration as potential replacements.

     The column marked 'st' indicates the stratum or layer the peer occupies in relation to other time servers. Left to its own devices our local host would sit in stratum 10 - a rather lowly position, as its internal clock is unreliable. Three of the four peers are in stratum 2 and the fourth is in stratum 3 so by being synchronised with a stratum 2 peer, 178.79.160.57, our local host is elevated to stratum 3, the layer below that peer. The stratum 2 peer will, in turn, be referenced to a stratum 1 peer - probably with a GPS receiver.

     The column marked 'when' shows how many seconds ago the peer was last polled.

     The column marked 'reach' is an important one. The number 377 is an octal value, representing eight bits, all set to 1: 11111111. When the peer is polled those eight bits are shifted one place to the left and the least significant one is replaced with a 1 if the poll was successful, a 0 if it was unsuccessful. So ideally, you would always see 377 in that column, except for a short time after the daemon has started up. Once it has been running for a while anything less than 377 indicates one or more polls that failed to get a response.

     How often is a peer polled? Well, that depends on the column marked 'poll'. The peer won't be polled again until the value in the 'when' column exceeds the value in the 'poll' column.

     So what I would do is stop manually setting the time for a while, open a console or ssh session and run watch ntpq -pn and check for something connecting and remotely setting the time and stopping the ntp daemon.
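To make the 'reach' column a bit more concrete, here's a small sketch of the eight-bit shift register described above. The function names are my own and this only models the arithmetic; it isn't taken from the ntp source.

```python
def reach_bits(reach_octal):
    """Convert the octal string shown in the 'reach' column to 8 binary digits."""
    return format(int(reach_octal, 8), "08b")


def update_reach(reach, success):
    """Shift the register left one place, add the newest poll result, keep 8 bits."""
    return ((reach << 1) | (1 if success else 0)) & 0xFF


if __name__ == "__main__":
    print(reach_bits("377"))  # all eight recent polls answered

    # Start from a freshly started daemon and replay a few poll results.
    reach = 0
    for ok in (True, True, True, False, True):
        reach = update_reach(reach, ok)
        print(format(reach, "o").rjust(3), format(reach, "08b"))
```

A single missed poll shows up as a 0 bit that works its way leftwards through the register over the next eight polls, which is why the value takes a while to get back to 377.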
  18. The trouble is, it doesn't work reliably. Marvell stuff worked well when Unraid used a 32-bit kernel - unfortunately, the compatibility guide is very out of date. The ASM1061 works fine out of the box controlling one or two disks. The ASM1806 is a PCI bridge that lets a card manufacturer hang multiple controllers across a single lane. Here's the original thread that brought the Marvell problem to light. Since then it has just got worse, with the workarounds failing too - if you do a search you'll find that many people have had problems.
  19. That's a nasty combination of a buggy chip and a port multiplier, all on a single PCIe lane. It will cause you a lot of problems as they are known to drop disks at random times. Either get an LSI-based SAS controller (which will need a x8 slot) or use all the motherboard SATA ports and buy an ASMedia 1061 or 1062-based dual port SATA controller for the extras.
  20. This thread might help with the PCIe Bus Error:
  21. You're welcome. Post your diagnostics if it happens again and you need help.
  22. In the BIOS make sure the Power Supply Idle Control setting is changed to Typical Current Idle, not the default Low Current Idle. It can be tricky to find, so look for Advanced -> AMD CBS -> Power Supply Idle Control.
  23. If you click on the blue "Disk 10" text you'll open that disk's Settings page. One of the items is "File system type". The easiest way to format the disk is to change it to something else and then change it back again. To change it you'll need to stop the array first. When you restart the array the disk will be formatted, with appropriate warnings. So: stop, change file system type, start, wait for format, stop, change file system type back again, start, wait for format. However, the system shouldn't hang like you described. You may well have some underlying problem that needs fixing.
  24. I'm a big fan of simple solutions. We could have tried to unpick the changes you made to the XML that you posted but it's better to let the computer do the work when it can.