Everything posted by tbone

  1. I'm not sure if this is the cause, but I seem to recall hitting a similar problem with an older version of unRAID. In my case, it appeared that docker had a problem mapping directories from volumes that didn't exist yet when the docker service was started. Stopping and restarting the docker service (instead of just the container in question) fixed it for me; rough commands are below.
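
     A minimal sketch of that restart from the console, assuming unRAID's Slackware-style init script (toggling Enable Docker under Settings > Docker in the web UI amounts to the same thing):

         # stop and restart the Docker service itself, not just the one container
         /etc/rc.d/rc.docker stop
         /etc/rc.d/rc.docker start
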
  2. Well, I just turned off the C-states in the BIOS to test it like that for a while. Offhand, it looks like the system draws the same power at idle as before, so there's no downside there.
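
     If it's useful, this is a quick way to confirm what the kernel still sees after a BIOS change like that; it's standard Linux cpuidle sysfs, nothing unRAID-specific, and the state names vary by platform:

         # list the idle states the cpuidle driver exposes for CPU 0
         grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
         # a "1" in these files means the corresponding state is disabled
         grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/disable
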
  3. Interesting; it looks like that is enabled by default on this board, so I can try turning it off. Were you also using an X8DTH board?

     Meanwhile, I did further testing: the machine was sitting idle with the array stopped, with the unRAID dashboard page open, while I pinged it every second (see the sketch below). It failed after about 13 hours, so I pulled all the PCIe cards out of it (no GPUs or extra HBAs) and booted without plugins; it then failed after about 5 hours. Next I turned off bridging and bonding of the NICs and made no other changes. It ran for 3.5 days, at which point unRAID 6.9.2 was released. I updated to that, put the cards back in, and turned the bridging/bonding settings back on. It's still up after almost 3 days so far.

     So it could still be random and intermittent, but it is tracking with those network features. That seems weird, since most reports of that bug dealt with docker or KVM bridging interfaces, but who knows. If it stays up for another day or so, I'll try switching services over to the new board and try it out for real.
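
     For the ping watch, something like this timestamps each reply so the time of the hang is obvious afterwards (tower2 is just a stand-in for the server's hostname or IP):

         # one ping per second, each reply prefixed with a timestamp
         ping -i 1 tower2 | while read -r line; do echo "$(date '+%F %T') $line"; done | tee ping.log
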
  4. I recently picked up a Supermicro X8DTH-6 system to upgrade my unRAID server, and I'm running into problems where it locks up within a few days. No events get logged by IPMI, nothing gets sent to my syslog server, and nothing shows up on the console. It's unresponsive over the network and at the console when this happens. I've tested maxing out the CPU cores for a while, and sensors show no problems with heat or the power supply. The last time this happened, I tried the Alt-SysRq trick (see the sketch below) to get more diagnostic info out of it, but that didn't seem to do anything.

     I was hopeful after reading the thread about kernel panics caused by a bug in the macvlan driver, but it looked like that maybe only (or mostly?) affected people running docker containers with static IPs on br0. This machine isn't running docker or VMs yet, because there's no license on this flash drive (my licensed copy is running on a temporary server while I try to get this sorted out). The machine is obviously idle, but it's still hanging every couple of days.

     The machine itself is an X8DTH-6 with dual Xeon E5645 CPUs, 96GB of ECC DDR3 RAM, a built-in LSI 2008-based SAS HBA, and an additional LSI 2008-based HBA in a PCIe slot. Both are flashed to IT mode with the most recent firmware, and each is connected to a SAS expander (one LSI, one HP) with some random drives attached for testing. There's an Nvidia Quadro P400, a GeForce 750 Ti, and a generic USB3 card in it. It has two built-in NICs running in active-backup mode (besides the dedicated IPMI NIC), and an IPMI BMC with remote KVM; the sensors all look okay.

     I'm not sure what the next step should be here. I can try rolling back to 6.8, turning off bridging on the NIC(s), or installing a separate copy of Linux onto a drive and running various burn-in tests. Am I missing something?

     tower2-diagnostics-20210405-0845.zip
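
     For reference, this is roughly what I mean by the Alt-SysRq trick; the sysctl just makes sure the magic SysRq keys are allowed, and the same requests can be sent without a keyboard through /proc/sysrq-trigger (standard kernel behavior, not unRAID-specific):

         # allow all magic SysRq functions
         sysctl kernel.sysrq=1
         # at the console: Alt+SysRq+t dumps task states, Alt+SysRq+w shows blocked tasks
         # the same requests can be written to /proc/sysrq-trigger, e.g. a task dump to the kernel log:
         echo t > /proc/sysrq-trigger
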
  5. I noticed some stuff on my cache drive that was unexpected, and decided to track it down. First, if you have a file in a share that's cache-only and move it to a share that's no-cache, it leaves the file on the cache drive. For example:

         /mnt/user/Downloads == cache-only share
         /mnt/user/Books     == no-cache share

         mv /mnt/user/Downloads/foo /mnt/user/Books/

     will move /mnt/cache/Downloads/foo to /mnt/cache/Books/foo. By itself that wouldn't be a big deal, but it looks like the mover is specifically written to look for files in shares that are cache:yes, so it never gets around to moving those files off of the cache drive.
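
     An easy way to see whether that has happened on your system is to look for files under /mnt/cache for a share that shouldn't be using the cache at all (the share name here is just the one from my example):

         # any hits are files stranded on the cache for a no-cache share
         find /mnt/cache/Books -type f
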
  6. I'm not blaming unRAID for the clone disk having the same UUID, but it's not that bizarre a situation. The issue is that a cloned UUID on an unrelated spare disk broke unRAID's process for adding a cache drive, in a way that could have caused some real problems. I would just suggest that they sanity-check the output of the btrfs command before using it as the input to another btrfs operation; a sketch of what I mean is below.
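
     As an illustration (not what emhttpd actually runs), pulling out just the uuid field instead of taking the Nth word of the first line keeps a stray WARNING or ERROR line from ending up inside the next command; /dev/sdh1 here is the cache device from my earlier post:

         # grab only the filesystem UUID, ignoring any warnings sent to stderr
         UUID=$(btrfs filesystem show /dev/sdh1 2>/dev/null | sed -n 's/.*uuid: \([0-9a-f-]*\).*/\1/p' | head -n1)
         # blkid is an even simpler cross-check
         blkid -s UUID -o value /dev/sdh1
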
  7. I had a weird failure a while back when I tried to add a second drive to my cache pool, but I didn't have a chance to really track it down until today. The initial symptom was that I stopped the array, added the (blank) drive to the cache pool, and after starting the array, the main page of the web interface reported that both drives were in a pool, but no balance operation was happening, and no IO was happening with the new drive. My existing cache volume was /dev/sdh1, and I was trying to add /dev/sdb1 to that. The weird thing was what was in the logs for that operation:

         Jul 8 16:59:26 Tower emhttpd: shcmd (8880): /sbin/btrfs replace start /dev/sdq1 gen 1540 but found an existing device /dev/sdp1 gen 2898 /dev/sdb1 /mnt/cache &

     sdp1 and sdq1 have no relation to the cache drive; sdp1 is an existing drive in the array, and sdq1 is a spare drive. My guess is that unRAID is parsing the output of a btrfs command (like "btrfs filesystem show") to obtain a volume UUID, but it was confused by the extra warning displayed in this case:

         root@Tower:~# btrfs fi show /dev/sdp1
         WARNING: adding device /dev/sdq1 gen 1540 but found an existing device /dev/sdp1 gen 2898
         ERROR: cannot scan /dev/sdq1: File exists
         Label: none  uuid: 62f74f7e-1c39-4371-8e82-93c52566224a
                 Total devices 1 FS bytes used 2.70TiB
                 devid    1 size 2.73TiB used 2.73TiB path /dev/md13

     The warning was there because sdq is a leftover 2TB data disk that was replaced with sdp (3TB), so both drives had the same btrfs UUID. Note that if you were trying to obtain the UUID of a drive, you could normally take the first line of the output and discard the first three words to get it. But in this case, instead of a UUID, you get the last chunk of that warning message, which it then included in the btrfs replace command.

     Once I figured out what was going on, I reset the UUID on sdq ("btrfstune -u /dev/sdq1"), stopped the array, removed the second drive, started it, stopped it, and re-added the second drive. The command was issued correctly, and it's now mirroring like a champ.

     tower-diagnostics-20190709-0143.zip
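
     If anyone wants to check for the same trap before adding a pool member, listing the btrfs UUIDs on all devices and looking for duplicates is enough; this is plain blkid, nothing unRAID-specific:

         # list every btrfs device with its filesystem UUID
         blkid -t TYPE=btrfs -s UUID
         # any UUID printed more than once is a leftover clone that btrfs will trip over
         blkid -t TYPE=btrfs -s UUID -o value | sort | uniq -d
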
  8. I just poked at the "ERROR: List of process IDs must follow -p." issue, and found the problem. On line 460 of the cache_dirs script, there's a typo. Replace "/ver/run/mover.pid" with "/var/run/mover.pid". That should clear it up.
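
     If you'd rather patch it from the shell, a one-liner like this works (substitute the real path to wherever the cache_dirs script lives on your system):

         # fix the /ver -> /var typo in place
         sed -i 's|/ver/run/mover.pid|/var/run/mover.pid|' /path/to/cache_dirs
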