tbone

Members
  • Posts: 15

  1. I'm not sure if this is the same issue, but I seem to recall hitting a similar problem with an older version of unRAID. In my case, it appeared that docker had a problem mapping directories from volumes that didn't exist when the docker service was started. Stopping and restarting the docker service (instead of just the container in question) fixed it for me (one way to do that from the shell is sketched after this list).
  2. Well, I just turned off the C-States in the BIOS to test it like that for a while. Offhand it looks like the system has the same power consumption when idle as before, so there's not a downside there.
  3. Interesting; it looks like that is enabled by default on this board, so I can try turning it off. Were you also using an X8DTH board? Meanwhile, I did further testing: the machine was sitting idle with the array stopped and the unRAID dashboard page open, while I pinged it every second (a simple monitoring loop along those lines is sketched after this list). It failed after about 13 hours, so I pulled all the PCIe cards out of it (no GPUs or extra HBAs) and booted without plugins. Then it failed after about 5 hours. Next I turned off bridging and bonding of the NICs and made no other changes. It ran for 3.5 days, at which point unRAID 6.9.2 was released. I updated to that, put the cards back in, and turned the bridging/bonding settings back on. It's still up after almost 3 days so far. So it could still be random and intermittent, but it is tracking with those network features. It seems weird, since most reports of that bug dealt with docker or KVM bridging interfaces, but who knows. I think if it stays up another day or so, I'll try switching services over to the new board and try it out for real.
  4. I recently picked up a Supermicro X8DTH-6 system to upgrade my unRAID server, and I'm running into problems where it locks up within a few days. No events get logged by IPMI, nothing gets sent to my syslog server, and nothing shows up on the console. It's unresponsive over the network or via the console when this happens. I've tested maxing out the CPU cores for a while, and the sensors show no problems with heat or the power supply. The last time this happened, I tried the Alt-SysRq trick to see if I could get more diagnostic info out of it, but that didn't seem to do anything (a SysRq sketch appears after this list). I was hopeful after reading the thread about kernel panics caused by a bug in the macvlan driver, but it looked like that maybe only (or mostly?) affected people running docker containers with static IPs on br0. This machine isn't running docker or VMs yet, because there's no license on this flash drive (my licensed copy is running on a temporary server while I try to get this sorted out). The machine is obviously idle, but it's still hanging every couple of days.
     The machine itself is an X8DTH-6 with dual Xeon E5645 CPUs, 96GB of ECC DDR3 RAM, a built-in LSI 2008-based SAS HBA, and an additional LSI 2008-based HBA in a PCIe slot. Both HBAs are flashed to IT mode with the most recent firmware, and each is connected to a SAS expander (one LSI, one HP) with some random drives attached for testing. There's an Nvidia Quadro P400, a GeForce 750 Ti, and a generic USB3 card in it. It has two built-in NICs running in active-backup mode (besides the dedicated IPMI NIC), and an IPMI BMC with remote KVM; the sensors all look okay.
     I'm not sure what the next step should be here. I can try rolling back to 6.8, turning off bridging on the NIC(s), or installing a separate copy of Linux onto a drive and running various burn-in tests. Am I missing something? tower2-diagnostics-20210405-0845.zip
  5. I noticed some unexpected files on my cache drive, and decided to track it down. First, if you have a file in a share that's cache-only and move it to a share that's no-cache, the file stays in the cache directory. e.g.:
     /mnt/user/Downloads == cache-only share
     /mnt/user/Books == no-cache share
     'mv /mnt/user/Downloads/foo /mnt/user/Books/' will move /mnt/cache/Downloads/foo to /mnt/cache/Books/foo
     By itself that wouldn't be a big deal, but it looks like the mover is specifically written to look for files in shares that are cache:yes, so it never gets around to moving those files off of the cache drive (see the sketch after this list for a way to spot the strays).
  6. I'm not blaming unRAID for the cloned disk having the same UUID, but it's not that bizarre a situation. The issue is that a duplicate UUID on an unrelated spare disk broke unRAID's process for adding a cache drive, in a way that could have caused some real problems. I would just suggest that they sanity-check the output of the btrfs command before using it as the input for another btrfs operation (a sketch of one way to do that follows this list).
  7. I had a weird failure a while back when I tried to add a second drive to my cache pool, but I didn't have a chance to really track it down until today. The initial symptom was that I stopped the array, added the (blank) drive to the cache pool, and after starting the array, the main page of the web interface reported that both drives were in a pool, but no balance operation was happening, and no IO was happening with the new drive. My existing cache volume was /dev/sdh1, and I was trying to add /dev/sdb1 to that. The weird thing was what was in the logs for that operation:
        Jul 8 16:59:26 Tower emhttpd: shcmd (8880): /sbin/btrfs replace start /dev/sdq1 gen 1540 but found an existing device /dev/sdp1 gen 2898 /dev/sdb1 /mnt/cache &
     sdp1 and sdq1 have no relation to the cache drive; sdp1 is an existing drive in the array, and sdq1 is a spare drive. My guess is that unraid is parsing the output of a btrfs command (like "btrfs filesystem show") to obtain a volume UUID, but it was confused by the extra warning displayed in this case:
        root@Tower:~# btrfs fi show /dev/sdp1
        WARNING: adding device /dev/sdq1 gen 1540 but found an existing device /dev/sdp1 gen 2898
        ERROR: cannot scan /dev/sdq1: File exists
        Label: none  uuid: 62f74f7e-1c39-4371-8e82-93c52566224a
                Total devices 1 FS bytes used 2.70TiB
                devid 1 size 2.73TiB used 2.73TiB path /dev/md13
     The warning was there because sdq is a leftover 2TB data disk that was replaced with sdp (3TB), so both drives had the same btrfs UUID. Note that if you were trying to obtain the UUID of a drive, you could normally take the first line of the output and discard the first three words to get it. But in this case, instead of a UUID, you get the last chunk of that warning message, which it then included in the btrfs replace command. Once I figured out what was going on, I reset the UUID on sdq ("btrfstune -u /dev/sdq1"), stopped the array, removed the second drive, started it, stopped it, and re-added the second drive. The command was issued correctly, and it's now mirroring like a champ. tower-diagnostics-20190709-0143.zip
  8. Actually, I guess the simplest thing at this point would be to toss a couple more drives on the expander, and try the test with and without the second SAS cable connecting it.
  9. Well, the combined raw speed of the drives that are attached to it at the moment is around 1250MB/s, and it doesn't seem to have a problem reaching that, so that's a good sign. That said, I thought the v1.52 firmware that added support for SATA-2 drive speeds (on the green HP expanders) also added support for 6G SAS links. In that case, wouldn't the single-link performance between that and the LSI card be closer to 2400MB/s? (Rough numbers for both cases are sketched after this list.)
  10. I've been running some of my drives in an external chassis, connected via an LSI 9200-8e HBA. I just picked up an HP SAS Expander, partly so I'd be able to put more than 8 drives in that chassis, and partly so that the external SAS cables would be using SAS signaling rather than SATA. I have both ports on the LSI connected to the expander, in hopes of dual-linking it for greater bandwidth. So far everything's working, but I don't see any sign that the thing is dual-linked: the expander only shows up once in lsscsi, and nothing in the logs indicates a link speed or number of lanes. Right now the number of drives that are connected wouldn't necessarily exceed the bandwidth of a single lane, so I can't just hammer on everything and benchmark to see whether it's working. Is there anything I can check to verify that it's doing what it should be doing? (One possible sysfs check is sketched after this list.)
  11. I don't suppose there's a way to bring up the array in a degraded state, without a replacement for the other dead disk? I have other drives that I could copy files over to, but not one of the same size.
  12. My server just experienced some trauma while moving, and two of my drives are toast. I happen to have a current dd clone of one of those disks (whew!), but I haven't been able to find a way to convince unRAID that the cloned disk is to be trusted, so that the array can start and I can get the contents off of the other dead drive. The management interface just says "WRONG DISK". I can't just run initconfig if I'm still down a drive. Any ideas?
  13. I just poked at the "ERROR: List of process IDs must follow -p." issue, and found the problem. On line 460 of the cache_dirs script, there's a typo: replace "/ver/run/mover.pid" with "/var/run/mover.pid". That should clear it up (a one-line fix is sketched after this list).
  14. So I've just started using unRAID in the last week, and I'm running 4.5 beta 6, which appears to include kernel 2.6.29.1. Does anybody know if this means it's safe to use with an LSI 1068E SAS controller? (A quick driver check is sketched after this list.)
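
Re the docker-service restart in item 1: on unRAID 6 the service can be toggled from Settings -> Docker in the web interface, or from a shell via a Slackware-style init script. The script path below is an assumption about how unRAID packages docker; adjust it to wherever your release keeps it.

    # Restart the whole docker service, not just one container, so volume
    # mappings get re-evaluated. The path is an assumption for unRAID 6.
    /etc/rc.d/rc.docker stop
    /etc/rc.d/rc.docker start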
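
Re the idle testing in item 3: a minimal sketch of the once-per-second ping monitor described there. The address and log path are placeholders, not taken from the post.

    #!/bin/bash
    # Ping the test box every second and log timestamped up/down transitions.
    HOST=192.168.1.10              # placeholder address of the machine under test
    LOG=/tmp/uptime-test.log
    state=up
    while true; do
        if ping -c 1 -W 2 "$HOST" >/dev/null 2>&1; then
            new=up
        else
            new=down
        fi
        if [ "$new" != "$state" ]; then
            echo "$(date '+%F %T') $HOST is $new" >> "$LOG"
            state=$new
        fi
        sleep 1
    done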
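
Re the Alt-SysRq attempt in item 4: if the magic SysRq keys do nothing, they may simply be disabled. This sketch uses the standard Linux procfs interfaces and assumes you can run it from a shell before the machine wedges completely.

    # Enable all SysRq functions for the current boot.
    echo 1 > /proc/sys/kernel/sysrq

    # The same actions can be triggered from a shell via procfs, e.g. dump
    # blocked tasks ('w') and memory info ('m') into the kernel log:
    echo w > /proc/sysrq-trigger
    echo m > /proc/sysrq-trigger
    dmesg | tail -n 50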
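
Re item 5: a quick way to spot files stranded on the cache under a cache:no share, plus a copy-then-delete workaround that sidesteps the in-place rename. The share names are the ones used in that post, and the behavior relied on here is as described there, not verified independently.

    # List anything sitting on the cache drive under the no-cache share.
    find /mnt/cache/Books -type f -print

    # Since 'mv' within /mnt/user keeps the file on the same underlying device
    # (per the observation above), copy across and then remove the source so
    # the new file lands according to the destination share's cache setting.
    cp -a /mnt/user/Downloads/foo /mnt/user/Books/ && rm /mnt/user/Downloads/foo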
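
Re the sanity-check suggested in item 6: a sketch of pulling the UUID from "btrfs filesystem show" by matching the line that actually contains it, then validating the result before handing it to another btrfs command. The device name is just the cache device mentioned in item 7.

    # Extract the uuid from whichever line carries it, instead of assuming it's
    # the first line of output (warnings can push it down).
    uuid=$(btrfs filesystem show /dev/sdh1 2>/dev/null \
           | sed -n 's/.*uuid: \([0-9a-f-]\{36\}\).*/\1/p' | head -n 1)

    # Refuse to continue unless it actually looks like a UUID.
    if [[ "$uuid" =~ ^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$ ]]; then
        echo "cache pool uuid: $uuid"
    else
        echo "could not parse a valid UUID from btrfs output; aborting" >&2
        exit 1
    fi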
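
Re the numbers in item 9: rough back-of-the-envelope ceilings for a single x4 wide port, assuming 8b/10b encoding (roughly 300MB/s per lane at 3Gb/s and 600MB/s per lane at 6Gb/s). Whether the HP expander's uplink actually negotiates 6G is exactly the open question in that post.

    # Theoretical single-link (x4) throughput after 8b/10b encoding overhead.
    echo "3G x4 link: $((4 * 300)) MB/s"   # 1200 MB/s, close to the ~1250MB/s observed
    echo "6G x4 link: $((4 * 600)) MB/s"   # 2400 MB/s, what a 6G single link should allow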
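
Re item 10: one thing to check is the SAS transport class in sysfs, which reports a negotiated link rate per PHY; with a dual-linked expander you'd expect eight host-side PHYs up instead of four. Exact PHY names vary by controller, so this just dumps them all.

    # Negotiated link rate for every SAS PHY the kernel knows about.
    grep -H . /sys/class/sas_phy/phy-*/negotiated_linkrate

    # Expanders registered with the SAS transport layer show up here.
    ls /sys/class/sas_expander/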
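
Re the cache_dirs typo in item 13: the same fix as a one-liner. The script path here is an assumption; point it at wherever your copy of cache_dirs lives.

    # Fix the /ver -> /var typo in place, keeping a .bak copy of the original.
    sed -i.bak 's|/ver/run/mover.pid|/var/run/mover.pid|' /boot/cache_dirs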
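
Re item 14: the SAS1068E is normally handled by the Fusion-MPT mptsas driver, which kernels of that era include. A quick check on a running box, using generic Linux commands rather than anything unRAID-specific; if the driver is built in rather than a module, modinfo may come up empty, but lspci -k should still show the driver in use.

    # Is the Fusion-MPT SAS module available for this kernel?
    modinfo mptsas | head -n 5

    # Did a driver actually claim the controller?
    lspci -k | grep -A 3 -i sas1068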