Everything posted by gizmo000

  1. Looks like the re-import is covered here ... does the export save anything I need to keep track of, or do I just need to make sure the export command gets run? The only other thing I changed was using /dev/mapper/sdX1 (Y1) paths, because I had the drives encrypted. The zpool is doing its thing and resilvering now! Thanks so much @JorgeB for the assistance here ... I was about to lose my mind.
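     For anyone who lands here later, the export/re-import sequence being discussed boils down to something like the following sketch; `mypool` is my pool name (from the status output further down), and the `-d /dev/mapper` part is only there because my drives are LUKS-encrypted:

     ```bash
     # Export the pool cleanly before changing assignments in the Unraid GUI
     zpool export mypool

     # Re-import, telling ZFS to scan the LUKS-mapped partitions
     zpool import -d /dev/mapper mypool

     # Confirm the pool state afterwards
     zpool status mypool
     ```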
  2. So I have been playing with this overnight ... I "wrote" a new config using the `New Config` tool (pool devices only) and assigned the same cache disks to the same slots, with the exception of my two failed drives. `zpool import -d /dev/mapper` (and plain `zpool import`, for that matter) shows my two degraded vdevs with most disks online:

     ```
        pool: mypool
          id: [redacted]
       state: DEGRADED
      status: One or more devices contains corrupted data.
      action: The pool can be imported despite missing or damaged devices.
              The fault tolerance of the pool may be compromised if imported.
         see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
      config:

             mypool                    DEGRADED
               raidz2-0                ONLINE
                 sdaw1                 ONLINE
                 sdax1                 ONLINE
                 sday1                 ONLINE
                 sdaz1                 ONLINE
                 sdba1                 ONLINE
                 sdbb1                 ONLINE
                 sdbc1                 ONLINE
                 sdbd1                 ONLINE
                 sdbe1                 ONLINE
                 sdbf1                 ONLINE
                 sdbg1                 ONLINE
                 sdbh1                 ONLINE
                 sdbi1                 ONLINE
                 sdbl1                 ONLINE
                 sdbm1                 ONLINE
               raidz2-1                ONLINE
                 sde1                  ONLINE
                 sdf1                  ONLINE
                 sdg1                  ONLINE
                 sdh1                  ONLINE
                 sdi1                  ONLINE
                 sdj1                  ONLINE
                 sdk1                  ONLINE
                 sdl1                  ONLINE
                 sdm1                  ONLINE
                 sdn1                  ONLINE
                 sdo1                  ONLINE
                 sdp1                  ONLINE
                 sdq1                  ONLINE
                 sdr1                  ONLINE
                 sds1                  ONLINE
               raidz2-2                DEGRADED
                 sdt1                  ONLINE
                 sdu1                  ONLINE
                 sdv1                  ONLINE
                 sdw1                  ONLINE
                 sdx1                  ONLINE
                 sdy1                  ONLINE
                 12281917106237315780  UNAVAIL  invalid label
                 sdaa1                 ONLINE
                 sdab1                 ONLINE
                 sdac1                 ONLINE
                 sdae1                 ONLINE
                 sdag1                 ONLINE
                 sdaf1                 ONLINE
                 sdah1                 ONLINE
                 sdai1                 ONLINE
               raidz2-3                DEGRADED
                 sdad1                 ONLINE
                 sdaj1                 ONLINE
                 sdak1                 ONLINE
                 sdal1                 ONLINE
                 sdam1                 ONLINE
                 sdan1                 ONLINE
                 sdao1                 ONLINE
                 sdap1                 ONLINE
                 sdaq1                 ONLINE
                 sdar1                 ONLINE
                 sdas1                 ONLINE
                 11665832322838174263  FAULTED  corrupted data
                 sdat1                 ONLINE
                 sdau1                 ONLINE
                 sdav1                 ONLINE
     ```

     The array comes online now, which is great ... however, in the GUI the size, used, and free columns for all my zpool cache drives show `Unmountable: Unsupported or no file system`. Also, at the bottom of the Main page, Unraid is asking to format all of those pool drives (since the filesystem is detected as unsupported). If I assign 'no device' to both failed slots, the behavior is the same: unsupported or no file system (but the main array still starts). The `invalid label` comes from a manual `cryptsetup luksFormat /dev/sdz1` that I ran while trying to get Unraid to recognize that disk in earlier experimenting.
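     Assuming the import goes through in this degraded state, my understanding (from the OpenZFS docs, so treat this as a sketch rather than gospel) is that the two bad members can then be swapped out by their numeric GUIDs; the /dev/mapper names below are placeholders for whatever the new LUKS devices end up being called:

     ```bash
     # Replace the UNAVAIL member of raidz2-2, referring to it by the GUID shown above
     zpool replace mypool 12281917106237315780 /dev/mapper/<new-luks-device>

     # Same idea for the FAULTED member of raidz2-3, then watch the resilver
     zpool replace mypool 11665832322838174263 /dev/mapper/<other-new-luks-device>
     zpool status -v mypool
     ```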
  3. It seems that even with two new disks installed, I cannot start the array to resilver. Whenever I try to start my array now (with either one or both new disks assigned in the ZFS pool), I get "too many wrong or missing devices." My main Unraid array will not start either. I'm dead in the water and starting to stress out... Since I have two drive failures, there is no way for me to get back down to just one wrong or missing pool drive. If there's a way to do this via the command line, I'm comfortable doing so, but I don't want to jeopardize the 300+ TiB I have on there right now. Any recommendations on how to proceed properly to restore my zpool? galileo-diagnostics-20231107-1454.zip
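     For what it's worth, while I wait for advice, the most cautious command-line check I can think of (straight from the OpenZFS docs, nothing Unraid-specific) would be a read-only import, just to confirm the data is intact without writing anything:

     ```bash
     # List what ZFS can see among the LUKS-mapped partitions (makes no changes)
     zpool import -d /dev/mapper

     # Import read-only to verify the datasets are reachable, then export again
     zpool import -d /dev/mapper -o readonly=on mypool
     zpool export mypool
     ```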
  4. Okay, cool, thanks for the quick replies. My second drive failed after the reboot (separate vdev), so Unraid is preventing me from starting anything (due to 2 missing disks in the zpool).
  5. Jorge, thank you for referencing that post, but the troubleshooting there is from 2016 ... surely Unraid's ZFS implementation has improved its replacement strategy by now? I recently experienced 2 HDD failures in a 60-drive pool (4 vdevs @ 15 drives each, RAIDZ2). Luckily, the failures occurred in separate vdevs. However, from the looks of the post you referenced, if the 2 failures had been in the same vdev I'd have no way to replace/rebuild that vdev, even though RAIDZ2 is supposed to provide exactly that redundancy. What is the point of this implementation if we cannot recover from 2 failures in a single vdev with RAIDZ2?

     Also, I happened to notice that when I removed one failed HDD from one of my vdevs, the zpool kept functioning, albeit in a degraded state. I am waiting for replacement HDDs to arrive, and I was forced to reboot my server (the array would not stop on its own so I could pull the failed disks). But now I cannot even start the array (the Unraid array, or specifically the zpool), because I'm missing 1 disk from a vdev that has RAIDZ2 redundancy. Shouldn't I be able to start the zpool in a degraded state? This is more than a little inconvenient, because now I cannot start anything until the replacement disks arrive. I would think we should at least be able to start the zpool degraded ... and certainly the Unraid array, but I cannot seem to do that either while waiting for the zpool replacement disks.
  6. 'latest' image still refuses to start up:

     ```
     2023-10-02 11:05:38,666 INFO Set uid to user 0 succeeded
     2023-10-02 11:05:38,670 INFO supervisord started with pid 16
     2023-10-02 11:05:39,673 INFO spawned: 'dbus' with pid 17
     2023-10-02 11:05:39,675 INFO spawned: 'avahi-daemon' with pid 18
     2023-10-02 11:05:39,677 INFO spawned: 'squeezeboxserver' with pid 19
     2023-10-02 11:05:39,688 INFO exited: avahi-daemon (exit status 255; not expected)
     2023-10-02 11:05:39,789 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-10-02 11:05:40,791 INFO success: dbus entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
     2023-10-02 11:05:40,794 INFO spawned: 'avahi-daemon' with pid 21
     2023-10-02 11:05:40,797 INFO spawned: 'squeezeboxserver' with pid 22
     2023-10-02 11:05:40,907 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-10-02 11:05:42,718 INFO success: avahi-daemon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
     2023-10-02 11:05:43,722 INFO spawned: 'squeezeboxserver' with pid 24
     2023-10-02 11:05:43,841 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-10-02 11:05:46,847 INFO spawned: 'squeezeboxserver' with pid 26
     2023-10-02 11:05:46,966 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-10-02 11:05:47,968 INFO gave up: squeezeboxserver entered FATAL state, too many start retries too quickly
     ```
  7. I ran this LMS container for years under 6.9 and recently upgraded Unraid to 6.12. Ever since switching to 6.12, I've had the "persistent process restarting" issue, and LMS refuses to start and present itself to my Logitech devices. I tried chmod'ing and chown'ing the directories in the data folder to nobody:users and 755, as recommended earlier in this forum, but nothing seems to work. The most recent update, pushed 3 days ago, now presents another interesting problem ...

     ```
     docker run -d --name='LogitechMediaServer-latest' --net='proxynet' -e TZ="America/New_York" -e HOST_OS="Unraid" -e HOST_HOSTNAME="myunraid" -e HOST_CONTAINERNAME="LogitechMediaServer-latest" -e 'test'='yes' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:9000]/' -l net.unraid.docker.icon='https://i.imgur.com/PoUuA3k.png' -p '3483:3483/tcp' -p '3483:3483/udp' -p '5354:5353/udp' -p '9000:9000/tcp' -p '9090:9090/tcp' -v '/mnt/user/music/SBox Server/My Music/':'/music':'rw' -v '/var/run/dbus':'/var/run/dbus':'rw' -v '/mnt/user/appdata/LogitechMediaServer-latest':'/config':'rw' 'snoopy86/logitechmediaserver'
     9454622d76ca5c50871e83875238fee4c172867918cae0ebcaefeb15e87e2679
     ```

     But the container never shows up either ... logs:

     ```
     usermod: no changes
     2023-09-17 11:54:36,230 INFO Set uid to user 0 succeeded
     2023-09-17 11:54:36,234 INFO supervisord started with pid 17
     2023-09-17 11:54:37,238 INFO spawned: 'dbus' with pid 18
     2023-09-17 11:54:37,241 INFO spawned: 'avahi-daemon' with pid 19
     2023-09-17 11:54:37,245 INFO spawned: 'squeezeboxserver' with pid 20
     2023-09-17 11:54:37,392 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-09-17 11:54:38,252 INFO success: dbus entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
     2023-09-17 11:54:38,252 INFO success: avahi-daemon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
     2023-09-17 11:54:39,256 INFO spawned: 'squeezeboxserver' with pid 22
     2023-09-17 11:54:39,378 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-09-17 11:54:41,383 INFO spawned: 'squeezeboxserver' with pid 24
     2023-09-17 11:54:41,502 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-09-17 11:54:44,508 INFO spawned: 'squeezeboxserver' with pid 26
     2023-09-17 11:54:44,629 INFO exited: squeezeboxserver (exit status 0; not expected)
     2023-09-17 11:54:45,631 INFO gave up: squeezeboxserver entered FATAL state, too many start retries too quickly
     ```

     Not sure what else to try, as this container worked perfectly for so long. I'm also not sure what is using 5353, as the original docker run showed a conflict (but nothing was listed in the utilization list). I also thought about rolling back to a previous version, but I can only seem to find the 'latest' tag on Docker Hub, with no way to select a previous image.
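     In case it helps anyone hitting the same conflict: 5353/udp is the standard mDNS port, so on the Unraid host it is most likely avahi-daemon holding it (which would explain why the template maps 5354:5353 instead). A quick way to confirm from the host console, assuming the usual tools are available:

     ```bash
     # Show which process is listening on UDP 5353 on the host
     ss -ulpn | grep 5353

     # Alternative, if you prefer lsof
     lsof -i UDP:5353
     ```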
  8. I spoke too soon ... after another reboot, the WebUI came up with the appropriate IP address. Diagnostics are attached. diagnostics-20230316-1649.zip
  9. FWIW, the ASRock BIOS/splash screen seems to hang with "Update FRU System device..." before booting Unraid. Not finding much on this error either...
  10. Actually, with the 10GbE NIC installed and eth0 properly assigned (::5c is the HP 4-port Ethernet NIC):

      ```
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5c", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5d", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5e", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5f", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
      # PCI device 0x8086:0x1563 (ixgbe)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="d0::18", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
      # PCI device 0x8086:0x1563 (ixgbe)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="d0::19", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth5"
      # PCI device 0x8086:0x10fb (ixgbe)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="80::08", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth6"
      # PCI device 0x8086:0x10fb (ixgbe)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="80::09", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth7"
      ```

      Unraid is still failing to assign an IP on the ::5c MAC (eth0) interface. I wish I could get an IP so I could get you some diagnostics.
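      As a sanity check on my end (standard iproute2/ethtool commands, nothing Unraid-specific), I've been verifying the name-to-driver mapping from the local console like this:

      ```bash
      # One line per interface: name, state, MAC
      ip -br link show

      # Confirm which kernel driver is bound to the interface Unraid calls eth0
      ethtool -i eth0
      ```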
  11. Turns out my boot order was mangled in one of the reboots ... no idea why or how that would occur without me changing it. So the server was just sitting there not knowing what to boot. My BIOS won't seem to pick the right boot drive (it tries to boot UEFI, but I boot Unraid in legacy mode). If I boot it manually, it starts just fine. So I'll have to dig into the BIOS settings to see if I can't force it to skip the UEFI checks (I didn't see any obvious way to do that at first glance). It's an ASRock Rack ROMED8-2T. So, I tried to re-install the 10GbE adapter, which uses the `ixgbe` driver that I had noticed wouldn't get IP addresses in Unraid before. I already have an identical NIC in a workstation running Linux (ixgbe) that picks up an IP address just fine. Is there a known bug in Unraid that prevents ixgbe from obtaining IP addresses?
  12. Link lights show up on the NIC, but IP addresses never register with my router. Ironically, when I had the 10GbE NIC plugged in, the same behavior occurred ... I would get a good link, but no IP address. I have over 50 devices on the network, so I know DHCP is working properly ... swapping the NIC hardware was the only change I made to the Unraid server.
  13. In one of my iterations, I had deleted both `network.cfg` and `network-rules.cfg`, but the system still failed to respond to anything (never showed up on my network).
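      For completeness, this is the reset I was attempting; the flash mount point is a placeholder for wherever the drive shows up when plugged into another machine:

      ```bash
      # With the Unraid flash drive mounted on another computer
      # (these files live under config/ on the flash, i.e. /boot/config/ while Unraid is running)
      rm /path/to/flash/config/network.cfg
      rm /path/to/flash/config/network-rules.cfg

      # Unraid recreates both files on the next boot and renumbers ethN from scratch.
      ```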
  14. Please ... I just want to get my server back to its working state, but it currently won't boot. I was trying to upgrade the NIC to 10GbE (which obviously failed). Now, after removing the new card and re-inserting the old HP 4-port NIC, which had always worked, Unraid will no longer boot. I've tried deleting /conf/network.cfg ... and I modified /conf/network-rules.cfg to:

      ```
      # PCI device 0x8086:0x1563 (ixgbe)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="d0::18", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
      --
      # PCI device 0x8086:0x1563 (ixgbe)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="d0::19", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth5"
      --
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5c", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5d", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5e", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
      # PCI device 0x8086:0x150e (igb)
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ac::5f", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
      ```

      The first two are my motherboard NICs ... and they never worked. The remaining four belong to the HP 4-port NIC, which always did work, except they were renamed eth6-9 after I attempted to add the 10GbE NIC, so I renamed them back to eth0-3. I've read in other posts that the GUI always starts on eth0, so I was hoping this would clue Unraid in to look at that ::5c MAC address, which is where I had always plugged in a single Gigabit Ethernet cable. But nothing ever appears to boot, my router never registers a new IP (or the reserved IP that the ::5c MAC should get), and I'm now more than 12 hours without my server. Any help getting my beast back up and running is appreciated.
  15. Thanks! Found JTok's vmbackup plugin in CA ... also his github page is very informative.
  16. This appears to have solved my issue ... I'm back up with the correct drive assigned! And the VM boots up just fine. Now that I've averted disaster, do you have any recommendations on how to cleanly and safely back up my VM disk images to the array? For example, a recommended plugin? I assume the VM would have to be shut down in order to back it up as well... Sorry for posting such an elementary issue, but you've really helped. galileo-diagnostics-20230102-1135.zip
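      In case a plugin turns out to be overkill, here is the bare-bones manual approach I'm considering; the VM name and share paths below are just placeholders for my layout, not anything Unraid prescribes:

      ```bash
      # Shut the VM down cleanly first so the disk image is consistent
      virsh shutdown Win10-VM

      # Copy the vdisk to the array, preserving sparseness so empty space
      # inside the image doesn't balloon the backup
      rsync -aS /mnt/vms_nvme/domains/Win10-VM/vdisk1.img /mnt/user/backups/vm/
      ```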
  17. Well, the good news is that I have pulled the affected NVMe drive and plugged it into another computer ... and the vdisk image is still there. So now I just need to find out why Unraid won't see the drive.
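      When I get it back in the server, my plan is to check whether the kernel sees the device at all before blaming Unraid (standard tools, nothing Unraid-specific):

      ```bash
      # Is the NVMe controller visible on the PCIe bus?
      lspci -nn | grep -i -E 'nvme|non-volatile'

      # Did the kernel create block devices for it?
      ls -l /dev/nvme*

      # Basic health readout, if the device responds at all
      smartctl -a /dev/nvme0
      ```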
  18. I did in fact mean `vms_nvme` ... they are (were) both the same type and size of NVMe SSD. To clarify: `cache_nvme` has its own 1TiB SSD, and `vms_nvme` has its own 1TiB SSD.
  19. ```
      root@galileo:~# ls -lah /mnt/disk3/domains/
      total 0
      drwxrwxrwx  2 root   root     6 Jan  2 09:23 ./
      drwxrwxrwx 18 nobody users  286 Jan  2 09:33 ../
      ```
  20. My apologies for not grabbing the diagnostics before rebooting. A few weeks ago, I set up a Win10 VM for my dad to manage some financials on ancient accounting software. I left the VM running after we set it up so he could practice VPNing in and connecting to it (which he managed to do without any problems). I didn't shut the VM down, so it sat there running for a few weeks.

      We tried to connect to the VM over Christmas ... even though it was listed as "started" in the VM tab, it no longer had an IP address and wasn't responding to pings. I tried to stop it from the VM tab, but it was more or less unresponsive, so I resorted to the "force stop" option with the bomb icon (not sure what the wording actually was) and left it to troubleshoot when I returned home (now).

      Trying to restart the VM resulted in an execution error (`cannot read header for vdisk1.img` and `Input/output error`). Running 'Fix Common Problems' revealed that the drive containing my VM was either full or read-only; as far as I can tell, this only happened after force-stopping the VM. Its status on the Main page indicated it was far from full (~49 GiB used on a 1 TiB NVMe drive), so I elected to reboot the server (before collecting diagnostics -- my bad).

      The drive in question is a WD 1TB Blue SN550 Gen3 M.2 SSD. It no longer even shows up as available to assign to `vms_nvme`, which was originally in my pool devices. I suspect the drive has actually failed since Unraid can no longer see it, but I'm curious whether that could really be the issue (the drive was less than a year old when I found the problem). I'm hesitant to pull it from the system without further advice, as I'm no longer sure that my vdisk1.img is even safe (I'm hoping the image file is intact and I can just swap in a new SSD and recreate the pool).

      Any advice at this point is welcome ... I know this is probably a very basic problem. I've been so happy with Unraid since abandoning Synology. galileo-diagnostics-20230102-0924.zip
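      If I do end up pulling the drive and can read the image from another machine, my plan is to sanity-check vdisk1.img before doing anything else. A rough sketch (qemu-img ships with QEMU, and the path is just an example):

      ```bash
      # Report format and virtual/actual size; this fails quickly if the header is unreadable
      qemu-img info /path/to/vdisk1.img

      # For qcow2 images there is a deeper consistency check as well
      # (raw images, the Unraid default, will just report that checks aren't supported)
      qemu-img check /path/to/vdisk1.img
      ```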