Everything posted by thither

  1. As a historical note, I downgraded back to 6.11.5 and have had 7 days of uninterrupted uptime since then.
  2. Answering my own question about the docker images: I used "Add Container" from the docker page for every image I wanted to restore, selected the template from the "users" section under "templates", created new containers with names in the form "sonarr-6-11", and set them to autostart instead of the old ones. The "Add Container" form had all the right data prepopulated, and so far everything has worked without a hitch.
  3. Hi. Unraid 6.12 has been nothing but frustration for me, and after enduring several weeks of needing to power-cycle it once a day I've downgraded my server back to 6.11.5. Some of the stuff I use Unraid for is still working fine, notably the SMB shares and my Plex and Calibre docker images, but the majority of my docker images won't start; trying to run them from the docker panel gives me the error message "Execution error - Server error". (These were all working fine in 6.12.) I get this block in the Unraid logs when it happens:

        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered blocking state
        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered disabled state
        Jan 24 10:48:33 Eurydice kernel: device vethc95c9cf entered promiscuous mode
        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered blocking state
        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered forwarding state
        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered disabled state
        Jan 24 10:48:33 Eurydice kernel: cgroup: Unknown subsys name 'elogind'
        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered disabled state
        Jan 24 10:48:33 Eurydice kernel: device vethc95c9cf left promiscuous mode
        Jan 24 10:48:33 Eurydice kernel: docker0: port 3(vethc95c9cf) entered disabled state

     I'm not really sure what to do here; the elogind error doesn't turn up anything in a web search. Should I uninstall and reinstall these images? Fix Common Problems also warns me about a docker patch that I need to download, but I can't download it, because Fix Common Problems sends me to the Community Apps page, which tells me I need 6.12 before I can use it. Is there anywhere I can manually download and apply this patch?
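
     One thing that might surface the real error, since "Execution error - Server error" hides docker's own output, is starting a container from the shell instead of the web UI. A quick sketch (the container name here is just an example):

        # list all containers, running or not
        docker ps -a
        # docker's own error should print here, unlike in the web UI
        docker start sonarr
        docker logs --tail 50 sonarr
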
  4. Sorry to piggyback on the thread, but is there any way to re-download the 6.11 version of Community Applications? I've also downgraded from 6.12 back to 6.11 (to fix some server crashes that started with 6.12) and I'm seeing the same thing.
  5. Thinking about this a little more, the weird thing is that it seems like something is just eating up all the CPU, whereas I would expect a hardware fault to result in a kernel panic or something. Is there something I can run that will give me a graph or log of historical CPU load, or maybe load per docker container or something?
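
     For reference, this rough loop is the kind of thing I'm imagining, if no plugin does it already (the log path and interval are arbitrary):

        # append a per-container CPU/memory snapshot once a minute
        while true; do
          date -Is >> /var/log/docker-stats.log
          docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemUsage}}' \
            >> /var/log/docker-stats.log
          sleep 60
        done
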
  6. I'm not running NFS, no. Actually I just downgraded back to 6.12.4 and while the system was stable for longer, it's now frozen up again. I'm guessing this points to some kind of hardware fault in my server, which just happened to rear its head after the upgrade.
  7. Well, the system froze again and I restarted it. Here's my syslog and syslog-previous from the flash drive. I restarted at around 4:00am, and then again this morning at 10:30. I personally can't see much suspicious here, I have community apps set to auto-update nightly and that runs. It does seem a little weird that the system boots at 4:08 and the "Unraid API started" message doesn't appear until 7:30, but I don't know how normal that is. Anyways, I'm going to roll back to 6.12.4 and will report back on whether that seems more stable. syslog.txt syslog-previous.txt
  8. Thanks! I ran memtest off the boot drive and it seemed to freeze after printing "Loading memtest... ok". I'll poke around to see what that's about; I don't see my exact motherboard listed here but there are some very similar ones that seem to have issues. Copying the 6.12.4 files into a "previously" folder did give me an option to rollback through the web GUI again, but I'm going to hold off to see if I can get useful syslog info out of the current distribution before I do so. If I do roll back, I assume it will still use my old configs, right? (So I won't need to set up my dockers and array again?)
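
     For anyone who finds this later, the copy was roughly the following; I'm going from memory, so treat the folder name and paths as approximate (the bz* files came out of the 6.12.4 release zip):

        # the flash drive mounts at /boot on a running Unraid box
        mkdir -p /boot/previously
        cp /path/to/unraid-6.12.4-extracted/bz* /boot/previously/
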
  9. I'm seeing server crashes roughly once a day after going from 6.12.4 to 6.12.6. My symptoms are exactly the same as in this bug report: I can ping the machine and the console is responsive, but when I try to log in it just freezes after I enter the username. I get "504 Gateway Time-out" nginx errors on the web interface, and none of the dockers are responsive. Diagnostics attached. As far as I can tell I don't have Realtek or Adaptec hardware. My docker is running ipvlan, not macvlan (it was already set to that in 6.12.4, and was working fine). (Edit to add: I don't have any VMs running, unlike in the above bug report, and Fix Common Problems doesn't show anything except a warning about syslog being mirrored to flash.)

     I set syslog to mirror to a cache directory, but I don't see any logs there. I've just checked the box to mirror it to flash, so if it happens again hopefully I'll have something useful (though I haven't seen anything relevant in the logs I've looked at in the past; it just looks as if the server is working normally).

     I also have some very odd and vexing behavior that started when I upgraded, but it's all at the BIOS level, so it seems impossible that the upgrade could have caused it. The first time the server went down, after I restarted, the BIOS wouldn't recognize my flash drive as a bootable device (it just didn't appear in the menus at all). The drive was attached to an internal USB header. Eventually I reinstalled from an online backup onto a new thumb drive, re-registered it (blacklisting the old drive), and was able to boot from it. Things were working fine, but when the server went down again and I had to cold-restart it, the new thumb drive wasn't available as a bootable device either. Through trial and error, and an eventual CMOS reset, I discovered that after rebooting the server I need to physically unplug the USB drive and plug it back in for it to be recognized as bootable. I have no idea what that's about. It happens before the OS loads, so it doesn't seem possible that Unraid could be affecting it. I did verify that I'm running the latest firmware for my motherboard (an ASRock Z170 Extreme7+). I haven't seen this behavior before, but then I've also rarely needed to cold-boot this server.

     At this point I'd like to roll back to Unraid 6.12.4, since it didn't have these issues, but since I'm on a new USB stick I don't have the option to roll back from Tools / Update OS. Are there files I can copy from the old USB stick to the new one that will let me do that? Or is there some other way to downgrade? eurydice-diagnostics-20240101-0940.zip
  10. It seems that the nzbget developers have gotten tired of working on it and archived all the github repos, so I'm guessing fixes for the VideoSort thing aren't likely from upstream (mine seems to have broken again, personally). Guess it's time to look into SABnzbd again...
  11. I can confirm that 6.5.0 is broken for me with the same error, and rolling back to 6.4.0 fixed it. Don't see anything obvious about this on the linuxserver.io github page.
  12. I've noticed similar behavior where my Unraid (6.10.3) box will periodically seem to drop all inbound network traffic. This happens once every few weeks. Plugging a monitor into it, I see the console still prompting me for a login, with no kernel panic message or similar. (I wasn't able to hook up a keyboard to log in at the console, long story, and have wound up just cutting the power, though now I have a new keyboard ready for next time.) Weirdly, although inbound connections fail (HTTP to the web console and docker ports, SMB connections to shares, ICMP echo pings), my NZBGet history suggests outbound traffic still works: I see downloads that completed successfully during the time I could not ping the box. I have just enabled syslog and will report back here with logs if the problem recurs. Edit to add that this is on a regular old Intel PC, not QNAP hardware.
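
     If it helps anyone debugging the same thing, a timestamped ping log from a second machine would pin down exactly when inbound traffic drops; something like this rough loop (the IP is a placeholder for the server):

        # run from another machine on the LAN
        while true; do
          if ping -c1 -W2 192.168.1.50 > /dev/null 2>&1; then state=up; else state=DOWN; fi
          printf '%s %s\n' "$(date -Is)" "$state" >> unraid-ping.log
          sleep 60
        done
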
  13. Just wanted to say thanks for the instructions, this has been working great for me!
  14. Ok, well after a good deal of messing around I created a new config and am rebuilding parity now. I definitely lost a bunch of data, and without disk3 being readable it's a little hard to say what exactly went away, but the system is stable again and I definitely learned something through this whole process. Thanks very much for helping me out with this @JorgeB!
  15. Sorry about that. I first got a notification that disk3 was out on 2022-01-15, and I definitely didn't intentionally write anything to the disk after that. Most of the data going into the array since then would be automatic downloads (from Sonarr etc.), which are not super important and could be redownloaded if needed. After running for quite a while, ddrescue from disk2 to my new replacement succeeded, rescuing 99.99% of the data, and after an xfs_repair the replacement mounts fine, with just a few random files (6GB or so) winding up in lost+found. Do you think it would be worthwhile to add this replacement back into the array as disk2, and then try to rebuild my replacement for disk3 from parity? Or would I just be risking 6GB or more of corrupt data (since the replacement's contents will have changed relative to when parity was computed)? Or should I just start over with a new config and live without whatever was on disk3?
  16. Ah, right. Well I just wanted to see whether I could get any data off of it at all, but it seemed to be totally unresponsive. I've got ddrescue running right now on disk2 (current remaining time: 222d 15h, though I'm hopeful that will improve). Am I correct in saying that at this point my existing parity drive isn't useful any more, since it's been trying to check two faulty drives at once and whatever parity information is on it is unreliable now? So my best course of action is just to recover as much stuff as I can from disk2 and then recompute parity from scratch with whatever recovered data I can get off of it?
  17. As a brief update, I removed disk3 (the disabled one) from the array and tried to add it back in to rebuild, but it started throwing SMART errors, and then finally wouldn't mount at all. xfs_repair told me to run it again with -L, which I may try to do, but in theory everything in there should be rebuildable from parity, so I'll likely just get rid of the disk. I'll be trying a ddrescue from disk2 to the new disk as soon as the new disk's extended SMART test is complete.
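
     If I do end up running it with -L, my understanding is the sequence would look something like this (the device name is illustrative; the array member for disk3 is one of the /dev/md* devices on my system):

        # dry run first: -n reports problems without modifying anything
        xfs_repair -n /dev/md3
        # only if it still insists: -L zeroes the metadata log, which can
        # discard the most recent in-flight transactions
        xfs_repair -L /dev/md3
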
  18. Thanks for the link to ddrescue. So if I'm understanding correctly, my next steps would be:
     - install a new disk as disk5
     - ddrescue as much as I can from disk2 to disk5 (rough command sketch below)
     - remove disk2 from the array, add disk5
     - rebuild parity based on the recovered data, probably with some data loss
     Is there any chance of un-disabling disk3? Should I just uninstall the disk and trash it?
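
     Based on the ddrescue manual, I'm expecting the copy step to look roughly like this (device names are placeholders for disk2 and the new disk5; the mapfile is what lets it resume and retry):

        # first pass: grab everything easily readable, skip the slow scraping phase
        ddrescue -f -n /dev/sdb /dev/sdf /boot/ddrescue-disk2.map
        # second pass: retry the bad areas with direct access, 3 retries
        ddrescue -d -f -r3 /dev/sdb /dev/sdf /boot/ddrescue-disk2.map
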
  19. Ok, so the disk2 and disk3 reports completed. disk3, the disabled one, shows the test completing without error in that same section of the report:

        SMART Extended Self-test Log Version: 1 (1 sectors)
        Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
        # 1  Extended offline   Completed without error         00%            21383  -
        # 2  Short offline      Completed without error         00%            21365  -
        # 3  Short offline      Completed without error         00%            21328  -

     disk2, which is doomed, shows 4218 errors (and Unraid shows the test as "completed: read failure"):

        SMART Extended Self-test Log Version: 1 (1 sectors)
        Num  Test_Description   Status                    Remaining  LifeTime(hours)  LBA_of_first_error
        # 1  Extended offline   Completed: read failure         90%            46611  43138432
        # 2  Extended offline   Completed: read failure         90%            46566  3145072
        # 3  Short offline      Completed without error         00%            46557  -

     So it looks like only disk2 fails the SMART tests, and if I'm lucky I'll be able to swap it out and rebuild from parity. One thing I still don't understand is how to get disk3 back into the array. Just starting the array doesn't seem to do it. Do I need to erase the disk, or remove it from the array and re-add it? Will a cold reboot do it? (I've rebooted, but haven't turned the power all the way off.) Relatedly, I would think it would be best to get disk3 back online before I swap out disk2 for a fresh drive, but is that the wrong order? I would think that as long as disk2 is unreliable, the system as a whole can't reliably compute parity. disk3-eurydice-smart-20220225-0900.zip disk2-eurydice-smart-20220225-0038.zip
  20. Thanks for taking a look. I've got SMART tests running on disk2 and disk3 and will post them once they're done. The disk1 test finished and reported a passing test, as far as I can make out from the logs ("SMART overall-health self-assessment test result: PASSED"), but I'm not super familiar with what I should be looking for in there. disk1-eurydice-smart-20220224-2352.zip
  21. Oh, one other thing: this was probably a mistake, but I ran a parity check after I noticed things were failing, and got a lot of errors. Does this mean I'm screwed for data recovery? Like, if the parity check couldn't read a bit from the failing disk, it wouldn't be able to compute the parity for that bit across all four disks, would it?
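
     (To make my own worry concrete: with single parity, each parity bit is just the XOR of that bit position across the data disks, so one missing value can be rebuilt, but two unreadable disks at the same position can't be separated. A toy example with made-up byte values:)

        d1=0xA3; d2=0x5F; d3=0x12
        parity=$(( d1 ^ d2 ^ d3 ))
        # one missing byte is recoverable from parity plus the survivors:
        echo $(( parity ^ d1 ^ d3 ))   # prints 95 (0x5F), i.e. d2 comes back
        # but if d1 is unreadable too, parity ^ d3 only yields d1 ^ d2,
        # which can't be split into the two original values
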
  22. Hello! I seem to be in a bit of a pickle. Two of the data disks in my 4-disk array are reporting SMART errors. One passes xfs_repair with no problems, while on the other xfs_repair fails with an i/o error. xfs_repair on disk2 gives me this:

        Phase 1 - find and verify superblock...
                - block cache size set to 1478872 entries
        Phase 2 - using internal log
                - zero log...
        zero_log: head block 1285582 tail block 1285582
                - scan filesystem freespace and inode maps...
                - found root inode chunk
        Phase 3 - for each AG...
                - scan and clear agi unlinked lists...
                - process known inodes and perform inode discovery...
                - agno = 0
        xfs_repair: read failed: Input/output error
        can't read block 0 for directory inode 2096964
        no . entry for directory 2096964
        no .. entry for directory 2096964
        problem with directory contents in inode 2096964
        cleared inode 2096964
        xfs_repair: read failed: Input/output error
        cannot read inode 3144960, disk block 3144992, cnt 32

     (That's xfs_repair -v; I get errors just running -n too.) Meanwhile, a third disk is marked "disabled" although it reports no SMART errors, and it won't come back into the array even after stopping and restarting the array. I've run xfs_repair on it and it doesn't seem to have any errors. My shares are acting a bit strange, with two of them refusing to respond to an `ls` command:

        root@Eurydice:/mnt/user/Video/Television# ls
        /bin/ls: reading directory '.': Input/output error

     From the xfs_repair output and SMART tests, it seems clear that disk2 is a goner and will need to be replaced. I've actually got two new disks I can swap in right now, but I'd like some advice before I do. Right now my shares are all marked "unprotected", which makes me worry that I'll lose data if I just clear a disk. So my questions are:
     - What should I do about disk2, which has the i/o failures shown above?
     - disk1 shows a little SMART thumbs-down icon in the dashboard, but the last time I ran self-tests it was fine. Does hitting "acknowledge" clear the thumbs-down icon? Should I consider this disk compromised as well?
     - What can I do to get disk3, which shows status "disabled", back into the array?
     - What can I do to try to minimize data loss before I pull disk2 (and maybe the others) out of the array?
     I'll include a diagnostics zip file (if I can manage to upload it; I was having trouble in Firefox). Thanks! eurydice-diagnostics-20220223-1737.zip
  23. There seem to be a lot of problems with the front-end for this image:
     - The "search" button on the ebooks/audiobooks page doesn't do anything, just refreshes the page
     - The search box on the "series" page returns a 404 because the URL parameter is called "searchbox" (manually changing the URL to use "name" seems to work)
     - Changing the "status" dropdowns on the manage page just refreshes the page and only ever shows things in the "wanted" state
     Anyone else seeing this? I don't see any js errors in my console and the behavior is the same between Firefox and Chrome.
  24. This is maybe a dumb question, I just can't remember: is the source code for videosort/VideoSort.py included in this docker image, or is it downloaded and installed separately from the image itself? (I installed nzbget a long time ago and several versions of Unraid back, so I honestly can't remember.) If it's in the image it should be patchable (by running it through the 2to3 script as @Merson suggested, at a minimum). If not, does anyone have a link to wherever the official source is, or else an alternative script written in Python 3? The reason for my question is that I'm also getting the same "VideoSort: SyntaxError: invalid syntax" failure messages. I tried adding the ".py=whatever" line and that didn't fix it, which makes sense now that I think about it, since there isn't a python2 binary anywhere in the image that could execute the videosort file. From the error message, I'm guessing all that needs to be done is changing the "<>" operators to "!=" (the "<>" operator was removed in Python 3, which presumably bit us in whatever point release the base image was updated to recently).
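
     If the script does turn out to live in the image, the mechanical fix I have in mind is something like this (the path inside the container is a guess on my part):

        # rewrite the Python 2 "<>" operator, which Python 3 removed, as "!="
        sed -i 's/<>/!=/g' /path/to/videosort/VideoSort.py
        # or, if 2to3 is present in the image, let it rewrite the file in place
        2to3 -w /path/to/videosort/VideoSort.py
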
  25. Hi all, I have a question. I've got a machine that I put together back in 2016 using this now-discontinued 660W PSU, the Seasonic SS-660XP2. The PSU recently failed, and I'm wondering if I might have been running too much hardware on it. When I first set this up, it was running 4 WD Reds (6TB) plus an NVidia GeForce GTX 970 (not used often, just for gaming in a VM). Everything worked fine. Then a while back my motherboard failed and I wound up uninstalling the GPU. Recently, after a disk failure, I decided to add a 5th HDD and pop the GPU back in. Everything seemed to be working fine, but after several days of happiness the machine just shut down and refused to POST, and after a bit of troubleshooting I realized the PSU had failed. I've RMAed the power supply, but I'm wondering if I was hovering just on the edge of how much power I needed, and whether adding the 5th HDD pushed it over and caused the PSU to fail. So I thought I'd get some advice. Does 660W sound like a big enough power supply for that amount of hardware? Should I upgrade to a beefier model now and save whatever replacement Seasonic sends me for a different machine?
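
     For what it's worth, here's the back-of-the-envelope math I've been doing; every number is a rough assumption (GTX 970 TDP ~145W, a desktop CPU ~95W, ~10W per spinning HDD with maybe 25-30W transients at spin-up, ~50W for board/RAM/fans):

        echo $(( 145 + 95 + 5*10 + 50 ))   # ~340W steady state
        echo $(( 145 + 95 + 5*30 + 50 ))   # ~440W worst case at spin-up

     On those assumptions even the worst case leaves a 660W unit with real headroom, which makes me suspect age or bad luck rather than overload, but I'd welcome a second opinion.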