firrae

Members
  • Posts

    37

Posts posted by firrae

  1. Just happened to me on Unraid Server Pro, version 6.10.3. Couldn't even restart NGINX, as the server would disconnect my SSH client (my Mac) a moment after I connected, every time. Ended up having to hard shut it down with the power button. Now that it's back up, a drive is being "emulated" for apparently no reason: its SMART test says it's fine and so does Unraid, other than that it apparently didn't want to bring the drive up.

  2. 13 minutes ago, itimpi said:

    That message is quite normal and very rarely leads to data loss - and despite its ominous wording even when it does it only affects the last file written.    You need to try again but add the -L option.

    Cool, thanks. Will do tomorrow after work. With its ominous wording I figured I'd make sure before doing it ha. Sometimes it actually means something.

  3. Hi there,

     

    I've run into a weird issue. The UnRAID disk check says my drive has filesystem corruption, but I find it interesting that the system is fine until I've written 1 or 2 GB of data to it. At that point it becomes I/O errors galore, so clearly it's messed up. I went through the wiki article on repairing drives with issues (https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui), and the repair returns the following through both the UI and an SSH session:

    xfs_repair -v /dev/md6
    Phase 1 - find and verify superblock...
            - block cache size set to 1507224 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 2248359 tail block 2247490
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed.  Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair.  If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.

    So now I'm not sure what that means for me. I don't think there's anything awfully important on the drive and am about to boot it up as normal to double check, but like I said I can read off it perfectly fine, even after writes start to fail, so if there is anything important on it I can presumably recover it.

     

    Now for the crux of my question: what is the best method? I have a spare 4TB drive sitting here, if replacing the drive 1:1 and letting parity rebuild it will work (there's maybe 20GB on the drive as far as I can remember, so not much). Otherwise, do I use that `-L` flag on it and hope it fixes things (from the wiki, the tool seems unreliable for positive outcomes at best)?

     

    I'm going to check what's on the drive now by spinning it back up, and hopefully someone can help me go from there.

     

    If you need anything else to help out let me know.

    tower-smart-20201119-2112.zip tower-diagnostics-20201119-2112.zip
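The choice the `xfs_repair` message describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `next_xfs_step` and its two return labels are made up here, and the code only inspects text that `xfs_repair` has already printed. It does not run or modify anything on the disk.

```python
def next_xfs_step(repair_output: str) -> str:
    """Classify xfs_repair output (hypothetical helper, not part of any tool).

    Returns "replay-log" when the output contains the dirty-log error quoted
    in the post, otherwise "continue".
    """
    # The message is wrapped across lines, so collapse all whitespace first.
    text = " ".join(repair_output.split())
    if "needs to be replayed" in text:
        # Per the message itself: try mounting the filesystem to replay the
        # log, unmount, and re-run xfs_repair; only if the mount fails,
        # re-run with -L, which destroys the log.
        return "replay-log"
    return "continue"
```

Feeding it the output pasted above would give `"replay-log"`, which matches the advice in the message: attempt a mount first and treat `-L` as the fallback.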

  4. 9 hours ago, chesh said:

    This doesn't fix your underlying problem, but there is an extension for Chrome (delugesiphon) that can work with .torrent links.  Will add them to your deluge instance by just clicking the download link for the .torrent file. I use it with my private trackers so I don't have to download the .torrent file and then upload it to Deluge.

     

    While this was not the fix, it did help me generate a useful error in the logs. Seems like something went weird on one of my disks causing I/O locks in specific containers and even some whole shares (the downloads folder being one of them). After a reboot and some clean-up this seems to be working again, though I will need to keep an eye on the drive which is normal procedure I guess (it's one of my oldest).

     

    Though thanks for trying and showing me that Chrome add-on, I'm totally using it!

  5. Hi there, maybe someone could help me out. I posted the issue on the GitHub issue tracker here because that's my default place to put issues as a developer lol: https://github.com/binhex/arch-delugevpn/issues/224.

     

    Synopsis is that the container is running, the VPN is seemingly connected fine, and the web UI shows up, but when I try to add a torrent by torrent file (.torrent) I get "Failed to upload torrent" and there doesn't seem to be a log in sight about it. Since the UI works otherwise, I can only assume the settings are correct. I've tried turning off Privoxy AND the VPN altogether and still get the same issue. Magnet links work and the stuff begins to download fine, but adding .torrent files is a complete no go, and as I only use torrents from a private tracker that doesn't offer magnets, this is a deal breaker for me.

     

    At this point if I can't figure it out, I'm back on the hunt for a VPN protected torrent container after spending too much time on this one already sadly.

  6. 6 minutes ago, johnnie.black said:

    BTW no point in rebuilding a disk with multiple disk errors.

    What would be the path forward do you think then? I'm not sure what I should do. I have multiple disks reporting read errors, but none show issues other than the CRC errors in SMART. Should I stop the rebuild, flash the firmware, and then... what? Rebuild if a parity check goes well?

  7. 4 minutes ago, johnnie.black said:

    Always, but they weren't reported before, the attribute wasn't monitored.

    Interesting.

    Otherwise, if you don't mind me asking, does the system look fine at a cursory glance? Other than the glaring "it's rebuilding a drive" thing, of course. I do still have UNRAID reporting high read errors as well.

  8. Quick update. After I finished writing this I noticed that my drives on plain SATA cables were also getting these errors, but not all of them. The parity drive is reporting 0 over its entire life, but the drive nearest it is showing an increasing number of CRC errors, though it is rising more slowly than on the drives on the SAS to SATA breakouts. Maybe this points to a combination of the cables and the cages? I really don't know at this point.

  9. Hi there,


    After digging around on Google and the forums, I believe the issues with my array come down to the UDMA CRC errors I am getting on a number of my drives, but honestly I'm not sure where to begin looking for the cause. In my eyes, and from reading, I believe it could be one or a combination of 3 things:

    1. My SAS to SATA cables (maybe they are cross-talking, making them the likely candidate?) - I've tried 2 different brands but still get the issue, though the cables from both brands looked the same, just in slightly different colours. - https://www.amazon.ca/gp/product/B0736J45V2/
    2. My drive cages: I have a Rosewill RSV-L4412 which came with 3 drive cages (can't remember the part number for them) - https://www.rosewill.com/product/rosewill-rsv-l4412-4u-rackmount-server-case-or-chassis-12-sata-sas-hot-swap-drives-5-cooling-fans-included/
    3. My SAS controller which is a Fujitsu (?) card flashed to be an LSI 9211-8i in "IT" mode

     

    At this point I suspect the cables, but I'd be interested in hearing what others think. 8 of my disks connect through these breakout cables; the other 4 go directly to the motherboard SATA ports. What I find interesting is that the drives on the breakout cables seem to have the issue much worse, though so far this is only a short-term observation made since I read about this. Also, the cage that's wired directly currently only has 3 drives in it, while the others are fully loaded with 4.

    I'm curious which of these options people think would serve me best in trying to solve this:

    1. Get different breakout cables.
    2. Get new drive cages.
    3. Change out the controller.

    In any case I'd be interested in seeing the recommendations people have on this.

    This all comes from seeing what I think are VERY high read error counts while rebuilding my array after changing out a drive. Attached is my diagnostics file from the server. It's in the middle of rebuilding that drive as I mentioned, so whatever decision I make, I'm at least a couple of days away from actually implementing it, assuming I can even get the parts at this point.

    I'm interested to see what people think. Thanks!

    tower-diagnostics-20200318-1415.zip
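To tell which cable or cage is at fault, it helps to compare the CRC counter per drive over time rather than look at one reading, since the count is cumulative. Below is a minimal Python sketch of that comparison. It assumes the attribute text comes from the SMART attribute table as printed by `smartctl -A` (ten whitespace-separated columns with the raw value last); the function names and the snapshot dictionaries keyed by drive name are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional


def crc_error_count(smart_attributes: str) -> Optional[int]:
    """Return the raw UDMA_CRC_Error_Count from a SMART attribute table.

    Assumes the `smartctl -A` column layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    Returns None if the attribute is not present in the text.
    """
    for line in smart_attributes.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


def crc_growth(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the increase per drive, only for drives whose counter rose."""
    return {
        drive: after[drive] - before[drive]
        for drive in after
        if drive in before and after[drive] > before[drive]
    }
```

Taking one snapshot, swapping a single cable or moving one drive to another cage, and taking a second snapshot later should show whether the growth follows the cable, the cage, or the controller port.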

    • Check for BIOS updates, especially if you're running VM's with passthrough

    You may have found it. I thought I had updated it, and while I need to go into the BIOS to be sure, this is one of the fixes listed in the second-to-last update they released:
     

    Quote

    Fix cold reset when VT is enabled

     

    I have VT enabled. I will try these BIOS updates if the server crashes; otherwise I'll try them tomorrow night by gracefully shutting it down.

  10. Just now, Squid said:

    No errors, just an up and reboot would imply

    • Check for BIOS updates, especially if you're running VM's with passthrough
    • Power Supply
    • Mismatched RAM
    • RAM not on the MB approved list
    • Cooling issues
    • Very excessive dust bunnies

    But, realistically, try googling for an updated Memtest, as the included one with unRaid is rather dated and doesn't actually catch all errors (make a live linux USB stick or something for Memtest)

     

    1) It seems the BIOS is fully updated, but I'll check again.
    2) brand new PSU, replaced the original PSU. Tested the PSU on my main PC and it ran it fine for 3 days.
    3) All the RAM is identical and was purchased at the same time.
    4) RAM meets the mobo's requirements and is within their spec.
    5) This is the one I can't decide if it is the problem. I have 4 fans in the case, one over the HDDs that's an intake, 1 more intake on the side, and 2 exhaust (back and top). I've had this issue happen with the fans in place and the case closed and with the side fan removed and the case fully open.
    6) I cleaned pretty well everything before I put it in there. I moved the old PC into a new case and took that time to basically clean everything via compressed air before putting it back in.

    I'll follow up with the BIOS though and for heat, I figured there'd be some sort of warning or error log somewhere, but I can't find anything that indicates that.

  11. Just now, Squid said:

    While probably very difficult to arrange, the more interesting information would be on the locally attached monitor (if you have one).  Major PITA as you'd basically need a video camera pointed at it for hours on end.  (unless you set up your phone or something to record, run your 6+ containers that cause it to shortly crash)

     

    I have a monitor hooked up to it and was looking at it once when it happened, there was no shutdown procedure so it was a hard power off, the screen just went black and then the BIOS boot screen, that's where my thought of it being the PSU came from originally.

  12. Hi there,

     

    I've been trying to solve an issue with my server that seems to baffle everyone I speak to and at this point, I'm running out of possibilities. As the title says my server randomly reboots and there's seemingly no reason. I've captured logs via telnet and looked at them after the crash and found absolutely nothing. The log just stops as if I pulled the power plug. I checked my docker containers and each time there's no consistent action happening that I can point to. Below is a synopsis of my setup:

    Intel i7-920
    EVGA X58 motherboard
    12GB (6 2GB sticks) of DDR3 RAM
    4 HDDs (2 4TB and 2 2TB)
    1 SSD (an older 128GB Sandisk)
    600W EVGA 80+ Bronze PSU (brand new as I thought this might be the issue originally)

    I do live in an apartment so I have a 1500VA APC UPS between the server and the wall.

    As I said I can't find any clear thing that causes the reboot and the only reason I know it happened is PLEX is no longer available or I hear the beep from the POST succeeding. I have found some potential contributing factors though:

    1) When I'm running no docker containers the server seems fairly stable and was on for 24 straight hours, whereas the reboot usually happens after 6hrs (rarely less, but it has happened after only 2 hours before).
    2) At about 6 containers it seems to lag the web UI and then crash shortly after.
    3) SabNZB seems to cause it when it's the only container fairly quickly.

    During all this, I am watching the system stats on the dashboard and only a few cores ever spike to 100% and memory never passed 40%, but there's seemingly no consistency on CPU and RAM usage and the rebooting.

    Finally, I have run memtest86 on it and after leaving the test to run for 2 straight days it never found an error so I've basically ruled out memory. I now have an error with community applications, likely corruption (I think this happened when it rebooted in the middle of trying to create a container), but even at the start, when CA was working, I was having this issue.

    Any help is appreciated as this is basically making it unusable.

    Edit: I have a telnet session into the server now to try and capture if/when the server reboots, this could take a while though.

  13. 20 minutes ago, trurl said:

    Extremely unlikely a bad flash would cause reboots. unRAID probably isn't even accessing the flash in the middle of the night, since the OS is loaded into RAM at boot, and after that flash is typically only accessed when you save changes in the GUI.

     

    Do you have an UPS?

     

    Yes, it never registered any power issues and my other PC that is also connected to it was perfectly fine.

    To note, it is an APC 1500 VA so it should be more than enough.

    At this point, I'm at a loss if it's not likely the USB. I've changed out the PSU, run memory checks, benchmarked the CPU and all the drives are reporting good health...

  14. 1 hour ago, Squid said:

    Same error?  

     

    If so, then your flash drive is corrupted.  Toss it into a computer and check the file system on it.

     

    Could this also be causing my issue where the server randomly restarts? Also is there a way to redo the flash drive but keep all my current settings and data?

    EDIT: Also yes the same error.

  15. 2 hours ago, Squid said:

    Sounds like the templates stored on the flash drive (config/plugins/dockerMan/templates-user) are corrupted, or the flash drive dropped offline.

     

    Probably the latter.  Is your syslog (tools - syslog) filled with "Bread" errors?


    Wish I could tell you. After I made that post I went to bed and now I've awoken to the server having rebooted itself uncleanly again. Could that be caused by a bad USB stick? This has been happening for a while now and I've tested everything I can think of in the hardware and it all comes up fine so I'm running out of options on why this server is so unstable...

  16. Hey @Squid,

     

    I had initially messaged about an issue with CA here:

    but the scope of the issue seems wider than just the cleanup. Whenever I try to install a new application from the repository I get the following error:

     

    Warning: simplexml_load_file(): /boot/config/plugins/dockerMan/templates-user/my-plexrequests.xml:1: parser error : Document is empty in /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php on line 418
    Warning: simplexml_load_file(): in /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php on line 418
    Warning: simplexml_load_file(): ^ in /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php on line 418
    Fatal error: Uncaught Error: Call to a member function xpath() on boolean in /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php:419
    Stack trace:
    #0 /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php(448): getXmlVal(false, 'Name')
    #1 /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php(675): getUsedPorts()
    #2 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(383) : eval()'d code(17): require_once('/usr/local/emht...')
    #3 /usr/local/emhttp/plugins/dynamix/include/DefaultPageLayout.php(383): eval()
    #4 /usr/local/emhttp/plugins/dynamix/template.php(61): require_once('/usr/local/emht...')
    #5 /usr/local/src/wrap_get.php(16): include('/usr/local/emht...')
    #6 {main}
    thrown in /usr/local/emhttp/plugins/dynamix.docker.manager/include/CreateDocker.php on line 419

    I'd prefer to not have to re-do my Docker setup as I've just seemingly got it stable. For some reason, some of the containers in the repository seem to cause my server to hard reboot with no reason given in the logs. That is a different issue for a different thread though.
     

    Am I just missing something?
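The "Document is empty" warning above points at a zero-length or unparseable template file. A quick way to list such files is sketched below in Python. The directory is the one named in the error message; the function name `find_bad_templates` is hypothetical, and whether Python is available on the server itself is an assumption, so this may be easier to run against the flash drive from another machine.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def find_bad_templates(template_dir: str) -> List[str]:
    """Return the names of .xml files that are empty or fail to parse."""
    bad = []
    for path in sorted(Path(template_dir).glob("*.xml")):
        try:
            ET.parse(str(path))
        except ET.ParseError:
            # An empty file raises ParseError too ("no element found").
            bad.append(path.name)
    return bad
```

Calling `find_bad_templates("/boot/config/plugins/dockerMan/templates-user")` would return the file names to inspect, restore from a backup, or re-create. The function only reads the files and never changes them.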

  17. Hi @Squid,

     

     

    This was working well for me until today when I encountered the following:
     

    Warning: DOMDocument::loadXML(): Empty string supplied as input in /usr/local/emhttp/plugins/ca.cleanup.appdata/include/xmlHelpers.php on line 195
    Fatal error: Uncaught Exception: [XML2Array] Error parsing the XML string. in /usr/local/emhttp/plugins/ca.cleanup.appdata/include/xmlHelpers.php:197
    Stack trace:
    #0 /usr/local/emhttp/plugins/ca.cleanup.appdata/include/exec.php(43): XML2Array::createArray('')
    #1 /usr/local/src/wrap_post.php(27): include('/usr/local/emht...')
    #2 {main}
    thrown in /usr/local/emhttp/plugins/ca.cleanup.appdata/include/xmlHelpers.php on line 197

    Any help is appreciated as I can't currently easily clean up, this plugin was working too well ;p

    EDIT: Actually digging deeper, I'm getting similar issues even trying to add new applications. I can search but when I go to install it throws a similar error message.

  18. Whenever I install this image I get the following in the browser:

     

    Quote

    apps directory not found! Please put the Nextcloud apps folder in the Nextcloud folder or the folder above. You can also configure the location in the config.php file.

     

    Did I do something wrong?

    EDIT: After re-creating the container it seems to be working now. Not sure what happened though.