SpyisSandvich

Everything posted by SpyisSandvich

  1. EDIT: I've solved this by completely recreating everything and changing owners, and somehow it's working properly now. Couldn't tell you why. Hey all, trying to figure out the permission issue for the `prometheus` image. Marzel's post from 2022 didn't do anything for me. I made a `prometheus.yml` file myself; I assume one would have been generated for me if permissions were working. Other than that, it doesn't seem like it can persist anything. The appdata folder is completely owned by `nobody:users` with permissions set to `775`. prometheus.log prometheus.yml
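     In case anyone else lands here with the same problem, this is roughly the kind of manual setup I'm describing (the paths, ownership, and config contents below are placeholders from my own setup, so adjust them to yours):

```bash
# Create the appdata folder the container maps to and drop in a minimal config
mkdir -p /mnt/user/appdata/prometheus

cat > /mnt/user/appdata/prometheus/prometheus.yml <<'EOF'
# minimal example config; the scrape target is a placeholder
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF

# Match the ownership and permissions the rest of appdata uses
chown -R nobody:users /mnt/user/appdata/prometheus
chmod -R 775 /mnt/user/appdata/prometheus
```
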
  2. Unfortunately your config doesn't really help. What I ultimately did was downgrade my server version to 13.0.0, which seems to be the last version that still works with my current setup. Looks like there was an announcement about this on their main site.
  3. Yeah, I've been running like this since I got it set up. One of the slots is bad, but I've never run into memory capacity issues, so I kinda just left it. I could do some troubleshooting and potentially reconfigure it to see all 4, just haven't seen the need.
  4. Okay, quickly breaking off my 6.9.2 test, as I'm immediately seeing a flood of this message in my syslog: I'm going to halt the software regression test because it seems like my system can't handle going back this far. EDIT 2: This is still happening even now that I've gone back to 6.11.5. Should I restore from my flash backup, or is this something deep in the system configuration that the flash wouldn't touch? I want to revisit the RAM speed, since I was told previously that might be an issue with servers. This system runs DDR4-2400 with unbuffered ECC. UEFI was already set to use that speed, and from what I gathered about this configuration, that should be okay. I could try bringing the speed down and seeing if that improves stability? EDIT: Downclocking my RAM only seems to have made things worse. My computer gets into this weird loop where it'll spin the fans for a few seconds, then power back down. It'll do this 3 times and then stabilize, at which point I'm assuming it falls back to the last known-good configuration. It even does this when I set the speed back to normal. It will eventually boot, though. Attached are logs from the 6.9.2 test in case they're helpful. ca-server-diagnostics-20221221-0832.zip
  5. Is there a better way to do this than the built-in Update OS screen? That screen only allows me to go back one version. EDIT: I suppose I can follow this nugget. I'll take a backup of my flash drive first, though. EDIT 2: Currently testing unRAID 6.10.3; it seemed to take the downgrade just fine. EDIT 3: 6.10.3 crashed last night. Testing 6.9.2 now. If this fails too, that should rule out the software as the culprit, because the system was stable on versions well past this one.
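     For the flash backup, something along these lines from the console is enough (the destination share is just an example):

```bash
# Copy the flash drive (mounted at /boot on Unraid) to a dated folder before downgrading;
# the destination share here is just an example, use whatever location you trust
BACKUP="/mnt/user/backups/flash-$(date +%Y%m%d)"
mkdir -p "$BACKUP"
cp -a /boot/. "$BACKUP"/
```
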
  6. There really isn't much here to go on; it frequently crashes several hours after the last log entries. The part that gets me is that this was stable until I updated unRAID a month or so ago. I realize it's possible that some of the BIOS settings got messed with, but I've been able to confirm that this hasn't happened. I would rather not downgrade my unRAID version, and it would be difficult to swap in hardware comparable to what I'm currently running. How does one troubleshoot this without any information?
  7. I checked again this morning. Global C-State Control is "Disabled", and Power Supply Idle Control is "Typical Current Idle". I tried a local syslog server, and it didn't capture anything around the time of the last crash (it crashed around 22:30 and the last log message was two hours prior). I also briefly tried remote syslog with a virtual Debian machine, but switched away from it because for most of the 11th, the only line I ever saw was the one indicating that syslogging had started. Should I be mirroring to the flash drive, or should I try the remote solution again? syslog-192.168.2.251.log
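     For reference, getting the Debian VM to accept remote syslog was roughly this (assuming the stock rsyslog.conf layout, where the UDP lines are just commented out):

```bash
# On the Debian VM: enable rsyslog's UDP listener and restart the service (run as root)
sed -i 's/^#module(load="imudp")/module(load="imudp")/' /etc/rsyslog.conf
sed -i 's/^#input(type="imudp" port="514")/input(type="imudp" port="514")/' /etc/rsyslog.conf
systemctl restart rsyslog
```
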
  8. Reopening this, as the hard crashes have started back up after an Unraid update (not sure exactly which one, but things were stable until the beginning of October at the earliest). I double-checked the UEFI settings and confirmed that the C-State options and RAM speed configuration have not changed.
  9. My Piwigo server stopped working at some point in the last few months. I noticed in the logs that it was complaining about a couple of the config files: /nginx/nginx.conf /nginx/site-confs/default.conf I tried renaming these with the suffix .old and the server started to function again, but it took me to the setup screen and I'm not sure exactly what in the configs was pointing everything to my existing library. Any thoughts? Attached are the current sample configs (.conf) and my old non-working configs (.conf.old) default.conf default.conf.old nginx.conf nginx.conf.old
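     For anyone curious, the renaming amounted to roughly this (the appdata path and container name are from my setup and may differ on yours):

```bash
# Move the broken configs aside so the container regenerates fresh samples on restart;
# the appdata path and container name below are assumptions from my own setup
cd /mnt/user/appdata/piwigo/nginx
mv nginx.conf nginx.conf.old
mv site-confs/default.conf site-confs/default.conf.old
docker restart piwigo
```
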
  10. UPDATE: 4 days, 1 hour up and counting. I ended up needing to change both "Power Supply Idle Control" and global C-states, but it appears to be stable now! Thanks for pointing that out! Will do. I was a little thrown since I've run unRAID without issue on this board before, back when I was running a 1900X, but it makes sense that something may have changed when I upgraded.
  11. Hello, I'm experiencing an issue where my server randomly crashes completely. No web UI, no Samba; everything hard crashes and the box requires a hard restart. It sounds like a kernel panic, but I can't confirm this. At first the server would last between 12 and 24 hours before crashing, but recently this window has been cut to around 2-6 hours. I was originally under the impression that I had a failing flash drive. Sometimes before it crashed, I would see the "License file not found" error and that the boot USB device had been moved to Unassigned Devices. However, I've already swapped to a brand-new drive and have been moving it around to different USB ports and controllers, and this hasn't fixed anything. ca-server-diagnostics-20220915-0935.zip
  12. I'm going to continue to use the server despite the unmountable drive. As long as it's not being mounted and, crucially, I exclude all shares from using that drive, it should be okay, right? Right?
  13. I've updated my previous post; the check finished the way I predicted it would. In the meantime, what's the risk of running the server provided I don't use the broken drive? The data on there isn't the most critical on my server, but keeping it in Maintenance mode (or off, since I'm still unsure about what to do next) has obvious impacts on usability.
  14. It does give me the option to cancel, so I did. Removed the `-n` flag and am running it again. EDIT 2: The check finished; it looks like it couldn't find a valid superblock: unable to verify superblock, continuing... (this line kept expanding with `.` characters) Sorry, could not find valid secondary superblock Exiting now.
  15. I figured as much. I didn't remove the `-n` flag. Will update when it finally finishes.
  16. Hello, I have a disk showing up in my server as Unmountable. It's otherwise still showing green and doesn't have any errors listed in the Main tab. I've started `xfs_repair` through the unRAID GUI. It's still running, but it's stuck on the following step: Phase 1 - find and verify superblock... bad primary superblock - bad magic number !!! attempting to find secondary superblock... ...found candidate secondary superblock... unable to verify superblock, continuing... (this line is continuously expanding with `.` characters) I've collected the diagnostics and attached them. I'll update when I notice the check finish, but it seems to me like it's probably going to run through the entire disk before failing to find a superblock. Assuming this is the eventual outcome, how might I go about fixing this drive? Some context: I needed to shut down the server yesterday because CPU consumption somehow spiked to 100% on all cores and even the web GUI wasn't responsive. There's a strong chance that this caused the issue on this drive. ca-server-diagnostics-20220326-1222.zip
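     For anyone following along from the command line, the check the GUI kicks off is essentially equivalent to this, run in Maintenance mode (the disk number is just an example; use the md device that corresponds to your disk so parity stays in sync):

```bash
# Read-only filesystem check; with -n, no changes are written to the disk
xfs_repair -n /dev/md2
```
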
  17. I'm aware that `-n` meant changes wouldn't be committed; I just didn't want to run it blindly. I've run the repair, discarding the log with `-L`, and these were the results. Beyond the CRC error near the top of the log, it doesn't look like there were any other issues, and there aren't any further instructions, so I remounted the array, and it looks like everything is in its place and accessible! It was a 5TB drive, so I can't speak for everything, but the few things I tried are all in working order, and I'm confident enough to resume running the array. Thanks a ton, I thoroughly appreciate the direction. Second-guessing what I thought the server could take, and getting anxious when something doesn't go as expected, really put up blinders to the solution.
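     For completeness, the repair itself boils down to something like this from the command line (again, the disk number is only an example):

```bash
# Zero the metadata log and repair; -L throws away the journal, so it's a last resort
xfs_repair -L /dev/md2
```
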
  18. It wasn't far along, so I stopped it. Here are the check results with the `-nv` flags: diskchk.txt
  19. It looks like I need to start in Maintenance mode for that, but it's still running the parity sync on the disk that I accidentally removed. Should I wait for that to complete?
  20. I have my server outfitted with several disks of varying sizes, and two 8TB parity disks. Recently, one of my data drives failed, and I bought a replacement. When I went to put it in, I also inadvertently took out another data disk that I had assumed wasn't being used in the array, and I didn't realize that until the array was rebuilding. I was worried, since that left the array unprotected if another drive failed, but I let the rebuild complete. Now that it's done, I went back and re-added the drive I accidentally pulled out to rebuild that one (it was an empty drive, at least), but now the replacement disk is showing as unmountable. Another detail I noticed was that, while the failed disk was rebuilding, the Shares tab hid a bunch of the shares that existed only on that drive. They reappeared after the disk reconstruction. Not sure if this is relevant. Did I lose that disk? Or is it possible to salvage it?