boy914

Members

  • Posts
    48

  1. Thanks @JorgeB! Between chkdsk and those instructions, I was able to get the server back up and running. I need to fix a few things, but so far it looks stable.
  2. Hi all, it's been a while... I was updating my server from 10.x (can't remember which version) to 11.5. After the update, the server wouldn't boot, and it appears that my flash drive is corrupted. Of course, I didn't follow the instructions and failed to create a flash backup before the upgrade, so I'm wondering if the upgrade process automatically backs up the flash drive contents somewhere. It's probably a long shot, but my other choice is to restore the last flash backup I have, from 2018, and try to remember what I've changed since then. If anyone has any tips for accessing a flash drive that Windows doesn't even recognize ("D:\ is not accessible. The file or directory is corrupted and unreadable."), I'm all ears. Attached is what I see on the console during server boot:
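(Per the follow-up post, chkdsk on Windows eventually repaired the drive. For anyone in the same spot with only a Linux box handy, a roughly equivalent route is a FAT filesystem check followed by rescuing the config folder. This is a dry-run sketch only: the device name /dev/sdd1 and rescue paths are assumptions, not from the original posts — check `lsblk` first and set DRY_RUN=0 to actually run anything.)

```shell
#!/bin/sh
# Dry-run sketch: repair the unRAID flash drive's FAT filesystem from Linux
# and rescue the config folder. Device name and paths are ASSUMPTIONS.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run fsck.fat -r /dev/sdd1                    # interactively repair the FAT
run mkdir -p /mnt/flash
run mount /dev/sdd1 /mnt/flash
run cp -r /mnt/flash/config /root/flash-config-rescue   # save config/ first
```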
  3. I just set up a backup for my cache-only "Apps" share using this method. A few "gotchas" I discovered (unRAID 6.1.6): 1. If you're backing up your Plex docker files, it's a good idea to use a disk share instead of a user share as the destination (i.e. "disk2" instead of "user" in the path). The latter works, but you'll get unnerving messages in the syslog about missing files. 2. If you want to include a notification, it appears that the "notify" command has moved:
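(The two gotchas above can be sketched as a minimal backup script. This is a dry-run illustration, not the exact script from the thread: the share names and destination disk are assumptions, and the notify flags shown are the common `-s` subject form — the script checks both locations since the path moved during the 6.x series.)

```shell
#!/bin/sh
# Dry-run sketch of the cache-share backup described above.
# Share names, paths, and notify flags are ASSUMPTIONS for illustration.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Gotcha 1: destination is a disk share (/mnt/disk2/...), not the user
# share (/mnt/user/...), to avoid the spurious "missing file" syslog noise.
run rsync -a --delete /mnt/cache/Apps/ /mnt/disk2/Backups/Apps/

# Gotcha 2: the notify script's location changed between 6.x releases,
# so fall back to the older dynamix path if the newer one is absent.
NOTIFY=/usr/local/emhttp/webGui/scripts/notify
[ -x "$NOTIFY" ] || NOTIFY=/usr/local/emhttp/plugins/dynamix/scripts/notify
run "$NOTIFY" -s "Apps backup finished"
```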
  4. Just an FYI for anyone using the MySQL docker to run the Kodi database: MySQL was recently updated to version 5.7 as the "official" version. As such, my Docker updated to 5.7 a few days ago and Kodi started having issues, as described here: http://forum.kodi.tv/showthread.php?tid=247907 Rather than hack a MySQL 5.7 install to "act" like 5.6, I decided to try and get the Docker to pull down the 5.6 latest version instead of 5.7. Turns out it's as simple as setting the "Repository" value to: ...in the Docker's advanced settings view. BACK UP YOUR DATABASE FIRST, since you'll probably end up having to do a fresh MySQL install. Also, if you do a full dump backup and you're already on 5.7, you'll need to remove any entries that refer to the "sys" or "mysql" databases in the backup file.
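(The pin-to-5.6 procedure above amounts to dumping the database first, then pulling the `mysql:5.6` tag instead of following `latest`. A dry-run sketch — the host address, credentials, and backup path here are assumptions, and in unRAID's GUI the equivalent of the `docker pull` is setting the "Repository" field in the Docker's advanced settings view.)

```shell
#!/bin/sh
# Dry-run sketch of pinning the MySQL container to the 5.6 line.
# Host, credentials, and backup path are ASSUMPTIONS.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# BACK UP YOUR DATABASE FIRST (you'll likely end up doing a fresh install).
run mysqldump -h 192.168.1.10 -u root -p --all-databases \
    --result-file=/mnt/user/Backups/kodi-all.sql

# Pin the major.minor tag so "latest" can't silently move you to 5.7.
run docker pull mysql:5.6
```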
  5. Thanks again johnnie.black! I'm going to mark this as "RESOLVED" and stick with XFS for my cache drive from now on. I'm also going to implement some sort of backup schedule for my cache-only share so I don't need to redo my Docker settings if this ever happens again (e.g. Sickbeard DB... ugh.). I understand the benefit of a cache pool, but I'd rather stick with something that runs less of a risk of losing everything due to a RAM failure.
  6. Thanks johnnie.black (my favorite blended Scotch, FWIW)! I agree that the speed on the RAM was underclocked, but the timings seemed aggressive. Or maybe just bad luck fried the RAM. Either way, I'm ordering two more sticks of the same so I can have a spare on hand if the other goes down. I'll rebuild my Docker image as well. I'm starting to get good at that. Final question and I'll mark this as "RESOLVED": Is there a known issue that btrfs is especially susceptible to corruption when there's bad RAM in the system? I ask because through all of the crashes and surprise reboots, my reiserfs data/parity drives had no problems, and I'm not seeing any issues with my newly XFS-formatted cache drive. However, when the issue first occurred it completely wiped my btrfs cache pool, and now there's an issue in my Docker image (which I believe is also btrfs, right?).
  7. Thanks buxton, you are correct: I had already started running Memtest after a hard restart and I can't say for sure, but to me this doesn't look good. The RAM is this, purchased in July: http://www.newegg.com/Product/Product.aspx?Item=N82E16820233536 I noticed the timings on the screen don't match the specs on the RAM. RAM is 9-10-9-27 @ 933 MHz, but my system was running it at 9-9-9-24 @ 666 MHz! I assumed the mobo would detect and automatically assign the correct settings (ASRock Z97 Extreme6, DRAM timings were all "Auto")... I certainly didn't intend to run my RAM out of spec! I manually reset the timings and clock speed, ran another memtest with both sticks, and still got a ton of failures. I tested each stick individually and narrowed it down to one bad stick. I ruled out mobo issues by testing each stick in a couple slots. Bad stick failed everywhere, good stick nowhere. I assume it's ok to hobble the system along on one stick until I can replace it, correct?
  8. Sorry for all of the posts to my own thread, but I'm afraid I REALLY NEED HELP! I ended up rebooting the server using the reset button on the case, since I couldn't access it via the web gui, telnet or even directly after plugging in a keyboard and monitor. Once the system came back up, I proceeded to rebuild my dockers (MySQL, Plex, Sickbeard, SAB). At one point during the process, the server started having issues. Syslog is attached. I was able to recover with a "powerdown -r" and everything seemed fine so I continued working on getting my Dockers set up again. After a few more hours of this I kicked off a parity check and went to bed. The previous parity check after the hard reboot was halfway done with 0 errors when I had to run powerdown, so I wanted to run another just to be safe. This morning my server was once again unresponsive. No access to webgui, Telnet or directly via keyboard/monitor. I can't get a log file out so I'm going to have to do another hard reboot. I really don't understand what is going on. I have been running unraid for years and I've never had any problems like this. I've been running v6 since August and I've been absolutely loving it. These issues have come out of nowhere. I'm starting to suspect some sort of hardware failure (RAM maybe?), but I don't know what tools I can run to investigate this, and I can't figure it out from the syslog files either. I would really appreciate any help or advice or troubleshooting steps. Thank you! depot-syslog-20160101-2140.zip
  9. I just returned from a few days away, and once again my system is unresponsive. This time, I cannot navigate to the web gui, and I cannot connect via Telnet (after entering the "root" login, the screen just hangs). However, I can still access files via my PC and the shares are not read-only. Before I left, I formatted one of my two SSDs as XFS and changed my cache settings to use it as the lone cache drive. I didn't set anything else up (no dockers, no VMs, no files in the cache-only "Apps" share), and left my PC running with the syslog page loaded. Thankfully, because the syslog was on the screen when I left, I can see the last few entries. It looks like OpenVPN Client disconnected at 7am (per cron job), and a parity check kicked off at 11pm. Also at 11pm, the OpenVPN Client reconnected (per cron job). I can't remember if the parity check was supposed to start at 11pm or not, so I'm not sure if some event caused this or if it's the normal monthly check. It's important to note that nothing has changed in my setup (other than the cache drive) for almost 5 months. The cron jobs that start/stop OpenVPN Client have been this way for a long time. Since I can't access the web gui and I can't telnet to the server, how do I proceed? Syslog excerpt below:
  10. Just tried stopping/starting array with each of my SSDs connected as the only device, and in both cases I got this in the syslog: (second time the device ID referred to the other SSD). So, it looks like I'm toast and just need to reformat the cache drive and start over. For now I'm going to just use one of the SSDs as XFS and just do the minimal set up to get my Dockers up and running again. If anyone has the time, I would still very much like to know what went wrong here. If it really is just a btrfs issue, then I'd be curious to know what I'd be walking away from if I leave my cache as XFS.
  11. Ok, I'll try that next and update the thread. In the meantime: Can someone look at the attachments and let me know what happened? I really don't think I was doing anything out of the ordinary apart from having my cache drive occasionally fill up. Also, if I decide to move to XFS for the cache drive, am I painting myself into any corners? Obviously the redundancy of the pool didn't seem to help much, but I don't want to start down a path that will limit future options. Either way, once I get things rebuilt I plan on coming up with a strategy to back up my cache drive so this won't be such a pain if it happens again.
  12. Interesting... Of course, that eliminates the possibility of a pool, which in my case didn't seem to help anyway.
  13. Update: Based on a few posts, I tried running the "scrub" command on each SSD but received this message each time: Then I tried running a "restore" command on each of the two SSDs and received this message: In my reading I found other cases where removing one of the drives from the pool seemed to help, but before I try anything like that, I'd really like to get some advice from someone who knows what they're doing. The last thing I want is to ruin any slim chance of recovering my "Apps" share. ...and if recovery isn't an option, I'd definitely appreciate advice on backing up Dockers and VMs!
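(For reference, the two btrfs-progs commands being attempted above look like this. Dry-run sketch; the mount point and raw device name are assumptions. Note that `btrfs scrub` needs the filesystem mounted, while `btrfs restore` works read-only against the raw device and salvages whatever it can reach to a separate destination — which is why it's the safer first move on a damaged pool.)

```shell
#!/bin/sh
# Dry-run sketch of the btrfs recovery attempts described above.
# Mount point and device name are ASSUMPTIONS.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Scrub verifies checksums; it requires the filesystem to be mounted.
run btrfs scrub start -B /mnt/cache

# Restore reads the unmounted raw device and copies recoverable files
# elsewhere, without writing to the damaged filesystem.
run mkdir -p /mnt/disk1/rescue
run btrfs restore -v /dev/sdb1 /mnt/disk1/rescue
```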
  14. ...second half of the syslog, showing all of the errors. syslog_20151228_Edited2.txt
  15. tl;dr: It ended up being a bad RAM stick that took out my btrfs cache pool on the first crash and my Docker image file on a subsequent crash. Original post: Version: 6.1.6 (Plus) Parity: 4TB Data: Disk 1 (3TB), Disk 2 (3TB), Disk 3 (1TB) - reiserfs (upgraded from v5, haven't migrated file systems yet) Cache: 2x 120GB Kingston SSDs in a cache pool This afternoon I noticed that all of my shares were read-only in Windows, and when I tried to access the web gui to investigate, it was completely hung up. I connected via telnet, downloaded the syslog and noticed some nasty errors ("general protection fault", "BTRFS warning", etc.). I tried everything I could think of to get the system to do a clean reboot, but in the end I stopped all of my dockers (which were still running) and issued a "reboot" command. When the system came back up, my Dockers were gone, my VM was stopped, and one of the two SSDs in my cache pool (ID ending in "F2FE") was listed as "unmountable" and I was given the option to reformat it. My cache pool still shows as a "Pool of two devices" but the word "Unmountable" is next to the first drive on the list. Prior to this mess, my system had a cache-only "Apps" share containing all of the files for my Dockers (e.g. Plex library files, MySQL data files, etc.). Between the files in this share, the docker file itself and the VM image, my cache drive would tend to fill up, especially when adding a lot of files to the array at once. I'm not sure if this is related to the issue or not, but I thought I would point it out. Questions: 1. What happened? Do the logs indicate some problem with the SSD, or something else? In the attached diagnostics, both SSDs are showing as healthy in their SMART reports, so I can't tell if this is a hardware failure or not. However, I am unable to extract the "Kinston_DT_101_G2_.....txt" file from the .zip due to "destination file could not be created". I'm not sure what this file is. (EDIT: That's my flash drive. 
7-zip was able to open it.) 2. Is there any way to recover my cache pool? My cache-only "Apps" share didn't have any data (other than my backed-up MySQL databases), but setting up all of my Dockers is going to be a massive undertaking that I'd like to avoid if possible. 3. What is the next step at this point? Should I take the option of reformatting the SSD or try something else first? 4. Is it bad for the system to have the cache drive fill up? It's not every day, but somewhat regular. 5. If I am out of luck with recovering the pool, how do I go about backing up my docker and VM image files (i.e. do they need to be stopped first, etc) on a regular basis once they're rebuilt? Does anyone have a script that does this? Note about the syslog: I removed some mover logger entries that included file paths, and replaced a few other paths with "REDACTED". The repeating "shfs_open" ... Read-only file system" error is a path that is accessed by the BTSync docker. The attached "syslog_Edited" is the pre-reboot log, and the diagnostics zip file is post-reboot. ...looks like I need to split the log into two files. First file is attached, I'll post the second half next. depot-diagnostics-20151228-1650.zip syslog_20151228_Edited1.txt
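(On question 5 above — backing up the Docker and VM image files on a schedule — one common shape is: stop the consumers, rsync the images, restart. Dry-run sketch only; the container names, image paths, and destination are assumptions, not from the thread. The stops matter because copying an image file while a container or VM is writing to it can produce an inconsistent backup.)

```shell
#!/bin/sh
# Dry-run sketch of a periodic Docker/VM image backup routine.
# Container names, image paths, and destination are ASSUMPTIONS.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run docker stop plex mysql sickbeard sab         # quiesce before copying
run rsync -a /mnt/cache/docker.img /mnt/disk2/Backups/
run rsync -a /mnt/cache/vm/ /mnt/disk2/Backups/vm/
run docker start plex mysql sickbeard sab
```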