December 9, 2016
Hello,

This problem began about a week ago. I upgraded to unRaid 6 about a year ago, mostly with no issues. I am running RCX, with docker images of Plex, Sabnzbd, Sonarr, and CouchPotato.

Last week, I lost access to my Plex server -- it was not accessible from the clients, and I could not get to the webconfig page through the unRaid config page either. I tried restarting it to no avail. I tried reinstalling the image, but received errors because the current version could not be stopped. This occurred even after reboots of the server. At this point, other services, such as Sonarr, were still running, as was the unRaid config page.

A day or two later, I noticed I could no longer access the unRaid page -- but my shares were still available. Since then I have lost access to everything. The server is present on the network, according to my router, and it responds to pings as well.

Short of options, I tried a hard reboot, with no change. I connected it to a monitor, tried unmounting drives and shares as described in a help doc, and then tried "reboot". I got a message that it was shutting down, and then nothing. After an hour of waiting, I pushed the reset button. Then I started in GUI mode. The webpage was still inaccessible even from the machine. I pulled the diagnostics file (attached) and then tried "powerdown". Again, no response.

Any help or insight would be appreciated. If I need to reinstall a clean version of unRaid on the stick, I can do that; I just really don't want to have to reformat drives and lose my data.

I should probably note that when I stuck the stick in this machine to pull the diagnostics file off, I did get a "scan and fix" warning from Windows 10. I figured this may be a filesystem thing, so I didn't do it, but could this indicate a USB drive failure?

Thanks.

tower-diagnostics-20161208-1707.zip
December 9, 2016
When you got the warning from Windows 10, did you let Windows 10 'scan and fix' the flash drive? If you didn't, pull it out and see if Windows 10 finds a problem and corrects it. (This was called chkdsk in earlier versions of Windows. I am not sure what MS named it in Win10.)

While you have it out, I would make a backup of that Flash Drive. It may be corrupted or unrepairable and may require reformatting, but the configuration files (in the root of the config folder) may be intact. They can be a BIG help because the base configuration files contain things like server name, fixed IP address, users, passwords, disk assignment (very important), etc.

And, even if they are toast, we can help you recover ALL of your data from the array!
December 9, 2016 Author
Frank,

Thanks for taking a look. I attempted to create an image with ImageUSB, but it failed part of the way through due to a read error. I copied the files anyway, and then tried the scan disk tool. That passed. Tried the image copy again and received the same read error.

It does sound at this point like the USB drive is the culprit. Should burning a fresh copy on a fresh USB stick be my next step? Can I just copy the config folder over and hope for the best?

Thanks again,
Keith
December 9, 2016
I suggest you read this article about upgrading:

http://lime-technology.com/forum/index.php?topic=39032.0

Notice the only things you need to copy out of the config folder are in step 4. (I would not copy over any .cfg files for dockers or VM's.) This will get your server up and running basic unRAID with your data intact. No plugins, dockers or VM's. Then reinstall the things that you want and need. Since you don't know what is corrupted on that Flash, copying more over may not get you running!

You could try a complete 'Full' format of the Flash Drive rather than a 'Quick' format. (Full formatting is supposed to lock out bad sectors.) If that doesn't work, you can use a new flash drive and get a replacement key as outlined here:

https://lime-technology.com/replace-key/

Oh, and once you get it working to your satisfaction, stop the array and make a copy of the entire Flash Drive. I, personally, would not bother with an image, but you could make one of those also.
December 10, 2016 Author
Ok. Bought a new flash drive. Loaded the latest version, and copied the config files, minus the go file and the docker file, plus the shares folder. Booted up, and I was able to access the webconfig page once again. Problem solved!

Except ... before loading the plugins described in your doc, I noticed that the drives were not mounted because an unclean shutdown was detected. Stupidly, I tried to mount them. That was about an hour ago, and the server has been hanging since. The page is spinning, not timing out, but not refreshing.

Do you think this is purely a goof in my order of ops, an indication of a corrupted file in that config folder, or an issue with my drives?
December 10, 2016
Did you get a new Key File (or apply for a Trial registration)?

Did you just start the Array by going to 'Main' >>> 'Array Operation' and then clicking 'Start'? If you did this, then a parity check would have started, since an unclean shutdown was detected. But you should still have been able to navigate to other tabs on the GUI, and there should be a progress line for the check on the 'Array Operation' page.

Can you get to the command line (either on a monitor and keyboard attached to the server or with PuTTY)? If so, type diagnostics at the prompt. That will write the diagnostics file to your Flash Drive. It will be in the root or the logs folder. Attach that file to your next post.
December 10, 2016 Author
Got the new/replacement key file in place. Started at Main >> Array Operation. Can't navigate away from the page though, and there is no indication of a parity check like there normally is.

I can go for the diagnostics file, but it looks like that will require a hard shutdown -- it has been "mounting disks..." for 2 hours now with no changes. Is it safe to go ahead and reboot manually?
December 10, 2016 Author
Didn't think that one through. Was able to ssh in and grab the diagnostics file without rebooting.

tower-diagnostics-20161210-0521.zip
December 10, 2016
If you look at the end of the syslog.txt file in the logs folder, you will see a couple of stack dumps and a line that disk 3 did not mount. These are entries one doesn't want to find in a syslog. However, I am not enough of an expert to be able to tell you how to proceed from here. Hopefully, one of the real Linux Gurus can jump in and provide you with some guidance!

I did look at the SMART reports and I didn't see anything there.
December 10, 2016
It's disk 3 causing it:

Dec 10 03:55:46 Tower kernel: XFS (md3): _xfs_buf_find: Block out of range: block 0x7fffffff8, EOFS 0x74706d70

You need to check the file system on disk3.
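For anyone who wants to find lines like this in their own diagnostics, here is a minimal sketch. It assumes the syslog has been extracted from the diagnostics zip as logs/syslog.txt and that the kernel messages look like the one quoted above. The function names xfs_errors and xfs_devices are hypothetical helpers, not unRAID commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of unRAID): scan a syslog file for
# kernel XFS messages that name an md device, e.g. "kernel: XFS (md3): ...".

# Print every matching kernel XFS line from the file given as $1.
xfs_errors() {
    grep -E 'kernel: XFS \(md[0-9]+\)' "$1"
}

# Print the distinct md devices mentioned in those lines, one per line.
xfs_devices() {
    xfs_errors "$1" | sed -E 's/.*XFS \((md[0-9]+)\).*/\1/' | sort -u
}
```

Running `xfs_devices logs/syslog.txt` should list each md device with XFS messages. Not every XFS line is an error (normal mount messages match too), so read the lines from `xfs_errors` before deciding which disk needs a file system check.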
December 10, 2016 Author
Got it. So I read through the guide here (https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems). It's been a while and I don't remember what file system I originally set up on them, and I can't see in the GUI, because they won't mount. Looking over the error you excerpted, I made the assumption that it's xfs, and so I ran the following command through an ssh terminal:

xfs_repair -v /dev/md3

But the system has not responded. I saw that it can take up to 30 minutes, but the doc also indicated there would be a running report output. Instead, I have waited 30 minutes, and nothing has happened. The GUI is still refusing to load, so I don't know if it's still trying to mount the drives, and that's what's preventing the check?
December 10, 2016
I looked at the instructions and saw this line:

Start the array in Maintenance mode, by clicking the Maintenance mode check box before clicking the Start button. This starts the unRAID driver but does not mount any of the drives.

I suspect that you are going to have to force a reboot and, this time, follow the instructions explicitly.
December 10, 2016 Author
Unfortunately, I couldn't get the GUI to load again after I forced a reboot. I can still get into the terminal, though, and the following disks are mounted:

Linux 4.4.30-unRAID.
root@Tower:~# df -k
Filesystem     1K-blocks      Used Available Use% Mounted on
rootfs           4045632    332540   3713092   9% /
tmpfs            4081520       200   4081320   1% /run
devtmpfs         4045644         0   4045644   0% /dev
cgroup_root      4081520         0   4081520   0% /sys/fs/cgroup
tmpfs             131072      1828    129244   2% /var/log
/dev/sdb1       15626216    134992  15491224   1% /boot
/dev/md1       976732736 955613816  21118920  98% /mnt/disk1
/dev/md2       976732736 954022044  22710692  98% /mnt/disk2
/dev/sdc1      156290872  20993892 134251628  14% /mnt/cache
root@Tower:~#

Will unmounting disk1, disk2, as well as sdb1, and sdc1 (if necessary) be equivalent/sufficient?
December 10, 2016
Disable array autostart by editing disk.cfg on your flash drive (in the config folder):

change startArray="yes" to "no"

Reboot and run xfs_repair in maintenance mode.
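If the flash drive is reachable from an ssh session, the same edit can be sketched from the command line. This is only an illustration: it assumes the flash is mounted at /boot (as the df output earlier in the thread suggests), so that the file is /boot/config/disk.cfg, and the function name disable_autostart is a hypothetical helper, not an unRAID command.

```shell
#!/bin/sh
# Hypothetical helper (not an unRAID command): set startArray="no" in the
# disk.cfg file given as $1, keeping a backup copy next to it as $1.bak.
disable_autostart() {
    cfg="$1"
    # Keep the original so the change can be undone by copying it back.
    cp "$cfg" "$cfg.bak" || return 1
    # Rewrite from the backup; this avoids relying on sed's in-place flag.
    sed 's/^startArray="yes"/startArray="no"/' "$cfg.bak" > "$cfg"
}
```

Under the assumption above, the call would be `disable_autostart /boot/config/disk.cfg`, followed by a reboot. Editing the file by hand on another computer works just as well; the point is only that the startArray line reads "no" before the server boots.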
December 10, 2016 Author
Thanks Mr. Black, that did the trick. Here's the output from the check:

root@Tower:~# xfs_repair -v /dev/md3
Phase 1 - find and verify superblock...
        - block cache size set to 744072 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 897871 tail block 865875
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

It seems we can be pretty confident that the mount will not work, since we've tried several times. -L sounds risky, but it sounds like that's the next step. If it does cause corruption, would I still be able to just rebuild the disk from parity?
December 10, 2016
"If it does cause corruption, would I still be able to just rebuild the disk from parity?"

Parity can't help with filesystem corruption, but in most (every?) recent cases where -L had to be used there was no data loss, so you should be ok.