August 20, 2019

Firstly, a little about my setup:

- HP N40L Microserver running unRAID 6.7.2 with 6 Array Drives (1 x 8TB Parity, 1 x 8TB Data + 4 x 4TB Data) and 1 Cache Drive (500GB)
- System is used simply to download and store media files on my local network. It is attached to an APC UPS and hardwired directly to my Wi-Fi Modem/Router; otherwise all access is over Wi-Fi from various iOS devices
- Dockers: just the basics - Plex Media Server (LimeTech), ruTorrent (Linuxserver.io) and Sickbeard (all up to date with the latest versions)
- Uptime prior to the crash was 52 days, and Parity was last checked only 2 days ago, with 0 errors. The system has been rock-solid and I have never encountered any major problems.

The Incident:

Last night I arrived home from work, opened up ruTorrent and added a couple of magnet links. After adding the links ruTorrent became unresponsive. Since various versions have had issues with 100% CPU usage, and I often see the Docker restart itself, I was not too concerned. I waited a couple of minutes and tried again but could not reach ruTorrent, so I tried unRAID Home with the same result - no GUI access. I fired up Termius and logged in via Telnet successfully, so I issued the 'Poweroff' command to shut down the server (stupidly forgetting to save a Syslog or Diagnostics, as I did not initially think much of the freeze).

On reboot I was able to access the GUI, the Array was valid and all Array Drives were present, but I noticed that my Cache Drive was missing from the list of available Drives. I initially thought that maybe my old spinning-rust disk had died and caused the crash. I tried opening the syslog from the GUI but it had become unresponsive again. I figured that the 'dead' disk was causing slow reads/errors, so I logged in again via Telnet successfully and shut down again. I opened up my server, replaced the Cache Drive with a brand new SSD I had in the cupboard, and booted back up.
When I first booted I opened the GUI successfully but noticed multiple Array Drives missing. I quickly shut the server down (via the GUI), opened it back up and checked/re-seated all connections. On rebooting I was successfully able to access the GUI and confirm that all Disks were present, the Array was valid, and the new Cache Drive was detected. I assigned the new Cache Drive and clicked Start to start the Array, but the GUI did not respond.

At this point I was becoming seriously concerned. I thought maybe my motherboard had grenaded and was dropping disks, so I tried to connect via Telnet, but this did not respond initially either. After trying again I was able to access the server via Telnet and confirm that all Disks were present and detected using the 'ls -l /dev/disk/by-id' command. I then shut down again and did some research online about what steps to take.

I rebooted the server (again) and this time was unable to access the GUI at all (previously I had been able to load the unRAID Home page, although it had not responded to commands), but I could still Telnet in. I saved a copy of the Syslog and Diagnostics, which I have attached to this post. (Note: on issuing the Diagnostics command I received a few lines of errors referencing missing info and giving line references to dynamix files etc., but the command completed successfully.)

I am seeking the help of the unRAID community in diagnosing/fixing this issue. I have always found the community super-helpful and have been able to resolve any minor issues I have had in the past. At this point I am not sure what hardware is to blame (Mobo, RAM, USB/Boot Drive), or whether it could be a software issue with one or more corrupt files.

Could someone more knowledgeable than me take a look at the attached Diagnostics, help diagnose the cause(s) and/or let me know if there are any other steps I should take?

Attachments:
syslog.txt
watchtower-diagnostics-20190820-2108.zip

Edited August 21, 2019 by blu3_v2
Solved
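The disk check described above (listing /dev/disk/by-id over Telnet) can be sketched as a small script that prints only whole disks, which makes a missing drive easier to spot than in the raw 'ls -l' output. This is a minimal sketch, not part of the original post: it assumes Python 3 is available on the machine (it may not be on a stock unRAID install), and the function names whole_disks and list_disks are made up for illustration. It relies only on the usual Linux convention that partition links end in "-partN" and that "wwn-" links duplicate other entries.

```python
import os
import re

BY_ID_DIR = "/dev/disk/by-id"  # standard udev directory on Linux


def whole_disks(names):
    """Return the by-id names that refer to whole disks, sorted.

    Partition links end in "-part<N>"; "wwn-" links are skipped because
    they normally duplicate the ata-/usb- entries for the same device.
    """
    return sorted(
        n for n in names
        if not re.search(r"-part\d+$", n) and not n.startswith("wwn-")
    )


def list_disks(by_id_dir=BY_ID_DIR):
    """Map each whole-disk by-id name to the kernel device it points at."""
    result = {}
    for name in whole_disks(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ../../sdb; resolve it to /dev/sdb.
        result[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return result


if __name__ == "__main__":
    for name, dev in list_disks().items():
        print(f"{dev}\t{name}")
```

Comparing the number of lines printed against the number of drives physically installed (parity, data, cache and the USB boot drive) gives a quick yes/no answer on whether the controller is actually dropping disks.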
August 21, 2019 (Author)

After pulling the server apart again, and rechecking and re-seating all drives and connections, I sat down and read through whatever logs from the Diagnostics.zip I could decipher. I could not find any errors in the SMART reports etc., and the captured syslog showed no errors: the server had started up successfully and was ready to start with a Valid Array before I logged in via Telnet and issued the Poweroff command.

I then set about troubleshooting any other potential issues and thought I would try rebooting my Modem/Router and all the wireless iOS devices that I use to access the server on my LAN. After booting up the server (again) I was able to access the GUI, start the Array, and access all Dockers etc. (all using my original Cache Drive, so all Dockers and data were preserved, including in-progress downloads).

So it appears that it was simply a LAN issue (IP conflict, hardware issue with the Router???).
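For anyone hitting similar symptoms, one way to separate a network problem from a server problem before opening the case is to probe the server's ports from a client on the LAN. The following is a minimal sketch under stated assumptions, not something I ran at the time: the address 192.168.1.10 is hypothetical, the GUI and Telnet are assumed to be on their default ports (80 and 23), and the helper names port_open and check_services are made up for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host, services):
    """Probe each named service port and return {name: reachable}."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    # Hypothetical address; replace with the server's real LAN IP.
    HOST = "192.168.1.10"
    # Assumed default ports: 80 for the web GUI, 23 for Telnet.
    results = check_services(HOST, {"gui": 80, "telnet": 23})
    for name, ok in results.items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

If the results differ between clients, or change after the Modem/Router is rebooted while the server is left untouched, that points towards the network rather than the server hardware.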
This topic is now archived and is closed to further replies.