November 14, 2025

I'm able to SSH on to the server and the web UI works, but everything else seems to time out. I can't access files etc. across the network, and if I try and trigger any commands via the GUI they just time out as well. Reboot, same. Stop array, same.

I couldn't see anything obvious in /var/log/syslog that indicated any issues. Running top doesn't show anything with high mem or CPU usage, but some of the cores keep spiking/staying at 100% utilisation.

Are there any logs I can provide to work out what might be going on here?

Is there a way I can force the array offline in a 'safe' manner and then trigger a reboot perhaps? Any other thoughts as to how I should approach this?

root@NAS01:~# tail /var/log/syslog
Nov 14 16:06:46 NAS01 emhttpd: Sync filesystems...
Nov 14 16:06:46 NAS01 emhttpd: shcmd (214519): sync
Nov 14 16:06:47 NAS01 sshd-session[2206811]: Postponed keyboard-interactive/pam for root from 192.168.0.132 port 58796 ssh2 [preauth]
Nov 14 16:06:47 NAS01 sshd-session[2206811]: Accepted keyboard-interactive/pam for root from 192.168.0.132 port 58796 ssh2
Nov 14 16:06:47 NAS01 sshd-session[2206811]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 14 16:06:47 NAS01 elogind-daemon[1785]: New session 2 of user root.
Nov 14 16:06:47 NAS01 sshd-session[2206811]: User child is on pid 2207105
Nov 14 16:06:47 NAS01 sshd-session[2207105]: Starting session: shell on pts/0 for root from 192.168.0.132 port 58796 id 0
Nov 14 16:09:28 NAS01 nginx: 2025/11/14 16:09:28 [error] 6488#6488: *17943 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.0.132, server: , request: "POST /update.htm HTTP/2.0", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "nas01.local", referrer: "https://nas01.local/Main"
Nov 14 16:14:28 NAS01 nginx: 2025/11/14 16:14:28 [error] 6488#6488: *17943 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.132, server: , request: "POST /update.htm HTTP/2.0", upstream: "http://unix:/var/run/emhttpd.socket/update.htm", host: "nas01.local", referrer: "https://nas01.local/Settings/DiskSettings"

Edited November 14, 2025 by flashback
November 14, 2025 Community Expert

Try to get the complete syslog:
cp /var/log/syslog /boot/syslog.txt
November 14, 2025 Author

5 minutes ago, JorgeB said:
Try to get the complete syslog:
cp /var/log/syslog /boot/syslog.txt

That's done, where's the best place to upload? It's 1.3 MB, so PasteBin fails (512 KB limit).

Edited November 14, 2025 by flashback
November 14, 2025 Community Expert

You should be able to attach it to the forum; zip it first if it's too large.
November 14, 2025 Community Expert

Nov 14 15:13:44 NAS01 kernel: <TASK>
Nov 14 15:13:44 NAS01 kernel: unraidd+0xe00/0x13d0 [md_mod]
Nov 14 15:13:44 NAS01 kernel: ? preempt_latency_start+0x2b/0x50
Nov 14 15:13:44 NAS01 kernel: ? md_thread+0xf1/0x120 [md_mod]
Nov 14 15:13:44 NAS01 kernel: ? kthread_should_park+0x12/0x30
Nov 14 15:13:44 NAS01 kernel: md_thread+0xf1/0x120 [md_mod]
Nov 14 15:13:44 NAS01 kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Nov 14 15:13:44 NAS01 kernel: ? __pfx_md_thread+0x10/0x10 [md_mod]

Unraid driver crashed. This is almost always a hardware problem. Reboot the server; you likely will need to force it, then post the diagnostics to see the hardware used.
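For anyone wanting to locate this kind of trace in a saved syslog copy, here is a minimal sketch. The function name show_traces is made up for illustration, the default path assumes the syslog was copied to /boot/syslog.txt as suggested earlier in the thread, and the pattern only looks for the usual "Call Trace:" and "<TASK>" markers, so it may miss other kinds of kernel errors.

```shell
#!/bin/bash
# Hypothetical helper: print each kernel call-trace marker in a syslog copy,
# with its line number and a few lines of following context.
#   $1: path to the syslog copy (assumed default: /boot/syslog.txt)
#   $2: number of context lines to print after each marker (default: 12)
show_traces() {
    local log="${1:-/boot/syslog.txt}" ctx="${2:-12}"
    # -n prefixes line numbers, -A prints trailing context lines
    grep -n -A "$ctx" -E 'kernel: (Call Trace:|<TASK>)' "$log"
}
```

Running show_traces with no arguments scans /boot/syslog.txt. Frames tagged [md_mod] in the output are the ones that point at the Unraid array driver.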
November 14, 2025 Author

10 minutes ago, JorgeB said:
Nov 14 15:13:44 NAS01 kernel: <TASK>
Nov 14 15:13:44 NAS01 kernel: unraidd+0xe00/0x13d0 [md_mod]
Nov 14 15:13:44 NAS01 kernel: ? preempt_latency_start+0x2b/0x50
Nov 14 15:13:44 NAS01 kernel: ? md_thread+0xf1/0x120 [md_mod]
Nov 14 15:13:44 NAS01 kernel: ? kthread_should_park+0x12/0x30
Nov 14 15:13:44 NAS01 kernel: md_thread+0xf1/0x120 [md_mod]
Nov 14 15:13:44 NAS01 kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Nov 14 15:13:44 NAS01 kernel: ? __pfx_md_thread+0x10/0x10 [md_mod]
Unraid driver crashed. This is almost always a hardware problem. Reboot the server; you likely will need to force it, then post the diagnostics to see the hardware used.

Thanks, I suspected as much as I saw some kernel messages in there. I'm trying to do a pre-reboot dump of diagnostics, it's been running a few minutes now - I assume it shouldn't take that long and therefore may not be able to be grabbed pre-reboot then? I've attached the ZIP it's generated so far; there also seems to be one generated earlier today, which I've also attached.

For the forced reboot, do I just need to run shutdown -r via the terminal, or physically reset it using the NAS power button?

fwlonnas01-diagnostics-20251020-1252.zip
fwlonnas01-diagnostics-20251020-2004.zip
November 15, 2025 Community Expert

12 hours ago, flashback said:
I assume it shouldn't take that long and therefore may not be able to be grabbed pre-reboot then?

Diags can be taken after a reboot, just to see the hardware. No hardware is known to have issues AFAIK, and there's only one stick of RAM, so you cannot test with one at a time. Try running memtest, but note that that's only definitive if it finds errors.
November 15, 2025 Author

Just now, JorgeB said:
Diags can be taken after a reboot, just to see the hardware. No hardware is known to have issues AFAIK, and there's only one stick of RAM, so you cannot test with one at a time. Try running memtest, but note that that's only definitive if it finds errors.

OK, should I just run "/sbin/reboot" from the command line or "powerdown -r"? Searching the forum gave multiple viewpoints. Note that the array is still 'stuck':

NAS01:~# grep -E '^(mdState|fsState)=' /var/local/emhttp/var.ini
mdState="STARTED"
fsState="Stopping"

Edited November 15, 2025 by flashback
November 15, 2025 Author

I ran powerdown, which noted that it is deprecated but started the process (as it did when I tried from the GUI). The result is the same: it says it is rebooting but doesn't actually reboot. Looks like I'll need to physically reboot via the power button.

Note: I checked /mnt/disk1 and /mnt/disk2 and both seemed to display the data OK using ls -lah.

:/mnt# powerdown -r
powerdown: /usr/local/sbin/powerdown has been deprecated
Broadcast message from root@NAS01 (pts/0) (Sat Nov 15 07:50:38 2025):
The system is going down for reboot NOW!
November 15, 2025 Author

2 minutes ago, JorgeB said:
You will likely need to force a reboot.

Rebooted, syslog and diagnostics attached.

fwlonnas01-diagnostics-20251115-0759.zip
syslog20251115.zip
November 15, 2025 Author

I can change settings via the GUI now, so that's working after the reboot. I ran the syslog through ChatGPT as well as having a skim myself, which came out with the following:

Category | Severity | Notes
ACPI BIOS Errors | Medium | Fix with BIOS update if available
Memory Controller (EDAC) Error | Medium–High | Run memory test; monitor ECC events
NVMe Missing SUBNQN | Low | Common, usually harmless
PCI Resource Mapping Warning | Low–Medium | Likely harmless; BIOS update may help

If I start my array now (as I have it set to manual after reboot), I should uncheck write corrections to parity as a safety measure, right? Or should I not try to start the array yet?
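On the "monitor ECC events" note in the table: where the kernel has an EDAC driver loaded for the memory controller, it exposes corrected and uncorrected error counters under /sys/devices/system/edac/mc. A minimal sketch for reading them is below. The function name edac_counts is made up for illustration, and whether this NAS exposes any counters at all is an assumption; on hardware without ECC support the directory may be empty or missing, in which case the function prints nothing.

```shell
#!/bin/bash
# Hypothetical helper: print the EDAC corrected (ce_count) and
# uncorrected (ue_count) error counters for each memory controller.
#   $1: EDAC sysfs root (default: /sys/devices/system/edac/mc)
edac_counts() {
    local root="${1:-/sys/devices/system/edac/mc}" f
    for f in "$root"/mc*/ce_count "$root"/mc*/ue_count; do
        # Skip unmatched globs and unreadable files
        [ -r "$f" ] && printf '%s %s\n' "${f#"$root"/}" "$(cat "$f")"
    done
    return 0
}
```

A non-zero and growing ce_count or ue_count would be consistent with a memory problem, but zero counts do not prove the RAM is good.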
November 15, 2025 Author

12 minutes ago, JorgeB said:
Have you already run memtest?

I can only do that from the boot menu, can't I? If so, it'll need to wait till I'm back in town so I can physically access the machine...
November 16, 2025 Author

I thought I'd do as much as I could before I get back, so I triggered an online memtest of 28G out of the 32G RAM. Results as follows:

[16.11.2025 04:34:40 GMT] /usr/bin/memtester 28G 2
=============================================================================================
memtester version 4.6.0 (64-bit) adapted for use on Unraid installations
Copyright (C) 2001-2020 Charles Cazabon
Copyright (C) 2024 desertwitch (modifications for Unraid)
THIS PROGRAM IS PROVIDED AS IS AND WITHOUT ANY WARRANTIES
It is licensed under GNU General Public License Version 2

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 28672MB (30064771072 bytes)
got 28672MB (30064771072 bytes), trying mlock ...locked.

Loop 1/2:
  Stuck Address : Testing... ok
  Random Value : Testing... ok
  Compare XOR : Testing... ok
  Compare SUB : Testing... ok
  Compare MUL : Testing... ok
  Compare DIV : Testing... ok
  Compare OR : Testing... ok
  Compare AND : Testing... ok
  Sequential Increment : Testing... ok
  Solid Bits : Testing... ok
  Block Sequential : Testing... ok
  Checkerboard : Testing... ok
  Bit Spread : Testing... ok
  Bit Flip : Testing... ok
  Walking Ones : Testing... ok
  Walking Zeroes : Testing... ok
  8-bit Writes : Testing... ok
  16-bit Writes : Testing... ok

Loop 2/2:
  Stuck Address : Testing... failed
  Random Value : Testing... ok
  Compare XOR : Testing... ok
  Compare SUB : Testing... ok
  Compare MUL : Testing... ok
  Compare DIV : Testing... ok
  Compare OR : Testing... ok
  Compare AND : Testing... ok
  Sequential Increment : Testing... ok
  Solid Bits : Testing... ok
  Block Sequential : Testing... ok
  Checkerboard : Testing... ok
  Bit Spread : Testing... ok
  Bit Flip : Testing... failed
  Walking Ones : Testing... ok
  Walking Zeroes : Testing... ok
  8-bit Writes : Testing... ok
  16-bit Writes : Testing... ok

Done.
=============================================================================================
[16.11.2025 07:16:21 GMT] The operation has finished with errors.
[16.11.2025 07:16:21 GMT] Code: 6 - Error during Stuck Address Test + Other Test(s).

And:

[16.11.2025 04:34:40 GMT] /usr/bin/memtester 28G 2
=============================================================================================
The detailed error output is enabled, watch this panel for any occurring errors.
FAILURE: possible bad address line at offset 0x000000036eef1328.
FAILURE: 0xffffffffff7fffff != 0xfffffbffff7fffff at offset 0x00000000fb7b35d8.

What is interesting is the timing of this. I installed the NAS a month ago in my comms room downstairs and it's been running fine. On Wednesday I moved it up to my loft, as my sparky came and extended my network up there. It's cooler up there (although at idle the system runs at 18-25 degrees C and the drives around the same), but the power socket also has a max limit of 5A. Hanging off the power socket is the UPS (Eaton 3S850B 3S Gen2 Desktop UPS Uninterruptible Power Supply (510W/850VA)), which powers my 2.5GbE switch and this NAS. Red herring and just coincidental timing?

In the meantime, should I just power off the NAS until I get back and raise a warranty replacement request? Then there's the question of the relocation timing: is it possible that the lower temperatures and/or the power supply up there caused the issue?
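As a side note on the second FAILURE line above, XORing the two values shows how many bits differ between them. The sketch below does that with bash arithmetic; flipped_bits is a made-up helper name and is not part of memtester. It relies on bash treating 64-bit hex constants as two's complement, which holds on the usual 64-bit builds.

```shell
#!/bin/bash
# Hypothetical helper: list the bit positions (0 = least significant)
# that differ between two 64-bit values.
flipped_bits() {
    local diff=$(( $1 ^ $2 )) i
    for (( i = 0; i < 64; i++ )); do
        if (( (diff >> i) & 1 )); then
            echo "$i"
        fi
    done
}

# The two values from the memtester FAILURE line
flipped_bits 0xffffffffff7fffff 0xfffffbffff7fffff   # prints 42
```

The XOR is 0x0000040000000000, so the two values differ in exactly one bit (bit 42). A single differing bit is the pattern usually associated with a faulty memory cell or data line, though this alone does not say whether the DIMM, the slot, or something else is at fault.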
November 16, 2025 Community Expert

2 hours ago, flashback said:
In the meantime, should I just power off the NAS

Yes, you should not run the server with known bad RAM.
November 19, 2025 Author

TerraMaster support isn't responding and, as Amazon just fulfills for them, they can't provide a straight replacement and will only refund. The price has gone up by £183, which is rather annoying.

I think I may need to look at alternative replacements. In the meantime, what's the best way for me to recover the data from the drives that Unraid was managing? Put them into a USB caddy, and then is there a way to read them in Windows, or will I need to boot up into a Linux live environment or similar due to XFS?
November 19, 2025 Community Expert

You can just boot with an Unraid trial flash drive in any PC to access them. Another Linux distro will also work, like an Ubuntu live flash drive.
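For the generic Linux route, a minimal sketch of a read-only mount is below. Everything specific in it is an assumption for illustration: mount_ro is a made-up helper name, /dev/sdX1 is a placeholder for whatever the live system calls the disk (check with lsblk first), the data is assumed to sit on the first partition, and the disk is assumed to be unencrypted XFS. The norecovery option skips XFS log replay so nothing is written to the disk, at the cost that files from unfinished transactions may look incomplete. By default the helper only prints the command it would run.

```shell
#!/bin/bash
# Hypothetical helper: mount one former array disk read-only.
#   $1: partition device, e.g. /dev/sdX1 (placeholder, check with lsblk)
#   $2: mount point, e.g. /mnt/recovery
#   $3: pass "run" to actually mount (needs root); anything else is a dry run
mount_ro() {
    local part="$1" target="$2" mode="${3:-dry}"
    # ro: read-only; norecovery: do not replay the XFS log
    local cmd=(mount -t xfs -o ro,norecovery "$part" "$target")
    if [ "$mode" = "run" ]; then
        mkdir -p "$target" && "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}
```

For example, mount_ro /dev/sdX1 /mnt/recovery prints the mount command, and adding run as a third argument executes it. The parity disk carries no filesystem, so only the data disks can be mounted this way.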
November 19, 2025 Author

9 minutes ago, JorgeB said:
You can just boot with an Unraid trial flash drive in any PC to access them. Another Linux distro will also work, like an Ubuntu live flash drive.

I might just have to boot the NAS back up and try to grab the data that wasn't copied (only 50-100 GB hadn't been backed up by the time it started failing), as I don't have another PC to fire up, only laptops.