System becomes unresponsive during transfer

brent3000 · October 13, 2020

Hi All,

Hoping to get some help, or to find out if there is a log I can check to see what's causing the issues with my Unraid server.

Recently I have noticed that the system randomly goes offline, and the only way to bring it back online is a power cycle.

When does it happen? When I'm transferring to the share drives (using the cache), e.g. a data transfer of approx. 100GB, the system just goes offline: even the connected monitor goes blank and there's no response from the keyboard either. The switch it's attached to shows the ports online, but I'm unable to ping the unit or access it via the browser.

When I reboot the system it all comes back online fine, the parity check passes with no issues, and transfers work again.

The first time it happened I thought I had just done something funky with the transfer, or that the system was unstable (but it had been running for a solid week of normal use: reading, updates, Dockers, etc.).

Then when I transferred another bunch of data it went offline again. Is there a log or something I can dig into a bit further after the reboot to find what may be causing it? The log file in the menu seems to show only the current logs, not the history from the previous boot.

JorgeB · October 13, 2020

Try this and then post that log after a crash.
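For context: the attachment name in the next post (syslog-192.168.2.1) suggests "this" refers to mirroring the server's syslog to another machine, so the last messages survive a hard lock-up. Unraid's own Syslog Server settings are the usual way to do that. Purely as a rough illustration of the receiving side, a minimal UDP syslog receiver in Python might look like the sketch below; the port 5514, the output file name and both helper names are assumptions for this example, not anything Unraid provides.

```python
import socket
import socketserver

LOG_PATH = "unraid-remote-syslog.log"  # hypothetical output file name


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append each received syslog datagram to the server's log file as one line."""

    def handle(self):
        data, _sock = self.request  # for UDP servers, request is (bytes, socket)
        line = data.decode("utf-8", errors="replace").rstrip("\n")
        # Open/append/close per message so every line reaches the OS immediately.
        with open(self.server.log_path, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {line}\n")


def make_server(host="0.0.0.0", port=5514, log_path=LOG_PATH):
    """Create (and bind) a UDP syslog receiver.

    Port 514 is the standard syslog port but usually needs root, so an
    arbitrary unprivileged port is used as the default here.
    """
    server = socketserver.UDPServer((host, port), SyslogHandler)
    server.log_path = log_path
    return server


def send_test_message(host, port, text):
    """Send one RFC 3164-style test message (priority 14 = facility user, severity info)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(f"<14>{text}".encode("utf-8"), (host, port))
```

On an always-on machine, `make_server(port=5514).serve_forever()` keeps it listening; the server would then be pointed at that machine and port as its remote syslog target.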

brent3000 · October 19, 2020

OK, so it locked up again. It was only working with the cache drive, so I thought it might be an issue with the drive writes, but the share I was using was cache-only.

See attached

syslog-192.168.2.1 - Copy.log

Edited October 19, 2020 by brent3000
Found more info

JorgeB · October 19, 2020

Oct 16 18:14:16 GLaDOS kernel: ata1.00: configured for UDMA/133
Oct 16 18:14:16 GLaDOS kernel: ata1: EH complete
Oct 16 18:16:59 GLaDOS kernel: ata1.00: exception Emask 0x10 SAct 0x10000 SErr 0x400100 action 0x6 frozen
Oct 16 18:16:59 GLaDOS kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Oct 16 18:16:59 GLaDOS kernel: ata1: SError: { UnrecovData Handshk }
Oct 16 18:16:59 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 16 18:16:59 GLaDOS kernel: ata1.00: cmd 61/00:80:c0:d4:af/0a:00:12:00:00/40 tag 16 ncq dma 1310720 ou
Oct 16 18:16:59 GLaDOS kernel:         res 40/00:80:c0:d4:af/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
Oct 16 18:16:59 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 16 18:16:59 GLaDOS kernel: ata1: hard resetting link

Check/replace the cables on this drive; if you don't know how to find it, please post complete diags.
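The `SErr 0x400100` value in the log above is a bitmask of SATA SError bits, which the kernel spells out on the next line as `{ UnrecovData Handshk }`. To make that mapping explicit, here is a small Python sketch that decodes such a value. The bit table is a partial one, assumed from the Linux libata naming, so treat it as unverified. Both flags describe errors on the SATA link itself, which fits the advice to look at the cabling/port rather than the drive's media.

```python
# Partial map of SError bit positions to the names libata prints in "SError: { ... }".
# Assumption: based on the Linux libata naming (drivers/ata/libata-eh.c); verify
# against the kernel source before relying on it.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    21: "BadCRC",
    22: "Handshk",
}


def decode_serror(value):
    """Return the names of the SError bits set in value; unmapped bits are grouped."""
    names = [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in SERROR_BITS)
    unknown = value & ~known_mask
    if unknown:
        names.append(f"other(0x{unknown:x})")
    return names


if __name__ == "__main__":
    print(decode_serror(0x400100))  # the value from every excerpt in this thread
```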

trurl · October 19, 2020

1 hour ago, JorgeB said:

please post complete diags.

You should post them anyway since they would give more context and might even contain additional information that points to the issue which is causing lockups.

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.

brent3000 · October 20, 2020

See attached,

Regarding the cables, I have actually replaced them before (three times now) and it happened with all three sets (I changed the SATA cables for all my drives from normal, to ultra-thin, to shorter thin ones due to case limits).

Also, is ata1 referencing the port on the mobo or a specific slot, cuz that seems to be one of the cache drives (I have two in RAID)?

glados-diagnostics-20201020-1307.zip

JorgeB · October 20, 2020

4 hours ago, brent3000 said:

cuz that seems to be one of the cache drives

In this case it's SATA port1, currently connected to cache1.
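On Linux, the resolved sysfs path of a libata-attached disk normally contains its port (for example `.../ata1/host0/target0:0:0/0:0:0:0/block/sdb`), so one way to see which port a drive sits on, without waiting for an error, is to resolve `/sys/block/*` and pick out that component. A minimal Python sketch, assuming that usual sysfs layout; NVMe or USB devices have no `ataN` component and are skipped, and the function names are made up for illustration:

```python
import os
import re

# Matches the libata port component (e.g. "/ata1/") in a resolved sysfs path.
ATA_RE = re.compile(r"/(ata\d+)/")


def ata_port_from_path(sysfs_path):
    """Return the 'ataN' component of a resolved sysfs device path, or None."""
    m = ATA_RE.search(sysfs_path)
    return m.group(1) if m else None


def map_block_devices(sys_block="/sys/block"):
    """Map block device names (sdb, sdc, ...) to the libata port they hang off."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for dev in sorted(os.listdir(sys_block)):
        # /sys/block/<dev> is a symlink into /sys/devices/...; resolve it fully.
        real = os.path.realpath(os.path.join(sys_block, dev))
        port = ata_port_from_path(real)
        if port:
            mapping[dev] = port
    return mapping


if __name__ == "__main__":
    for dev, port in map_block_devices().items():
        print(f"/dev/{dev} -> {port}")
```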

brent3000 · October 20, 2020

1 hour ago, JorgeB said:

In this case it's SATA port1, currently connected to cache1.

So is that deff the issue? Why would it cause the whole system to lock up rather than just disconnecting that SATA drive?

Where would I see the port or this info in the UI (or is it only in the logs)?

The system is set up as follows:

Mobo -> SATA cable -> HDD backplane -> hot-swap port -> SSD

(I have a DS380 case)

Now, I have already replaced the SATA cables in the whole unit, so I don't think that's the issue. If I switch Cache 1 and Cache 2 in the bays, that would mean that cache 2 should start reporting errors (meaning it's not the SSD and it's isolated to the connections along that specific chain).

Then switching the backplane ports over (i.e. port 1 to port 2 and 2 to 1, etc.) would show whether the problem is between the mobo and the cable, or between the backplane and the drive.

Does that sound like a good diagnosis? Also, was this found just from the initial log or from the diagnostics report? (I.e. if it happens again and I want to check, which file do I look in to see the issue?)
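The errors quoted in this thread all come from kernel lines in the syslog, so a saved syslog (such as the mirrored log attached earlier) is the file to look in. A small Python sketch that scans such a file and counts libata exceptions per port, with the first and last timestamp for each; the regular expression assumes the classic `Mon DD HH:MM:SS host kernel: ataN.MM: exception` layout seen here, and this is not an Unraid tool.

```python
import re
from collections import Counter

# Matches e.g. "Oct 16 18:16:59 GLaDOS kernel: ata1.00: exception Emask 0x10 ..."
EXC_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8}) \S+ kernel: (ata\d+)(?:\.\d+)?: exception")


def summarize_ata_exceptions(lines):
    """Count 'exception' lines per ataN port and record (first, last) timestamps."""
    counts = Counter()
    first_last = {}
    for line in lines:
        m = EXC_RE.match(line)
        if not m:
            continue
        ts, port = m.groups()
        counts[port] += 1
        first, _ = first_last.get(port, (ts, ts))
        first_last[port] = (first, ts)
    return counts, first_last
```

For example, `summarize_ata_exceptions(open("syslog-192.168.2.1.log", errors="replace"))` works on an attached log; the result names the port (ata1), not the drive, so it still has to be matched to whatever device is currently on that port.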

JorgeB · October 20, 2020

26 minutes ago, brent3000 said:

So is that deff the issue?

It's an issue, that should be fixed.

JorgeB · October 20, 2020

27 minutes ago, brent3000 said:

if I switch Cache 1 and Cache 2 in the bays, that would mean that cache 2 should start reporting errors (meaning it's not the SSD and it's isolated to the connections along that specific chain)

Yes, swap cables/bays with another device.

brent3000 · October 20, 2020

OK, I'll start some testing, but should it be crashing the system the way it does? If it's a cache drive in RAID, shouldn't it just fail that drive?

It may be a while till I reply with some testing, but I'll try doing some data dumping to see if I can make it trigger any sooner, hahah.

JorgeB · October 20, 2020

ATA errors can cause timeouts that can make the system unresponsive for several minutes and sometimes appear to have crashed.

brent3000 · October 20, 2020

Quick one: I swapped the drives over in the hot-swap bays and it's still showing the same error. Can someone confirm whether it's SATA port 1 or SATA port 0?

The log shows ata1, but there is also a reference to SAT0.PRT0:

Quote

Oct 20 21:07:47 GLaDOS kernel: ata1.00: exception Emask 0x10 SAct 0x1e000000 SErr 0x400100 action 0x6 frozen
Oct 20 21:07:47 GLaDOS kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Oct 20 21:07:47 GLaDOS kernel: ata1: SError: { UnrecovData Handshk }
Oct 20 21:07:47 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 20 21:07:47 GLaDOS kernel: ata1.00: cmd 61/00:c8:c0:16:f1/0a:00:13:00:00/40 tag 25 ncq dma 1310720 ou
Oct 20 21:07:47 GLaDOS kernel: res 40/00:c8:c0:16:f1/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Oct 20 21:07:47 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 20 21:07:47 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 20 21:07:47 GLaDOS kernel: ata1.00: cmd 61/00:d0:c0:20:f1/0a:00:13:00:00/40 tag 26 ncq dma 1310720 ou
Oct 20 21:07:47 GLaDOS kernel: res 40/00:c8:c0:16:f1/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Oct 20 21:07:47 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 20 21:07:47 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 20 21:07:47 GLaDOS kernel: ata1.00: cmd 61/00:d8:c0:2a:f1/0a:00:13:00:00/40 tag 27 ncq dma 1310720 ou
Oct 20 21:07:47 GLaDOS kernel: res 40/00:c8:c0:16:f1/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Oct 20 21:07:47 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 20 21:07:47 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 20 21:07:47 GLaDOS kernel: ata1.00: cmd 61/00:e0:c0:34:f1/0a:00:13:00:00/40 tag 28 ncq dma 1310720 ou
Oct 20 21:07:47 GLaDOS kernel: res 40/00:c8:c0:16:f1/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Oct 20 21:07:47 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 20 21:07:47 GLaDOS kernel: ata1: hard resetting link
Oct 20 21:07:47 GLaDOS kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 20 21:07:47 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Oct 20 21:07:47 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Oct 20 21:07:47 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Oct 20 21:07:47 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)

My next step is to switch ports 1 and 2 (aka 0 and 1) over on the backplane to see whether the errors stay the same or switch (which should let me remove the backplane from the list of suspects and then focus on the cable/mobo port).
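One detail that may help when comparing these excerpts: SAct is a bitmask of the NCQ commands that were outstanding, and in every excerpt in this thread its set bits match the `tag N` values of the failed WRITE FPDMA QUEUED commands (SAct 0x1e000000 gives tags 25 to 28 in the log above). That is consistent with the whole queue of in-flight writes being aborted at once by a link-level error, rather than one bad block. A Python sketch to check the correspondence; the function names are illustrative only.

```python
import re

# Matches the "tag N ncq" part of libata's "cmd ..." lines.
TAG_RE = re.compile(r"\btag (\d+) ncq\b")


def sact_tags(sact):
    """Return the NCQ tags (0-31) whose bits are set in an SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


def failed_tags(lines):
    """Collect the tags of the failed-command lines in a log excerpt, sorted."""
    tags = set()
    for line in lines:
        m = TAG_RE.search(line)
        if m:
            tags.add(int(m.group(1)))
    return sorted(tags)
```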

JorgeB · October 20, 2020

ata1 is the first MB port; some boards call it port0, others port1.

brent3000 · October 25, 2020

So I did an initial swap of the cables to the HDD bays and the error is still the same, so I'm going to change the ports over now. Quick question though: as I have two cache drives, would the error change in any way to indicate which actual drive is causing the issue, or would it only show the SATA port error?

Quote

Oct 25 20:16:17 GLaDOS kernel: ata1.00: exception Emask 0x10 SAct 0xe000 SErr 0x400100 action 0x6 frozen
Oct 25 20:16:17 GLaDOS kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Oct 25 20:16:17 GLaDOS kernel: ata1: SError: { UnrecovData Handshk }
Oct 25 20:16:17 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 20:16:17 GLaDOS kernel: ata1.00: cmd 61/00:68:40:3d:42/0a:00:1e:00:00/40 tag 13 ncq dma 1310720 ou
Oct 25 20:16:17 GLaDOS kernel: res 40/00:68:40:3d:42/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 20:16:17 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 20:16:17 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 20:16:17 GLaDOS kernel: ata1.00: cmd 61/00:70:40:47:42/0a:00:1e:00:00/40 tag 14 ncq dma 1310720 ou
Oct 25 20:16:17 GLaDOS kernel: res 40/00:68:40:3d:42/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 20:16:17 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 20:16:17 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 20:16:17 GLaDOS kernel: ata1.00: cmd 61/00:78:40:51:42/0a:00:1e:00:00/40 tag 15 ncq dma 1310720 ou
Oct 25 20:16:17 GLaDOS kernel: res 40/00:68:40:3d:42/00:00:1e:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 20:16:17 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 20:16:17 GLaDOS kernel: ata1: hard resetting link
Oct 25 20:16:17 GLaDOS kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 25 20:16:17 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Oct 25 20:16:17 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Oct 25 20:16:17 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Oct 25 20:16:17 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Oct 25 20:16:17 GLaDOS kernel: ata1.00: configured for UDMA/133
Oct 25 20:16:17 GLaDOS kernel: ata1: EH complete

I was also wondering whether it's possible to go from a RAID-type setup on the cache drives to non-RAID without having to reinstall or change anything in the Docker files, VMs, etc.?

brent3000 · October 25, 2020

Also changed the cables over and the same error codes seem to come through.

Any ideas? Or was there anything else in the logs? Or is it looking like a mobo fault? Can't say I've ever had a port cause me issues on a board before :?

Quote

Oct 25 21:22:12 GLaDOS root: Fix Common Problems: Other Warning: Background notifications not enabled
Oct 25 21:25:09 GLaDOS kernel: ata1.00: exception Emask 0x10 SAct 0x1e0 SErr 0x400100 action 0x6 frozen
Oct 25 21:25:09 GLaDOS kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Oct 25 21:25:09 GLaDOS kernel: ata1: SError: { UnrecovData Handshk }
Oct 25 21:25:09 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 21:25:09 GLaDOS kernel: ata1.00: cmd 61/00:28:40:8e:34/0a:00:24:00:00/40 tag 5 ncq dma 1310720 ou
Oct 25 21:25:09 GLaDOS kernel: res 40/00:28:40:8e:34/00:00:24:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 21:25:09 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 21:25:09 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 21:25:09 GLaDOS kernel: ata1.00: cmd 61/80:30:40:98:34/09:00:24:00:00/40 tag 6 ncq dma 1245184 ou
Oct 25 21:25:09 GLaDOS kernel: res 40/00:28:40:8e:34/00:00:24:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 21:25:09 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 21:25:09 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 21:25:09 GLaDOS kernel: ata1.00: cmd 61/80:38:c0:a1:34/09:00:24:00:00/40 tag 7 ncq dma 1245184 ou
Oct 25 21:25:09 GLaDOS kernel: res 40/00:28:40:8e:34/00:00:24:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 21:25:09 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 21:25:09 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Oct 25 21:25:09 GLaDOS kernel: ata1.00: cmd 61/80:40:40:ab:34/09:00:24:00:00/40 tag 8 ncq dma 1245184 ou
Oct 25 21:25:09 GLaDOS kernel: res 40/00:28:40:8e:34/00:00:24:00:00/40 Emask 0x10 (ATA bus error)
Oct 25 21:25:09 GLaDOS kernel: ata1.00: status: { DRDY }
Oct 25 21:25:09 GLaDOS kernel: ata1: hard resetting link
Oct 25 21:25:10 GLaDOS kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 25 21:25:10 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Oct 25 21:25:10 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Oct 25 21:25:10 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Oct 25 21:25:10 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Oct 25 21:25:10 GLaDOS kernel: ata1.00: configured for UDMA/133
Oct 25 21:25:10 GLaDOS kernel: ata1: EH complete
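The `cmd 61/...` field in these lines is the ATA taskfile libata prints for each failed command. Assuming libata's field order (command/feature:count:lbal:lbam:lbah/hob_feature:hob_count:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count sits in the FEATURE fields and the tag in bits 7:3 of COUNT, the Python sketch below decodes it. It reproduces the numbers the kernel prints (tag 5 and 1310720 bytes for the first failed command above), and shows every failed command was a large (about 1.2 to 1.3 MB) queued write, which fits the errors appearing during big transfers.

```python
# Only the two NCQ opcodes relevant here; others are reported as hex.
COMMAND_NAMES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_ncq_cmd(taskfile):
    """Decode a libata 'cmd' taskfile string such as '61/00:28:40:8e:34/0a:00:24:00:00/40'."""
    cmd, low, high, device = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high.split(":"))
    # For FPDMA QUEUED commands the sector count lives in FEATURE (16 bits);
    # a count of 0 means 65536 sectors.
    sectors = ((hob_feature << 8) | feature) or 65536
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    command = int(cmd, 16)
    return {
        "command": COMMAND_NAMES.get(command, f"0x{command:02x}"),
        "tag": nsect >> 3,        # NCQ tag is stored in bits 7:3 of COUNT
        "sectors": sectors,
        "bytes": sectors * 512,   # assumes 512-byte logical sectors
        "lba": lba,
        "device": int(device, 16),
    }
```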

brent3000 · October 25, 2020

Attached are the two latest log packs. Is there anything else I should try, or should I verify by splitting the cache into single drives and testing the drives based on the ports?

glados-diagnostics-20201025-2150.zip syslog-192.168.2.1.log

JorgeB · October 25, 2020

If you haven't yet, connect a different device to that port; if errors persist, it's likely a board problem.

brent3000 · December 10, 2020

Sorry to bring this one back, but getting the RMA board processed took a bit longer than normal.

Soooo I ended up getting the board swapped and moved everything around, however a test just now gets me this, the exact same error:

Dec 10 14:29:28 GLaDOS kernel: ata1.00: exception Emask 0x10 SAct 0x20000000 SErr 0x400100 action 0x6 frozen
Dec 10 14:29:28 GLaDOS kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Dec 10 14:29:28 GLaDOS kernel: ata1: SError: { UnrecovData Handshk }
Dec 10 14:29:28 GLaDOS kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 10 14:29:28 GLaDOS kernel: ata1.00: cmd 61/00:e8:40:74:17/0a:00:1f:00:00/40 tag 29 ncq dma 1310720 ou
Dec 10 14:29:28 GLaDOS kernel: res 40/00:e8:40:74:17/00:00:1f:00:00/40 Emask 0x10 (ATA bus error)
Dec 10 14:29:28 GLaDOS kernel: ata1.00: status: { DRDY }
Dec 10 14:29:28 GLaDOS kernel: ata1: hard resetting link
Dec 10 14:29:28 GLaDOS kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 10 14:29:28 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Dec 10 14:29:28 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Dec 10 14:29:28 GLaDOS kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.PRT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
Dec 10 14:29:28 GLaDOS kernel: ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20180810/psparse-514)
Dec 10 14:29:28 GLaDOS kernel: ata1.00: configured for UDMA/133
Dec 10 14:29:28 GLaDOS kernel: ata1: EH complete
Dec 10 14:30:12 GLaDOS ntpd[1809]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

Now, one final thing I have thought of, and did rule out (though I don't think I did enough switching to rule it out fully):

I'm using a Silverstone DS380 (which has a mobo backplane installed). Is it possible the fault is with the backplane?

Tests I have done (note: the cache drives are in Bay 1, SATA0 and Bay 2, SATA1):

When I swapped the cache drives around, because they are running in a 'RAID'-type format the same data was always hitting both drives (i.e. ATA0 and ATA1 still received the same traffic), which was also the case when I switched them back and forth on the physical backplane.

My TL;DR question, to avoid ripping out the system and moving cables around: can I turn off RAID on the cache drives, do some testing without them in RAID (write tests to each drive individually) to find out whether the backplane is the issue, and then simply re-enable RAID?

JorgeB · December 10, 2020

If it's a raid1 pool you can remove one of the devices, then test with the remaining one.
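With one device out of the pool, each drive/bay can be stressed on its own. The Python sketch below simply writes random data to a mounted path, forces it to the device with fsync and reports the throughput, while the syslog is watched for new ataN exceptions. The function name, scratch-file name and default sizes are assumptions; to approach the ~100GB transfers that triggered the errors, raise `total_mib` and make sure the drive has the free space.

```python
import os
import time


def write_test(target_dir, total_mib=1024, chunk_mib=4, filename="write_test.bin"):
    """Write at least total_mib MiB to target_dir, fsync it, and return MiB/s.

    filename is a hypothetical scratch-file name; the file is removed afterwards.
    One random chunk is generated and reused for every write.
    """
    path = os.path.join(target_dir, filename)
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    written = 0
    start = time.monotonic()
    try:
        with open(path, "wb") as fh:
            while written < total_mib:
                fh.write(chunk)
                written += chunk_mib
            fh.flush()
            os.fsync(fh.fileno())  # push the data to the device, not just the page cache
        elapsed = time.monotonic() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written / elapsed if elapsed > 0 else float("inf")
```

For example, `write_test("/mnt/disks/Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R", total_mib=50_000)` would target the unassigned-device mount that appears in the syslog later in the thread.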

brent3000 · December 23, 2020

I'm back after a bunch more testing and, overall, pulling my hair out. Kind of a new issue now.

So the system still crashes, even after a mobo replacement and some other changes (cables moved/changed, etc.), but now it just goes offline and the logs show nothing, whereas before they listed the SATA port fault. Not sure if I'm missing something, so asking for some help on this one.

Attached is the report, and below is the running syslog. I posted two sections showing the same thing happening (a data dump onto one of the drives and it just goes offline).


Dec 23 20:57:11 GLaDOS unassigned.devices: Don't spin down device '/dev/sdc'.
Dec 23 20:57:11 GLaDOS unassigned.devices: Removing SMB share 'Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R'
Dec 23 20:57:11 GLaDOS unassigned.devices: Unmounting disk 'Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R'...
Dec 23 20:57:11 GLaDOS unassigned.devices: Unmounting '/dev/sdc1'...
Dec 23 20:57:11 GLaDOS unassigned.devices: Unmount cmd: /sbin/umount '/dev/sdc1' 2>&1
Dec 23 20:57:11 GLaDOS kernel: XFS (sdc1): Unmounting Filesystem
Dec 23 20:57:11 GLaDOS unassigned.devices: Successfully unmounted '/dev/sdc1'
Dec 23 20:57:11 GLaDOS unassigned.devices: Disk with serial 'Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R', mountpoint 'Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R' removed successfully.
Dec 23 20:57:11 GLaDOS emhttpd: shcmd (119): /etc/rc.d/rc.samba stop
Dec 23 20:57:11 GLaDOS emhttpd: shcmd (120): rm -f /etc/avahi/services/smb.service
Dec 23 20:57:11 GLaDOS emhttpd: Stopping mover...
Dec 23 20:57:11 GLaDOS emhttpd: shcmd (123): /usr/local/sbin/mover stop
Dec 23 20:57:11 GLaDOS root: mover: not running
Dec 23 20:57:11 GLaDOS emhttpd: Sync filesystems...
Dec 23 20:57:11 GLaDOS emhttpd: shcmd (124): sync
Dec 23 21:01:46 GLaDOS cache_dirs: Arguments=-l off
Dec 23 21:01:46 GLaDOS cache_dirs: Max Scan Secs=10, Min Scan Secs=1
Dec 23 21:01:46 GLaDOS cache_dirs: Scan Type=adaptive
Dec 23 21:01:46 GLaDOS cache_dirs: Min Scan Depth=4
Dec 23 21:01:46 GLaDOS cache_dirs: Max Scan Depth=none
Dec 23 21:01:46 GLaDOS cache_dirs: Use Command='find -noleaf'
Dec 23 21:01:46 GLaDOS cache_dirs: ---------- Caching Directories ---------------

Dec 23 21:01:51 GLaDOS dnsmasq[7446]: reading /etc/resolv.conf
Dec 23 21:01:51 GLaDOS dnsmasq[7446]: using nameserver 192.168.1.1#53
Dec 23 21:01:51 GLaDOS dnsmasq[7446]: read /etc/hosts - 2 addresses
Dec 23 21:01:51 GLaDOS dnsmasq[7446]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Dec 23 21:01:51 GLaDOS dnsmasq-dhcp[7446]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Dec 23 21:01:51 GLaDOS kernel: virbr0: port 1(virbr0-nic) entered disabled state
Dec 23 21:01:51 GLaDOS kernel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
Dec 23 21:02:05 GLaDOS unassigned.devices: Adding disk '/dev/sdb1'...
Dec 23 21:02:05 GLaDOS unassigned.devices: Mount drive command: /sbin/mount -t xfs -o rw,noatime,nodiratime,discard '/dev/sdb1' '/mnt/disks/Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R'
Dec 23 21:02:05 GLaDOS kernel: XFS (sdb1): Mounting V5 Filesystem
Dec 23 21:02:05 GLaDOS kernel: XFS (sdb1): Ending clean mount
Dec 23 21:02:05 GLaDOS unassigned.devices: Successfully mounted '/dev/sdb1' on '/mnt/disks/Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R'.
Dec 23 21:02:05 GLaDOS unassigned.devices: Adding SMB share 'Seagate_BarraCuda_120_SSD_ZA500CM10003_7QV03Q1R'.
Dec 23 21:02:05 GLaDOS unassigned.devices: Don't spin down device '/dev/sdb'.
Dec 23 21:06:55 GLaDOS ntpd[1806]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Dec 23 21:11:00 GLaDOS root: Fix Common Problems Version 2020.12.19
Dec 23 21:11:13 GLaDOS root: Fix Common Problems: Other Warning: Background notifications not enabled
Dec 23 21:33:20 GLaDOS cache_dirs: Arguments=-l off
Dec 23 21:33:20 GLaDOS cache_dirs: Max Scan Secs=10, Min Scan Secs=1
Dec 23 21:33:20 GLaDOS cache_dirs: Scan Type=adaptive
Dec 23 21:33:20 GLaDOS cache_dirs: Min Scan Depth=4
Dec 23 21:33:20 GLaDOS cache_dirs: Max Scan Depth=none
Dec 23 21:33:20 GLaDOS cache_dirs: Use Command='find -noleaf'
Dec 23 21:33:20 GLaDOS cache_dirs: ---------- Caching Directories ---------------

glados-diagnostics-20201223-2158.zip

JorgeB · December 23, 2020

Don't see any issues in the snippets posted, or in the syslog.

brent3000 · December 23, 2020

Well, that's not good.

I assume it's because it's normally a headless system, but is there a reason the display just shows a black screen on a crash? I assume Unraid doesn't have an MS BSOD-type system for on-screen errors?

JorgeB · December 24, 2020

One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

brent3000 · December 26, 2020

OK, so for the better part of the day I have been ramming data onto the system to try to stress it with both transfers and HandBrake encoding (trying to do a bunch of things to hit the drive).

Still, after a good day of hitting it hard, it randomly dropped off out of nowhere. What's odd is the time it happened: around 2am (or close to it, as I don't have a timestamp), which is around when the mover normally kicks off.

Any thoughts?

There were no other actions happening apart from a file transfer (100GB at the time), but over the day I have been hitting it with around 2TB of data transactions to test some recent changes (HW replacements).

Dec 27 00:33:42 GLaDOS kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethe106026: link becomes ready
Dec 27 00:33:42 GLaDOS kernel: docker0: port 1(vethe106026) entered blocking state
Dec 27 00:33:42 GLaDOS kernel: docker0: port 1(vethe106026) entered forwarding state
Dec 27 00:51:27 GLaDOS kernel: veth80a2f59: renamed from eth0
Dec 27 00:51:27 GLaDOS kernel: docker0: port 1(vethe106026) entered disabled state
Dec 27 00:51:27 GLaDOS kernel: docker0: port 1(vethe106026) entered disabled state
Dec 27 00:51:27 GLaDOS kernel: device vethe106026 left promiscuous mode
Dec 27 00:51:27 GLaDOS kernel: docker0: port 1(vethe106026) entered disabled state
Dec 27 01:43:15 GLaDOS crond[1825]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Dec 27 01:45:40 GLaDOS emhttpd: shcmd (240): /usr/local/sbin/mover &> /dev/null &
Dec 27 02:09:10 GLaDOS cache_dirs: Arguments=-l off
Dec 27 02:09:10 GLaDOS cache_dirs: Max Scan Secs=10, Min Scan Secs=1
Dec 27 02:09:10 GLaDOS cache_dirs: Scan Type=adaptive
Dec 27 02:09:10 GLaDOS cache_dirs: Min Scan Depth=4
Dec 27 02:09:10 GLaDOS cache_dirs: Max Scan Depth=none
Dec 27 02:09:10 GLaDOS cache_dirs: Use Command='find -noleaf'
Dec 27 02:09:10 GLaDOS cache_dirs: ---------- Caching Directories ---------------
Dec 27 02:09:10 GLaDOS cache_dirs: Movies
Dec 27 02:09:10 GLaDOS cache_dirs: Music
Dec 27 02:09:10 GLaDOS cache_dirs: Network Apps
Dec 27 02:09:10 GLaDOS cache_dirs: TV Shows
Dec 27 02:09:10 GLaDOS cache_dirs: appdata
Dec 27 02:09:10 GLaDOS cache_dirs: backups
Dec 27 02:09:10 GLaDOS cache_dirs: domains
Dec 27 02:09:10 GLaDOS cache_dirs: iso archives
Dec 27 02:09:10 GLaDOS cache_dirs: isos
Dec 27 02:09:10 GLaDOS cache_dirs: system
Dec 27 02:09:10 GLaDOS cache_dirs: torrent
Dec 27 02:09:10 GLaDOS cache_dirs: zCacheStore
Dec 27 02:09:10 GLaDOS cache_dirs: zPublic
Dec 27 02:09:10 GLaDOS cache_dirs: zTehPurge
Dec 27 02:09:10 GLaDOS cache_dirs: ----------------------------------------------
Dec 27 02:09:10 GLaDOS cache_dirs: Setting Included dirs: 
Dec 27 02:09:10 GLaDOS cache_dirs: Setting Excluded dirs: 
Dec 27 02:09:10 GLaDOS cache_dirs: min_disk_idle_before_restarting_scan_sec=60
Dec 27 02:09:10 GLaDOS cache_dirs: scan_timeout_sec_idle=150
Dec 27 02:09:10 GLaDOS cache_dirs: scan_timeout_sec_busy=30
Dec 27 02:09:10 GLaDOS cache_dirs: scan_timeout_sec_stable=30
Dec 27 02:09:10 GLaDOS cache_dirs: frequency_of_full_depth_scan_sec=604800

glados-diagnostics-20201227-0209.zip
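Because a hard lock-up leaves no final message, the most a mirrored syslog can give is the last entry that reached it; the crash happened some time after that. A Python sketch that finds the last timestamped line and any unusually long silences. Classic syslog lines carry no year, so the year is passed in as an assumption and year rollover is not handled; the function names and the 30-minute default are illustrative. For the excerpt above it reports Dec 27 02:09:10 as the last entry.

```python
from datetime import datetime


def parse_ts(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a classic syslog line, or return None."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def last_entry_and_gaps(lines, year, min_gap_minutes=30):
    """Return (last timestamp, [(start, end), ...] silences longer than min_gap_minutes)."""
    prev = None
    gaps = []
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # blank or non-syslog lines are skipped
        if prev is not None and (ts - prev).total_seconds() > min_gap_minutes * 60:
            gaps.append((prev, ts))
        prev = ts
    return prev, gaps
```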
