Fiffty


  1. I have been running into some problems with my instance recently. I noticed that the parity drive had been disabled. I ran a SMART test and the drive seemed to be in a good state, so I rebuilt parity onto the same drive after verifying that the SATA cables were all properly connected (I assumed those were the problem). This worked for a couple of hours, but just now one of my data disks got disabled after a couple of read/write errors.

     Since this is the second time within a day, I am hesitant to simply rebuild the data disk. I have already ordered a new set of SATA cables, but maybe there is something more to it that I'm not seeing? I can't run a SMART test at the moment since the device has dropped offline, but I have not shut the system down to check the cables once more.

       Smartctl open device: /dev/sde failed: No such device

     Excerpt from syslog:

       Jul 26 12:17:40 VAULT kernel: md: disk2 read error, sector=13262550992
       Jul 26 12:17:40 VAULT kernel: md: disk2 read error, sector=13262551000
       Jul 26 12:17:40 VAULT kernel: md: disk2 read error, sector=13262551008
       Jul 26 12:17:40 VAULT kernel: md: disk2 write error, sector=13262550760
       Jul 26 12:17:40 VAULT kernel: md: disk2 write error, sector=13262550768
       Jul 26 12:17:40 VAULT kernel: md: disk2 write error, sector=13262550776
       [...]
       Jul 26 12:17:53 VAULT kernel: ata6: hard resetting link
       Jul 26 12:17:53 VAULT kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
       Jul 26 12:17:56 VAULT kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x100)
       Jul 26 12:17:56 VAULT kernel: ata6.00: revalidation failed (errno=-5)
       Jul 26 12:17:58 VAULT kernel: ata6: hard resetting link
       Jul 26 12:18:01 VAULT kernel: ata6: SATA link down (SStatus 1 SControl 320)
       Jul 26 12:18:01 VAULT kernel: ata6: limiting SATA link speed to 1.5 Gbps
       Jul 26 12:18:01 VAULT kernel: ata6: hard resetting link
       Jul 26 12:18:03 VAULT kernel: ata6: SATA link down (SStatus 1 SControl 310)
       Jul 26 12:18:03 VAULT kernel: ata6.00: disable device

     vault-diagnostics-20230726-1230.zip
  2. Hi everyone, I have been having a few issues with my Unraid server since I moved the disks into a new platform. The new system is X370/Ryzen 1st gen based. Initially I had a problem with random system freezes, but since setting rcu_nocbs=0-5 (Ryzen 5 1600), as per this post, the system does not hang anymore.

     However, the array is now randomly dropping offline every few hours (without the server hanging) and I am clueless as to why. I had a look in the logs and apparently the CPU is throwing some MCE codes, which is of course rather disconcerting. Since they only appeared after the above-mentioned fix, I am wondering whether the two are connected.

       Aug 20 06:14:45 VAULT kernel: mce: [Hardware Error]: Machine check events logged
       Aug 20 06:14:45 VAULT kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
       Aug 20 06:14:45 VAULT kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff816560e2 MISC d012000200000000 SYND 4d000000 IPID 500b000000000
       Aug 20 06:14:45 VAULT kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1566274467 SOCKET 0 APIC 3 microcode 8001137

     System:
       Gigabyte Aorus X370 G5
       Ryzen 5 1600
       16 GiB DDR4-2666
       3x HDD / 2x SATA SSD / 1x NVMe SSD
       Unraid 6.7.2 2019-06-25

     Plugins: Community Applications, Fix Common Problems, Dynamix SSD TRIM, Dynamix File Integrity, PreClear Disks, Nerdtools

     Docker: Plex, Transmission

     Diagnostics: here

     Edit: I just realized I had set rcu_nocbs=0-5 for cores instead of threads. I have now changed it to rcu_nocbs=0-11 and will report back if the problem goes away.

     Edit 2: Problem persists.
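For anyone triaging log excerpts like the two above, here is a minimal Python sketch, not an official Unraid or kernel tool. It makes two assumptions: that md error lines follow the `md: diskN read|write error, sector=...` shape seen in the first post, and that the value after `Bank 5:` in the second post is the raw x86 MCA status register. The helper names `count_md_errors` and `decode_mce_status` are made up for illustration, and only the generic upper status flags are decoded; the vendor-specific fields are left alone.

```python
import re
from collections import Counter

# Matches md-driver lines shaped like the ones quoted in the first post, e.g.
# "kernel: md: disk2 read error, sector=13262550992"
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")


def count_md_errors(log_text):
    """Count md errors per (disk, 'read' or 'write') pair in a syslog excerpt."""
    return Counter((m.group(1), m.group(2)) for m in MD_ERROR.finditer(log_text))


# Generic upper flag bits of the x86 machine-check status register.
# Vendor-specific fields below bit 57 are deliberately not decoded here.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MISC register contents are valid
    58: "ADDRV",  # ADDR register contents are valid
    57: "PCC",    # processor context may be corrupt
}


def decode_mce_status(status_hex):
    """Return (list of set flag names, low 16-bit error code) for a status value."""
    status = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return flags, status & 0xFFFF


if __name__ == "__main__":
    sample = (
        "Jul 26 12:17:40 VAULT kernel: md: disk2 read error, sector=13262550992\n"
        "Jul 26 12:17:40 VAULT kernel: md: disk2 write error, sector=13262550760\n"
    )
    print(count_md_errors(sample))
    print(decode_mce_status("bea0000000000108"))
```

For the status value logged in the second post, the sketch reports the UC and PCC flags as set (an uncorrected error with possibly corrupt processor context) and a low 16-bit error code of 0x0108. It does not say which component is at fault.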