
JorgeB

Moderators
  • Posts: 67,397
  • Days Won: 705

Everything posted by JorgeB

  1. Log is filled with these:
     Feb 5 04:20:02 TOWER rc.diskinfo[12971]: PHP Warning: exec(): Unable to fork [timeout -s 9 60 /bin/lsblk -nbP -o name,type,size,mountpoint,fstype,label 2>/dev/null] in /etc/rc.d/rc.diskinfo on line 361
     Feb 5 04:20:03 TOWER rc.diskinfo[20296]: PHP Warning: exec(): Unable to fork [timeout -s 9 60 /bin/lsblk -nbP -o name,type,size,mountpoint,fstype,label 2>/dev/null] in /etc/rc.d/rc.diskinfo on line 361
     Any idea what you have installed that's causing this? If not, try booting in safe mode.
  2. The diags are from after rebooting, so we can't see what happened, but the disk looks fine. Replace/swap cables and re-sync parity; if it happens again, grab diags before rebooting.
  3. Is the RAM being overclocked? It's known to cause instability on some Ryzen systems, more info here.
  4. There are some users complaining of a slowdown with SMB, but iperf should still give full speed. Until it does, transfers won't be any faster than the iperf results.
  5. Those look like ECC-corrected errors. That still means there's a problem, you don't want to have those errors regularly. Try one DIMM at a time.
  6. Try editing /config/domain.cfg and deleting this line:
     DISKDIR="/mnt/"
     Also, the image is 100GB; the default is 1GB, which is plenty.
  7. Failure to unmount means something is still using the disk, diags that are saved to flash after a shutdown timeout might show the problem.
  8. Every time there's a read error, Unraid first tries to recalculate that sector using parity and the other drives and then write it back. In this case it couldn't, so it didn't disable the disk; it's in the syslog. Looks like it, problems initializing all disks, check all connections/power:
     Feb 4 20:12:23 Tower kernel: sd 5:0:0:0: rejecting I/O to offline device
     Feb 4 20:12:23 Tower kernel: ldm_validate_partition_table(): Disk read failed.
     Feb 4 20:12:23 Tower kernel: sd 5:0:0:0: rejecting I/O to offline device
     Feb 4 20:12:23 Tower kernel: sdb: unable to read partition table
     Feb 4 20:12:23 Tower kernel: sd 5:0:0:0: [sdb] Attached SCSI disk
     Feb 4 20:12:23 Tower kernel: mpt2sas_cm0: log_info(0x31110630): originator(PL), code(0x11), sub_code(0x0630)
     Feb 4 20:12:23 Tower kernel: sd 5:0:1:0: Power-on or device reset occurred
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: Power-on or device reset occurred
     Feb 4 20:12:23 Tower kernel: mpt2sas_cm0: log_info(0x31110630): originator(PL), code(0x11), sub_code(0x0630)
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     Feb 4 20:12:23 Tower kernel: sd 5:0:1:0: [sdc] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Feb 4 20:12:23 Tower kernel: sd 5:0:1:0: [sdc] tag#1 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
     Feb 4 20:12:23 Tower kernel: print_req_error: I/O error, dev sdc, sector 0
     Feb 4 20:12:23 Tower kernel: Buffer I/O error on dev sdc, logical block 0, async page read
     Feb 4 20:12:23 Tower kernel: sd 5:0:1:0: rejecting I/O to offline device
     Feb 4 20:12:23 Tower kernel: ldm_validate_partition_table(): Disk read failed.
     Feb 4 20:12:23 Tower kernel: sd 5:0:1:0: rejecting I/O to offline device
     Feb 4 20:12:23 Tower kernel: sdc: unable to read partition table
     Feb 4 20:12:23 Tower kernel: sd 5:0:1:0: [sdc] Attached SCSI disk
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: Power-on or device reset occurred
     Feb 4 20:12:23 Tower kernel: mpt2sas_cm0: log_info(0x31110630): originator(PL), code(0x11), sub_code(0x0630)
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: [sdd] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
     Feb 4 20:12:23 Tower kernel: print_req_error: I/O error, dev sdd, sector 0
     Feb 4 20:12:23 Tower kernel: Buffer I/O error on dev sdd, logical block 0, async page read
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: rejecting I/O to offline device
     Feb 4 20:12:23 Tower kernel: ldm_validate_partition_table(): Disk read failed.
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: rejecting I/O to offline device
     Feb 4 20:12:23 Tower kernel: sdd: unable to read partition table
     Feb 4 20:12:23 Tower kernel: sd 5:0:2:0: [sdd] Attached SCSI disk
  9. You can try this, listing hardware used or posting diags might also give some clues.
  10. Does it show up in the BIOS? Some newer disks have a PWDIS feature and won't power up if 3.3v is supplied, google "wd 3.3v sata pin".
  11. Very likely, diagnostics should show the problem.
  12. Yes. All should be back to normal, luckily Unraid failed to write back since multiple disks got disconnected at the same time, so no disk was disabled.
  13. That can't be, unless you're past the capacity on those disks. Parity is calculated bit by bit; it doesn't matter what data, filesystem or share options are on those disks.
  14. Scrub verifies checksums for all blocks on a btrfs filesystem. The option is available after clicking on that device on the main GUI page, for both cache and array devices.
  15. I read the explanation some time ago but don't remember the details, I do remember it's not necessarily an error.
  16. Always yes, but the file might or might not be OK. btrfs restore doesn't verify checksums; it's a recovery option, so the priority is to recover anything.
  17. Power loss protection is present mostly on enterprise devices, array devices can also get filesystem corruption after an unclean shutdown/unmount, and parity can't help with that.
  18. Problem was a corrupt filesystem due to unclean unmount (in this case caused by the device dropping offline), and no, for this a mirror wouldn't help, an SSD with power loss protection might have. There's a VM backup plugin, you can also use snapshots together with send/receive, point is, anything important needs to be regularly backed up.
  19. Usually just correctly setting the power supply idle control is enough and it's the preferred solution; if it still crashes after doing that, you can try the other options. IMHO servers and overclocking don't really go well together, but I'd at least recommend starting with stock speeds. Only once it's stable think about overclocking, so you know whether doing that is introducing stability issues.
  20. There's a connection problem with disk11, these are repeating constantly in the logs:
      Feb 3 15:09:06 nabit kernel: ata20.00: status: { DRDY }
      Feb 3 15:09:06 nabit kernel: ata20: hard resetting link
      Feb 3 15:09:16 nabit kernel: ata20: softreset failed (device not ready)
      Feb 3 15:09:16 nabit kernel: ata20: hard resetting link
      Feb 3 15:09:17 nabit kernel: ata20: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
      Feb 3 15:09:18 nabit kernel: ata20.00: configured for UDMA/133
      Feb 3 15:09:18 nabit kernel: ata20: EH complete
      Feb 3 15:09:31 nabit kernel: ata20.00: exception Emask 0x10 SAct 0xffffffff SErr 0x190002 action 0xe frozen
      Feb 3 15:09:31 nabit kernel: ata20.00: irq_stat 0x80400000, PHY RDY changed
      Feb 3 15:09:31 nabit kernel: ata20: SError: { RecovComm PHYRdyChg 10B8B Dispar }
      Feb 3 15:09:31 nabit kernel: ata20.00: failed command: READ FPDMA QUEUED
      Feb 3 15:09:31 nabit kernel: ata20.00: cmd 60/08:00:00:6a:e4/00:00:67:01:00/40 tag 0 ncq dma 4096 in
      Feb 3 15:09:31 nabit kernel: res 40/00:00:68:0a:d6/00:00:29:01:00/40 Emask 0x10 (ATA bus error)
      Feb 3 15:09:31 nabit kernel: ata20.00: status: { DRDY }
      Feb 3 15:09:31 nabit kernel: ata20.00: failed command: READ FPDMA QUEUED
      Feb 3 15:09:31 nabit kernel: ata20.00: cmd 60/40:08:88:8e:e9/00:00:2c:02:00/40 tag 1 ncq dma 32768 in
      Feb 3 15:09:31 nabit kernel: res 40/00:00:68:0a:d6/00:00:29:01:00/40 Emask 0x10 (ATA bus error)
      Feb 3 15:09:31 nabit kernel: ata20.00: status: { DRDY }
      That won't make any difference in this case, since it doesn't affect data on disk11. The connection problems likely explain the issues you've been having, replace both cables. The file system is mounting correctly now, so there's nothing to fix about that for now. P.S. Unrelated to this, but there are also a few out of memory errors in the logs.
  21. There are some btrfs recovery options here, try them in order, but for this case btrfs restore is likely the one that will work best.
  22. You should also read the release notes, at least for the major releases, like v6.7.0 and v6.8.0
  23. I know they were doing the work for xfs to support reflinks, not sure if it's already working. It's easy to try though: just copy a large file/vdisk, and if the copy is instantaneous and no extra space is used, it's already working.
      cp --reflink=always /path/to/source /path/to/dest
      Note: the filesystem needs to have been created recently, with Unraid 6.8.x.
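
The reflink check in the last reply can also be scripted. This is only a minimal sketch, assuming GNU cp is available; `check_reflink` is a hypothetical helper name, not an Unraid tool. It relies on `--reflink=always` making cp fail outright when the filesystem cannot share extents, instead of silently falling back to a full copy.

```shell
#!/bin/bash
# Sketch: test whether the filesystem holding a directory accepts reflink copies.
# check_reflink is a hypothetical helper name used for illustration only.
check_reflink() {
    local dir="$1"
    local src dst
    # Create a scratch file inside the directory being tested.
    src="$(mktemp "$dir/reflink-test.XXXXXX")" || return 2
    dst="$src.copy"
    # Write 1 MiB so there is real data to share between the two files.
    dd if=/dev/zero of="$src" bs=1048576 count=1 2>/dev/null
    # With --reflink=always, cp errors out if a reflink is not possible.
    if cp --reflink=always "$src" "$dst" 2>/dev/null; then
        echo "reflink supported"
    else
        echo "reflink not supported"
    fi
    rm -f "$src" "$dst"
}
```

Calling it as `check_reflink /mnt/disk1` would test that disk, where the path is just an example and should be replaced with the mount point in question.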
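
On the reply about parity being calculated bit by bit regardless of what is on the disks: a rough illustration, assuming single parity computed as a plain XOR across the data disks. The byte values and the `parity_of` name are made up for the example. The same XOR that produces the parity byte also rebuilds a missing byte from parity plus the surviving disks.

```shell
#!/bin/bash
# Sketch: single parity as XOR over one byte per disk (illustrative values only).
# parity_of is a hypothetical helper name.
parity_of() {
    local p=0 b
    for b in "$@"; do
        p=$(( p ^ b ))   # XOR works on raw bits; file contents and filesystem are irrelevant
    done
    echo "$p"
}

d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0x0F ))
parity=$(parity_of "$d1" "$d2" "$d3")
# Pretend disk 2 is unreadable: rebuild its byte from parity and the other disks.
rebuilt=$(parity_of "$parity" "$d1" "$d3")
printf 'parity=0x%02X rebuilt_d2=0x%02X\n' "$parity" "$rebuilt"
# prints: parity=0x96 rebuilt_d2=0x3C
```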
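
For connection problems like the ata20 link resets quoted in the disk11 reply, it can help to see how often each port resets. A small sketch using only grep, sort and uniq; `count_link_resets` is a hypothetical helper name, and the syslog path is an assumption to adjust for wherever the diagnostics were extracted.

```shell
#!/bin/bash
# Sketch: count "hard resetting link" kernel messages per ATA port in a log file.
# count_link_resets is a hypothetical helper name.
count_link_resets() {
    grep 'hard resetting link' "$1" \
        | grep -o 'ata[0-9][0-9]*' \
        | sort | uniq -c | sort -rn
}
```

Running `count_link_resets /var/log/syslog` lists the ports with the most resets first; a port that keeps showing up is a good candidate for a cable swap.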