JorgeB

Moderators
  • Posts

    67,572
  • Days Won

    707

Everything posted by JorgeB

  1. Current pool is just one device (sdj), and it has read errors:

     Feb 9 07:26:12 Tower kernel: ata8.00: status: { DRDY ERR }
     Feb 9 07:26:12 Tower kernel: ata8.00: error: { UNC }
     Feb 9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
     Feb 9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
     Feb 9 07:26:12 Tower kernel: ata8.00: supports DRM functions and may not be fully accessible
     Feb 9 07:26:12 Tower kernel: ata8.00: disabling queued TRIM support
     Feb 9 07:26:12 Tower kernel: ata8.00: configured for UDMA/133
     Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 Sense Key : 0x3 [current]
     Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 ASC=0x11 ASCQ=0x4
     Feb 9 07:26:12 Tower kernel: sd 9:0:0:0: [sdj] tag#20 CDB: opcode=0x28 28 00 24 f3 91 e0 00 00 08 00
     Feb 9 07:26:12 Tower kernel: print_req_error: I/O error, dev sdj, sector 619942370
     Feb 9 07:26:12 Tower kernel: BTRFS error (device sdj1): bdev /dev/sdj1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0

     It's logged as an actual device error, so run an extended SMART test on the SSD.
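To see how the CDB line in that log relates to the reported failing sector, here is a minimal Python sketch that decodes a READ(10) command (opcode 0x28). The function name `decode_read10` is made up for illustration, and the comparison with the kernel's sector number assumes the device uses 512-byte logical blocks.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (start_lba, block_count) for a 10-byte READ(10) CDB.

    Layout used here: byte 0 = opcode (0x28), bytes 2-5 = LBA (big-endian),
    bytes 7-8 = transfer length in blocks (big-endian).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start_lba = int.from_bytes(cdb[2:6], "big")
    block_count = int.from_bytes(cdb[7:9], "big")
    return start_lba, block_count


# CDB bytes copied from the log above.
start, count = decode_read10("28 00 24 f3 91 e0 00 00 08 00")

# Sector reported by print_req_error (assumes 512-byte logical blocks).
failed_sector = 619942370

print(start, count)                               # 619942368 8
print(start <= failed_sector < start + count)     # True
```

The failing sector falls inside the 8-block read that the drive rejected, which fits the log reporting a media (UNC) error rather than a link problem.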
  2. Unraid sends the spin-up command before unmounting; the main issue appears to be these errors right after the spin-up command:

     Feb 9 09:18:56 Odin emhttpd: Spinning up all drives...
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdm
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdj
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdg
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdr
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdf
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sds
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdn
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdo
     Feb 9 09:18:56 Odin emhttpd: spinning up /dev/sdi
     Feb 9 09:19:12 Odin kernel: sd 1:0:16:0: attempting task abort!scmd(0x00000000826327ac), outstanding for 15458 ms & timeout 15000 ms
     Feb 9 09:19:12 Odin kernel: sd 1:0:16:0: [sdr] tag#2325 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
     Feb 9 09:19:12 Odin kernel: scsi target1:0:16: handle(0x001a), sas_address(0x5003048001f32ea4), phy(36)
     Feb 9 09:19:12 Odin kernel: scsi target1:0:16: enclosure logical id(0x5003048001f32ebf), slot(20)
     Feb 9 09:19:12 Odin kernel: scsi target1:0:16: enclosure level(0x0000), connector name( )
     Feb 9 09:19:12 Odin kernel: sd 1:0:16:0: device_block, handle(0x001a)
     Feb 9 09:19:14 Odin kernel: sd 1:0:16:0: device_unblock and setting to running, handle(0x001a)
     Feb 9 09:19:14 Odin kernel: sd 1:0:16:0: [sdr] Synchronizing SCSI cache
     Feb 9 09:19:14 Odin kernel: sd 1:0:16:0: [sdr] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
     Feb 9 09:19:14 Odin rc.diskinfo[9313]: SIGHUP received, forcing refresh of disks info.
     Feb 9 09:19:14 Odin kernel: scsi 1:0:16:0: task abort: SUCCESS scmd(0x00000000826327ac)
     Feb 9 09:19:14 Odin kernel: sd 1:0:17:0: attempting task abort!scmd(0x000000008c9f9c76), outstanding for 18152 ms & timeout 15000 ms
     Feb 9 09:19:14 Odin kernel: sd 1:0:17:0: [sds] tag#2446 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00
     Feb 9 09:19:14 Odin kernel: scsi target1:0:17: handle(0x001b), sas_address(0x5003048001f32ea6), phy(38)
     Feb 9 09:19:14 Odin kernel: scsi target1:0:17: enclosure logical id(0x5003048001f32ebf), slot(22)
     Feb 9 09:19:14 Odin kernel: scsi target1:0:17: enclosure level(0x0000), connector name( )
     Feb 9 09:19:14 Odin kernel: mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x5003048001f32ea4)
     Feb 9 09:19:14 Odin kernel: mpt3sas_cm0: removing handle(0x001a), sas_addr(0x5003048001f32ea4)
     Feb 9 09:19:14 Odin kernel: mpt3sas_cm0: enclosure logical id(0x5003048001f32ebf), slot(20)
     Feb 9 09:19:14 Odin kernel: mpt3sas_cm0: enclosure level(0x0000), connector name( )
     Feb 9 09:19:15 Odin kernel: sd 1:0:17:0: device_block, handle(0x001b)
     Feb 9 09:19:15 Odin kernel: sd 1:0:17:0: task abort: SUCCESS scmd(0x000000008c9f9c76)

     To me this looks more like an LSI issue (or LSI + those specific drives). Can you test with v6.8.3 to see whether it still happens?
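To check which command actually timed out on sdr and sds, here is a rough Python sketch that decodes the opcode 0x85 CDB from the log as an ATA PASS-THROUGH (16) command. The field positions and the small command-name table reflect my reading of the SAT and ATA command sets and should be treated as assumptions; the helper name is hypothetical.

```python
# Small subset of ATA power-management command codes (assumed from the ATA spec).
ATA_COMMANDS = {
    0xE0: "STANDBY IMMEDIATE",
    0xE1: "IDLE IMMEDIATE",
    0xE2: "STANDBY",
    0xE3: "IDLE",
    0xE5: "CHECK POWER MODE",
}


def decode_ata_passthrough16(cdb_hex: str) -> dict:
    """Decode a few fields of a 16-byte ATA PASS-THROUGH (16) CDB.

    Assumed layout: byte 0 = opcode (0x85), byte 1 bits 4..1 = protocol,
    byte 2 bit 5 = CK_COND, byte 14 = ATA command register.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    command = cdb[14]
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,   # 3 is the non-data protocol
        "ck_cond": bool(cdb[2] & 0x20),
        "command": command,
        "name": ATA_COMMANDS.get(command, "unknown"),
    }


# CDB bytes copied from the [sdr] and [sds] lines in the log above.
decoded = decode_ata_passthrough16(
    "85 06 20 00 00 00 00 00 00 00 00 00 00 40 e3 00"
)
print(decoded)
```

If the layout assumption holds, the aborted command is a non-data power-management command rather than a read or write, which is consistent with the timeouts starting right after the spin-up request.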
  3. Since the array was kept in use, the old parity is probably no longer in sync, but disk6 looks fine. I would recommend replacing the cables on that disk, then doing a new config and trying to re-sync the new parity again. If you think the old parity is still valid, do it in maintenance mode so there's a chance to use it if it errors out again.
  4. More recent NICs on newer motherboards require v6.9; there is no driver for them in v6.8.
  5. I would rather suspect an issue with the combination of LSI + those Seagate drives + sleep.
  6. I would say it's a strong possibility, assuming the disk was spun down:

     Feb 9 09:19:37 Odin emhttpd: shcmd (11123): umount /mnt/disk11
     Feb 9 09:19:37 Odin kernel: XFS (dm-8): Unmounting Filesystem
     Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=8589967488
     Feb 9 09:19:37 Odin kernel: sd 1:0:17:0: Power-on or device reset occurred
     Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=8589967488
     Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=32768
     Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=32768
     Feb 9 09:19:37 Odin emhttpd: shcmd (11124): rmdir /mnt/disk11

     The error happened during unmount, but it was immediate, i.e., it's not like the disk took long to respond, so it could be a compatibility issue. Try spinning those disks up before shutdown or, if possible, connect them to a different controller, like the onboard SATA ports.
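When a syslog contains many of these md lines, grouping them by disk makes it easier to see which disks were hit and at which sectors. The following is only an illustrative Python sketch; the regular expression is written against the exact wording of the log lines above, and `summarize_md_errors` is a made-up helper name.

```python
import re
from collections import defaultdict

# Matches lines such as "md: disk11 read error, sector=32768".
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")


def summarize_md_errors(log_text: str) -> dict:
    """Group md read/write error sectors by disk, in log order."""
    summary = defaultdict(lambda: {"read": [], "write": []})
    for disk, kind, sector in MD_ERROR.findall(log_text):
        summary[disk][kind].append(int(sector))
    return {disk: dict(kinds) for disk, kinds in summary.items()}


# Kernel lines copied from the log above.
sample_log = """\
Feb 9 09:19:37 Odin kernel: XFS (dm-8): Unmounting Filesystem
Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=8589967488
Feb 9 09:19:37 Odin kernel: sd 1:0:17:0: Power-on or device reset occurred
Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=8589967488
Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=32768
Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=32768
"""

md_errors = summarize_md_errors(sample_log)
print(md_errors)
```

Here every read error is followed by a write error at the same sector, all on disk11 and all within the same second as the device reset.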
  7. Not necessarily a disk problem; it could be another hardware issue. Run a scrub on the other disks: if more corruption is found, it's not likely the disk.
  8. If new cables didn't fix it, it could be the board/controller.
  9. Try again, leave remote syslog running.
  10. There are some XFS-related crashes; run a filesystem check (without -n) on disk2.
  11. Those appear to be the same ones you posted before. The old cache is still connected and causing a lot of ATA errors; remove the old cache and post new diags.
  12. If there's data there, stop the array, unassign disk2, start the array, and see if the emulated disk2 mounts and the data looks correct. If yes, re-assign the disk to rebuild on top.
  13. Changed Status to Closed. Changed Priority to Other.
  14. That error means something changed in the partition/MBR and it no longer conforms to what Unraid expects. This should not happen out of the blue, but it's difficult to guess what caused it. If there was data on disk2, you could likely recover it by rebuilding the disk on top, which would re-create the partition, assuming parity is valid.
  15. Yes, and if the issues with the old cache device persist, that basically confirms that the SSD is the problem.
  16. Still constant ATA errors, it won't work correctly while those are going on.
  17. Rebooting should fix the problem for now, but it won't fix the underlying issue; it looks like the enclosure lost power or its connection to the server.
  18. You are using the HP P420i RAID controller, which is not recommended, but to get SMART reports you need to change the default SMART controller type to "HP cciss", IIRC.