JorgeB

Everything posted by JorgeB

  1. Yes, just start the array as is, cache will still be unmountable and you can try the recovery options. Not without the diagnostics from when the problem started, also see here for better pool monitoring.
  2. Please post the diagnostics: Tools -> Diagnostics
  3. You had multiple disks in different controllers going offline at the same time:

     May 28 11:07:50 Tower kernel: sd 9:0:1:0: device_block, handle(0x000a)
     May 28 11:07:53 Tower kernel: sd 9:0:1:0: device_unblock and setting to running, handle(0x000a)
     May 28 11:07:53 Tower kernel: sd 9:0:1:0: [sdj] Synchronizing SCSI cache
     May 28 11:07:53 Tower kernel: sd 9:0:1:0: [sdj] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
     May 28 11:07:53 Tower kernel: mpt2sas_cm0: removing handle(0x000a), sas_addr(0x4433221106000000)
     May 28 11:07:53 Tower kernel: mpt2sas_cm0: enclosure logical id(0x500605b004dce890), slot(5)
     May 28 11:07:53 Tower rc.diskinfo[28375]: SIGHUP received, forcing refresh of disks info.
     May 28 11:08:07 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
     May 28 11:08:07 Tower kernel: ata5: SError: { PHYRdyChg }
     May 28 11:08:07 Tower kernel: ata5.00: failed command: WRITE DMA EXT
     May 28 11:08:07 Tower kernel: ata5.00: cmd 35/00:08:c0:21:cf/00:00:b8:01:00/e0 tag 0 dma 4096 out
     May 28 11:08:07 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x14 (ATA bus error)
     May 28 11:08:07 Tower kernel: ata5.00: status: { DRDY }
     May 28 11:08:07 Tower kernel: ata5: hard resetting link
     May 28 11:08:08 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)
     May 28 11:08:14 Tower kernel: ata5: hard resetting link
     May 28 11:08:14 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)
     May 28 11:08:19 Tower kernel: ata5: hard resetting link
     May 28 11:08:20 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)
     May 28 11:08:20 Tower kernel: ata5.00: disabled
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] tag#0 Sense Key : 0x5 [current]
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] tag#0 ASC=0x21 ASCQ=0x4
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 01 b8 cf 21 c0 00 00 00 08 00 00
     May 28 11:08:20 Tower kernel: print_req_error: I/O error, dev sdf, sector 7395549632
     May 28 11:08:20 Tower kernel: md: disk2 write error, sector=7395549568
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] killing request
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
     May 28 11:08:20 Tower kernel: ata5: EH complete
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] CDB: opcode=0x88 88 00 00 00 00 00 03 f5 30 a8 00 00 00 08 00 00
     May 28 11:08:20 Tower kernel: print_req_error: I/O error, dev sdf, sector 66400424
     May 28 11:08:20 Tower kernel: ata5.00: detaching (SCSI 4:0:0:0)
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] Synchronizing SCSI cache
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] Stopping disk
     May 28 11:08:20 Tower kernel: sd 4:0:0:0: [sdf] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
     May 28 11:08:20 Tower rc.diskinfo[28375]: SIGHUP received, forcing refresh of disks info.
     May 28 11:08:21 Tower kernel: ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
     May 28 11:08:21 Tower kernel: ata6: SError: { PHYRdyChg }
     May 28 11:08:21 Tower kernel: ata6.00: failed command: READ DMA EXT
     May 28 11:08:21 Tower kernel: ata6.00: cmd 25/00:08:f0:58:df/00:00:2a:00:00/e0 tag 0 dma 4096 in
     May 28 11:08:21 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
     May 28 11:08:21 Tower kernel: ata6.00: status: { DRDY }
     May 28 11:08:21 Tower kernel: ata6: hard resetting link
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697728
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697736
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697744
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697752
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697760
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697768
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697776
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=7294697784
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=11995447336
     May 28 11:08:21 Tower kernel: md: disk2 read error, sector=66400360
     May 28 11:08:21 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
     May 28 11:08:27 Tower kernel: ata6: hard resetting link
     May 28 11:08:27 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
     May 28 11:08:33 Tower kernel: ata6: hard resetting link
     May 28 11:08:33 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)
     May 28 11:08:33 Tower kernel: ata6.00: disabled

     This suggests a power/connection problem.
  4. One of the devices is considered missing (despite being assigned), which suggests its superblock is damaged. The pool is failing to mount read/write with a single device because it has no redundancy, at least partially (if the pool was created during v6.7.x, this could be the result of a bug that left it non-redundant):

     May 28 12:33:14 unRAID kernel: BTRFS warning (device sdc1): chunk 101054414848 missing 1 devices, max tolerance is 0 for writeable mount
     May 28 12:33:14 unRAID kernel: BTRFS warning (device sdc1): writeable mount is not allowed due to too many missing devices

     You should be able to recover the data by mounting read only, see here.
  5. Does the HBA have a BIOS flashed? If yes you could delete or disable it, it's not needed for Unraid.
  6. It could possibly work when the upcoming v6.9-rc1 is released since it will use a newer kernel (and rtl driver), for now best bet is to use an add-on NIC.
  7. It is, but did the server crash during that period? I do see a macvlan related call trace, those are usually caused by having a docker with a custom IP address:
  8. https://www.geeksforgeeks.org/soft-hard-links-unixlinux/
  9. Yes, restore the config from the other flash drive but keep current key. Yes.
  10. Issue doesn't appear to be encryption related.
  11. I would keep it for now, monitor SMART, if those reported uncorrectable errors continue to increase it's probably a good idea to replace it. Likely related to this:
  12. The other threads are in the general support forum. This is the bug report section and there are no duplicate bug reports about this, at least AFAIK, and there's no point in having duplicates here. Any other users with the same issue are encouraged to add to this report, the more people affected the more likely it will be prioritized. The priority is not always easy to choose, I'll be leaving it as minor for now since most users affected don't have major issues. It can always be changed later, but regardless it will have LT's attention. As for your issues, I would try upgrading to v6.9-beta1 to see if it helps. You can also try running the mover during off hours, when there's no docker activity, since that might be one of the triggers, i.e., trying to move a file after it was moved/deleted by a docker.
  13. Like mentioned, that's not a reason for a disk to become disabled, if the power goes out there's no time or chance for Unraid to disable it/them. Yes, sorry, missed that, there's nothing logged before the crash, this suggests a hardware problem.
  14. That's not a reason to disable disks. We need the syslog after the problem happens and before rebooting, or there's nothing to see.
  15. Do you mean the initial sync completes but then it crashes on a parity check? The diags are from after rebooting, enable the syslog server/mirror and post the syslog after it crashes.
  16. On the low side, this is mine after about 3 years:
  17. Best bet is to ask in the plugin support thread:
  18. Are you trying to connect the NetApp to a Mellanox NIC? If that's the case, and like @Benson said, it will never work, you need an HBA, like an LSI.
  19. Yes, and if there are any more issues post new diags (before rebooting).
  20. You can set up a syslog server, though that's used more for troubleshooting when the server keeps crashing or similar, for this kind of problem you just need to download the diagnostics before rebooting.
  21. You can go directly to latest but it's a good idea to read the v6.5 upgrade notes and the release notes at least for the major point releases.
  22. Do a manual upgrade, old versions were deleted from the cloud.
  23. This is more likely a hardware issue, but please post the diagnostics, might be some clues there.
  24. You rebooted since the errors so we can't see what happened, for now I would recommend unassigning both disabled disks and starting the array, then check that both emulated disks mount correctly and the contents look OK.
  25. You can't do that, parity swap procedure needs to be done from start to finish or it won't work, any change will abort it.
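
For the kind of syslog excerpt quoted in post 3, where devices on different controllers drop within seconds of each other, a rough way to spot the pattern is to collect the "link down" and "removing handle" lines and see how many distinct ports or controllers are involved in a short window. This is only a minimal sketch: the two regular expressions are based on the lines in that excerpt, the function names, the 60 second window and the `year` parameter (syslog lines carry no year) are all assumptions made for illustration, and other drivers may log drops differently.

```python
import re
from datetime import datetime, timedelta

# Patterns based on the excerpt in post 3; other drivers may log differently.
DROP_PATTERNS = [
    re.compile(r"kernel: (ata\d+): SATA link down"),
    re.compile(r"kernel: (mpt2sas_cm\d+): removing handle"),
]
TIMESTAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def find_drops(lines, year=2020):
    """Return (timestamp, source) pairs for lines that report a device drop.

    `year` is an assumption: classic syslog timestamps do not include one.
    """
    drops = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue  # e.g. "### [PREVIOUS LINE REPEATED ...] ###"
        for pattern in DROP_PATTERNS:
            hit = pattern.search(line)
            if hit:
                when = datetime.strptime(
                    f"{year} {ts.group(1)}", "%Y %b %d %H:%M:%S"
                )
                drops.append((when, hit.group(1)))
                break
    return drops


def sources_in_window(drops, window=timedelta(seconds=60)):
    """Return the distinct sources that dropped within `window` of the first drop."""
    if not drops:
        return set()
    start = min(when for when, _ in drops)
    return {src for when, src in drops if when - start <= window}


# Three lines copied from the excerpt in post 3.
sample = [
    "May 28 11:07:53 Tower kernel: mpt2sas_cm0: removing handle(0x000a), sas_addr(0x4433221106000000)",
    "May 28 11:08:08 Tower kernel: ata5: SATA link down (SStatus 0 SControl 300)",
    "May 28 11:08:21 Tower kernel: ata6: SATA link down (SStatus 0 SControl 300)",
]
affected = sources_in_window(find_drops(sample))
print(sorted(affected))
```

Several unrelated ports and an HBA showing up together in the result is consistent with the reading given in post 3, a shared cause such as power or cabling, rather than one failing disk. The script only counts log lines, it doesn't diagnose anything by itself.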