
JorgeB

Moderators
  • Posts: 67,108
  • Joined
  • Last visited
  • Days Won: 703

Everything posted by JorgeB

  1. ddrescue can recover partial files. Say you have a movie and there's a bad sector in the middle of it: a normal copy will fail, but ddrescue could recover 99.99% of the file, making it playable with possibly a very small glitch during playback. That can be better than no file at all if you don't have backups.
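     A minimal sketch of the clone step with GNU ddrescue (the device names and map file location are placeholders, double-check source and destination before running; the map file lets you resume or retry):

     ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.map      # first pass: copy everything readable
     ddrescue -f -r3 /dev/sdX /dev/sdY /boot/ddrescue.map  # retry the bad areas up to 3 times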
  2. Run a balance, only for the metadata: btrfs balance start -mconvert=raid1 /mnt/cache
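     Once the balance finishes, something like this should confirm Metadata is now RAID1 (assuming the same pool mount point):

     btrfs filesystem df /mnt/cache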
  3. - Tools -> New Config -> Retain current configuration: All -> Apply
     - Assign a new disk12, start the array, and use for example UD to mount the old disk and copy it to the new disk (array), before or after syncing parity; a sample copy command is below.
     Any LSI with a SAS2008/2308/3008 chipset in IT mode will work, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, 9400-8i, etc., and clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.
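     A hedged sketch of the copy step, assuming UD mounted the old disk at /mnt/disks/olddisk and the new array disk is disk12 (adjust both paths to your setup):

     rsync -avX --progress /mnt/disks/olddisk/ /mnt/disk12/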
  4. Disk is really failing. You now have a few options to try and recover as much data as possible, in order of what I consider best to worst:
     1) Use ddrescue to clone the disk to a new one, then rebuild parity.
     2) Copy all the data you can from that disk to another, then rebuild parity.
     3) Connect that disk to a different controller and sync parity; it should finish with some read errors, and after parity is synced, replace the disk. IMHO this is the worst option because there's no way to know which files are corrupt unless you have checksums (see the example below).
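     For the checksum point, one way to create checksums now and verify them after a rebuild (the disk number and output file are just examples):

     find /mnt/disk12 -type f -exec md5sum {} + > /boot/disk12.md5   # create checksums
     md5sum -c /boot/disk12.md5 | grep -v ': OK$'                    # verify, list only failures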
  5. Already said that: parity is invalid and disk12 is likely failing, but most data should be recoverable. Wait for the SMART test to finish to see how best to proceed.
  6. Data is still there, but if the disk is failing you might not be able to get it all.
  7. Disk12 doesn't look very good, lots of raw read errors and a pending sector, but let's wait for the SMART test to confirm. You can't currently replace disk12 since parity is invalid. You might be able to re-sync parity (completely or mostly) depending on whether disk12 is really failing, and if so, how bad it is. Part of the problem might also be the SASLP controller, which completely crashed (likely when the read error was detected) and dropped the disk offline; these controllers have not been recommended for some time now. Using a different controller should allow you to sync most of parity even if the disk is starting to fail. Another option would be to clone the disk with ddrescue and then re-sync parity. Either way you should be able to recover most data.
  8. No point in running xfs_repair on the emulated disk; there will be a lot of data corruption because parity isn't valid. Run an extended SMART test on disk12 and post a new SMART report when done; it will take a few hours.
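     If you prefer the console over the GUI, the test can also be started and checked with smartctl (sdX is a placeholder for disk12's device):

     smartctl -t long /dev/sdX   # start the extended test
     smartctl -a /dev/sdX        # full report, including test progress and result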
  9. OK, so fs corruption is expected since parity is invalid; we need the SMART report for disk12.
  10. That's filesystem corruption on disk12. If you can't get the diags, get the SMART report for disk12. Also, is disk12 still enabled (green icon)?
  11. It's usually under a minute. If it doesn't work, reboot (you'll need to anyway, since disk12 dropped offline) and grab diags then.
  12. Please post the diagnostics: Tools -> Diagnostics
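     If the GUI is unreachable, the same zip can usually be generated from the console; it gets saved to the logs folder on the flash drive:

     diagnostics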
  13. Server has a memory problem; most errors are being corrected:
     Jun 5 05:59:39 Tower kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
     But if there is an uncorrectable error the server will halt to prevent data corruption. If that happens, check the board's system event log; there might be more info there.
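     On boards with a BMC, a hedged way to read the system event log from the OS, assuming the ipmitool package is installed:

     ipmitool sel elist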
  14. It shouldn't be needed if at least one balance has been run on a recent kernel; it might need one if the pool is old and was never balanced. Yours could use a balance, but the way it was it still wouldn't cause issues, at least when the diags were downloaded; it could have before, if it was fuller.
  15. Copy all the bz* files from the older release to the flash drive, overwriting the existing ones, and reboot.
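     A minimal sketch, assuming the older release was extracted to /tmp/unraid-old (a hypothetical path) and the flash drive is mounted at /boot:

     cp /tmp/unraid-old/bz* /boot/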
  16. Not done anything, but if it's kernel related, going to the previous release you were using will solve the problem.
  17. If the same thing happens it wasn't because of the update.
  18. You might need to wait for a newer kernel; a board BIOS and/or HBA firmware upgrade might also help.
  19. Macvlan call traces are usually related to dockers with custom IP addresses.
  20. Kernel oops during device discovery, possibly a kernel issue:
     Jun 30 23:54:59 unWejaton kernel: sas: phy-14:0 added to port-14:0, phy_mask:0x1 (5003048001dfecff)
     Jun 30 23:54:59 unWejaton kernel: sas: DOING DISCOVERY on port 0, pid:2363
     Jun 30 23:54:59 unWejaton kernel: BUG: unable to handle kernel paging request at ffff8880ffa7b360
     Jun 30 23:54:59 unWejaton kernel: PGD 2401067 P4D 2401067 PUD 0
     Jun 30 23:54:59 unWejaton kernel: Oops: 0000 [#1] SMP PTI
     Jun 30 23:54:59 unWejaton kernel: CPU: 11 PID: 2363 Comm: kworker/u112:1 Not tainted 4.19.56-Unraid #1
     Jun 30 23:54:59 unWejaton kernel: Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3305 06/22/2016
     Jun 30 23:54:59 unWejaton kernel: Workqueue: 0000:82:00.0_disco_q sas_discover_domain [libsas]
     Jun 30 23:54:59 unWejaton kernel: RIP: 0010:pm80xx_chip_smp_req+0x22d/0x36c [pm80xx]
     Jun 30 23:54:59 unWejaton kernel: Code: c7 c6 10 1e 13 a0 48 c7 c7 8f 5b 13 a0 e8 3f 25 f6 e0 49 8d 44 24 1c 45 31 ff 48 89 44 24 08 41 39 ef 73 87 8b 83 48 44 00 00 <47> 0f b6 44 3d 00 83 e0 08 41 83 ff 0f 77 18 85 c0 47 88 44 3c 0c
     Jun 30 23:54:59 unWejaton kernel: RSP: 0018:ffffc9000812bbf8 EFLAGS: 00010093
     Jun 30 23:54:59 unWejaton kernel: RAX: 0000000000000001 RBX: ffff88905a6f0000 RCX: 0000000000000080
     Jun 30 23:54:59 unWejaton kernel: RDX: 00000000ffa7b360 RSI: 0000000000000001 RDI: ffff88885f027200
     Jun 30 23:54:59 unWejaton kernel: RBP: 0000000000000008 R08: 0000000000000040 R09: 0000000000000002
     Jun 30 23:54:59 unWejaton kernel: R10: ffff889059bff4e0 R11: ffff88885a4d0e00 R12: ffffc9000812bc0c
     Jun 30 23:54:59 unWejaton kernel: R13: ffff8880ffa7b360 R14: ffff889044801080 R15: 0000000000000000
     Jun 30 23:54:59 unWejaton kernel: FS: 0000000000000000(0000) GS:ffff88885f6c0000(0000) knlGS:0000000000000000
     Jun 30 23:54:59 unWejaton kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Jun 30 23:54:59 unWejaton kernel: CR2: ffff8880ffa7b360 CR3: 0000000001e0a001 CR4: 00000000003606e0
     Jun 30 23:54:59 unWejaton kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     Jun 30 23:54:59 unWejaton kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Jun 30 23:54:59 unWejaton kernel: Call Trace:
     Jun 30 23:54:59 unWejaton kernel: pm8001_task_exec.isra.2+0x2ba/0x3c1 [pm80xx]
     Jun 30 23:54:59 unWejaton kernel: ? smp_task_timedout+0x44/0x44 [libsas]
     Jun 30 23:54:59 unWejaton kernel: smp_execute_task_sg+0xf8/0x224 [libsas]
     Jun 30 23:54:59 unWejaton kernel: smp_execute_task+0x52/0x6e [libsas]
     Jun 30 23:54:59 unWejaton kernel: sas_discover_expander.part.6+0x7d/0x542 [libsas]
     Jun 30 23:54:59 unWejaton kernel: sas_discover_root_expander+0x46/0xd2 [libsas]
     Jun 30 23:54:59 unWejaton kernel: sas_discover_domain+0x4a5/0x59b [libsas]
     Jun 30 23:54:59 unWejaton kernel: process_one_work+0x16e/0x24f
     Jun 30 23:54:59 unWejaton kernel: ? pwq_unbound_release_workfn+0xb7/0xb7
     Jun 30 23:54:59 unWejaton kernel: worker_thread+0x1dc/0x2ac
     Jun 30 23:54:59 unWejaton kernel: kthread+0x10b/0x113
     Jun 30 23:54:59 unWejaton kernel: ? kthread_park+0x71/0x71
     Jun 30 23:54:59 unWejaton kernel: ret_from_fork+0x35/0x40
     Jun 30 23:54:59 unWejaton kernel: Modules linked in: acpi_cpufreq(-) sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate ipmi_ssif intel_uncore pm80xx(+) mxm_wmi wmi_bmof igb libsas intel_rapl_perf mlx4_core(+) sr_mod ahci i2c_i801 scsi_transport_sas i2c_algo_bit i2c_core libahci cdrom wmi ipmi_si pcc_cpufreq acpi_power_meter acpi_pad button
     Jun 30 23:54:59 unWejaton kernel: CR2: ffff8880ffa7b360
     Jun 30 23:54:59 unWejaton kernel: ---[ end trace 30c9b447b115a633 ]---
  21. Try booting in safe mode (also stop the docker service) and use it like that for a few days to see if there's still a problem; if yes, it's likely hardware related.