Everything posted by johnnie.black

  1. Note that some data will be corrupt on disk1; since you're running btrfs, a scrub will identify which files are affected (see the scrub sketch after this list).
  2. Also, if a full backup is not an option for you, consider at least adding a second parity drive; while not a replacement for backups, it's a very small price to pay for the added redundancy.
  3. You can't. First make sure disk1 really failed, i.e., that it's not a cable or other problem; if it did fail but is still partially readable, use ddrescue on both disk1 and disk2 to recover as much data as possible (see the ddrescue sketch after this list).
  4. Disk2 is failing:
     Nov 16 18:19:03 Tower kernel: res 51/40:20:c0:ec:3e/00:00:14:00:00/e0 Emask 0x9 (media error)
     ...
     Nov 16 18:41:23 Tower kernel: md: disk2 read error, sector=2352394520
     Nov 16 18:41:23 Tower kernel: md: disk2 read error, sector=2352394528
     Nov 16 18:41:23 Tower kernel: md: disk2 read error, sector=2352394536
     Nov 16 18:41:23 Tower kernel: md: disk2 read error, sector=2352394544
     Nov 16 18:41:23 Tower kernel: md: disk2 read error, sector=2352394552
  5. See if you can get just the syslog.
  6. That's normal since it's mounting the emulated disk. Btrfs disks can take several seconds to mount, and an emulated btrfs disk can take much longer since its data is being reconstructed from all the other disks. I don't see any errors in the log; have you waited a few minutes to see if it's just taking longer than usual, as expected?
  7. Please post the diagnostics: Tools -> Diagnostics
  8. We need diags and the name of the share you're writing to.
  9. You have your server exposed to the internet and the log is being spammed with login attempts; because of that, several hours are missing from the log, including when the problem started. Correct that, reboot, and post new diags when the issue starts again.
  10. Unraid uses hdparm to spin down drives, and hdparm doesn't work with SAS devices; they need sdparm instead (see the example after this list). LT is working on adding support, but it's not a simple thing, especially with limited hardware available for testing.
  11. Syslog snippets are seldom useful; post the full diagnostics. Most likely you attempted to pass through a disk controller and disabled one of the disks connected to it.
  12. You can also disable c-states in the BIOS, or better yet update your BIOS, look for a "Power Supply Idle Control" (or similar) setting, and set it to "typical current idle" (or similar).
  13. Yes, though for completeness let me add that if the RAM was already bad when the data was originally written to cache, it could still be corrupt: in that case the data would be corrupted in RAM and btrfs would checksum the already corrupted data. To avoid that you'd need ECC RAM; without ECC, btrfs can still detect if data is being corrupted on read.
  14. You're welcome. It's lucky you're using btrfs at least on cache, or data would be corrupted when moved to the array and it could be some time before you noticed the problem.
  15. Is your RAM ECC? With ECC you shouldn't get any RAM errors, even in memtest: either ECC corrects the error, or it halts the computer if it can't.
  16. Checksum errors are usually from bad RAM; data could be getting corrupted in RAM. You could try running a scrub, but it should give the same results as btrfs check --check-data-csum (see the example after this list); also run memtest.
  17. Bypassing the expander, in your case connecting to the LSI onboard ports. Yep.
  18. Diags are from just after rebooting, so there's not much to see, though disk5 appears to be suffering from a connection issue. Wait for the problem to re-occur, then grab new diags before rebooting.
  19. You need to assign an IP address to each 10GbE NIC, both on the same subnet but different from the current gigabit subnet; on Unraid that's done in Settings -> Network Settings. Then, to make sure 10GbE is used when connecting from the desktop, either connect using the IP address or add the Unraid server to the Windows hosts file (see the addressing sketch after this list).
  20. It's fine if you disconnect them when not in use; otherwise e.g. a PSU problem could fry all the connected drives.
  21. That's not an ideal move. If you have an incident that takes out multiple drives, you could easily lose both the primary data and all your backups. Yes, I missed that part of your post. Dual parity adds redundancy, but it's not a substitute for backups; you're much better off with single parity and backups than dual parity and no backups. I was recommending dual parity with 6+ disks plus backups.
  22. If you have backups of at least most of it, I would say single parity is OK for now, but with such high-capacity disks I would go for dual parity at around 6 data disks.
  23. There's no general rule; it depends on a number of things:
      - how many disks
      - the size of the disks
      - how important the data is
      - the existence of backups
      I generally have single parity on servers with up to 8 disks and dual parity on servers with more than 8 disks, but I have at least one full backup of every server.
  24. 24 hours is recommended for memtest, but it will only be conclusive if errors are detected. You can try rebooting and running another parity check; if the same thing happens it's likely hardware-related, though possibly not bad RAM.
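
Sketch for the scrub mentioned in post 1 — a minimal example assuming disk1 is mounted at /mnt/disk1 (the usual Unraid mount point); the mount point is an assumption, adjust to your system:

    # start a read-only scrub on disk1 (enough to list the affected files)
    btrfs scrub start -r /mnt/disk1

    # check progress and the error counters
    btrfs scrub status /mnt/disk1

    # files that fail their checksum are reported by the kernel log
    dmesg | grep -i csum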
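Sketch for the ddrescue workflow mentioned in post 3, assuming /dev/sdX is the failing disk and /dev/sdY is a spare disk of equal or larger size; both device names and the map file path are placeholders, and ddrescue will overwrite the destination, so double-check them:

    # first pass: copy everything that reads cleanly, keeping a map file so the copy can resume
    ddrescue -f /dev/sdX /dev/sdY /boot/disk1.map

    # second pass: retry only the bad areas a few more times
    ddrescue -f -r3 /dev/sdX /dev/sdY /boot/disk1.map

Repeat the same procedure for disk2 with its own destination and map file.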
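Sketch of the hdparm vs. sdparm difference from post 10 (/dev/sdX is a placeholder for the drive in question):

    # SATA drive: hdparm can put it into standby (spin down) immediately
    hdparm -y /dev/sdX

    # SAS drive: hdparm won't work, sdparm is needed instead
    sdparm --command=stop /dev/sdX
    # and to spin it back up
    sdparm --command=start /dev/sdX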
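Sketch for the checks mentioned in post 16; /mnt/cache and /dev/sdX1 are assumptions for the mounted cache filesystem and its underlying partition:

    # scrub verifies data checksums on the mounted filesystem
    btrfs scrub start /mnt/cache
    btrfs scrub status /mnt/cache

    # checking data checksums with btrfs check must be done against the unmounted device
    btrfs check --check-data-csum /dev/sdX1

Memtest itself is run at boot, outside the running system.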
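Sketch for the 10GbE setup in post 19 — the addresses below are made up for illustration, assuming the existing gigabit LAN is 192.168.1.0/24:

    # server 10GbE NIC (Settings -> Network Settings):  10.0.0.10, netmask 255.255.255.0
    # desktop 10GbE NIC:                                 10.0.0.20, netmask 255.255.255.0

    # on the Windows desktop, add the server to C:\Windows\System32\drivers\etc\hosts
    # so its name resolves to the 10GbE address, e.g.:
    #   10.0.0.10   tower

    # then verify the 10GbE path is reachable
    ping 10.0.0.10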