Everything posted by JorgeB

  1. I believe Ryzen support is much better on kernel 4.10, so it should improve a lot with the next unRAID release (maybe v6.4-rc?)
  2. Yes to both. SMART was clean, and all errors were reported at the same time, so they happened in a matter of seconds/minutes, whatever interval the unRAID notification system uses to poll SMART, and there was nothing in the syslog about them.
  3. While I cannot be 100% certain, I really doubt it was a coincidence. I make daily btrfs snapshot backups and then use btrfs send/receive to do an incremental backup to another disk; if there is a checksum error during the copy (send/receive) it will fail with a read error (btrfs always aborts a read if a checksum error is detected). All previous backups were successful, so it would be a very big coincidence for this to happen the same day I got the retired block error on the SSD.
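     For context, a minimal sketch of the kind of daily snapshot + send/receive flow described above (the subvolume, snapshot and destination paths are just examples, not my actual script):

     # take today's read-only snapshot of the source subvolume (read-only is required for send)
     btrfs subvolume snapshot -r /mnt/cache/domains /mnt/cache/snaps/domains_today
     # send only the differences against yesterday's snapshot to the backup disk;
     # a checksum error on the source makes the send fail with a read error
     btrfs send -p /mnt/cache/snaps/domains_yesterday /mnt/cache/snaps/domains_today | btrfs receive /mnt/disk2/backups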
  4. Also check if there is a setting for disk boot order, not the normal device boot order, but one to select which HDD comes first when there's more than one. I don't recall if Intel boards are like that, but on e.g. Asus boards the flash drive will appear in the HDD list, and it needs to be in first place on that list for boot to work.
  5. Just would like to share this: most know it's common for HDDs and SSDs to reallocate sectors/blocks, but some probably don't know that there's a chance of data corruption when that happens. It was my feeling this was possible (likely?), and luckily I haven't had reallocated sectors in any of my server's HDDs for many years, but I did have an issue yesterday with one of my SSDs used for VMs. I got these notifications:

     12-03-2017 22:53 unRAID device sdk SMART health [187] Warning [TOWER7] - reported uncorrect is 26 SanDisk_SDSSDA120G_153910407249 (sdk) warning
     12-03-2017 22:53 unRAID device sdk SMART health [5] Warning [TOWER7] - retired block count is 1 SanDisk_SDSSDA120G_153910407249 (sdk) warning

     There was nothing in the syslog, ie, this was all handled by the SSD firmware. I have a script doing daily incremental backups of my vdisks, so today I looked at the log and sure enough, there was an error:

     ERROR: send ioctl failed with -5: Input/output error
     ERROR: unexpected EOF in stream.

     Looking at the syslog I could see the reason for the errors:

     Mar 13 00:09:59 Tower7 kernel: BTRFS warning (device sdk1): csum failed ino 262 off 21441933312 csum 2062942272 expected csum 1983964368
     Mar 13 00:09:59 Tower7 kernel: BTRFS warning (device sdk1): csum failed ino 262 off 21441933312 csum 2062942272 expected csum 1983964368
     Mar 13 00:10:00 Tower7 kernel: BTRFS warning (device sdk1): csum failed ino 262 off 21441933312 csum 2062942272 expected csum 1983964368
     Mar 13 00:10:01 Tower7 kernel: BTRFS warning (device sdk1): csum failed ino 262 off 21441933312 csum 2062942272 expected csum 1983964368
     Mar 13 00:10:02 Tower7 kernel: BTRFS warning (device sdk1): csum failed ino 262 off 21441933312 csum 2062942272 expected csum 1983964368
     Mar 13 00:10:03 Tower7 kernel: BTRFS warning (device sdk1): csum failed ino 262 off 21441933312 csum 2062942272 expected csum 1983964368

     And a scrub confirmed the problem and the affected file:

     Mar 13 10:24:49 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987755520 on dev /dev/sdk1, sector 192759352, root 313, inode 262, offset 21441933312, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:49 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987755520 on dev /dev/sdk1, sector 192759352, root 407, inode 262, offset 21441933312, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:49 Tower7 kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
     Mar 13 10:24:49 Tower7 kernel: BTRFS error (device sdk1): unable to fixup (regular) error at logical 102987755520 on dev /dev/sdk1
     Mar 13 10:24:49 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987759616 on dev /dev/sdk1, sector 192759360, root 313, inode 262, offset 21441937408, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:49 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987759616 on dev /dev/sdk1, sector 192759360, root 407, inode 262, offset 21441937408, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:49 Tower7 kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
     Mar 13 10:24:49 Tower7 kernel: BTRFS error (device sdk1): unable to fixup (regular) error at logical 102987759616 on dev /dev/sdk1
     Mar 13 10:24:49 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987763712 on dev /dev/sdk1, sector 192759368, root 313, inode 262, offset 21441941504, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:49 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987763712 on dev /dev/sdk1, sector 192759368, root 407, inode 262, offset 21441941504, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:49 Tower7 kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
     Mar 13 10:24:49 Tower7 kernel: BTRFS error (device sdk1): unable to fixup (regular) error at logical 102987763712 on dev /dev/sdk1
     Mar 13 10:24:50 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987767808 on dev /dev/sdk1, sector 192759376, root 313, inode 262, offset 21441945600, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:50 Tower7 kernel: BTRFS warning (device sdk1): checksum error at logical 102987767808 on dev /dev/sdk1, sector 192759376, root 407, inode 262, offset 21441945600, length 4096, links 1 (path: Win8.1/vdisk1.img)
     Mar 13 10:24:50 Tower7 kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
     Mar 13 10:24:50 Tower7 kernel: BTRFS error (device sdk1): unable to fixup (regular) error at logical 102987767808 on dev /dev/sdk1

     Problem fixed by restoring the vdisk from a previous backup. I'll keep the SSD for now and keep an eye on it, but I hope this reminds users of the importance of backups, and especially of having checksums; in the case of vdisks, having them on a btrfs device is the only practical way of doing that.
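     For anyone wanting to run the same kind of check on their own pool, a scrub can be started and monitored roughly like this (the mount point is just an example, adjust it to your cache/pool path):

     # start a scrub on the mounted filesystem; blocks with bad checksums are reported
     # (and repaired automatically if a redundant profile has a good copy elsewhere)
     btrfs scrub start /mnt/cache
     # check progress and the error counters once it finishes
     btrfs scrub status /mnt/cache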
  6. Sorry, just one last comment about this, since I believe it's important: the current problem, besides unRAID not recognizing a UD-partitioned disk, is that unRAID will re-write the partition to start on sector 64, so after the user tries to mount it in unRAID and it doesn't work, it will then also not mount in UD or manually (unless the starting sector is manually changed back to the original one), leaving the user with an unmountable disk.
  7. Everybody should be using 4K aligned by now, and that always equals starting sector 64, but if someone is using unaligned, sector 63, it's OK, unRAID will accept both. Not for the partition, just a supported starting sector, 63 or 64. UD is using sector 2048, but AFAIK as long as the disk is 4K aligned, ie, the starting sector is divisible by 8, performance will be optimal. unRAID will accept it if partition 1 starts on either sector 63 or sector 64. If it starts anywhere else, or if its length does not comprise the whole disk, then it re-writes it so that partition 1 starts on sector 64 (if using 4K aligned) and extends to the end of the disk.
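     If in doubt, the starting sector of an existing partition can be checked with something like this (sdX is a placeholder for the actual device); a start sector divisible by 8 means the partition is 4K aligned:

     # list the partition table with starting sectors
     fdisk -l /dev/sdX
     # or read just the start of partition 1, in sectors
     cat /sys/block/sdX/sdX1/start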
  8. Yes I know, but most users don't know that it's not going to work. It should be an easy change, but if you don't want to do it, consider at least putting a warning in the first post.
  9. Post the output of both:
     btrfs fi show /mnt/disk1
     btrfs fi df /mnt/disk1
  10. Request: when UD partitions a disk, use 64 as the starting sector, same as unRAID, to avoid a situation like this.
  11. Turbo write: IIRC Tom mentioned the auto setting is there for a future enhancement; it defaults to auto so that when that enhancement is implemented all users take advantage of it without needing to change the setting.
  12. If you want to improve parity check speed with the current setup try this:
     6 disks onboard (use your fastest disks only, the 4 and 8TB)
     6 disks on SASLP #1 using PCIE1
     5 disks on SASLP #2 using PCIE4
     Divide the slower 2TB disks evenly between the 2 SASLPs.
     With the right tunables this should give a starting speed of around 100MB/s, eventually decreasing a little during the first 2TB but speeding up considerably once past that mark; total parity check time should be well under 24 hours.
  13. IMO this is more important, ie, even with multiple cores, hashing multiple files on the same disk concurrently will always be slower than hashing them one at a time.
  14. I have an unmountable BTRFS filesystem disk or pool, what can I do to recover my data?

     Unlike most other file systems, btrfs fsck (check --repair) should only be used as a last resort. While it's much better with the latest kernels/btrfs-tools, it can still make things worse, so before doing that these are the steps you should try, in this order.

     Note: if using encryption you need to adjust the path, e.g., instead of /dev/sdX1 it should be /dev/mapper/sdX1

     1) Mount filesystem read-only (safe to use)

     Create a temporary mount point, e.g.:
     mkdir /temp

     Now attempt to mount the filesystem read-only.
     v6.9.2 and older use:
     mount -o usebackuproot,ro /dev/sdX1 /temp
     v6.10-rc1 and newer use:
     mount -o rescue=all,ro /dev/sdX1 /temp

     For a single device: replace X with the actual device, don't forget the 1 at the end, e.g., /dev/sdf1.
     For a pool: replace X with any of the devices from the pool to mount the whole pool (as long as there are no devices missing), don't forget the 1 at the end, e.g., /dev/sdf1. If the normal read-only recovery mount doesn't work, e.g., because there's a damaged or missing device, use the option below instead.
     v6.9.2 and older use:
     mount -o degraded,usebackuproot,ro /dev/sdX1 /temp
     v6.10-rc1 and newer use:
     mount -o degraded,rescue=all,ro /dev/sdX1 /temp
     Replace X with any of the remaining pool devices to mount the whole pool, don't forget the 1 at the end, e.g., /dev/sdf1. If all devices are present and it doesn't mount with the first device you tried, use the other(s); the filesystem on one of them may be more damaged than the other(s). Note that if there are more devices missing than the profile permits for redundancy it may still mount, but there will be some data missing, e.g., mounting a 4-device raid1 pool with 2 devices missing will result in missing data.

     With v6.9.2 and older, these additional options might also help in certain cases (with or without usebackuproot and degraded); with v6.10-rc1 and newer, rescue=all already uses all these options and more:
     mount -o ro,notreelog,nologreplay /dev/sdX1 /temp

     If it mounts, copy all the data from /temp to another destination, like an array disk; you can use Midnight Commander (mc on the console/SSH) or your favorite tool. After all data is copied, format the device or pool and restore the data. A worked example of this sequence is at the end of this post.

     2) BTRFS restore (safe to use)

     If mounting read-only fails, try btrfs restore; it will try to copy all data to another disk. You need to create the destination folder first, e.g., create a folder named restore on disk2 and then:
     btrfs restore -v /dev/sdX1 /mnt/disk2/restore
     For a single device: replace X with the actual device, don't forget the 1 at the end, e.g., /dev/sdf1.
     For a pool: replace X with any of the devices from the pool to recover the whole pool, don't forget the 1 at the end, e.g., /dev/sdf1; if it doesn't work with the first device you tried, use the other(s).
     If restoring from an unmountable array device use mdX, where X is the disk number, e.g., to restore disk3:
     btrfs restore -v /dev/md3 /mnt/disk2/restore
     If the restore aborts due to an error you can try adding -i to the command to skip errors, e.g.:
     btrfs restore -vi /dev/sdX1 /mnt/disk2/restore
     If it works, check that the restored data is OK, then format the original btrfs device or pool and restore the data.

     3) BTRFS check --repair (dangerous to use)

     If all else fails, ask for help on the btrfs mailing list or #btrfs on libera.chat. If you don't want to do that, and as a last resort, you can try check --repair.
     If it's an array disk, first start the array in maintenance mode and use mdX, where X is the disk number, e.g., for disk5:
     btrfs check --repair /dev/md5
     For a cache device (or pool), stop the array and use sdX:
     btrfs check --repair /dev/sdX1
     Replace X with the actual device (use cache1 for a pool), don't forget the 1 at the end, e.g., /dev/sdf1.
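     As a worked example only (the device /dev/sdf1 and the destination folder are placeholders, adjust to your setup), the read-only recovery from step 1 on v6.10-rc1 or newer might look like this:

     # create a temporary mount point and mount the damaged filesystem read-only
     mkdir /temp
     mount -o rescue=all,ro /dev/sdf1 /temp
     # create a destination folder on an array disk and copy everything to it
     mkdir -p /mnt/disk2/recovered
     cp -a /temp/. /mnt/disk2/recovered/
     # verify the copied data, unmount, and only then format the original device or pool
     umount /temp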
  15. If the SASLPs are still in the slots pictured, your main bottleneck is slot PCIE3, then the DMI; if still available, use the top PCIe slot (PCIE1):
     Expansion / Connectivity Slots
     - 1 x PCI Express 3.0 x16 slot (PCIE1: x16 mode)
     - 2 x PCI Express 2.0 x16 slots (PCIE3: x1 mode; PCIE4: x4 mode)
     - 1 x PCI Express 2.0 x1 slot
     - Supports AMD Quad CrossFireX™ and CrossFireX™
     *PCIe Gen3 is supported on 3rd Generation of Intel® Core™ i5 and Core™ i7 CPUs.
  16. You can, don't forget you need to create the destination folder before doing the restore.
  17. For the future, follow the FAQ instructions to remove a cache device, much safer. For now your best bet is probably to try to mount it read-only, copy all the data and format:
     mkdir /x
     mount -o recovery,ro /dev/sdX1 /x
     Replace X with the actual device.
  18. You should avoid preclearing solid state devices. If you just want to wipe it use blkdiscard instead:
     blkdiscard /dev/nvme0n1
  19. It's not required but it's highly recommended, no trim without it.
  20. I use the user scripts plugin.
  21. Devices are tracked by serial number, not controller port, so you won't need to do anything.
  22. SAS2008 based controllers don't support trim on most SSDs; you should connect the SSD to the onboard controller, swapping with another disk if needed.