hawihoney

Members
  • Posts

    3,513
  • Joined

  • Last visited

  • Days Won

    7

Everything posted by hawihoney

  1. COW? Hmm, the system share on the cache is set to Auto. What does that mean? Never mind, I found it in the help text: Auto means COW for BTRFS. Thanks a lot.
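     A quick sanity check I'd use to see whether a folder on the cache actually ended up with NOCOW (the path is just an example, not necessarily my share layout):
        # lsattr shows a 'C' flag for No_COW; no 'C' means normal copy-on-write
        lsattr -d /mnt/cache/system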
  2. Oops, I've never seen that before, and I have no clue what either of them does. OK, I ran a corrective scrub. Does that mean everything is OK now?
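     For anyone reading along, a minimal sketch of how the result of such a corrective scrub can be checked afterwards, assuming the pool is mounted at /mnt/cache:
        # summary of the last scrub, including corrected and uncorrectable errors
        btrfs scrub status /mnt/cache
        # per-device error counters for the pool
        btrfs device stats /mnt/cache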
  3. I'm running the stats regularly; that's how I spotted the errors. But Unraid itself hasn't flagged the errors so far. If I call the stats, this is the result:
     [/dev/nvme1n1p1].write_io_errs 0
     [/dev/nvme1n1p1].read_io_errs 0
     [/dev/nvme1n1p1].flush_io_errs 0
     [/dev/nvme1n1p1].corruption_errs 0
     [/dev/nvme1n1p1].generation_errs 0
     [/dev/nvme0n1p1].write_io_errs 0
     [/dev/nvme0n1p1].read_io_errs 0
     [/dev/nvme0n1p1].flush_io_errs 0
     [/dev/nvme0n1p1].corruption_errs 0
     [/dev/nvme0n1p1].generation_errs 0
     Looking at the syslog at the same time shows:
     Nov 2 11:00:04 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 15050349 off 1114939392 csum 0x382b6324 expected csum 0x54474642 mirror 2
     Nov 2 11:00:04 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 15050349 off 1114939392 (dev /dev/nvme1n1p1 sector 534070312)
     Nov 2 11:12:38 Tower kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 1481548267520 wanted 16496481 found 16461691
     Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548267520 (dev /dev/nvme1n1p1 sector 337334848)
     Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548271616 (dev /dev/nvme1n1p1 sector 337334856)
     Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548275712 (dev /dev/nvme1n1p1 sector 337334864)
     Nov 2 11:12:38 Tower kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 1481548279808 (dev /dev/nvme1n1p1 sector 337334872)
     The link in your answer mentioned scrub. Is scrub another name for balance? Many thanks in advance.
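     For context, the two operations as separate commands (the mount point /mnt/cache is an example). As far as I understand, scrub re-reads data against the checksums, while balance rewrites and redistributes chunks; they are not the same thing:
        # scrub: verify all data/metadata checksums and repair from the mirror copy where possible
        btrfs scrub start /mnt/cache
        # balance: rewrite chunks and redistribute them across the pool devices
        btrfs balance start /mnt/cache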
  4. Thanks. After 70 hours I had to hard-reset the server because I was not able to force-stop either the script or the dd process. I gave up on that project (I had tried to zero two empty disks and remove them from the array, in order to build a second cache pool later).
  5. Update: I was able to get the Docker/VM services to start. I had to delete the docker.img file; it was corrupt. All Dockers were reconstructed and are currently running. BUT: BTRFS still shows errors on my cache pool. What do I need to do to fix these? Many thanks in advance.
  6. Yesterday, out of the blue, I received errors on my cache pool (2x NVMe M.2 disks). The Unraid main page didn't report these errors, even when disk 2 of that pool went offline. Today I restarted the server. Unraid comes up but can't start the Docker service. In the syslog I see lots of BTRFS errors, yet Unraid still does not show any problems. It seems the cache pool no longer works, but Unraid carries on as if nothing had happened. What are the steps to get the cache pool (and the Dockers and VMs) back into operation? Rebalance? Diagnostics attached. Many thanks in advance. tower-diagnostics-20201102-0757.zip
  7. The script issues exactly the same command. You ran it in maintenance mode; I'm running in production. Hmm, now I'm confused. Is something wrong with my array? Is there a way to find out at what position dd is currently working?
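     For what it's worth, one way to coax a progress report out of a running dd, assuming it is GNU dd (which prints its statistics when it receives SIGUSR1):
        # prints bytes copied so far to dd's stderr without interrupting it
        kill -USR1 $(pidof dd)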
  8. I want to remove a disk from the array and followed this documentation: https://wiki.unraid.net/Shrink_array I'm using the safe method (parity always valid) and the User Script mentioned there to zero a 3 TB disk. Turbo write was activated before starting. The process has been running for 51 hours now and there's no end in sight. The array usually writes at 50 MB/s with dual parity; zeroing "runs" at 10 MB/s. What's wrong with that process? Why isn't it zeroing at a similar speed? Why is writing zeros to a disk over five times slower than writing random bytes of file content? Any insights are highly appreciated. Many thanks in advance.
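     For context, a zeroing command of the kind I understand the script to run; the md device number and block size here are placeholders, not values taken from the script:
        # writing through the md device keeps parity in sync while the disk is zeroed
        dd if=/dev/zero of=/dev/md3 bs=1M status=progress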
  9. What's the product name of your HBA? LSI 3008 is the name of the chip - not the product.
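     In case it helps, how I'd try to read the board details from the command line (the bus address is an example):
        # list SAS controllers with their vendor:device IDs
        lspci -nn | grep -i 'sas\|lsi'
        # verbose details for one controller, usually including the subsystem/board name
        lspci -v -s 06:00.0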
  10. Go to this site: https://www.broadcom.com/support/download-search
      Product Group: Storage Adapters, Controllers, and ICs
      Product Family: SAS/SATA/NVMe Host Bus Adapters
      --> Now select your HBA under Product Name (e.g. SAS 9300-8i, SAS 9300-8e, etc.)
  11. What steps do I need to take before moving from 6.8.3 to 6.9.0? I'm talking about passthrough of adapters (2x HBA for two VMs, 2x USB for two VMs, 1x GPU for Docker). Currently, on 6.8.3, I have possibly redundant settings as follows:
      1.) /boot/syslinux/syslinux.cfg: xen-pciback.hide (hides both HBAs; is this one redundant?)
          [...]
          label unRAID OS
            menu default
            kernel /bzimage
            append xen-pciback.hide=(06:00.0)(81:00.0) initrd=/bzroot
          [...]
      2.) /boot/config/vfio-pci.cfg: BIND (hides both HBAs, or is this one redundant?)
          BIND=06:00.0 81:00.0
      3.) First VM: 1x HBA, 1x USB
          [...]
          <hostdev mode='subsystem' type='pci' managed='yes'>
            <driver name='vfio'/>
            <source>
              <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
            </source>
            <alias name='hostdev0'/>
            <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
          </hostdev>
          <hostdev mode='subsystem' type='usb' managed='no'>
            <source>
              <vendor id='0x0930'/>
              <product id='0x6544'/>
              <address bus='2' device='4'/>
            </source>
            <alias name='hostdev1'/>
            <address type='usb' bus='0' port='1'/>
          </hostdev>
          [...]
      4.) Second VM
          [...]
          <hostdev mode='subsystem' type='pci' managed='yes'>
            <driver name='vfio'/>
            <source>
              <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
            </source>
            <alias name='hostdev0'/>
            <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
          </hostdev>
          <hostdev mode='subsystem' type='usb' managed='no'>
            <source>
              <vendor id='0x8564'/>
              <product id='0x1000'/>
              <address bus='2' device='10'/>
            </source>
            <alias name='hostdev1'/>
            <address type='usb' bus='0' port='1'/>
          </hostdev>
          [...]
      What do I need to remove/add before booting the new 6.9.0 system? Thanks in advance.
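      For my own before/after check, a way to see which devices actually ended up bound to vfio-pci (the bus addresses are the ones from above):
         # devices currently claimed by the vfio-pci driver
         ls -l /sys/bus/pci/drivers/vfio-pci/
         # shows the kernel driver in use per device
         lspci -k -s 06:00.0
         lspci -k -s 81:00.0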
  12. May I ask some questions?
      1.) The Plex folder on my cache contains "trillions" of directories and files. Those files change rapidly, and new files are added at high frequency. Does that mean the result is "trillions" of hardlinks? Stupid question, I know, but I've never worked with rsync that way.
      2.) What about backups to remote locations? I feed backups to Unraid servers at two different remote locations (see below). Will this work and create hardlinks at the remote location?
      rsync -avPX --delete-during --protect-args -e ssh "/mnt/diskx/something/" "user@###.###.###.###:/mnt/diskx/Backup/something/"
      Thanks in advance.
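      For context, the kind of hardlink snapshot I have in mind, sketched with placeholder paths and dates; with --link-dest, unchanged files become hardlinks to the previous snapshot instead of new copies:
         # unchanged files are hardlinked to yesterday's snapshot, changed/new files are copied
         rsync -avPX --delete-during \
               --link-dest=/mnt/diskx/Backup/something/2020-11-01 \
               /mnt/diskx/something/ \
               /mnt/diskx/Backup/something/2020-11-02/
      As far as I understand, --link-dest is resolved on the receiving side, so the same pattern should also work for the ssh destinations above.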
  13. I added it to User Scripts (at array start). I'll rethink it once 6.9 is stable. Thanks, man.
  14. Can you please elaborate a little? A quick Google search shows this as a BTRFS thing, but reading the 6.9 announcements suggests it is an incompatibility between Samsung (and similar) SSDs and Unraid when different sector alignments are used (that's how I understand it).
  15. I don't know. I just read the 6.9 beta notes. In one of the past beta releases, complete handling was introduced to move data off these disks, reformat them, and move the data back. Beta releases are not an option here, so I'm looking for a way to do something similar on stable 6.8.3. ***EDIT*** Your values look as if you are using the cache disk as an actual cache disk: 500 TB read and 233 TB written looks reasonable. I use the cache pool as a Docker/VM store only. My values are 473 GB read and 22 TB written. That is definitely not reasonable.
  16. Can somebody please point me to documentation on how to avoid massive writes to NVMe M.2 cache SSDs running on Unraid 6.8.3? After reading the 6.9 beta notes I checked my cache disks, and they are heavily affected. Two 1 TB disks in a BTRFS cache pool show 21 TB of writes in 4 months while holding around 250 GB. The models in use are "Samsung SSD 970 EVO Plus 1TB".
      - Available spare: 100%
      - Available spare threshold: 10%
      - Percentage used: 0%
      - Data units read: 924,128 [473 GB]
      - Data units written: 42,803,014 [21.9 TB]
      - Host read commands: 12,644,896
      - Host write commands: 505,220,723
      - Controller busy time: 324
      - Power cycles: 2
      - Power on hours: 190 (7d, 22h)
      Needless to say, I want to avoid long outages. Many thanks in advance.
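      For anyone who wants to check their own drives: the counters above are plain NVMe SMART data, so something like this should show the same values on the console (the device name is an example):
         # NVMe SMART/health data, including Data Units Read/Written
         smartctl -a /dev/nvme0
         # alternative, if the nvme CLI is installed
         nvme smart-log /dev/nvme0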
  17. It was the long boot that made me nervous. I could SSH into the starting server, but it killed the session after a minute or so. That was the point at which I took the screenshot with the PuTTY error; afterwards I was no longer able to SSH into the server. I started IPMI (I hadn't needed it for a very long time) only to find out that there's no JRE on my new laptop. Argh, my fault. So I gave up, rushed downstairs, and pulled the USB stick, and when I came back up, Unraid had started, but without the /boot folder. So I rushed down again, pulled the stick, copied the old Unraid files back onto it, and booted. That came up fast and without a problem. In the meantime we chatted here. This morning I set all Dockers and VMs to not autostart. Starting the fully loaded server takes even more time. So I gave your files a second go: I pulled the USB stick from the server again, copied the new files onto it, and pushed it back into the server. This time I gave it much more time before doing anything, and voila, after what seemed like a very long time the server came up. I double-checked the GPU UUID in your kernel helper GUI and the device IDs for the five passed-through devices (2x HBA, 1x GPU and 2x USB license sticks). Everything was identical. So I manually started all mounts, VMs and Dockers. Everything looks good now. I have no idea what hangs during those boots, but it's fine for me; I don't start the server that often. With server-grade backplanes and HBAs you don't need to shut down the server for, e.g., disk replacement in the JBODs; only a disk replacement in the bare-metal host requires stopping the array. It runs pretty much 24/7.
  18. I'm running your build now. As a parity sync is currently running, I don't want to stress the server. NVENC and NVDEC seem to work; I tested with my smartphone and a forced reduced bitrate. No errors or warnings in the syslog since the last boot.
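      For the record, how I'd confirm that hardware transcoding is actually in use while a test stream runs, assuming nvidia-smi is included in the build:
         # the Plex transcoder process should appear here during an NVENC/NVDEC session
         nvidia-smi
         # refresh the view every few seconds while testing
         watch -n 5 nvidia-smi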
  19. I ran CHKDSK; the stick is fine. This morning I booted twice with your precompiled files and once with my old environment (Unraid NVIDIA), all from the same USB stick. The difference was huge here. If you say that can't be, then it must be something on my side. I can live with that.
  20. I switched from Unraid NVIDIA to your precompiled NVIDIA kernel 6.8.3 this morning and nearly got a heart attack: booting the server took three times as long as before (LSIO Unraid NVIDIA). During the long boot I took some screenshots of error messages that scrolled past (see below). In a rush I took the Unraid USB stick out, copied the 8 old Unraid files over, and put the stick back in. In the meantime Unraid had come up, with an empty boot folder. That was when I realized how extremely long the boot process was. So I pulled the USB stick again, copied your 8 new files over, and restarted. This time I gave it plenty of time to boot before checking. The system seems to have come up now and is doing a parity check. Early in the morning, running down to the basement that many times, at my age... phew. Please add a note about the long boot process; it may help people like me stay calm.
  21. One last question before I switch, regarding something I found in the Nvidia plugin description: is the Docker system also modified when using your prepared Nvidia build?
  22. I already have the GPU UUID in my Plex Docker; I took it from the LSIO Nvidia plugin that I'm currently using. I asked because you mentioned somewhere in this thread that the GPU UUID might change when switching over to your approach. Just curious. Edit: What about device IDs like "IOMMU group 36:[1000:0097]"? Can they change too?
  23. What's the correct way to switch over from the NVIDIA plugin to this one? I need support for Unraid 6.8.3 with the latest NVIDIA drivers.
      - Do I need to uninstall the NVIDIA plugin first?
      - If yes, how do I get the GPU UUID?
      - Are there any other differences between 6.8.3 with the NVIDIA plugin and this one here?
      Any hints are highly appreciated.
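      For reference, once a driver build is loaded I would expect the UUID to be readable on the console, assuming nvidia-smi ships with it:
         # lists each GPU together with its UUID (GPU-xxxxxxxx-...)
         nvidia-smi -L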