MrFrizzy

Community Answers

  1. UnRAID 6.10.3, Debian 11 VM, Ryzen 5600G, RTX A2000

     My Debian 11 VM has no video hardware acceleration and I cannot seem to figure out where I am going wrong. The GPU is detected and the drivers are installed, but the system keeps complaining that there is no video hardware acceleration, and I get no output on the monitor connected directly to the GPU.

     I first followed the info in this post to set the primary GPU as VNC and the A2000 as the second GPU. I also followed the second method from this website to install the Nvidia drivers. I installed the drivers at first using the proprietary option without CUDA, found that didn't fix my problem, uninstalled everything, then chose the proprietary option with CUDA, which didn't make a difference. The first method mentioned in that article doesn't work for me because nvidia-detect doesn't report a compatible driver. Also, if I install both the proprietary and the open-source drivers, the GPU no longer shows up in NVIDIA X Server Settings.

     In the BIOS, the system is set to use the iGPU (igfx) as the primary video device. Upon booting UnRAID, the console is output through the motherboard's HDMI port, so that is working correctly. Under Tools > System Devices, the A2000 GPU and its audio device are in the same IOMMU group with nothing else in the group, and both are bound to vfio-pci on boot.

     In the VM settings, the first GPU is VNC and the second GPU is the A2000 (01:00.0), with the sound card set to the A2000's audio device (01:00.1). I've tried with and without a vBIOS ROM supplied; the ones I have tried are an unmodified one from Techpowerup and one that I modified to remove the header, as Spaceinvader One shows in this video: https://www.youtube.com/watch?v=1IP-h9IKof0. I have also tried with and without marking the GPU as multifunction in the XML (multifunction='on' on the guest-side address) while putting the audio device on the same bus as the GPU ('0x05') with a function of '0x1'. None of the VM setting changes seem to change what happens in the VM.

     In the Debian 11 VM, running `lspci` shows the GPU and audio are detected, but the GPU appears as `05:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2531] (rev a1)`. No matter what I've tried, I cannot get it to show up the way it does under "System Devices" in UnRAID: `01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [RTX A2000] (rev a1)`. `nvidia-detect` shows the same info as above and says the "...card is not supported by any driver version up to 535.104.05." However, the driver is installed and the NVIDIA X Server Settings app shows the correct info (below). So it seems the GPU passthrough is working to some extent, but the fact that it doesn't show up as `GA106 [RTX A2000]` has me thinking there is something I am missing with UnRAID.

     Any insight is appreciated; I've been trying to get this working on and off for months now with no luck. Here is the XML config for the VM, and below are screenshots from the NVIDIA X Server Settings. I can provide diagnostics files via DM. I don't wish to post them publicly.
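
     To illustrate the multifunction layout I'm describing, here is a rough sketch of the relevant hostdev section. Only the guest bus ('0x05') and function ('0x1') values are the ones from my config; the rest is illustrative, not my exact XML:

         <hostdev mode='subsystem' type='pci' managed='yes'>
           <source>
             <!-- host address of the A2000 GPU (01:00.0) -->
             <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
           </source>
           <!-- guest-side address; multifunction='on' lets the audio function share the slot -->
           <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>
         </hostdev>
         <hostdev mode='subsystem' type='pci' managed='yes'>
           <source>
             <!-- host address of the A2000 audio device (01:00.1) -->
             <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
           </source>
           <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
         </hostdev>
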
  2. Thanks for the examples, that makes a lot of sense! It seems that every distro uses ZFS a little differently and there is no one way to go about things.
  3. I agree. I don't expect the scrub to come back with any errors, but it's always good to be certain.

     After 2 days of researching this issue, it seems that the disk partitions created by ZFS are deterministic. Do you have any insight on that? At least for drives with 512-byte logical sectors (even if the physical sector size is 4096), the sources I can find from the last few years all seem to show that ZFS aligns the first partition to sector 2048 and places the last partition exactly 8 MiB before the last full 2048-sector chunk of the disk. I didn't really find any info on drives with 4096-byte logical sectors (4K Advanced Format), so I would imagine the behavior there is different. I don't have any 4096-byte logical sector drives to test with, and even if I did, my SAS2008 HBA doesn't support 4K logical sectors, only 512 physical and emulated.

     By my math, that means the partition start and end points can be calculated purely from the total sector count of a given drive. That could save someone who doesn't have a spare identical drive or a working identical drive still in the pool. It would also mean there is no need to create a sparse qemu-img of the same sector count as the drive in question just to create a new zpool on it and grab the partition info (which didn't work for me).

     For 512-byte logical sector drives, calculate the end of partition 9:
     1. Divide the total sector count of the drive by 2048
     2. Drop everything after the decimal point
     3. Multiply by 2048
     4. Subtract 1

     For example:
     1. 5860533168 / 2048 = 2861588.46094
     2. 2861588
     3. 2861588 * 2048 = 5860532224
     4. 5860532224 - 1 = 5860532223 = partition 9 end sector

     Then get the partition 9 start sector by subtracting 16383 from the partition 9 end sector (in this example: 5860532223 - 16383 = 5860515840).

     Lastly, get the partition 1 end sector by subtracting 1 from the partition 9 start sector (in this example: 5860515840 - 1 = 5860515839).

     Thoughts?
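
     A quick shell sketch of that math (a made-up helper, not anything that ships with ZFS) — it takes the total sector count of a 512-byte-logical-sector drive and prints the partition boundaries described above:

         #!/bin/bash
         # Usage: zfs_part_calc.sh <total_sector_count>, e.g. zfs_part_calc.sh 5860533168
         total=$1

         # End of partition 9: last 2048-aligned sector minus 1 (integer division drops the remainder)
         p9_end=$(( total / 2048 * 2048 - 1 ))

         # Partition 9 is 16384 sectors (8 MiB), so it starts 16383 sectors before its end
         p9_start=$(( p9_end - 16383 ))

         # Partition 1 starts at sector 2048 and ends immediately before partition 9
         p1_end=$(( p9_start - 1 ))

         echo "part1: start 2048, end $p1_end"
         echo "part9: start $p9_start, end $p9_end"
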
  4. Update time! I added the spare 3TB drive to my system and created a new zpool on it using:

         zpool create pool /dev/sdl

     Then ran fdisk to get the partition info:

         fdisk -l /dev/sdl
         Disk /dev/sdl: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
         Disk model: Hitachi HDS5C303
         Units: sectors of 1 * 512 = 512 bytes
         Sector size (logical/physical): 512 bytes / 512 bytes
         I/O size (minimum/optimal): 512 bytes / 512 bytes
         Disklabel type: gpt
         Disk identifier: 018D37A8-E21E-DC4B-AC14-7D3B1CB9CBFA

         Device          Start        End    Sectors  Size Type
         /dev/sdl1        2048 5860515839 5860513792  2.7T Solaris /usr & Apple ZFS
         /dev/sdl9  5860515840 5860532223      16384    8M Solaris reserved 1

     Then ran sgdisk to copy that info over to the 3 other drives:

         sgdisk -n1:2048:5860515839 -t1:BF01 -n9:5860515840:5860532223 -t9:BF07 /dev/sdi
         sgdisk -n1:2048:5860515839 -t1:BF01 -n9:5860515840:5860532223 -t9:BF07 /dev/sdj
         sgdisk -n1:2048:5860515839 -t1:BF01 -n9:5860515840:5860532223 -t9:BF07 /dev/sdk

     EDIT: Forgot to include that I had to re-import the pool first before the scrub:

         zpool import -a

     Then started a scrub. The data seems to be intact so far, but judging by the average disk speeds the scrub is going to take more than 12 hours.
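
     For anyone following along, the scrub itself is just the standard zpool commands (pool name elided as before):

         zpool scrub [poolname]          # kick off the scrub
         zpool status -v [poolname]      # check progress and any per-file errors
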
  5. Ah, the command line is the way I did it before and the only way I know how to do it. It should be pretty straightforward to create a new zpool on the spare and copy the partition info over. We'll see how it goes later today, time permitting. Any idea how this could have happened in the first place? It would be one thing to have the pool not mount on boot, or to have an issue with a drive or two, but having the partition info randomly deleted on all 3 drives is not something I can explain. That is, unless all of that information was never written to the drives or otherwise saved anywhere besides memory? Would that be possible?
  6. They are connected directly to a SAS2008 HBA via an HD mini-SAS to (4) SATA cable, along with the (4) drives in my main array on a second HD mini-SAS to (4) SATA cable. The main array shows no signs of anything wrong. My other drives are plugged into the motherboard directly: (2) 500GB SATA drives for cache, (1) 250GB SATA via Unassigned Devices, and a 1TB NVMe drive as a second BTRFS pool device. No issues to report with those either. I don't recall the exact configuration, but I do know that it was RAIDZ and what name I used for the pool. Once I have all 3 drives imaged, I'll shut everything down, add my spare 3TB drive, and start testing the procedure of creating and copying the partition tables over. The data on the ZFS pool isn't critical, I don't bother backing it up, but it would be nice to not lose 5TB of data just because I restarted my machine.
  7. Unraid 6.10.3, zfs-2.1.5-1, zfs-kmod-2.1.5-1. If you need my diagnostics file, please let me know so I can send it to you via DM. I don't wish to post it publicly.

     A few months ago I added a ZFS pool to my system (before Unraid 6.12 was released). It included (3) 3TB drives in what I believe was raidz. Well, I had not rebooted my system since it was all working until a few days ago, and now the ZFS pool is gone.

         zpool status
         zpool import
         zpool import -a
         zpool import -D -f [poolname]
         zfs list

     All of the above come back saying there is no pool available to import or no datasets available. Using ls -l /dev/disk/by-id/ I get the below for the 3 disks in question:

         lrwxrwxrwx 1 root root 9 Jul 3 18:01 ata-Hitachi_HDS5C3030ALA630_MJ1311YNG39DAA -> ../../sdg
         lrwxrwxrwx 1 root root 9 Jul 3 18:01 ata-Hitachi_HDS5C3030ALA630_MJ1313YNG1PTSC -> ../../sdh
         lrwxrwxrwx 1 root root 9 Jul 3 18:01 ata-Hitachi_HDS5C3030ALA630_MJ1313YNG244VC -> ../../sdf

     fdisk -l returns:

         Disk /dev/sdf: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
         Disk model: Hitachi HDS5C303
         Units: sectors of 1 * 512 = 512 bytes
         Sector size (logical/physical): 512 bytes / 512 bytes
         I/O size (minimum/optimal): 512 bytes / 512 bytes

         Disk /dev/sdg: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
         Disk model: Hitachi HDS5C303
         Units: sectors of 1 * 512 = 512 bytes
         Sector size (logical/physical): 512 bytes / 512 bytes
         I/O size (minimum/optimal): 512 bytes / 512 bytes

         Disk /dev/sdh: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
         Disk model: Hitachi HDS5C303
         Units: sectors of 1 * 512 = 512 bytes
         Sector size (logical/physical): 512 bytes / 512 bytes
         I/O size (minimum/optimal): 512 bytes / 512 bytes

     I also cannot seem to find any zpool.cache files anywhere in /etc or /usr. I presume I need to first recover the partitions on each of the 3 drives before I can even attempt to get them re-imported, but I am not sure how to do that. I've seen other posts talk about using sgdisk and specifying the partition start and end points, but I don't know what those values would be on my drives or where to find that info. Many other posts talk about using tools that I can't seem to find for Unraid, or paid-for tools that are simply out of the question here. Can anyone provide some guidance on how to recover the partitions and how to prevent this from happening in the future? I'll likely upgrade the OS to 6.12 after I can get the pool working again (even if I have to blow the whole thing out and lose the data).

     EDIT:

         gdisk -l /dev/sdg

     Returns:

         GPT fdisk (gdisk) version 1.0.8

         Partition table scan:
           MBR: not present
           BSD: not present
           APM: not present
           GPT: not present

         Creating new GPT entries in memory.
         Disk /dev/sdg: 5860533168 sectors, 2.7 TiB
         Model: Hitachi HDS5C303
         Sector size (logical/physical): 512/512 bytes
         Disk identifier (GUID): 7E12C536-4BE0-497C-9432-B1E75D2C4AB5
         Partition table holds up to 128 entries
         Main partition table begins at sector 2 and ends at sector 33
         First usable sector is 34, last usable sector is 5860533134
         Partitions will be aligned on 2048-sector boundaries
         Total free space is 5860533101 sectors (2.7 TiB)

         Number  Start (sector)    End (sector)  Size       Code  Name

     And:

         dd if=/dev/sdg bs=512 count=2048 2>/dev/null | hexdump -C | grep EFI.PART

     returns nothing at all. Considering I do have a spare one of those 3TB Hitachi drives, I could create a new ZFS pool on it, check the partition table positions, and then use sgdisk to reconstruct the partition table on the 3 other drives, correct?

     I'll probably use clonezilla to image the drives before doing anything just in case.
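
     If Clonezilla ends up being awkward to run against this box, a plain dd image per disk should do the same job — a rough sketch, with the destination path made up for illustration:

         # image each member disk to a file on the array before touching the partition tables
         dd if=/dev/sdf of=/mnt/user/backups/sdf.img bs=1M conv=sync,noerror status=progress
         dd if=/dev/sdg of=/mnt/user/backups/sdg.img bs=1M conv=sync,noerror status=progress
         dd if=/dev/sdh of=/mnt/user/backups/sdh.img bs=1M conv=sync,noerror status=progress
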
  8. I should have included more details in the original post. I have no issues with saving the files; I have weeks' worth saved on my array. It is the 64MB temporary location that the container uses for the videos that randomly fills up and causes the stream to crash.

     I am using the "shinobi-cctv-pro" template, which uses the shinobicctv/shinobi:latest repository. Everything is default other than the video storage location (/mnt/user/Secondary/Shinobi/Videos/ [on the main array]) and the "Streams" cache (/mnt/user/appdata/streamCache, which is supposed to be /dev/shm/Shinobi/streams in the container but clearly isn't).

     After much more research, I found out that /dev/shm is the "shared memory device", a temp location stored in RAM. By default it is half of the total system RAM; however, within Docker containers /dev/shm defaults to 64MiB. That is exactly what I was running into. I reverted the "Streams" path back to /dev/shm/Shinobi/streams and started messing with different ways to increase the shm size. After a bunch of trial and error, I was finally able to increase it by adding "--shm-size=256m" to the "Extra Parameters" section of the container template. It took me a while to find that, as it is only shown when the "Advanced" view is selected. Once I did that and restarted the container, I had a 256MiB /dev/shm! No crashes as of yet!
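
     For anyone finding this later, the "Extra Parameters" field just appends to the underlying docker run command, so the fix boils down to something like this (image name from the template above, all other mappings omitted for brevity):

         # /dev/shm inside a container defaults to 64MiB; --shm-size raises it
         docker run -d --name Shinobi --shm-size=256m shinobicctv/shinobi:latest
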
  9. I am running Shinobi in a Docker container for some IP cameras. I am running into an issue where the camera streams will periodically "die" with an error in the log about not enough space.

         watch df -h /dev/shm

     Using the above, I can see this single mount point within the Docker container run out of space just before the logs show the error. It is only 64MB in size. What doesn't make sense to me is that in the Docker container configuration, that location is supposed to be mapped to a folder inside of appdata, but no data ever ends up in that location despite what the watch command above says. Clearly something isn't adding up, and I must be using the wrong keywords to search because I can't find any info on what to do. Can anyone give me some insight as to why /dev/shm is only 64MB in size when the container config for that path is set to a directory with 100+ GB of available space? How can I increase the size of /dev/shm inside the Docker container? I appreciate any help I can get!
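
     One quick way to confirm the limit from the host (assuming the container is named "Shinobi"):

         docker exec Shinobi df -h /dev/shm                           # shows the 64M default
         docker inspect Shinobi --format '{{.HostConfig.ShmSize}}'    # configured shm size in bytes
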
  10. @JorgeB Thank you for all your help these last few days! Everything seems to be happy and stable now! You've undoubtedly saved me hours if not days of troubleshooting and headache, so again, thank you for the help!
  11. To be clear, the second btrfs pool does not push files to the array. The shares on the second pool will only ever use the (2) drives in that pool, not the cache and not the array. The scrub is still running on the second pool, but I have gotten some errors like the below, which don't point to a file as far as I can tell. Any clue how I can address those?

         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 48, gen 0
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 49, gen 0
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): unable to fixup (regular) error at logical 12169646080 on dev /dev/sdb1
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): unable to fixup (regular) error at logical 12169646080 on dev /dev/sde1
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 49, gen 0
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 50, gen 0
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): unable to fixup (regular) error at logical 12169650176 on dev /dev/sdb1
         Feb 24 10:45:55 Tower kernel: BTRFS error (device sdb1): unable to fixup (regular) error at logical 12169650176 on dev /dev/sde1

     EDIT: From the research I've done, it seems that whatever was there has been deleted or overwritten, but there is still some reference to the extent that contains that block. I ran the below command and got the shown result. I also shifted the logical address +/- 5,000 and then +/- 100,000, but got the same return for all 4 of those as seen below.

         btrfs inspect-internal logical-resolve -v -P 12169746080 /mnt/[secondpool]
         ioctl ret=0, total_size=65536, bytes_left=65520, bytes_missing=0, cnt=0, missed=0

     At this point, do I just need to move forward with rebuilding the extent and csum trees (--init-extent-tree --init-csum-tree) and see if I end up with any messed-up files? Or should I format the pool and move everything back over from backups?
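
     For reference, the per-device error counters and scrub progress can be checked with the standard btrfs tools (mountpoint elided as above):

         btrfs device stats /mnt/[secondpool]      # per-device write/read/flush/corruption/generation counters
         btrfs scrub status /mnt/[secondpool]      # progress and error summary for the current/last scrub
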
  12. 2 out of 4 matched sticks have errors. I have no idea how long they have been failing, but it has to have been quite some time; even some files from backups are failing checksum. I've pulled out the bad DIMMs, added my one spare DIMM, and passed a full run of memtest. I've also pulled all of the "bad" files off the second pool (since they are the same as what I have backed up) and am running another scrub to see if any more errors come up. The main cache pool passes scrub with no problem, and did even before I found the bad DIMMs. The parity check passes every time I run it. Does the main array not compare checksums like the btrfs pools do?
  13. I'm having a lot of problems with this second pool. Each time I do a scrub, more uncorrectable errors are found. This 3rd scrub is the first one to complete and not terminate early. Since there are uncorrectable errors, the balance fails as well. I am copying off all of the data now and am getting read corrections in the logs for files that have not come up in the scrub or balance. Not sure what is going on, but I am reaching the end of my wits here. What should have been a simple file recovery has turned into 3 days of headache. I plan to copy all of the data off (rough sketch of that step below), blow out that second pool, recreate it from scratch, and repopulate the pool from the copy and backups (where needed). I will also implement some sort of scheduled scrub to help maintain data integrity.
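
     The copy-off, roughly — the destination path is made up for illustration, and rsync can simply be re-run to retry anything that hit a read error on the first pass:

         rsync -avh --progress /mnt/[secondpool]/ /mnt/user/pool_copy/
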
  14. @JorgeB The cache pool is working as expected now! The correct size and free space are shown! The second pool is also showing the correct capacity and free space after running a scrub and balance. It did have a few files fail checksum, which I have removed, but the second balance has errored out on another csum-failed message for a file that does not exist (both find /mnt/[secondpool] -inum 946 and btrfs inspect-internal inode-resolve 946 /mnt/[secondpool] fail to find the file). I am trying the scrub again to see if that can fix anything, but it will likely take over 3 hours to complete.

         [43939.203266] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 44, gen 0
         [43939.203270] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 45, gen 0
         [43939.252058] BTRFS warning (device sdb1): csum failed root -9 ino 946 off 199262208 csum 0xe16b1f3e expected csum 0x0b5860dc mirror 1
         [43939.252064] BTRFS error (device sdb1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 57, gen 0
         [43939.252201] BTRFS warning (device sdb1): csum failed root -9 ino 946 off 199262208 csum 0x90983c1a expected csum 0x0b5860dc mirror 2
         [43939.252214] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 46, gen 0
         [43941.686335] BTRFS info (device sdb1): balance: ended with status: -5

     I do have a few questions for you. Does Unraid ever do a scrub and/or balance on btrfs drives? There really isn't a need to unless there is a RAID1 pool, right? Should I script out a scrub and/or rebalance once a month alongside the parity check on the array? (A rough sketch of what I have in mind is below.)
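
     Something along these lines, scheduled monthly via the User Scripts plugin or a cron entry (the mountpoints are placeholders for my pools):

         #!/bin/bash
         # Monthly btrfs scrub for both pools; -B runs in the foreground, -d prints per-device stats
         for mnt in /mnt/cache /mnt/[secondpool]; do
             btrfs scrub start -B -d "$mnt"
         done
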
  15. I removed the 9 uncorrectable files, then ran a successful scrub and a successful rebalance. Running the same procedure on the second pool now.