fcaico

Everything posted by fcaico

  1. That issue doesn't seem like the same thing I experienced. In my case the array was not doing anything (and I have my mover scheduled for the early morning hours), and another PC on the same network was able to stream a movie from the array without issue. It was only the VM that had the problem.
  2. So I've been running Unraid 6.4 for quite some time without issue. I use Unraid for a variety of things - file storage, dockers, and virtual machines - one of which, a Windows 10 VM, has a video card pinned to it, and I use it as an HTPC for my home theater. Today I decided to upgrade Unraid to 6.7 as I was doing some routine maintenance. The upgrade went smoothly and I assumed all was well until later, when I tried using my HTPC to watch a movie served from the array. The movie would occasionally freeze for several seconds and then continue, which was quite annoying. Watching the same movie on another computer in the house was fine - it was only the Windows VM that had the issue. Unable to figure out what was wrong, I eventually downgraded my Unraid OS back to 6.4, and sure enough things are now running well again. My Unraid box has 32 GB of RAM, and I've got 12 GB allocated to the VM along with 4 threads (2 cores); the domains share is cache-only. Any thoughts? I'd like to stay current with my Unraid installation... Thanks! (A sketch of the allocation I mean follows below.)
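For reference, the memory and CPU allocation I'm describing corresponds to something like this in the VM's libvirt xml; the host CPU numbers below are just an example, not necessarily the ones I'm actually using:

    <memory unit='GiB'>12</memory>
    <vcpu placement='static'>4</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='2'/>
      <vcpupin vcpu='1' cpuset='3'/>
      <vcpupin vcpu='2' cpuset='8'/>
      <vcpupin vcpu='3' cpuset='9'/>
    </cputune>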
  3. I did the upgrade and now see this warning in the web UI after restarting the array: unRAID Cache disk message: 04-02-2017 12:39 Warning [uNRAID] - Cache pool BTRFS too many profiles MKNSSDE3240GB_ME16021910019DE84 (sdd). What does this mean? How serious is it? And why am I only seeing it now (I restarted my array without incident 2 days ago)? Frank
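From what I've been able to find so far, that warning seems to mean the cache pool has data or metadata chunks in more than one btrfs profile (for example leftover 'single' chunks alongside raid1). If that's what it is, I'm guessing something like this would show it and, if needed, convert everything back to one profile (assuming the pool is mounted at /mnt/cache):

    btrfs filesystem df /mnt/cache                                    # lists the Data/Metadata/System profiles currently in use
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache    # rewrites chunks so only the raid1 profile remains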
  4. Since I know myself and others have had MCE issues in the past (with memtest usually not finding an issue), I was curious if LT might consider adding mcelog from http://mcelog.org/index.html to the unRAID betas? I may be mistaken, but from what I've read it seems to be the only way to ascertain what exactly an MCE log event was actually caused by (even if ultimately benign). That's a great idea, and I agree. If it's not too large, I hope LimeTech will consider adding mcelog and running it in the recommended daemon mode. I'm not sure it's the best way, but you might also use the --logfile option for persistence and force the logging to /boot (I don't know how chatty this is, though). Without this, we really don't have any tools for solving users' MCE issues. Plus, it can in some cases sideline faulty memory and processes, and possibly apply other live fixes, allowing continued operation and better troubleshooting. I could add mcelog to the NerdPack 6.2 repo. Not familiar with NerdPack... Is that a plugin?
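Just to sketch what that daemon-mode suggestion might look like (the log path under /boot is only a guess at somewhere that survives reboots; adjust to taste):

    mcelog --daemon --logfile /boot/logs/mcelog.log    # decode machine checks as they occur and append them to a persistent log
    mcelog --client                                    # query the running daemon for what it has seen so far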
  5. I'm not VM experienced, so others might be better, but here are a few comments - * Several MCEs (Machine Check Events) were noted, no apparent cause. You may want to try a memtest, etc. As hardware events, it's hard to relate them to any specific software symptom, but they may be a source of trouble. Yeah, I've seen those. Not sure what is causing them. I ran memtest for 36 hours and it came up with nothing, so I don't think it's RAM. Not quite sure how to proceed on that front. What causes CPU stalls? It's odd, because CPU 6 IS pinned to my dev VM, but that has been an extremely light-duty VM. CPUs 8 and 9 aren't pinned to anything, however! Since the reboot it's been 3 days 13 hours of uptime with normal usage patterns, so we'll have to see if this happens again...
  6. My system became unstable yesterday after a long run (almost 30 days) of being stable with 6.2.0-beta21. Web services and VMs became unresponsive. Not sure if the shares did as well, but I assume they did - though in retrospect I regret not checking. I could SSH in (and did so) and run powerdown -r. Diagnostics were saved, but the system COULD NOT shut itself down as devices remained busy. Here is my diagnostics file: unraid-diagnostics-20160613-2116.zip
  7. I have a question about file share configuration and how dockers use them. If a file share is set up to use the cache, do dockers running on the Unraid server honor this? e.g. if an NZBGet docker is downloading a large file to a file share which is marked to use the cache, are all of the writes going through the cache?
  8. So it's been over a week now and everything has been running great. Here's what I did to solve the problem:
1) Adjusted all my shares so that any large copying/moving between folders happens on the same share. About the only files I have that are large enough to cause the SMB lockups when moving between shares are movie files, so I've since put my downloads and Movies folders on the same share.
2) Switched from SABnzbd to NZBGet. I miss the ratings capabilities of SAB, but otherwise NZBGet is great.
After doing those two things my SMB shares no longer get 'locked up' or held busy, causing me to lose the UI and other functionality (including, most importantly, large chunks of my SMB file system).
  9. Sorry, I couldn't upload the diagnostics earlier; I was off network. I've now attached it. As far as SABnzbd goes, like I said, I've also been able to cause the SMB lockup merely by copying a large file between 2 shares...
  10. OK, so I've been going at this for three weeks now and I'm at the end of my ability to figure out any more things I can try. I use my unRAID server as a media server, an application server for SABnzbd, CouchPotato, SickBeard and MariaDB, and lastly as a VM host for a Windows 10 VM (with GPU and USB passthrough). I started with the latest beta of 6.2 as it seemed relatively stable and had the features I wanted.
First the good: The OS has never crashed on me. Not once. The VM seems to work just great; my GPU is passed through nicely, as well as the USB ports (and this is on a system with no onboard video and an nVidia GPU!). File serving is just fine (as long as all I'm doing is reading), but writing across shares or moving large data around a share is problematic.
I first noticed the problem when I let SABnzbd download large files: it would hang up the disks, keeping them locked. I would lose all SMB access, and the web UI would hang as soon as I did any operation that accessed the shares. If I turn off CouchPotato or don't allow SAB to do large downloads (TV shows were still fine, apparently), then everything is stable; I can go for a week without trouble. I then found out that if I copied a large file (8-10 GB) between 2 shares I would lock up SMB in just the same way! I tried the following experiment:
Copy large file from another machine to a share on the unRAID array: No problem
Copy large file from a share on one disk to another share on a different disk: Locks up SMB
Copy large file from a share on one disk to another share on the same disk: Locks up SMB
Copy large file from a share on one disk to the same share on the same disk: No problem
Copy large file from a share on one disk to the same share on another disk: No problem
This was fascinating. Perhaps a defect in the unRAID beta regarding moving between shares? So I tried putting my movies, TV shows and downloads all within one share (Media). I still get the SMB lockup. :-( I've tried adjusting docker settings, I've run the common problems plugin, etc., and I can't figure out any solution to this.
These SMB lockups are such that I can't kill the processes holding the files open, and the only solution is to "powerdown" and then hold down the power button on my server until it shuts down - then restart. I have yet to have this make me lose data, at least. Before I did anything I ran a memtest for over 24 hours without any errors. There doesn't seem to be anything in the logs indicating a problem even happens (I do see 2 mce [Hardware Error]s in the log at boot up, but I haven't found anything to explain them; I've checked all the cabling, there are no issues in the SMART logs, and temps all seem good).
Question: Should I downgrade to 6.1.9? Is this likely even to help? I can't figure out if there is an easier way to downgrade other than to reformat my USB flash drive, load 6.1.9 on it and reconfigure everything. Is that the only/best way?
I really want to figure all this out. I've spent a ton of time on it and I appear to be out of my depth. Frank
  11. So I'm still having this issue. If I'm just reading files off the file system (or doing light writes), everything seems fine for days (I had a previous uptime of 7 days). However, it seems that large writes cause the problem (if I queue up some big downloads in SABnzbd, this happens). Again, one or more disks are held busy by a process (usually smbd or shfs) and I can't kill them. Yesterday I restarted my server after such an occurrence and (from a Windows machine) moved a large directory (about 15 GB) from one user share to another. Halfway through the process it hung. SMB stopped responding, and while I was able to SSH into the machine, I was unable to stop the move. I ended up having to restart. Could this be the problem? Moving large files around? SABnzbd does this when it completes a download... so does CouchPotato... Anyone? I'm pretty desperate to resolve this.
  12. Hmmm, that's pretty interesting. I have a mix of drives - and 3 of them are WD 6TB Red drives. I've noticed that sometimes it's not the Red drives which are held 'busy' by some process, but I suppose since my parity drive IS a Red drive this *might* be my issue... UPDATE: I just checked, and on all my drives the Spin Down Delay is set to "Use Default", which is set to Never. So that does not appear to be it.
  13. This also seems like the problem I reported in my thread: https://lime-technology.com/forum/index.php?topic=48581.0 Everything is fine with the system running, even with the SABnzbd, MariaDB and SickBeard dockers and a Win 10 VM going (for nearly a week straight), but as soon as I also turn on CouchPotato and it downloads more than a trivial amount, I see the same symptoms as the OP here.
  14. You should upload your diagnostics next time it happens.
  15. Yeah, your hardware is far below the spec of my Unraid server (which may or may not be the problem in your case; not sure).
  16. Well, I'm having a very similar problem and I assure you hardware age is not the problem (in my case). I have an i7 5820K and 32 GB of RAM (12 GB is assigned to a VM). And this is from a clean install of v6 - not an upgrade. All my drives defaulted to XFS (except the cache, which is btrfs).
  17. No - your motherboard uses the Z77 chipset. What CPU are you running?
  18. So I just checked and saw that in fact I was using the top RJ45 jack - which is the Atheros one - so I don't think this is my issue. Also, it seemed pretty unlikely at any rate, since the system was still responding just fine over SSH. It's just that some of the disk devices were being kept busy by something and the web interface stopped responding (which also made SMB stop responding). I still haven't resolved this issue and I'm wondering if it is a v6 issue, as I don't seem to be the only one. I'm not sure exactly what I would lose by downgrading to v5, but I'm considering it as I'd like to resolve these issues.
  19. I'm using 6.2 beta 20... Do you think downgrading to the 6.1 release would fix this? Or even upgrading to 6.2 beta 21?
  20. I've recently put together this unRAID system, which is intended to be a media and application server along with providing a virtual HTPC for my media room. It all runs swimmingly for a little while, but after 1-3 days I will typically find my SMB shares, the unRAID web UI (or one of the docker web UIs - or a combination) in a non-responsive state. There are some files being held onto (usually by smbd or shfs), and while I can unmount some of the drives, there will typically be some which are held busy by one of those rogue processes, and I cannot kill the offending process. This prevents me from shutting down nicely - even using the powerdown plugin - but so far I haven't lost any data (or even been forced to do a parity check on startup). powerdown -r says it's halting the system, but after that nothing happens. I can still log in to the system using SSH. My only recourse is to hold down the power button to restart.
The only thing I can see in the log is a couple of mce [Hardware Error]s at boot time, but the problem usually occurs much later and doesn't crash the system at all. I've done a memtest on the system for over 24 hours, and I'm running everything at normal clock rates (not overclocked).
I have 1 Windows 10 VM (with a video card and a USB bus passed through).
I have 3 dockers running: MariaDB, SABnzbd and SickBeard.
I have 2 plugins loaded: Community Applications, Powerdown.
The system is pretty beefy too:
CPU: i7 5820K
RAM: 32 GB
Mobo: ASRock X99 Extreme 6
Video: GTX 970
Cache: 2x 240GB SSDs
Parity: 1x 6TB
Data: 2x 6TB, 2x 4TB, 1x 2TB
unraid-diagnostics-20160422-0749.zip
  21. Do you have front-panel USB connectors? Are all of them live (connected to the motherboard)? It's possible (and even likely) that you have USB headers on your motherboard which aren't connected to anything.
  22. OK. I'm a new Unraid user and I've encountered an issue that I'm having trouble cleanly resolving. [Note: I couldn't add diagnostics because I can't get the logs off the machine to another one.]
Everything seemed fine on the array. I woke up this morning and went to check the web UI of my SABnzbd docker and it was non-responsive. I attempted to restart it, with pretty much the same result. I then tried to access the user shares from my Windows 10 Explorer and they were now timing out. At this point the Unraid console web UI was also unresponsive!
I can SSH into the Unraid box just fine and do a docker ps. All my dockers are running, so one by one I stop them. I cannot stop one of them (CouchPotato). Not sure if I should kill the docker process or not in order to release this. I'm trying to cleanly stop the array, but it seems unresponsive and I don't know quite how to go about things.
I attempt to stop Samba with /usr/local/sbin/samba stop, but the samba process continues.
umount /dev/md1 tells me the (/mnt/disk1) target is busy.
lsof /mnt/disk1 returns:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    shfs 11912 root 91r DIR 9,1 20480 104 /mnt/disk1/Movies
and there's a bunch of files on /dev/md5 also being kept open by shfs! Is it safe to just kill shfs? I'm not sure this will solve the problem either. What steps should I take? I've tried to research this, but I'm not sure what the safest thing to do is.
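(Aside: from what I've read, fuser can list everything holding a whole mount point busy, similar to the lsof output above; the mount point here is just the one from my example.)

    fuser -vm /mnt/disk1    # verbose list of every process with files open under /mnt/disk1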
  23. This method worked beautifully for me! Thanks!
  24. Just wanted to chime in here to say that I have also used hupster's method and successfully passed my NVIDIA GTX 970 through to a Windows 10 VM while it was the only video card in the system (and in slot 1). The hardware in question is an X99 board (ASRock Extreme 6) and an i7 5820K - so no onboard video.
I had to fart around a bit with this because, for some reason, even with the 970 in slot 2 and another (AMD) video card in slot 1, I could not get an OVMF VM to work at all (even though my card supports UEFI), with either or both cards installed. Once I switched over to SeaBIOS things started working. But then I still had issues when I moved the 970 to slot 1 (which I wanted for the x16 performance).
I'll recap my steps here so that it's all in one place, as I had to put things together from the posts above, the wiki and one or two other spots.
1) I started by placing the AMD video card in PCIe x16 slot 1 and the NVIDIA GTX 970 in the second PCIe x16 slot.
2) Stopped the VMs.
3) SSH into the unraid machine and type "lspci -v", noting the PCI ID for the NVIDIA card. In my case the NVIDIA had an ID of 01:00.0. The output looked like this:
    01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: ZOTAC International (MCO) Ltd. GM204 [GeForce GTX 970]
    Flags: bus master, fast devsel, latency 0, IRQ 42, NUMA node 0
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [258] L1 PM Substates
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19
Note that in my case the "Kernel driver in use: vfio-pci" line was not present, so I didn't have to unbind it like hupster did. If you see that line for your card, you'll need to follow his instructions (I'll put them here as well):
3a)
4) This is the key step. The card won't currently work in slot 1 because its vbios is shadowed during bootup, so we need to capture its bios while it's working "normally"; then, when we move the card to slot 1, we can start the VM using the dumped vbios. Do the following to dump the card's vbios (again, note that instead of 01:00.0 use whatever PCI ID was designated for your card):
    cd /sys/bus/pci/devices/0000:01:00.0/
    echo 1 > rom
    cat rom > /mnt/user/Public/drivers/vbios.dump
    echo 0 > rom
4a) Bind the card back to vfio-pci if required (note that since I did not do step 3a above, I do not need to do this):
    echo "0000:02:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
5) At this point I removed the AMD card, put the NVIDIA card in slot 1 and restarted the system.
6) Now that we have the proper vbios, we can tell the VM to use it when it starts up. hupster's xml was not correct for me. I assume that the newer versions of unRAID (I'm using v6 beta 20) use the libvirt API to configure the VMs, which is a little different, so I will show you what I did here. (If your VM's xml has the same format as hupster's - you can look for the qemu:arg element in your xml - then see his post on page 2.)
I found the hostdev block in the VM's xml which pertained to my video card. It looked like this:
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
Note that the source address lists function as 0. Watch for this: there was another hostdev block for the same card, but its function was 1 - I assume this was for the HDMI sound. I modified this xml to add the bios reference from step 4:
    <hostdev mode='subsystem' type='pci' managed='yes' xvga='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/mnt/user/Public/drivers/vbios.dump'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </hostdev>
7) That's it! Save the xml and restart the VM. Worked like a charm.
Special thanks goes out to hupster - I never would have gotten this far without his excellent steps above. Frank
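One small addendum in case it helps: if you edit the xml by hand rather than through the unRAID web UI, I believe the usual way is via libvirt's own tools, something along these lines (the VM name "Windows10" is just an example):
    virsh list --all        # show the defined VMs and their names
    virsh edit Windows10    # open the VM's xml in an editor; add the <rom file='...'/> line inside the hostdev block
    virsh start Windows10   # start the VM with the updated definition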