Mizerka

Posts posted by Mizerka

  1. So I got the nvidia device passed through into the container, but I still can't find a way to run nvidia-smi. I figured out that I can reach it under /rootfs/usr/bin, and ls lists it there, but running it fails with a "file not found" error.

     

    /rootfs/usr/bin # ls -la /dev | grep nvidia
    crw-rw-rw-    1 root     root      195, 254 May  8 13:05 nvidia-modeset
    crw-rw-rw-    1 root     root      243,   0 May  8 13:05 nvidia-uvm
    crw-rw-rw-    1 root     root      243,   1 May  8 13:05 nvidia-uvm-tools
    crw-rw-rw-    1 root     root      195,   0 May  8 13:05 nvidia0
    crw-rw-rw-    1 root     root      195, 255 May  8 13:05 nvidiactl
    /rootfs/usr/bin # 

     

     

  2. finally figured it out, god it's so stupid.

     

    at boot, after fix (40%), 30% test, 55% test, 100%.

    [image]

     

    now at 85%, because I don't like fan1 going out of spec. I might move them to fana and b for direct control.

    [image]

     

     

    anyway, the fix: create a dedicated user for the unraid tool to use, and the same for grafana. I'd been using just the default admin account for everything, until I opened a new tab by accident and saw the message "your session timed out, log in again". also, there's nothing in the bios for fan/thermal/power control, so I'm probably stuck with fans 1-6 and fan a-b.

     

     

  3. Okay, enough edits. I'm still playing with this: I went back into the ipmi webfront and just switched fan mode from full speed to standard, and it dropped the fans to their comfortable idles of around 700rpm. okay... I played with min and max more, but no matter what I changed it wouldn't do anything (maybe that's the issue?)

     

    so I grabbed a prime95 docker and threw 60% load on it, to see the fans ramping up slowly, very slowly. it's been like 5mins and they're still climbing, despite the logs stating they were set to 91-97% almost instantly.

     

    2020-05-07 23:52:07 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:52:28 Fan:Temp, FAN1234(35%):CPU1 Temp(42°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:52:38 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:52:48 Fan:Temp, FAN1234(29%):CPU1 Temp(38°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:53:50 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:54:21 Fan:Temp, FAN1234(29%):CPU1 Temp(38°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:54:41 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:55:01 Fan:Temp, FAN1234(29%):CPU1 Temp(38°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:55:11 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:55:31 fan control config file updated, reloading settings
    2020-05-07 23:55:32 Fan:Temp, FAN1234( 6%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:55:42 Fan:Temp, FAN1234( 5%):CPU1 Temp(36°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:55:52 Fan:Temp, FAN1234( 6%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:57:24 fan control config file updated, reloading settings
    2020-05-07 23:57:25 Fan:Temp, FAN1234(28%):CPU1 Temp(38°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:57:55 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:58:46 Fan:Temp, FAN1234(28%):CPU1 Temp(38°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:59:27 Fan:Temp, FAN1234(27%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:59:37 Fan:Temp, FAN1234(35%):CPU1 Temp(48°C), FANA(25%):System Temp(35°C)
    2020-05-07 23:59:47 fan control config file updated, reloading settings
    2020-05-07 23:59:48 Fan:Temp, FAN1234(91%):CPU1 Temp(51°C), FANA(27%):System Temp(36°C)
    2020-05-07 23:59:58 Fan:Temp, FAN1234(92%):CPU1 Temp(53°C), FANA(30%):System Temp(38°C)
    2020-05-08 00:00:08 Fan:Temp, FAN1234(93%):CPU1 Temp(55°C), FANA(34%):System Temp(40°C)
    2020-05-08 00:00:18 Fan:Temp, FAN1234(93%):CPU1 Temp(56°C), FANA(39%):System Temp(43°C)
    2020-05-08 00:00:28 Fan:Temp, FAN1234(94%):CPU1 Temp(58°C), FANA(42%):System Temp(45°C)
    2020-05-08 00:00:39 Fan:Temp, FAN1234(95%):CPU1 Temp(60°C), FANA(44%):System Temp(46°C)
    2020-05-08 00:00:49 Fan:Temp, FAN1234(96%):CPU1 Temp(61°C), FANA(45%):System Temp(47°C)
    2020-05-08 00:00:59 Fan:Temp, FAN1234(96%):CPU1 Temp(62°C), FANA(47%):System Temp(48°C)
    2020-05-08 00:01:09 Fan:Temp, FAN1234(97%):CPU1 Temp(63°C), FANA(47%):System Temp(48°C)
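To compare the duty cycle the plugin logs against the rpm the fans actually reach, the "Fan:Temp" lines above can be pulled apart with a small script. This is only a sketch: the line layout is inferred from the excerpt in this post, not from the plugin's source, and `parse_fan_line` is a made-up helper name.

```python
import re

# One channel entry looks like "FAN1234(27%):CPU1 Temp(37°C)".
# Pattern inferred from the log excerpt above (an assumption).
ENTRY = re.compile(r"(\w+)\(\s*(\d+)%\):([^()]*)\((\d+)°C\)")


def parse_fan_line(line):
    """Return (timestamp, [(fan, duty_pct, sensor, temp_c), ...]) or None."""
    line = line.strip()
    if "Fan:Temp" not in line:
        return None  # e.g. "config file updated" or start/stop messages
    timestamp = line[:19]  # "YYYY-MM-DD HH:MM:SS"
    entries = [
        (fan, int(duty), sensor.strip(), int(temp))
        for fan, duty, sensor, temp in ENTRY.findall(line)
    ]
    return timestamp, entries


sample = [
    "2020-05-07 23:55:31 fan control config file updated, reloading settings",
    "2020-05-07 23:55:32 Fan:Temp, FAN1234( 6%):CPU1 Temp(37°C), FANA(25%):System Temp(35°C)",
    "2020-05-08 00:01:09 Fan:Temp, FAN1234(97%):CPU1 Temp(63°C), FANA(47%):System Temp(48°C)",
]

for raw in sample:
    parsed = parse_fan_line(raw)
    if parsed:
        print(parsed)
```

The parsed tuples could then be plotted next to the rpm readings to see how far the real fan speed lags behind the logged setting.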

     

    [image]

     

    So I'm really confused now, because it's clearly working, but it doesn't seem to be done by unraid. also still not sure why the noctua NF-B9 redux 1600 PWM fans are reaching 2k rpm at 100%; they should only be allowed 1600 looking at the spec sheets, only the NF-A9 would reach 2k with a 1.6k pwm rating. strange...

     

    Anyway, time to check bios I guess, maybe it's forcing pwm from there somewhere. for reference, I'm using an X9DRi-F.

     

  4. 38 minutes ago, dmacias said:
    44 minutes ago, Mizerka said:
    Okay, that makes sense, I'll play with the values a bit. right now they're bouncing between 1400 and 2000rpm, which for 1600rpm rated fans isn't ideal; there could be some polling issue here.
     
    I'll have a look at the config editor as well. how would that interact with values already set manually, would it overwrite them or only apply during fan control functions?

    It allows you to edit the values already set in the bmc. Instead of commands, it's a print out of the config. You edit them and then the whole config is uploaded back to the bmc.

    cool, just noticed load on boot slider as well.

     

    still no luck at controlling the fans though, I must be doing something wrong, here's what I've set currently;

    [image]

     

    Which I'm assuming will set fans 1, 2, 3 and 4 based on the cpu1 sensor (30c ish atm), report/alert below 20c and above 65c, and pwm (or I guess ipmi hex control for supermicro?) will force it to 20% and let it rise up to 30.1%

     

    i.e. keep it between 20 and 30% of its own rated pwm (I forget my pwm wave but pretty sure it should know it's 0.3 and 100% and not need raw rpm values).
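The behaviour being assumed here, a duty cycle that rises from a minimum to a maximum as the temperature moves between two set points, can be sketched as a clamped linear map. This is an illustration of that assumption only, not the plugin's actual curve; `duty_for_temp` and its parameter names are hypothetical.

```python
def duty_for_temp(temp_c, temp_lo, temp_hi, duty_min, duty_max):
    """Linearly map a temperature onto a duty-cycle range, clamped at both ends."""
    if temp_hi <= temp_lo:
        raise ValueError("temp_hi must be greater than temp_lo")
    fraction = (temp_c - temp_lo) / (temp_hi - temp_lo)
    fraction = min(1.0, max(0.0, fraction))  # clamp to [0, 1]
    return duty_min + fraction * (duty_max - duty_min)


# Values from the post: 20c and 65c set points, 20% to 30.1% duty, cpu1 at ~30c.
print(round(duty_for_temp(30, 20, 65, 20.0, 30.1), 2))
```

If the plugin worked this way, a cpu at 30c would sit only a couple of percent above the minimum, which is why fans stuck near 2k rpm point at something else overriding the setting.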

     

    but they still just ramp up to 2k rpm and don't drop.

     

    edit; grafana is in place for this now, you can see the rise when fan control was enabled, and it never went down after a few fluctuations;

     

    [image]

     

    or am I looking at this wrong and I should be using upper criticals etc to restrict fan speeds?

     

    edit2;

     

    logs btw;

     

    2020-05-07 22:46:15 Fan:Temp, FAN1234(66%):CPU1 Temp(37°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:47:16 Fan:Temp, FAN1234(68%):CPU1 Temp(40°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:47:46 Fan:Temp, FAN1234(66%):CPU1 Temp(38°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:49:16 Fan:Temp, FAN1234(65%):CPU1 Temp(35°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:53:18 Fan:Temp, FAN1234(66%):CPU1 Temp(37°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:53:48 Fan:Temp, FAN1234(65%):CPU1 Temp(35°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:55:18 Fan:Temp, FAN1234(65%):CPU1 Temp(36°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:55:49 Fan:Temp, FAN1234(65%):CPU1 Temp(35°C), FANA(66%):Peripheral Temp(39°C)
    2020-05-07 22:56:19 fan control config file updated, reloading settings
    2020-05-07 22:56:19 Fan:Temp, FAN1234(41%):CPU1 Temp(34°C), FANA(45%):Peripheral Temp(39°C)
    2020-05-07 22:57:49 Fan:Temp, FAN1234(44%):CPU1 Temp(36°C), FANA(45%):Peripheral Temp(39°C)
    2020-05-07 22:58:20 Fan:Temp, FAN1234(43%):CPU1 Temp(35°C), FANA(45%):Peripheral Temp(39°C)
    2020-05-07 22:58:50 Fan:Temp, FAN1234(44%):CPU1 Temp(36°C), FANA(45%):Peripheral Temp(39°C)
    2020-05-07 22:59:10 fan control config file updated, reloading settings
    2020-05-07 22:59:10 Fan:Temp, FAN1234(20%):CPU1 Temp(35°C), FANA(20%):Peripheral Temp(39°C)
    2020-05-07 23:02:11 fan control config file updated, reloading settings
    2020-05-07 23:02:11 Fan:Temp, FAN1234(23%):CPU1 Temp(34°C), FANA(23%):Peripheral Temp(39°C)
    2020-05-07 23:23:08 fan control config file updated, reloading settings
    2020-05-07 23:23:08 Fan:Temp, FAN1234(30%):CPU1 Temp(34°C), FANA(23%):Peripheral Temp(38°C)
    2020-05-07 23:24:39 fan control config file updated, reloading settings
    2020-05-07 23:25:32 Stopping Fan Control
    2020-05-07 23:25:32 Setting fans to auto
    2020-05-07 23:27:27 Starting Fan Control
    2020-05-07 23:27:27 Setting fans to full speed
    2020-05-07 23:27:49 Fan:Temp, FAN1234(30%):(0°C), FANA(20%):(0°C)

     

     

    so looks like it's setting it correctly... at least tool believes so, but despite that, fans are still sitting at 1900-2000 (the fan5/6 are 1800rpm rated and are on 1700-1800)

     

    fun fact: you can set the minimum above the maximum, and it still logs the values as entered

     

    2020-05-07 23:33:05 Setting fans to auto
    2020-05-07 23:34:26 Starting Fan Control
    2020-05-07 23:34:26 Setting fans to full speed
    2020-05-07 23:34:36 Fan:Temp, FAN1234(20%):CPU1 Temp(35°C), FANA(25%):System Temp(35°C)

     

    no changes to actual rpm

     

     

    edit3;

     

    'Configure' button tests your fans and determines their location for fan control. All fans will initially be set to full speed. Each location will be tested at one third speed to determine which fan is present. This will take about 1 min. On completion fans will return to auto. You can start fan control

     

    sounds useful, but doesn't exist? anymore?

  5. 15 minutes ago, dmacias said:

    The thresholds can be edited here. See image. The Supermicro and ASRock fan control operate very differently. With ASRock you can control each fan header individually. With Supermicro it is usually FANA controlled separately and then all other fans together. There's only two commands. Hence FANA settings and FAN1234 settings. [image]

     

    Edit. Also for Supermicro the ipmi fan mode is set to the full speed setting to avoid bmc intervention

    Okay, that makes sense, I'll play with the values a bit. right now they're bouncing between 1400 and 2000rpm, which for 1600rpm rated fans isn't ideal; there could be some polling issue here.

     

    I'll have a look at the config editor as well. how would that interact with values already set manually, would it overwrite them or only apply during fan control functions?

  6. 9 minutes ago, Hoopster said:

    OK, I misunderstood that. 

     

    Must be how your supermicro board reports fan headers to the tool.

     

    I know every motherboard is different, but with my ASRock board, I needed to go into the BIOS and select the H/W Monitoring section which forced it to read and populate all fan headers, sensors, etc. that could be monitored.  Once I had done this, all the individual fan headers and sensors appeared in the IPMI Tool drop downs.

    Thanks I'll have a look but can't remember seeing an option like that there.

     

    also playing around with fan control more, it just sets bmc to full speed instead of controlling it, hmmm, to be fixed another day

  7. 20 minutes ago, ogi said:

    you can definitely modify the thresholds from within the app; you need to go to config settings and select sensors from the drop down.  Careful removing a fan, you cannot get it back (unless you factory reset the BMC controller). On my x9 board, I can only set the thresholds on increments of 75.  For example I can set a threshold at 750, or lower it to 675, but I cannot set it to 700 even.

    maybe I missed it then. it only allows for fan control, but not modifying the thresholds that ipmi actually uses. in supermicro's case, if a fan falls below the critical threshold the bmc ramps it to 100% (like mentioned in op), so I had to get ipmitool and manually change it through the console rather than the addin
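For reference, the manual console change goes through ipmitool's `sensor thresh <id> lower <lnr> <lcr> <lnc>` form. The sketch below only builds that command line and snaps the values down to multiples of 75, the step size reported for one X9 board in the quote above; the sensor name and rpm numbers are placeholders, and the step may differ on other boards.

```python
import shlex

STEP = 75  # rpm step reported for one X9 board above; an assumption elsewhere


def snap_down(rpm, step=STEP):
    """Round an rpm value down to the nearest multiple of step."""
    return (rpm // step) * step


def lower_thresh_command(sensor, lnr, lcr, lnc):
    """Build the ipmitool argv that sets a sensor's three lower thresholds."""
    values = [snap_down(v) for v in (lnr, lcr, lnc)]
    if not values[0] <= values[1] <= values[2]:
        raise ValueError("expected lnr <= lcr <= lnc")
    return ["ipmitool", "sensor", "thresh", sensor, "lower"] + [str(v) for v in values]


# Placeholder sensor name and values, chosen only for illustration.
cmd = lower_thresh_command("FAN1", 100, 200, 300)
print(shlex.join(cmd))
```

Printing the command instead of running it leaves room to check the values against the fan's real idle speed before anything is sent to the bmc.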

    11 minutes ago, Hoopster said:

    Huh? You can do that now, unless your motherboard somehow does not allow that in the tool.  I am controlling the speed of four fans (others are not PWM fans). Here's a partial view of my ASRock board fan settings in the IPMI plugin.

     

    [image]

    I'm talking about the fan thresholds that ipmi will use to report fan speed errors etc, pretty sure the fan control panel is only for speed modulation based on sensor readings. Also, in my case I only see this;

     

    [image]

     

    unlike the nice fan by fan, I get fan1234 which are 4 pwm headers, which is weird but whatever I can deal with that.

     

    And yes, this works fine, i.e. if I turn the above on, I can see my fan1 and 2 cpu fans ramp up to 1700rpm from the 900 they idle at normally, but if I were to do something like 0.3 to 100, they'd drop below 400 rpm, triggering alerts every 10 seconds.

     

    here are my thresh values, so unless I force pwm within values the fan either won't like or the bmc reports on according to its logic (good or not), it's not ideal, for what is just a simple thresh command to the bmc

     

    [image]

  8. Hey,

     

    Sorry, not amazing with linux yet. I'm building additional grafana dashboards, using the typical telegraf into influxdb with grafana for display. I got everything else sorted but wanted to add ipmi stats and nvidia-smi. I found a thread on ipmitool, so I added /bin/sh -c 'apk update && apk add ipmitool && telegraf' to post arguments, which installs ipmitool within the container's /usr/bin as expected, but I can't get nvidia-smi to work properly.

     

    So I'm thinking it might just be easier to give the container access to the system's path directly, but I'm not sure how to accomplish that either :S

  9. 4 hours ago, Mizerka said:

    Thanks, I'll give that a try. I've now removed any other addons or packages that might interfere with ipmi, including the nerdpack pkg that I'd used ipmitool with. I did try to recreate the connection a few times; previously, on a bad password it'd just throw a conn refused in the logs, but this time I got nothing, and the fact that ipmitool worked on its own (without configuring it) made me believe it created the connection fine. failing that, I'll give it a good ol' turn it on and back off again.

    hmmm so ended up resetting bmc and changing ip addressing and looks like it picked it up afterwards, ipmisensors reported connection timeout, so probably was networking/arp issue. anyway, up and running again. 

     

    btw, being able to modify fan thresh from the tool would be nice :) my x9 board really didn't like my noctuas getting down to 300rpm and assumed my case fans are fine up to 18k rpm. just supermicro things. Also, am I right in thinking fan control only controls fans 1-4 and fan A? or does it just specify naming and actually control all numbered and then all lettered fans?

  10. 1 hour ago, dmacias said:


     


    A better check would be to use the included freeipmi from the console since that's what the plugin uses. Very similar commands to ipmitool. ipmi-sensors is the freeipmi command. I also include with the plugin non hyphenated versions that take settings from the plugin (e.g. username, password, ip address) and plug the info into the console command. So if you have network mode set up in the plugin, the command "ipmisensors" will use those settings.

    Also you might check that some browser auto fill didn't mess up your password.

    Thanks, I'll give that a try. I've now removed any other addons or packages that might interfere with ipmi, including the nerdpack pkg that I'd used ipmitool with. I did try to recreate the connection a few times; previously, on a bad password it'd just throw a conn refused in the logs, but this time I got nothing, and the fact that ipmitool worked on its own (without configuring it) made me believe it created the connection fine. failing that, I'll give it a good ol' turn it on and back off again.

  11. Hey, 

     

    thanks for your work on this, it looks like I managed to break something after a reboot and it no longer sees network ipmi, it doesn't report any issues and using ipmitool from nerdpack in console reports the sensors correctly. I can only see the hdd's and hdd temp reported from unraid.

     

    Gave it another reboot but still didn't do anything.

  12. Just now, Squid said:

    You must've picked that up off of today's avatar.  While Polish, today I treat simply as an excuse to be able to throw a water balloon at my wife and (hopefully) get away with it :) 

    haha, yeah it was that, sorry for assuming, and yes, I remember Easter Mondays of keeping close eye on my Father as he might just walk into living room with a bucket of cold water and toss it over you. Stay safe, and thanks again.

  13. Hey,

     

    So it's been an issue for a long time but I've dealt with it using jumpboxes etc, but lately I've been accessing it over vpn a lot more and don't fancy leaving pc on all day just to act as a jump box for unraid access.

     

    So, right now every time I try to access my unraid web portal from any device on the network, I am redirected to the unraid.local fqdn (http; https is disabled atm afaik). not a major issue, pihole deals with .local as expected, queries its local tables for it and resolves.

     

    The issue then becomes that over the vpn tun (ovpn) I can access any other resource just fine, but can't get the web portal to work because it can't handle .local resolution. I could probably fix it with hosts entries, but would prefer to get unraid back into its previous state, where it accepted an IP address as a valid portal address.

     

    Thanks

     

     

  14. 4 hours ago, Mizerka said:

    Hey, thanks for the work on container;

     

    lately I seem to be really struggling with download speeds, can't seem to get anything more than 3MB/s down and 1MB/s up. it must've been happening for around a week or two (I auto update containers so can't tell exactly). Previously I'd easily saturate the wan link (130 Mbps down and 40 Mbps up). Have there been any changes that would affect this? I did upgrade to 6.8 around the same time as well, if that changes anything. Using nordvpn UK p2p tcp vpn (same server and ovpn file, but tried others as well).

     

    Thanks

     

     

    Hmm, scrap that, so I played around with it more. I ruled out local and networking, all of which looked as expected. The issue is isolated to the vpn tunnel; I say that because I've also tried another brand new container, same results, and a brand new qbit container, same results.

     

    [image]

     

    is what traffic looks like, with spikes being when I briefly turned vpn off for testing, where you can clearly see a spike to expected 13-15mib/s

     

    So, playing around with ovpn files, looks like it's not liking tcp, after changing nordvpn connection profile to udp, it instantly kicked back into proper speeds saturating entire wan link at 13mib down. Both can be replicated on delugevpn and qbittorrentvpn containers, with default config.
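The units in this post are mixed (Mbps for the link, MB/s and MiB/s for transfers), so a quick conversion helps to check the numbers against each other. A minimal sketch, assuming decimal megabits and megabytes and binary mebibytes; the function names are made up for illustration.

```python
def mbps_to_mib_per_s(mbps):
    """Convert megabits per second (10**6 bits) to mebibytes per second (2**20 bytes)."""
    return mbps * 1_000_000 / 8 / 2**20


def mb_per_s_to_mbps(mb_per_s):
    """Convert megabytes per second (10**6 bytes) to megabits per second."""
    return mb_per_s * 8


link_down_mbps = 130  # wan download speed quoted in the post
print(f"{mbps_to_mib_per_s(link_down_mbps):.1f} MiB/s ceiling on a {link_down_mbps} Mbps link")
print(f"{mb_per_s_to_mbps(3):.0f} Mbps used at 3 MB/s")
```

By this arithmetic the 13-15 MiB/s seen over udp sits close to the ceiling of the link before protocol overhead, while 3 MB/s over tcp used well under a fifth of it.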

     

    Change itself must've been over a week ago or was introduced in unraid 6.8 as I haven't noticed it prior.

  15. Hey, thanks for the work on container;

     

    lately I seem to be really struggling with download speeds, can't seem to get anything more than 3MB/s down and 1MB/s up. it must've been happening for around a week or two (I auto update containers so can't tell exactly). Previously I'd easily saturate the wan link (130 Mbps down and 40 Mbps up). Have there been any changes that would affect this? I did upgrade to 6.8 around the same time as well, if that changes anything. Using nordvpn UK p2p tcp vpn (same server and ovpn file, but tried others as well).

     

    Thanks

  16. On 12/23/2019 at 11:22 AM, S1dney said:

    Also you might want to run "btrfs balance start -mconvert=raid1 /mnt/cache" against your pool cause your setup isn't that redundant at the moment 🙂

     

                 Data      Metadata  System
    Id Path      RAID1     single(!) single(!) Unallocated
    -- --------- --------- --------- --------- -----------
     1 /dev/sdt1 250.00GiB   2.01GiB   4.00MiB   213.75GiB
     2 /dev/sdu1 250.00GiB   2.00GiB         -   213.76GiB
    -- --------- --------- --------- --------- -----------
       Total     250.00GiB   4.01GiB   4.00MiB   427.51GiB
       Used      219.61GiB   1.96GiB  64.00KiB

     

    If one of your drives fails now, you're out of luck.

    See:

     

    Thanks for flagging this, wasn't aware of it.

  17. oh, you're right, i missed that;

     

    nobody   13716  8.7 92.7 92190640 91863428 ?   Sl   06:34  66:18  |   |               \_ /usr/bin/python -u /app/bazarr/bazarr/main.py --no-update --config /config
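To put that line in numbers: assuming it follows the usual `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, ...) with RSS in KiB, the resident memory can be converted to GiB. `parse_ps_line` is a hypothetical helper written for this one line shape.

```python
def parse_ps_line(line):
    """Split one `ps aux`-style line into the fields of interest."""
    fields = line.split(None, 10)  # the 11th field is the full command
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "cpu_pct": float(fields[2]),
        "mem_pct": float(fields[3]),
        "rss_gib": int(fields[5]) / 2**20,  # RSS is assumed to be in KiB
        "command": fields[10].lstrip("|\\_ "),  # drop process-tree drawing characters
    }


sample = (
    r"nobody   13716  8.7 92.7 92190640 91863428 ?   Sl   06:34  66:18  |   |"
    r"               \_ /usr/bin/python -u /app/bazarr/bazarr/main.py --no-update --config /config"
)
info = parse_ps_line(sample)
print(f"pid {info['pid']}: {info['mem_pct']}% mem, {info['rss_gib']:.1f} GiB resident")
print(info["command"])
```

Under those assumptions the bazarr process alone is holding close to 88 GiB, which fits the 92.7% memory figure in the line above.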

     

    okay, killing it for now then, I guess it's some memory leak, never seen it use that much.

     

    Thanks

  18. Guess who's back, back again

    Array's dead, dead again.

     

    I've isolated one of the cores after forced reboot, so now at least webgui is usable (I guess isolation to everything but unraid os? okay), despite every other core sitting at 100%. Dockers are mostly dead due to lack of cpu time, but sometimes respond back with a webpage or output. shares are working almost normally as well.

     

    nothing useful in logs again.

     

    After removing plugins one by one, the array returned to normal after killing the ipmi or temperature sensor plugins. so that's interesting that it'd brick unraid out of nowhere... oh well, we'll see tomorrow.

     

     

  19. 4 minutes ago, kocka said:

    it's for testing and it should still work. if anybody has any other ideas pls post

    sure, well I give up then, good luck.

     

    only other thing in terms of config is you have disk shares force enabled, you're better off using user shares or leaving it on the auto default, and mounting the disk outside of the array if that's what you need.

  20. Quote

    # Generated settings:
    IFNAME[0]="br0"
    BONDNAME[0]="bond0"
    BONDING_MIIMON[0]="100"
    BRNAME[0]="br0"
    BRSTP[0]="no"
    BRFD[0]="0"
    BONDING_MODE[0]="1"
    BONDNICS[0]="eth0"
    BRNICS[0]="bond0"
    PROTOCOL[0]="ipv4"
    USE_DHCP[0]="yes"
    DHCP_KEEPRESOLV="yes"
    DNS_SERVER1="192.168.88.1"
    DNS_SERVER2="193.231.252.1"
    DNS_SERVER3="213.154.124.1"
    USE_DHCP6[0]="yes"
    DHCP6_KEEPRESOLV="no"
    VLANS[0]="1"
    SYSNICS="1"

    those dns servers are a bit weird, first is likely your router, but other 2 are public and weird, I'd change to local router only probably, this is given out by your dhcp, i.e. router, again, strange. one of them points to some random location in Romania.
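A quick way to see which of the configured resolvers are on the LAN and which are public is Python's standard `ipaddress` module. This is a sketch that assumes the `DNS_SERVERn="..."` line format shown in the quoted config; `dns_servers` is a made-up helper name.

```python
import ipaddress
import re

# Matches lines such as DNS_SERVER1="192.168.88.1" (format taken from the quote).
DNS_LINE = re.compile(r'^DNS_SERVER\d+="([^"]*)"$')


def dns_servers(cfg_text):
    """Return the DNS server addresses found in a network.cfg-style text."""
    servers = []
    for line in cfg_text.splitlines():
        match = DNS_LINE.match(line.strip())
        if match and match.group(1):
            servers.append(ipaddress.ip_address(match.group(1)))
    return servers


cfg = '''
USE_DHCP[0]="yes"
DHCP_KEEPRESOLV="yes"
DNS_SERVER1="192.168.88.1"
DNS_SERVER2="193.231.252.1"
DNS_SERVER3="213.154.124.1"
'''

for addr in dns_servers(cfg):
    kind = "private (LAN)" if addr.is_private else "public"
    print(addr, kind)
```

Only the first address falls in a private range, which matches the reading that it is the router and the other two are public resolvers handed out by dhcp.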

     

    and yeah ipv6 enabled so it picked up fe80::

     

    Quote

    Dec 18 09:46:04 FTP ntpd[1663]: Listen normally on 3 br0 [fe80::1c20:77ff:fe46:fd3c%10]:123
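The address in that ntpd line is an IPv6 link-local one (fe80::/10), which an interface gets as soon as ipv6 is enabled. A small check with the standard `ipaddress` module, stripping the `%10` zone suffix first; `is_link_local` here is an illustrative helper.

```python
import ipaddress


def is_link_local(addr_with_zone):
    """True if the address (optionally with a %zone suffix) is link-local."""
    host = addr_with_zone.split("%", 1)[0]  # drop the interface zone, e.g. "%10"
    return ipaddress.ip_address(host).is_link_local


print(is_link_local("fe80::1c20:77ff:fe46:fd3c%10"))
```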

     

    should disable dhcp for something like unraid, it'll  just cause you issues one day.

     

    Comparing my config to yours, there's nothing wrong unraid side and it doesn't report issues either.

     

    Make sure you have filesharing and discovery completely enabled

     

     

    edit;

    Quote

    yes i can with my phone and windows 7 laptop

    It will be windows 100%, unraid will use at least smb2 by default, so that's fine