Tom3

Posts posted by Tom3

  1. Speculation...

     

    If sufficient errors were detected, the link might auto-renegotiate the link speed.

    That negotiation might fail at 1 GbE, then retry and succeed at 100 Mb Ethernet.

     

    Can you disable auto negotiation, and set the link to only 1 GbE at the router?  At the server?
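
    If the server end is Linux/UNRAID, a minimal sketch with ethtool - assuming the interface is eth0 (substitute yours) and that the NIC driver accepts forced speeds:

    # ethtool eth0                                          # current speed, duplex, auto-negotiation state
    # ethtool -s eth0 speed 1000 duplex full autoneg off    # force the link to 1 GbE (some gigabit copper drivers reject this)
    # ethtool -s eth0 autoneg on advertise 0x020            # alternative: keep negotiating, but offer only 1000baseT/Full

    The router end would need the matching setting in its own admin interface.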

     

    -- Tom

     

  2. Hi Mike - early on, the equipment on each end used Link Aggregation to package 4 × 10GE links as a single 40GE link.

    This is where some issues cropped up.

     

    The ratified version of the 802.3ba standard (June 2010) for 40GE and 100GE inserted something called

    Multi-Lane Distribution (MLD), operating on 64b/66b blocks, to handle all of this in hardware on the Ethernet ASIC

    ahead of the physical interfaces.  (The physical interfaces generally don't know about this.)

    Your Mellanox interface is one end; the Ethernet switch that the cable is plugged into is the other end.

     

    So the question is: what is the age of the equipment on each end of the link?  If both ends are listed as

    being 802.3ba compliant, then you should be OK.  In practice, most equipment took several years after the

    standard's ratification to adopt everything.

     

    -- Tom

     

  3. 40GE can be a bit of a rat's nest.  The physical standards are primarily 4-lane interfaces (4 copper lanes, or 4 optical lanes).

    That usually makes the transceivers a bit expensive.

     

    Many of the initial deployments of 40GE used it as 4 independent 10GE links with a special break out cable arrangement.

    This was done as a way to increase the density of 10GE interfaces on a piece of equipment, not as 40GE links.

     

    For use as 40GE, those deployments usually relied on link aggregation (4×) to treat the bundle as a single 40GE point-to-point link.

    If your application is a single 40GE link (which is what it sounds like), you need to be concerned with compatibility of the two

    ends, both physically and in terms of link aggregation.  Expect a fair amount of configuration effort.

     

    There were a number of link aggregation issues back in the 2010-2015 time frame where the two ends just would not

    talk across vendors.  Whether older equipment was eventually harmonized, I don't know (it's probably vendor- and

    model-dependent).

     

    -- Tom

     

     

    • Thanks 1
  4. Found the problem.

     

    It appears the VNC VM GUI display does not play well with Firefox 112.0.1 on the Ubuntu client.  The previous day's version

    of Firefox worked fine, so the latest update is the likely culprit (I think this is the most current version).  I cleared the browser

    cache on the Firefox client; that did not improve anything.

     

    I launched Chrome 112.0.5615.121  on the Ubuntu client, and that works well with the VMs.

     

    -- Tom

     

     

    • Like 1
  5. This morning, all VMs (old, new, recently updated, not recently updated) are suddenly extremely slow.  On 6.11.5.

    The UNRAID GUI seems fine and snappy; it's just the VMs.  Opening a console window on a VM that worked fine yesterday brings up

    a sluggish console window - very, very slow to resize.  Running 'top' in that VM's console shows nothing consuming large CPU or

    memory for that VM, and there is adequate free space and CPU.  Five or six Ubuntu 20.04 and 22.04 VMs tried, all slow as molasses.

    Running Firefox in a VM is almost impossible, 10-12 seconds to respond to each mouse click, but it does eventually function correctly.  All services run by the VMs have almost ground to a halt.

     

    The only update this morning was, I think, the Unassigned Devices plugin 2023.04.17.

     

    Rebooted the server, no improvement.  Server memory, disk stats look fine. Network working correctly.

     

    Docker images come up fine and the GUI is very responsive, so this seems isolated to just the VMs.

     

    Diagnostics attached.

     

    -- Tom

    tower-diagnostics-20230418-1000.zip

  6. Dynamix SSD TRIM plugin - is it still needed?

     

    On Unraid 6.11.5

    I am getting a plug-in file install error, referencing:     /boot/config/plugins-error/dynamix.ssd.trim.plg

     

    The button to remove it does not work.

     

    The file is in fact in the /boot/config/plugins-error/ directory.  Is this plugin still needed, or has it been superseded by upgrades to UNRAID over time?

     

    If it's no longer needed, can it just be manually deleted from the containing directory?
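
    If manual deletion turns out to be fine, a minimal sketch from the UNRAID CLI, using the path reported above:

    # ls /boot/config/plugins-error/                        # confirm the stale .plg file is there
    # rm /boot/config/plugins-error/dynamix.ssd.trim.plg    # remove it by hand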

     

    -- Tom

     

     

     

  7. Having difficulty updating to Dynamix System Statistics 2023.02.05a

    When clicking the UPDATE button nothing happens.

     

    Under the Plugin File Install Errors tab, one item is listed:

    /boot/config/plugins-error/dynamix.ssd.trim.plg

    It shows a status of ERROR.   When clicking the Remove button for that, nothing happens.

     

    I have since rebooted the server; the above does not change.

    There are no obvious errors in the log.

     

    The graphical stats page continues to work.

     

    On Unraid 6.11.5.

     

     

    -- Tom

     

  8. Are you accessing it via an IP address (e.g. 192.168.x.x) or via a domain name / suffix

    (e.g. mytower.myhomenetwork)?  IP addresses can sometimes be re-assigned by your

    router (or ISP modem/router) on reboot, which might change the server's IP.

     

    If you are using a domain name, do you have a DNS server that knows how to resolve it internally?

    Does that need to have its IP updated?

     

    If your server is using a static IP, then that shouldn't be an issue.
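
    A quick way to check both, assuming the hypothetical hostname above and the usual DNS tools on the client:

    $ nslookup mytower.myhomenetwork    # does the name still resolve to the server's current address?
    $ ip -4 addr show                   # run on the server to see which IPv4 address it actually holds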

     

    Are you using the My Servers service?  (If so, that's a bit outside my experience.)

     

    -- Tom

     

  9. I use Acronis for backing up various machines to Unraid.  No share is mounted on Windows.

     

    In UNRAID, a new share is set up as: Public, Export = Yes, Use cache pool = No.  My backup speed is not limited by the UNRAID hard drive but by Acronis, so for me there is no value in using the cache for this.

     

    In Acronis, create a new backup, and select Backup Destination.  That will be Network --> name of your Unraid server (perhaps TOWER) --> Name of the public shared folder above. Acronis should (sometimes quite slowly) populate the path as you select each folder down to that new share.

     

    After the backup completes, you probably want to limit access to that newly created backup file.

    At the UNRAID command prompt (the path below is a placeholder for wherever the backup landed):

    $ cd /mnt/user/your_backup_share                        # change to the new backup file's containing directory

    $ chmod 444 the_backup_file_you_just_created            # make the backup file read-only

    $ chown root:root the_backup_file_you_just_created      # hand ownership to root

     

    That will make the backup file read-only, and only the root login in UNRAID can change that.  Acronis creates a new file for each

    differential backup.  I prefer NOT to use incremental backup.  The difference is that a differential backup writes everything changed

    between now and when the original full backup was done.  To restore, you only need the full backup and the last differential file to

    be good.  Each differential file is larger as a result.

     

    An incremental backup only writes what has changed since the last incremental backup.  That means to restore you have to go back to the

    full backup, then replay all the incremental backups.  If one of those is damaged, your restore stops at that point.  But each incremental file is smaller as a result.

     

    -- Tom

     

     

     

    • Like 1
    • Thanks 1
  10. There are certain ASCII characters in a filename that don't work with rsync on Linux;

    rsync fails to copy the file as a result.

     

    My recollection is that the vertical pipe character | is one of them and the colon is another; characters used for

    Linux file redirection, such as greater-than and less-than, might be others.  Some of these may be fine in Windows.

     

    -- Tom

     

     

     

     

  11. Hi Orlando - interesting.  ethtool reports 10000baseKR (which is the single-lane

    backplane electrical version of 10GbE).  The line coding on that is the same as

    10GBaseSR, so it's likely just an unusual way to report the interface.

     

    You don't report which driver the 10G interface is using (ethtool -i devname).

    Not all NIC cards are supported in Linux; you need to check whether the driver and card

    you are using are supported by UNRAID.

     

    With optical interfaces there are some compatibility issues that need to be taken into

    account:

     

    1. Both ends need to be compatible (same wavelength, same fiber type).

    If you are using 10GBaseSR (short reach) over multimode fiber (MMF), then the fiber needs to be the

    right grade.  10GBaseSR is not too stressful on the fiber characteristics; roughly 300 m on OM3
    (400 m on OM4) is normally OK.  The two optical modules need to be compatible.

     

    2. That interface can support FEC (forward error correction).  FEC needs to be set the

    same (On / Off) at both ends.  Depending on the software, it may need to be manually

    configured.  Your card may or may not support FEC - ethtool is showing 'Not reported'.

    Perhaps you need to set it to Off at both ends (see the ethtool sketch after this list).

     

    3. Fiber connector cleanliness is critical. Dirty connectors are far and away the most

    common problem with fiber and can cause all kinds of strange symptoms. Both the fiber

    end-face and the barrels need to be clean and unscratched.

     

    4. The ethtool output for eth2 reports that the interface is down, but 'ip link show' reports it as up.
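
    A minimal sketch of checking and forcing FEC with ethtool, assuming the interface is eth2 and that the driver exposes FEC control (not all do):

    # ethtool --show-fec eth2               # configured and active FEC mode, if the driver reports it
    # ethtool --set-fec eth2 encoding off   # force FEC off; the switch end must be set to match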

     

    -- Tom

     

    • Thanks 1
  12. One place to start is using 'ethtool' from the UNRAID CLI.  It will

    tell you about negotiated speed, duplex, other parameters.  This

    may tell you if something is misconfigured.

     

    # ip link show

    will list all the interfaces on your system (devname is usually eth0  eth1, eth2,  etc.)

     

    # ethtool devname

    will list the parameters for that specific interface

     

    # ethtool -i devname

    will tell you what driver it's using 

     

    # ethtool -h

    will give you brief help.  Suggest reading up on ethtool online.

    https://linuxhint.com/ethtool_commands_examples/

     

    -- Tom

     

     

    • Thanks 1
  13. It probably depends on what you want to do with your system.

    A critical element is whether you want to host Docker containers or VMs.

    In my system I have 4 × HDD and 1 × NVMe SSD.  Initially, all the Docker

    containers and VMs were on the SSD.

     

    However, one of the VMs (an Ubuntu server instance) was totally clobbering the

    SSD with about a 2 GB/s write rate.  After one week it had consumed about 10% of the

    SSD's rated lifetime.  There was a lengthy thread about Docker containers having write

    amplification problems, and eventually a fix seemed to come about, but my server

    VM did not appear to benefit.  So I moved all the always-on VMs over to a hard drive.

    The write rate there is *****WAY***** lower than to the SSD for some reason.
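
    A hedged way to watch the wear yourself, assuming an NVMe device at /dev/nvme0 (adjust to your hardware) and that smartctl is available on the UNRAID CLI:

    # smartctl -A /dev/nvme0 | grep -iE 'data units written|percentage used'

    'Data Units Written' counts 512,000-byte units; 'Percentage Used' is the drive's own estimate of consumed endurance.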

     

    -- Tom

     

     

  14. I'm not an expert in reading diagnostics, but it appears that your system has the br0 and br1 NIC interfaces

    bonded together into a Link Aggregation Group (LAG).  Does the thing they connect to (an Ethernet switch?)

    support LAG, and has that switch been configured for LAG?

     

    You might want to try looking at your settings and, if bonded, eliminating the LAG bonding, then connecting

    just one of the interfaces to the Ethernet switch.  The <your-servers-ip>/Dashboard/Settings/NetworkSettings page

    is where you can re-configure the network settings.
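
    A rough way to confirm whether bonding is actually active, from the UNRAID CLI - assuming the bond device is named bond0, the usual default when bonding is enabled:

    # cat /proc/net/bonding/bond0    # shows the bonding mode and member interfaces; the file is absent if no bond is configured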

     

    -- Tom

     

     

  15. The Retr column shows the number of retransmitted TCP packets.  The first row

    in your transfer shows 56 retransmits, which is high enough that the link appears essentially non-functional.

    So the question: what is the cause?
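
    For reference, a minimal sketch of how that Retr readout is produced with iperf3, where 192.168.1.10 is a placeholder for your server's address:

    # iperf3 -s                  # on the UNRAID box
    # iperf3 -c 192.168.1.10     # on the client; each interval row includes a Retr (retransmits) column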

     

    Some things to check:

    1. Ethernet cables.

    2. Any intervening Ethernet switch.

    3. Do you have two things assigning DHCP addresses on the same LAN segment?

    4. Bad NIC card or connector.

     

    After that it gets more difficult to troubleshoot:

    5. Wrong driver for the Ethernet NIC?

    6. Out of memory condition preventing TCP from acquiring buffer space?

     

    -- Tom

     

     

     

  16. It's difficult to see what market this is targeted at.  Large companies probably would use

    some sort of tape mechanism that can robotically insert, remove and change tapes. Rolling

    backups come to mind.

     

    For hobbyist and home use, the price is too high compared to just buying some 10+ TB disk drives.

     

    The 100-year lifetime also means that the drive and the writing system would need to still be around in 100 years.

    So does that really mean that the probability of a read failure is lowered in the 1-10 year timeframe?

     

    Perhaps the commercial photography and video market is the target.  Hence B&H Photo selling it.

     

    -- Tom