David Bott

February 6, 2023

1 hour ago, JorgeB said:

Yes, that's it, did it work?

Well, it did for me. I am not 100% sure about your issue or hardware, though. I had the issue with Samsung 980 Pro 1TB NVMe drives.

February 6, 2023

33 minutes ago, JorgeB said:

Try this.

Hi... I added...

append initrd=/bzroot pcie_aspm=off

...to the unRAID OS config area. (Click on Flash Drive to get to the right area.) It goes on the append line, right after initrd=/bzroot. Save and reboot.

October 12, 2022

14 hours ago, hasown said:

You can just search for ITE IT87 in the Community Apps in your server dashboard.

Thanks... OK, did that... Bummer: even after a restart and a rescan, no changes. I still cannot see the fans. The system temp readings seem to be the same, and they are the only items I can see.

October 12, 2022

Greetings.... iPERF3 would be nice. (Unless I totally missed it.)

October 11, 2022

2 hours ago, hasown said:

I had success with my GB X570 AORUS PRO WIFI by installing the ITE IT87 Driver plugin. It adds some modified IT87 drivers for newer chipsets, after which unraid detected my fans.

Might I ask how you installed those drivers so your temp and fans could be seen? Thanks

October 10, 2022

14 hours ago, Squid said:

That being said, I am not a fan of some the dev choices made

Sorry, it has to be said... Don't you mean "I am not a 'FAN' of some the dev..."?

LOL...At least in my head it was funny. Carry on.

October 9, 2022

22 hours ago, BRiT said:

What are you talking about? Most of the initial Dynamix plugins are now included in the base unRaid release.

22 hours ago, NLS said:

what are you talking about?

22 hours ago, spl147 said:

Except the most important ones like fan speed and sleep!!

RIGHT on all of the above!!! SOME have been moved into unRAID directly, but where? One of the BIG ONES, the one most people are looking for, is FAN control. It requires being able to read the various temp sensors (which is missing, not well done, or not kept up with the drivers) and then being able to control the fan outputs based on those temps. Heck, I can't even see my fan speeds anymore. I can see the temps, but I can't do anything with them. That is a BIG THING when it comes to running a NAS with many drives.

Sorry... I have been an unRAID user and promoter from when it first started. (David Bott, Founder of AVSForum.com.) But now I may have to look at moving, as I get concerned about hardware failures when I cannot monitor critical items.

October 8, 2022

Why has no one realized that this development has just gone away? unRAID, as good as it is, is also only as good as the supporting plugins. In this case, Dynamix was one of the best. Now it seems to be hardly supported. Ugh!

October 1, 2022

Don't count on much when it comes to a number of Dynamix plugins as it seems development has just dropped off for most of the items. Such a great product at one time.

September 2, 2022

19 hours ago, Nicktdot said:

I'm very glad to know it works.

I was researching the error / mask 00000001/0000e000 in the message, and found out it had to do with the PCI end device not responding to an ASPM command.

So while turning off AER masks the problem by not logging the errors, it doesn't solve the actual PCI errors.

So then started going down the rabbit hole of what ASPM is all about, ( https://en.wikipedia.org/wiki/Active_State_Power_Management ) and saw there is a kernel boot flag to turn off the feature.. I dont think we need it anyways seeing as my server is running 24h/day and never goes to sleep mode.

I figured it might help avoid the error altogether if the unused feature is disabled. I'll check my own server next time it reboots!

Totally agree on the AER setting, as I said before to others who mentioned using it. I sure do not want to suppress all PCI errors just to hide this one. It was an actual error, and even though it was being corrected, it should not need to be. Hiding all the errors is not a good idea.

I have been looking for weeks and never came across that setting, as I had no clue it was a power issue. (Power as in a power saving feature, which, like you, I surely do not need in this case, especially for Cache, which is always doing something with Dockers.) One setting I did try was the sleep timer setting, set to 5500 or something. It seemed to help, but not fully.

I have had no other errors as of yet, and my system also just did a full parity check, which runs on the 1st of each month. It took 1 day, 5 hours, 21 minutes to complete, though. (I have an 18TB Parity Drive for future upgrades to storage. The four data drives are all 8TB currently... But man, it takes a long time to verify 8TB, and then it still continues to scan the Parity drive, which is another 10TB above that. Not sure why it does that... but oh well. May need to look at using another method or drive format.)
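As a rough check on those numbers, here is a small Python sketch that turns the run time into an average speed. It assumes the check covers the full 18TB of the parity drive and that 1 TB means 10^12 bytes; the function name is made up for illustration.

```python
def average_rate_mb_s(size_tb: float, days: int, hours: int, minutes: int) -> float:
    """Average throughput in MB/s for scanning `size_tb` decimal terabytes."""
    seconds = ((days * 24 + hours) * 60 + minutes) * 60
    # Assumption: 1 TB = 1e12 bytes and 1 MB = 1e6 bytes (decimal units).
    return size_tb * 1e12 / seconds / 1e6


# 18 TB parity drive, 1 day 5 hours 21 minutes
print(round(average_rate_mb_s(18, 1, 5, 21), 1))  # 170.4
```

Under those assumptions the check averaged roughly 170 MB/s across the whole drive, which is in the range a spinning disk can sustain.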

This is all due to my upgrading my motherboard and processor and adding two NVMe drives for cache (RAID). That is when the error showed up.

I so want to thank you again for your research. So...THANKS!!!! Kudos!!!! Yippee!!! Etc.

September 1, 2022

11 hours ago, Nicktdot said:

Could you try pcie_aspm=off . This seems to disable power management mode which is throwing the error.. I've put it in my config for next time I reboot

Morning...And a good one it is! No errors in the log using pcie_aspm=off

Might I ask where you came up with that being likely the issue?

Thanks

September 1, 2022

41 minutes ago, Nicktdot said:

Could you try pcie_aspm=off . This seems to disable power management mode which is throwing the error.. I've put it in my config for next time I reboot

Hi...Thanks for the idea. I just added it to my Sys Config so it now looks like this...

kernel /bzimage
append initrd=/bzroot pcie_aspm=off

...saved and rebooted. I will check the log tomorrow morning and see if I have the errors still and report back.
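For anyone who would rather script that change than edit it by hand, here is a minimal Python sketch of the same edit. It works on the config text only. The function name add_kernel_param is made up for illustration, and it touches every append line it finds, which is a simplification if your config has several boot entries.

```python
def add_kernel_param(cfg_text: str, param: str = "pcie_aspm=off") -> str:
    """Add `param` to every 'append' line that does not already carry it."""
    out = []
    for line in cfg_text.splitlines():
        tokens = line.split()
        # Only touch lines whose first word is 'append' and that lack the parameter.
        if tokens and tokens[0].lower() == "append" and param not in tokens[1:]:
            line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out)


before = "kernel /bzimage\nappend initrd=/bzroot"
print(add_kernel_param(before))
```

Running it a second time on its own output changes nothing, so the parameter is not duplicated.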

August 31, 2022

15 hours ago, trurl said:

Have you checked for firmware update? I had to update firmware on mine before they would play well on my new desktop build.

It seems I am on the current FW release... 5B2QGXA7 ... So bummer on that.

The good news is that the error shows as "Corrected"; the bad news is that there is an error it needs to correct at all.

August 31, 2022

14 hours ago, Nicktdot said:

Let me know how it works out for you. I have 1 of 4 SK Hynix NVMEs on an Asus HYPER M.2 X16 GEN 4 CARD throwing this constantly.

Well, that did not take long. Sorry to say, even with the new kernel (in 6.11.0-rc4) I still have the error, so I went back to 6.10.3.

Aug 31 10:53:52 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 31 10:53:52 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 31 10:53:52 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000

~~Looks like I may have to try a firmware update on the NVMe drives. I did find this fwupdmgr Linux tool that it seems I might be able to run to upgrade the drives.~~ Otherwise, it seems Samsung only supports Windows with their software, and that would be a pain as I would need to remove the drives and find a Windows machine.

UPDATE: It seems that fwupdmgr will not work as it is only for supported products... https://fwupd.org/lvfs/devices/

August 31, 2022

13 hours ago, Nicktdot said:

Let me know how it works out for you. I have 1 of 4 SK Hynix NVMEs on an Asus HYPER M.2 X16 GEN 4 CARD throwing this constantly.

I just now installed version 6.11.0-rc4 and will let it run. As you may have seen from my log times, the error does not happen all the time, so I need to give it a number of hours. I will report back.

13 hours ago, trurl said:

Have you checked for firmware update? I had to update firmware on mine before they would play well on my new desktop build.

Hi... Do you mean firmware for the Samsung 980 drive itself? If that is what you mean, no, I have not. I will need to look into what the current version is and how to even upgrade it. Removing the drives from my setup is easy for one and hard for the other, so I hope I can do it in place. (Hoping it is a bootable flash or something.) Thanks for the thought, as I never considered updating the firmware on the drive.

August 30, 2022

On 8/29/2022 at 4:17 AM, JorgeB said:

You can try installing v6.11.0-rc4, newer kernel might help, if not not much more you can do other than suppressing the error, unless you are wiling to use a different board (or devices).

Well, I guess that might be worth a shot. The error is just bugging me, and I cannot tell if it is hardware or software related. Using one setting that suppresses all the PCI errors makes no sense, for you want to know if there is an error after all. I have, as mentioned, even replaced one of the Samsung drives, as the error seemed to follow the drive.

Thank you again.

David

August 30, 2022

Hi... So... Hummm... Where do we find this updated plugin? I ask because a search for Network Stats only shows munin-server.

I had thought it was part of the Dynamix ecosystem, as it showed up under STATS and is now gone since I removed the deprecated plugin. STATS is the Dynamix System Statistics page where the network stats showed... Thus I thought it was Dynamix.

Thank you for your efforts.

August 28, 2022

On 8/11/2022 at 2:25 PM, JorgeB said:

This is usually hardware related, a BIOS update might help, as well as using other PCIe/M.2 slots if available, different kernel might also help.

Hi... (1st, Sorry for the delay...I have been trying other things like changing timing and NVMe drive.)

Thank you kindly for the reply.

I have the latest BIOS as mentioned. This is happening in the NVMe channel. My motherboard has two NVMe slots and one on one board. One channel goes through PCIe and the other goes through the CPU, it seems. I have tried swapping the two NVMe drives being used as a RAID 1 cache, and the error "seemed" to follow the drive. So I replaced that drive just for fun, and yet the problem still shows.

It surely could be a kernel issue, but I am not sure how to deal with that in unRAID, as they supply the kernel with their OS.

Here is the error repeated on the new NVMe (note they are Samsung 980 Pro drives). (I had the GUI only show me errors in the log.)

Aug 19 23:01:47 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 19 23:01:47 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 19 23:01:47 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 20 00:26:42 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 20 00:26:42 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 20 00:26:42 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 20 18:29:22 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 20 18:29:22 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 20 18:29:22 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 22 17:26:33 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 22 17:26:33 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 22 17:26:33 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 23 04:06:23 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 23 04:06:23 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 04:06:23 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 23 11:17:45 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 23 11:17:45 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 11:17:45 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 24 03:53:45 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 24 03:53:45 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 24 03:53:45 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 24 13:18:54 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 24 13:18:54 Server kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 24 13:18:54 Server kernel: nvme 0000:03:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
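To see how often this actually fires, a log like the one above can be tallied per day and per device. The following is only a sketch in Python. The regular expression is written against the exact line format pasted here, and the function name count_corrected is made up for illustration.

```python
import re
from collections import Counter

# Matches the "AER: Corrected error received" line, e.g.
# "Aug 19 23:01:47 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0"
AER_LINE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"pcieport (?P<port>\S+): AER: Corrected error received: (?P<dev>\S+)$"
)


def count_corrected(log_text: str) -> Counter:
    """Count corrected-error events keyed by (day, reporting device)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AER_LINE.match(line.strip())
        if match:
            counts[(match["day"], match["dev"])] += 1
    return counts


sample = """\
Aug 19 23:01:47 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 20 00:26:42 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
Aug 20 18:29:22 Server kernel: pcieport 0000:00:1a.0: AER: Corrected error received: 0000:03:00.0
"""
for (day, dev), n in sorted(count_corrected(sample).items()):
    print(day, dev, n)
```

Only the first line of each three-line event is counted, so one event is not tallied three times.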

August 26, 2022

Thanks for the replies. In regard to the temp and fan controls: understood... But there are not many chips that do this.

Quote

The Network stats plugin works fine with all Unraid versions, including latest beta release 6.11.0-rc4

Funny you mention that, seeing it was just REMOVED as deprecated. I had shown a screen grab of the broken images that used to be the network stats, which only occurred after moving to the current version of unRAID.

Thank you again for the replies.

August 24, 2022

On 8/21/2022 at 12:02 PM, Squid said:

I have now marked this plugin as being deprecated across all versions of the OS. If at some point in the future @dorgan includes the required packages needed to run this plugin and does his own package management without forcing the user to sort it all out, this deprecation will be reversed.

If you already have this plugin installed, then be aware that you are doing your own package management, and there is always the possibility that whatever packages you are manually installing to run this plugin may (or may not) have adverse affects on the OS. It is entirely up to you whether or not to uninstall this plugin and revert whatever package management scripts you may have been using.

So sad to see so many Dynamix plugins going away for lack of development. I had mentioned this issue some time back in the Dynamix thread with no reply.

Same with the TEMP and FAN control plugins. Which is MUCH NEEDED and SHOULD BE already part of the unRAID system. Yet...Nope. Just another dead plugin(s). I am not even sure unRAID is being looked after these days. (IMHO)

August 20, 2022

30 minutes ago, spl147 said:

Agreed! i also think the system temp and fanspeed plugins should be part of the OS. the github issues even go unanswered!

i have one open from 3/2021 with not a single reply!

Yes, I have posted an issue about the TEMP and FAN control, where it cannot even detect the fans, and another about the NETWORK STATS all being broken images after upgrading to the current version of unRAID. No replies on either.

I miss them...and I do understand they did these for free. But it so hurts when they just "go away".

I also agree that at this point unRAID should be a more mature product in such things as dealing with fans etc. that the plugins were doing. Sadly, even unRAID seems to be underdeveloped to some degree. (IMHO... Even though it is good and I have used it for... gosh... from when they came out.)

August 20, 2022

Ok, it seems to me...Sorry to say...That this ONE thread for ALL DYNAMIX - V6 PLUGINS is just a poor idea.

To me it seems that DYNAMIX - V6 PLUGINS has more or less lost its developers and thus is now lacking in support. I have read that it seems ONE person is working on things, when he can, but that is it. Thus an issue for sure. Heck, I can't even get a single reply to issues I have reported.

So I just wanted to say that DYNAMIX was a great plugin at one point and was a MUST HAVE... Now it is sadly lacking due to a lack of developers.

They do not get paid, so I get it. But now Plugins we have come to use and rely on are, well, just lacking.

August 11, 2022

Hi....

I am sorry... But I do not think the above solution of adding "pci=noaer" to boot really "solves" anything other than hiding the error. The error is still happening; it is just not being reported. So the real question is why it is happening, so it can be fixed. Here is my system config, with the most recent BIOS running.

Gigabyte Technology Co., Ltd. Z690I A ULTRA LITE D4 , Version Default string
American Megatrends International, LLC., Version F20a
BIOS dated: Fri 22 Jul 2022 12:00:00 AM EDT
12th Gen Intel® Core™ i5-12600K @ 3700 MHz

Samsung 1TB NVMe (2 in a RAID for Cache)
LSI PCIe 8 drive controller

I get the error whenever MOVER runs. I use NVMe Cache Drives and it reports the issue each time.

Aug 11 13:42:59 Server emhttpd: shcmd (94): /usr/local/sbin/mover &> /dev/null &
Aug 11 13:43:29 Server kernel: pcieport 0000:00:06.0: AER: Corrected error received: 0000:02:00.0
Aug 11 13:43:29 Server kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 11 13:43:29 Server kernel: nvme 0000:02:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 11 13:43:29 Server kernel: nvme 0000:02:00.0:    [ 0] RxErr
Aug 11 13:43:51 Server kernel: pcieport 0000:00:06.0: AER: Corrected error received: 0000:02:00.0
Aug 11 13:43:51 Server kernel: nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 11 13:43:51 Server kernel: nvme 0000:02:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
Aug 11 13:43:51 Server kernel: nvme 0000:02:00.0:    [ 0] RxErr
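For reference, the status/mask pair in those lines can be decoded bit by bit. The sketch below is in Python and uses the bit positions of the PCIe AER Correctable Error Status register as I understand them from the spec. The dictionary and function names are made up for illustration.

```python
# Bit positions in the PCIe AER Correctable Error Status / Mask registers.
# Assumption: taken from the PCIe spec layout; verify against your spec revision.
CORRECTABLE_BITS = {
    0: "Receiver Error (RxErr)",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode(value_hex: str) -> list:
    """Return the names of the bits set in a hex register value."""
    value = int(value_hex, 16)
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if (value >> bit) & 1]


status, mask = "00000001/0000e000".split("/")
print(decode(status))  # ['Receiver Error (RxErr)']
print(decode(mask))    # ['Advisory Non-Fatal Error', 'Corrected Internal Error', 'Header Log Overflow']
```

The decoded status agrees with the "[ 0] RxErr" line the kernel prints: only bit 0 is set, and the mask covers three unrelated error types.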

Thanks for any additional info.

David

August 5, 2022

No suggestions on getting fan data and control on a Gigabyte Z690I AORUS ULTRA LITE DDR4????

Please help.

August 2, 2022

Hi...

System Temp and Fan Control
unRAID 6.10.3 (Note that going to this version of unRAID also broke the Net Stats screen. I mentioned this above as well. Sadly, no reply.)

Motherboard: Gigabyte Z690I AORUS ULTRA LITE DDR4

The System Temp plugin can only see the CPU (and cores), motherboard temp and NVMe temps. It cannot find the fans at all.

I have tried a few things just to try to resolve it from reading above. I have tried adding "acpi_enforce_resources=lax" to the boot with no luck (hey you never know.) I also tried adding "modprobe it87" to the GO file again just to try something.

I ran sensors-detect and the output is below. As far as I can tell, it cannot find the chip or driver that is needed.

Does anyone have any ideas on what I can do to get this MB's fan readings so I can then control them? (They are 4-pin headers, so they are controllable, and they do show up in the BIOS.) There are only 2 case fans and 1 CPU fan... None show.

Thank you for your time.

root@Server:~# sensors-detect
# sensors-detect version 3.6.0
# System: Gigabyte Technology Co., Ltd. Z690I A ULTRA LITE D4 [-CF]
# Kernel: 5.15.46-Unraid x86_64
# Processor: 12th Gen Intel(R) Core(TM) i5-12600K (6/151/2)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): YES
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 16h thermal sensors... No
AMD Family 17h thermal sensors... No
AMD Family 15h power sensors... No
AMD Family 16h power sensors... No
Hygon Family 18h thermal sensors... No
Intel digital thermal sensor... Success!
(driver `coretemp')
Intel AMB FB-DIMM thermal sensor... No
Intel 5500/5520/X58 thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No

Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): YES
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... Yes
Found unknown chip with ID 0x8689
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... No

Some systems (mainly servers) implement IPMI, a set of common interfaces
through which system health data may be retrieved, amongst other things.
We first try to get the information from SMBIOS. If we don't find it
there, we have to read from arbitrary I/O ports to probe for such
interfaces. This is normally safe. Do you want to scan for IPMI
interfaces? (YES/no): YES
Probing for `IPMI BMC KCS' at 0xca0... No
Probing for `IPMI BMC SMIC' at 0xca8... No

Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (YES/no): YES
Probing for `National Semiconductor LM78' at 0x290... No
Probing for `National Semiconductor LM79' at 0x290... No
Probing for `Winbond W83781D' at 0x290... No
Probing for `Winbond W83782D' at 0x290... No

Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no): YES
Found unknown SMBus adapter 8086:7aa3 at 0000:00:1f.4.
Sorry, no supported PCI bus adapters found.
Module i2c-dev loaded successfully.

Next adapter: SMBus I801 adapter at efa0 (i2c-0)
Do you want to scan it? (YES/no/selectively): YES
Client found at address 0x50
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)
Probing for `EDID EEPROM'... No
Client found at address 0x52
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)

Now follows a summary of the probes I have just done.
Just press ENTER to continue:

Driver `coretemp':
* Chip `Intel digital thermal sensor' (confidence: 9)

Do you want to generate /etc/sysconfig/lm_sensors? (yes/NO): yes
Copy prog/init/lm_sensors.init to /etc/init.d/lm_sensors
for initialization at boot time.
You should now start the lm_sensors service to load the required
kernel modules.

Unloading i2c-dev... OK
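The one useful clue in all of that output is the "Found unknown chip" line in the Super I/O section. A short Python sketch for pulling such lines out of a saved sensors-detect run follows. The function name unknown_chips is made up for illustration, and reading ID 0x8689 as an ITE IT8689E is my assumption, not something the tool reports.

```python
import re

FAMILY = re.compile(r"Trying family `([^']+)'\.\.\. (Yes|No)")
UNKNOWN_CHIP = re.compile(r"Found unknown chip with ID (0x[0-9a-fA-F]+)")


def unknown_chips(output: str) -> list:
    """Return (family, chip_id) pairs for chips sensors-detect saw but could not name."""
    found = []
    family = None
    for line in output.splitlines():
        fam = FAMILY.search(line)
        if fam:
            # Remember the family only when the probe answered Yes.
            family = fam.group(1) if fam.group(2) == "Yes" else None
            continue
        chip = UNKNOWN_CHIP.search(line)
        if chip:
            found.append((family, chip.group(1)))
    return found


sample = """\
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... Yes
Found unknown chip with ID 0x8689
"""
print(unknown_chips(sample))  # [('ITE', '0x8689')]
```

An unknown ITE chip would fit with the later suggestion in the thread to try the ITE IT87 Driver plugin, which is described there as adding modified IT87 drivers for newer chipsets.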


Posts posted by David Bott

NVME drives throwing errors, filling logs instantly. How to resolve?

Dynamix - V6 Plugins

[PLUG-IN] NerdTools

[Plugin] Network Stats