Everything posted by Falcosc

  1. I used lspci -vvvnnPPDq to compare the ASPM state between Unraid and Ubuntu (sketch below). This output contains the root port address for every endpoint. If you just configure ASPM with the script exactly as it was on Ubuntu, that is safer than trying to force things which did not get enabled on Ubuntu either. On my system, it was the Ethernet controller and the NVMe SSD. If I moved the NVMe SSD from the chipset to the CPU by plugging it into the x16 slot, it got ASPM, but the Ethernet controller could not be changed or deactivated, so I used the hack for my Ethernet controller endpoint and the corresponding chipset root port, because Ubuntu enabled it for them by default. If I plug my NVMe SSD into one of the x4 slots, I need to run the hack for both Ethernet and NVMe. So maybe you didn't see which endpoint is forcing your root complex into the non-power-saving mode.
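     A minimal sketch of that comparison (file names are placeholders; run the same command on the Ubuntu live system and again on Unraid, then diff the two files):

        # -PP prints the full path through the root complex for every endpoint
        lspci -vvvnnPPDq > /tmp/aspm-ubuntu.txt   # on the Ubuntu live system
        lspci -vvvnnPPDq > /tmp/aspm-unraid.txt   # on Unraid
        # show where the ASPM state differs between the two systems
        diff /tmp/aspm-ubuntu.txt /tmp/aspm-unraid.txt | grep -i aspm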
  2. No, the endpoint is the device actually connected. So check all endpoints connected to it; one of them is causing the root port to be forcibly disabled. Compare the Ubuntu output with the Unraid output and you will find the differences. That's why I didn't share the link: you would have found the differences between devices and root complexes while reading https://wireless.wiki.kernel.org/en/users/documentation/aspm#enabling_aspm_with_setpci, and you would have found the dead link to the script. I will remove the link so that only people who found the dead link in that documentation will search for it on the Internet Archive themselves. I hope the script prevented you from writing stuff into the wrong registers, since endpoints look different from root complexes.
  3. I didn't check mine, but I downloaded the script from the article author via the Internet Archive and put my endpoint and root device IDs in there. Just search for the dead link to the script on the Internet Archive. I put it in /boot/config/go, and after the hack I run powertop --auto-tune in there (excerpt below).
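     Roughly what that looks like in the go file (a sketch; /boot/config/aspm-hack.sh is a placeholder name for wherever you saved the archived script with your device IDs):

        # /boot/config/go (excerpt)
        # run the archived setpci script, edited with your endpoint/root IDs
        bash /boot/config/aspm-hack.sh
        # then apply the runtime power tunables
        powertop --auto-tune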
  4. No news, we still use the same procedure to work around the broken or too defensive device initialization in Unraid:
     - Boot from an Ubuntu live system
     - Check if your hardware supports ASPM on Ubuntu
     - Check if you get the expected power savings on Ubuntu
     If you have confirmed that your hardware supports it and that it actually saves power, then it is safe enough to use the hack to flip the corresponding ASPM flag directly in the PCI registers (see the sketch after this post): https://wireless.wiki.kernel.org/en/users/documentation/aspm#enabling_aspm_with_setpci I added the hack in /boot/config/go, and after the hack I run powertop --auto-tune in there. But only very few people are affected by the bug; most people don't have this issue. If you cannot get it working in Ubuntu, then you are not affected by this topic and need to find a solution in a new topic instead of using this "Why is ASPM disabled in Unraid, while it's enabled in Ubuntu" topic for your question.
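     A minimal sketch of that register flip, following the wiki page above (the device address and value are placeholders; ASPM lives in the two lowest bits of the Link Control register at offset 0x10 of the PCIe capability, so read the byte first and only change those bits):

        # read the current Link Control byte of the device
        setpci -s 00:1c.0 CAP_EXP+0x10.B
        # write it back with the ASPM bits set (0x1 = L0s, 0x2 = L1, 0x3 = both);
        # 0x42 assumes the read returned 0x40 and you want L1
        setpci -s 00:1c.0 CAP_EXP+0x10.B=0x42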
  5. Check first what happens with the CPU C-states after enabling L0s before investigating this. And then compare the power draw without this card to see if it is worth hunting for. With Ubuntu I was able to reach the C8 state in powertop, which resulted in a major power draw change of 4W. And don't forget to run autotune with powertop (see below).
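     For reference, this is roughly how I check it (standard powertop usage; C8 is just what my hardware reached):

        powertop --auto-tune   # apply the runtime power tunables first
        powertop               # then watch the package C-states in the "Idle Stats" tab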
  6. @ogiif If Ubuntu works, I encourage you to create a new bug report for this issue. You can have a look there to see what kind of information is needed for evidence. But if Ubuntu doesn't work for your devices, then you have a real setup issue. There could be a specific hardware combination which prevents it, or a configuration issue. For example, your LSI SAS2308 PCI-Express Fusion-MPT SAS-2 doesn't have L1 support at all (LnkCap; see the check below). The 4-port SATA controller, on the other hand, has it. Keep an eye on that; this one device could prevent your CPU from sleeping. And even if 9 of 10 links are sleeping, you won't see much power saving as long as a single device keeps your CPU busy on the bus. For that reason, I recommend checking power consumption on Ubuntu. You don't need to spend time on ASPM for the supported devices as long as you have at least one device which will not support it. I don't know if L0s is enough to see significant changes in power consumption; I only know about the huge effect of a properly sleeping CPU after having everything in the deeper L1 sleep.
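     To check this per device (standard lspci output; the bus address is a placeholder):

        # LnkCap shows what the link supports, LnkCtl what is currently enabled
        lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkCtl:'
        #   LnkCap: ... ASPM L0s L1 ...   -> supported states
        #   LnkCtl: ... ASPM Disabled ... -> currently enabled state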
  7. Yes, a live USB works fine. On my hardware it worked immediately in Ubuntu without any boot options. After figuring this out, I manually set the ASPM option in the registers via a startup script which is executed after the Unraid config has decided that it is not safe/not supported to enable it. Here is the documentation which I used to set the flags; it isn't a reference to the real specification, more of a how-to: https://wireless.wiki.kernel.org/en/users/documentation/aspm#enabling_aspm_with_setpci
  8. Have you already done the usual things like BIOS setup and kernel boot options? If your system is set up incorrectly, it will not work. If you have everything, then you should check Ubuntu first, because Ubuntu does not have the unknown ASPM bug which Unraid has for my hardware. If you can get ASPM enabled on Ubuntu without hacks, using only BIOS and kernel boot options, then we will use your result to push forward. It will provide more evidence to help the Unraid team identify the cause of the issue. And with the Ubuntu test you can see how much power saving is actually possible. If there isn't any, I wouldn't go forward. And I would only recommend the ASPM hack (forced activation, which bypasses any compatibility checks) if you have confirmed that other operating systems can enable it; then bypassing the logic is a bit less dangerous.
  9. The -K flag to make it persistent doesn't work. It is a common issue that the write cache needs to be set manually with hdparm on every boot (sketch below). But I don't understand why it is disabled on some systems by default. I found people who want to disable it on every boot, but also people who asked how to enable it on every boot. How did we solve it in Unraid? And is the cache-flush HDD spin-up a bug, or is the disable-cache workaround worth integrating into the community plugins?
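     The usual workaround is to set it explicitly at boot (a sketch for /boot/config/go; device names are placeholders):

        # hdparm -K does not persist the setting, so set the write cache
        # on every boot: -W1 enables it, -W0 disables it
        hdparm -W1 /dev/sdb
        hdparm -W1 /dev/sdc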
  10. I would say the drive is bad, because I believe that SMART reporting is done on the integrated processor of the HDD PCB and does not run on your main processor. So even if your memory is bad, SMART should not be affected by it. That is an assumption, so correct me if I am wrong and the HDD actually needs the OS to cooperate by offloading the checksum calculation to the main CPU to compare against the head read results, which is very unlikely. But cool that the preclear actually stops formatting. I thought you had to monitor these values manually. I have worked with broken hard drives and have some experience with them. I would not recommend it, but here are some unreliable tricks to use broken disks:
  11. I figured out that a cache flush during the shutdown/sleep process spins up all HDDs which have the write cache enabled. Is this normal behavior, or should the write cache always be empty if the disk spins down after 10 minutes? Do we already have a configuration option to turn the write cache off as a workaround? ([sdb...sde] in my case) Currently I use a sleep wake-up script (expanded below): printf /dev/sd%s\\n {b..e} | xargs -n 1 hdparm -W0 Would it make sense to add this workaround as an optional configuration via a pull request to the s3_sleep plugin, or is it a bug that the spin-up happens for no reason, with a probably empty write cache, during shutdown?
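     For readability, the xargs one-liner is equivalent to this loop:

        # disable the write cache on sdb through sde, so the pre-sleep
        # cache flush has nothing to spin the disks up for
        for dev in /dev/sd{b..e}; do
            hdparm -W0 "$dev"
        done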
  12. @joie95 do you know this https://superuser.com/questions/879374/hdd-power-up-in-standby-prevent-from-spinning-up/1210097 and this https://bbs.archlinux.org/viewtopic.php?pid=1712710#p1712710?
  13. Looks good, no issue. It looks like the S3 Sleep plugin correctly detected that there is disk activity and is preventing sleep. If you try to shut down in this state, you will get an unclean shutdown. So I don't see any issue with the plugin. You now have to look at what is causing disk activity on your cache drive. But this is something unrelated to the plugin and maybe worth its own topic. You will find lots of documentation online about how to detect which processes are causing disk access. You probably need the Nerd Pack plugin to add the necessary tools to find out which process is causing disk access (examples below). Once you have found the process which is accessing the disk, you may need the help of other people to understand what kind of process it is. But in the end it is not related to the s3_sleep plugin.
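     Two standard tools for that first step (assuming they are installed, e.g. via the Nerd Pack; /mnt/cache is the mount in question):

        # which processes currently use the filesystem mounted at /mnt/cache
        fuser -vm /mnt/cache
        # or list the open files on that mount, with the owning processes
        lsof /mnt/cache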
  14. Maybe you should share the full log before we start to blame Unraid for logging issues. There should be a lot of output about stopping the array. I use scripts to log to my NVMe drive, but needed to write additional shutdown scripts to stop logging during shutdown to avoid an unclean shutdown. That was my issue:
     May 26 17:15:08 server s3_sleep: Shutdown system now
     May 26 17:15:08 server shutdown[24407]: shutting down for system halt
     May 26 17:15:08 server init: Switching to runlevel: 0
     May 26 17:15:08 server s3_sleep: System woken-up. Reset timers
     May 26 17:15:08 server init: Trying to re-exec init
     May 26 17:15:09 server unraid-api[8833]: 👋 Farewell. UNRAID API shutting down!
     May 26 17:15:10 server kernel: mdcmd (66): nocheck cancel
     May 26 17:15:11 server emhttpd: Spinning up all drives...
     May 26 17:15:11 server emhttpd: read SMART /dev/sdd
     May 26 17:15:11 server emhttpd: read SMART /dev/sde
     May 26 17:15:11 server emhttpd: read SMART /dev/sdb
     May 26 17:15:11 server emhttpd: read SMART /dev/sdc
     May 26 17:15:11 server emhttpd: read SMART /dev/nvme0n1
     May 26 17:15:11 server emhttpd: read SMART /dev/sda
     May 26 17:15:11 server emhttpd: Stopping services...
     May 26 17:15:11 server emhttpd: shcmd (190242): /etc/rc.d/rc.libvirt stop
     May 26 17:15:11 server root: Waiting on VMs to shutdown
     May 26 17:15:11 server root: Stopping libvirtd...
     May 26 17:15:11 server dnsmasq[8614]: exiting on receipt of SIGTERM
     May 26 17:15:11 server kernel: device virbr0-nic left promiscuous mode
     May 26 17:15:11 server kernel: virbr0: port 1(virbr0-nic) entered disabled state
     May 26 17:15:11 server avahi-daemon[8044]: Interface virbr0.IPv4 no longer relevant for mDNS.
     May 26 17:15:11 server avahi-daemon[8044]: Leaving mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
     May 26 17:15:11 server avahi-daemon[8044]: Withdrawing address record for 192.168.122.1 on virbr0.
     May 26 17:15:11 server root: Network a67b7b9b-bef4-488b-b243-c9e5f391a3b1 destroyed
     May 26 17:15:11 server root:
     May 26 17:15:14 server root: Stopping virtlogd...
     May 26 17:15:15 server root: Stopping virtlockd...
     May 26 17:15:16 server emhttpd: shcmd (190243): umount /etc/libvirt
     May 26 17:15:16 server s3_sleep: ----------------------------------------------
     May 26 17:15:16 server s3_sleep: command-args=-q
     May 26 17:15:16 server s3_sleep: action mode=sleep
     May 26 17:15:16 server s3_sleep: check disks status=no
     May 26 17:15:16 server s3_sleep: check network activity=no
     May 26 17:15:16 server s3_sleep: check active devices=no
     May 26 17:15:16 server s3_sleep: check local login=no
     May 26 17:15:16 server s3_sleep: check remote login=no
     May 26 17:15:16 server s3_sleep: version=3.0.8
     May 26 17:15:16 server s3_sleep: ----------------------------------------------
     May 26 17:15:16 server s3_sleep: included disks=nvme0n1 sdb sdc sdd sde
     May 26 17:15:16 server s3_sleep: excluded disks=sda
     May 26 17:15:16 server s3_sleep: ----------------------------------------------
     May 26 17:15:16 server s3_sleep: killing s3_sleep process 14216
     May 26 17:15:16 server unassigned.devices: Unmounting All Devices...
     May 26 17:15:16 server emhttpd: shcmd (190245): /etc/rc.d/rc.samba stop
     May 26 17:15:16 server wsdd[30446]: udp_send: Failed to send udp packet with Network is unreachable
     May 26 17:15:16 server wsdd[30446]: Failed to send bye with Network is unreachable
     May 26 17:15:16 server winbindd[30451]: [2022/05/26 17:15:16.991808, 0] ../../source3/winbindd/winbindd.c:244(winbindd_sig_term_handler)
     May 26 17:15:16 server winbindd[30451]: Got sig[15] terminate (is_parent=0)
     May 26 17:15:16 server nmbd[30439]: [2022/05/26 17:15:16.992354, 0] ../../source3/nmbd/nmbd.c:59(terminate)
     May 26 17:15:16 server nmbd[30439]: Got SIGTERM: going down...
     May 26 17:15:16 server nmbd[30439]: [2022/05/26 17:15:16.992430, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
     May 26 17:15:16 server nmbd[30439]: Packet send failed to 192.168.122.255(138) ERRNO=Network is unreachable
     May 26 17:15:16 server nmbd[30439]: [2022/05/26 17:15:16.992456, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
     May 26 17:15:16 server nmbd[30439]: Packet send failed to 192.168.122.255(138) ERRNO=Network is unreachable
     May 26 17:15:16 server winbindd[30927]: [2022/05/26 17:15:16.993900, 0] ../../source3/winbindd/winbindd.c:244(winbindd_sig_term_handler)
     May 26 17:15:16 server winbindd[30927]: Got sig[15] terminate (is_parent=0)
     May 26 17:15:16 server winbindd[30449]: [2022/05/26 17:15:16.994062, 0] ../../source3/winbindd/winbindd.c:244(winbindd_sig_term_handler)
     May 26 17:15:16 server winbindd[30449]: Got sig[15] terminate (is_parent=1)
     May 26 17:15:18 server emhttpd: shcmd (190246): rm -f /etc/avahi/services/smb.service
     May 26 17:15:18 server avahi-daemon[8044]: Files changed, reloading.
     May 26 17:15:18 server avahi-daemon[8044]: Service group file /services/smb.service vanished, removing services.
     May 26 17:15:18 server emhttpd: Stopping mover...
     May 26 17:15:18 server emhttpd: shcmd (190248): /usr/local/sbin/mover stop
     May 26 17:15:18 server root: Forcing turbo write on
     May 26 17:15:18 server kernel: mdcmd (67): set md_write_method 1
     May 26 17:15:18 server kernel:
     May 26 17:15:18 server root: ionice -c 2 -n 7 nice -n 19 /usr/local/sbin/mover.old stop
     May 26 17:15:18 server root: mover: not running
     May 26 17:15:18 server emhttpd: Sync filesystems...
     May 26 17:15:18 server emhttpd: shcmd (190249): sync
     May 26 17:15:19 server emhttpd: shcmd (190250): umount /mnt/user0
     May 26 17:15:19 server emhttpd: shcmd (190251): rmdir /mnt/user0
     May 26 17:15:19 server emhttpd: shcmd (190252): umount /mnt/user
     May 26 17:15:19 server emhttpd: shcmd (190253): rmdir /mnt/user
     May 26 17:15:20 server emhttpd: shcmd (190255): /usr/local/sbin/update_cron
     May 26 17:15:20 server emhttpd: Unmounting disks...
     May 26 17:15:20 server emhttpd: shcmd (190256): umount /mnt/disk1
     May 26 17:15:20 server kernel: XFS (md1): Unmounting Filesystem
     May 26 17:15:21 server emhttpd: shcmd (190257): rmdir /mnt/disk1
     May 26 17:15:21 server emhttpd: shcmd (190258): umount /mnt/disk2
     May 26 17:15:21 server kernel: XFS (md2): Unmounting Filesystem
     May 26 17:15:21 server emhttpd: shcmd (190259): rmdir /mnt/disk2
     May 26 17:15:21 server emhttpd: shcmd (190260): umount /mnt/disk3
     May 26 17:15:21 server kernel: XFS (md3): Unmounting Filesystem
     May 26 17:15:21 server emhttpd: shcmd (190261): rmdir /mnt/disk3
     May 26 17:15:21 server emhttpd: shcmd (190262): umount /mnt/cache
     May 26 17:15:21 server root: umount: /mnt/cache: target is busy.
     May 26 17:15:21 server emhttpd: shcmd (190262): exit status: 32
     May 26 17:15:21 server emhttpd: Retry unmounting disk share(s)...
     May 26 17:15:26 server emhttpd: Unmounting disks...
     May 26 17:15:26 server emhttpd: shcmd (190263): umount /mnt/cache
     "s3_sleep: Shutdown system now" is a bug, and "server root: umount: /mnt/cache: target is busy." was my fault (it was my syslog-to-NVMe script).
  15. There is no shutdown entry after "Disk activity detected. Reset timers." If you don't have more logging, you need to check how to improve the logging situation.
  16. The reason for an unclean shutdown can be found in the log as well. For example, in my case it was a failed unmount, and from there I found the responsible process.
  17. Well, once you know that SMART already covers bit errors on each disk, you don't need integrity checks anymore. And parity checks on top of SMART are already a second data validation. If two validations are not enough, you could use dual parity to have three validations of your data. Once the data has been transmitted intact to your Unraid system, parity plus SMART notifications will be enough. I enabled all SMART notifications and added attributes 10, 184, 196, and 200 (as shown below). I don't use attribute 1 because I would need to exclude Seagate disks from it. SMART data should be used to detect which disk has no errors, but only after checking the system health; that's always the first step on a system without drive failures.
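     A quick way to inspect exactly those attributes on one disk (standard smartctl usage; the device name is a placeholder, and the attribute names in the comment are the common ones for these IDs):

        # 10 Spin_Retry_Count, 184 End-to-End_Error,
        # 196 Reallocated_Event_Count, 200 write error rate
        smartctl -A /dev/sdb | grep -E '^ *(10|184|196|200) '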
  18. You need some kind of low-powered device in your network which is always on to forward the WOL. I use an old Raspberry Pi which draws ~1W. And because I'm lazy and don't want to implement WOL on every device, I use https://github.com/nikp123/wake-on-arp @kalidus There is no known issue so far; we only know about unsupported combinations of features. Write your logs to flash and check which thing is stopping Unraid from powering down. If you don't use shutdown at all, because you only work with sleep, then you may be affected by this: https://github.com/bergware/dynamix/issues/59
  19. True, maybe the acknowledged old SMART value of the "reallocated sector count" field got lost, and the known good value shows up as a new warning. Thanks for your feedback. Knowing that HDDs have checksums as well makes it a lot easier to handle. For which SMART fields do we get notification emails on value change? I cannot test SMART data changes, so I would like to know where to look up which fields are configured to raise notification emails.
  20. So, would you agree with my SMART-based procedure for handling parity check errors? Or is there a point which you would do differently?
  21. Once the S3 plugin is set up correctly, the few watts of idle savings you can still squeeze out are impossible to justify financially. I think you gain much more with better uptime management and with planning how many disks spin at the same time (write mode and data distribution) than with hardware changes. There are a few pitfalls with the S3 plugin:
     - the WOL "g" option sometimes isn't saved correctly and is then missing from the cfg file, so after every settings change briefly check that WOL still works (see the check below)
     - I had to disable the write cache via a post-sleep script, because a cache flush on going to sleep woke the disks up unnecessarily
     - make sure the last post-sleep script command always ends with exit 0; I only just noticed that there is a bug there: https://github.com/bergware/dynamix/issues/59
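     A quick check for the first pitfall (standard ethtool usage; eth0 is a placeholder for your interface):

        # "Wake-on: g" means magic-packet wake-up is armed
        ethtool eth0 | grep 'Wake-on'
        # re-arm it manually if the plugin setting got lost
        ethtool -s eth0 wol g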
  22. I have researched this topic. This is a mainboard issue. You have to mod the BIOS to keep the disks in standby. There was somebody who tried to fix this and found a hard disk access inside the Linux kernel which is necessary to initialize the disk. For testing, he removed it completely as a POC and deployed his test kernel. Unfortunately this didn't help; during wake-up there was some BIOS-related stuff involved which he couldn't disable. I don't have the reference anymore, but his hack was to physically disconnect the SATA connection partially or completely during this process. But I don't know if this was just an idea or if he actually deployed an Arduino to do it. Maybe modern systems don't do this anymore and you need to revisit the kernel mod?
  23. I have an S3 Sleep question: why do I need to manually disable the write cache on my disks? Because if the drives are spun down and the plugin executes echo mem > /sys/power/state Unraid sends a cache flush and the drives spin up again for no reason (or just to write the cache, which would be strange if it is delayed by 15 minutes). I use this post-sleep command to disable the write cache: printf /dev/sd%s\\n {b..e} | xargs -n 1 hdparm -W0 Would it make sense to add this to the sleep plugin configuration, or is my Unraid configured wrongly? I didn't find this setting in the GUI. And I don't notice any performance issues on write. But the additional spin-up was just unnecessary.
  24. I use this to wake the server on every access: https://github.com/nikp123/wake-on-arp
  25. Do you use a post-sleep script? It behaves quite similarly if the last line returns a non-zero exit code. https://github.com/bergware/dynamix/issues/59 For me, an echo as the last line was a workaround, as sketched below.
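     A minimal post-sleep script with that workaround (the hdparm line is just my write-cache example; the point is that the script must end with exit code 0):

        #!/bin/bash
        # post-sleep: disable the write cache again after wake-up
        printf '/dev/sd%s\n' {b..e} | xargs -n 1 hdparm -W0
        # end with a command that returns 0, otherwise the plugin
        # misbehaves (https://github.com/bergware/dynamix/issues/59)
        echo "post-sleep done"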