November 18, 2024
Timeline:
- A power outage put me on UPS, which resulted in a clean shutdown.
- Power on, followed by another power outage, led to an unclean shutdown.
- Power on started a parity check, which seemed to hang at < 1 kbps and had not progressed after a week. I attempted pause and cancel from the UI; both were logged in syslog but had no effect. Docker containers worked fine during this time; only the parity check was hung.
- I forced a restart. The parity check kicked off, but the machine hard locked up (powered on but unresponsive to SSH, and no video on IPMI).
- I forced another restart, and now parity is running again (currently ~200 MB/s).

Diagnostics attached; I'm on the latest version.
Attachment: beyonder-nas-diagnostics-20241118-2246.zip
November 19, 2024 (Community Expert; marked as solution)
Not much help beyond general troubleshooting, but here is what I see in the syslog you provided.

Key observations
- Unclean shutdown: The unclean shutdown likely triggered the automatic parity check. If there was ongoing disk activity, or the disks were not properly unmounted, this could lead to inconsistencies.
- Initial parity check issues: A parity speed of < 1 kbps indicates a significant problem, likely related to one of: disk I/O problems; disk health issues or pending sectors; potential filesystem corruption.
- Forced reboot and lock-up: The forced reboot may have made filesystem inconsistencies worse or put additional stress on the hardware, leading to the lock-up.
- Current status: After the second reboot, the parity check is running at ~200 MB/s, which suggests the earlier problem may have been temporary, but further checks are needed.

Check disk health
Run SMART tests on all drives to identify potential disk issues:
1. Go to Main > Devices in the Unraid GUI and select each drive.
2. Review the SMART attributes, particularly reallocated sectors, pending sectors, and any entries in the SMART error log.
3. Run a short or extended SMART self-test on each drive.
4. Look for any failing drives, or disks that might be causing I/O bottlenecks.

Verify filesystem integrity
The forced shutdown may have left filesystems in a corrupted state. To check and repair them:
1. Stop the array in the Unraid GUI, then start it in Maintenance mode.
2. For each disk, click the disk, then select Check Filesystem.
3. Run the check in read-only mode first to identify issues.
4. If issues are found, repeat the check with the repair option.

Investigate share and pool issues
The logs contain warnings about share configuration: some shares (e.g., backups, docker) have files spread across multiple pools (scratch, scratch_old, and apps). This can cause issues during system startup or array operations.
Action: Consolidate these files onto their intended pools, or correct the share configuration in Shares > (your share) > pool settings.

Monitor Docker and networking
Docker containers kept running despite the parity issues, but the logs show frequent reinitialization of virtual Ethernet interfaces (vethXXXX), which could indicate network instability.
Action: If you're using ipvlan, make sure there are no IP conflicts between the host and the Docker containers. Consider temporarily disabling Docker as a test.

Monitor parity check progress
Parity speeds of ~200 MB/s are typical for modern systems, so let the parity check complete.
Action if the speed drops again: check Main > Array Operation to see which disk is being read, and investigate that disk for potential I/O issues.

Protect against future power outages
Use a UPS (uninterruptible power supply) to prevent hard shutdowns during power outages, and configure Unraid to shut down gracefully when the UPS battery is low: go to Settings > UPS Settings and enable UPS monitoring.
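For the SMART review step in the reply above, the two attributes it names can also be checked from a shell. A minimal sketch, assuming the text comes from `smartctl -A /dev/sdX` on a SATA drive (attribute names can vary by vendor) and using a made-up sample in place of real output; the Unraid GUI already shows the same table, so this is optional.

```python
from typing import Dict

# Attribute names as smartctl typically prints them for SATA drives (assumption: can vary by vendor).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def parse_raw_values(smart_output: str) -> Dict[str, int]:
    """Map attribute name -> RAW_VALUE for rows of the `smartctl -A` attribute table."""
    values: Dict[str, int] = {}
    for line in smart_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE starts at the 10th column.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            values[parts[1]] = int(parts[9])
    return values

def flag_attributes(smart_output: str) -> Dict[str, int]:
    """Return the watched attributes whose raw value is non-zero."""
    raw = parse_raw_values(smart_output)
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) > 0}

# Made-up sample in the shape of `smartctl -A` output (not from the attached diagnostics).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   064   050   000    Old_age   Always       -       36
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

print(flag_attributes(SAMPLE))  # {'Current_Pending_Sector': 8}
```

Any non-zero count is a reason to look more closely at that disk; a single reading does not say whether the count is growing.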
November 19, 2024 (Author)
Neat, thanks. I've now got past the point where the parity check stalled: the speed has dropped to 100 MB/s, but it's at 61%, whereas before it stalled at ~40%. I'll look through the other items and see if there's anything there. Settings look fine elsewhere, and nothing indicates why I needed to force a reboot to kill the parity check. I'll monitor the parity check and see if the system locks up again.
One thing I'm worried about is the USB flash drive's durability: I saw it took a long time to show up in the BIOS boot devices. Is there any check for USB life issues, and could that cause a lock-up/kernel panic?
November 19, 2024 (Community Expert)
Honestly, I'm not sure. Usually it's a matter of waiting until the FCP (Fix Common Problems) plugin complains about media write errors. I use the appdata backup plugin, which also backs up the flash drive when it runs.
What you're asking about is what I'd call a wearout percentage. We can calculate a write error rate (WER), but I'm not really sure, as I have not seen this in Unraid. I currently use Proxmox, and it shows wearout in its web UI.
We need the USB flash mount; in Unraid it is mounted at /boot. But the commands I know would need more Linux access to the drive.
So here is some math: calculating the USB write error rate (WER) percentage.
Example SMART data: Number of Write Errors: 5
In Proxmox we get wearout directly.
So I would recommend running lsblk to get the /dev location of the flash drive, then running SMART on it if possible, to get the data for the formula:
sudo smartctl -a /dev/sdX   # Replace "sdX" with your USB device
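The formula itself did not survive in the post. As a minimal sketch, assuming WER means write errors as a percentage of total write operations (an assumption, not necessarily the formula the poster had in mind), the calculation with the example's 5 write errors looks like this. The total-writes figure is hypothetical, and many USB flash drives may not report SMART data at all, in which case there is nothing to plug in.

```python
def write_error_rate_pct(write_errors: int, total_writes: int) -> float:
    """Write errors as a percentage of total write operations (assumed definition of WER)."""
    if total_writes <= 0:
        raise ValueError("total_writes must be positive")
    if write_errors < 0:
        raise ValueError("write_errors cannot be negative")
    return write_errors / total_writes * 100

# 5 write errors is the example from the post; 1,000,000 total writes is a hypothetical figure.
rate = write_error_rate_pct(5, 1_000_000)
print(f"WER: {rate:.4f}%")  # WER: 0.0005%
```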
November 20, 2024 (Author)
Seems fine, so odd. Thanks for the help. Parity finished fine this time:
Duration: 1 day, 18 hours, 17 minutes, 43 seconds. Average speed: 131.4 MB/s
Something caused it to hang and then lock up, but there are no clues. I guess I'll have to keep an eye on things.
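As a quick sanity check on the reported figures: assuming the average speed is simply the parity size divided by the elapsed time, multiplying back gives roughly how much data each disk had read. This comes out near 20 TB, which would suggest a ~20 TB parity drive; that is an inference, since the thread never states the drive sizes, and it assumes decimal units (1 TB = 1,000,000 MB).

```python
from datetime import timedelta

# Figures from the post.
duration = timedelta(days=1, hours=18, minutes=17, seconds=43)
avg_speed_mb_s = 131.4

seconds = duration.total_seconds()
# Assumes decimal units: 1 TB = 1,000,000 MB.
data_tb = avg_speed_mb_s * seconds / 1_000_000

print(f"{seconds:.0f} s elapsed, about {data_tb:.1f} TB read")  # 152263 s elapsed, about 20.0 TB read
```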
November 21, 2024 (Community Expert)
Just learned this today. There's a lot on the forum in the form of an FAQ, so I missed this. AMD C-states can cause instability:
November 25, 2024
On 11/18/2024 at 5:54 PM, scs3jb said: "Power outage put me on ups, which resulted in clean shutdown. Power on followed by power outage led to unclean shutdown."
This is exactly why it's recommended to shut down based on time on battery first. Most of the time, if the power is out for more than a minute or so, it's going to stay out longer than the UPS can reasonably handle, so it's better to start the shutdown as soon as possible and use as little battery as possible. When the power does come back, a second outage is a distinct possibility, so waiting to boot until the UPS battery has had time to recharge is a good idea, and recharge rates are SLOW. Typical would be 10X: if the UPS runs on battery for 5 minutes, it will probably take close to an hour to recharge back to full. Also, batteries don't need to be replaced as often if you don't drain them below 50%.
I have all my desktops and VMs set to shut down after 1 minute of power outage; they get their cue from a local apcupsd in slave mode watching the apcupsd master on Unraid. That gets the ball rolling, and hopefully all the VMs are down by the time the Unraid master shutdown is commanded at 3 minutes. All the network infrastructure stays powered until it's manually shut down or its dedicated UPS is dead.
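The policy described above can be sketched as a simple decision rule: each machine shuts down once it has been on battery for its own timeout, with a charge floor as a safety net, plus the ~10x recharge rule of thumb. This is a hypothetical illustration, not apcupsd itself; in apcupsd the same ideas are roughly what the TIMEOUT (seconds on battery) and BATTERYLEVEL (percent) settings express. The 60 s and 180 s timeouts mirror the post, and the 50% floor follows its battery-life remark.

```python
from dataclasses import dataclass

@dataclass
class ShutdownPolicy:
    # Seconds on battery before this machine starts shutting down
    # (the post uses ~60 s for desktops/VMs and ~180 s for the Unraid master).
    timeout_s: int
    # Safety net: also shut down once charge falls to this level (hypothetical default).
    min_charge_pct: float = 50.0

    def should_shut_down(self, on_battery_s: int, charge_pct: float) -> bool:
        return on_battery_s >= self.timeout_s or charge_pct <= self.min_charge_pct

def recharge_minutes(runtime_min: float, ratio: float = 10.0) -> float:
    """Rough recharge time, assuming recharging takes ~`ratio` times the time spent on battery."""
    return runtime_min * ratio

desktop = ShutdownPolicy(timeout_s=60)
unraid = ShutdownPolicy(timeout_s=180)

# 90 s into an outage with 85% charge left: desktops go down, Unraid keeps running.
print(desktop.should_shut_down(90, 85.0), unraid.should_shut_down(90, 85.0))  # True False
print(recharge_minutes(5))  # 50.0
```

The 50-minute result for 5 minutes on battery matches the post's "close to an hour" estimate.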
December 26, 2024 (Author)
On 11/25/2024 at 9:55 PM, JonathanM said: "This is exactly why it's recommended to shutdown based on time on battery first. [...]"
AFAIK it doesn't prevent you from powering back on when it's low on battery. I didn't have problems with the first clean shutdown; it was the second power cut that was the problem. A clean shutdown takes 5-10 minutes with 18 disks and loads of Docker containers.
Edited December 26, 2024 by scs3jb
December 26, 2024
1 minute ago, scs3jb said: "afaik it prevents powering back on when its low on battery."
What is "it"?
December 26, 2024 (Author)
After the first power outage, Unraid booted up and started the array while the battery was still low. It did not shut down in time when the second power cut happened.
Edited December 26, 2024 by scs3jb
December 26, 2024
Unraid doesn't check and act on UPS status until way too late in the startup process. You need to make sure there is enough battery left to fully boot up and cleanly shut down before you start Unraid.
December 26, 2024 (Author)
Just now, JonathanM said: "Unraid doesn't check and act on UPS status until way too late in the startup process. [...]"
Yeah, so I think I was screwed. My UPS isn't big enough to stay above 50% during a power outage; the clean shutdown just takes too long. Anyway, luckily, after aborting the parity check and running it again, it went back to normal.
I wonder if it's worth checking UPS status and delaying starting the array, or is that too big a change? I think the problem is that my motherboard is set to restore the previous power state, and my networking gear drained the battery completely over time, so this was all 'automated'.
P.S. I just noticed the reply was from back in November; I thought it was a new notification.
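On the question of checking UPS status before starting the array, the decision itself can be sketched. This is a hypothetical helper, not an Unraid feature: it parses `KEY : value` lines in the style of `apcaccess status` output and only allows a start when the UPS is on mains power and the reported runtime covers a boot, a full clean shutdown, and a margin. The 5-minute boot time and 5-minute margin are assumptions; the 10-minute shutdown is the worst case from the post. Actually starting the array (for example, after turning off array auto-start) is left out.

```python
from typing import Dict

def parse_apcaccess(text: str) -> Dict[str, str]:
    """Parse 'KEY : value' lines (as printed by `apcaccess status`) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def safe_to_start_array(fields: Dict[str, str], boot_min: float,
                        shutdown_min: float, margin_min: float = 5.0) -> bool:
    """True only if the UPS is on mains power and its runtime estimate covers
    booting, a full clean shutdown, and a safety margin (all in minutes)."""
    if not fields.get("STATUS", "").startswith("ONLINE"):
        return False  # still on battery, or status unknown: don't start
    # TIMELEFT looks like "12.0 Minutes"; keep just the number.
    timeleft_min = float(fields.get("TIMELEFT", "0 Minutes").split()[0])
    return timeleft_min >= boot_min + shutdown_min + margin_min

# Hypothetical readings shortly after power returns, with the battery still low.
SAMPLE = """\
STATUS   : ONLINE
BCHARGE  : 40.0 Percent
TIMELEFT : 12.0 Minutes
"""

fields = parse_apcaccess(SAMPLE)
# 5 min boot (assumed) + 10 min clean shutdown + 5 min margin = 20 min needed, only 12 available.
print(safe_to_start_array(fields, boot_min=5, shutdown_min=10))  # False
```

Using the runtime estimate rather than charge percentage ties the check directly to how long a clean shutdown takes on this particular system.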