dopeytree Posted September 3, 2022 (edited) Fairly new build, so no critical data. Ran a 2nd parity check and it's showing 1,876,356,798 errors. Same number on the main disk1 and again on the parity drive... Have I done something wrong? Or is this normal for a new build? Diagnostics attached: moulin-rouge-diagnostics-20220903-1127.zip Edited September 3, 2022 by dopeytree
JorgeB Posted September 3, 2022 Those are read errors, not parity sync errors, there are issues with both disks:
Sep 3 06:41:20 Moulin-Rouge kernel: ata3: link is slow to respond, please be patient (ready=0)
Sep 3 06:41:20 Moulin-Rouge kernel: ata1: link is slow to respond, please be patient (ready=0)
Sep 3 06:41:24 Moulin-Rouge kernel: ata3: COMRESET failed (errno=-16)
Sep 3 06:41:24 Moulin-Rouge kernel: ata3: hard resetting link
Sep 3 06:41:24 Moulin-Rouge kernel: ata1: COMRESET failed (errno=-16)
Sep 3 06:41:24 Moulin-Rouge kernel: ata1: hard resetting link
Sep 3 06:41:29 Moulin-Rouge kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep 3 06:41:29 Moulin-Rouge kernel: ata3.00: configured for UDMA/133
Sep 3 06:41:29 Moulin-Rouge kernel: ata3: EH complete
Sep 3 06:41:30 Moulin-Rouge kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 3 06:41:30 Moulin-Rouge kernel: ata1.00: configured for UDMA/133
Sep 3 06:41:30 Moulin-Rouge kernel: ata1: EH complete
Do they share a power splitter or something?
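For anyone wanting to see which ports are affected without reading the whole syslog, a rough sketch along these lines may help. It only counts the two message types shown in the excerpt above (`COMRESET failed` and `hard resetting link`); the function name `count_ata_resets` is made up for illustration and is not an Unraid tool.

```shell
#!/bin/bash
# Count kernel "COMRESET failed" / "hard resetting link" events per ATA port.
# Reads syslog text on stdin and prints "<port> <count>", one port per line.
# count_ata_resets is a hypothetical helper name, not part of Unraid.
count_ata_resets() {
    grep -E 'kernel: ata[0-9]+: (COMRESET failed|hard resetting link)' \
        | sed -E 's/.*kernel: (ata[0-9]+):.*/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

It could be fed the live log with something like `count_ata_resets < /var/log/syslog`, assuming the syslog lives at that path on the system in question. Several ports resetting at the same second, as here, tends to point at something shared rather than at one disk.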
dopeytree Posted September 3, 2022 Author (edited) Many thanks for the speedy reply. It's a NAS case, so 4x drives per backplane. In the BIOS I did turn on SATA DevSlp the other day in a bid to make it more energy efficient, so I will reboot and turn that off. Edited September 3, 2022 by dopeytree
dopeytree Posted September 3, 2022 Author (edited) It's now saying disk1 is disabled... but I can still run SMART tests on it... FT is the parity drive. HG is the disk1 drive. I am just running an extended SMART test on disk1 and will upload that as well once complete. ST12000NE0008-1ZF101_ZTN0AZFT-20220903-1253.txt ST12000NE0008-1ZF101_ZTN0AZHG-20220903-1254.txt moulin-rouge-diagnostics-20220903-1303.zip Edited September 3, 2022 by dopeytree
dopeytree Posted September 3, 2022 Author (edited) Is it possible to re-enable a drive, or is that Unraid's protective measure? I am fairly confident the drives are good, as they are brand new, less than 10 days old. Is parity check smart enough to stop a drive spinning down? Is that what's caused this, coupled with the DevSlp energy setting? I saw that POWERTOP has been removed from the Tips & Tweaks plugin because it can interfere with some SATA interfaces... is that what's going on? Edited September 3, 2022 by dopeytree
dopeytree Posted September 3, 2022 Author (edited) Confirmed: I can mount and read the disk outside the array just fine. Looks like I will have to waste hours getting it to rebuild the drive (Rebuilding a drive onto itself) https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself How can I stop this happening again? How do spin-down settings work when mover or a parity check is already running? Edited September 3, 2022 by dopeytree
trurl Posted September 3, 2022 An array disk can spin down after parity check has checked the full capacity of that particular disk, as when parity is larger than that disk. Disk1 has disconnected. Looks like a hardware problem: cables, controller, power.
dopeytree Posted September 3, 2022 Author (edited) Can anyone answer the question asked: can a drive be re-enabled? There's nothing wrong with the cables, controller or power. I think it must have spun down to a low power state, and from there something has happened and Unraid thinks a drive's broken, but it's not... Anyway, is there a way to bypass Unraid's hissy fit? Both drives pass all SMART checks... I can mount and access disk1. It's currently rebuilding, but what a waste of time. If Unraid doesn't support DEVSLP, that should be documented so that people don't enable it on their motherboards. DevSlp or DevSleep is a feature in some SATA devices which allows them to go into a low power "device sleep" mode when sent the appropriate signal, which uses one or two orders of magnitude less power than a traditional idle. The feature was introduced by SanDisk in a partnership with Intel. Edited September 3, 2022 by dopeytree
trurl Posted September 4, 2022 Unraid disables a disk when a write to it fails, whether due to a disk problem or, more commonly, a connection problem. After a disk becomes disabled it isn't used again until rebuilt, because the failed write is emulated by parity and so the disk is out of sync with the array. The failed write is emulated by updating parity, and after that, any access to the disk is emulated. Reads are emulated from the parity calculation by reading parity and all other disks; writes are emulated by updating parity. That initial failed write, and any subsequent writes to the emulated disk, can be recovered by rebuilding the disk. To get the array in sync again, you either have to rebuild the data disk or rebuild parity. If you rebuild parity instead of rebuilding the data disk, you would lose all those emulated writes. It is even possible that a failed write could be filesystem metadata, which would make the filesystem corrupt if not recovered.
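The emulation described above can be illustrated with a toy single-parity example, where parity is the XOR of the same byte on every data disk. This is only a sketch: the three byte values are invented, `xor_all` is a made-up helper, and a real array does this across whole disks rather than one byte.

```shell
#!/bin/bash
# Single-parity sketch: parity is the XOR of the same byte on every data disk.
# The byte values below are made up for illustration.

# XOR all arguments together and print the result in decimal.
xor_all() {
    local acc=0 v
    for v in "$@"; do
        acc=$(( acc ^ $v ))
    done
    echo "$acc"
}

disk1=0xA5; disk2=0x3C; disk3=0x0F
parity=$(xor_all "$disk1" "$disk2" "$disk3")

# Emulated read of a disabled disk1: XOR parity with the remaining disks.
emulated_disk1=$(xor_all "$parity" "$disk2" "$disk3")

# Emulated write to disk1: only parity changes. The physical disk is untouched,
# which is why it stays out of sync until it is rebuilt.
new_value=0x5A
parity=$(xor_all "$new_value" "$disk2" "$disk3")
rebuilt_disk1=$(xor_all "$parity" "$disk2" "$disk3")

printf 'emulated read: 0x%02X, rebuilt after write: 0x%02X\n' "$emulated_disk1" "$rebuilt_disk1"
```

The physical disk1 in this toy still holds 0xA5 after the emulated write, while the array believes it holds 0x5A. Rebuilding parity from the physical disks at that point would throw the 0x5A away, which is the loss described in the post.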
dopeytree Posted September 4, 2022 Author I've been told by support there is a way to re-enable the drive, but not how yet… Makes sense: if you can mount and read it, there should be a way to re-enable it.
JorgeB Posted September 4, 2022 You can re-enable a drive by doing a new config, but like @trurl mentioned it requires a parity check, so it will take as long as a rebuild.
dopeytree Posted September 4, 2022 Author (edited) Argh, losing my sh*t a bit here... It has just completed the 13-hour rebuild, which went fine and averaged 180MB/s. I then rebooted, as I noticed the dashboard wasn't displaying the CPU usage. Now it says both disks are missing. It won't do anything if I click to download the diagnostics. It also says bad gateway if I try to load the console. I'm just attempting a boot via safe mode in case a plugin is causing the issue. Edited September 4, 2022 by dopeytree
dopeytree Posted September 4, 2022 Author (edited) Ok the array is working fine in safe mode. So it looks to be a plugin or docker problem... Edited September 4, 2022 by dopeytree
dopeytree Posted September 4, 2022 Author After the safe boot I did a reboot to normal mode & it seems back to normal. Don't like not knowing what caused the issue but for now all is working 🙂
dopeytree Posted September 5, 2022 Author Oh for feck's sake. I told the server to sleep and managed to wake it with Wake-on-LAN. But it didn't stop the array before sleeping, so it's corrupted it again. My fault for trusting a stupid sleep plugin. I wish Unraid was a bit more feature rich; that way you wouldn't need the plugins. Sleep & energy saving features should be built in. I'm now exploring using Ubuntu Server with a ZFS pool. I think if my drives were ZFS they would just repair themselves instead of doing a whole rebuild if anything goes mildly wrong.
JonathanM Posted September 5, 2022 9 minutes ago, dopeytree said: Sleep & energy saving features should be built in. Server hardware that has typically been used with Unraid is not designed with sleep in mind; it's designed to run 24/7/365 for years on end without hiccup. The extremely wide variety of hardware that can be used with Unraid means it's not possible to support sleep natively with any level of success; it's all dependent on the hardware combination in use. You could have a perfectly sleep-compatible system, add a server-grade HBA, and suddenly sleep causes all sorts of issues because the HBA doesn't support it. Instead of sleep, investigate safely shutting down, then powering back up with WOL.
dopeytree Posted September 5, 2022 Author (edited) Thanks, and very true, but there is an energy crisis and most homelab people have been thinking about energy usage for the past 6 months or so... I think I will just leave it turned on 24/7. I got the CPU usage down in the BIOS so it doesn't run at half tilt (2.8GHz) while idle. It now runs at 800MHz. It idles at 43 watts, which is about £0.30 a day. I remembered that the powertop forum thread recommends some changes to the go config file. So although the Tips & Tweaks plugin removes the powertop package, any modifications remain. This was the file:
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
# ------------------------------------------------
# Disables FTP & Telnet
# ------------------------------------------------
sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
/etc/rc.d/rc.inetd restart
# -------------------------------------------------
# Set power-efficient CPU governor
# -------------------------------------------------
/etc/rc.d/rc.cpufreq powersave
# -------------------------------------------------
# Wake On Lan Ethernet
# -------------------------------------------------
ethtool -s eth0 wol g
# -------------------------------------------------
# powertop tweaks
# -------------------------------------------------
# Enable SATA link power management
echo med_power_with_dipm | tee /sys/class/scsi_host/host*/link_power_management_policy
# Runtime PM for I2C Adapter (i915 gmbus dpb)
echo auto | tee /sys/bus/i2c/devices/i2c-*/device/power/control
# Autosuspend for USB device
echo auto | tee /sys/bus/usb/devices/*/power/control
# Runtime PM for disk
echo auto | tee /sys/block/sd*/device/power/control
# Runtime PM for PCI devices
echo auto | tee /sys/bus/pci/devices/????:??:??.?/power/control
# Runtime PM for ATA devices
echo auto | tee /sys/bus/pci/devices/????:??:??.?/ata*/power/control
If anyone else comes across these issues, it's important to remove ALL the POWERTOP TWEAKS. So my file now looks like this:
#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
# ------------------------------------------------
# Disables FTP & Telnet
# ------------------------------------------------
sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
/etc/rc.d/rc.inetd restart
# -------------------------------------------------
# Set power-efficient CPU governor
# -------------------------------------------------
/etc/rc.d/rc.cpufreq powersave
# -------------------------------------------------
# Wake On Lan Ethernet
# -------------------------------------------------
ethtool -s eth0 wol g
Am I right in thinking ZFS wouldn't need to take as long to fix the drive after a bad shutdown etc.? Edited September 5, 2022 by dopeytree
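On the 43 watt idle figure in the post above, the daily cost works out as watts / 1000 × 24 hours × price per kWh. The sketch below just does that arithmetic. The 0.29 GBP/kWh tariff is an assumption picked because it reproduces the roughly £0.30 a day quoted; it is not a figure from the thread, and `daily_cost` is a made-up helper name.

```shell
#!/bin/bash
# Daily running cost = watts / 1000 * 24 h * price per kWh.
# daily_cost is a hypothetical helper; the tariff passed in is the caller's own.
daily_cost() {
    local watts="$1" price_per_kwh="$2"
    awk -v w="$watts" -v p="$price_per_kwh" \
        'BEGIN { printf "%.2f\n", w / 1000 * 24 * p }'
}

# Example with an assumed tariff of 0.29 GBP/kWh:
daily_cost 43 0.29
```

At 43 W the machine uses about 1.03 kWh per day, so each extra 10p per kWh on the tariff adds roughly 10p a day.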
trurl Posted September 6, 2022 5 hours ago, dopeytree said:
# ------------------------------------------------
# Disables FTP & Telnet
# ------------------------------------------------
sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
You can disable these in the webUI.
dopeytree Posted September 6, 2022 Author (edited) Noticed after a restart that it's changed the server name from moulin-rouge to tower. Urghhh. It's partway through the rebuild, at 37%. Anyway, right now I'm noticing the logs are saying:
Sep 6 10:17:45 Tower kernel: ata4: COMRESET failed (errno=-16)
Sep 6 10:17:45 Tower kernel: ata1: COMRESET failed (errno=-16)
Sep 6 10:17:45 Tower kernel: ata1: hard resetting link
Sep 6 10:17:50 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Sep 6 10:17:50 Tower kernel: ata4.00: configured for UDMA/133
Sep 6 10:17:50 Tower kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
The rebuild has slowed down to 37MB/s from 240MB/s. How do I narrow down what the problem is? I have changed the SATA cables. The drives are plugged into the SATA ports on the motherboard, a Gigabyte 560 d3h. The SMART monitor passes for both drives... tower-diagnostics-20220906-1825.zip Edited September 6, 2022 by dopeytree
JorgeB Posted September 6, 2022 22 hours ago, dopeytree said:
# Enable SATA link power management
echo med_power_with_dipm | tee /sys/class/scsi_host/host*/link_power_management_policy
This is known to cause SATA link issues with some boards, try disabling that.
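Removing the line from the go file only takes effect on the next boot, so it can be worth checking what the policy is actually set to on the running system. A minimal sketch, using the same sysfs path as the quoted line: the function names are made up, `max_performance` is one of the policy values the kernel accepts, and the optional directory argument exists only so the functions can be tried against a scratch directory.

```shell
#!/bin/bash
# Show, and optionally reset, the SATA link power management policy per host.
# show_lpm_policy / reset_lpm_policy are hypothetical helper names.
# The optional first argument overrides the sysfs directory (for testing only).

show_lpm_policy() {
    local root="${1:-/sys/class/scsi_host}" f
    for f in "$root"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

reset_lpm_policy() {
    local root="${1:-/sys/class/scsi_host}" f
    for f in "$root"/host*/link_power_management_policy; do
        [ -w "$f" ] || continue
        # max_performance disables link power saving on that host.
        echo max_performance > "$f"
    done
}
```

Running `show_lpm_policy` with no arguments as root should list each host and its current policy; if any still show `med_power_with_dipm` after the go file was cleaned up, something else is setting it.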
dopeytree Posted September 6, 2022 Author (edited) See the post above; I have already removed that from the go config. I think you're looking at old diagnostics. See the current diags from today, 6th September, here: https://forums.unraid.net/applications/core/interface/file/attachment.php?id=169834&key=91c67006a19731592e00a4e57b27a6e3 In the BIOS I have set "Aggressive Link Power Management" to disabled. Powertop is uninstalled. As above, I removed the powertop commands from the go config file. I think there is a bug in the latest My Servers plugin... see screenshot. Edited September 6, 2022 by dopeytree
JorgeB Posted September 6, 2022 If that is removed it's possibly a power/connection problem; much less likely, it could also be a board problem.
dopeytree Posted September 6, 2022 Author Ok thanks think I'll try a BIOS refresh to make sure it is ok.
trurl Posted September 6, 2022 1 hour ago, dopeytree said: Noticed after a restart it's changed the server name... from moulin-rouge to tower That suggests something happened to config/ident.cfg and it was recreated with defaults. What do you get from command line with this?
ls -lah /boot
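Alongside the directory listing, the name stored on the flash drive can be read straight from that file. This is a sketch under an assumption: it expects ident.cfg to contain a line of the form `NAME="..."`, which is not confirmed anywhere in this thread, and `server_name` is a made-up helper.

```shell
#!/bin/bash
# Print the server name stored in an ident.cfg-style file.
# Assumes a line of the form NAME="..." (an assumption, not confirmed here).
# server_name is a hypothetical helper; the path defaults to the flash copy.
server_name() {
    local cfg="${1:-/boot/config/ident.cfg}"
    # Keep everything after the first "=", then drop quotes and any CR.
    grep -E '^NAME=' "$cfg" | cut -d= -f2- | tr -d '"\r'
}
```

If `server_name` prints `Tower` on a box that was named something else, that would fit the theory that the file was recreated with defaults.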
dopeytree Posted September 6, 2022 Author (edited) As it stopped the parity rebuild at 37% today, I took the opportunity to fit the newer CPU, which will give me access to the 2nd M.2 SSD slot on the motherboard. I removed most plugins. Installed new 11th Gen CPU for 2nd M.2 slot access (it's disabled with the current 10th gen). Installed new CPU fan. Removed and reseated all power cables. Refreshed the BIOS. With the BIOS refreshed, the energy settings are all back to normal, so basically Unraid has no energy settings built in; it seems to default to running everything at full whack even if the motherboard is set to 'auto'.
Every 3.0s: cpufreq-info | grep 'current CPU' Tower: Tue Sep 6 20:47:25 2022
current CPU frequency is 4.29 GHz.
current CPU frequency is 4.42 GHz.
current CPU frequency is 4.41 GHz.
current CPU frequency is 4.27 GHz.
current CPU frequency is 4.04 GHz.
current CPU frequency is 4.40 GHz.
current CPU frequency is 4.23 GHz.
current CPU frequency is 3.48 GHz.
current CPU frequency is 4.24 GHz.
current CPU frequency is 4.40 GHz.
current CPU frequency is 4.10 GHz.
current CPU frequency is 3.98 GHz.
current CPU frequency is 4.41 GHz.
current CPU frequency is 4.40 GHz.
current CPU frequency is 4.51 GHz.
current CPU frequency is 4.51 GHz.
Booted up, and the drives are there now, whereas before they wouldn't always appear. I have checked the ident.cfg file and yes, it's changed, but not by me... It has the time zone set for LA whereas it should be London. Using ls -lah /boot gives the below:
root@Tower:~# ls -lah /boot
total 295M
drwx------ 10 root root 16K Dec 31 1969 ./
drwxr-xr-x 19 root root 400 Jul 17 2021 ../
drwx------ 3 root root 16K Aug 23 18:33 .Spotlight-V100/
drwx------ 3 root root 16K Aug 25 14:36 .TemporaryItems/
drwx------ 3 root root 16K Aug 25 14:41 .Trashes/
-rw------- 1 root root 180 Aug 23 18:44 .gitattributes
drwx------ 3 root root 16K Sep 6 2022 EFI/
-rw------- 1 root root 16K Sep 6 2022 FSCK0000.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0001.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0002.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0003.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0004.REC
-rw------- 1 root root 32K Sep 6 2022 FSCK0005.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0006.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0007.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0008.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0009.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0010.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0011.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0012.REC
-rw------- 1 root root 16K Sep 6 2022 FSCK0013.REC
-rw------- 1 root root 111M Sep 6 2022 bzfirmware
-rw------- 1 root root 65 Sep 6 2022 bzfirmware.sha256
-rw------- 1 root root 5.9M Sep 6 2022 bzimage
-rw------- 1 root root 65 Sep 6 2022 bzimage.sha256
-rw------- 1 root root 18M Sep 6 2022 bzmodules
-rw------- 1 root root 65 Sep 6 2022 bzmodules.sha256
-rw------- 1 root root 135M Sep 6 2022 bzroot
-rw------- 1 root root 26M Sep 6 2022 bzroot-gui
-rw------- 1 root root 65 Sep 6 2022 bzroot-gui.sha256
-rw------- 1 root root 65 Sep 6 2022 bzroot.sha256
-rw------- 1 root root 30K Sep 6 2022 changes.txt
drwx------ 11 root root 16K Sep 6 12:19 config/
-r-------- 1 root root 120K Aug 23 18:33 ldlinux.c32
-r-------- 1 root root 68K Aug 23 18:33 ldlinux.sys
-rw------- 1 root root 7.8K Sep 6 2022 license.txt
drwx------ 2 root root 16K Sep 6 2022 logs/
-rw------- 1 root root 1.8K Sep 6 2022 make_bootable.bat
-rw------- 1 root root 3.3K Sep 6 2022 make_bootable_linux
-rw------- 1 root root 2.4K Sep 6 2022 make_bootable_mac
-rw------- 1 root root 147K Sep 6 2022 memtest
drwx------ 2 root root 16K Sep 6 2022 preclear_reports/
drwx------ 2 root root 16K Sep 6 2022 syslinux/
Edited September 6, 2022 by dopeytree
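The frequencies in the post above only show the result, not which CPU governor is in force. A quick way to check is to read the standard cpufreq sysfs files, as in this sketch. The function name is made up, and the optional directory argument is only there so it can be pointed at a scratch directory.

```shell
#!/bin/bash
# Summarise which cpufreq governor each CPU is using, as "<governor> <count>".
# governor_summary is a hypothetical helper; the optional argument overrides
# the sysfs directory (for testing only).
governor_summary() {
    local root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

If every CPU reports `performance` even though the go file still calls `/etc/rc.d/rc.cpufreq powersave`, that would suggest the go file line is not taking effect after the BIOS refresh, though this is only one possible reading.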