
1,876,356,798 Parity Errors !?


Solved by trurl


Those are read errors, not parity sync errors; there are issues with both disks:

 

Sep  3 06:41:20 Moulin-Rouge kernel: ata3: link is slow to respond, please be patient (ready=0)
Sep  3 06:41:20 Moulin-Rouge kernel: ata1: link is slow to respond, please be patient (ready=0)
Sep  3 06:41:24 Moulin-Rouge kernel: ata3: COMRESET failed (errno=-16)
Sep  3 06:41:24 Moulin-Rouge kernel: ata3: hard resetting link
Sep  3 06:41:24 Moulin-Rouge kernel: ata1: COMRESET failed (errno=-16)
Sep  3 06:41:24 Moulin-Rouge kernel: ata1: hard resetting link
Sep  3 06:41:29 Moulin-Rouge kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Sep  3 06:41:29 Moulin-Rouge kernel: ata3.00: configured for UDMA/133
Sep  3 06:41:29 Moulin-Rouge kernel: ata3: EH complete
Sep  3 06:41:30 Moulin-Rouge kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep  3 06:41:30 Moulin-Rouge kernel: ata1.00: configured for UDMA/133
Sep  3 06:41:30 Moulin-Rouge kernel: ata1: EH complete

 

Do they share a power splitter or something?
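
To see which ports are affected and how often, the resets in a log like the one above can be tallied per ATA port. This is only a minimal sketch: it assumes the syslog lives at /var/log/syslog, and the function name count_ata_resets is made up for illustration.

```shell
#!/bin/sh
# Count "COMRESET failed" and "hard resetting link" kernel messages per ATA port.
# $1 is the path to a syslog file (the path is an assumption, adjust as needed).
count_ata_resets() {
    grep -E 'kernel: ata[0-9]+: (COMRESET failed|hard resetting link)' "$1" |
        sed -E 's/.*kernel: (ata[0-9]+): .*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling `count_ata_resets /var/log/syslog` prints one line per port with its count. For the excerpt above that would be `ata1 2` and `ata3 2`, i.e. both links were reset at the same moment, which is why a shared cause such as power is worth ruling out.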

Link to comment

It's now saying disk1 is disabled... but I can still run SMART tests on it...

 

FT is the parity drive.

HG is disk1 drive.

 

I'm just running an extended SMART test on disk1 and will upload that as well once complete.

ST12000NE0008-1ZF101_ZTN0AZFT-20220903-1253.txt ST12000NE0008-1ZF101_ZTN0AZHG-20220903-1254.txt moulin-rouge-diagnostics-20220903-1303.zip

Edited by dopeytree
Link to comment

Is it possible to re-enable a drive, or is that Unraid's protective measure?

 

I am fairly confident the drives are good, as they are brand new, less than 10 days old.

 

Is a parity check smart enough to stop a drive spinning down? Is that what's caused this, coupled with the DevSlp energy setting?

 

I saw that POWERTOP has been removed from the Tips & Tweaks plugin because it can interfere with some SATA interfaces... is that what's going on?

Edited by dopeytree
Link to comment

Confirmed - can mount & read the disk outside the array just fine. 

 

Looks like I will have to waste hours getting it to rebuild the drive (Rebuilding a drive onto itself)

 

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself

 

60538161_Screenshot2022-09-03at14_08_27.thumb.png.844f76eae3c7f64d9b9732c0a4ace3f4.png

 

How can I stop this happening again?

 

How do the spin-down settings work when the mover or a parity check is already running?

Edited by dopeytree
Link to comment

Can anyone answer the question asked.. can a drive be re-enabled?

 

There's nothing wrong with the cables, controller or power.

I think it must have spun down to a low power state, and from there something has happened and Unraid thinks a drive's broken, but it's not...

 

Anyway, is there a way to bypass Unraid's hissy fit?

 

Both drives pass all SMART checks...

 

I can mount and access disk1. 

 

It's currently rebuilding, but what a waste of time.

 

If Unraid doesn't support DevSlp, that should be documented so that people don't enable it on their motherboards.

 

DevSlp or DevSleep is a feature in some SATA devices which allows them to go into a low power "device sleep" mode when sent the appropriate signal, which uses one or two orders of magnitude less power than a traditional idle. The feature was introduced by SanDisk in a partnership with Intel.

 

 

Screenshot 2022-09-03 at 17.31.13.png

Edited by dopeytree
Link to comment

Unraid disables a disk when a write to it fails, whether due to a disk problem or, more commonly, a connection problem. After a disk becomes disabled it isn't used again until rebuilt, because the failed write is emulated by parity and so the disk is out of sync with the array.

 

The failed write is emulated by updating parity, and after that, any access to the disk is emulated. Reads are emulated from the parity calculation by reading parity and all other disks; writes are emulated by updating parity.

 

That initial failed write, and any subsequent writes to the emulated disk, can be recovered by rebuilding the disk.

 

To get the array in sync again, you either have to rebuild the data disk, or rebuild parity. If you rebuild parity instead of rebuilding the data disk, you would lose all those emulated writes. It is even possible that a failed write could be filesystem metadata, which would leave the filesystem corrupt if not recovered.
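
The difference between the two rebuild options can be illustrated with a toy model. This is a sketch only, assuming single parity computed as the XOR of all data disks and shrinking each "disk" to one byte; the values and variable names are made up, and it is not Unraid's actual implementation.

```shell
#!/bin/sh
# Toy single-parity model: each "disk" is one byte, parity = XOR of all data disks.
xor3() { echo $(( $1 ^ $2 ^ $3 )); }

d1=165; d2=60; d3=15                 # arbitrary example contents of three data disks
parity=$(xor3 "$d1" "$d2" "$d3")

# disk1 is disabled: a read is emulated from parity plus all the other disks.
emulated_d1=$(xor3 "$parity" "$d2" "$d3")

# A write to the emulated disk1 only updates parity; the physical disk still holds d1.
new_d1=200
parity=$(xor3 "$new_d1" "$d2" "$d3")

# Option A: rebuild the data disk from parity, so the emulated write is recovered.
rebuilt_d1=$(xor3 "$parity" "$d2" "$d3")

# Option B: rebuild parity from the stale physical disk, so the emulated write is lost.
reparity=$(xor3 "$d1" "$d2" "$d3")
stale_d1=$(xor3 "$reparity" "$d2" "$d3")

echo "emulated read: $emulated_d1, data rebuild: $rebuilt_d1, parity rebuild: $stale_d1"
```

In this model the data rebuild returns the newly written value, while the parity rebuild silently goes back to the old contents of the physical disk.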

Link to comment

Argh, losing my sh*t a bit here...

 

It's just completed the 13-hour rebuild and it finished fine, averaging 180MB/s.

 

I then rebooted as I noticed on the dashboard it wasn't displaying the CPU usage.

Now it says both disks are missing..

It won't do anything if I click to download the diagnostics.

Also it says bad gateway if I try to load the console..

I'm just attempting a boot via safe mode in case a plugin is causing the issue.

607848115_Screenshot2022-09-04at20_03_17.thumb.png.6f327f3e069691c3e1f9158b3bb5d6d8.png

2124989611_Screenshot2022-09-04at20_20_12.thumb.png.70b19e5fe1a2ab262d6a8a5d95f5a18a.png

Edited by dopeytree
Link to comment

Oh for feck's sake. I told the server to sleep and managed to wake it with Wake-on-LAN, but it didn't stop the array before sleeping, so it's corrupted it again..

My fault for trusting a stupid sleep plugin.

 

I wish Unraid was a bit more feature-rich.. that way you wouldn't need the plugins.

 

Sleep & energy saving features should be built in.

 

I'm now exploring using Ubuntu Server with a ZFS pool.

 

I think if my drives were ZFS they would just repair themselves instead of doing a whole rebuild if anything goes mildly wrong..

Link to comment
9 minutes ago, dopeytree said:

Sleep & energy saving features should be built in.

Server hardware that has typically been used with Unraid is not designed with sleep in mind, it's designed to run 24/7/365 for years on end without hiccup. The extremely wide variety of hardware that can be used with Unraid means it's not possible to support sleep natively with any level of success, it's all dependent on the hardware combination in use.

 

You could have a perfectly sleep compatible system, add a server grade HBA, and suddenly sleep causes all sorts of issues because the HBA doesn't support it.

 

Instead of sleep, investigate safely shutting down then powering back up with WOL.

Link to comment

Thanks & very true, but there is an energy crisis & most homelab people have been thinking about energy usage for the past 6 months or so...

 

I think I will just leave it turned on 24/7.

I got the CPU speed down in the BIOS so it doesn't run at half tilt (2.8GHz) while idle.

It now runs at 800MHz.

It idles at 43 watts, which is about £0.30 a day.
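
For anyone wanting to redo that sum with their own numbers, here is a rough sketch. The 29p/kWh tariff is an assumption picked to match the figure above, not a quoted price, and daily_cost is a made-up helper name.

```shell
#!/bin/sh
# daily_cost WATTS PENCE_PER_KWH
# Prints the energy used in 24 hours and what it costs per day.
daily_cost() {
    awk -v w="$1" -v p="$2" 'BEGIN {
        kwh = w * 24 / 1000                       # watts * hours / 1000 = kWh
        printf "%.3f kWh/day, GBP %.2f/day\n", kwh, kwh * p / 100
    }'
}

daily_cost 43 29
```

43 W over 24 hours is 1.032 kWh, so at the assumed 29p/kWh it comes to roughly £0.30 a day.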

 

 

I remembered that the powertop forum thread recommends some changes to the go config file.

 

So although the Tips & Tweaks plugin removes the powertop package, any modifications made to the go file remain.

 

This was the file:

 



#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &

# ------------------------------------------------
# Disables FTP & Telnet 
# ------------------------------------------------
sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
/etc/rc.d/rc.inetd restart

# -------------------------------------------------
# Set power-efficient CPU governor
# -------------------------------------------------
/etc/rc.d/rc.cpufreq powersave

# -------------------------------------------------
# Wake On Lan Ethernet
# -------------------------------------------------

ethtool -s eth0 wol g

# -------------------------------------------------
# powertop tweaks
# -------------------------------------------------

# Enable SATA link power management
echo med_power_with_dipm | tee /sys/class/scsi_host/host*/link_power_management_policy

# Runtime PM for I2C Adapter (i915 gmbus dpb)
echo auto | tee /sys/bus/i2c/devices/i2c-*/device/power/control

# Autosuspend for USB device
echo auto | tee /sys/bus/usb/devices/*/power/control

# Runtime PM for disk
echo auto | tee /sys/block/sd*/device/power/control

# Runtime PM for PCI devices
echo auto | tee /sys/bus/pci/devices/????:??:??.?/power/control

# Runtime PM for ATA devices
echo auto | tee /sys/bus/pci/devices/????:??:??.?/ata*/power/control

 

If anyone else comes across these issues, it's important to remove ALL the POWERTOP TWEAKS:

 

So my file now looks like this:

 



#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &

# ------------------------------------------------
# Disables FTP & Telnet 
# ------------------------------------------------
sed -i -e 's/^telnet/#telnet/;s/^ftp/#ftp/' /etc/inetd.conf
/etc/rc.d/rc.inetd restart

# -------------------------------------------------
# Set power-efficient CPU governor
# -------------------------------------------------
/etc/rc.d/rc.cpufreq powersave

# -------------------------------------------------
# Wake On Lan Ethernet
# -------------------------------------------------

ethtool -s eth0 wol g
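
Editing the go file only changes what runs at the next boot, so it can be worth checking what the running system is actually using. A minimal sketch, assuming the same sysfs path the removed tweak wrote to; the function names are made up, and the optional argument exists only so the functions can be pointed at a test directory instead of /sys.

```shell
#!/bin/sh
# Print the SATA link power management policy of every SCSI host.
# $1 is an optional sysfs root (defaults to /sys).
show_lpm_policy() {
    root=${1:-/sys}
    for f in "$root"/class/scsi_host/host*/link_power_management_policy; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Succeed only if no host reports anything other than max_performance.
lpm_is_max_performance() {
    ! show_lpm_policy "$1" | grep -qv ': max_performance$'
}
```

Running `show_lpm_policy` on the server lists each host with its policy. If any still shows med_power_with_dipm, a reboot with the cleaned go file should clear it, since these sysfs values are not persistent.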

 

 

 

 

Am I right in thinking ZFS wouldn't need, or take as long, to fix the drive after a bad shutdown etc.?

Edited by dopeytree
Link to comment

I noticed after a restart it's changed the server name... from moulin-rouge to tower.

 

Urghhh.. It's halfway through the re-build. 

It's at 37%.

 

Anyway, I'm now noticing that the logs are saying:

 

Sep  6 10:17:45 Tower kernel: ata4: COMRESET failed (errno=-16)
Sep  6 10:17:45 Tower kernel: ata1: COMRESET failed (errno=-16)
Sep  6 10:17:45 Tower kernel: ata1: hard resetting link
Sep  6 10:17:50 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Sep  6 10:17:50 Tower kernel: ata4.00: configured for UDMA/133
Sep  6 10:17:50 Tower kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

 

The rebuild has slowed down to 37MB/s from 240MB/s.

 

422408079_Screenshot2022-09-06at18_28_52.thumb.png.9dc72222bb4043e040ca3be4e986731c.png

 

How do I narrow it down more as to what the problem is?

 

I have changed the SATA cables.

It's plugged into the SATA ports on the motherboard, a Gigabyte 560 D3H.

The SMART monitor passes for both drives...
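
One way to narrow it down is to work out which physical drive sits behind each ataN port named in the log. A rough sketch, assuming the usual libata sysfs layout where the resolved path of /sys/block/sdX contains an ataN component; the function names are made up for illustration.

```shell
#!/bin/sh
# Extract the ataN component from a resolved sysfs block-device path.
# Prints nothing if the path has no ataN component (e.g. a USB disk).
ata_port_of_path() {
    printf '%s\n' "$1" | sed -n -E 's#.*/(ata[0-9]+)/.*#\1#p'
}

# Print "ataN sdX" for every sd* block device that hangs off an ATA port.
list_ata_ports() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue
        port=$(ata_port_of_path "$(readlink -f "$dev")")
        [ -n "$port" ] && printf '%s %s\n' "$port" "$(basename "$dev")"
    done
    return 0
}
```

Running `list_ata_ports` and matching ata1 and ata4 from the log against the output shows which sdX devices, and therefore which drives, are the ones being reset.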

 

tower-diagnostics-20220906-1825.zip

Edited by dopeytree
Link to comment

See the post above; I have already removed that from the go config.

I think you're looking at old diagnostics.

See the current diags from today, 6th September, here: https://forums.unraid.net/applications/core/interface/file/attachment.php?id=169834&key=91c67006a19731592e00a4e57b27a6e3

 

 

 

In the BIOS I have set "Aggressive Link Power Management" to disabled.

 

Powertop is uninstalled.

 

As above I removed the powertop commands from the go config file.

 

I think there is a bug in the latest My Servers plugin... see screenshot.

Screenshot 2022-09-06 at 18.55.49.png

Edited by dopeytree
Link to comment

As it stopped the parity rebuild at 37% today, I took the opportunity to fit the newer CPU, which will give me access to the 2nd M.2 SSD slot on the motherboard.

 

I removed most plugins.

Installed new 11th Gen CPU for 2nd M.2 slot access (it's disabled with the current 10th gen).

Installed new CPU fan.

Removed & reseated all power cables.

Re-freshed the BIOS.

 

With the BIOS refreshed, the energy settings are all back to normal. So basically Unraid has no energy settings built in; it seems to default to running everything at full whack even if the motherboard is set to 'auto'.

 

Every 3.0s: cpufreq-info | grep 'current CPU'                     Tower: Tue Sep  6 20:47:25 2022

  current CPU frequency is 4.29 GHz.
  current CPU frequency is 4.42 GHz.
  current CPU frequency is 4.41 GHz.
  current CPU frequency is 4.27 GHz.
  current CPU frequency is 4.04 GHz.
  current CPU frequency is 4.40 GHz.
  current CPU frequency is 4.23 GHz.
  current CPU frequency is 3.48 GHz.
  current CPU frequency is 4.24 GHz.
  current CPU frequency is 4.40 GHz.
  current CPU frequency is 4.10 GHz.
  current CPU frequency is 3.98 GHz.
  current CPU frequency is 4.41 GHz.
  current CPU frequency is 4.40 GHz.
  current CPU frequency is 4.51 GHz.
  current CPU frequency is 4.51 GHz.
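
The go file earlier in the thread calls `/etc/rc.d/rc.cpufreq powersave`, so it may be worth checking which governor is actually active alongside the frequencies. A minimal sketch using the standard cpufreq sysfs files; governor_summary is a made-up name, the optional argument is only there for testing, and whether the governor explains the clocks above is not established here.

```shell
#!/bin/sh
# Summarise the active cpufreq governor across all cores.
# $1 is an optional sysfs root (defaults to /sys).
governor_summary() {
    root=${1:-/sys}
    cat "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
        sort | uniq -c | awk '{ print $2 ": " $1 " core(s)" }'
}
```

`governor_summary` prints one line per governor in use with the number of cores on it, which makes it easy to spot if the powersave setting did not take effect after the BIOS change.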

 

Booted up, and the drives are there now, whereas before they wouldn't always appear.

 

I have checked the .ident file and yes it's changed but not by me....

 

It has the time zone set for LA, whereas it should be London.

 

Running ls -lah /boot gives the below:

 

root@Tower:~# ls -lah /boot
total 295M
drwx------ 10 root root  16K Dec 31  1969 ./
drwxr-xr-x 19 root root  400 Jul 17  2021 ../
drwx------  3 root root  16K Aug 23 18:33 .Spotlight-V100/
drwx------  3 root root  16K Aug 25 14:36 .TemporaryItems/
drwx------  3 root root  16K Aug 25 14:41 .Trashes/
-rw-------  1 root root  180 Aug 23 18:44 .gitattributes
drwx------  3 root root  16K Sep  6  2022 EFI/
-rw-------  1 root root  16K Sep  6  2022 FSCK0000.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0001.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0002.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0003.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0004.REC
-rw-------  1 root root  32K Sep  6  2022 FSCK0005.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0006.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0007.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0008.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0009.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0010.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0011.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0012.REC
-rw-------  1 root root  16K Sep  6  2022 FSCK0013.REC
-rw-------  1 root root 111M Sep  6  2022 bzfirmware
-rw-------  1 root root   65 Sep  6  2022 bzfirmware.sha256
-rw-------  1 root root 5.9M Sep  6  2022 bzimage
-rw-------  1 root root   65 Sep  6  2022 bzimage.sha256
-rw-------  1 root root  18M Sep  6  2022 bzmodules
-rw-------  1 root root   65 Sep  6  2022 bzmodules.sha256
-rw-------  1 root root 135M Sep  6  2022 bzroot
-rw-------  1 root root  26M Sep  6  2022 bzroot-gui
-rw-------  1 root root   65 Sep  6  2022 bzroot-gui.sha256
-rw-------  1 root root   65 Sep  6  2022 bzroot.sha256
-rw-------  1 root root  30K Sep  6  2022 changes.txt
drwx------ 11 root root  16K Sep  6 12:19 config/
-r--------  1 root root 120K Aug 23 18:33 ldlinux.c32
-r--------  1 root root  68K Aug 23 18:33 ldlinux.sys
-rw-------  1 root root 7.8K Sep  6  2022 license.txt
drwx------  2 root root  16K Sep  6  2022 logs/
-rw-------  1 root root 1.8K Sep  6  2022 make_bootable.bat
-rw-------  1 root root 3.3K Sep  6  2022 make_bootable_linux
-rw-------  1 root root 2.4K Sep  6  2022 make_bootable_mac
-rw-------  1 root root 147K Sep  6  2022 memtest
drwx------  2 root root  16K Sep  6  2022 preclear_reports/
drwx------  2 root root  16K Sep  6  2022 syslinux/

 

1555062197_Screenshot2022-09-06at20_31_25.thumb.png.7a9ae9a1595e50a9e6ed31a87c4cde77.png

 

Edited by dopeytree
Link to comment
