MotherBoard Failure - Replaced - Parity Errors

snuffy47 · August 28, 2021

Howdy all

Well, I had a motherboard taken out in a power outage. My fault: my UPS was not working (dead battery).....

All is fixed now; however, my parity check hit 3200+ errors on the first run, and the second run came back with 200+.

Looking for some help :)

tower-diagnostics-20210828-1226.zip

JorgeB · August 29, 2021

First you need to fix this:

Aug 25 21:21:13 Tower kernel: md: disk2 read error, sector=54793824
Aug 25 21:21:13 Tower kernel: md: disk2 read error, sector=54793832
Aug 25 21:21:13 Tower kernel: md: disk2 read error, sector=54793840
Aug 25 21:21:13 Tower kernel: md: disk2 read error, sector=54793848

The disk appears to be failing; an extended SMART test will confirm.

snuffy47 · August 29, 2021

I think the SMART test completed.

tower-smart-20210829-1204.zip

trurl · August 29, 2021

# 1  Extended offline    Aborted by host               80%     46043         -

Disable spindown on that disk and run again.

snuffy47 · August 30, 2021

Howdy

Well, that took some time, but it completed, and it indicated no errors.

Help is always appreciated

tower-smart-20210830-1549.zip

trurl · August 30, 2021

Still, that disk does have an attribute (3, spin up time, not usually monitored) with something in the FAIL column. And it has a pending sector. And it is over 5 years old. And you are having problems caused by the disk.

I think I would retire it.

trurl · August 30, 2021

From your earlier diagnostics, it looks like disk2 (and disk8) are empty or mostly so. Is that expected?

snuffy47 · August 30, 2021

Howdy

You are correct regarding the 2 disks. A while back I upgraded some equipment and disks. They are associated with 2 other disks, but the high-water allocation setting has not started using them again.

If I were going to do anything with Disk 2 right now, I would just remove it from the system.

I have not noticed any performance problems to date, though response time for Plex movies has seemed slow the last few days. I figured the parity checks and SMART test were causing some of that...

Happy to provide further details

trurl · August 30, 2021

All bits of all other disks must be reliably read to reliably rebuild a disk, so all disks in the array are important whether they have anything on them or not.
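To make that concrete, here is a toy sketch in Python. It assumes single parity is a plain bitwise XOR across the data disks (the way single parity is usually described) and uses hypothetical 4-byte "disks" rather than real sectors. A missing disk can be rebuilt from parity plus every other disk, but a single bad bit read from any other disk, even an "empty" one full of zeros, ends up in the rebuilt data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte "disks"; disk 0 is "empty" (all zeros) but still counts.
data_disks = [bytes(4), b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data_disks)  # what the parity disk would store

# Rebuild disk 1 from parity plus every OTHER data disk.
survivors = [d for i, d in enumerate(data_disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])

# Same rebuild, but the "empty" disk returns one flipped bit during the read.
bad_read = [b"\x01\x00\x00\x00", data_disks[2]]
rebuilt_bad = xor_blocks(bad_read + [parity])

print(rebuilt == data_disks[1])      # True: clean reads give a correct rebuild
print(rebuilt_bad == data_disks[1])  # False: one bad bit elsewhere corrupts it
```

With two disks missing or unreadable, the same XOR leaves two unknowns and only one equation, which is why single parity can only cover one bad disk at a time.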

1 minute ago, snuffy47 said:

If I were going to do anything with Disk 2 right now, I would just remove it from the system.

https://wiki.unraid.net/Manual/Storage_Management#Removing_data_disk.28s.29

snuffy47 · August 30, 2021

trurl

Well, I have a disk I can replace it with, so I will go that route. Is there anything else I should do before that?

trurl · August 30, 2021

Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected? It is important to take care of any problem immediately so you don't get multiple problems that you may not easily or fully recover from.

snuffy47 · August 30, 2021

I get email notifications; I'm just not sure how well I monitor them.

Do you have a recommendation? I am sure it will run overnight when I change it.

trurl · August 30, 2021

39 minutes ago, snuffy47 said:

when I change it

Make sure you double check connections. That is the main reason people have problems with replace/rebuild.

snuffy47 · September 2, 2021

In panic mode.....

Have not opened up box or changed anything yet

What I started was a SMART test on disk 1 and disk 3, with the intention of testing all my drives and posting the results. These are attached, though my server started a scheduled parity check that I forgot to turn off.

Disk 3 is spitting out a ton of raw read errors.

I am wondering if there is something loose ;( One item I should note: I have a 5-disk 5.25" insert. I was thinking I should eliminate it from the system as it is old, but I am 1 slot short in my server to do that.

Going to back things up now and, I guess, go from there.

RAID Forums.zip tower-smart-20210902-1327 (1).zip tower-smart-20210902-1327.zip

Edited September 3, 2021 by snuffy47

snuffy47 · September 3, 2021

Update

Not sure if it was the correct approach, but I had 2 disks that let me back up the files I would rather not lose but could live without. The crazy part is that 3/4 of my data is media that falls under that category. The files I never want to lose, I already keep on an external backup.

I ordered some new drives in case replacement is required, and also because backing up the files used up my spares. Like I said, maybe not the best idea, but I am still not sure what is causing all the problems.

The raw read errors seem to have stopped now that I canceled the parity check.

General plan at this point (hoping that, if I am off track, the more experienced will correct me):

1. Back up Data

2. Check connections on everything and run for a few days. Struggling with this, as I feel there is failing hardware, but maybe not.

3. If the problem continues, replace drive 3, hope it rebuilds, and run for a few days.

4. Haven't gone past that.

trurl · September 4, 2021

Serial Number:    S2H7J1BZB15127
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   001   001   051    NOW  41013

The SMART attributes are internal to the disks. Each disk monitors how well it is working as it is used, and records that in these attributes it keeps in its firmware. Unraid monitors some of these attributes by default and will warn you about those it monitors, but Unraid is only looking at some of them, and Unraid only reports exactly what the disk is telling it.
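In these tables VALUE, WORST and THRESH are the disk's own normalized scores (higher is better), while the meaning of RAW_VALUE is vendor-specific. smartctl marks an attribute NOW in the FAIL column when VALUE has dropped to THRESH or below. A minimal Python sketch that applies that rule to the two table lines quoted in this thread; the set of attribute IDs Unraid monitors by default (5, 187, 188, 197, 198) is an assumption, so check the SMART settings on your own server.

```python
from dataclasses import dataclass

# Assumption: attribute IDs Unraid watches by default. Verify on your server.
ASSUMED_DEFAULT_MONITORED = {5, 187, 188, 197, 198}

@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int   # current normalized value, higher is better
    worst: int   # lowest normalized value ever recorded
    thresh: int  # vendor failure threshold (0 means "never fails")
    raw: int     # raw counter; meaning is vendor-specific

    def failing_now(self) -> bool:
        """Mirrors smartctl's NOW flag: normalized value at or below threshold."""
        return self.thresh > 0 and self.value <= self.thresh

    def failed_in_past(self) -> bool:
        """Mirrors smartctl's Past flag: worst value at or below threshold."""
        return self.thresh > 0 and self.worst <= self.thresh

def parse_brief_line(line: str) -> SmartAttribute:
    """Parse one row of the table: ID# NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE.

    Only handles rows whose raw value is a single integer token.
    """
    fields = line.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=int(fields[7]),
    )

rows = [
    "1 Raw_Read_Error_Rate     POSR-K   001   001   051    NOW  41013",
    "5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    976",
]
for row in rows:
    attr = parse_brief_line(row)
    print(attr.attr_id, attr.failing_now(), attr.attr_id in ASSUMED_DEFAULT_MONITORED, attr.raw)
# 1 True False 41013
# 5 False True 976
```

Attribute 1 fails the disk's own threshold test yet sits outside the assumed monitored set, while attribute 5 passes the threshold test even with 976 reallocated sectors. That raw count is the part worth watching on a monitored attribute.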

If the disk says it is failing, believe it.

trurl · September 4, 2021

Serial Number:    ZCT0P3KA
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    976

This one has way too many reallocated sectors for my comfort. And this is one of the attributes Unraid monitors by default, so it should have been telling you about it.

On 8/30/2021 at 4:24 PM, trurl said:

All bits of all other disks must be reliably read to reliably rebuild a disk, so all disks in the array are important whether they have anything on them or not.

You have single parity, and at least 2 disks that can't be trusted. That is one more untrustworthy disk than you have parity disks. If you try to replace/rebuild one of these, the other may make the rebuild unreliable or impossible.

Copying important data off the array is probably the first priority.

Do any of your other disks show SMART warnings on the Dashboard page?

trurl · September 4, 2021

On 8/30/2021 at 4:01 PM, trurl said:

Still, that disk does have an attribute (3, spin up time, not usually monitored) with something in the FAIL column. And it has a pending sector. And it is over 5 years old. And you are having problems caused by the disk.

I think I would retire it.

Forgot about that one. So that is at least 3 unreliable disks in your array. How did you let things get this bad?

snuffy47 · September 4, 2021

Well, aren't I in a mess.

I wasn't having any problems until the power outage knocked my mobo out, but I guess I need to learn how to read the HD counts better. You get used to having no problems, and then you ignore things.

The data backup will take another day.

Do I toss a dart at the wall and try replacing Disk 3 first? The 2TB drives and the 8TB I ordered won't be here until midweek.

Edited September 4, 2021 by snuffy47

trurl · September 4, 2021

That screenshot shows that on 2 of your disks, some of the attributes that Unraid monitors aren't good. So Unraid has been warning you about them.

The other disk isn't showing its problems because the attribute that shows it is failing isn't normally monitored by Unraid.

snuffy47 · September 7, 2021

Well, I can't win for losing on this one....

Just had another crazy storm roll through, and I am having problems mounting the external drive I was using to back up some of my data.

The messages I get are:

Sep 7 18:56:19 Tower unassigned.devices: Adding disk '/dev/sdn1'...
Sep 7 18:56:19 Tower unassigned.devices: Mount drive command: /sbin/mount -t 'ntfs' -o rw,auto,async,noatime,nodiratime,nodev,nosuid,nls=utf8,umask=000 '/dev/sdn1' '/mnt/disks/ST4000DM004-2CV104_ZFN3XMJZ'
Sep 7 18:56:19 Tower unassigned.devices: Mount of '/dev/sdn1' failed: '$MFTMirr does not match $MFT (record 0). Failed to mount '/dev/sdn1': Input/output error NTFS is either inconsistent, or there is a hardware fault, or it's a SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows then reboot into Windows twice. The usage of the /f parameter is very important! If the device is a SoftRAID/FakeRAID then first activate it and mount a different device under the /dev/mapper/ directory, (e.g. /dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation for more details. '

Any help is appreciated

trurl · September 7, 2021

9 minutes ago, snuffy47 said:

my external drive

Is it NTFS? If so, put it in Windows and see if it can fix it.

snuffy47 · September 10, 2021

Well, your suggestion worked.

My drives are in hand; I am going to exchange disk 3 now.

One thing popped up that I have never seen before. I guess it means things are really messed up:

Event: unRAID file corruption
Subject: Notice [TOWER] - bunker verify command
Description: Found 7 files with BLAKE2 hash key corruption
Importance: alert

BLAKE2 hash key mismatch, /mnt/disk3/movies/A-X-L (2018)/A-X-L (2018).mkv is corrupted
BLAKE2 hash key mismatch, /mnt/disk3/movies/50 50 (2011)/50 50 (2011).mkv is corrupted
BLAKE2 hash key mismatch, /mnt/disk3/movies/Aquaman (2018)/Aquaman (2018).mkv is corrupted
BLAKE2 hash key mismatch, /mnt/disk3/movies/Postcards from the Edge (1990)/Postcards from the Edge (1990).mkv is corrupted
BLAKE2 hash key mismatch, /mnt/disk3/movies/Legend of Tarzan, The (2016)/The Legend of Tarzan2016.ISO is corrupted
BLAKE2 hash key mismatch, /mnt/disk3/movies/Mandy (2018)/Mandy (2018).avi is corrupted
BLAKE2 hash key mismatch, /mnt/disk3/movies/The Chronicles of Narnia Prince Caspian (2008)/The Chronicles of Narnia Prince Caspian (2008).mkv is corrupted
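The notice above comes from a "bunker verify" run, which compares each file's current BLAKE2 hash against a hash recorded earlier; a mismatch means the bytes on disk changed after the hash was taken. A minimal Python sketch of that comparison using `hashlib.blake2b`. Whether the plugin uses BLAKE2b or BLAKE2s, and where it keeps the recorded hashes, are assumptions not shown in the notice; the file name below is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def blake2_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the BLAKE2b hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.blake2b()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, expected_hex: str) -> bool:
    """True if the file still matches the hash recorded earlier."""
    return blake2_file(path) == expected_hex

with tempfile.TemporaryDirectory() as tmp:
    movie = Path(tmp) / "example.mkv"          # hypothetical file
    movie.write_bytes(b"some video bytes" * 1000)
    recorded = blake2_file(movie)               # hash taken while the file was good

    data = bytearray(movie.read_bytes())
    data[42] ^= 0xFF                            # simulate one corrupted byte on disk
    movie.write_bytes(bytes(data))

    still_valid = verify(movie, recorded)
    print(still_valid)                          # False
```

A hash check only tells you that a file changed, not why; with a failing disk3, read errors during the rebuild are one plausible cause, so these files are candidates to restore from a backup.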

snuffy47 · September 10, 2021

Well things are rebuilding.....

Watching the drive, I am very unsure why this is so high. It is what the old drive indicated as a failure mode, but the new drive is not flagging anything.

#  ATTRIBUTE NAME        FLAG    VALUE  WORST  THRESHOLD  TYPE      UPDATED  FAILED  RAW VALUE
1  Raw read error rate   0x000f  079    076    006        Pre-fail  Always   Never   74902826

trurl · September 11, 2021

I edited your post and put that SMART into a code block instead of a quote block as you had it. Now it lines up under the headings. Something to consider for future posts.

What model is that disk?

Post new diagnostics
