(solved again) Array has 5 disks with read errors.

Pjhal · May 18, 2021

As the title says.

The screenshot shows the disks; diagnostics logs are included.

Also:

When using Krusader I cannot access disk 3.

/mnt/disk3

is somehow a zero-byte file and not a folder.

silverstone-diagnostics-20210518-2122.zip

I put Unraid in maintenance mode and ran a short SMART test on Disk 3; it completed with no errors.

Disk 3 XFS check with -n

**********************************************************************************************************

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ALERT: The filesystem has valuable metadata changes in a log which is being
    ignored because the -n option was used. Expect spurious inconsistencies
    which may be resolved by first mounting the filesystem to replay the log.
            - scan filesystem freespace and inode maps...
    sb_fdblocks 456165658, counted 458312794
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 6
            - agno = 4
            - agno = 7
            - agno = 5
    No modify flag set, skipping phase 5
    Phase 6 - check inode connectivity...
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify link counts...
    No modify flag set, skipping filesystem flush and exiting.

**********************************************************************************************************
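For a sense of scale, the `sb_fdblocks` line in the output above can be converted into a size. This is only a rough sketch: it assumes the common 4 KiB XFS block size, which is not shown in the posted output, so the byte figure is an estimate. The ALERT in the output already warns that inconsistencies like this are expected while the log has not been replayed.

```python
# Free-block counts copied from the xfs_repair -n output above.
SB_FDBLOCKS = 456165658   # value stored in the superblock
COUNTED = 458312794       # value xfs_repair counted during its scan

# ASSUMPTION: 4096-byte filesystem blocks; the real block size is not
# shown in the posted output.
BLOCK_SIZE = 4096


def free_block_mismatch(sb_value: int, counted: int, block_size: int) -> tuple[int, float]:
    """Return the mismatch in blocks and in GiB."""
    diff_blocks = counted - sb_value
    diff_gib = diff_blocks * block_size / 2**30
    return diff_blocks, diff_gib


diff_blocks, diff_gib = free_block_mismatch(SB_FDBLOCKS, COUNTED, BLOCK_SIZE)
print(f"{diff_blocks} blocks, about {diff_gib:.2f} GiB under the assumed block size")
```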

Edit: I added the SMART diagnostics of all 5 disks with errors after running a short SMART test on each of them. Disk numbers are appended (Disk 1 etc.).

Disk 3 is also running the extended SMART test atm.

WDC_WD80EMAZ-00W_7HKJT7EJ_35000cca257f1e771-20210518-2240 - Disk 3.txt

WDC_WD80EMAZ-00W_7HKJWUXJ_35000cca257f1f4f1-20210518-2244 - Disk 4.txt

WDC_WD80EZAZ-11T_2SG8U7JJ_35000cca27dc401ba-20210518-2245 Disk 7.txt

WDC_WD80EZAZ-11T_2SG9465F_35000cca27dc4271a-20210518-2244 Disk 6.txt

WDC_WD80EZAZ-11T_7HJJ6AVF_35000cca257e38cc8-20210518-2243 Disk 1.txt

What should I do?

How bad is this?

Edited May 24, 2021 by Pjhal

JorgeB · May 19, 2021

I don't see any controller issues logged, so it's most likely a power/connection problem. Power down the server, check all connections and power back up; the array should be accessible after that.

Pjhal · May 19, 2021

13 hours ago, JorgeB said:

I don't see any controller issues logged, so it's most likely a power/connection problem. Power down the server, check all connections and power back up; the array should be accessible after that.

Thank you for your response.

I have rebooted; Unraid then reported zero errors. Then I started the array in maintenance mode and am now running a parity check (read-only).

After that I'll try starting the array normally.

Pjhal · May 20, 2021

OK, it got worse. I finished the parity check with no errors and then tried to start the array normally; now I have 6 unmountable disks.

That is every data disk except Disk 5...

Edit: I included new diagnostics.

silverstone-diagnostics-20210520-2253.zip

Edited May 20, 2021 by Pjhal

TechTitus · May 20, 2021

24 minutes ago, Pjhal said:

OK, it got worse. I finished the parity check with no errors and then tried to start the array normally; now I have 6 unmountable disks.

That is every data disk except Disk 5...

Edit: I included new diagnostics.

silverstone-diagnostics-20210520-2253.zip

I'm having the same issues. Are these shucked drives?

Edited May 20, 2021 by TechTitus

Pjhal · May 20, 2021

Yes, they are, but the disks themselves are fine according to SMART. This happened after upgrading to 6.9.2 and then downgrading again to 6.8.3, so I am hoping that it is just some limited file inconsistency and not a major failure of the hard drives or the whole array.

TechTitus · May 20, 2021

2 minutes ago, Pjhal said:

Yes, they are, but the disks themselves are fine according to SMART. This happened after upgrading to 6.9.2 and then downgrading again to 6.8.3, so I am hoping that it is just some limited file inconsistency and not a major failure of the hard drives or the whole array.

Yep, I'm having the exact same issue, with UDMA CRC errors as well. I'm going to swap power supplies to see if it's a power issue.

JorgeB · May 21, 2021

Read errors on multiple disks:

May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=24
May 20 22:48:29 Silverstone kernel: Buffer I/O error on dev md1, logical block 0, async page read
### [PREVIOUS LINE REPEATED 1 TIMES] ###
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=32
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=40
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=48

This is likely a power, connection or controller problem.
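To see the pattern at a glance, the `md:` lines in the excerpt above can be grouped per disk. A minimal sketch: the log text is copied from the excerpt, while the regular expression and the function name `summarize` are illustrative choices, not part of any Unraid tool.

```python
import re

# md lines copied from the syslog excerpt above.
SYSLOG = """\
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=32
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=40
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=48
"""

# Matches the "md: diskN read error, sector=S" part of a syslog line.
MD_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def summarize(log: str) -> dict[int, list[int]]:
    """Map disk number -> list of failing sectors, in log order."""
    result: dict[int, list[int]] = {}
    for line in log.splitlines():
        match = MD_ERROR.search(line)
        if match:
            result.setdefault(int(match.group(1)), []).append(int(match.group(2)))
    return result


errors = summarize(SYSLOG)
for disk, sectors in sorted(errors.items()):
    print(f"disk{disk}: {len(sectors)} read errors, sectors {sectors}")
```

Four disks failing in the same second on their very first sectors fits a shared cause better than four independent disk faults, which is consistent with the power, connection or controller suggestion.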

Pjhal · May 21, 2021

5 hours ago, JorgeB said:

Read errors on multiple disks:

May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=24
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=8
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=16
May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=24
May 20 22:48:29 Silverstone kernel: Buffer I/O error on dev md1, logical block 0, async page read
### [PREVIOUS LINE REPEATED 1 TIMES] ###
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=32
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=40
May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=48

This is likely a power, connection or controller problem.

But this issue happened after downgrading from 6.9.2 back to 6.8.3; nothing else changed. I also read that some people had compatibility issues with the newer version.

I use a Broadcom SAS 9300-8i HBA:

https://www.broadcom.com/products/storage/host-bus-adapters/sas-9300-8i

What can I do to fix this? I understand that it is hypothetically possible that my power supply failed or that it is a cable failure, but it seems incredibly unlikely to me that this happens at the exact time that I run into OS issues due to updating and downgrading my OS version.

Edit: OK, I disconnected and reconnected the HBA and my array is back, so maybe it was a badly plugged-in connector?

Edited May 21, 2021 by Pjhal

JorgeB · May 21, 2021

32 minutes ago, Pjhal said:

But this issue happened after downgrading from 6.9.2 back to 6.8.3; nothing else changed.

It's still a hardware issue.

JorgeB · May 21, 2021

33 minutes ago, Pjhal said:

Edit: OK, I disconnected and reconnected the HBA and my array is back, so maybe it was a badly plugged-in connector?

Missed the edit, possibly.

Pjhal · May 21, 2021

3 minutes ago, JorgeB said:

Missed the edit, possibly.

Thank you for your responses btw!

How should I handle the 22 errors that 6 disks are reporting?

JorgeB · May 21, 2021

Rebooting will clear them.

Pjhal · May 21, 2021

3 hours ago, JorgeB said:

Rebooting will clear them.

As far as I can tell, everything seems to be normal and working again. Thanks again.

Pjhal · May 21, 2021

5 hours ago, Pjhal said:

As far as I can tell, everything seems to be normal and working again. Thanks again.

Well, the errors are back again, now on Disks 6 and 7.

silverstone-diagnostics-20210522-0008.zip

JorgeB · May 22, 2021

Still looks like a power/connection issue.

Pjhal · May 22, 2021

9 hours ago, JorgeB said:

Still looks like a power/connection issue.

I shut down the server, re-plugged the HBA and all disks, then started it up again.

After some time, new errors appeared:

May 22 18:02:58 Silverstone kernel: mdcmd (58): spindown 7
May 22 18:09:15 Silverstone kernel: mdcmd (59): spindown 6
May 22 18:15:53 Silverstone kernel: sd 13:0:6:0: [sdh] tag#1409 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 22 18:15:53 Silverstone kernel: sd 13:0:6:0: [sdh] tag#1409 Sense Key : 0x5 [current]
May 22 18:15:53 Silverstone kernel: sd 13:0:6:0: [sdh] tag#1409 ASC=0x20 ASCQ=0x0
May 22 18:15:53 Silverstone kernel: sd 13:0:6:0: [sdh] tag#1409 CDB: opcode=0x88 88 00 00 00 00 01 0b b7 0b 50 00 00 00 08 00 00
May 22 18:15:53 Silverstone kernel: print_req_error: critical target error, dev sdh, sector 4491512656
May 22 18:15:53 Silverstone kernel: md: disk6 read error, sector=4491512592
May 22 18:15:53 Silverstone kernel: sd 13:0:5:0: [sdg] tag#1414 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 22 18:15:53 Silverstone kernel: sd 13:0:5:0: [sdg] tag#1414 Sense Key : 0x5 [current]
May 22 18:15:53 Silverstone kernel: sd 13:0:5:0: [sdg] tag#1414 ASC=0x20 ASCQ=0x0
May 22 18:15:53 Silverstone kernel: sd 13:0:5:0: [sdg] tag#1414 CDB: opcode=0x88 88 00 00 00 00 01 0b b7 0b 50 00 00 00 08 00 00
May 22 18:15:53 Silverstone kernel: print_req_error: critical target error, dev sdg, sector 4491512656
May 22 18:15:53 Silverstone kernel: md: disk7 read error, sector=4491512592
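The raw values in the log lines above can be decoded to check that they agree with each other. Opcode 0x88 is the SCSI READ(16) command, and Sense Key 0x5 with ASC 0x20 / ASCQ 0x0 is the standard "ILLEGAL REQUEST, invalid command operation code" combination. The sketch below only uses numbers from the log; the function name `decode_read16` is made up for illustration, and the idea that the 64-sector difference is the partition start offset is an assumption.

```python
# CDB bytes copied from the "CDB: opcode=0x88 ..." log lines above.
CDB_HEX = "88 00 00 00 00 01 0b b7 0b 50 00 00 00 08 00 00"

# Sector values copied from the print_req_error and md lines above.
DEVICE_SECTOR = 4491512656   # reported for sdh / sdg
MD_SECTOR = 4491512592       # reported for disk6 / disk7


def decode_read16(cdb_hex: str) -> dict[str, int]:
    """Decode opcode, LBA and transfer length from a 16-byte READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    return {
        "opcode": cdb[0],
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2..9, big-endian
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10..13, big-endian
    }


decoded = decode_read16(CDB_HEX)
# ASSUMPTION: the gap between device sector and md sector is the offset
# at which the data partition starts on the disk.
offset = DEVICE_SECTOR - MD_SECTOR

print(f"LBA {decoded['lba']}, {decoded['blocks']} blocks, md offset {offset} sectors")
```

The decoded LBA matches the sector in the `print_req_error` lines, so both disks rejected an ordinary 8-block read at the same address rather than reporting a media error.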

The weird thing that stands out to me is that the errors occur after the 2 disks happen to spin down. Could that be related?
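The gap between each spindown and the first error on that disk can be worked out from the timestamps in the log above. A small sketch, assuming all events fall on the same day (they do in the excerpt) so only the time of day is parsed; the dictionary and function names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above (all on May 22).
SPINDOWN = {7: "18:02:58", 6: "18:09:15"}
FIRST_ERROR = {7: "18:15:53", 6: "18:15:53"}


def seconds_after_spindown(disk: int) -> int:
    """Seconds between the spindown command and the first read error."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(FIRST_ERROR[disk], fmt) - datetime.strptime(SPINDOWN[disk], fmt)
    return int(delta.total_seconds())


for disk in sorted(SPINDOWN):
    print(f"disk{disk}: first error {seconds_after_spindown(disk)} s after spindown")
```

Both errors land within a few minutes of the disks being spun down, and at the same second on the same sector, which would fit a read that hit both sleeping disks at once. That is only a correlation from one log excerpt, not proof.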

Also, if it is a hardware defect... I don't have a spare HBA, a properly sized power supply or a SAS cable to do any testing (by swapping them out), so I am at a loss as to how I should handle this right now.

Is there anything I can do?

silverstone-diagnostics-20210522-1828.zip

JorgeB · May 23, 2021

17 hours ago, Pjhal said:

Could that be related?

It could, though I don't remember spin-down issues with WDs. Try disabling spin down to see if it changes anything.

Pjhal · May 24, 2021

On 5/23/2021 at 11:39 AM, JorgeB said:

It could, though I don't remember spin-down issues with WDs. Try disabling spin down to see if it changes anything.

After disabling spin down on all disks and restarting the server, it has now been running for 1 day and 3 hours without any errors, so I am assuming it is fixed.
