Disk Read Error No Smart Errors (SOLVED)

Resident_IT · March 21

My disk is throwing a read error and I can't clear it. I am on Unraid version 6.12.8 (2024-02-15), running on an HPE DL360 Gen9 with dual E5-2660 v3 CPUs and 96 GB of DDR4 ECC RAM. I ran an extended SMART self-test and no errors occurred. I have attached a syslog, a diagnostics file, and the SMART report. The error happened Mar 21 15:23:59. I have also tried multiple reboots.

tower-diagnostics-20240321-1531.zip tower-smart-20240321-1529.zip tower-syslog-20240321-2218.zip

Edited March 22 by Resident_IT

trurl · March 21

34 minutes ago, Resident_IT said:

attached a sys-log ... and the smart report

Already included in diagnostics.

Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=DRIVER_OK cmd_age=0s
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 Sense Key : 0xb [current] 
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 ASC=0x0 ASCQ=0x0 
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 CDB: opcode=0x2a 2a 08 3a 38 37 30 00 00 08 00
Mar 21 15:12:32 Tower kernel: I/O error, dev sdg, sector 976762672 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 2
Mar 21 15:12:32 Tower kernel: md: disk1 write error, sector=976762608
Mar 21 15:12:32 Tower kernel: XFS (md1p1): log I/O error -5
Mar 21 15:12:32 Tower kernel: XFS (md1p1): Filesystem has been shut down due to log error (0x2).
Mar 21 15:12:32 Tower kernel: XFS (md1p1): Please unmount the filesystem and rectify the problem(s).

This would have disabled disk1 if you had parity. Possibly it is unmountable now.
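As a cross-check on the log lines above, the CDB bytes in the kernel message can be decoded by hand. The sketch below assumes the standard 10-byte SCSI WRITE(10) layout; the function name `decode_write10` is made up for illustration and is not part of any Unraid tool.

```python
import struct

def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # Layout: opcode, flags, 4-byte big-endian LBA, group number,
    # 2-byte big-endian transfer length, control byte.
    opcode, flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {
        "opcode": opcode,
        "fua": bool(flags & 0x08),  # Force Unit Access bit
        "lba": lba,
        "blocks": blocks,
    }

# CDB bytes copied from the "CDB: opcode=0x2a ..." log line.
info = decode_write10("2a 08 3a 38 37 30 00 00 08 00")
print(info["lba"], info["blocks"], info["fua"])

# Sector reported by the md driver in the same log excerpt.
md_sector = 976762608
print(info["lba"] - md_sector)
```

The decoded LBA matches the sector in the "I/O error, dev sdg" line, and it sits 64 sectors above the sector the md driver reports. That gap is consistent with the data partition starting 64 sectors into the device, but the thread does not state the partition layout, so treat that as an assumption.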

Mar 21 15:24:37 Tower root: Fix Common Problems: Error: disk1 (ST1000LM010-9YH146_Z1008ZGE) has read errors ** Ignored
Mar 21 15:24:46 Tower root: Failed to write /mnt/disk1/33297349.tmp
Mar 21 15:24:46 Tower root: Fix Common Problems: Error: Unable to write to disk1

Seems it is at least readonly.
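The "Failed to write /mnt/disk1/33297349.tmp" line suggests the check is simply an attempt to create a temporary file on the disk. A minimal sketch of that kind of write probe is below. It is not Fix Common Problems' actual code, and `can_write` is a hypothetical helper name.

```python
import os
import tempfile

def can_write(directory: str) -> bool:
    """Return True if a small temporary file can be created, written
    and synced in `directory`; return False on any OS-level error."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory, suffix=".tmp") as probe:
            probe.write(b"write probe")
            probe.flush()
            os.fsync(probe.fileno())  # push the data to the device, not just the cache
        return True
    except OSError:
        return False

# Example against the system temp directory; on the server in this
# thread the path to test would be /mnt/disk1.
print(can_write(tempfile.gettempdir()))
```

A read-only or shut-down filesystem makes the file creation fail with an `OSError`, which is why a probe like this reports the disk as not writable even when reads still work.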

Why would you set FCP to ignore read errors???

Reboot and post new diagnostics.

trurl · March 21

1 minute ago, trurl said:

This would have disabled disk1 if you had parity

Possibly since it couldn't become disabled, it became corrupted instead.

Resident_IT · March 21

10 minutes ago, trurl said:

Reboot and post new diagnostics.

Here are the new diagnostics:

tower-diagnostics-20240321-1630.zip

trurl · March 22

Still seeing the same in these except FCP hadn't done its scan yet.

Check Filesystem on disk1. Post the output.

Resident_IT · March 22

3 hours ago, trurl said:

Check Filesystem on disk1. Post the output.

Output of Check Filesystem:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

JorgeB · March 22

Run it again without -n, or see if the filesystem still mounts, it may.

Resident_IT · March 22

5 hours ago, JorgeB said:

Run it again without -n, or see if the filesystem still mounts, it may.

I tried it and nothing changed. This is the output:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

The drive was used. Is it possible that the drive is dead?

Should I reboot the server?
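For anyone comparing the two filesystem check runs in this thread: the first was a no-modify run (it prints "No modify flag set" and skips phase 5), while the second actually went through phase 5. A rough sketch for telling the two apart from pasted output follows. The function name `summarize_xfs_repair` is hypothetical, and the sample strings are abbreviated from the output posted above.

```python
import re

def summarize_xfs_repair(output: str) -> dict:
    """Report which phases ran and whether this was a no-modify (-n) run."""
    phases = sorted({int(n) for n in re.findall(r"Phase (\d+) -", output)})
    dry_run = "No modify flag set" in output
    return {"phases": phases, "dry_run": dry_run}

# Abbreviated from the first (check-only) run.
check_only = (
    "Phase 1 - find and verify superblock...\n"
    "Phase 2 - using internal log\n"
    "Phase 3 - for each AG...\n"
    "Phase 4 - check for duplicate blocks...\n"
    "No modify flag set, skipping phase 5\n"
    "Phase 6 - check inode connectivity...\n"
    "Phase 7 - verify link counts...\n"
    "No modify flag set, skipping filesystem flush and exiting.\n"
)

# Abbreviated from the second run, made without -n.
repair = (
    "Phase 1 - find and verify superblock...\n"
    "Phase 2 - using internal log\n"
    "Phase 3 - for each AG...\n"
    "Phase 4 - check for duplicate blocks...\n"
    "Phase 5 - rebuild AG headers and trees...\n"
    "Phase 6 - check inode connectivity...\n"
    "Phase 7 - verify and correct link counts...\n"
    "done\n"
)

print(summarize_xfs_repair(check_only))
print(summarize_xfs_repair(repair))
```

Neither run reported any filesystem damage, which fits the later finding that the write errors come from the device rather than from XFS metadata.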

Edited March 22 by Resident_IT

Resident_IT · March 22

5 minutes ago, Resident_IT said:

the drive was used and is it possible that the drive is dead?

I have no data on the drive, so it is not a big deal.

JorgeB · March 22

Post new diags after array start.

Resident_IT · March 22

10 minutes ago, JorgeB said:

Post new diags after array start.

done

tower-diagnostics-20240322-0914.zip

JorgeB · March 22

Mar 22 08:31:13 Tower kernel: md: disk1 write error, sector=976762608

Still write errors. SMART looks OK, but the disk can still be failing. Swap both cables with a different disk and restart the array; if the result is the same, the disk is likely failing.
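One way to see that the error persists across reboots is to pull the md error lines out of the syslog and count them per disk and sector. A minimal sketch, assuming the "md: diskN write error, sector=..." format shown in this thread. The read-error variant of the pattern is an assumption, and `count_md_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines such as "md: disk1 write error, sector=976762608".
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")

def count_md_errors(lines):
    """Count md errors keyed by (disk, kind, sector)."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            disk, kind, sector = match.group(1), match.group(2), int(match.group(3))
            counts[(disk, kind, sector)] += 1
    return counts

# Lines copied from the diagnostics quoted in this thread.
sample = [
    "Mar 21 15:12:32 Tower kernel: md: disk1 write error, sector=976762608",
    "Mar 21 15:12:32 Tower kernel: XFS (md1p1): log I/O error -5",
    "Mar 22 08:31:13 Tower kernel: md: disk1 write error, sector=976762608",
]
errors = count_md_errors(sample)
for (disk, kind, sector), n in errors.items():
    print(disk, kind, sector, n)
```

The same sector failing on both days, before and after the filesystem repair, points at the device or its connection rather than at the filesystem, which is why swapping cables is the next step.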

Resident_IT · March 22

10 minutes ago, JorgeB said:

Still write errors, SMART looks OK, but it can still be failing, swap both cables with a different disk and re-start the array, if the same the disk is likely failing.

Still the same thing. Uploading diagnostics in case you want them, but I think the drive is definitely dead.

tower-diagnostics-20240322-1017.zip

JorgeB · March 22

43 minutes ago, Resident_IT said:

i think the drive is definitely dead.

Looks like it is.
