Disk Read Error No Smart Errors (SOLVED)


Solved by JorgeB

My disk is throwing a read error and I can't clear it. I am on Unraid version 6.12.8 (2024-02-15), running on an HPE DL360 Gen9 with dual E5-2660 v3 CPUs and 96 GB of DDR4 ECC RAM. I ran an extended SMART self-test and no errors occurred. I have attached a syslog, a diagnostics file, and the SMART report. The error happened on Mar 21 at 15:23:59. I have also tried multiple reboots.

 

tower-diagnostics-20240321-1531.zip tower-smart-20240321-1529.zip tower-syslog-20240321-2218.zip

Edited by Resident_IT
34 minutes ago, Resident_IT said:

attached a sys-log ... and the smart report

Already included in diagnostics.

 

Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=DRIVER_OK cmd_age=0s
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 Sense Key : 0xb [current] 
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 ASC=0x0 ASCQ=0x0 
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 CDB: opcode=0x2a 2a 08 3a 38 37 30 00 00 08 00
Mar 21 15:12:32 Tower kernel: I/O error, dev sdg, sector 976762672 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 2
Mar 21 15:12:32 Tower kernel: md: disk1 write error, sector=976762608
Mar 21 15:12:32 Tower kernel: XFS (md1p1): log I/O error -5
Mar 21 15:12:32 Tower kernel: XFS (md1p1): Filesystem has been shut down due to log error (0x2).
Mar 21 15:12:32 Tower kernel: XFS (md1p1): Please unmount the filesystem and rectify the problem(s).

This would have disabled disk1 if you had parity. Possibly it is unmountable now.
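As a cross-check on the log above, the CDB bytes can be decoded by hand. The sketch below assumes the standard 10-byte SCSI WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2 to 5, transfer length in bytes 7 to 8); the function name `decode_write10` is made up for illustration. The decoded LBA should match the sector in the "I/O error, dev sdg" line, and the difference from the md sector is most likely the partition start offset, although the partition table is not shown in this thread.

```python
import re

# Kernel log line copied from the post above.
LINE = ("Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 "
        "CDB: opcode=0x2a 2a 08 3a 38 37 30 00 00 08 00")


def decode_write10(line):
    """Decode a 10-byte SCSI WRITE(10) CDB found in a kernel log line."""
    match = re.search(r"CDB: opcode=0x[0-9a-f]+((?: [0-9a-f]{2})+)", line)
    if match is None:
        raise ValueError("no CDB found in line")
    cdb = bytes.fromhex(match.group(1).replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) command")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of blocks to write
        "fua": bool(cdb[1] & 0x08),                 # Force Unit Access bit
    }


cmd = decode_write10(LINE)
print(cmd)
# Offset between the device sector and the md sector reported in the log.
print("offset from md sector:", cmd["lba"] - 976762608)
```

So the failed command was a small write to sdg at the same location the md driver and XFS then complained about, which ties the three log messages to one event.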

 

Mar 21 15:24:37 Tower root: Fix Common Problems: Error: disk1 (ST1000LM010-9YH146_Z1008ZGE) has read errors ** Ignored
Mar 21 15:24:46 Tower root: Failed to write /mnt/disk1/33297349.tmp
Mar 21 15:24:46 Tower root: Fix Common Problems: Error: Unable to write to disk1

It seems the disk is at least read-only.
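The "Failed to write /mnt/disk1/33297349.tmp" line suggests the check works by trying to create a temporary file on the disk. A similar probe can be sketched as follows; this is only an illustration of the idea, not Fix Common Problems' actual code, and the function name `is_writable` is hypothetical.

```python
import os
import tempfile


def is_writable(mount_point):
    """Return True if a small temporary file can be created and flushed there."""
    try:
        # The file is removed automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=mount_point, suffix=".tmp") as handle:
            handle.write(b"probe")
            handle.flush()
            os.fsync(handle.fileno())  # push the data to the device, not just the cache
        return True
    except OSError:
        # Covers read-only filesystems, I/O errors, and missing paths.
        return False
```

Calling `is_writable("/mnt/disk1")` on the server would be expected to return False while the filesystem is shut down or mounted read-only.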

 

Why would you set FCP to ignore read errors???

 

Reboot and post new diagnostics.

3 hours ago, trurl said:

Check Filesystem on disk1. Post the output.

Output of Check Filesystem:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

5 hours ago, JorgeB said:

Run it again without -n, or see if the filesystem still mounts, it may.

I tried it and nothing changed. This is the output:

 

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

 

 

The drive was a used one. Is it possible that it is dead?

Should I reboot the server?

Edited by Resident_IT
  • Solution
Mar 22 08:31:13 Tower kernel: md: disk1 write error, sector=976762608


There are still write errors. SMART looks OK, but the disk can still be failing. Swap both cables with a different disk and restart the array; if the errors stay with the same disk, it is likely failing.
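To compare the syslog before and after the cable swap, the md error lines can be tallied per disk. This is a rough sketch: the regular expression is modelled on the "md: disk1 write error" lines quoted in this thread, the read-error variant is an assumption, and `count_md_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern based on the md driver lines quoted in this thread.
MD_ERROR = re.compile(r"kernel: md: (disk\d+) (read|write) error, sector=(\d+)")


def count_md_errors(lines):
    """Count md read/write errors per (disk, operation) pair."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Lines copied from the posts above.
sample = [
    "Mar 21 15:12:32 Tower kernel: md: disk1 write error, sector=976762608",
    "Mar 21 15:12:32 Tower kernel: XFS (md1p1): log I/O error -5",
    "Mar 22 08:31:13 Tower kernel: md: disk1 write error, sector=976762608",
]
print(count_md_errors(sample))
```

If the errors follow the disk to its new cables and slot, that points at the disk; if they stay with the old cables and a different disk starts logging them, the cabling or port is the more likely culprit.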

  • Resident_IT changed the title to Disk Read Error No Smart Errors (SOLVED)
