Resident_IT Posted March 21 (edited)
My disk is throwing a read error and I can't clear it. I am on Unraid version 6.12.8 (2024-02-15), running on an HPE DL360 Gen9 with dual E5-2660 v3 CPUs and 96 GB of DDR4 ECC RAM. I have tried an extended SMART self-test and no errors occurred. I have attached a syslog, a diagnostics file, and the SMART report. The error happened Mar 21 15:23:59. I have also tried multiple reboots.
tower-diagnostics-20240321-1531.zip
tower-smart-20240321-1529.zip
tower-syslog-20240321-2218.zip
Edited March 22 by Resident_IT
trurl Posted March 21
34 minutes ago, Resident_IT said: attached a syslog ... and the SMART report
Already included in diagnostics.
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=DRIVER_OK cmd_age=0s
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 Sense Key : 0xb [current]
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 ASC=0x0 ASCQ=0x0
Mar 21 15:12:32 Tower kernel: sd 1:0:6:0: [sdg] tag#545 CDB: opcode=0x2a 2a 08 3a 38 37 30 00 00 08 00
Mar 21 15:12:32 Tower kernel: I/O error, dev sdg, sector 976762672 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 2
Mar 21 15:12:32 Tower kernel: md: disk1 write error, sector=976762608
Mar 21 15:12:32 Tower kernel: XFS (md1p1): log I/O error -5
Mar 21 15:12:32 Tower kernel: XFS (md1p1): Filesystem has been shut down due to log error (0x2).
Mar 21 15:12:32 Tower kernel: XFS (md1p1): Please unmount the filesystem and rectify the problem(s).
This would have disabled disk1 if you had parity. Possibly it is unmountable now.
Mar 21 15:24:37 Tower root: Fix Common Problems: Error: disk1 (ST1000LM010-9YH146_Z1008ZGE) has read errors ** Ignored
Mar 21 15:24:46 Tower root: Failed to write /mnt/disk1/33297349.tmp
Mar 21 15:24:46 Tower root: Fix Common Problems: Error: Unable to write to disk1
Seems it is at least read-only. Why would you set FCP to ignore read errors???
Reboot and post new diagnostics.
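As a cross-check on the log lines quoted above, the CDB bytes can be decoded by hand. The sketch below assumes the standard SCSI WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8); the variable names are illustrative only. The decoded LBA matches the sector in the I/O error line. The 64-sector gap to the md sector would be consistent with the data partition starting at sector 64, but that is an assumption, not something stated in the thread.

```python
# Decode the 10-byte CDB from the kernel log as a SCSI WRITE(10) command.
cdb_hex = "2a 08 3a 38 37 30 00 00 08 00"  # copied from the "CDB:" log line
cdb = bytes.fromhex(cdb_hex)

opcode = cdb[0]                            # 0x2a is WRITE(10)
fua = bool(cdb[1] & 0x08)                  # bit 3 of byte 1: Force Unit Access
lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length in blocks

device_sector = 976762672  # from "I/O error, dev sdg, sector ..."
md_sector = 976762608      # from "md: disk1 write error, sector=..."

print(f"opcode={opcode:#x} fua={fua} lba={lba} blocks={blocks}")
print("LBA matches device sector:", lba == device_sector)
# Assumption: the gap is the partition start offset on the device.
print("device sector - md sector:", device_sector - md_sector)
```

The failing command is therefore a small write (8 blocks) to the same location the md driver and XFS log error complain about, which ties the three messages to a single failed I/O.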
trurl Posted March 21
1 minute ago, trurl said: This would have disabled disk1 if you had parity
Possibly since it couldn't become disabled, it became corrupted instead.
Resident_IT Posted March 21 (Author)
10 minutes ago, trurl said: Reboot and post new diagnostics.
Here are the new diagnostics.
tower-diagnostics-20240321-1630.zip
trurl Posted March 22
Still seeing the same in these except FCP hadn't done its scan yet.
Check Filesystem on disk1. Post the output.
Resident_IT Posted March 22 (Author)
3 hours ago, trurl said: Check Filesystem on disk1. Post the output.
Output of Check Filesystem:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
JorgeB Posted March 22
Run it again without -n, or see if the filesystem still mounts, it may.
Resident_IT Posted March 22 (Author, edited)
5 hours ago, JorgeB said: Run it again without -n, or see if the filesystem still mounts, it may.
I tried it; nothing changed. This is the output:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
The drive was used. Is it possible that the drive is dead? Should I reboot the server?
Edited March 22 by Resident_IT
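The two filesystem check outputs in this thread differ mainly in whether the repair tool was allowed to modify the disk. A minimal sketch of telling them apart from pasted output follows; it relies only on marker strings that appear in the outputs posted here, and the function name `xfs_repair_mode` is hypothetical.

```python
def xfs_repair_mode(output: str) -> str:
    """Classify pasted repair output as a check-only run or a modifying run."""
    # The check-only run in this thread prints this marker and skips phase 5.
    if "No modify flag set" in output:
        return "check-only (-n)"
    # The modifying run in this thread reaches phase 5 instead.
    if "Phase 5 - rebuild AG headers and trees" in output:
        return "repair"
    return "unknown"


first_run = "Phase 4 - check for duplicate blocks... No modify flag set, skipping phase 5"
second_run = "Phase 5 - rebuild AG headers and trees... - reset superblock..."

print(xfs_repair_mode(first_run))
print(xfs_repair_mode(second_run))
```

Neither run reported any corruption, so a clean result here says little about whether the underlying write errors are gone.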
Resident_IT Posted March 22 (Author)
5 minutes ago, Resident_IT said: The drive was used. Is it possible that the drive is dead?
I have no data on the drive, so it is not a big deal.
Resident_IT Posted March 22 (Author)
10 minutes ago, JorgeB said: Post new diags after array start.
Done.
tower-diagnostics-20240322-0914.zip
JorgeB Posted March 22 (Solution)
Mar 22 08:31:13 Tower kernel: md: disk1 write error, sector=976762608
Still write errors. SMART looks OK, but the disk can still be failing. Swap both cables with a different disk and restart the array; if the result is the same, the disk is likely failing.
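The write error quoted above hits the same sector as the one logged the day before, which is what makes a repeatable fault likely. A rough sketch of pulling such lines out of a syslog and counting repeats per sector follows; the regular expression is modeled only on the two `md:` lines quoted in this thread, and `count_md_write_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modeled on the "md: disk1 write error, sector=..." lines in this thread.
MD_WRITE_ERROR = re.compile(r"md: (disk\d+) write error, sector=(\d+)")


def count_md_write_errors(lines):
    """Count md write errors per (disk, sector) pair."""
    counts = Counter()
    for line in lines:
        match = MD_WRITE_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


sample = [
    "Mar 21 15:12:32 Tower kernel: md: disk1 write error, sector=976762608",
    "Mar 21 15:12:32 Tower kernel: XFS (md1p1): log I/O error -5",
    "Mar 22 08:31:13 Tower kernel: md: disk1 write error, sector=976762608",
]

counts = count_md_write_errors(sample)
for (disk, sector), n in counts.items():
    print(f"{disk} sector {sector}: {n} write error(s)")
```

If the same sector keeps failing after the cables and port have been swapped, that points at the disk itself rather than the connection.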
Resident_IT Posted March 22 (Author)
10 minutes ago, JorgeB said: Still write errors. SMART looks OK, but the disk can still be failing. Swap both cables with a different disk and restart the array; if the result is the same, the disk is likely failing.
Still the same thing. I am uploading diagnostics in case you want them, but I think the drive is definitely dead.
tower-diagnostics-20240322-1017.zip
JorgeB Posted March 22
43 minutes ago, Resident_IT said: I think the drive is definitely dead.
Looks like it is.