DanW Posted Monday at 11:06 PM
Hey everyone, I started getting I/O errors on one of my drives. I noticed a load of my files suddenly disappear from my shares and went straight to the system log to see what was going on. I've run the check on all 13 drives in maintenance mode, and from what I can see it's just the one playing up (disk 7). Any recommendations? Should I just run the check without -nv and see if it recovers the drive? I have two parity drives and a spare new drive that I could drop in to replace it. Some advice from someone who has experience in this area would be greatly appreciated, thank you.
Attachments: check-nv.txt, dansunraidnas-diagnostics-20230123-2257.zip
Edited Monday at 11:07 PM by DanW
DanW Posted Tuesday at 07:51 AM
I think I'm going to attempt to recover the file system, then replace the drive later today.
DanW Posted Tuesday at 08:54 AM
I got this when running the xfs_repair -v command. I didn't see anything about this in the instructions, so I have no idea what to do next. I'm going to just remove the drive and drop the new one in.
DanW Posted Tuesday at 07:56 PM
I didn't attempt the recovery; I put a new drive in to replace this drive. Shortly after the rebuild started, disk 8 reported an I/O error too and has been disabled. These drives are old and have a lot of uptime, but it seems a strange coincidence that they would both die so close together. To rule out heat issues I've pointed fans at my SAS devices. I've also ordered some higher quality SAS cables. I'll be keeping an eye on the SAS controller and HBA; they have been fine for months and my drives are old, so it could just be a coincidence.
I'm currently using the following SAS devices:
IBM SAS HBA M1015 IT Mode 6Gbps PCI-e 2.0 x8 LSI 9220-8i
Intel 24 port 6 Gb/s SATA SAS RAID Expander Card PBA E91267-203 RES2SV240
Edited Tuesday at 08:04 PM by DanW (added more info)
JorgeB Posted Wednesday at 08:59 AM
Replacing the disk won't help with the filesystem problem.
DanW Posted Wednesday at 05:44 PM
8 hours ago, JorgeB said: Replacing the disk won't help with the filesystem problem.
Really? I've replaced the disk (disk 7) and the disk has been rebuilt without any issues.
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
I've got to replace disk 8 now, as it failed during the rebuild of disk 7. Lucky I had two parity drives.
Edited Wednesday at 05:47 PM by DanW
Ronan C Posted Wednesday at 05:51 PM
6 minutes ago, DanW said: I've got to replace disk 8 now as it failed during the rebuild of disk 7, lucky I had two parity drives.
Hello! Did you replace drive 8? What happened after? Thanks
DanW Posted Wednesday at 05:53 PM
1 minute ago, Ronan C said: Hello! Did you replace drive 8? What happened after? Thanks
I haven't replaced disk 8 yet (I replaced disk 7 first, as it was the initial problem disk with filesystem corruption). I'm going to change some SAS cables, insert a new drive to replace disk 8, and then start the rebuild soon. I will provide updates.
JorgeB Posted Wednesday at 06:46 PM
1 hour ago, DanW said: I've replaced the disk (disk 7) and the disk has been rebuilt without any issues.
This suggests parity wasn't 100% in sync, or else the disk would have been rebuilt exactly as it was, including the filesystem issues. Parity is just bits.
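To illustrate the "parity is just bits" point, here is a minimal Python sketch. It assumes plain single-parity XOR over equal-sized blocks, which is a simplification and not Unraid's actual implementation (a second parity drive generally uses a different calculation). The disk contents and the function names `xor_blocks` and `rebuild` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_disks, parity):
    """Reconstruct the one missing disk from the survivors plus parity."""
    return xor_blocks(surviving_disks + [parity])


# Hypothetical 11-byte "disks"; real disks are just much longer byte strings.
disk_a = b"good data A"
disk_b = b"good data B"
healthy_c = b"valid xfs!!"   # disk C before the filesystem was damaged
corrupt_c = b"v@lid xf$!!"   # disk C after the filesystem was damaged

# Parity that is fully in sync with the damaged disk...
in_sync_parity = xor_blocks([disk_a, disk_b, corrupt_c])
# ...versus parity that never saw the damaging writes.
stale_parity = xor_blocks([disk_a, disk_b, healthy_c])

# In-sync parity rebuilds the damage bit for bit.
print(rebuild([disk_a, disk_b], in_sync_parity) == corrupt_c)
# Out-of-sync parity rebuilds whatever it last reflected instead.
print(rebuild([disk_a, disk_b], stale_parity) == healthy_c)
```

Under these assumptions, a rebuild can only come back "clean" if parity did not match the damaged disk, which is the inference being drawn in this post.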
DanW Posted Wednesday at 06:49 PM
3 minutes ago, JorgeB said: This suggests parity wasn't 100% in sync, or else the disk would have been rebuilt exactly as it was, including the filesystem issues. Parity is just bits.
Interesting, is that an issue?
JorgeB Posted Wednesday at 06:54 PM
Since the disk was rebuilt according to current parity, a parity check now should not find any errors, but the data is OK so I wouldn't worry much about it for now.
DanW Posted Wednesday at 06:58 PM
1 minute ago, JorgeB said: Since the disk was rebuilt according to current parity, a parity check now should not find any errors, but the data is OK so I wouldn't worry much about it for now.
Thank you for your help 🙂 I really appreciate your knowledge and suggestions. I'm going to go ahead and replace disk 8 now and rebuild it, hopefully without any more issues 🤞
DanW Posted Wednesday at 09:18 PM
3 hours ago, Ronan C said: Hello! Did you replace drive 8? What happened after? Thanks
Disk 8 is rebuilding, no other issues so far.
Edited Wednesday at 09:19 PM by DanW
trurl Posted Wednesday at 10:21 PM
Can't really see if the data is OK or not: no new diagnostics have been posted, and all the screenshots are clipped on the right, so I can't tell if the disk is unmountable.
DanW Posted Wednesday at 11:51 PM
1 hour ago, trurl said: Can't really see if the data is OK or not: no new diagnostics have been posted, and all the screenshots are clipped on the right, so I can't tell if the disk is unmountable.
Apologies, please see attached. The rebuild of disk 8, the second disk to fail, is still underway. The array is live, and the data that originally disappeared (when the first error with disk 7 occurred) is back, which is really positive. Disk 8 was emulated immediately when it failed, so I didn't notice any data loss the second time.
Attachment: dansunraidnas-diagnostics-20230125-2349.zip
Edited yesterday at 12:01 AM by DanW
Ronan C Posted 10 hours ago
Nice news! Glad you got there!