June 6, 2017
Hi guys,
Starting a new thread, as my original post was about one redballed drive, and things seem to have spiraled.
1 drive is redballed
2 drives are unmountable
All 3 drives are on the same SAS card, but across two sets of breakout cables.
I've tried to run an XFS check on the unmountable drives, but both come back with the following error:
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Input/output error
I've tried plugging working SATA cables into the drives (cables that work fine going directly to the motherboard), but the XFS check comes back the same.
I've ordered new breakout cables for both slots on my card, just in case. I will have them tomorrow.
Not sure how to proceed.
Diagnostics attached: tower-diagnostics-20170606-1328.zip
Edited June 6, 2017 by newoski
June 6, 2017 Community Expert
12 minutes ago, newoski said: "I've tried to run an XFS check on the unmountable drives"
Did you do this from the webUI or from the command line? If from the command line, what was the exact command?
June 6, 2017 Community Expert
There's something strange going on here:
Jun 6 13:13:04 Tower kernel: Buffer I/O error on dev md11, logical block 1953506608, async page read
Jun 6 13:13:04 Tower kernel: Buffer I/O error on dev md12, logical block 1953506608, async page read
Jun 6 13:13:05 Tower kernel: Buffer I/O error on dev md18, logical block 1953506608, async page read
It looks like these 3 disks are not online. Can you reboot and post new diags?
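To see at a glance which array devices are throwing errors in a syslog like the excerpt above, a minimal sketch could scan for the kernel's "Buffer I/O error on dev mdN" lines and count them per device. It assumes the syslog has already been pulled out of the diagnostics zip; the helper name `failing_md_devices` and the regex are illustrative, not part of unRAID.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Buffer I/O error on dev md11, logical block 1953506608, async page read"
BUFFER_IO_ERROR = re.compile(r"Buffer I/O error on dev (md\d+)")


def failing_md_devices(syslog_lines):
    """Count 'Buffer I/O error' lines per md device (illustrative helper)."""
    counts = Counter()
    for line in syslog_lines:
        match = BUFFER_IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The three lines quoted in the post above.
sample = [
    "Jun 6 13:13:04 Tower kernel: Buffer I/O error on dev md11, logical block 1953506608, async page read",
    "Jun 6 13:13:04 Tower kernel: Buffer I/O error on dev md12, logical block 1953506608, async page read",
    "Jun 6 13:13:05 Tower kernel: Buffer I/O error on dev md18, logical block 1953506608, async page read",
]
print(sorted(failing_md_devices(sample)))  # ['md11', 'md12', 'md18']
```

On a real log, pass an open file object instead of `sample`; the file name depends on where you unpacked the diagnostics.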
June 6, 2017 Author
Welcome to my nightmare! New diagnostics attached, taken after the reboot: tower-diagnostics-20170606-1404.zip
June 6, 2017 Community Expert
OK, I understand what's happening now: disks 11 and 12 were disabled at some point in the past, and since disk18 is also disabled, you have 3 invalid disks, which dual parity can't emulate, hence the errors.
June 6, 2017 Author
Just now, johnnie.black said: "OK, I understand what's happening now: disks 11 and 12 were disabled at some point in the past, and since disk18 is also disabled, you have 3 invalid disks, which dual parity can't emulate, hence the errors."
What order of operations would you recommend to rectify this?
June 6, 2017 Community Expert
Your best bet is doing a New Config. I assume disk18 was the 1st disk to get disabled.
- Tools -> New Config
- assign all disks; the previous disk order needs to be maintained, so double-check that all disks are in the correct slots
- check both "parity is already valid" and "maintenance mode" before starting the array
- start the array
- stop the array, unassign disk18
- start the array, check that emulated disk18 mounts and its contents look correct (check that disks 11 and 12 mount also)
- if all looks good, stop the array, reassign disk18
- start the array to begin the rebuild
June 6, 2017 Author
I should wait until I get the new SATA breakout cables and replace those first, correct? (THANK YOU)
June 6, 2017 Community Expert
If you don't know why disks 11 and 12 got disabled (like touching a cable, etc.), that's probably best.
June 6, 2017 Author
6 minutes ago, johnnie.black said: "If you don't know why disks 11 and 12 got disabled (like touching a cable, etc.), that's probably best."
How would a New Config help with the drives being unmountable? (I believe the cables most likely got touched.)
June 6, 2017 Community Expert
10 minutes ago, johnnie.black said: "OK, I understand what's happening now: disks 11 and 12 were disabled at some point in the past, and since disk18 is also disabled, you have 3 invalid disks, which dual parity can't emulate, hence the errors."
So shouldn't that really be 3 "redballed" disks instead of one? newoski, do you have notifications set up? Even with dual parity you should attend to problems immediately and not let them accumulate.
June 6, 2017 Author
Just now, trurl said: "So shouldn't that really be 3 "redballed" disks instead of one? newoski, do you have notifications set up? Even with dual parity you should attend to problems immediately and not let them accumulate."
I had 1 redballed. I was responding immediately: when I rebooted and followed johnnie.black's original instructions, the other 2 drives became unmountable. I'm sure the cables got bumped.
Which is to say: yes, I have notifications, and yes, I responded immediately. I would never let more than 1 drive sit redballed ( :
June 6, 2017 Community Expert
6 minutes ago, newoski said: "the other 2 drives became unmountable"
They became unmountable because they are disabled, and there's no way unRAID can emulate their data since you have 3 disks disabled in total.
Edited June 6, 2017 by johnnie.black
June 6, 2017 Author
1 minute ago, johnnie.black said: "They became unmountable because they are disabled, and there's no way unRAID can emulate their data since you have 3 disks disabled in total."
I understand that unRAID can only emulate 1 drive per parity drive... but if the SATA cable was bumped and disconnected, wouldn't they show up as Missing? They showed up as Unmountable, which to me meant that they were seen but unRAID wasn't able to read them or something... I have 2 parity drives. Therefore, if they were both truly missing, wouldn't unRAID have emulated at least one of the drives?
June 6, 2017 Community Expert
A disk is only disabled if a write fails, so if a cable was bumped, it happened with the array mounted, and in that case the disk gets disabled. You are one disk past redundancy, so no disk can be emulated.
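The rule behind this answer fits in a one-line check: parity can only reconstruct as many invalid disks (data or parity) as there are parity disks, and once you are past that, none of them can be emulated. A minimal sketch, assuming a disk is simply either valid or invalid; `can_emulate` is a hypothetical name, not an unRAID function.

```python
def can_emulate(invalid_disks, parity_disks):
    """Return True if parity can still reconstruct every invalid disk.

    Each parity disk covers one invalid disk (data or parity): single parity
    tolerates one, dual parity two. One disk past that, nothing can be emulated.
    """
    return len(invalid_disks) <= parity_disks


# This thread: dual parity with disks 11, 12 and 18 all invalid.
print(can_emulate({"disk11", "disk12", "disk18"}, parity_disks=2))  # False
print(can_emulate({"disk18"}, parity_disks=2))  # True
```

This is why two missing disks on a dual-parity array would still be emulated, while a third invalid disk leaves all three unmountable.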
June 6, 2017 Community Expert
If by any chance you have it, I would really like to see the syslog from when disks 11 and 12 got disabled. I suspect they were disabled during the disk18 rebuild, possibly as a result of a controller issue.
When errors on multiple disks happen (say unRAID loses contact with 8 disks on the same controller), the 1st disk it can't write to (or the 1st two if you have dual parity) gets disabled; the remaining disks will show read errors but don't get disabled past current redundancy. But if a disk is rebuilding, I suspect it's not counted as disabled, so with dual parity 2 more disks got disabled, and the third invalid disk leaves the user in a more complicated situation. If this is what happened, maybe it could be improved, but there's no way to know for sure without the logs.
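The behaviour described above can be modelled as a small simulation: when writes fail on several disks at once, disks are disabled in the order the failures occur only until the number of invalid disks reaches the parity count, and later failures are just logged as errors. The `rebuilding_counts_as_invalid` flag models the suspicion that a disk being rebuilt is not counted. This is a hypothetical sketch of the behaviour as described in this post, not unRAID's actual code; all names are illustrative.

```python
def apply_write_failures(failed_writes, parity_disks, rebuilding=None,
                         rebuilding_counts_as_invalid=True):
    """Model which disks get disabled after a burst of failed writes.

    failed_writes: disk names in the order their writes failed.
    rebuilding: a disk currently being rebuilt, if any.
    Returns (disabled, errors_only).
    """
    invalid = set()
    if rebuilding is not None and rebuilding_counts_as_invalid:
        invalid.add(rebuilding)  # the disk being rebuilt is not yet valid
    disabled, errors_only = [], []
    for disk in failed_writes:
        if disk in invalid:
            continue  # already invalid, nothing more to disable
        if len(invalid) < parity_disks:
            invalid.add(disk)
            disabled.append(disk)
        else:
            errors_only.append(disk)  # past redundancy: logged, not disabled
    return disabled, errors_only


# Dual parity, disk18 rebuilding, writes then fail on disks 11 and 12.
print(apply_write_failures(["disk11", "disk12"], 2, rebuilding="disk18"))
# (['disk11'], ['disk12'])
print(apply_write_failures(["disk11", "disk12"], 2, rebuilding="disk18",
                           rebuilding_counts_as_invalid=False))
# (['disk11', 'disk12'], [])
```

In the second case, the two newly disabled disks plus the rebuilding disk18 make three invalid disks, one past what dual parity can cover, which matches the situation in this thread.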
June 6, 2017 Author
I believe you're asking for a syslog from the moment those drives disappeared. At this point I'm not sure I have one, or which of the many it would be, but in the spirit of gratitude, here are all the syslogs I have from today for your perusal: tower-diagnostics-20170606-1328.zip tower-smart-20170606-0721.zip tower-smart-20170606-0725.zip tower-smart-20170606-0836.zip
June 6, 2017 Community Expert
Unfortunately it's not there, but thanks anyway. I can simulate this on a test server, so I'll do that when I get the chance.
June 6, 2017 Author
2 hours ago, johnnie.black said: "Unfortunately it's not there, but thanks anyway. I can simulate this on a test server, so I'll do that when I get the chance."
Sorry, and also... Hmmmmm... So I followed those steps, and everything seemed to work for a bit... then I started getting lots of errors in the syslog, and disk18 has totally disappeared from Explorer. It still shows up as Emulated in the GUI, but it's not being exported to Windows.
tower-diagnostics-20170606-1817.zip
June 6, 2017 Community Expert
You need to check the filesystem on the emulated disk18. Do it before rebuilding, though the actual disk18 has the same issues.
https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS
June 6, 2017 Author
7 minutes ago, johnnie.black said: "You need to check the filesystem on the emulated disk18. Do it before rebuilding, though the actual disk18 has the same issues. https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS"
Thoughts?
root@Tower:~# xfs_repair -v /dev/md18
Phase 1 - find and verify superblock...
        - block cache size set to 2965960 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 769204 tail block 769200
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
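The two xfs_repair failures in this thread call for different next steps. The "superblock read failed ... Input/output error" in the first post means the device itself could not be read (a cabling, controller, or disabled-disk problem that repair cannot fix), while the message above asks for the log to be replayed by mounting first, with `-L` as the last resort. A minimal sketch that maps output text to that advice; the function name and return codes are illustrative assumptions, and the matched phrases are copied from the outputs quoted in this thread.

```python
def next_step(xfs_repair_output):
    """Classify xfs_repair output into a suggested next step (illustrative).

    Returns one of:
      "io_error"   - the device could not be read at all; fix cabling,
                     controller or disk state first, repair cannot help yet
      "replay_log" - mount and unmount the filesystem to replay the log,
                     then rerun xfs_repair; use -L only if the mount fails
      "unknown"    - neither message was found; read the output by hand
    """
    text = xfs_repair_output.lower()
    if "input/output error" in text:
        return "io_error"
    if "valuable metadata changes in a log" in text:
        return "replay_log"
    return "unknown"


first_post = ("superblock read failed, offset 0, size 524288, ag 0, rval -1\n"
              "fatal error -- Input/output error")
this_post = ("ERROR: The filesystem has valuable metadata changes in a log "
             "which needs to be replayed.")
print(next_step(first_post))  # io_error
print(next_step(this_post))   # replay_log
```

Checking for the I/O error first matters: no amount of log handling helps if the disk cannot be read.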
June 9, 2017 Author
On 6/6/2017 at 6:48 PM, johnnie.black said: "Use -L"
So using -L worked. I replaced the SATA breakout cables and rebuilt the drive. Everything's been going fine for the last 7 hours.
While there are no redballed drives, I happened to walk past my NAS and saw a bunch of metadata errors pop up on the screen... Looks like there might still be a few issues left over on the data side? How should I address this? Should I run an XFS check on each drive in my array?
tower-diagnostics-20170608-2038.zip
Edited June 9, 2017 by newoski
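One way to approach the last question is a read-only pass over every disk: `xfs_repair -n` (no-modify mode) reports problems without changing anything. A sketch that only builds those commands, assuming the array is started in maintenance mode so the disks are reachable as `/dev/mdN` like the `/dev/md18` used earlier in this thread, and that every data disk is XFS; the disk list is just an example. Any disk that reports problems would then be repaired without `-n`, as was done for disk18.

```python
def xfs_check_commands(disk_numbers):
    """Build read-only xfs_repair command lines, one per array disk.

    -n is xfs_repair's no-modify mode: it reports problems but writes nothing.
    Nothing is executed here; the commands are only returned.
    """
    return [["xfs_repair", "-n", f"/dev/md{n}"] for n in disk_numbers]


# Example: the three disks involved in this thread.
for cmd in xfs_check_commands([11, 12, 18]):
    print(" ".join(cmd))
```

Each command list can be passed directly to `subprocess.run` to script the pass; keeping them as lists avoids shell quoting issues.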