1 Drive Redballed and 2 Drives Unmountable

June 6, 2017

Hi Guys,

Starting a new thread, as my original post was about one redballed drive, and things seem to have spiraled.

1 Drive is Red Balled

2 Drives are Unmountable

All 3 drives are on the same SAS card, but across two sets of breakout cables

I've tried to run XFS check on the unmountable drives, but they both come back with the following error:

Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Input/output error

I've tried plugging working SATA cables into the drives -- cables that are working fine going directly to MOBO -- but the XFS check comes back the same.

I've ordered new breakout cables for both slots on my card, just in case. I will have them tomorrow.

Not sure how to proceed

Diagnostics attached

tower-diagnostics-20170606-1328.zip

Edited June 6, 2017 by newoski

June 6, 2017

Community Expert

12 minutes ago, newoski said:

I've tried to run XFS check on the unmountable drives

Did you do this from the webUI or from the command line? If from the command line, what was the exact command?

June 6, 2017

Author

WebGUI. I tried the default parameter, as well as -v

Same result with both

June 6, 2017

Community Expert

There's something strange going on here:

Jun  6 13:13:04 Tower kernel: Buffer I/O error on dev md11, logical block 1953506608, async page read
Jun  6 13:13:04 Tower kernel: Buffer I/O error on dev md12, logical block 1953506608, async page read
Jun  6 13:13:05 Tower kernel: Buffer I/O error on dev md18, logical block 1953506608, async page read

It looks like these 3 disks are not online. Can you reboot and post new diags?
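For anyone reading along, here is a minimal sketch of how those kernel messages could be grouped by device to spot the pattern. It assumes the syslog lines follow the exact format quoted above; the function name `devices_with_io_errors` is made up for illustration and is not part of unRAID.

```python
import re
from collections import defaultdict

# Matches the kernel message format quoted in this post (an assumption:
# other kernel versions may word the message differently).
BUFFER_IO_ERROR = re.compile(
    r"Buffer I/O error on dev (?P<dev>\w+), logical block (?P<block>\d+)"
)


def devices_with_io_errors(lines):
    """Return {device: [logical blocks]} for every Buffer I/O error line."""
    errors = defaultdict(list)
    for line in lines:
        match = BUFFER_IO_ERROR.search(line)
        if match:
            errors[match.group("dev")].append(int(match.group("block")))
    return dict(errors)


sample = [
    "Jun  6 13:13:04 Tower kernel: Buffer I/O error on dev md11, logical block 1953506608, async page read",
    "Jun  6 13:13:04 Tower kernel: Buffer I/O error on dev md12, logical block 1953506608, async page read",
    "Jun  6 13:13:05 Tower kernel: Buffer I/O error on dev md18, logical block 1953506608, async page read",
]

for dev, blocks in sorted(devices_with_io_errors(sample).items()):
    print(dev, blocks)
```

All three devices fail on the same logical block, which fits the idea that the disks are simply not readable rather than each having its own bad sector.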

June 6, 2017

Author

Welcome to my nightmare!

New Diagnostics attached, following reboot

tower-diagnostics-20170606-1404.zip

June 6, 2017

Community Expert

OK, I understand what's happening now: both disks 11 and 12 were disabled at some point in the past, and since disk18 is also disabled, you have 3 invalid disks. Dual parity can't emulate that many, hence the errors.

June 6, 2017

Author

Just now, johnnie.black said:

OK, I understand what's happening now: both disks 11 and 12 were disabled at some point in the past, and since disk18 is also disabled, you have 3 invalid disks. Dual parity can't emulate that many, hence the errors.

What order of operations would you recommend to rectify this?

June 6, 2017

Community Expert

Your best bet is doing a new config. I assume disk18 was the 1st disk to get disabled.

-Tools -> New Config
-assign all disks, previous disk order needs to be maintained, double check all disks are in the correct slots
-check both "parity is already valid" and "maintenance mode" before starting the array
-start the array
-stop array, unassign disk18
-start array, check emulated disk18 mounts and contents look correct (check that disks 11 and 12 mount also)
-if all looks good, stop array, reassign disk18
-start array to begin rebuild

June 6, 2017

Author

1 minute ago, johnnie.black said:

Your best bet is doing a new config. I assume disk18 was the 1st disk to get disabled.

-Tools -> New Config
-assign all disks, previous disk order needs to be maintained, double check all disks are in the correct slots
-check both "parity is already valid" and "maintenance mode" before starting the array
-start the array
-stop array, unassign disk18
-start array, check emulated disk18 mounts and contents look correct (check that disks 11 and 12 mount also)
-if all looks good, stop array, reassign disk18
-start array to begin rebuild

I should wait until I get the new SATA breakout cables, and replace those first, correct?

(THANK YOU)

June 6, 2017

Community Expert

If you don't know why disks 11 and 12 got disabled (like touching a cable, etc.), that's probably best.

June 6, 2017

Author

6 minutes ago, johnnie.black said:

If you don't know why disks 11 and 12 got disabled (like touching a cable, etc.), that's probably best.

How would a new config help with the drives being unmountable?

(I believe the cables got touched, most likely)

June 6, 2017

Community Expert

10 minutes ago, johnnie.black said:

OK, I understand what's happening now: both disks 11 and 12 were disabled at some point in the past, and since disk18 is also disabled, you have 3 invalid disks. Dual parity can't emulate that many, hence the errors.

So shouldn't that really be 3 "redballed" disks instead of one?

newoski, do you have Notifications set up? Even with dual parity you should attend to problems immediately and not let them accumulate.

June 6, 2017

Author

Just now, trurl said:

So shouldn't that really be 3 "redballed" disks instead of one?

newoski, do you have Notifications set up? Even with dual parity you should attend to problems immediately and not let them accumulate.

I had 1 redballed. I was responding immediately: when I rebooted and followed johnnie.black's original instructions, the other 2 drives became unmountable. I'm sure the cables got bumped.

Which is to say, yes, I have notifications, and yes, I responded immediately. I would never let more than 1 drive sit redballed.

( :

June 6, 2017

Community Expert

6 minutes ago, newoski said:

the other 2 drives became unmountable

They became unmountable because they are disabled, and there's no way unRAID can emulate their data since you have 3 disabled disks in total.

Edited June 6, 2017 by johnnie.black

June 6, 2017

Author

1 minute ago, johnnie.black said:

They became unmountable because they are disabled, and there's no way unRAID can emulate their data since you have 3 disabled disks in total.

I understand that unRAID can only emulate 1 drive per parity drive... but if the SATA cable was bumped and disconnected, wouldn't they show up as Missing?

They showed up as Unmountable, which to me meant that they were seen but unRAID wasn't able to read them or something...

I have 2 parity drives. Therefore, if they were both truly missing, wouldn't unRAID have emulated at least one of the drives?

June 6, 2017

Community Expert

A disk is only disabled if a write to it fails, so if a cable was bumped, it happened with the array mounted, and in that case the disk gets disabled.

You are one disk past redundancy, so no disk can be emulated.
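To make the arithmetic concrete, here is a minimal sketch of the redundancy rule as described in this thread: each valid parity disk can cover one disabled data disk. It assumes all parity disks are themselves valid, and the function name `can_emulate` is hypothetical, not an unRAID API.

```python
def can_emulate(disabled_disks, parity_count):
    """True if every disabled data disk can still be emulated.

    Assumes all parity disks are valid; each one covers a single
    disabled data disk.
    """
    return len(set(disabled_disks)) <= parity_count


# The situation in this thread: dual parity, three disabled disks.
print(can_emulate(["disk11", "disk12", "disk18"], parity_count=2))

# With only disk18 disabled, emulation would still be possible.
print(can_emulate(["disk18"], parity_count=2))
```

With three disabled disks and two parity disks the check fails, which is why none of the three can be emulated.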

June 6, 2017

Community Expert

If by any chance you have it, I would really like to see the syslog from when disks 11 and 12 got disabled. I suspect they were disabled during the disk18 rebuild, possibly as a result of a controller issue.

When errors happen on multiple disks at once, say unRAID loses contact with 8 disks on the same controller, the 1st disk it can't write to (or the 1st two if you have dual parity) gets disabled. The remaining disks will show read errors but don't get disabled past current redundancy. However, I suspect that a disk that is rebuilding is not counted as disabled, so with dual parity 2 more disks got disabled. Adding that third invalid disk leaves the user in a more complicated situation. If this is what happened, maybe it could be improved, but there's no way to know for sure without the logs.
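The suspicion above can be modeled with a small sketch. This is only an illustration of the hypothesis in this post, not verified unRAID behavior; the function `disks_disabled`, its parameters, and the names disk13 and disk14 are invented for the example.

```python
def disks_disabled(write_failures, parity_count, already_invalid=0):
    """Disable disks in failure order until redundancy is used up.

    write_failures: disks with failed writes, in the order they failed.
    already_invalid: invalid disks that already consume redundancy.
    """
    remaining_slots = max(parity_count - already_invalid, 0)
    return write_failures[:remaining_slots]


# Hypothetical controller dropout during the disk18 rebuild.
failures = ["disk11", "disk12", "disk13", "disk14"]

# If the rebuilding disk18 counted as invalid, only one more disk
# would be disabled, for 2 invalid disks in total.
print(disks_disabled(failures, parity_count=2, already_invalid=1))

# Suspected behavior: the rebuilding disk is not counted, so two disks
# are disabled, and with disk18 that makes 3 invalid disks.
print(disks_disabled(failures, parity_count=2, already_invalid=0))
```

Under the suspected behavior the array ends up one disk past redundancy, which matches what the diagnostics show.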

June 6, 2017

Author

3 minutes ago, johnnie.black said:

If by any chance you have it, I would really like to see the syslog from when disks 11 and 12 got disabled. I suspect they were disabled during the disk18 rebuild, possibly as a result of a controller issue.

When errors happen on multiple disks at once, say unRAID loses contact with 8 disks on the same controller, the 1st disk it can't write to (or the 1st two if you have dual parity) gets disabled. The remaining disks will show read errors but don't get disabled past current redundancy. However, I suspect that a disk that is rebuilding is not counted as disabled, so with dual parity 2 more disks got disabled. Adding that third invalid disk leaves the user in a more complicated situation. If this is what happened, maybe it could be improved, but there's no way to know for sure without the logs.

I believe you're asking for a syslog from the moment those drives disappeared. At this point I'm not sure I have one, or which of the many it would be, but in the spirit of gratitude, here are all the syslogs I have from today for your perusal:

tower-diagnostics-20170606-1328.zip

tower-smart-20170606-0721.zip

tower-smart-20170606-0725.zip

tower-smart-20170606-0836.zip

June 6, 2017

Community Expert

Unfortunately it's not there, but thanks anyway, I can simulate this on a test server, so I'll do that when I get the chance.

June 6, 2017

Author

2 hours ago, johnnie.black said:

Unfortunately it's not there, but thanks anyway, I can simulate this on a test server, so I'll do that when I get the chance.

Sorry, and also... hmm. So I followed those steps, and everything seemed to work for a bit. Then I started getting lots of errors in the syslog, and disk18 has totally disappeared from Explorer. It still shows up as emulated in the GUI, but it's not being exported to Windows.

tower-diagnostics-20170606-1817.zip

June 6, 2017

Community Expert

You need to check the filesystem on the emulated disk18. Do it before rebuilding, otherwise the rebuilt disk18 will have the same issues.

https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS

June 6, 2017

Author

7 minutes ago, johnnie.black said:

You need to check the filesystem on the emulated disk18. Do it before rebuilding, otherwise the rebuilt disk18 will have the same issues.

https://wiki.lime-technology.com/Check_Disk_Filesystems#Drives_formatted_with_XFS

Thoughts?

root@Tower:~# xfs_repair -v /dev/md18
Phase 1 - find and verify superblock...
- block cache size set to 2965960 entries
Phase 2 - using internal log
- zero log...
zero_log: head block 769204 tail block 769200
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
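As a rough sketch, the guidance in that xfs_repair message can be written as a small decision helper. It relies only on the wording of the output quoted above; the function `next_step` and its return strings are invented for illustration.

```python
# Phrase taken from the xfs_repair output quoted in this post.
LOG_DIRTY_MARKER = "valuable metadata changes in a log"


def next_step(xfs_repair_output, mount_attempt_failed):
    """Suggest the next action, following the message's own advice."""
    if LOG_DIRTY_MARKER not in xfs_repair_output:
        return "no dirty log reported"
    if not mount_attempt_failed:
        return "mount to replay the log, unmount, then re-run xfs_repair"
    return "re-run xfs_repair with -L (destroys the log, may cause corruption)"


output = (
    "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
    "be replayed. Mount the filesystem to replay the log, and unmount it before\n"
    "re-running xfs_repair."
)

print(next_step(output, mount_attempt_failed=False))
print(next_step(output, mount_attempt_failed=True))
```

The `-L` path is the last resort in the message itself, because discarding the log can lose recent metadata changes.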

June 6, 2017

Community Expert

Use -L

June 6, 2017

Author

8 minutes ago, johnnie.black said:

Use -L

Thanks again!!

June 9, 2017

Author

On 6/6/2017 at 6:48 PM, johnnie.black said:

Use -L

So using -L worked. I replaced the SATA breakout cables and rebuilt the drive. Everything's been going fine for the last 7 hours. While there are no red balled drives, I happened to walk past my NAS and saw a bunch of metadata errors pop up on the screen... Looks like there might still be a few issues leftover on the data side?

How should I address this?

Should I run XFS check on each drive on my array?

tower-diagnostics-20170608-2038.zip

Edited June 9, 2017 by newoski