Distributing disks to mitigate single-point failure?

I believe a software glitch (my fault) caused my SATA card to drop offline taking all 4 of my data disks with it.

When it failed I was writing to Disk 1.

A reboot fixed the card but when the server came back up Disk 1 was emulated.

I ran XFS repair which worked and now I'm rebuilding the disk.

But what if I had been writing to more than 1 data disk when the card failed?

With 1 parity I can only emulate 1 disk, so would every disk I was writing to (assuming the writing is what caused the corruption/emulation) become un-rebuildable?

If so, is it best practice to distribute disks among controllers to minimize the consequences of a single-point failure?

Here's my syslog from the incident, in case it's interesting:

syslog

Mar 17 21:51:36 NAS emhttpd: read SMART /dev/sdb

Mar 17 22:27:21 NAS kernel: usb 2-1: USB disconnect, device number 4

Mar 17 22:27:21 NAS kernel: sd 37:0:0:0: [sdj] Synchronizing SCSI cache

Mar 17 22:27:21 NAS kernel: sd 37:0:0:0: [sdj] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK

Mar 17 22:27:24 NAS usb_manager: Info: rc.usb_manager usb_remove  Samsung_Flash_Drive_FIT_0352519060004610 /dev/bus/usb/002/004 002 004

Mar 17 22:27:24 NAS usb_manager: Info: rc.usb_manager Device Match 002/004 vm:   002 004

Mar 17 22:27:24 NAS usb_manager: Info: rc.usb_manager Removed 002/004 vm:  nostate 002 004

Mar 17 22:27:41 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:42 NAS kernel: ata6: failed to resume link (SControl FFFFFFFF)

Mar 17 22:27:42 NAS kernel: ata6: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:27:47 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:48 NAS kernel: ata6: failed to resume link (SControl FFFFFFFF)

Mar 17 22:27:48 NAS kernel: ata6: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:27:48 NAS kernel: ata6: limiting SATA link speed to <unknown>

Mar 17 22:27:53 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:54 NAS kernel: ata6: failed to resume link (SControl FFFFFFFF)

Mar 17 22:27:54 NAS kernel: ata6: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:27:54 NAS kernel: ata6.00: disable device

Mar 17 22:27:54 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:54 NAS kernel: sd 5:0:0:0: rejecting I/O to offline device

Mar 17 22:27:54 NAS kernel: ata6.00: detaching (SCSI 5:0:0:0)

Mar 17 22:27:54 NAS kernel: sd 5:0:0:0: [sdg] Synchronizing SCSI cache

Mar 17 22:27:54 NAS kernel: sd 5:0:0:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK

Mar 17 22:27:54 NAS kernel: sd 5:0:0:0: [sdg] Stopping disk

Mar 17 22:27:54 NAS kernel: sd 5:0:0:0: [sdg] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK

Mar 17 22:27:54 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:54 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:54 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:54 NAS kernel: ata6: failed to stop engine (-19)

Mar 17 22:27:54 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:27:55 NAS kernel: ata6: failed to resume link (SControl FFFFFFFF)

Mar 17 22:27:55 NAS kernel: ata6: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:27:55 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:30 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:31 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:31 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:31 NAS kernel: ata5: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:31 NAS kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:36 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:37 NAS kernel: ata3: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:37 NAS kernel: ata3: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:42 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:42 NAS kernel: ata4: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:42 NAS kernel: ata4: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:48 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:48 NAS kernel: ata5: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:48 NAS kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:48 NAS kernel: ata5: limiting SATA link speed to <unknown>

Mar 17 22:28:53 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:54 NAS kernel: ata3: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:54 NAS kernel: ata3: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:54 NAS kernel: ata3: limiting SATA link speed to <unknown>

Mar 17 22:28:59 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:59 NAS kernel: ata5: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:59 NAS kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:59 NAS kernel: ata5.00: disable device

Mar 17 22:28:59 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:59 NAS kernel: sd 4:0:0:0: rejecting I/O to offline device

Mar 17 22:28:59 NAS kernel: ata5.00: detaching (SCSI 4:0:0:0)

Mar 17 22:28:59 NAS emhttpd: read SMART /dev/sdg

Mar 17 22:28:59 NAS emhttpd: read SMART /dev/sdf

Mar 17 22:28:59 NAS kernel: sd 4:0:0:0: [sdf] Synchronizing SCSI cache

Mar 17 22:28:59 NAS kernel: sd 4:0:0:0: [sdf] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK

Mar 17 22:28:59 NAS kernel: sd 4:0:0:0: [sdf] Stopping disk

Mar 17 22:28:59 NAS kernel: sd 4:0:0:0: [sdf] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK

Mar 17 22:28:59 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:59 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:59 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:59 NAS kernel: ata5: failed to stop engine (-19)

Mar 17 22:28:59 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:28:59 NAS kernel: ata4: failed to resume link (SControl FFFFFFFF)

Mar 17 22:28:59 NAS kernel: ata4: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:28:59 NAS kernel: ata4: limiting SATA link speed to <unknown>

Mar 17 22:29:04 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:05 NAS kernel: ata5: failed to resume link (SControl FFFFFFFF)

Mar 17 22:29:05 NAS kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:29:05 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:05 NAS kernel: ata3: failed to resume link (SControl FFFFFFFF)

Mar 17 22:29:05 NAS kernel: ata3: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:29:05 NAS kernel: ata3.00: disable device

Mar 17 22:29:05 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:05 NAS kernel: sd 2:0:0:0: rejecting I/O to offline device

Mar 17 22:29:05 NAS kernel: ata3.00: detaching (SCSI 2:0:0:0)

Mar 17 22:29:05 NAS kernel: sd 2:0:0:0: [sdd] Synchronizing SCSI cache

Mar 17 22:29:05 NAS kernel: sd 2:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK

Mar 17 22:29:05 NAS kernel: sd 2:0:0:0: [sdd] Stopping disk

Mar 17 22:29:05 NAS kernel: sd 2:0:0:0: [sdd] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK

Mar 17 22:29:05 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:05 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:05 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:05 NAS kernel: ata3: failed to stop engine (-19)

Mar 17 22:29:05 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:06 NAS kernel: ata4: failed to resume link (SControl FFFFFFFF)

Mar 17 22:29:06 NAS kernel: ata4: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)

Mar 17 22:29:06 NAS kernel: ata4.00: disable device

Mar 17 22:29:06 NAS kernel: ahci 0000:01:00.0: AHCI controller unavailable!

Mar 17 22:29:06 NAS kernel: ata4.00: detaching (SCSI 3:0:0:0)
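
For anyone reading a similar log, a short script can list which ports dropped and in what order. This is only a sketch: the regular expressions are a guess at the message shapes seen in the excerpt above, the name `detached_devices` is made up for illustration, and the sample is a handful of lines copied from that excerpt.

```python
import re

# A few lines copied from the syslog excerpt above.
SAMPLE = """\
Mar 17 22:27:54 NAS kernel: ata6.00: detaching (SCSI 5:0:0:0)
Mar 17 22:27:54 NAS kernel: sd 5:0:0:0: [sdg] Synchronizing SCSI cache
Mar 17 22:28:59 NAS kernel: ata5.00: detaching (SCSI 4:0:0:0)
Mar 17 22:28:59 NAS kernel: sd 4:0:0:0: [sdf] Synchronizing SCSI cache
Mar 17 22:29:05 NAS kernel: ata3.00: detaching (SCSI 2:0:0:0)
Mar 17 22:29:05 NAS kernel: sd 2:0:0:0: [sdd] Synchronizing SCSI cache
Mar 17 22:29:06 NAS kernel: ata4.00: detaching (SCSI 3:0:0:0)
"""

# Assumed message shapes, based only on the lines in this thread.
DETACH = re.compile(r"(ata\d+)\.\d+: detaching \(SCSI (\d+:\d+:\d+:\d+)\)")
SD_NAME = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")


def detached_devices(log_text):
    """Return (ata_port, scsi_address, sd_name_or_None) tuples in log order."""
    names = {}     # SCSI address -> sdX name, when the log mentions one
    detached = []  # (ata port, SCSI address) in the order they detached
    for line in log_text.splitlines():
        m = SD_NAME.search(line)
        if m:
            names[m.group(1)] = m.group(2)
        m = DETACH.search(line)
        if m:
            detached.append((m.group(1), m.group(2)))
    return [(port, addr, names.get(addr)) for port, addr in detached]


for port, addr, name in detached_devices(SAMPLE):
    print(port, addr, name or "(name not in excerpt)")
```

On the sample lines this reports ata6, ata5, ata3 and ata4 detaching in that order. The device name for the last one is not in the excerpt, so it is left blank rather than guessed.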

2 hours ago, CS01-HS said:

But what if I had been writing to more than 1 data disk when the card failed?

Unraid only disables as many disks as there are parity drives, so one with single parity, two with dual parity.

  • Author

So if I'd been writing to two disks, would it have disabled the whole array or only 1 disk?

Or is the writing unrelated to the disabling?

It would disable the first disk that had a failed write. That initial failed write, and any subsequent writes to the disabled disk, would be emulated by updating parity as if the disk had been written, so those writes could be recovered by rebuilding. With dual parity, an additional disk with a failed write would be handled the same way.

If another disk had a failed write when there was already a disabled disk (2 disabled disks for dual parity), it is just a failed write. It is not emulated so that write can't be recovered.
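
The rule described above can be sketched as a toy model. This is not Unraid's code, just an illustration of the behaviour as explained in this thread; the function name `apply_failed_writes` and the disk names are made up.

```python
def apply_failed_writes(failed_disks, parity_count):
    """Toy model of the rule described in this thread.

    failed_disks: disk names in the order their writes failed.
    Returns (disabled, lost_writes):
      disabled    - disks that get disabled and emulated (writes land in parity)
      lost_writes - disks whose failed writes are simply lost
    """
    disabled = []
    lost_writes = []
    for disk in failed_disks:
        if disk in disabled:
            # Already emulated: the write is recorded by updating parity.
            continue
        if len(disabled) < parity_count:
            # Still room to emulate another disk.
            disabled.append(disk)
        elif disk not in lost_writes:
            # No parity left to emulate with: just a failed write.
            lost_writes.append(disk)
    return disabled, lost_writes


# Single parity, writes fail on disk1 first, then disk2 and disk3.
print(apply_failed_writes(["disk1", "disk2", "disk1", "disk3"], parity_count=1))
# Same failures with dual parity.
print(apply_failed_writes(["disk1", "disk2", "disk1", "disk3"], parity_count=2))
```

With single parity only disk1 ends up disabled and the writes to disk2 and disk3 are lost. With dual parity disk1 and disk2 are disabled and only the write to disk3 is lost.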

1 minute ago, trurl said:

a failed write

Note that this is actually a hardware error of some kind, a disk problem, or a problem communicating with the disk, such as connection, cable, controller, power.

There are other ways for a write to be unsuccessful, such as out-of-space. Those will not disable a disk; the write is just unsuccessful.

Note that a disabled disk is not used again until it is rebuilt, or until it is included when you reset the array assignments (New Config).

  • Author

All 4 of my data disks dropped offline (red X) because the SATA card disconnected.

Only the parity disk (and cache) connected to onboard ports remained.

When I rebooted, after an XFS repair, disk1 was emulated (orange circle).

The other 3 data disks mounted normally (green circle.)

Is that because I was writing to disk 1 when the SATA card glitched, or for some other reason?

I'm trying to understand how this works so I can distribute disks among my 4 onboard controllers and the 4 on the card to mitigate risk.

4 hours ago, CS01-HS said:

When I rebooted, after an XFS repair, disk1 was emulated (orange circle).

The other 3 data disks mounted normally (green circle.)

If you only have one parity drive, only one disk can get disabled, the first one where a write fails. If any writes are done to other disks that dropped after that, the writes will just fail; it cannot disable more disks.

  • Author

So, if I understand (and thank you for bearing with me):

If all data disks go offline, writes to all will fail but because I have 1 parity I have to rebuild 1 disk (the first with a failed write.)

If I had 2 parity disks I'd have to rebuild 2 disks (the first 2 with failed writes.)

I must be missing something because I don't see the benefit.

Why not just handle the 1st failed write like all the rest and avoid the rebuild?

Emulating the missing drive allows it to be read and written until it can be replaced.

Availability is what parity is all about in more traditional RAID implementations as well. Unraid just does it as a separate parity disk.
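
To make emulation concrete, here is a minimal sketch using XOR parity on single values. It assumes single parity computed as the XOR of all data disks, and it shrinks each disk to one small integer for illustration; the names `xor_all`, `emulated_read` and `emulated_write` are made up and are not Unraid functions.

```python
from functools import reduce


def xor_all(values):
    """XOR together an iterable of integers."""
    return reduce(lambda a, b: a ^ b, values, 0)


# Each "disk" is reduced to a single value at one offset.
disks = {"disk1": 0b1010, "disk2": 0b0110, "disk3": 0b1111}
parity = xor_all(disks.values())


def emulated_read(missing, disks, parity):
    """Reconstruct the missing disk's value from parity and the other disks."""
    others = [v for name, v in disks.items() if name != missing]
    return parity ^ xor_all(others)


def emulated_write(missing, new_value, disks):
    """Return the parity that results from 'writing' new_value to the missing disk."""
    others = [v for name, v in disks.items() if name != missing]
    return new_value ^ xor_all(others)


# disk1 is disabled: reads are served from parity plus the other disks.
print(bin(emulated_read("disk1", disks, parity)))

# A write to the disabled disk only changes parity.
parity = emulated_write("disk1", 0b0001, disks)
print(bin(emulated_read("disk1", disks, parity)))
```

After the emulated write, the value stored in `disks["disk1"]` is stale and only parity knows the new contents. In this toy model, that is why the physical disk has to be rebuilt before it can be trusted again.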

  • Author
2 hours ago, trurl said:

Emulating the missing drive allows it to be read and written until it can be replaced.

Right, "if 1 parity, and 1 disk lost, emulate the 1 lost disk."

Makes sense.

What I question is what I just experienced:

"If 1 parity, and ALL disks lost, require 1 disk to be rebuilt."

Why did that 1 data disk, and not the others, require a rebuild?

Is there some advantage I'm missing or am I not explaining it clearly?

The first disk with a failed write is disabled. There is no anticipation of what other disks will do.

Instead of rebuilding the disabled data disk, you could New Config and rebuild parity instead, but any writes to the emulated disk would be lost. Something has to be rebuilt to get the array back in-sync.
