Disks Continually going disabled/invalid

Hi all,

 

I have a Dell PowerEdge 310 (Xeon X3440, 16GB ECC DDR3, 10Gb Asus Nic, 9207-8e SAS interface) connected to three Dell PowerVault MD1200 12 drive units in series, all with 6TB Seagate SAS drives.

 

Everything has been running well for the past 2 years under moderate load. Recently I started moving large amounts of data over to the main array. In the past 2 weeks, one parity drive and the first data drive would randomly go invalid, forcing me to rebuild. Then it would happen again a few days later in the same slots. Then one time it was both parity drives. Note that I'm assigning new disks each time; only once did I reuse them, not fully understanding what was going on.

 

Then today, the worst case scenario, while doing the parity rebuild (takes 3 days) I lost parity 1, 2 and the first data drive at 10% complete. 

 

1) I'm not really sure what to do next to attempt to recover what I can from the first data disk.

2) Something is faulty/broken/etc. causing this to happen so frequently. I know the SMART data on some of the disks is concerning, and I've pulled those out of rotation, but the ones with an OK status are still causing this to happen as well. No other changes have been made, other than that I'm starting to put the system under more load as I migrate data over. But still, the amount and frequency of what I'm moving over is nothing insane.

 

Diagnostics attached, taken just after the 3 drives failed during the parity rebuild (without having restarted the system before exporting the data).

 

image.png.53dc8193b8946a9299d4fb82a43ef4d9.png

 

Any ideas or advice are greatly appreciated. As well, let me know if any additional information could be helpful.

 

Thanks.

tower-diagnostics-20240108-2229.zip

Solved by JorgeB

  • Community Expert

Is the enclosure on the same UPS? Disks appear to have dropped after a power failure.

  • Author
6 hours ago, JorgeB said:

Is the enclosure on the same UPS? Disks appear to have dropped after a power failure.

Ohhhh that's interesting. Which log/line indicates that?

 

And no, the computer is on direct wall power, and the MD1200s are on two Eaton UPSes, which I have had some issues with in terms of delivering power consistently.

  • Community Expert
  • Solution
Jan  8 08:09:44 Tower kernel: scsi 5:0:25:0: _scsih_block_io_device skip device_block for SES handle(0x0025)
Jan  8 08:09:44 Tower kernel: scsi 5:0:38:0: _scsih_block_io_device skip device_block for SES handle(0x0032)
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jan  8 08:09:45 Tower apcupsd[4706]: Power failure.
Jan  8 08:09:46 Tower kernel: sd 5:0:1:0: device_unblock and setting to running, handle(0x000b)
Jan  8 08:09:46 Tower kernel: sd 5:0:2:0: device_unblock and setting to running, handle(0x000c)

 

Connection with the disks is lost at the same time as the power fails.
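For anyone wanting to run the same check against their own syslog, here is a minimal Python sketch of the correlation described above. It only uses the lines quoted in this post; the year (2024, taken from the diagnostics file name), the 5-second window, and the function names `parse` and `correlated_events` are assumptions made for illustration, not anything Unraid provides.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the syslog excerpt above.
SYSLOG = """\
Jan  8 08:09:44 Tower kernel: scsi 5:0:25:0: _scsih_block_io_device skip device_block for SES handle(0x0025)
Jan  8 08:09:44 Tower kernel: scsi 5:0:38:0: _scsih_block_io_device skip device_block for SES handle(0x0032)
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jan  8 08:09:45 Tower apcupsd[4706]: Power failure.
Jan  8 08:09:46 Tower kernel: sd 5:0:1:0: device_unblock and setting to running, handle(0x000b)
Jan  8 08:09:46 Tower kernel: sd 5:0:2:0: device_unblock and setting to running, handle(0x000c)
"""

# timestamp, hostname, program name (optional [pid]), message
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+([^:\[]+)(?:\[\d+\])?:\s+(.*)$"
)

def parse(line, year=2024):
    """Return (datetime, program, message), or None for non-matching lines.

    Syslog timestamps carry no year, so the year is an assumption.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp, prog, msg = match.groups()
    stamp = " ".join(stamp.split())  # collapse the padding in "Jan  8"
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, prog, msg

def correlated_events(lines, window_seconds=5):
    """Kernel device block/unblock events within the window of a UPS power failure."""
    events = [e for e in map(parse, lines) if e is not None]
    power = [when for when, prog, msg in events
             if prog == "apcupsd" and "Power failure" in msg]
    window = timedelta(seconds=window_seconds)
    return [
        (when, prog, msg) for when, prog, msg in events
        if prog == "kernel"
        and ("device_block" in msg or "device_unblock" in msg)
        and any(abs(when - p) <= window for p in power)
    ]

for when, prog, msg in correlated_events(SYSLOG.splitlines()):
    print(when.time(), msg)
```

On a full syslog the same function could be fed `open("syslog").read().splitlines()`; a hit list clustered around each "Power failure." line would point at the enclosure power rather than at the disks.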

  • Author
6 minutes ago, JorgeB said:
Jan  8 08:09:44 Tower kernel: scsi 5:0:25:0: _scsih_block_io_device skip device_block for SES handle(0x0025)
Jan  8 08:09:44 Tower kernel: scsi 5:0:38:0: _scsih_block_io_device skip device_block for SES handle(0x0032)
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jan  8 08:09:45 Tower apcupsd[4706]: Power failure.
Jan  8 08:09:46 Tower kernel: sd 5:0:1:0: device_unblock and setting to running, handle(0x000b)
Jan  8 08:09:46 Tower kernel: sd 5:0:2:0: device_unblock and setting to running, handle(0x000c)

 

Connection with the disks is lost at the same time as the power fails.

OK, looks like I'm going to have to take the UPSes out of the picture until I can get them figured out.

 

Now, short term, are there any tips or tricks for bringing those 3 disks back online? I'm confident the disks are fine and have no data loss.

 

Thanks!!

  • Community Expert

Reboot and post new diags after array start. You won't be able to emulate the disks. If the disks are assumed OK, doing a New Config is probably the best option; you can check "parity is already valid" and then run a parity check.

  • Author

So, I just started the array back up after removing the UPSes and restarting.

 

It has started a parity sync that it says will take an hour, but it also says disk 1 is unmountable and needs to be formatted.

 

image.thumb.png.70a9f3c6f571f826542b314b2ce1f6c4.png

 

image.thumb.png.93989af524c6fd4e1f235e4c3fa6d8b9.png

 

Edit: added diagnostics if of any use.

tower-diagnostics-20240109-1137.zip

Edited by bradgoldring

  • Community Expert
10 minutes ago, bradgoldring said:

disk 1 is unmountable and needs to be formatted.

It doesn't say you NEED to format it. It will allow you to format if you check the box. DO NOT format any disk in the array that is supposed to have your data.

  • Community Expert

Format is a write operation. It writes an empty filesystem to the disk. Unraid treats this write operation exactly as it does any other, by updating parity so the array will be in sync. If you format a disk in the array, parity agrees the disk is empty so empty is the only thing parity could make it if you rebuild.
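To see why, here is a small Python sketch using plain XOR parity over three tiny made-up "disks". This is a simplified single-parity model, not Unraid's actual code: the disk contents are invented, and a real empty filesystem is not all zeros, but the effect on a rebuild is the same.

```python
from functools import reduce

def xor_parity(disks):
    """Byte-wise XOR across equally sized disks (simplified single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def rebuild(missing_index, disks, parity):
    """Reconstruct one disk from all the other disks plus parity."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_parity(others + [parity])

# Hypothetical 8-byte "disks" standing in for real data.
disks = [b"movies!!", b"photos!!", b"backups!"]
parity = xor_parity(disks)

# Before any format, parity can recreate disk 0's real contents.
before_format = rebuild(0, disks, parity)

# "Format" disk 0: the empty filesystem is modelled here as zeros.
# The write goes through the array, so parity is updated to match.
disks[0] = bytes(8)
parity = xor_parity(disks)

# Now a rebuild can only recreate the empty disk.
after_format = rebuild(0, disks, parity)

print(before_format)
print(after_format)
```

After the format, nothing in parity remembers the old contents, which is why rebuilding a formatted disk gives you a formatted disk.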

  • Community Expert

There are 3 invalid disks, so disk1 cannot be emulated. Since we think disk1 is OK, you can do a New Config instead.
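The counting behind that can be sketched in a few lines of Python. The rule assumed here is the general one for parity-protected arrays (each missing data disk needs one valid parity disk to stand in for it); the function name `can_emulate` and the dictionary layout are made up for illustration.

```python
def can_emulate(parity_valid, invalid_data_disks):
    """True if every invalid data disk can be covered by a valid parity disk.

    parity_valid: mapping of parity slot name -> whether that parity is valid.
    invalid_data_disks: names of data disks that are disabled/invalid.
    """
    valid_parity = sum(1 for ok in parity_valid.values() if ok)
    return len(invalid_data_disks) <= valid_parity

# The situation in this thread: both parity disks and disk1 are invalid.
print(can_emulate({"parity": False, "parity2": False}, ["disk1"]))  # False

# With one valid parity and only disk1 invalid, emulation would be possible.
print(can_emulate({"parity": True, "parity2": False}, ["disk1"]))   # True
```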

  • Community Expert

No way 6TB parity can be done in an hour. 12+ hours is more likely.

 

Not sure how you can have 2 disks disabled and the other parity invalid though.

 

As mentioned, New Config is likely the way forward. Then we can see what else might be needed.

 

Syslog seems to indicate multiple disk problems still.
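As a rough sanity check on those durations, here is the arithmetic in Python. It assumes a parity pass has to read the full 6TB (decimal terabytes) of each disk and that all disks are read in parallel, so the per-disk rate is what matters; the function name is made up.

```python
def required_throughput_mb_s(capacity_tb, hours):
    """Average per-disk rate in MB/s needed to cover capacity_tb in the given hours."""
    return capacity_tb * 1e12 / (hours * 3600) / 1e6

for hours in (1, 12, 72):
    rate = required_throughput_mb_s(6, hours)
    print(f"{hours:>2} h -> {rate:7.1f} MB/s per disk")

# 1 h   needs about 1667 MB/s per disk, far beyond a single spinning disk
# 12 h  needs about  139 MB/s per disk
# 72 h  (the 3 days reported earlier) works out to about 23 MB/s per disk
```

So a genuine full pass in 1 hour is not plausible, and a 3-day pass suggests the disks are being held back by something between them and the controller.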

  • Author
50 minutes ago, trurl said:

No way 6TB parity can be done in an hour. 12+ hours is more likely.

 

Not sure how you can have 2 disks disabled and the other parity invalid though.

 

As mentioned, New Config is likely the way forward. Then we can see what else might be needed.

 

Syslog seems to indicate multiple disk problems still.

Normally it's 3 days for a full parity check.

 

So, after this weird mini 1hr parity check just completed, parity disk 1 has come back online and appears fine, so I just have parity disk 2 and data disk 1 as disabled and contents emulated.

 

image.png.2e758ca4e981480b5f014a6bb796f7b4.png

 

I'm going to swap disk 1 for a new disk and start the glorious 3 day parity check once again, but without the UPS concerns we should be good, I hope. Any concerns or comments on that approach before I start it?

 

The number of errors from this weird 1hr parity check is concerning though:

image.png.b00f0edcac222bc0e090cc092939e39a.png

 

Thank you again as well!!

  • Community Expert

Don't do anything yet. You will be rebuilding an unmountable disk1.

 

Post new diagnostics

  • Community Expert

And unlikely parity is valid anyway so you definitely don't want to rebuild disk1 like that.

  • Community Expert

As mentioned, New Config is the likely way forward. This means you have to keep current disk1 and all other disks assigned as is, and rebuild both parity.

 

Parity in its current state can't rebuild disk1, so you have to hope for the best with its current contents. If it really is unmountable we can try to repair its filesystem after parity rebuild.

 

You must have port multipliers if parity takes so long on 6TB.

  • Community Expert

And parity rebuild will tell us if things are working correctly without affecting any data disks. Probably you still have multiple connection problems

1 hour ago, trurl said:

Syslog seems to indicate multiple disk problems still.

Did you do anything about that?

  • Author
1 minute ago, trurl said:

As mentioned, New Config is the likely way forward. This means you have to keep current disk1 and all other disks assigned as is, and rebuild both parity.

 

Parity in its current state can't rebuild disk1, so you have to hope for the best with its current contents. If it really is unmountable we can try to repair its filesystem after parity rebuild.

 

You must have port multipliers if parity takes so long on 6TB.

 

Regarding a new config and rebuilding both parity disks, how do I go about that with the existing Disk 1 being unmountable?

 

Regarding the 3 day rebuild: Is it because the three MD1200's are connected in series? The 9207-8e has 2 ports, I could connect one of the 3 drive units directly, would that benefit any?

 

9 minutes ago, trurl said:

And parity rebuild will tell us if things are working correctly without affecting any data disks. Probably you still have multiple connection problems

Did you do anything about that?

I have not done anything about this because I am not sure what is wrong here or which disks are affected.

  • Community Expert
8 minutes ago, bradgoldring said:

Regarding a new config and rebuilding both parity disks, how do I go about that with the existing Disk 1 being unmountable?

New Config will use the actual disk1 rather than trying to emulate it. It cannot be emulated, and the actual disk1 is hopefully fine.

  • Community Expert

New Config accepts all assigned disks into the array exactly as they are, unmountable or not. The only thing it will do is make them all enabled again exactly as they are, and (optionally, by default) rebuild parity based on the contents of all the assigned disks. And you do want to let it rebuild parity.

 

It's not clear that physical disk1 is actually unmountable anyway. It is being emulated by parity, and that is likely not working well since parity is probably not valid. If physical disk1 is unmountable, we can try to repair its filesystem after successfully rebuilding parity.

 

Probably still some hardware problems to work through before we successfully rebuild parity, but trying will make that apparent and give us some idea what needs to be fixed.

  • Author

Just to confirm before I click "Apply" this is correct:

 

image.png.1e80fda1c0151002536c786d50cc74e2.png

 

Also Disk 1 was not being emulated by parity when I just had the array started, so I assume the Parity 1 disk is no good per that error count.

 

Thanks!

  • Community Expert

Yes, click Apply and then check "parity is already valid" before starting the array.

 

 

  • Community Expert
5 minutes ago, bradgoldring said:

Disk 1 was not being emulated by parity

If it was disabled, then it was being emulated by parity1, which was almost certainly incorrect. So the fact that it was showing as unmountable referred to the emulated disk1.

  • Community Expert
2 minutes ago, JorgeB said:

check "parity is already valid" before start the array.

Why?

  • Community Expert

It should be mostly valid. Then run a check; if there are many errors, he can always re-sync instead.
