
Power Flick? Unable to stop the array


Hello all,

I think I had a little power flicker or something. I noticed that my server's fans started spinning a little faster for a second, which prompted me to check the server. After pulling up the page, disk 8 is red balled and all the other disks have 300-400ish errors. Wanting to prevent further errors, I tried to stop the array, and that's where it's stuck.

 

lsof | grep /mnt
lsof: WARNING: can't stat() xfs file system /mnt/disk8
      Output information may be incomplete.
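For readers following along: `lsof | grep /mnt` looks for processes that still hold files open on the array mounts, which is one usual reason an array stop hangs. On Linux, that information comes from the symlinks under /proc/<pid>/fd. A minimal Python sketch of the same check, assuming a Linux /proc and root privileges for full coverage (`open_files_under` is a hypothetical helper, not part of Unraid or lsof):

```python
import os


def open_files_under(prefix, proc_root="/proc"):
    """Return (pid, path) pairs for open descriptors whose target lies under prefix.

    Rough stand-in for `lsof | grep /mnt` on Linux: each process exposes its
    open file descriptors as symlinks in /proc/<pid>/fd.
    """
    base = prefix.rstrip("/")
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied (run as root to see everything)
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir() and readlink()
            if target == base or target.startswith(base + "/"):
                hits.append((int(entry), target))
    return hits


if __name__ == "__main__":
    for pid, path in sorted(open_files_under("/mnt")):
        print(pid, path)
```

Because it only reads the link text instead of calling stat() on each target, it should still list entries on a mount that, like /mnt/disk8 here, can no longer be stat()ed.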

Before any of this happened, I suspected that disk 10 was on the way out due to 114 read errors yesterday, though an extended SMART test came back good...

 

And now the parity drive is red balled... HELP!

 

tower-diagnostics-20220729-1405.zip

Edited by mathomas3

  • Replies 51
  • Views 3.9k

Most Popular Posts

  • Controller is OK, only the LSI HBA is being used.   Initial issue appears to have been a power problem, multiple disks dropped at the same time.   Disk10 does appear to be showing

  • mathomas3

    Both drives cleared on the extended check and thus far zero errors on disk 10... I will keep an eye on that disk...   Parity and disk 8 are being rebuilt now... decided to replace both of th


  • Author

Also there is this error...

 

Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md11, logical block 1953506608, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522944, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522945, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522946, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522947, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522948, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522949, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522950, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md8, logical block 1953522951, async page read
Jul 29 12:10:01 Tower kernel: Buffer I/O error on dev md6, logical block 1953506608, async page read
Jul 29 12:10:01 Tower kernel: blk_update_request: I/O error, dev loop2, sector 41942912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
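When a log like this runs long, tallying the messages per device makes the pattern easier to see. As far as I know, in Unraid the mdN devices correspond to array slots (md8 is disk8), so here disk8 accounts for most of the buffer I/O errors. A small Python sketch, assuming the message format shown above (`buffer_io_errors_by_device` is a hypothetical helper):

```python
import re
from collections import Counter

# Matches kernel lines such as:
# "Tower kernel: Buffer I/O error on dev md8, logical block 1953522944, async page read"
BUFFER_IO_RE = re.compile(r"Buffer I/O error on dev ([^,\s]+), logical block (\d+)")


def buffer_io_errors_by_device(lines):
    """Count 'Buffer I/O error' kernel messages per device name (e.g. 'md8')."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Fed the eleven log lines above, it reports 8 for md8 and 1 each for md11 and md6; the blk_update_request line for loop2 does not match the pattern and is ignored. On a saved syslog, `buffer_io_errors_by_device(open("syslog.txt")).most_common()` gives the same summary.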

 

  • Community Expert

You are going to have to go for the power switch. Then post new diagnostics after rebooting and starting the array.

  • Author
2 minutes ago, trurl said:

You are going to have to go for the power switch. Then post new diagnostics after rebooting and starting the array.

As a sysadmin... that's something you tell users all the time but not something you ever want to be told yourself 🤔

  • Author

tower-diagnostics-20220729-1423.zip

 

Rebooted and posting Diags.

 

Those two disks are still red balled...

 

funny thing is though that disk 8 is a 2 month old ssd...

  • Community Expert
3 minutes ago, mathomas3 said:

disk 8 is a 2 month old ssd

Why do you have an SSD in the array with HDD parity? SSDs in the array can't be trimmed, and can only be written at parity speed.

  • Author

Faster read times while also having the data protected is what I was aiming for... though I have been considering moving it out of the array for some time now.

  • Community Expert

Give me a little while to go through these diagnostics, fortunately we have those from before reboot also.

 

Post a screenshot of Main - Array Devices

 

 

  • Author
1 minute ago, trurl said:

Give me a little while to go through these diagnostics, fortunately we have those from before reboot also.

 

Post a screenshot of Main - Array Devices

 

 

[Screenshot: Main - Array Devices]

  • Author
2 minutes ago, trurl said:

Give me a little while to go through these diagnostics, fortunately we have those from before reboot also.

 

Post a screenshot of Main - Array Devices

 

 

Before I forget to mention it.

 

Thank you for the quick assistance. 

It's been a while since I have had drive failures, but I've never had more than one die at a time.

 

I did recently move Unraid onto this hardware, an HP 1U server and a 24-bay DAS, both with dual PSUs. It hasn't given me any issues up till now.

  • Author

Oh, and after posting the most recent diags I disabled Docker to limit writes to the array. I don't think that should affect anything, but I figure it's for the best.

  • Community Expert

Never got around to converting filesystem on disk3?

 

Your RAID controller is making it complicated identifying disks, one of many reasons RAID controllers are NOT recommended. And it looks like all the SAS disks have disconnected and reconnected as different devices, adding to the confusion. Are they all on that controller?

 

 

  • Author
3 minutes ago, trurl said:

Never got around to converting filesystem on disk3?

 

Your RAID controller is making it complicated identifying disks, one of many reasons RAID controllers are NOT recommended. And it looks like all the SAS disks have disconnected and reconnected as different devices, adding to the confusion. Are they all on that controller?

 

 

Correct. All of the drives are connected via the RAID controller.

 

When I moved everything over I had to use a different RAID controller (thought I was using a recommended one from the Unraid wiki).

 

If I recall, disks 9-11 are SAS and all the rest are SATA drives.

  • Community Expert

I don't know how to read SMART reports for SAS drives, or even if that RAID controller is giving useful SMART reports.

 

SMART report for parity looks fine, hasn't had extended self-test recently.

 

SMART report for disk8 looks fine, no self-test run.

 

Not going to look at the others since there are so many. Do any have SMART warnings on the Dashboard page?

 

All 14 array data disks mounted including emulated disk8 so that's good.

 

  • Community Expert
1 minute ago, mathomas3 said:

had to use a different RAID controller (thought I was using a recommended one from the Unraid wiki)

You must be looking at very old wiki. No RAID controllers are recommended these days. Some can be flashed to IT mode so they are no longer RAID controllers.

 

  • Author
2 minutes ago, trurl said:

I don't know how to read SMART reports for SAS drives, or even if that RAID controller is giving useful SMART reports.

 

SMART report for parity looks fine, hasn't had extended self-test recently.

 

SMART report for disk8 looks fine, no self-test run.

 

Not going to look at the others since there are so many. Do any have SMART warnings on the Dashboard page?

 

All 14 array data disks mounted including emulated disk8 so that's good.

 

Negative. SMART reports look good for the ones I have looked at.

[Screenshot attached]

  • Community Expert
Jul 14 00:00:01 Tower kernel: mdcmd (36): check NOCORRECT
Jul 14 00:00:01 Tower kernel: 
Jul 14 00:00:01 Tower kernel: md: recovery thread: check P Q ...
Jul 15 05:55:20 Tower kernel: sd 2:0:6:0: [sdg] tag#5891 Sense Key : 0x3 [current] [descriptor] 
Jul 15 05:55:20 Tower kernel: sd 2:0:6:0: [sdg] tag#5891 ASC=0x11 ASCQ=0x0 
Jul 15 05:55:20 Tower kernel: sd 2:0:6:0: [sdg] tag#5891 CDB: opcode=0x28 28 00 73 7d 14 85 00 00 80 00
Jul 15 05:55:20 Tower kernel: blk_update_request: critical medium error, dev sdg, sector 15500616856 op 0x0:(READ) flags 0x0 phys_seg 114 prio class 0
Jul 15 05:55:20 Tower kernel: md: disk10 read error, sector=15500616792
Jul 15 05:55:20 Tower kernel: md: disk10 read error, sector=15500616800

That looks like a disk problem, and there are similar errors for the rest of the parity check, but it completes:

Jul 15 06:08:55 Tower kernel: md: sync done. time=108534sec
Jul 15 06:08:55 Tower kernel: md: recovery thread: exit status: 0

Some other syslog entries about the controller for disk7 after that while you were trying to shut things down.

  • Community Expert

Some things about your configuration are not ideal. Often when I see this in diagnostics I say it's unrelated to the problem, but in your case it does seem related.

 

Your appdata, domains, and system shares have files on the array. Specifically disk8.

 

Ideally, all files for these shares would be on fast pool (cache) and set to stay there, so docker/VM performance isn't impacted by slower array, and so array disks can spin down, since these files are always open.
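One quick way to see which array disks hold pieces of those shares is to look for a top-level folder with the share's name on each disk. This sketch assumes the standard Unraid layout, where each array disk is mounted at /mnt/diskN and a user share is a top-level folder of the same name on any disk or pool; `disks_holding_shares` is a hypothetical helper, not an Unraid tool:

```python
import os
import re

ARRAY_DISK_RE = re.compile(r"disk(\d+)")  # array disks mount as /mnt/disk1, /mnt/disk2, ...


def disks_holding_shares(shares=("appdata", "domains", "system"), mnt_root="/mnt"):
    """Map each share name to the array disks that have a top-level folder for it."""
    disks = []
    for entry in os.listdir(mnt_root):
        match = ARRAY_DISK_RE.fullmatch(entry)
        if match:  # skips /mnt/cache, /mnt/user, /mnt/disks, ...
            disks.append((int(match.group(1)), entry))
    found = {share: [] for share in shares}
    for _, disk in sorted(disks):  # numeric order: disk1, disk2, ..., disk10
        for share in shares:
            if os.path.isdir(os.path.join(mnt_root, disk, share)):
                found[share].append(disk)
    return found
```

Run on the server, a result listing disk8 under appdata, domains, or system would mirror what the diagnostics show; an empty list for each share would mean those files live only on the pool.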

  • Community Expert
1 hour ago, mathomas3 said:

had a little power flick

Do you have UPS?

  • Community Expert
36 minutes ago, trurl said:

all the SAS disks have disconnected and reconnected as different devices

That remark applies to both sets of diagnostics.

 

I don't think you can rebuild anything until you resolve that.

  • Author
Just now, trurl said:

Do you have UPS?

I do have a UPS, though I need to get a larger one; I'm pushing the current one to the limit of what it can handle. For power, I connected one PSU to the UPS and the second PSU directly to the wall.

 

I'm thinking what happened is that the DAS lost power for about 0.01 seconds, causing the errors on the disks, and since parity and disk 8 were actively being accessed, they got disabled.

  • Author
2 minutes ago, trurl said:

That remark applies to both sets of diagnostics.

 

I don't think you can rebuild anything until you resolve that.

When you say that the SAS disks have been disconnected and reconnected as different devices... what do you mean?

 

I didn't reassign the disks when everything was moved into the new box.

  • Community Expert
14 minutes ago, trurl said:

Some other syslog entries about the controller for disk7 after that while you were trying to shut things down.

Actually I don't think you were shutting down yet, there was another syslog after that I needed to go through in those first diagnostics.

 

There are some entries in both of those syslogs about cache full, but it isn't now, I don't know if it was earlier or not.

 

That next syslog has similar controller problems with disk29 (parity2), more for disk8, and then looks like all disks start acting up. Eventually you get a write error on disk0 (parity), disk8, and disk29 (parity2). Write errors always disable disks, but that is one too many, so parity2 doesn't get disabled.
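The "one too many" rule follows from dual parity: with two parity disks, at most two disks (data or parity) can be invalid at once while the array can still reconstruct everything, so once two disks are disabled a further write error no longer disables anything. A toy Python model of that rule, not Unraid's actual code (`apply_write_errors` is illustrative; disk0 = parity and disk29 = parity2 as in the syslog):

```python
def apply_write_errors(write_errors, parity_count=2):
    """Toy model of which disks end up disabled (red-balled) after write errors.

    Each write error disables its disk only while fewer than `parity_count`
    disks are disabled; past that point the array could no longer rebuild
    every missing disk, so the failing disk is left enabled.
    """
    disabled = []
    for disk in write_errors:
        if disk not in disabled and len(disabled) < parity_count:
            disabled.append(disk)
    return disabled


# Order of write errors in the second syslog: parity, disk8, then parity2.
print(apply_write_errors(["disk0", "disk8", "disk29"]))  # ['disk0', 'disk8']
```

That outcome matches the screenshot: parity and disk 8 are disabled, while parity2 stays enabled.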

 

 

  • Author
3 minutes ago, trurl said:

Actually I don't think you were shutting down yet, there was another syslog after that I needed to go through in those first diagnostics.

 

There are some entries in both of those syslogs about cache full, but it isn't now, I don't know if it was earlier or not.

 

That next syslog has similar controller problems with disk29 (parity2), more for disk8, and then looks like all disks start acting up. Eventually you get a write error on disk0 (parity), disk8, and disk29 (parity2). Write errors always disable disks, but that is one too many, so parity2 doesn't get disabled.

 

 

So you think both parity drives are hosed? 

  • Community Expert
5 minutes ago, mathomas3 said:

When you say that the SAS disks have been disconnected and reconnected as different devices... what do you mean?

 

I didn't reassign the disks when everything was moved into the new box.

I mean while the server was running. Hardware issues made them disconnect, and then they reconnected, but Linux gave them new device letters, so Unraid doesn't know them anymore. That happened in those earlier diagnostics where things went south, and again in these current diagnostics.

 

Do you have any disks showing in Unassigned Devices?

 

Take a look at your latest diagnostics, in the smart folder. None of the SAS disks are assigned and they have different sd designations than in your screenshot. You can also see what I mean about RAID controllers making it difficult to identify disks since you have to actually open up each of those SMART reports to figure out which disk they are.
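Because sdX letters are handed out in detection order, they can change when a disk drops off and comes back, while the links under /dev/disk/by-id are named from the drive's identity (model, serial, WWN). A small Python sketch that lists the current by-id to sdX mapping, so the letters in the smart folder can be matched to drives. It assumes a standard udev-populated /dev/disk/by-id; behind a RAID controller those IDs may not carry the real drive serials, which is part of the problem described above (`by_id_to_kernel_name` is a hypothetical helper):

```python
import os


def by_id_to_kernel_name(by_id_dir="/dev/disk/by-id"):
    """Map each whole-disk link in by_id_dir to the kernel name (sdX) it points to now."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:
            continue  # skip partition links such as ata-..._SERIAL-part1
        target = os.readlink(os.path.join(by_id_dir, name))  # e.g. '../../sdg'
        mapping[name] = os.path.basename(target)
    return mapping
```

Running it before and after a disconnect would show the same ID pointing at a new letter, which is exactly the situation where Unraid no longer recognizes the disk.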
